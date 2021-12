A tic-tac-toe example showing how to write reinforcement learning (RL) Q-learning algorithms when some actions are forbidden in particular states. One day I was watching my 18-month-old son learn to eat with a spoon, and I noticed that he put the spoon down and asked me for food as soon as his bowl was empty. He adjusted how he handled the spoon according to the position and quantity of food in the bowl, and it came naturally to him not to waste any effort once there was no food left. This reminded me of some RL problems I have come across in which not all actions are allowed in every state: for example, an elevator can never go up when it is already on the top floor, and an HVAC system must stop cooling once the lowest temperature has been reached.
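
To make the idea of forbidden actions concrete, here is a minimal sketch of how a tabular Q-learning agent for tic-tac-toe might restrict itself to legal moves. It is not the article's own code: the board encoding (a tuple of nine cells) and the names `legal_actions`, `choose_action`, and `q_update` are assumptions made for illustration. The key point is that the mask is applied twice, once when picking a move and once when taking the max over next-state values.

```python
import random
from collections import defaultdict


def legal_actions(board):
    """Return the indices of empty cells; occupied cells are forbidden moves."""
    return [i for i, cell in enumerate(board) if cell == " "]


def choose_action(q, board, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the legal actions of this state."""
    actions = legal_actions(board)
    if not actions:
        return None  # full board: no action is allowed
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore, but only among legal moves
    return max(actions, key=lambda a: q[(board, a)])  # exploit among legal moves


def q_update(q, board, action, reward, next_board, alpha=0.5, gamma=0.9):
    """One Q-learning step whose bootstrap max ranges over legal actions only."""
    next_actions = legal_actions(next_board)
    # A state with no legal actions contributes 0 to the target.
    best_next = max((q[(next_board, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(board, action)] += alpha * (target - q[(board, action)])


# Hypothetical usage: the Q-table maps (board, action) pairs to values.
q = defaultdict(float)
board = ("X", "O", " ",
         " ", "X", " ",
         "O", " ", " ")
action = choose_action(q, board)
```

Because forbidden moves never enter either the selection step or the update target, the agent needs no penalty reward to learn to avoid them. The elevator and HVAC cases would follow the same pattern with a different `legal_actions` function.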
