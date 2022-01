In the previous part of this series, I introduced counterfactuals and showed how to encode them in the POMDP framework. In this part, I will focus on how counterfactuals can be applied in the emerging field of Reward Learning. The article first gives a brief summary of the basic elements of Reward Learning. Using a running example, I will then demonstrate how Reward Learning can fail to produce the desired outcome. Finally, I will introduce counterfactual Reward Learning and show how it addresses the problem that regular Reward Learning runs into in our example. If you have read the first part, the only other prerequisite for this article is familiarity with the basics of Reinforcement Learning.
