Here we take a detailed view of policy gradient methods and their intuitions. This blog discuess how REINFORCE, baseline and actor-critic algorithms came into existence.
We closely follow chapter 13 of the classic textbook of Sutton and Barto (2nd edition) (Sutton and Barto 2018). Initially we visit the classic policy gradient theorem and later build on top of that to develop REINFORCE and actor-critic algorithms. As usual our goal is to develop better intuition on how and why these algorithms work.
Sutton, Richard S, and Andrew G Barto. 2018. Reinforcement Learning: An Introduction. MIT press.