RL 1.0: Policy gradient methods

Here we take a detailed look at policy gradient methods and the intuitions behind them. This blog discusses how the REINFORCE, baseline, and actor-critic algorithms came about.

Introduction

We closely follow chapter 13 of the classic textbook by Sutton and Barto (2nd edition) (Sutton and Barto 2018). We first revisit the policy gradient theorem and then build on it to develop the REINFORCE and actor-critic algorithms. As usual, our goal is to develop a better intuition for how and why these algorithms work.

Policy-based RL vs value-based RL

Policy gradient theorem

REINFORCE algorithm

Using baseline

Actor-critic algorithms

Case study

Summary

Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. MIT Press.