I am currently reading Sutton and Barto (reference below). Along the way, I have been recreating some of the experiments from each chapter. This one reproduces the experiment from Figure 6.2.

I have added a notebook of the experiments:

Temporal Difference Learning with Batch Updates

Plots are here.
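For anyone who wants the gist without opening the notebook, here is a minimal sketch of batch TD(0). It is not the notebook's code. It assumes the five-state random walk task (states A to E, start in C, reward 1 only on exiting to the right, no discounting), which is what I understand Figure 6.2 to be based on. All names (`generate_episode`, `batch_td0`, `run`) and parameter values are made up for illustration. One deliberate simplification: the TD errors are averaged per state instead of summed, which keeps the step size stable for any batch size and leaves the fixed point unchanged.

```python
import random

N_STATES = 5  # non-terminal states A..E at indices 1..5; 0 and 6 are terminal
TRUE_VALUES = [i / 6 for i in range(1, N_STATES + 1)]  # 1/6 .. 5/6


def generate_episode(rng):
    """Return one random walk as a list of (state, reward, next_state)."""
    state = 3  # assumed start: the centre state C
    transitions = []
    while 1 <= state <= N_STATES:
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == N_STATES + 1 else 0.0
        transitions.append((state, reward, next_state))
        state = next_state
    return transitions


def batch_td0(episodes, values, alpha=0.5, tol=1e-6, max_sweeps=10_000):
    """Sweep over every stored episode until the value estimates settle.

    Increments are computed from the old values for the whole batch and
    applied only after the sweep. They are averaged per state (an assumption
    of this sketch, not the book's exact formulation).
    """
    for _ in range(max_sweeps):
        td_sum = [0.0] * len(values)
        visits = [0] * len(values)
        for episode in episodes:
            for s, r, s_next in episode:
                td_sum[s] += r + values[s_next] - values[s]  # gamma = 1
                visits[s] += 1
        biggest = 0.0
        for s in range(1, N_STATES + 1):
            if visits[s]:
                step = alpha * td_sum[s] / visits[s]
                values[s] += step
                biggest = max(biggest, abs(step))
        if biggest < tol:
            break
    return values


def rms_error(values):
    """Root-mean-square error against the true values, averaged over states."""
    squared = [(values[i + 1] - TRUE_VALUES[i]) ** 2 for i in range(N_STATES)]
    return (sum(squared) / N_STATES) ** 0.5


def run(n_episodes=100, seed=0):
    """Add one episode at a time and re-run batch TD(0) on everything so far."""
    rng = random.Random(seed)
    values = [0.0] + [0.5] * N_STATES + [0.0]  # terminal values stay at 0
    episodes, errors = [], []
    for _ in range(n_episodes):
        episodes.append(generate_episode(rng))
        batch_td0(episodes, values)
        errors.append(rms_error(values))
    return values, errors
```

Calling `values, errors = run(n_episodes=100)` gives the error after each new episode for a single run. A learning curve like the one in the figure would average `errors` over many seeds.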

