I am currently reading Sutton and Barto (reference below). Along the way, I decided to recreate some of the experiments presented in each chapter of the book. This particular example includes the Blackjack problem definition.
The variation in this experiment is to compare Monte Carlo (MC) methods: off-policy MC with importance sampling, on-policy MC, and MC with exploring starts.
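For readers unfamiliar with importance sampling, here is a minimal sketch of the two estimators discussed in Chapter 5 of the book (ordinary and weighted). It is not the notebook's code: the function name `importance_sampling_estimates` and its arguments are hypothetical, and it assumes the per-episode returns and importance-sampling ratios have already been computed.

```python
def importance_sampling_estimates(returns, ratios):
    """Return (ordinary, weighted) importance-sampling estimates of a state value.

    returns: returns G observed in episodes generated by the behavior policy.
    ratios:  importance-sampling ratio rho for each episode, i.e. the product
             over time steps of pi(a|s) / b(a|s) (assumed precomputed).
    """
    if len(returns) != len(ratios):
        raise ValueError("returns and ratios must have the same length")
    n = len(returns)
    if n == 0:
        return 0.0, 0.0

    # Sum of ratio-scaled returns, shared by both estimators.
    scaled_sum = sum(rho * g for rho, g in zip(ratios, returns))
    ratio_sum = sum(ratios)

    # Ordinary importance sampling: divide by the number of episodes.
    ordinary = scaled_sum / n
    # Weighted importance sampling: divide by the sum of the ratios
    # (defined as 0 when every ratio is 0).
    weighted = scaled_sum / ratio_sum if ratio_sum > 0 else 0.0
    return ordinary, weighted


if __name__ == "__main__":
    ordinary, weighted = importance_sampling_estimates(
        returns=[1.0, -1.0, 1.0], ratios=[4.0, 1.0, 0.0]
    )
    print(ordinary, weighted)
```

Ordinary importance sampling is unbiased but can have very high variance, while the weighted form is biased with much lower variance, which is the contrast that this kind of comparison is meant to show.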
I have added a notebook of the experiments:
Plots 5.1 - 5.3 are here.
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA.