From the course: Reinforcement Learning Foundations

Monte Carlo control

- [Instructor] Recall that Monte Carlo prediction is when we try to predict the expected reward from following a given policy. Monte Carlo control is the process of iterating between two things: using a policy, initially a poor one (the equiprobable random policy), to construct the Q-table, and then improving the policy by selecting the actions that lead to the best results. In more detail, the first step in Monte Carlo control is to construct the Q-table by following the random policy. Then the policy is improved by selecting the best action in each state, or better still, by using the Bellman equation. This new policy is then used to update the Q-table. Switching between these three steps is Monte Carlo control, and it completes the whole cycle of updating the policy.
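
The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's own code: the five-state chain environment, the `step`, `choose_action`, `run_episode`, and `mc_control` names, the discount factor, and the epsilon schedule are all hypothetical. It also swaps in an epsilon-greedy policy, which starts out equiprobable random (epsilon = 1.0) and becomes mostly greedy over time, so that the agent keeps exploring while the policy improves.

```python
import random
from collections import defaultdict

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy dynamics (an assumption): reaching the last state pays +1 and ends
    the episode; every other move costs 0.1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: random with probability epsilon, else best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run_episode(q, epsilon, rng, max_steps=50):
    """Follow the current policy from state 0 and record (state, action, reward)."""
    state, episode = 0, []
    for _ in range(max_steps):
        action = choose_action(q, state, epsilon, rng)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        state = next_state
        if done:
            break
    return episode


def mc_control(n_episodes=2000, gamma=0.9, seed=0):
    """Alternate between estimating the Q-table and acting on it."""
    rng = random.Random(seed)
    q = defaultdict(float)     # Q-table: (state, action) -> estimated return
    counts = defaultdict(int)  # number of returns averaged per (state, action)
    for i in range(n_episodes):
        # epsilon = 1.0 at first (equiprobable random policy), then decays
        epsilon = max(0.05, 1.0 - i / (n_episodes / 2))
        episode = run_episode(q, epsilon, rng)
        g = 0.0
        # Every-visit Monte Carlo update: walk the episode backwards,
        # accumulating the discounted return and averaging it into Q.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            counts[(state, action)] += 1
            q[(state, action)] += (g - q[(state, action)]) / counts[(state, action)]
    # Policy improvement: pick the best action in each non-terminal state.
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
    return q, policy
```

Calling `q, policy = mc_control()` returns the learned Q-table and the greedy policy read off from it. On this toy chain, moving right is the shortest route to the reward, so the improved policy would be expected to prefer action 1 in each state.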