From the course: Reinforcement Learning Foundations

Monte Carlo prediction

Monte Carlo prediction - Python Tutorial

- [Instructor] Monte Carlo prediction can be understood using the Q-table, which is used to store the state-action values of all actions performed by the agent. It answers the question, "Given a policy, how will the agent estimate the value function, or the expected cumulative reward?" Monte Carlo prediction is just the Monte Carlo method as explained in earlier lessons. It uses the Bellman equation to estimate the state and action value functions, otherwise known as the expected cumulative reward for following the policy. The Bellman equation for the Monte Carlo method is represented by this equation. In simple terms, the state value of the current state is the expected value of the reward for moving to the next state plus the state value of that next state. Here, V pi of S is the state value of a state when following a policy pi, and E denotes expectation, so the state value is an expected result, not an actual one. R represents the reward of moving to another state, while T plus one in place to be the…
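As a rough illustration of the idea described in the transcript, the sketch below estimates state values for a fixed policy by averaging sampled returns (first-visit Monte Carlo prediction). It is not the course's own code. The corridor environment, the function names `generate_episode` and `first_visit_mc_prediction`, the two example policies, the reward of -1 per step, and the discount factor `gamma` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def generate_episode(policy, rng, start_state=0, goal=3):
    """Hypothetical toy environment: a corridor of states 0..goal.

    Action +1 moves right, -1 moves left (clipped at state 0).
    Every step yields a reward of -1 until the goal is reached.
    """
    episode = []
    state = start_state
    while state != goal:
        action = policy(state, rng)
        reward = -1
        episode.append((state, action, reward))
        state = max(0, state + action)
    return episode


def first_visit_mc_prediction(policy, num_episodes=5000, gamma=1.0, seed=0):
    """Estimate V_pi(s) as the average return following the first visit to s."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for _ in range(num_episodes):
        episode = generate_episode(policy, rng)

        # Record the time step at which each state is first visited.
        first_visit = {}
        for t, (state, _, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk backward so the return G accumulates as r + gamma * G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, _, reward = episode[t]
            G = reward + gamma * G
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


def always_right(state, rng):
    """Deterministic example policy: always move toward the goal."""
    return 1


def random_walk(state, rng):
    """Stochastic example policy: move left or right with equal probability."""
    return rng.choice((-1, 1))
```

Calling `first_visit_mc_prediction(always_right)` on this toy corridor gives values of -3, -2, and -1 for states 0, 1, and 2, since each state is exactly that many steps from the goal. With `random_walk`, the estimates are noisy averages that settle as the number of episodes grows.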
