From the course: Reinforcement Learning Foundations

First visit and every visit MC prediction

- [Instructor] First-visit and every-visit Monte Carlo prediction split Monte Carlo prediction into two types. Under any policy, when predicting the reward, we can encounter a state more than once. It is then our decision whether the second, third, or fourth time we encounter a state should be taken into consideration when estimating the reward. For example, if the agent visits state two for the first time at the second time step and then passes state two again at the eighth time step, taking the second pass into consideration gives a different reward than ignoring it. So, if the agent goes with first-visit Monte Carlo prediction, the expected reward will be the cumulative reward from the second time step to the goal, without minding the second visit to the state at the eighth time step. But if the agent follows every-visit Monte Carlo prediction, the expected reward is also the cumulative reward from the second…
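
To make the difference concrete, here is a minimal Python sketch of the two variants. It is an illustration only, not the course's own code: the function name `mc_prediction`, the `(state, reward)` episode format, the reward of -1 per step, and the ten-step episode are all assumptions chosen to mirror the example of state two being visited at the second and eighth time steps.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging sampled cumulative rewards.

    episodes: list of episodes, each a list of (state, reward) pairs, where
    reward is what the agent receives on the step taken from that state
    (an assumed format for this sketch).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Work backwards to get the cumulative reward from every time step
        # to the end of the episode.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()
        for t, (state, _) in enumerate(episode):
            # First-visit: skip every occurrence of a state after the first.
            if first_visit and state in seen:
                continue
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical ten-step episode: state 2 appears at the second and eighth
# time steps, and every step has a reward of -1.
states = [1, 2, 3, 4, 5, 6, 7, 2, 8, 9]
episode = [(s, -1.0) for s in states]

first_visit_values = mc_prediction([episode], first_visit=True)
every_visit_values = mc_prediction([episode], first_visit=False)

print(first_visit_values[2])  # -9.0: only the visit at time step 2 counts
print(every_visit_values[2])  # -6.0: average of -9.0 and -3.0
```

With a single episode, the first-visit estimate for state two uses only the cumulative reward from the second time step onward, while the every-visit estimate averages that value with the cumulative reward from the eighth time step onward. States visited only once get the same estimate under both variants.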
