
From the course: Reinforcement Learning Foundations


Monte Carlo method



- [Instructor] The Monte Carlo method is used mostly in episodic tasks, that is, tasks that have a definite end. This method is one way an agent can find the best policy, path, or trajectory, so as to get the best cumulative reward. Recall the room full of fruits, but now we'll simplify it to have just one fruit, an orange. My goal is to get that orange, and then I am done, meaning that's the end of that episode. I perform many episodes that I believe might have better policies and lead me to better cumulative rewards, and then compare them so as to find the actions that work best across all those episodes. So here's how it works for an episode. Once I step into the room, I'm on the first tile, which is my first state. At this point I have four possible actions: I could decide to move left, right, forward, or backward. None of these actions is preferred, because this is the first time I am stepping into the room and I don't know which states will lead me to the best reward. There is an equal chance of…
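To make the episode idea concrete, here is a minimal Python sketch, not the course's own code. The 3x3 room, the start tile, the orange's position, the reward values (-1 per step, +10 for reaching the orange), and every function name are assumptions made for illustration. It runs many episodes in which each of the four actions is chosen with equal probability, then compares the episodes by cumulative reward.

```python
import random

# Assumed setup: a 3x3 room of tiles, start at (0, 0), orange at (2, 2).
SIZE = 3
START = (0, 0)
ORANGE = (2, 2)

# The four actions from the transcript, as (row change, column change).
ACTIONS = {
    "left": (0, -1),
    "right": (0, 1),
    "forward": (1, 0),
    "backward": (-1, 0),
}


def step(state, action):
    """Move one tile; walking into a wall leaves the agent where it is."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), SIZE - 1)
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    next_state = (row, col)
    # Assumed rewards: +10 for reaching the orange, -1 for any other step.
    reward = 10 if next_state == ORANGE else -1
    return next_state, reward


def run_episode(rng, max_steps=200):
    """Run one episode, picking each of the four actions with equal chance."""
    state = START
    trajectory = []  # list of (state, action, reward) tuples
    for _ in range(max_steps):
        action = rng.choice(list(ACTIONS))
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if state == ORANGE:  # reaching the orange ends the episode
            break
    return trajectory


def cumulative_reward(trajectory):
    """Sum the rewards collected over one episode."""
    return sum(reward for _, _, reward in trajectory)


rng = random.Random(0)  # fixed seed so repeated runs match
episodes = [run_episode(rng) for _ in range(500)]

# Compare episodes and keep the one with the best cumulative reward.
best = max(episodes, key=cumulative_reward)
print("best cumulative reward:", cumulative_reward(best))
print("actions taken:", [action for _, action, _ in best])
```

Under these assumed rewards, the shortest route to the orange takes four moves, so no episode can score above 7. Picking the single best episode is only the comparison step described above; a fuller Monte Carlo method would average returns per state or per action across episodes.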
