From the course: Reinforcement Learning Foundations

Additional modifications

- [Instructor] I'm going to put some official terms to some of the things I mentioned in previous lessons, and also introduce you to some new ones. One is the greedy policy. A policy is greedy when it always selects the best action for a given state, as we saw earlier when an agent selected the best action across all the episodes to improve its policy. An improvement over the greedy policy is the epsilon-greedy policy. When a policy is epsilon-greedy, the agent is sometimes given the opportunity to explore actions that are not currently the best, as I explained when I talked about exploration and exploitation. Another new term is the incremental mean. This provides a bit of improvement over the regular Monte Carlo method, tending towards the temporal difference method. Instead of updating the policy after a set of episodes, which normally helps decide the best action, the incremental mean helps us update the policy…
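
The three terms in this lesson can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the course: the function names, the representation of action values as a plain list, and the sample returns. The incremental mean uses the standard running-average identity, new_mean = old_mean + (x - old_mean) / n, which gives the same result as averaging all samples at once without storing them.

```python
import random


def greedy_action(q_values):
    """Return the index of the highest-valued action (ties go to the first)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best action."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: fall back to the greedy choice.
    return greedy_action(q_values)


def incremental_mean(old_mean, new_value, count):
    """Update a running mean after seeing the count-th sample."""
    return old_mean + (new_value - old_mean) / count


# Hypothetical action-value estimates for a single state.
q_values = [0.1, 0.9, 0.3]
best = greedy_action(q_values)
maybe_explore = epsilon_greedy_action(q_values, epsilon=0.1)

# Hypothetical returns observed one episode at a time.
sample_returns = [2.0, 4.0, 9.0]
mean = 0.0
for n, g in enumerate(sample_returns, start=1):
    mean = incremental_mean(mean, g, n)
```

With `epsilon=0` the epsilon-greedy function reduces to the greedy policy, and with `epsilon=1` it picks actions purely at random. In between, a small epsilon keeps the agent mostly exploiting while still occasionally trying other actions.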
