From the course: Reinforcement Learning Foundations


The setting


- [Instructor] Temporal difference methods solve for the action values of every state at every time step, so they don't wait until after an episode or multiple episodes. They have the same setting as the Monte Carlo method: states, actions, rewards, and the environment. The difference is how frequently the policy is updated. Beyond these obvious similarities, the Monte Carlo method has high variance due to its randomness but isn't biased, while temporal difference methods have low variance but are biased. Temporal difference methods exploit more of the Markov property, which holds when the conditional probability of future states depends only on the most recent state. Because of this, they can update the Q-table immediately, using only information from the current transition, without waiting for a complete episode. In addition, note that they are used mostly for continuing tasks, because they update the policy immediately. You'll learn more about them later on in this course.
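
As a rough illustration of the idea described above (not code from the course), here is a minimal Python sketch of a one-step temporal difference update of a Q-table. The environment interface (env.step), the table sizes, and the hyperparameters alpha, gamma, and epsilon are all assumptions made for this example.

import numpy as np

# Minimal sketch of a one-step temporal-difference (SARSA-style) update.
# Sizes, hyperparameters, and the env API below are assumptions for
# illustration, not material from the course.
n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))  # Q-table of action values

def epsilon_greedy(state):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def td_step(env, state, action):
    """Update Q right after a single transition, without waiting for the
    episode to finish (unlike Monte Carlo)."""
    next_state, reward, done = env.step(action)  # assumed env interface
    next_action = epsilon_greedy(next_state)
    target = reward + (0.0 if done else gamma * Q[next_state, next_action])
    Q[state, action] += alpha * (target - Q[state, action])  # TD update
    return next_state, next_action, done

The key contrast with the Monte Carlo setting is visible in td_step: the update uses only the reward and the estimated value of the next state-action pair, so the Q-table changes at every time step rather than once per episode.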
