From the course: Reinforcement Learning Foundations
The setting
- [Instructor] Temporal difference methods update the action-value estimates at every time step, so they don't wait until the end of an episode, or of multiple episodes. They have the same setting as the Monte Carlo method: states, actions, rewards, and an environment. The difference is how frequently the estimates, and therefore the policy, are updated. Beyond these obvious similarities, the Monte Carlo method has high variance, because its target is the full return, which sums many random rewards, but it isn't biased. Temporal difference methods have lower variance but are biased, because their target relies on the current estimate of the next state's value. Temporal difference methods also exploit the Markov property more fully. This property holds when the conditional probability of the next state depends only on the current state and action, not on the earlier history. Relying on it, they can update the Q-table immediately, using only the reward just received and the estimate for the next state, without waiting for a complete episode. In addition, note that because they update immediately, they are especially useful for continuing tasks, which have no final episode to wait for. You'll learn more about them later on in this course.
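To make the contrast concrete, here is a minimal sketch comparing a Monte Carlo update, which needs the finished episode, with a temporal difference update, which runs after every single step. It assumes a tabular Q-table stored as a Python `defaultdict`, and it uses a SARSA-style TD(0) update as the example temporal difference method; the course may present a different variant. The names `td_update`, `td_episode`, `monte_carlo_update`, `ALPHA`, and `GAMMA`, the hyperparameter values, and the two-state episode are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical hyperparameters chosen only for illustration.
ALPHA = 0.5  # step size
GAMMA = 0.9  # discount factor


def td_update(q, state, action, reward, next_state, next_action, done):
    """SARSA-style TD(0) update, applied right after a single step."""
    # The target bootstraps from the current estimate of the next pair:
    # this is the source of TD's bias, and also of its lower variance.
    next_value = 0.0 if done else q[(next_state, next_action)]
    target = reward + GAMMA * next_value
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def td_episode(q, episode):
    """Replay a recorded episode one step at a time, updating after each step."""
    for t, (state, action, reward) in enumerate(episode):
        done = t == len(episode) - 1
        next_state, next_action = (None, None) if done else episode[t + 1][:2]
        td_update(q, state, action, reward, next_state, next_action, done)


def monte_carlo_update(q, episode):
    """Constant-step-size, every-visit Monte Carlo update, run once the episode ends."""
    g = 0.0
    # Walk backwards so g is the full discounted return from each step onward.
    for state, action, reward in reversed(episode):
        g = reward + GAMMA * g
        q[(state, action)] += ALPHA * (g - q[(state, action)])


# A two-step episode: A --right--> B --right--> terminal, reward 1 at the end.
# Each tuple is (state, action, reward received after taking the action).
episode = [("A", "right", 0.0), ("B", "right", 1.0)]

q_td = defaultdict(float)  # Q-table: unseen (state, action) pairs start at 0.0
q_mc = defaultdict(float)

td_episode(q_td, episode)
monte_carlo_update(q_mc, episode)

print("TD:", dict(q_td))
print("MC:", dict(q_mc))
```

After this one episode, the TD table has only moved B's value (to 0.5), because A was updated before the reward had been seen; Monte Carlo has already credited A with half of the discounted return 0.9 (0.45), since `ALPHA` is 0.5. Calling `td_episode` a second time propagates value back to A (0.225). In a continuing task there is no recorded episode to replay, so you would call `td_update` directly inside the environment loop after each step.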