Expected SARSA


- [Instructor] The third form of the temporal difference method is expected SARSA. This form differs only slightly from SARSAMAX. Remember, with SARSAMAX, the greedy policy is used to select the action from the second state. Expected SARSA, however, uses the expected value of the next state-action pair, where this expected value takes into account the probability that the agent selects each possible action from the next state under its current policy. In other words, the actions in the next state are weighted by how likely the policy is to choose each of them. Expected SARSA also evaluates the next state under the same policy it uses to select the action in the current state, which makes it similar to SARSA in this regard. All of these temporal difference methods have different situations they favor, even though they all converge to the optimal action-value function, leading to an optimal policy. SARSA and expected SARSA are both known as on-policy temporal difference algorithms because they use the same policy to pick an action for…
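As a rough illustration of the idea described above, here is a minimal sketch of a single Expected SARSA update for a tabular Q array. The function name, the epsilon-greedy policy, and the parameters alpha, gamma, and epsilon are assumptions made for this example, not code taken from the course.

```python
import numpy as np

def expected_sarsa_update(Q, state, action, reward, next_state,
                          alpha=0.1, gamma=0.99, epsilon=0.1):
    """One Expected SARSA update on a tabular Q array of shape [n_states, n_actions]."""
    n_actions = Q.shape[1]

    # Probability of each action in the next state under an epsilon-greedy policy:
    # every action gets epsilon / n_actions, and the greedy action gets the remainder.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[next_state])] += 1.0 - epsilon

    # Expected value of the next state, weighted by the policy's action probabilities.
    expected_q_next = np.dot(probs, Q[next_state])

    # Temporal difference update toward reward + discounted expected value.
    td_target = reward + gamma * expected_q_next
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```

Compared with SARSAMAX, the only change in the target is that the next state's values are averaged under the policy's action probabilities instead of taking the maximum.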
