SARSA - Python Tutorial

From the course: Reinforcement Learning Foundations


SARSA

- [Instructor] SARSA is the first form of temporal difference methods, and it's the acronym for State, Action, Reward, next State, and next Action, which is the process the agent follows to update the Q-table and complete a full reinforcement learning cycle. Let's go back to the room with the orange fruits and see how the reward is updated in the Q-table. This time around, you're the agent trying to reach your goal of getting the orange. With every step you take, you get a reward, and because this is temporal difference learning, you don't wait until the end of an episode; your reward is used to update the Q-table immediately. Note that initially, you take random actions to update the Q-table, which was empty at the very start. So for each new step, or change in state, you get a reward. This reward, say negative one because you haven't reached your goal yet, together with the state-action value of the new state you just stepped into…
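
The update described above can be sketched in code. The snippet below is a minimal illustration of the SARSA update rule, not the course's own implementation; the grid size, learning rate, discount factor, and reward values are assumptions chosen only for the example.

```python
import numpy as np

# Assumed setup: a small grid-world with 25 states and 4 actions (up, down, left, right).
n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))  # the Q-table starts out empty (all zeros)

alpha = 0.1   # learning rate (assumed)
gamma = 0.9   # discount factor (assumed)

def sarsa_update(state, action, reward, next_state, next_action):
    """One SARSA step: State, Action, Reward, next State, next Action."""
    # The temporal-difference target uses the value of the action actually taken next.
    td_target = reward + gamma * Q[next_state, next_action]
    # Nudge the current estimate toward the target by the learning rate.
    Q[state, action] += alpha * (td_target - Q[state, action])

# Example: stepping into a non-goal state gives a reward of -1,
# and the Q-table is updated immediately, before the episode ends.
sarsa_update(state=0, action=1, reward=-1, next_state=5, next_action=2)
```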
