From the course: Reinforcement Learning Foundations
A basic RL solution - Python Tutorial
- [Instructor] An equation called the Bellman equation is used to solve the Markov decision process. Before we proceed, you need to understand two new terms: the state value function and the action value function. Both of them are expressed through the Bellman equation. The state value function is the expected cumulative reward the agent collects starting from a particular state. Did you get that? It is the total reward gained by the agent from its current state until it reaches the goal state. Let's go back to the room containing fruits for my fruit salad. I decided to move forward, left, forward and forward again, and that policy takes me to my first fruit, the banana. That's one policy I can take. My cumulative reward from the first state to the fourth state is the state value of the first state, because that's where I started. If I want to get the state value of the second state, I would sum up my rewards from the second state to the fourth state, and so on. Note that different policies or paths would result in different…
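The "sum the rewards from this state onward" idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `state_values_along_path`, the reward values, and the optional discount factor `gamma` are all assumptions added here. With `gamma=1.0` it reproduces the plain sum described above. Strictly, the state value function is an *expected* return averaged over every path a policy could produce; for a single, deterministic path like the banana example it reduces to this sum.

```python
from typing import List


def state_values_along_path(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return the cumulative reward collected from each state on one path to the goal.

    Hypothetical helper: rewards[t] is the reward received for the move made
    from state t. gamma=1.0 gives the plain (undiscounted) sum.
    """
    values = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards from the goal so each state reuses the next state's value:
    # value[t] = rewards[t] + gamma * value[t + 1]
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        values[t] = running
    return values


# Hypothetical rewards for the moves forward, left, forward, forward:
# -1 per step, +10 on the move that reaches the banana.
path_rewards = [-1.0, -1.0, -1.0, 10.0]
print(state_values_along_path(path_rewards))
# [7.0, 8.0, 9.0, 10.0]
```

Computing the values backwards, rather than re-summing from every state, mirrors the recursive structure of the Bellman equation: each state's value is its immediate reward plus the value of the state that follows it.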