From the course: Reinforcement Learning Foundations

A basic RL solution

- [Instructor] An equation called the Bellman equation is used to solve the Markov decision process. Before we proceed, you need to understand two new terms: the state value function and the action value function. Both are expressed through the Bellman equation. The state value function is the expected cumulative reward the agent collects starting from a particular state while following a given policy. Did you get that? It is the total reward gained by the agent from its current state to the goal state. Let's go back to the room containing fruits for my fruit salad. I decided to move forward, left, forward and forward again, and that policy takes me to my first fruit, the banana. That's one policy I can take. My cumulative reward from the first state to the fourth state is the state value of the first state, because that's where I started. If I want to get the state value of the second state, I would sum up my rewards from the second state to the fourth state, and so on. Note that different policies or paths would result in different…
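The fruit-room walk can be sketched in Python to show how each state's value is the sum of rewards from that state to the goal. This is a minimal sketch, not the course's code. The reward numbers are hypothetical placeholders because the course does not state them. The path is treated as four states, each with one move and one reward, and the helper name `state_values` is introduced here for illustration. The values are computed backward with V(s_t) = r_t + gamma * V(s_{t+1}). The discount `gamma` is an added assumption; at its default of 1.0 the result is exactly the plain sum described above. Because the example follows one fixed path, it computes a single return rather than an expectation over a stochastic policy.

```python
# Hypothetical example: the path matches the lesson, but the reward values
# are placeholders (assumed), since the course does not give numbers.
path = ["forward", "left", "forward", "forward"]
rewards = [-1.0, -1.0, -1.0, 10.0]  # reward received for the move taken in each state


def state_values(rewards, gamma=1.0):
    """Return the value of each state along one fixed path (hypothetical helper).

    values[t] is the (optionally discounted) sum of rewards from state t
    to the goal. It is computed backward with V(s_t) = r_t + gamma * V(s_{t+1}),
    a Bellman-style recursion specialised to a single deterministic path.
    """
    values = [0.0] * len(rewards)
    future = 0.0  # nothing more is collected once the goal is reached
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        values[t] = future
    return values


values = state_values(rewards)
for t, (move, v) in enumerate(zip(path, values), start=1):
    print(f"state {t} (next move: {move}): value = {v}")
```

Working backward reuses each later state's value instead of re-summing the remaining rewards for every state, which is the recursive idea the Bellman equation captures. With these placeholder rewards, the value of the second state equals the sum of the rewards from the second state to the fourth. Changing `path` and `rewards` to describe another route produces a different set of values.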