
From the course: Reinforcement Learning Foundations


Markov decision process



- [Instructor] One very important topic left to discuss when describing a reinforcement learning problem is the Markov decision process. You might have been wondering how everything discussed in the previous lesson is possible mathematically, or even in code. A Markov decision process, or MDP for short, is how reinforcement learning problems are represented mathematically. Its components are the states, the actions, the rewards, the one-step dynamics of the environment (that is, the state transition probabilities), and the discount factor. I know I didn't mention the discount factor before, so I'll do it justice now. Let's go back to the race example, but this time we are running forever, which is of course not possible in practice. We are assuming this is a continuing task, as opposed to the 100-meter sprint, which is an episodic task. Now, because we're running forever, we can't be sure what the future holds, or whether our future rewards will be any better than the present ones. Due to this, we favor and…
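
The MDP components named in the transcript (states, actions, rewards, one-step dynamics, discount factor) can be sketched in Python roughly as follows. This is a minimal illustration, not the course's code: the `MDP` class, the `discounted_return` and `dynamics_are_valid` helpers, the two-state running example, and every number in it are hypothetical. It also assumes the standard convention that a discount factor `gamma` between 0 and 1 weights a reward received `k` steps in the future by `gamma ** k`.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the five MDP components."""
    states: tuple
    actions: tuple
    # One-step dynamics, with rewards folded in:
    # dynamics[(state, action)] -> {(next_state, reward): probability}
    dynamics: dict
    gamma: float  # discount factor, assumed to lie in [0, 1]


def dynamics_are_valid(mdp):
    """Check that the outcome probabilities sum to 1 for each (state, action)."""
    return all(
        abs(sum(outcomes.values()) - 1.0) < 1e-9
        for outcomes in mdp.dynamics.values()
    )


def discounted_return(rewards, gamma):
    """Sum a reward sequence, weighting the reward k steps ahead by gamma ** k."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


# Made-up "running forever" example: the runner is either fresh or tired.
race = MDP(
    states=("fresh", "tired"),
    actions=("jog", "sprint"),
    dynamics={
        ("fresh", "jog"): {("fresh", 1.0): 0.9, ("tired", 1.0): 0.1},
        ("fresh", "sprint"): {("fresh", 2.0): 0.4, ("tired", 2.0): 0.6},
        ("tired", "jog"): {("fresh", 1.0): 0.5, ("tired", 0.0): 0.5},
        ("tired", "sprint"): {("tired", -1.0): 1.0},
    },
    gamma=0.9,
)

if __name__ == "__main__":
    print(dynamics_are_valid(race))              # True
    print(discounted_return([1, 1, 1], 0.5))     # 1.75 = 1 + 0.5 + 0.25
```

With `gamma = 1.0` the same three rewards would sum to 3.0, so a smaller discount factor makes later rewards count for less relative to immediate ones.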
