From the course: Reinforcement Learning Foundations

The setting

- [Instructor] Taking a deep dive into the Monte Carlo method: we mentioned the Q-table earlier, a table that stores the state value for every action taken by an agent. The Q-table has rows and columns, where the rows represent the states and the columns represent the actions, although this layout could be interchanged. So for an action taken in any state, the expected reward, also known as the state value, is recorded in the Q-table. For example, when the agent is in state two, we assume that taking the action right gives it a state value of positive seven. This value is the cumulative reward of starting from state two, taking the action right, and following the policy until the agent reaches its goal. For any policy followed in an episode, the cumulative reward is the sum of the reward of the start state, the reward of the second state, and all the other rewards along that policy up to its end, the goal. For future policies, the agent…
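
To make the Q-table layout and the cumulative reward concrete, here is a minimal Python sketch. It is not the course's own code. The table size, the action names, the zero-based state numbering, and the reward sequence are all assumptions chosen for illustration, and the return is a plain undiscounted sum, as described in the transcript.

```python
# Hypothetical action set and state count, for illustration only.
ACTIONS = ["left", "right", "up", "down"]
N_STATES = 4

# Q-table: one row per state, one column per action, initialised to zero.
q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

# The transcript's example: in state two, the action "right" has value +7.
# Assumption: states are numbered from zero, so state two is row index 2.
q_table[2][ACTIONS.index("right")] = 7.0


def cumulative_reward(rewards):
    """Sum the rewards collected from a starting step to the end of the episode."""
    return sum(rewards)


# Made-up rewards received after taking "right" in state two and then
# following the policy until the goal is reached.
episode_rewards = [-1.0, -1.0, 9.0]
episode_return = cumulative_reward(episode_rewards)

print(q_table[2])
print(episode_return)
```

Here the made-up rewards were picked so that their sum matches the value stored for state two and the action right. In practice a Monte Carlo method would estimate that entry from the returns observed over many episodes rather than set it by hand.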
