From the course: Reinforcement Learning Foundations

Exploration and exploitation

- [Instructor] Exploration and exploitation are two very important terms in reinforcement learning. When an agent explores an environment, it tries to understand as much of the environment as possible by reaching many states. When it exploits, it extracts as much information from each state as it can. Exploration and exploitation are two different but simultaneous activities performed by a reinforcement learning agent. They are most useful in model-free systems, where the agent doesn't have any model of the environment and actively tries to understand it, but they also help the agent navigate the environment in model-based systems. As mentioned earlier about learning optimal policies, the agent selects the action in each state that has resulted in the best cumulative reward over different episodes. However, always doing this is not correct: because the agent initially chooses actions at random, an action that led to a lower cumulative reward might actually turn out to be better if a different policy is followed afterward. To avoid missing actions that lead to better rewards, the agent should not always select the best action it sees in the Q-table. If the Q-table says the action right would yield the highest reward, right will be selected with a 70% probability, while the other actions, left, forward, and backward, will each be selected with a 10% probability. This way, the agent explores the other actions in that state more, and it might end up learning something that leads to a higher reward. If these new actions lead to a better cumulative reward, they replace the previous best actions. A strategy for balancing exploration and exploitation that has proved to give good results is to favor exploration initially and then gradually lean toward exploitation. This is how exploring and exploiting new and previously visited states helps us get better rewards.
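To make this concrete, here is a minimal Python sketch of the selection scheme just described, combined with an exploration probability that decays over episodes. The select_action helper, the action names, the Q-values, and the decay constants are illustrative assumptions, not code from the course.

    import random

    def select_action(q_values, explore_prob):
        """Pick the greedy action with probability 1 - explore_prob;
        otherwise pick uniformly among the remaining actions.

        With explore_prob = 0.3 and four actions, the greedy action is
        chosen 70% of the time and each other action 10% of the time,
        matching the weighting described above."""
        best = max(q_values, key=q_values.get)  # exploit: best action per the Q-table
        if random.random() < explore_prob:
            others = [a for a in q_values if a != best]
            return random.choice(others)        # explore: try a non-greedy action
        return best

    # Illustrative Q-values for one state; a real agent would learn these.
    q_values = {"left": 0.1, "right": 0.8, "forward": 0.3, "backward": 0.0}

    # Favor exploration early, then lean toward exploitation by decaying
    # the exploration probability over episodes.
    explore_prob = 0.9
    for episode in range(100):
        action = select_action(q_values, explore_prob)
        # ... take the action, observe the reward, update q_values ...
        explore_prob = max(0.05, explore_prob * 0.97)  # gradual decay with a floor

Early in training, explore_prob is high, so the agent often samples non-greedy actions; as it decays, the agent increasingly exploits the best-known action, which is the exploration-first-then-exploitation strategy the transcript recommends.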
