The Learning Loop: Foundations of Reinforcement Learning
Reinforcement Learning Theory for Beginners

Policies: Strategies for Decision-Making


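A policy in reinforcement learning is the agent’s strategy for making decisions. Formally, a policy is a mapping from situations (also called "states") to actions, and it is often written π. You can think of a policy as the agent’s internal rulebook: whenever the agent finds itself in a particular state, the policy tells it what action to take next. A deterministic policy always picks the same action in a given state. A stochastic policy instead assigns a probability to each action and samples one.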
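One way to make the "rulebook" idea concrete is to store a small policy as a lookup table. The sketch below is illustrative only: the state names (`"start"`, `"corridor"`, `"junction"`, `"dead_end"`), the action names, and the probabilities are hypothetical. It places a deterministic policy, with one action per state, next to a stochastic policy, with a probability for each action.

```python
import random

# Hypothetical maze states and actions, chosen only for illustration.

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {
    "start": "move_forward",
    "corridor": "move_forward",
    "junction": "turn_left",
    "dead_end": "turn_around",
}

# Stochastic policy: each state maps to a probability for every action.
stochastic_policy = {
    "junction": {"move_forward": 0.2, "turn_left": 0.5, "turn_right": 0.3},
}

def act_deterministic(state):
    """Look up the single action the policy prescribes for this state."""
    return deterministic_policy[state]

def act_stochastic(state, rng=random):
    """Sample an action using the probabilities listed for this state."""
    probs = stochastic_policy[state]
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

print(act_deterministic("dead_end"))  # turn_around
print(act_stochastic("junction"))     # one of the three actions; varies between runs
```

A table works well when there are only a few states. Larger problems usually replace the table with a function that computes the action, or the action probabilities, from a description of the state.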

Imagine you are controlling a robot in a maze. If you use a random policy, the robot chooses its next move without any plan: sometimes going left, sometimes right, sometimes forward, with no consideration of where it is or where it has been. This often leads to wandering in circles or getting stuck at dead ends. In contrast, a learned policy is developed through experience. Over time, the agent notices which actions help it reach the goal more quickly and starts to favor those choices. The result is a more direct, efficient path through the maze, because the agent’s decisions are guided by what it has learned works best in each situation.
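The difference between the two maze policies can be shown in a few lines. The state names, action names, and the numbers in `learned_values` are hypothetical. They stand in for the value estimates an agent might build up from experience and are not the output of any real training run. The random policy ignores the state entirely. The learned policy picks the action with the highest estimated value in the current state, which is a greedy choice.

```python
import random

ACTIONS = ["left", "right", "forward"]  # hypothetical action set

def random_policy(state, rng=random):
    """Ignore the state and pick any action uniformly at random."""
    return rng.choice(ACTIONS)

# Hypothetical action-value estimates learned from experience:
# a higher number means "this action tended to reach the goal faster from here".
learned_values = {
    "entrance": {"left": -5.0, "right": 2.0, "forward": 0.5},
    "fork":     {"left": 3.0, "right": -1.0, "forward": -4.0},
}

def learned_policy(state):
    """Pick the action with the highest estimated value in this state."""
    values = learned_values[state]
    return max(values, key=values.get)

rng = random.Random(0)
for state in ["entrance", "fork"]:
    print(state, "| random:", random_policy(state, rng), "| learned:", learned_policy(state))
```

Each time the loop runs, the learned policy returns `right` at the entrance and `left` at the fork. The random policy's choices, in contrast, carry no information about the maze.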

```python
# A simple hand-written (rule-based) policy for maze navigation
def simple_policy(state):
    """
    Given the current state, decide the next action.
    If at a dead end, turn around. Otherwise, move forward.
    """
    if state == "dead_end":
        action = "turn_around"
    else:
        action = "move_forward"
    return action

# Example usage:
state = "dead_end"
print(simple_policy(state))  # turn_around

state = "corridor"
print(simple_policy(state))  # move_forward
```

Policies are not fixed: they can be improved over time as the agent interacts with its environment and receives feedback. At first, an agent may act randomly, but as it gathers experience, it updates its policy to favor actions that lead to higher rewards. This process is at the heart of reinforcement learning: the agent continually refines its policy, learning from both successes and mistakes, to make better decisions in the future.
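To make "refining the policy from feedback" concrete, here is a minimal sketch that uses tabular Q-learning with epsilon-greedy exploration. This is one standard way to improve a policy from rewards, but the lesson does not prescribe a specific algorithm. Several details are illustrative assumptions rather than anything taken from the lesson:

- the five-cell corridor "maze",
- the reward of -1 per step,
- the values of `ALPHA`, `GAMMA`, and `EPSILON`.

The agent starts with every value estimate at zero. It usually follows its current greedy policy but sometimes explores. Each reward nudges its estimates toward better choices.

```python
import random

N_STATES = 5                          # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2 # illustrative hyperparameters, not tuned

def step(state, action):
    """Move one cell (walls at both ends). Every step costs -1; the episode ends at the goal."""
    if action == "left":
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    return next_state, -1.0, next_state == N_STATES - 1

def greedy(q, state):
    """The current policy: the action with the highest estimated value (ties go to the first action)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward if done else reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

q = train()
policy = {s: greedy(q, s) for s in range(N_STATES - 1)}
print(policy)  # {0: 'right', 1: 'right', 2: 'right', 3: 'right'}
```

The policy is never written down by hand. It is read off the learned estimates with `greedy`, so the agent's "rulebook" changes as its experience accumulates.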


What best describes a policy in reinforcement learning?

Select the correct answer.


Section 1, Chapter 3
