Rewards: Shaping Agent Behavior | The Learning Loop: Foundations of Reinforcement Learning
Reinforcement Learning Theory for Beginners

Rewards: Shaping Agent Behavior


In reinforcement learning, rewards are the main way an agent learns what is good or bad in its environment. Imagine you are guiding an agent through a maze. Each time the agent reaches the exit, it receives a positive reward, like +10 points. If the agent bumps into a wall, it gets a negative reward, such as -1 point. These rewards are the signals that tell the agent which actions are desirable and which to avoid. Over time, the agent tries to maximize its total reward by learning the best way to reach the exit while minimizing mistakes like hitting walls. This process of receiving and responding to rewards is what shapes the agent's behavior and learning strategy.

```python
# Simple reward system: updating the score based on the agent's actions
score = 0  # Initial score
actions = ["move_forward", "hit_wall", "move_forward", "reach_exit"]

for action in actions:
    if action == "reach_exit":
        reward = 10   # Positive reward for reaching the exit
    elif action == "hit_wall":
        reward = -1   # Negative reward for hitting a wall
    else:
        reward = 0    # No reward for normal movement
    score += reward
    print(f"Action: {action}, Reward: {reward}, Total Score: {score}")
```

You can think of the reward signal as immediate feedback after each action. The agent uses this feedback to learn which choices are beneficial. If you design the reward structure differently, the agent's behavior will change. For instance:

  • If you give a small positive reward for exploring new paths and a larger reward for reaching the exit, the agent may be more willing to try unknown routes;
  • If you penalize repeated actions or give large negative rewards for mistakes, the agent will quickly learn to avoid those behaviors.
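The first idea above can be sketched as a small reward function. This is a minimal illustration, not part of the original lesson: the exploration bonus of +0.5, the trajectory, and the helper name `shaped_reward` are all hypothetical choices made for the example.

```python
# Hypothetical reward shaping: a small bonus for visiting new maze cells
# encourages exploration, while the exit still earns the largest reward.
def shaped_reward(action, position, visited):
    """Return a reward for an action taken at a maze position."""
    if action == "reach_exit":
        return 10      # large reward for the goal
    if action == "hit_wall":
        return -1      # penalty for mistakes
    if position not in visited:
        return 0.5     # small bonus for exploring a new cell (assumed value)
    return 0           # no reward for revisiting known cells

visited = set()
total = 0
# Hypothetical trajectory: (action, position) pairs
trajectory = [("move_forward", (0, 1)), ("move_forward", (0, 1)),
              ("hit_wall", (0, 1)), ("reach_exit", (0, 2))]
for action, position in trajectory:
    total += shaped_reward(action, position, visited)
    visited.add(position)

print(total)  # 0.5 + 0 - 1 + 10 = 9.5
```

Notice that only the reward function changed, not the agent or the environment; tuning the exploration bonus up or down is enough to make the agent more or less willing to try unknown routes.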

The way you set up rewards directly shapes how the agent learns and what strategies it develops.


What is the primary purpose of rewards in reinforcement learning?

Select the one correct answer


Section 1. Chapter 2
