Rewards: Shaping Agent Behavior | The Learning Loop: Foundations of Reinforcement Learning
Reinforcement Learning Theory for Beginners

Rewards: Shaping Agent Behavior


In reinforcement learning, rewards are the main way an agent learns what is good or bad in its environment. Imagine you are guiding an agent through a maze. Each time the agent reaches the exit, it receives a positive reward, like +10 points. If the agent bumps into a wall, it gets a negative reward, such as -1 point. These rewards are the signals that tell the agent which actions are desirable and which to avoid. Over time, the agent tries to maximize its total reward by learning the best way to reach the exit while minimizing mistakes like hitting walls. This process of receiving and responding to rewards is what shapes the agent's behavior and learning strategy.

# Simple reward system: updating score based on agent's actions
score = 0  # Initial score
actions = ["move_forward", "hit_wall", "move_forward", "reach_exit"]

for action in actions:
    if action == "reach_exit":
        reward = 10  # Positive reward for reaching the exit
    elif action == "hit_wall":
        reward = -1  # Negative reward for hitting a wall
    else:
        reward = 0  # No reward for normal movement
    score += reward
    print(f"Action: {action}, Reward: {reward}, Total Score: {score}")

You can think of the reward signal as immediate feedback after each action. The agent uses this feedback to learn which choices are beneficial. If you design the reward structure differently, the agent's behavior will change. For instance:

  • If you give a small positive reward for exploring new paths and a larger reward for reaching the exit, the agent may be more willing to try unknown routes;
  • If you penalize repeated actions or give large negative rewards for mistakes, the agent will quickly learn to avoid those behaviors.

The way you set up rewards directly shapes how the agent learns and what strategies it develops.
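As a small sketch of this idea, the snippet below scores the same sequence of actions under two different reward designs: a sparse design that only rewards the exit, and a shaped design that adds a small exploration bonus. The specific reward values and the `total_score` helper are illustrative assumptions, not part of the lesson's code.

```python
# Illustrative sketch: the same actions scored under two reward designs.
# All reward values here are assumptions chosen for demonstration.

def total_score(actions, rewards):
    """Sum the rewards earned over a sequence of actions.

    Actions not listed in the reward table earn 0.
    """
    return sum(rewards.get(action, 0) for action in actions)

actions = ["explore", "hit_wall", "explore", "reach_exit"]

# Design A (sparse): only the exit and mistakes carry a signal.
sparse = {"reach_exit": 10, "hit_wall": -1}

# Design B (shaped): exploring new paths also earns a small bonus.
shaped = {"reach_exit": 10, "hit_wall": -1, "explore": 1}

print("Sparse design:", total_score(actions, sparse))  # 0 - 1 + 0 + 10 = 9
print("Shaped design:", total_score(actions, shaped))  # 1 - 1 + 1 + 10 = 11
```

Under the shaped design, exploratory actions contribute to the total score, so an agent maximizing reward has a direct incentive to try unknown routes rather than only rushing toward the exit.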


What is the primary purpose of rewards in reinforcement learning?

Select the correct answer


Section 1, Chapter 2

