Reinforcement Learning Theory for Beginners

Rewards: Shaping Agent Behavior


In reinforcement learning, rewards are the main way an agent learns what is good or bad in its environment. Imagine you are guiding an agent through a maze. Each time the agent reaches the exit, it receives a positive reward, like +10 points. If the agent bumps into a wall, it gets a negative reward, such as -1 point. These rewards are the signals that tell the agent which actions are desirable and which to avoid. Over time, the agent tries to maximize its total reward by learning the best way to reach the exit while minimizing mistakes like hitting walls. This process of receiving and responding to rewards is what shapes the agent's behavior and learning strategy.

# Simple reward system: updating score based on agent's actions
score = 0  # Initial score
actions = ["move_forward", "hit_wall", "move_forward", "reach_exit"]

for action in actions:
    if action == "reach_exit":
        reward = 10  # Positive reward for reaching the exit
    elif action == "hit_wall":
        reward = -1  # Negative reward for hitting a wall
    else:
        reward = 0  # No reward for normal movement
    score += reward
    print(f"Action: {action}, Reward: {reward}, Total Score: {score}")
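
Running this loop prints the reward for each action and the running total:

Action: move_forward, Reward: 0, Total Score: 0
Action: hit_wall, Reward: -1, Total Score: -1
Action: move_forward, Reward: 0, Total Score: -1
Action: reach_exit, Reward: 10, Total Score: 9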

You can think of the reward signal as immediate feedback after each action. The agent uses this feedback to learn which choices are beneficial. If you design the reward structure differently, the agent's behavior will change. For instance:

  • If you give a small positive reward for exploring new paths and a larger reward for reaching the exit, the agent may be more willing to try unknown routes;
  • If you penalize repeated actions or give large negative rewards for mistakes, the agent will quickly learn to avoid those behaviors.

The way you set up rewards directly shapes how the agent learns and what strategies it develops.
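
To make this concrete, here is a minimal sketch comparing two hypothetical reward schemes on the same sequence of actions: one that only rewards reaching the goal, and one that adds a small exploration bonus. The action names and reward values are illustrative, not part of any standard library or environment.

# A minimal sketch comparing two hypothetical reward schemes on the
# same trajectory. Action names and reward values are illustrative.
trajectory = ["move_forward", "explore_new_path", "hit_wall",
              "explore_new_path", "reach_exit"]

def goal_only_reward(action):
    # Rewards only reaching the exit; penalizes hitting walls
    if action == "reach_exit":
        return 10
    if action == "hit_wall":
        return -1
    return 0

def exploration_bonus_reward(action):
    # Same scheme, plus a small bonus for trying new paths
    if action == "explore_new_path":
        return 0.5
    return goal_only_reward(action)

for reward_fn in (goal_only_reward, exploration_bonus_reward):
    total = sum(reward_fn(a) for a in trajectory)
    print(f"{reward_fn.__name__}: total reward = {total}")

Under the second scheme, the same trajectory scores higher (10.0 versus 9), so an agent that maximizes total reward is nudged toward trying unknown routes, exactly the behavioral shift described above.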

