Rewards: Shaping Agent Behavior
In reinforcement learning, rewards are the main way an agent learns what is good or bad in its environment. Imagine you are guiding an agent through a maze. Each time the agent reaches the exit, it receives a positive reward, like +10 points. If the agent bumps into a wall, it gets a negative reward, such as -1 point. These rewards are the signals that tell the agent which actions are desirable and which to avoid. Over time, the agent tries to maximize its total reward by learning the best way to reach the exit while minimizing mistakes like hitting walls. This process of receiving and responding to rewards is what shapes the agent's behavior and learning strategy.
```python
# Simple reward system: updating score based on agent's actions
score = 0  # Initial score
actions = ["move_forward", "hit_wall", "move_forward", "reach_exit"]

for action in actions:
    if action == "reach_exit":
        reward = 10   # Positive reward for reaching the exit
    elif action == "hit_wall":
        reward = -1   # Negative reward for hitting a wall
    else:
        reward = 0    # No reward for normal movement
    score += reward
    print(f"Action: {action}, Reward: {reward}, Total Score: {score}")
```
You can think of the reward signal as immediate feedback after each action. The agent uses this feedback to learn which choices are beneficial. If you design the reward structure differently, the agent's behavior will change. For instance:
- If you give a small positive reward for exploring new paths and a larger reward for reaching the exit, the agent may be more willing to try unknown routes;
- If you penalize repeated actions or give large negative rewards for mistakes, the agent will quickly learn to avoid those behaviors.
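The first idea above can be sketched in code. This is a minimal illustration, not a standard API: the reward values, the `shaped_reward` helper, and the grid positions are all made up for the example. A small bonus for visiting a new cell nudges the agent toward exploration, while the large exit reward keeps the main goal dominant.

```python
# Hypothetical shaped-reward scheme: exploration bonus + goal reward.
visited = set()  # Cells the agent has already seen

def shaped_reward(action, position):
    """Combine the goal reward, wall penalty, and a small exploration bonus."""
    if action == "reach_exit":
        return 10            # Large reward: the main objective
    if action == "hit_wall":
        return -1            # Penalty: discourage mistakes
    if position not in visited:
        visited.add(position)
        return 0.5           # Small bonus: encourage trying new cells
    return 0                 # No reward for revisiting known cells

steps = [
    ("move_forward", (0, 1)),
    ("move_forward", (0, 2)),
    ("move_forward", (0, 1)),  # Revisits a cell: no bonus this time
    ("reach_exit", (0, 3)),
]

total = sum(shaped_reward(action, pos) for action, pos in steps)
print(f"Total shaped reward: {total}")  # 0.5 + 0.5 + 0 + 10 = 11.0
```

Tuning the size of the exploration bonus matters: if it rivals the exit reward, the agent may wander indefinitely instead of finishing the maze.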
The way you set up rewards directly shapes how the agent learns and what strategies it develops.