The Agent-Environment Loop: A Maze Analogy
At the heart of reinforcement learning is the agent-environment loop: a continuous cycle in which an agent interacts with its environment, takes actions, receives feedback, and gradually learns to make better decisions. Imagine a robot placed in a maze. The robot is the agent, and the maze is its environment. At every step, the robot looks around to see where it is, decides which way to move, and then finds out what happens as a result - whether it hits a wall, moves closer to the exit, or even finds the way out. Each outcome provides feedback, helping the robot adjust its future choices. Over time, by repeating this loop, the robot becomes better at navigating the maze.
initialize agent_knowledge
state = observe_environment()        # see where the agent starts
while not done:
    action = agent_selects_action(state)                     # choose a move
    new_state, reward = environment_responds(action)         # act, get feedback
    agent_updates_knowledge(state, action, reward, new_state)  # learn from it
    state = new_state                                        # continue from the new position
Here is a breakdown of each part of this loop using the maze analogy:
- The agent observes the environment: the robot checks its current position in the maze and notes any nearby walls or open paths;
- The agent selects an action: based on what it sees, the robot decides whether to move forward, turn left, turn right, or stay still;
- The environment responds: after the robot moves, the maze provides feedback - maybe the robot bumps into a wall (negative feedback), moves closer to the exit (positive feedback), or finds the exit itself (reward);
- The agent updates its knowledge: the robot remembers what happened as a result of its action, so next time it faces a similar situation, it can make a better decision.
This loop repeats, allowing the robot to learn from its successes and mistakes, improving its ability to solve the maze.
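The loop above can be sketched as runnable Python. This is a minimal illustration, not a full implementation: it assumes a tiny one-dimensional corridor "maze" (states 0 through 4, exit at state 4), and uses a simple Q-learning update as the agent's way of remembering what worked. All names (`environment_responds`, `agent_selects_action`, `agent_updates_knowledge`) mirror the pseudocode but are defined here for this toy setting.

```python
import random

# Toy corridor maze: states 0..4, the exit is at state 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, EXIT = 5, 4
ACTIONS = [0, 1]

# agent_knowledge: a Q-table mapping (state, action) -> estimated value
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def environment_responds(state, action):
    """Move left or right; walls keep the agent in place. Reward at the exit."""
    new_state = max(0, state - 1) if action == 0 else min(EXIT, state + 1)
    reward = 1.0 if new_state == EXIT else -0.01  # small cost per step
    return new_state, reward

def agent_selects_action(state, eps=0.1):
    """Epsilon-greedy: usually pick the best-known action, sometimes explore."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def agent_updates_knowledge(state, action, reward, new_state,
                            alpha=0.5, gamma=0.9):
    """One-step Q-learning update of the agent's knowledge."""
    best_next = max(q[(new_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next
                                   - q[(state, action)])

random.seed(0)
for episode in range(200):
    state = 0                       # the agent observes its starting position
    while state != EXIT:            # loop until the maze is solved
        action = agent_selects_action(state)
        new_state, reward = environment_responds(state, action)
        agent_updates_knowledge(state, action, reward, new_state)
        state = new_state

# After enough episodes, the greedy action in every interior state
# should be 1 (move right, toward the exit).
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(EXIT)])
```

Running this repeats the observe-act-learn cycle over many episodes; the Q-table gradually encodes that moving right is better in every position, which is exactly the "learning from successes and mistakes" the analogy describes.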