Markov Decision Processes: The Mathematical Framework | The Learning Loop: Foundations of Reinforcement Learning
Reinforcement Learning Theory for Beginners

Markov Decision Processes: The Mathematical Framework


To understand how reinforcement learning agents make decisions, you need a clear framework that defines the environment, the agent's choices, and the consequences of those choices. This is where Markov Decision Processes (MDPs) come in.

An MDP is made up of:

  • States: the different situations or positions the agent can be in (each spot in the maze);
  • Actions: the choices available to the agent at each state (which direction to move);
  • Rewards: the feedback the agent gets after taking an action (like +1 for the exit, -1 for a trap);
  • Transitions: the rules or probabilities that determine which next state results from each action (moving right from one cell leads to the next cell to the right).

Using the maze analogy, you can think of the agent as someone trying to find the best way out, learning which moves lead to the goal and which lead to dead ends.

# Define a simple maze as an MDP using Python dictionaries

states = ["A", "B", "C", "D"]  # Maze positions

actions = {
    "A": ["right", "down"],
    "B": ["left", "down"],
    "C": ["up", "right"],
    "D": ["up", "left"]
}

# Rewards for taking an action from a state
rewards = {
    ("A", "right"): 0,
    ("A", "down"): -1,
    ("B", "left"): 0,
    ("B", "down"): 1,  # Reaching the goal
    ("C", "up"): 0,
    ("C", "right"): -1,
    ("D", "up"): 0,
    ("D", "left"): -1
}

# Transitions: (current_state, action) -> next_state
transitions = {
    ("A", "right"): "B",
    ("A", "down"): "C",
    ("B", "left"): "A",
    ("B", "down"): "D",
    ("C", "up"): "A",
    ("C", "right"): "D",
    ("D", "up"): "B",
    ("D", "left"): "C"
}

# Example: What happens if the agent is in state "A" and takes action "right"?
current_state = "A"
action = "right"
next_state = transitions[(current_state, action)]
reward = rewards[(current_state, action)]
print(f"From {current_state}, take '{action}' -> {next_state}, reward: {reward}")
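Once the tables are defined, an agent can follow a policy, a rule mapping each state to an action, and collect rewards along the way. Below is a minimal sketch of one such episode; the `policy` dictionary here is a hand-picked assumption, not something defined above, and the transition and reward tables are repeated so the snippet runs on its own.

```python
# Transition and reward tables, repeated from the example above
transitions = {("A", "right"): "B", ("A", "down"): "C",
               ("B", "left"): "A", ("B", "down"): "D",
               ("C", "up"): "A", ("C", "right"): "D",
               ("D", "up"): "B", ("D", "left"): "C"}
rewards = {("A", "right"): 0, ("A", "down"): -1,
           ("B", "left"): 0, ("B", "down"): 1,
           ("C", "up"): 0, ("C", "right"): -1,
           ("D", "up"): 0, ("D", "left"): -1}

# A hypothetical fixed policy: one chosen action per state.
# "D" is the goal, so it has no entry and the loop stops there.
policy = {"A": "right", "B": "down"}

state, total_reward, trajectory = "A", 0, ["A"]
while state in policy:
    action = policy[state]
    total_reward += rewards[(state, action)]
    state = transitions[(state, action)]
    trajectory.append(state)

print(" -> ".join(trajectory), "| total reward:", total_reward)
# A -> B -> D | total reward: 1
```

Finding a policy like this one automatically, rather than writing it by hand, is exactly what reinforcement learning algorithms are for.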

The key feature that makes an MDP special is the Markov property. In simple terms, this means that the agent's future is determined only by its current state and the action it chooses, not by the sequence of states and actions that came before. In the maze, if you know your current position, you have all the information you need to decide your next move and predict the consequences. You do not need to remember the entire path you took to get there.
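The Markov property can be seen directly in the dictionary-based maze: because transitions and rewards are keyed only by `(state, action)`, two agents that arrive in the same state by different routes face identical futures. The small sketch below illustrates this with a subset of the maze tables.

```python
# Minimal transition/reward tables for the three moves used below
transitions = {("A", "right"): "B", ("D", "up"): "B", ("B", "down"): "D"}
rewards = {("A", "right"): 0, ("D", "up"): 0, ("B", "down"): 1}

def step(state, action):
    """The next state and reward depend on (state, action) alone."""
    return transitions[(state, action)], rewards[(state, action)]

# Two different histories that both arrive in state "B"
via_a, _ = step("A", "right")   # A -> B
via_d, _ = step("D", "up")      # D -> B
assert via_a == via_d == "B"

# From "B", the outcome of "down" is identical regardless of how we got there
print(step("B", "down"))  # ('D', 1) either way
```

Note that `step` never receives the history as an argument; that is the Markov property expressed in code.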

Which statement best describes the Markov property in the context of Markov Decision Processes (MDPs)?


Section 1. Chapter 4

