Implementing SARSA in Python
import numpy as np
import gymnasium as gym

# Initialize grid-world environment
env = gym.make("FrozenLake-v1", is_slippery=False)  # Deterministic grid-world
num_states = env.observation_space.n
num_actions = env.action_space.n

# Initialize Q-table for SARSA
Q = np.zeros((num_states, num_actions))

# Set SARSA hyperparameters
alpha = 0.1      # Learning rate
gamma = 0.99     # Discount factor
epsilon = 0.1    # Exploration rate
episodes = 1000  # Number of training episodes

def epsilon_greedy(Q, state, epsilon):
    if np.random.rand() < epsilon:
        return np.random.choice(Q.shape[1])
    else:
        return np.argmax(Q[state])

for episode in range(episodes):
    state, _ = env.reset()
    action = epsilon_greedy(Q, state, epsilon)
    done = False
    while not done:
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_action = epsilon_greedy(Q, next_state, epsilon)
        # SARSA update rule
        Q[state, action] += alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action])
        state = next_state
        action = next_action
In SARSA, the agent learns the value of the policy it is actually following, including the exploration steps. The Q-table is updated using the action chosen by the current policy (which may be exploratory), rather than the greedy action used in Q-learning. This on-policy approach can make SARSA more robust in environments where exploratory actions could lead to poor outcomes, as it directly incorporates the effects of exploration into the learning process. In grid-world, you may notice that SARSA's policy tends to avoid risky paths more often than Q-learning, especially when the environment has negative rewards or traps.
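The difference between the two update rules comes down to a single term: SARSA bootstraps from the action the policy actually takes next, while Q-learning bootstraps from the greedy action. A minimal side-by-side sketch (standalone helper functions for illustration, separate from the training loop above):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action actually chosen by the
    # (possibly exploratory) behavior policy
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action, regardless of
    # which action the behavior policy will actually take
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

If the exploratory next action has a lower value than the greedy one, SARSA's target is smaller, which is exactly why its learned values reflect the cost of exploration.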
To evaluate and visualize the performance of SARSA versus Q-learning, you can record the total rewards obtained in each episode and plot them using matplotlib. This allows you to compare how quickly each algorithm learns and how consistent their performance is over time.
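One way to sketch this comparison (the reward arrays below are random placeholders; in practice you would append each episode's total reward to a list inside each algorithm's training loop):

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_average(values, window=50):
    # Smooth noisy per-episode rewards into a readable learning curve
    return np.convolve(values, np.ones(window) / window, mode="valid")

# Placeholder data standing in for per-episode reward logs
sarsa_rewards = np.random.rand(1000)
q_learning_rewards = np.random.rand(1000)

plt.plot(moving_average(sarsa_rewards), label="SARSA")
plt.plot(moving_average(q_learning_rewards), label="Q-learning")
plt.xlabel("Episode")
plt.ylabel("Total reward (moving average)")
plt.legend()
plt.savefig("sarsa_vs_qlearning.png")
```

Smoothing with a moving average matters here: raw per-episode rewards in FrozenLake are 0 or 1, so the unsmoothed curves are too noisy to compare directly.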