Section 1. Chapter 7
Challenge: Q-table Update with SARSA
Task
Given a sequence of state-action pairs, update the Q-table using the SARSA rule.
You are provided with a Q-table, a sequence of (state, action) pairs, a learning rate (alpha), a discount factor (gamma), and a list of rewards received after each transition.
- For each consecutive pair of entries in the state-action sequence, update the Q-value of the current (state, action) using the SARSA update rule, with the following entry serving as (next_state, next_action).
- Use the reward that corresponds to each state-action transition.
- Do not update the final state-action pair, since no next state-action pair follows it.
- Apply the SARSA update:
Q[state, action] = Q[state, action] + alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action]).
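The loop described above can be sketched in Python as follows. This is a minimal sketch, not the exercise's reference solution: the function name `sarsa_update`, its argument order, and the dict-based Q-table in the example are assumptions for illustration. It also assumes `rewards[i]` is the reward for the transition from pair `i` to pair `i + 1`, and that the Q-table supports `Q[state, action]` reads and writes (a NumPy 2D array or a dict keyed by `(state, action)` tuples both do).

```python
def sarsa_update(Q, state_action_pairs, rewards, alpha, gamma):
    """Apply the SARSA rule to each consecutive (state, action) pair.

    Assumption: rewards[i] belongs to the transition from pair i to pair i + 1.
    Q is updated in place and also returned for convenience.
    """
    # Stop one short of the end: the last pair has no successor to bootstrap from.
    for i in range(len(state_action_pairs) - 1):
        state, action = state_action_pairs[i]
        next_state, next_action = state_action_pairs[i + 1]
        # TD target uses the action actually taken next (on-policy), not a max.
        td_target = rewards[i] + gamma * Q[next_state, next_action]
        Q[state, action] += alpha * (td_target - Q[state, action])
    return Q


# Hypothetical example: 3 states x 2 actions, stored as a dict keyed by (state, action).
q_table = {(s, a): 0.0 for s in range(3) for a in range(2)}
q_table[2, 1] = 1.0
pairs = [(0, 0), (1, 1), (2, 1)]
rewards = [0.5, 1.0]

sarsa_update(q_table, pairs, rewards, alpha=0.1, gamma=0.9)
print(q_table[0, 0], q_table[1, 1], q_table[2, 1])
```

With these inputs, `Q[0, 0]` becomes 0.1 × 0.5 = 0.05, `Q[1, 1]` becomes 0.1 × (1.0 + 0.9 × 1.0) = 0.19 (up to floating-point rounding), and the final pair `(2, 1)` keeps its value of 1.0. Using the next action actually taken, rather than the best next action, is what distinguishes SARSA from Q-learning.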