Learn Challenge: Q-table Update with SARSA | Classic RL Algorithms: Q-learning & SARSA
Hands-On Classic RL Algorithms with Python
Section 1. Chapter 7

Challenge: Q-table Update with SARSA

Task

Given a sequence of state-action pairs, update the Q-table using the SARSA rule.

You are provided with a Q-table, a sequence of (state, action) pairs, a learning rate (alpha), a discount factor (gamma), and a list of rewards received after each transition.

  • For each consecutive pair in the state-action sequence, update the Q-value for the current (state, action) using the SARSA update rule.
  • Use the reward that corresponds to each transition (the rewards list holds one reward per transition, in order).
  • Do not update the final state-action pair, as there is no next state-action following it.
  • Apply the SARSA update: Q[state, action] = Q[state, action] + alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action]).
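To make the update rule concrete, here is a single SARSA step worked through in Python. All numeric values are made up for illustration and are not part of the exercise; the variable names (`q_current`, `q_next`, `td_target`, `td_error`) are likewise only chosen for this sketch.

```python
# Illustrative values only; none of these numbers come from the exercise.
alpha = 0.1      # learning rate
gamma = 0.9      # discount factor
reward = 1.0     # reward received after taking the current action
q_current = 0.5  # Q[state, action] before the update
q_next = 2.0     # Q[next_state, next_action]

# Target: immediate reward plus the discounted value of the next pair.
td_target = reward + gamma * q_next
# Error: how far the current estimate is from that target.
td_error = td_target - q_current
# Move the current estimate a fraction alpha toward the target.
q_updated = q_current + alpha * td_error

print(round(q_updated, 4))  # 0.73
```

The estimate moves from 0.5 toward the target of 2.8 by one tenth of the gap, which is what the learning rate of 0.1 prescribes.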

Solution
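The page does not show the reference solution, so what follows is only one possible sketch of the task described above. It assumes that the Q-table supports `Q[state, action]` indexing (for example a dict keyed by `(state, action)` tuples, or a 2-D NumPy float array), that `rewards[i]` belongs to the transition from the i-th pair to the (i+1)-th pair, and that updating the table in place is acceptable. The function name `sarsa_update` and its parameter names are hypothetical.

```python
def sarsa_update(q_table, state_action_seq, rewards, alpha, gamma):
    """Apply SARSA updates in order along a (state, action) sequence.

    Assumes q_table supports q_table[state, action] indexing and that
    rewards[i] is the reward for the transition from pair i to pair i + 1.
    The table is modified in place and also returned.
    """
    # Stop one short of the end: the final pair has no successor to bootstrap from.
    for i in range(len(state_action_seq) - 1):
        state, action = state_action_seq[i]
        next_state, next_action = state_action_seq[i + 1]
        td_target = rewards[i] + gamma * q_table[next_state, next_action]
        q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table


# Usage with made-up data: 3 states, 2 actions, all Q-values starting at 0.
q = {(s, a): 0.0 for s in range(3) for a in range(2)}
sequence = [(0, 1), (1, 0), (2, 1)]
rewards = [1.0, 2.0]
sarsa_update(q, sequence, rewards, alpha=0.5, gamma=0.9)
print(q[0, 1], q[1, 0], q[2, 1])  # 0.5 1.0 0.0
```

Because the updates run in sequence order, each step reads the next pair's Q-value before that pair is itself updated. The last pair, (2, 1), is never updated, matching the third bullet of the task.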
