Policy Iteration
Swipe to show menu
The idea behind policy iteration is simple:
- Take some initial ฯ and v;
- Use policy evaluation to update v until it's consistent with ฯ;
- Use policy improvement to update ฯ until it's greedy with respect to v;
- Repeat steps 2-3 until convergence.
In this method, there are no partial updates:
- During policy evaluation, values are updated for each state, until they are consistent with current policy;
- During policy improvement, policy is made greedy with respect to value function.
Pseudocode
Everything was clear?
Thanks for your feedback!
Sectionย 3. Chapterย 7
Ask AI
Ask AI
Ask anything or try one of the suggested questions to begin our chat
Sectionย 3. Chapterย 7