Policy Iteration
The idea behind policy iteration is simple:
- Take some initial policy π and value function v;
- Use policy evaluation to update v until it is consistent with π;
- Use policy improvement to update π until it is greedy with respect to v;
- Repeat the evaluation and improvement steps until the policy stops changing.
In this method, there are no partial updates:
- During policy evaluation, the value of every state is updated until the values are consistent with the current policy;
- During policy improvement, the policy is made greedy with respect to the value function.
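To make "consistent" and "greedy" concrete, the two conditions can be written out as equations. The notation here is an assumption borrowed from standard reinforcement learning texts, not taken from this chapter: p(s', r | s, a) is the environment dynamics and γ is the discount factor. The first line is the Bellman expectation equation that v satisfies once evaluation has converged; the second is the greedy policy obtained from that v.

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]
\qquad
\pi'(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]
```

When π' equals π for every state, neither step changes anything further and the iteration stops.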
Pseudocode
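The loop described above can be sketched in Python for a small tabular MDP. This is a minimal illustration under stated assumptions, not a reference implementation: the array layout (`P[s, a, s2]` for transition probabilities, `R[s, a]` for expected rewards), the function names, the discount `gamma`, the tolerance `theta`, and the two-state example are all hypothetical choices made for this sketch. Evaluation here uses synchronous sweeps over all states, which is one of several valid variants.

```python
import numpy as np


def policy_evaluation(P, R, policy, v, gamma=0.9, theta=1e-8):
    """Sweep over every state until v is consistent with `policy` (within theta)."""
    states = np.arange(P.shape[0])
    while True:
        # q[s, a] = R[s, a] + gamma * sum over s2 of P[s, a, s2] * v[s2]
        q = R + gamma * (P @ v)
        v_new = q[states, policy]  # value of the action the policy picks in each state
        if np.max(np.abs(v_new - v)) < theta:
            return v_new
        v = v_new


def policy_improvement(P, R, v, gamma=0.9):
    """Return the policy that is greedy with respect to v."""
    q = R + gamma * (P @ v)
    return np.argmax(q, axis=1)


def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Alternate full evaluation and full improvement until the policy is stable."""
    n_states = P.shape[0]
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy
    v = np.zeros(n_states)                  # arbitrary initial values
    while True:
        v = policy_evaluation(P, R, policy, v, gamma, theta)
        new_policy = policy_improvement(P, R, v, gamma)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy


# Hypothetical two-state, two-action MDP used only to exercise the sketch.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

policy, v = policy_iteration(P, R)
print(policy)  # expected: [1 0]
```

In this toy problem the agent learns to move from state 0 to state 1 and then stay there, which gives values close to 19 and 20 for the two states with `gamma=0.9`.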