Policy Iteration
The idea behind policy iteration is simple:
- Take some initial policy π and value function v;
- Use policy evaluation to update v until it is consistent with π;
- Use policy improvement to update π so that it is greedy with respect to v;
- Repeat the evaluation and improvement steps until the policy stops changing.
In this method, there are no partial updates:
- During policy evaluation, the values of all states are updated until they are consistent with the current policy;
- During policy improvement, the policy is made greedy with respect to the value function in every state.
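The two updates can be written out explicitly. The notation below is an assumption, since the text itself does not define it: p(s', r | s, a) stands for the environment dynamics, γ for the discount factor, and π(a | s) for the probability of choosing action a in state s. The first rule is the evaluation sweep, repeated until v stops changing; the second is the greedy improvement step.

```latex
% Policy evaluation: repeat for every state s until v stops changing
v(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v(s') \bigr]

% Policy improvement: make the policy greedy with respect to v
\pi(s) \leftarrow \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v(s') \bigr]
```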
Pseudocode
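The loop described above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. It assumes a small tabular MDP given as a transition array `P[s, a, s']` and an expected-reward array `R[s, a]`, a deterministic policy stored as one action index per state, and a discount factor `gamma`. The function names, the threshold `theta`, and the two-state example are all hypothetical.

```python
import numpy as np


def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
    """Sweep over all states until v is consistent with the given policy."""
    n_states = P.shape[0]
    states = np.arange(n_states)
    v = np.zeros(n_states)
    while True:
        # Bellman expectation backup for every state under the current policy
        v_new = R[states, policy] + gamma * P[states, policy] @ v
        if np.max(np.abs(v_new - v)) < theta:
            return v_new
        v = v_new


def policy_improvement(P, R, v, gamma=0.9):
    """Return the policy that is greedy with respect to v."""
    q = R + gamma * P @ v  # action values, shape (n_states, n_actions)
    return np.argmax(q, axis=1)


def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Alternate full evaluation and full improvement until the policy is stable."""
    policy = np.zeros(P.shape[0], dtype=int)  # arbitrary initial policy
    while True:
        v = policy_evaluation(P, R, policy, gamma, theta)
        new_policy = policy_improvement(P, R, v, gamma)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy


# Hypothetical two-state, two-action MDP (illustration only).
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

policy, v = policy_iteration(P, R)
print(policy, v)
```

On this toy MDP the loop settles on moving from state 0 to state 1 and then staying there, with values of about 19 and 20 for states 0 and 1. Stopping when the policy no longer changes is one common convergence check; other criteria are possible.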
Section 3. Chapter 7