Learn Policy Iteration | Dynamic Programming
Introduction to Reinforcement Learning

Policy Iteration

The idea behind policy iteration is simple:

  1. Take some initial policy $\pi$ and value function $v$;
  2. Use policy evaluation to update $v$ until it's consistent with $\pi$;
  3. Use policy improvement to update $\pi$ so that it's greedy with respect to $v$;
  4. Repeat steps 2-3 until convergence.

In this method, there are no partial updates:

  • During policy evaluation, the value of every state is updated repeatedly until the values are consistent with the current policy;
  • During policy improvement, the policy is made greedy with respect to the value function in every state.
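
For reference, the two updates can be written out as follows. This is a sketch in standard notation that the text above does not define: it assumes a deterministic policy $\pi(s)$, transition dynamics $p(s', r \mid s, a)$, and a discount factor $\gamma$.

```latex
% Policy evaluation: sweep over all states s until v stops changing
v(s) \leftarrow \sum_{s', r} p(s', r \mid s, \pi(s)) \bigl[ r + \gamma \, v(s') \bigr]

% Policy improvement: make the policy greedy with respect to v
\pi(s) \leftarrow \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v(s') \bigr]
```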

Pseudocode
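
A minimal Python sketch of the four steps might look like the following. It is an illustration under stated assumptions, not the course's own pseudocode: the MDP is assumed to be given as a transition array `P[s, a, s']` and an expected-reward array `R[s, a]`, and the names `policy_evaluation`, `policy_improvement`, `policy_iteration`, `gamma`, and `theta` are hypothetical. The two-state MDP at the bottom is made up for the example.

```python
import numpy as np

def policy_evaluation(P, R, policy, v, gamma=0.9, theta=1e-8):
    """Sweep over every state until v is consistent with `policy` (within theta)."""
    n_states = P.shape[0]
    while True:
        delta = 0.0
        for s in range(n_states):
            a = policy[s]
            # Expected one-step return when following the policy from s
            new_v = R[s, a] + gamma * P[s, a] @ v
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < theta:
            return v

def policy_improvement(P, R, v, gamma=0.9):
    """Return the policy that is greedy with respect to v."""
    q = R + gamma * P @ v  # action values, shape (n_states, n_actions)
    return np.argmax(q, axis=1)

def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    n_states = P.shape[0]
    policy = np.zeros(n_states, dtype=int)  # step 1: arbitrary initial policy
    v = np.zeros(n_states)                  # step 1: arbitrary initial values
    while True:
        v = policy_evaluation(P, R, policy, v, gamma, theta)  # step 2
        new_policy = policy_improvement(P, R, v, gamma)       # step 3
        if np.array_equal(new_policy, policy):                # step 4: policy is stable
            return policy, v
        policy = new_policy

# Made-up two-state, two-action deterministic MDP (an assumption for illustration).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
])
R = np.array([
    [0.0, 1.0],  # rewards for actions 0 and 1 in state 0
    [2.0, 0.0],  # rewards for actions 0 and 1 in state 1
])

policy, v = policy_iteration(P, R)
print(policy, v)
```

In this sketch the greedy step breaks ties by picking the lowest-indexed action; with tied actions and an approximate evaluation, a more careful stopping check may be needed.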

Based on the pseudocode, what condition causes the outer loop of policy iteration to stop?


Section 3. Chapter 7
