Calculate the final V* values for the given grid world and fill in all missing cells. All given values are terminal states. Use value iteration, policy iteration, or any other method you like.
Assume a discount factor of gamma = 0.5 (a 50% discount). Assume 0.8/0.1/0.1 noise: the agent moves in the intended direction with probability 0.8 and slips to the left or to the right of that direction with probability 0.1 each.
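For reference, the quantity being asked for can be written as the Bellman optimality equation. This is the standard textbook form rather than anything specific to this problem; the reward term R(s, a, s') is an assumption, since the problem statement does not mention a living reward (taking it as 0 is the natural reading).

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr],
\qquad \gamma = 0.5,
\qquad P(\text{intended}) = 0.8,\quad P(\text{left}) = P(\text{right}) = 0.1 .
```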
GridWorld
10 | 10 | 10 | 10 | 10 |
10 | | | | 10 |
10 | | | | 10 |
10 | | | | 10 |
10 | 10 | 10 | 10 | 10 |
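
One way to fill in the grid is plain value iteration. The sketch below is only an illustration under stated assumptions: the living reward is 0, each terminal cell keeps its given value of 10, and entering a terminal cell is discounted by gamma like any other step. The names `value_iteration`, `q_value`, and `ACTIONS` are hypothetical helpers, not part of the problem.

```python
GAMMA = 0.5                      # discount factor from the problem statement
P_INTENDED, P_SIDE = 0.8, 0.1    # noise model: 0.8 intended, 0.1 each side
# Actions in clockwise order so that index-1 is "left of" and index+1 is "right of".
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def q_value(V, r, c, a, living_reward):
    """Expected return of taking action index a in cell (r, c)."""
    outcomes = (
        (P_INTENDED, ACTIONS[a]),
        (P_SIDE, ACTIONS[(a - 1) % 4]),  # slip to the left of the intended direction
        (P_SIDE, ACTIONS[(a + 1) % 4]),  # slip to the right of the intended direction
    )
    return sum(p * (living_reward + GAMMA * V[r + dr][c + dc])
               for p, (dr, dc) in outcomes)


def value_iteration(n=5, terminal_value=10.0, living_reward=0.0, tol=1e-10):
    """Value iteration on an n x n grid whose border cells are all terminal.

    Assumption: living_reward defaults to 0 because the problem gives none.
    """
    def is_terminal(r, c):
        return r in (0, n - 1) or c in (0, n - 1)

    V = [[terminal_value if is_terminal(r, c) else 0.0 for c in range(n)]
         for r in range(n)]
    while True:
        new_V = [row[:] for row in V]
        delta = 0.0
        # Only interior cells are updated; every neighbour of an interior
        # cell is inside the grid, so no wall handling is needed here.
        for r in range(1, n - 1):
            for c in range(1, n - 1):
                best = max(q_value(V, r, c, a, living_reward) for a in range(4))
                delta = max(delta, abs(best - V[r][c]))
                new_V[r][c] = best
        V = new_V
        if delta < tol:
            return V


if __name__ == "__main__":
    for row in value_iteration():
        print(" | ".join(f"{v:5.2f}" for v in row))
```

Under these assumptions the grid is symmetric, so only three distinct values appear among the missing cells: roughly 4.72 in the four interior corners, 4.47 in the four interior edge cells, and 2.24 in the center. A different living reward or a different convention for collecting the terminal value would change these numbers.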