Solve the V* values for this MDP - 5x5

Question

Solve the V* values for this MDP - 5x5

1 Answer

Best answer

At first glace, with its 17 empty spaces, this problem looks long. However, we can take advantage of the symmetry to reduce the number of variables to solve for to four.

01 x1 00 x1 01
x1 x2 x3 x2 x1
00 x3 x4 x3 00
x1 x2 x3 x2 x1
01 x1 00 x1 01

We can also see the optimal policies for each cell and avoid the need to do multiple calculations and calculate a maximum.

- Spaces with x1 should aim toward the terminal states with 1
- Spaces with x2 should move toward the x1 spaces
- Spaces with x3 should move toward x2 (since x4 in the middle is the farthest away from the terminals)
- Spaces with x4 should move toward x3 (no real choice)

Now, the value given a policy pi and the state s is

V(s) = sum (over all states s) of T(s,pi(s),s')*[R(s,pi(s),s') + gamma*V(s')]

x1 = 0.8(0 + 0.9*1) + 0.1(0 + 0.9*x1) + 0.1(0+ 0.9*x2)
x1 = 0.72 + 0.09*x1 + 0.09*x2

x2 = 0.8(0 + 0.9*x1) + 0.1(0 + 0.9*x1) + 0.1(0 + 0.9*x3)
x2 = 0.81*x1 + 0.09*x3

x3 = 0.8(0 + 0.9*x2) + 0.1(0 + 0.9*0) + 0.1(0 + 0.9*x4)
x3 = 0.72*x2 + 0.09*x4

x4 = 0.8(0 + 0.9*x3) + 0.1(0 + 0.9*x3) + 0.1(0 + 0.9*x3)
x4 = 0.90*x3

answered May 3, 2023 by bulldozer070 AlgoMeister (568 points)
selected May 4, 2023 by Amrinder Arora

Categories

Most popular tags

Solve the V* values for this MDP - 5x5

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Related questions