Solve the V* values for this grid world MDP - 3 x 4

1 Answer

Best answer

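Here I solve the problem by policy evaluation: I fix a policy that always moves right (one action per state) and solve the resulting Bellman equations. Since the +1 terminals are on the right and the -1 terminals are on the left, I take always-right to be the optimal policy, so its values are the V* values. We are given a discount factor (γ) of 0.9 and a noise model of 0.8, 0.1, 0.1: the agent moves in the intended direction with probability 0.8 and slips to each perpendicular direction with probability 0.1. A slip into a wall leaves the agent where it is. The first column consists of terminal states with a value of -1, and the last column consists of terminal states with a value of 1. There is no living reward.

a = go right in every state (fixed policy).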

We have the following grid world:

-1	0	0	1
-1	0	0	1
-1	0	0	1

Labeling the non-terminal cells A to F:

-1	A	B	1
-1	C	D	1
-1	E	F	1

Policy evaluation (Bellman equations under the fixed policy):

A = 0.9* (0.8 * B + 0.1 * A + 0.1 * C)

B = 0.9* (0.8 * 1 + 0.1 * B + 0.1 * D )

C = 0.9* (0.8 * D + 0.1 * A + 0.1 * E )

D = 0.9* (0.8 * 1 + 0.1 * B + 0.1 * F )

E = 0.9* (0.8 * F + 0.1 * C + 0.1 * E )

F = 0.9* (0.8 * 1 + 0.1 * D + 0.1 * F )
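The six equations above can also be solved numerically by applying them repeatedly as updates until the values stop changing (iterative policy evaluation). The following is a minimal Python sketch under the same assumptions as the equations; the names `SUCCESSORS` and `evaluate_policy` are illustrative and not part of the original problem.

```python
GAMMA = 0.9
P_INTENDED, P_SLIP = 0.8, 0.1

# Successors under "always go right": (intended, slip up, slip down).
# "+1" stands for the right-hand terminal column; a slip into a wall
# leaves the agent in place (A slips up into A, E slips down into E).
SUCCESSORS = {
    "A": ("B", "A", "C"),
    "B": ("+1", "B", "D"),
    "C": ("D", "A", "E"),
    "D": ("+1", "B", "F"),
    "E": ("F", "C", "E"),
    "F": ("+1", "D", "F"),
}


def evaluate_policy(tol=1e-12, max_sweeps=10_000):
    """Sweep the Bellman equations in place until the largest change is below tol."""
    values = {s: 0.0 for s in SUCCESSORS}
    values["+1"] = 1.0  # terminal value, held fixed
    for _ in range(max_sweeps):
        delta = 0.0
        for s, (fwd, up, down) in SUCCESSORS.items():
            new = GAMMA * (
                P_INTENDED * values[fwd]
                + P_SLIP * values[up]
                + P_SLIP * values[down]
            )
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < tol:
            break
    return {s: values[s] for s in SUCCESSORS}


values = evaluate_policy()
for s in "ABCDEF":
    print(s, round(values[s], 3))
```

The printed values should agree with the closed-form solution derived in the rest of this answer.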

--------------------------------------------------

A = 0.72B + 0.09A + 0.09C

B = 0.72 + 0.09B + 0.09D

C = 0.72D + 0.09A + 0.09E

D = 0.72 + 0.09B + 0.09F

E = 0.72F + 0.09C + 0.09E

F = 0.72 + 0.09D + 0.09F

-------------------------------------------------

0.72B + 0.09A + 0.09C - A = 0

0.72 + 0.09B + 0.09D - B = 0

0.72D + 0.09A + 0.09E - C = 0

0.72 + 0.09B + 0.09F - D = 0

0.72F + 0.09C + 0.09E - E = 0

0.72 + 0.09D + 0.09F - F = 0

------------------------------------------------

Collecting like terms:

0.72B - 0.91A + 0.09C = 0

0.72 - 0.91B + 0.09D = 0

0.72D + 0.09A + 0.09E - C = 0

0.72 + 0.09B + 0.09F - D = 0

0.72F + 0.09C - 0.91E = 0

0.72 + 0.09D - 0.91F = 0
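This is a linear system in six unknowns, so it can also be handed to a generic solver instead of being reduced by hand. The sketch below is one way to do that in plain Python, using Gauss-Jordan elimination with exact fractions; the `solve` helper is written for this illustration, and the coefficients are the ones above multiplied by 100 (which does not change the solution), in the unknown order A, B, C, D, E, F.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [x / pivot_val for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Columns: A, B, C, D, E, F (coefficients in hundredths).
matrix = [
    [-91, 72, 9, 0, 0, 0],    # 0.72B - 0.91A + 0.09C = 0
    [0, -91, 0, 9, 0, 0],     # 0.72 - 0.91B + 0.09D = 0
    [9, 0, -100, 72, 9, 0],   # 0.72D + 0.09A + 0.09E - C = 0
    [0, 9, 0, -100, 0, 9],    # 0.72 + 0.09B + 0.09F - D = 0
    [0, 0, 9, 0, -91, 72],    # 0.72F + 0.09C - 0.91E = 0
    [0, 0, 0, 9, 0, -91],     # 0.72 + 0.09D - 0.91F = 0
]
rhs = [0, -72, 0, -72, 0, -72]

solution = solve(matrix, rhs)
for name, value in zip("ABCDEF", solution):
    print(name, value, round(float(value), 3))
```

Exact fractions are used here only to avoid rounding noise in the comparison with the hand derivation; floating-point arithmetic would work equally well for a system this small.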

Comparing the equations for F and B:

0.72 - 0.91F + 0.09D = 0

0.72 - 0.91B + 0.09D = 0

Subtracting one from the other gives B = F

---------------------------------

Substituting F = B, we have:

0.72B - 0.91A + 0.09C = 0

0.72 - 0.91B + 0.09D = 0

0.72D + 0.09A + 0.09E - C = 0

0.72 + 0.09B + 0.09B - D = 0

0.72B + 0.09C - 0.91E = 0

Equating the two expressions for 0.72B (from the first and last equations):

- 0.09C + 0.91E = 0.91A - 0.09C

It means that A = E

----------------------------------

Substituting E = A, we have:

0.72B - 0.91A + 0.09C = 0

0.72 - 0.91B + 0.09D = 0

0.72D + 0.18A - C = 0

0.72 + 0.18B - D = 0

0.72B + 0.09C - 0.91A = 0

Equating the two expressions for 0.72 (from the second and fourth equations):

0.91B - 0.09D = -0.18B + D

1.09B = 1.09D

B = D = F

----------------------------------

Substituting D = B, we have:

0.72B - 0.91A + 0.09C = 0

0.72 - 0.91B + 0.09B = 0

0.72B + 0.18A - C = 0

0.72 + 0.18B - B = 0

Equating the two expressions for -0.72B (from the first and third equations):

- 0.91A + 0.09C = 0.18A - C

1.09C = 1.09A

A = C = E

So the system reduces to:

0.72B - 0.82A = 0

0.72 - 0.82B = 0

--------------------------------------------------------

B = D = F = 0.72/0.82 ≈ 0.88

A = C = E = ( 0.72 * B ) / 0.82 = (0.72 * 0.878) / 0.82 ≈ 0.77
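As a cross-check on the elimination, note that A = C = E and B = D = F mean every cell in a column has the same value, so a vertical slip never changes the value the agent sees. Writing a for the column next to the -1 terminals and b for the column next to the +1 terminals (symbols introduced here only for illustration), the six equations collapse to two:

```latex
\begin{aligned}
b &= 0.9\,(0.8 \cdot 1 + 0.2\,b) &&\Rightarrow\quad b = \frac{0.72}{0.82} = \frac{36}{41} \approx 0.878,\\
a &= 0.9\,(0.8\,b + 0.2\,a) &&\Rightarrow\quad a = \frac{0.72\,b}{0.82} = \left(\frac{36}{41}\right)^{2} \approx 0.771.
\end{aligned}
```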

The resulting values on the grid are:

-1	0.77	0.88	1
-1	0.77	0.88	1
-1	0.77	0.88	1
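Since the question asks for V*, one way to check that the always-right policy is actually optimal is a single policy-improvement step: compute the one-step lookahead value of every action under the values above and see whether "right" is the best in each cell. The sketch below assumes the other three actions follow the same noise model as "right" (0.8 intended, 0.1 to each perpendicular side, and bumping into the grid edge leaves the agent in place). That extension, and all names in the code, are assumptions rather than something stated in the problem.

```python
GAMMA = 0.9
ROWS, COLS = 3, 4
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERPENDICULAR = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}

# Values from the derivation: column 1 holds A, C, E and column 2 holds B, D, F.
b = 0.72 / 0.82
a = 0.72 * b / 0.82
COLUMN_VALUE = {0: -1.0, 1: a, 2: b, 3: 1.0}
V = {(r, c): COLUMN_VALUE[c] for r in range(ROWS) for c in range(COLS)}


def step(state, move):
    """Deterministic result of a move; leaving the grid keeps the agent in place."""
    r, c = state
    dr, dc = MOVES[move]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < ROWS and 0 <= nc < COLS else state


def q_value(state, action):
    """One-step lookahead value of an action under the assumed noise model."""
    side1, side2 = PERPENDICULAR[action]
    outcomes = [(0.8, action), (0.1, side1), (0.1, side2)]
    return GAMMA * sum(p * V[step(state, m)] for p, m in outcomes)


greedy = {}
for r in range(ROWS):
    for c in (1, 2):  # non-terminal columns only
        qs = {action: q_value((r, c), action) for action in MOVES}
        greedy[(r, c)] = max(qs, key=qs.get)
        print((r, c), greedy[(r, c)], {k: round(v, 3) for k, v in qs.items()})
```

If every cell reports "right", the greedy policy coincides with the evaluated policy, so the values satisfy the Bellman optimality equation and are the V* values under this assumed model.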

answered May 2, 2023 by aliasgarovs (248 points)
selected May 4, 2023 by Amrinder Arora
