0 votes

You are traveling on a straight road, but have a jumpy car.  The car sometimes "jumps" (moves double).  At other times, it doesn't move at all.  The following MDP has been created to model this behavior and the landscape.

Estimate the V* values (optimal values for the states) for this MDP:

S1S2S3S4S5S6S7S8S9
sqrt(3)100sqrt(3)

MDP is defined as follows: There are two actions L (Left) and R (Right).  When moving left, there is a 40% chance moving left, 50% chance of moving DOUBLE (2 spots left), and 10% chance of not moving at all.  Similarly, when moving right, there is a 40% chance moving right, 50% chance of moving DOUBLE (2 spots right), and 10% chance of not moving at all.

S1, S5 and S9 are terminal states with values sqrt(3), 100 and sqrt(3) respectively.  

Discount factor is 0.9 (so, gamma = 0.9).

There is no living reward, that is R(s,a,s’) = 0.

in MDP by AlgoMeister (1.6k points)
edited by

1 Answer

0 votes

By using symmetry, we can intuit that at S6, optimal policy is to go left, and S4, the optimal policy is to go right.  Suppose we call this policy p. Then, we can write that Vp(S2) = a = Vp(S8).   Vp(S3) = b = Vp(S7).   Vp(s4) = c = Vp(S6)

In terms of equations, we can write as:

c = 0.9 * (0.4 * 100 + 0.5 * c + 0.1 * c).   That is, c = 36 + 0.54c.  That is, c = 36/0.46 = 78.26.

Similarly, we can write: 

b = 0.9 * (0.4 * c + 0.5 * 100 + 0.1 * b).   That is, b = 45 + 0.36 c + 0.09 b.  That is, b * 0.91 = 45 + 0.36 * c.

Using c = 78.26, we get b = 80.41

Similarly, we can solve for a:

a = 0.9 * (0.4 * b + 0.5 * c + 0.1 * a).   That is, a = (0.36 b + 0.45 c)/0.91.  That is, a = 70.51129.

This is policy evaluation.  We still need to check that this policy is optimal, and if that is correct, we can claim that V*(S2) = Vp(S2).

by AlgoMeister (1.6k points)

Related questions

0 votes
1 answer
asked Feb 23, 2021 in MDP by Amrinder Arora AlgoMeister (1.6k points)
0 votes
1 answer
asked Mar 30, 2021 in MDP by Amrinder Arora AlgoMeister (1.6k points)
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
The Book: Analysis and Design of Algorithms | Presentations on Slideshare | Lecture Notes, etc
...