Jumpy Car - Estimate/Solve the V* values for the following MDP

Question

You are traveling on a straight road, but have a jumpy car. The car sometimes "jumps" (moves double). At other times, it doesn't move at all. The following MDP has been created to model this behavior and the landscape.

Estimate the V* values (optimal values for the states) for this MDP:

S1	S2	S3	S4	S5	S6	S7	S8	S9
sqrt(3)				100				sqrt(3)

The MDP is defined as follows. There are two actions, L (Left) and R (Right). When moving left, there is a 40% chance of moving one spot left, a 50% chance of moving DOUBLE (2 spots left), and a 10% chance of not moving at all. Similarly, when moving right, there is a 40% chance of moving one spot right, a 50% chance of moving DOUBLE (2 spots right), and a 10% chance of not moving at all.

S1, S5 and S9 are terminal states with values sqrt(3), 100 and sqrt(3) respectively.

Discount factor is 0.9 (so, gamma = 0.9).

There is no living reward, that is R(s,a,s’) = 0.

asked Mar 1, 2021 in MDP by Amrinder Arora AlgoMeister (1.6k points)
edited Mar 1, 2021 by Amrinder Arora

1 Answer

Amrinder Arora · Answer 1 · 2021-03-07T19:16:01+0000

By symmetry, we can intuit that the optimal policy is to head toward S5 from every non-terminal state: go right at S2, S3 and S4, and go left at S6, S7 and S8. Call this policy p. Then we can write Vp(S2) = a = Vp(S8), Vp(S3) = b = Vp(S7), and Vp(S4) = c = Vp(S6).

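In terms of equations: at S4, moving right reaches S5 with probability 0.4, jumps over S5 and lands on S6 (value c, by symmetry) with probability 0.5, and stays at S4 with probability 0.1. So:

c = 0.9 * (0.4 * 100 + 0.5 * c + 0.1 * c). That is, c = 36 + 0.54c. That is, c = 36/0.46 = 78.26.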

Similarly, we can write:

b = 0.9 * (0.4 * c + 0.5 * 100 + 0.1 * b). That is, b = 45 + 0.36 c + 0.09 b. That is, b * 0.91 = 45 + 0.36 * c.

Using c = 78.26, we get b = (45 + 0.36 * 78.26)/0.91 = 80.41.

Similarly, we can solve for a:

a = 0.9 * (0.4 * b + 0.5 * c + 0.1 * a). That is, a = (0.36 b + 0.45 c)/0.91. That is, a = 70.51.
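The arithmetic above can be double-checked with a short script. This is only a sketch of the same three equations, solved in the same order (c, then b, then a); the function and constant names are made up for illustration and are not part of the original answer.

```python
GAMMA = 0.9                                # discount factor from the question
P_SINGLE, P_DOUBLE, P_STAY = 0.4, 0.5, 0.1  # move one spot, two spots, or stay
GOAL = 100.0                               # terminal value at S5


def evaluate_toward_center_policy():
    """Solve the three policy-evaluation equations for a, b, c."""
    # c = V(S4): a single step reaches S5; a double jump lands on S6,
    # whose value is also c by symmetry; staying keeps the car at S4.
    c = GAMMA * P_SINGLE * GOAL / (1 - GAMMA * (P_DOUBLE + P_STAY))
    # b = V(S3): a single step reaches S4, a double jump reaches S5.
    b = GAMMA * (P_SINGLE * c + P_DOUBLE * GOAL) / (1 - GAMMA * P_STAY)
    # a = V(S2): a single step reaches S3, a double jump reaches S4.
    a = GAMMA * (P_SINGLE * b + P_DOUBLE * c) / (1 - GAMMA * P_STAY)
    return a, b, c


a, b, c = evaluate_toward_center_policy()
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```

Carrying full precision through gives a ≈ 70.5113, b ≈ 80.4109 and c ≈ 78.2609, which agrees with the hand-computed values to two decimal places.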

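This is policy evaluation: it gives the values of the states under policy p only. We still need to check that p is optimal, that is, that no state does better by choosing the other action. If that holds, we can claim that V*(s) = Vp(s) for every state, for example V*(S2) = Vp(S2) = a.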
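One way to carry out that optimality check is to run value iteration on the full MDP and compare the result with a, b and c. The sketch below makes two assumptions that the question does not spell out: a move that would leave the road is clamped to the end state (S1 or S9), and a double jump is allowed to pass over a terminal state (as the equation for c already assumes). The terminal states are held fixed at their given values. All names here are illustrative.

```python
import math

GAMMA = 0.9
N_STATES = 9
TERMINAL = {1: math.sqrt(3), 5: 100.0, 9: math.sqrt(3)}
MOVES = ((1, 0.4), (2, 0.5), (0, 0.1))  # (spots moved, probability)


def q_value(V, s, direction):
    """Expected discounted value of acting in `direction` (-1 = L, +1 = R)."""
    total = 0.0
    for steps, prob in MOVES:
        # ASSUMPTION: jumps past the end of the road are clamped to S1/S9.
        nxt = min(max(s + direction * steps, 1), N_STATES)
        total += prob * V[nxt]
    return GAMMA * total  # no living reward, so only the discounted value


def value_iteration(tol=1e-10, max_iters=10_000):
    """Return (V, policy) for the non-terminal states S2..S4 and S6..S8."""
    V = {s: TERMINAL.get(s, 0.0) for s in range(1, N_STATES + 1)}
    free = [s for s in V if s not in TERMINAL]
    for _ in range(max_iters):
        new_V = dict(V)
        for s in free:
            new_V[s] = max(q_value(V, s, -1), q_value(V, s, +1))
        delta = max(abs(new_V[s] - V[s]) for s in free)
        V = new_V
        if delta < tol:
            break
    policy = {s: "L" if q_value(V, s, -1) > q_value(V, s, +1) else "R"
              for s in free}
    return V, policy


V, policy = value_iteration()
for s in sorted(policy):
    print(f"S{s}: V* = {V[s]:.4f}, action = {policy[s]}")
```

Under these assumptions the greedy policy comes out as R at S2, S3, S4 and L at S6, S7, S8, and the converged values match a, b and c, which supports the claim that p is optimal. A different boundary rule would change the value of the "wrong-way" action at S2 and S8, but that action leads toward a terminal worth only sqrt(3), so it is unlikely to change the conclusion.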
