
For the given MDP, find the values of states S2, S3 and S4. States S1 and S5 are terminal states with values 0 and 1, respectively. The living reward (R) is 0. The transition function is as follows: when going Left or Right, there is a 90% probability that the move goes as planned and a 10% probability that no move occurs. The discount rate gamma is 0.9.

| State | S1 | S2 | S3 | S4 | S5 |
|---|---|---|---|---|---|
| Value | 0 | x2 | x3 | x4 | 1 |
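One way to check the values numerically is plain value iteration on the chain as stated. This is a minimal sketch, assuming S1..S5 are indexed 0..4, that Left and Right are the only actions, and that terminal states keep their fixed values. The names `value_iteration`, `q_value` and the constants are hypothetical, chosen for illustration.

```python
# Value iteration for the 5-state chain MDP in the question.
# Assumptions (illustrative): S1..S5 -> indices 0..4, only Left/Right actions,
# terminal states S1 and S5 keep their fixed values 0 and 1.

GAMMA = 0.9          # discount rate
LIVING_REWARD = 0.0  # living reward R
P_MOVE = 0.9         # probability the intended move happens
P_STAY = 0.1         # probability that no move occurs

TERMINAL_VALUES = {0: 0.0, 4: 1.0}  # S1 -> 0, S5 -> 1
N_STATES = 5
ACTIONS = {"Left": -1, "Right": +1}


def q_value(values, s, step):
    """Expected discounted value of trying to move by `step` from state s."""
    target = s + step
    expected_next = P_MOVE * values[target] + P_STAY * values[s]
    return LIVING_REWARD + GAMMA * expected_next


def value_iteration(tol=1e-12, max_iters=10_000):
    values = [TERMINAL_VALUES.get(s, 0.0) for s in range(N_STATES)]
    for _ in range(max_iters):
        new_values = list(values)
        for s in range(N_STATES):
            if s in TERMINAL_VALUES:
                continue  # terminal values stay fixed
            # Bellman optimality update: best action over Left/Right
            new_values[s] = max(q_value(values, s, d) for d in ACTIONS.values())
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    for s, val in enumerate(value_iteration(), start=1):
        print(f"S{s}: {val:.6f}")
```

Run as a script, it should print 0.705230, 0.792296 and 0.890110 for S2, S3 and S4, with Right chosen in every non-terminal state.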

in MDP by AlgoMeister (1.6k points)

1 Answer

Best answer
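The best action from S2, S3 and S4 is Right (toward S5), so each value satisfies x = R + gamma * (0.9 * x_right + 0.1 * x), with R = 0:

x4 = 0 + 0.9(0.9 * 1 + 0.1 * x4) = 0.81 + 0.09 * x4 => 0.91 * x4 = 0.81 => x4 = 81/91 ≈ 0.890110

x3 = 0 + 0.9(0.9 * x4 + 0.1 * x3) => 0.91 * x3 = 0.81 * x4 => x3 = (81/91) * x4 = (81/91)^2 ≈ 0.792296

x2 = 0 + 0.9(0.9 * x3 + 0.1 * x2) => 0.91 * x2 = 0.81 * x3 => x2 = (81/91) * x3 = (81/91)^3 ≈ 0.705230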
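Every step of the derivation has the same shape, (1 - gamma * 0.1) * x = gamma * 0.9 * x_next, so each value is the next one times 0.81/0.91 = 81/91. A small sketch with exact fractions (variable names are illustrative, living reward taken as 0) confirms the closed forms:

```python
from fractions import Fraction

# Each non-terminal state satisfies x = gamma * (p_move * x_next + p_stay * x),
# i.e. (1 - gamma * p_stay) * x = gamma * p_move * x_next (living reward 0).
GAMMA = Fraction(9, 10)
P_MOVE = Fraction(9, 10)
P_STAY = Fraction(1, 10)

ratio = (GAMMA * P_MOVE) / (1 - GAMMA * P_STAY)  # 0.81 / 0.91 = 81/91

values = {"S5": Fraction(1)}
for state, nxt in [("S4", "S5"), ("S3", "S4"), ("S2", "S3")]:
    values[state] = ratio * values[nxt]

for state in ["S4", "S3", "S2"]:
    print(f"{state}: {values[state]} (~{float(values[state]):.6f})")
```

It prints 81/91, 6561/8281 and 531441/753571, i.e. (81/91)^1, (81/91)^2 and (81/91)^3.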
by Active (264 points)

...