0 votes

Calculate the final V* values for the given grid world. Fill in all missing cells. All given values are terminal states. Use value iteration, policy iteration, or any other method you like.

Assume a discount factor of 0.5 (that is, gamma = 0.5). Assume 0.8/0.1/0.1 noise: the agent moves in the intended direction with probability 0.8 and slips to the left or to the right of that direction with probability 0.1 each.

GridWorld
10  10  10  10  10
10   ?   ?   ?  10
10   ?   ?   ?  10
10   ?   ?   ?  10
10  10  10  10  10

  

in Informed Search by AlgoMeister (1.6k points)

1 Answer

0 votes
10  10  10  10  10
10   a   b   a  10
10   b   c   b  10
10   a   b   a  10
10  10  10  10  10

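By symmetry there are only three distinct unknowns. In a and b the best action is to move toward an adjacent terminal 10; in c every neighbor is b.

a = 0.5*(0.9*10 + 0.1*b)              (1)

b = 0.5*(0.8*10 + 0.2*a)              (2)

c = 0.5*b                             (3)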

Expanding (1) and (2):

a = 4.5 + 0.05*b

b = 4 + 0.1*a

Substituting b into the expression for a:

a = 4.5 + 0.05*(4 + 0.1*a) = 4.7 + 0.005*a, so a = 4.7/0.995

a≈4.72361809

b≈4.47236181

c≈2.2361809
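
As a cross-check, here is a minimal value-iteration sketch for the same grid. It assumes no living reward, treats each border cell as a terminal state with fixed value 10, and takes "left/right" to mean the two directions perpendicular to the intended move. The names (GAMMA, MOVES, SLIPS, value_iteration) are made up for this illustration and do not come from the question.

```python
# Value iteration on the 5x5 grid: border cells are terminal (value 10),
# the 3x3 interior is unknown. Assumptions: no living reward, and a slip
# means moving perpendicular to the intended direction.
GAMMA = 0.5
N = 5
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
# For each intended move, the two perpendicular slip moves.
SLIPS = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def is_terminal(r, c):
    return r in (0, N - 1) or c in (0, N - 1)


def value_iteration(tol=1e-12, max_iters=10_000):
    # Terminal cells keep the value 10; interior cells start at 0.
    V = [[10.0 if is_terminal(r, c) else 0.0 for c in range(N)]
         for r in range(N)]
    for _ in range(max_iters):
        delta = 0.0
        new_V = [row[:] for row in V]
        for r in range(1, N - 1):
            for c in range(1, N - 1):
                best = float("-inf")
                for action, (dr, dc) in MOVES.items():
                    # Intended move with probability 0.8 ...
                    expected = 0.8 * V[r + dr][c + dc]
                    # ... and each perpendicular slip with probability 0.1.
                    for slip in SLIPS[action]:
                        sr, sc = MOVES[slip]
                        expected += 0.1 * V[r + sr][c + sc]
                    best = max(best, GAMMA * expected)
                new_V[r][c] = best
                delta = max(delta, abs(best - V[r][c]))
        V = new_V
        if delta < tol:
            break
    return V


V = value_iteration()
for row in V:
    print(" ".join(f"{v:7.4f}" for v in row))
```

Under these assumptions the interior values should agree with a, b and c above: corners match a, edge cells match b, and the center matches c.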

by (116 points)