0 votes

Calculate the final V* values for the given grid world. Fill in all missing cells. All given values are terminal states. Use value iteration, policy iteration, or any other method you like.

Assume a 50% discount (that is, gamma = 0.5). Assume 0.8, 0.1, 0.1 noise: the agent moves in the intended direction with probability 0.8 and slips to the left or right of that direction with probability 0.1 each.

GridWorld
10    10    10    10    10
10     ?     ?     ?    10
10     ?     ?     ?    10
10     ?     ?     ?    10
10    10    10    10    10

  

in Informed Search by AlgoMeister (1.6k points)

2 Answers

0 votes
10    10    10    10    10
10     a     b     a    10
10     b     c     b    10
10     a     b     a    10
10    10    10    10    10

a = 0.5*(0.9*10 + 0.1*b)    (1)

b = 0.5*(0.8*10 + 0.2*a)    (2)

c = 0.5*b

Simplifying (1) and (2):

a = 4.5 + 0.05*b

b = 4 + 0.1*a

Substituting b into a:

a = 4.5 + 0.05*(4 + 0.1*a) = 4.7 + 0.005*a

a≈4.72361809

b≈4.47236181

c≈2.2361809
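
The substitution above can be checked with exact arithmetic. This is only a minimal sketch, assuming the simplified forms a = 4.5 + 0.05*b and b = 4 + 0.1*a; the variable names mirror the cell labels in the grid.

```python
from fractions import Fraction

# Simplified equations (assumed from the derivation above):
#   a = 4.5 + 0.05*b
#   b = 4   + 0.1*a
# Substituting b into a gives a = 4.7 + 0.005*a, so a = 4.7 / (1 - 0.005).
a = Fraction(47, 10) / (1 - Fraction(5, 1000))
b = 4 + Fraction(1, 10) * a
c = Fraction(1, 2) * b

for name, value in (("a", a), ("b", b), ("c", c)):
    # Print the exact fraction next to its decimal approximation.
    print(f"{name} = {value} ~ {float(value):.8f}")
```

Working in fractions avoids rounding drift, so the decimals can be compared digit for digit with the values listed above.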

by (116 points)
0 votes
10    10    10    10    10
10     a     b     a    10
10     b     c     b    10
10     a     b     a    10
10    10    10    10    10

a = 0.5*(0.8*10 + 0.1*b + 0.1*10) = 0.5*(9 + 0.1*b)
b = 0.5*(0.8*10 + 0.1*a + 0.1*a) = 0.5*(8 + 0.2*a)
c = 0.5*(0.8*b + 0.1*b + 0.1*b) = 0.5*b
On solving, a ≈ 4.724, b ≈ 4.472, c ≈ 2.236
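
A numerical cross-check is to run value iteration on the whole 5x5 grid instead of the reduced a, b, c system. The sketch below is one possible way to do it, under two assumptions not stated in the question: the living reward is zero, and each border cell simply holds the fixed value 10. The names `value_iteration`, `is_terminal` and `MOVES` are illustrative only.

```python
GAMMA = 0.5   # discount from the question
SIZE = 5      # 5x5 grid; every border cell is terminal with value 10
# Moves listed clockwise, so index-1 is "left of" and index+1 is "right of" a move.
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def is_terminal(row, col):
    return row in (0, SIZE - 1) or col in (0, SIZE - 1)


def value_iteration(tol=1e-12):
    # Assumption: zero living reward; terminal cells keep the value 10.
    V = [[10.0 if is_terminal(r, k) else 0.0 for k in range(SIZE)]
         for r in range(SIZE)]
    while True:
        delta = 0.0
        new_V = [row[:] for row in V]
        for r in range(1, SIZE - 1):
            for k in range(1, SIZE - 1):
                best = float("-inf")
                for i, (dr, dk) in enumerate(MOVES):
                    ldr, ldk = MOVES[(i - 1) % 4]  # slip left of intended move
                    rdr, rdk = MOVES[(i + 1) % 4]  # slip right of intended move
                    expected = (0.8 * V[r + dr][k + dk]
                                + 0.1 * V[r + ldr][k + ldk]
                                + 0.1 * V[r + rdr][k + rdk])
                    best = max(best, GAMMA * expected)
                new_V[r][k] = best
                delta = max(delta, abs(best - V[r][k]))
        V = new_V
        if delta < tol:
            return V


V = value_iteration()
a, b, c = V[1][1], V[1][2], V[2][2]  # corner, edge and centre of the inner 3x3
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```

Because gamma is 0.5, each sweep at least halves the error, so the loop converges in a few dozen iterations.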
by AlgoMeister (900 points)