### Setup
Since there is no initial distribution, it can be assigned arbitrarily:
P(+r) = P(-r) = 0.5
What are the transition probabilities?
P( +r | +r ) = 0.7
P( +r | -r ) = 0.3
P( -r | +r ) = 0.3
P( -r | -r ) = 0.7
What are the emission probabilities?
P( +u | +r ) = 0.9
P( +u | -r ) = 0.2
P( -u | +r ) = 0.1
P( -u | -r ) = 0.8
The Viterbi tables can now be initialized. Table T1 indicates the probability of a particular state occurring at a particular point in the sequence while Table T2 indicates the previous state for a given point in a sequence that led to that point with the highest probability.
One thing to note is that the entries in Table T1 can be scaled. For instance, many implementations store the logarithms of the probabilities instead of the probabilities themselves. This helps avoid underflow problems as a given probability becomes very small. Another thing that implementations do is scale the first probabilities of the first state in Table T1 so that they add up to 1. This rescaling is used in the trellis associated with this problem in Russell & Norvig (3rd ed.) Figure 15.5 (pg 577).
For t = 1, o1 = +u:
T1( o1, +r ) = 0.5 * 0.9 = 0.45
T1( o1, -r ) = 0.5 * 0.2 = 0.10
To match the textbook, these values can be scaled by dividing by their sum. In some implementations, this scaling factor alpha would be defined as 1 / (0.45 + 0.10) and multiplied by the initial probabilities.
T1( o1, +r ) = 0.45 / 0.55 = 0.8182
T1( o1, -r ) = 0.10 / 0.55 = 0.1818
T2( t=1, +r ) = T2( t=1, -r ) = null since no previous state would exist
The main Viterbi update can be considered as follows:
T1( state, observation at t = i ) = max (over all states) of T1( previous state, observation at t = i-1 ) * P( transitioning from prev state to current state ) * P( emission of current observation given current state )
### Calculations
For T1's t=2 (with o2 = +u and o1 = +u):
T1( o2, +r ) is the maximum of probabilities given all the possible previous states:
P( previous state +r ) = T1( t=1, +r ) * P( +r | +r ) * P( +u | +r ) = 0.8182 * 0.7 * 0.9 = 0.5155
P( previous state -r ) = T1( t=1, -r ) * P( +r | -r ) * P( +u | +r ) = 0.1818 * 0.3 * 0.9 = 0.0491
Thus, T1( t=2, +r ) = 0.5155 with T2( t=2, +r ) = +r
T1( t=2, -r ):
P( previous state +r ) = T1( t=1, +r ) * P( -r | +r ) * P( +u | -r ) = 0.8182 * 0.3 * 0.2 = 0.0491
P( previous state -r ) = T1( t=1, -r ) * P( -r | -r ) * P( +u | -r ) = 0.1818 * 0.7 * 0.2 = 0.025
Thus, T1( t=2, -r ) = 0.0491 with T2( t=2, -r ) = +r
For T1's t=3 (with o3 = -u and o2 = +u):
T1( t=3, +r ):
P( previous state +r ) = T1( t=2, +r ) * P( +r | +r ) * P( -u | +r ) = 0.5155 * 0.7 * 0.1 = 0.0361
P( previous state -r ) = T1( t=2, -r ) * P( +r | -r ) * P( -u | +r ) = 0.0491 * 0.3 * 0.1 = 1.473E-3
Thus, T1( t=3, +r ) = 0.0361 and T2( t=3, +r ) = +r
T1( t=3, -r ):
P( previous state +r ) = T1( t=2, +r ) * P( -r | +r ) * P( -u | -r ) = 0.5155 * 0.3 * 0.8 = 0.1237
P( previous state -r ) = T1( t=2, -r ) * P( -r | -r ) * P( -u | -r ) = 0.0491 * 0.7 * 0.8 = 0.0275
Thus, T1( t=3, -r ) = 0.1237 with T2( t=3, -r ) = +r
For T1's t=4 (with o4 = +u and o3 = -u):
T1( t=4, +r ):
P( previous state +r ) = T1( t=3, +r ) * P( +r | +r ) * P( +u | +r ) = 0.0361 * 0.7 * 0.9 = 0.0227
P( previous state -r ) = T1( t=3, -r ) * P( +r | -r ) * P( +u | +r ) = 0.1237 * 0.3 * 0.9 = 0.0334
Thus, T1( t=4, +r ) = 0.0334 with T2( t=4, +r ) = -r
T1( t=4, -r ):
P( previous state +r ) = T1( t=3, +r ) * P( -r | +r ) * P( +u | -r ) = 0.0361 * 0.3 * 0.2 = 2.166E-3
P( previous state -r ) = T1( t=3, -r ) * P( -r | -r ) * P( +u | -r ) = 0.1237 * 0.7 * 0.2 = 0.0173
Thus, T1( t=4, -r ) = 0.0173 with T2( t=4, -r ) = -r
For T1's t=5 (with o5 = +u and o4 = +u):
T1( t=5, +r ):
P( previous state +r ) = T1( t=4, +r ) * P( +r | +r ) * P( +u | +r ) = 0.0334 * 0.7 * 0.9 = 0.0210
P( previous state -r ) = T1( t=4, -r ) * P( +r | -r ) * P( +u | +r ) = 0.0173 * 0.3 * 0.9 = 4.671E-3
Thus, T1( t=5, +r ) + 0.0210 with T2( t=5, +r ) = +r
T1( t=5, -r ):
P( previous state +r ) = T1( t=4, +r ) * P( -r | +r ) * P( +u | -r ) = 0.0334 * 0.3 * 0.2 = 2.004E-3
P( previous state -r ) = T1( t=4, -r ) * P( -r | -r ) * P( +u | -r ) = 0.0173 * 0.7 * 0.2 = 2.422E-3
Thus, T1( t=5, -r ) = 2.422E-3 with T2( t=5, -r ) = -r
### Resulting Table T1
+r -r
------ ------
t=1 (+u) 0.8182 0.1818
t=2 (+u) 0.5155 0.0491
t=3 (-u) 0.0361 0.1237
t=4 (+u) 0.0334 0.0173
t=5 (+u) 0.0210 0.0024
### Resulting Table T2
+r -r
---- ----
t=1 (+u) null null
t=2 (+u) +r +r
t=3 (-u) +r +r
t=4 (+u) -r -r
t=5 (+u) +r -r
### Most Likely Estimate
To find the most likely estimate, one starts with the most likely state for the maximum t. This process is repeated for each t-1 until t=1.
For here, @t=5, +r is most likely
@t=4, +r
@t=3, -r
@t=2, +r
@t=1, +r
Thus, the most likely weather pattern when the umbrella observations are [+u, +u, -u, +u, +u] is [+r, +r, -r, +r, +r].