Consider the classic Viterbi example problem of determining the weather based on whether an individual carried an umbrella. The weather on a particular day matches the previous day 70% of the time. The individual will carry an umbrella on 90% of rainy days and 20% of non-rainy days.

Given an umbrella sequence of [+u, +u, -u, +u, +u], what is the most likely explanation for the weather?

1 Answer

### Setup

Since no initial distribution is given, one can be assigned arbitrarily; here, a uniform prior is used:

P(+r) = P(-r) = 0.5

What are the transition probabilities?

P( +r | +r ) = 0.7
P( +r | -r ) = 0.3
P( -r | +r ) = 0.3
P( -r | -r ) = 0.7

What are the emission probabilities?

P( +u | +r ) = 0.9
P( +u | -r ) = 0.2
P( -u | +r ) = 0.1
P( -u | -r ) = 0.8
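
As a convenience for checking the arithmetic, here is one way these parameters could be encoded in Python. The names (`states`, `start_p`, `trans_p`, `emit_p`, `observations`) are illustrative choices, not part of the original problem:

```python
states = ["+r", "-r"]                      # rain / no rain
start_p = {"+r": 0.5, "-r": 0.5}           # arbitrary uniform prior

# trans_p[prev][curr] = P( curr | prev )
trans_p = {
    "+r": {"+r": 0.7, "-r": 0.3},
    "-r": {"+r": 0.3, "-r": 0.7},
}

# emit_p[state][obs] = P( obs | state )
emit_p = {
    "+r": {"+u": 0.9, "-u": 0.1},
    "-r": {"+u": 0.2, "-u": 0.8},
}

observations = ["+u", "+u", "-u", "+u", "+u"]
```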

The Viterbi tables can now be initialized. Table T1 holds, for each position in the sequence and each state, the probability of the most likely state sequence ending in that state, while Table T2 records the back-pointer: the previous state on that most likely sequence.

One thing to note is that the entries in Table T1 can be rescaled without changing the answer, since multiplying an entire column by a constant scales every path through it equally and leaves the argmax unchanged. For instance, many implementations store the logarithms of the probabilities instead of the probabilities themselves, which avoids underflow as the products become very small. Another common choice is to rescale the probabilities at the first time step in Table T1 so that they add up to 1. This rescaling is used in the trellis associated with this problem in Russell & Norvig (3rd ed.), Figure 15.5 (p. 577).
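
As a small side illustration of the log trick (not needed for the hand calculation that follows), products of probabilities become sums of logarithms, which keeps the values in a comfortable floating-point range:

```python
import math

probs = [0.5, 0.9, 0.7, 0.9]               # an arbitrary chain of factors

product = 1.0
for p in probs:
    product *= p                           # long chains of multiplications underflow

log_sum = sum(math.log(p) for p in probs)  # sums of logs do not

assert abs(math.exp(log_sum) - product) < 1e-12
```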

For t = 1, o1 = +u:
T1( t=1, +r ) = 0.5 * 0.9 = 0.45
T1( t=1, -r ) = 0.5 * 0.2 = 0.10

To match the textbook, these values can be rescaled by dividing by their sum; that is, both entries are multiplied by the scaling factor alpha = 1 / (0.45 + 0.10):

T1( t=1, +r ) = 0.45 / 0.55 = 0.8182
T1( t=1, -r ) = 0.10 / 0.55 = 0.1818

T2( t=1, +r ) = T2( t=1, -r ) = null, since no previous state exists.
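
A short sketch of this initialization, reusing the hypothetical dictionary layout from the setup snippet (repeated here so the block stands on its own):

```python
start_p = {"+r": 0.5, "-r": 0.5}
emit_p = {"+r": {"+u": 0.9, "-u": 0.1}, "-r": {"+u": 0.2, "-u": 0.8}}

# Unnormalized t = 1 entries: prior * emission of the first observation (+u).
t1_col = {s: start_p[s] * emit_p[s]["+u"] for s in start_p}   # {'+r': 0.45, '-r': 0.1}

# Optional rescaling so the column sums to 1 (gives 0.8182 / 0.1818).
alpha = 1.0 / sum(t1_col.values())
t1_col = {s: alpha * p for s, p in t1_col.items()}

t2_col = {s: None for s in start_p}                           # no predecessor at t = 1
```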

The main Viterbi update can be considered as follows:

T1( t=i, s ) = max over all previous states s' of [ T1( t=i-1, s' ) * P( s | s' ) * P( o_i | s ) ]

T2( t=i, s ) = the previous state s' that achieves this maximum
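
In code, one way to express this update is the sketch below; it assumes the dictionary layout from the setup snippet (`trans_p[prev][curr]`, `emit_p[state][obs]`):

```python
def viterbi_step(prev_col, obs, states, trans_p, emit_p):
    """One Viterbi update: returns the new T1 column and the new T2 column.

    prev_col[s] is T1 at t-1; obs is the observation at t.
    """
    new_col, back = {}, {}
    for s in states:
        # Score each candidate previous state and keep the best one.
        best = max(states,
                   key=lambda p: prev_col[p] * trans_p[p][s] * emit_p[s][obs])
        new_col[s] = prev_col[best] * trans_p[best][s] * emit_p[s][obs]
        back[s] = best
    return new_col, back
```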

### Calculations

For t = 2, o2 = +u:

T1( t=2, +r ) is the maximum of the candidate probabilities over all possible previous states:
P( previous state +r ) = T1( t=1, +r ) * P( +r | +r ) * P( +u | +r ) = 0.8182 * 0.7 * 0.9 = 0.5155
P( previous state -r ) = T1( t=1, -r ) * P( +r | -r ) * P( +u | +r ) = 0.1818 * 0.3 * 0.9 = 0.0491

Thus, T1( t=2, +r ) = 0.5155 with T2( t=2, +r ) = +r

T1( t=2, -r ):
P( previous state +r ) = T1( t=1, +r ) * P( -r | +r ) * P( +u | -r ) = 0.8182 * 0.3 * 0.2 = 0.0491
P( previous state -r ) = T1( t=1, -r ) * P( -r | -r ) * P( +u | -r ) = 0.1818 * 0.7 * 0.2 = 0.0255

Thus, T1( t=2, -r ) = 0.0491 with T2( t=2, -r ) = +r

For t = 3, o3 = -u:

T1( t=3, +r ):
P( previous state +r ) = T1( t=2, +r ) * P( +r | +r ) * P( -u | +r ) = 0.5155 * 0.7 * 0.1 = 0.0361
P( previous state -r ) = T1( t=2, -r ) * P( +r | -r ) * P( -u | +r ) = 0.0491 * 0.3 * 0.1 = 1.473E-3

Thus, T1( t=3, +r ) = 0.0361 and T2( t=3, +r ) = +r

T1( t=3, -r ):
P( previous state +r ) = T1( t=2, +r ) * P( -r | +r ) * P( -u | -r ) = 0.5155 * 0.3 * 0.8 = 0.1237
P( previous state -r ) = T1( t=2, -r ) * P( -r | -r ) * P( -u | -r ) = 0.0491 * 0.7 * 0.8 = 0.0275

Thus, T1( t=3, -r ) = 0.1237 with T2( t=3, -r ) = +r

For t = 4, o4 = +u:

T1( t=4, +r ):
P( previous state +r ) = T1( t=3, +r ) * P( +r | +r ) * P( +u | +r ) = 0.0361 * 0.7 * 0.9 = 0.0227
P( previous state -r ) = T1( t=3, -r ) * P( +r | -r ) * P( +u | +r ) = 0.1237 * 0.3 * 0.9 = 0.0334

Thus, T1( t=4, +r ) = 0.0334 with T2( t=4, +r ) = -r

T1( t=4, -r ):
P( previous state +r ) = T1( t=3, +r ) * P( -r | +r ) * P( +u | -r ) = 0.0361 * 0.3 * 0.2 = 2.166E-3
P( previous state -r ) = T1( t=3, -r ) * P( -r | -r ) * P( +u | -r ) = 0.1237 * 0.7 * 0.2 = 0.0173

Thus, T1( t=4, -r ) = 0.0173 with T2( t=4, -r ) = -r

For t = 5, o5 = +u:

T1( t=5, +r ):
P( previous state +r ) = T1( t=4, +r ) * P( +r | +r ) * P( +u | +r ) = 0.0334 * 0.7 * 0.9 = 0.0210
P( previous state -r ) = T1( t=4, -r ) * P( +r | -r ) * P( +u | +r ) = 0.0173 * 0.3 * 0.9 = 4.671E-3

Thus, T1( t=5, +r ) = 0.0210 with T2( t=5, +r ) = +r

T1( t=5, -r ):
P( previous state +r ) = T1( t=4, +r ) * P( -r | +r ) * P( +u | -r ) = 0.0334 * 0.3 * 0.2 = 2.004E-3
P( previous state -r ) = T1( t=4, -r ) * P( -r | -r ) * P( +u | -r ) = 0.0173 * 0.7 * 0.2 = 2.422E-3

Thus, T1( t=5, -r ) = 2.422E-3 with T2( t=5, -r ) = -r
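
Putting the pieces together, the following run (which assumes the `start_p`/`trans_p`/`emit_p` dictionaries and the `viterbi_step` sketch above) should reproduce the two tables below up to rounding:

```python
obs_seq = ["+u", "+u", "-u", "+u", "+u"]

# t = 1: prior * emission, rescaled so the first column sums to 1.
first = {s: start_p[s] * emit_p[s][obs_seq[0]] for s in states}
alpha = 1.0 / sum(first.values())
T1 = [{s: alpha * p for s, p in first.items()}]
T2 = [{s: None for s in states}]

# t = 2 .. 5: apply the update from viterbi_step at each step.
for obs in obs_seq[1:]:
    col, back = viterbi_step(T1[-1], obs, states, trans_p, emit_p)
    T1.append(col)
    T2.append(back)

for t, (col, back) in enumerate(zip(T1, T2), start=1):
    print(t, {s: round(p, 4) for s, p in col.items()}, back)
```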

### Resulting Table T1

            +r      -r
            ------  ------
t=1 (+u)    0.8182  0.1818
t=2 (+u)    0.5155  0.0491
t=3 (-u)    0.0361  0.1237
t=4 (+u)    0.0334  0.0173
t=5 (+u)    0.0210  0.0024

### Resulting Table T2

            +r      -r
            ----    ----
t=1 (+u)    null    null
t=2 (+u)    +r      +r
t=3 (-u)    +r      +r
t=4 (+u)    -r      -r
t=5 (+u)    +r      -r

### Most Likely Estimate

To find the most likely explanation, start with the most likely state at the final time step (the largest entry in the last row of T1). Then repeatedly follow the back-pointers in T2: the entry T2( t, state ) gives the state at t-1. Repeat until t = 1 and reverse the resulting sequence. A code sketch of this backtracking appears after the trace below.

Here, at t=5, +r is most likely (0.0210 > 0.0024).
@t=4: T2( t=5, +r ) = +r
@t=3: T2( t=4, +r ) = -r
@t=2: T2( t=3, -r ) = +r
@t=1: T2( t=2, +r ) = +r

Thus, the most likely weather pattern when the umbrella observations are [+u, +u, -u, +u, +u] is [+r, +r, -r, +r, +r].
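
For completeness, a backtracking sketch over the `T1`/`T2` lists built in the run above (again an illustration, not a reference implementation):

```python
# Most likely final state: the largest entry in the last T1 column.
path = [max(T1[-1], key=T1[-1].get)]

# Follow the back-pointers in T2 from t = 5 down to t = 2;
# each entry names the state at the previous time step.
for back in reversed(T2[1:]):
    path.append(back[path[-1]])

path.reverse()
print(path)   # expected: ['+r', '+r', '-r', '+r', '+r']
```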
