Choosing the “most probable state path”


Page 1: Choosing the “most probable state path”

The Decoding Problem
Choosing the “most probable state path”

If we had to choose just one possible state path as a basis for further inference about some sequence of interest, we might reasonably choose the most probable state path:

How can we efficiently find the most probable state path?

p* = argmax_p P(x, p)

Read as: “choose the path p that maximizes the joint probability of the observed sequence and the path”

Page 2: Choosing the “most probable state path”

Runtime of algorithms
“Big O” notation – an upper bound on asymptotic growth

Order            Name
O(c)             Constant
O(log n)         Log
O(n)             Linear
O(n log n)       Loglinear
O(n^2)           Quadratic
O(n^c), c > 1    Polynomial
O(c^n), c > 1    Exponential
O(n!)            Factorial

In the context of problems related to sequences, n usually corresponds to the sequence length (which we often designate L)

Page 3: Choosing the “most probable state path”

Runtime of algorithms
“Big O” notation – an upper bound on asymptotic growth

Polynomial problems are tractable for reasonable c, but problems with superpolynomial runtime are intractable except for surprisingly small n

[Figure: growth-rate curves for O(c), O(log n), O(n), O(n log n), O(n^2), O(n^c) with c > 1, O(c^n) with c > 1, and O(n!). Image from http://www.cs.odu.edu/~toida/nerzic/content/function/growth.html]

Page 4: Choosing the “most probable state path”

Most Probable State Path
A naïve approach might recursively explore all possible paths

The naïve approach ends up solving the same subproblem (and subproblems of those subproblems) numerous times!

[Figure: a recursion tree rooted at Start, branching into states “+” and “-” at each of sequence positions 0 through 4]
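To make the repeated work concrete, here is a minimal sketch (not from the original slides) of the naïve recursion, which explores all 2^L state paths. The dictionaries anticipate the artificial two-state model introduced on the later slides, and the names start, trans, emit, and best_path are illustrative assumptions:

  start = {'+': 0.1, '-': 0.9}
  trans = {'+': {'+': 0.5, '-': 0.5}, '-': {'+': 0.6, '-': 0.4}}
  emit  = {'+': {'A': 0.30, 'C': 0.25, 'G': 0.15, 'T': 0.30},
           '-': {'A': 0.20, 'C': 0.35, 'G': 0.25, 'T': 0.20}}

  def best_path(x, prev=None, prob=1.0):
      """Recursively explore every state path for sequence x."""
      if not x:
          return prob, []
      best = (0.0, [])
      for k in ('+', '-'):
          a = start[k] if prev is None else trans[prev][k]
          # The same suffix subproblem is re-solved for every prefix path!
          p, path = best_path(x[1:], k, prob * a * emit[k][x[0]])
          if p > best[0]:
              best = (p, [k] + path)
      return best

  print(best_path("ACG"))   # ≈ (0.003375, ['-', '+', '-'])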

Page 5: Choosing the “most probable state path”

The Viterbi Algorithm
The Viterbi algorithm is a standard method for finding the most probable state path

This time, let’s look at a practical example of Viterbi in action before getting carried away with an attempt to formalize the equations

v_k(i) = P(x_1 … x_i, p*_i = k)   “The Viterbi variable for state k at position i”

“the probability of the sequence from the beginning up to the symbol at position i, along the most probable partial path ending with state k at position i”

Page 6: Choosing the “most probable state path”

The Viterbi Algorithm
Finding the most probable state path

The parameters shown here are highly artificial and intended to favour state switching

Observed sequence: x = ACG

Transitions:
  S → + : 0.1    S → - : 0.9
  + → + : 0.5    + → - : 0.5
  - → + : 0.6    - → - : 0.4

Emissions:
  State “+”:  A: 0.30   C: 0.25   G: 0.15   T: 0.30
  State “-”:  A: 0.20   C: 0.35   G: 0.25   T: 0.20

Candidate and best partial paths, starting from v_S(0) = 1 and v_+(0) = v_-(0) = 0:

  v_+(1), symbol A: candidates 1 · 0.1 · 0.3 (from S), 0 · 0.5 · 0.3 (from +), 0 · 0.6 · 0.3 (from -)
  Best: v_+(1) = 0.03

Page 7: Choosing the “most probable state path”

The Viterbi Algorithm
Finding the most probable state path

(Same model and note as on the previous slide.)

  v_-(1), symbol A: candidates 1 · 0.9 · 0.2 (from S), 0 · 0.5 · 0.2 (from +), 0 · 0.4 · 0.2 (from -)
  Best: v_-(1) = 0.18

Table so far: v_+(1) = 0.03 ← S, v_-(1) = 0.18 ← S

Page 8: Choosing the “most probable state path”

The Viterbi Algorithm
Finding the most probable state path

(Same model and note as on the previous slide.)

  v_+(2), symbol C: candidates 0.03 · 0.5 · 0.25 (from +), 0.18 · 0.6 · 0.25 (from -)
  Best: v_+(2) = 0.027, traceback “← -”

Page 9: Choosing the “most probable state path”

The Viterbi Algorithm
Finding the most probable state path

(Same model and note as on the previous slide.)

  v_-(2), symbol C: candidates 0.03 · 0.5 · 0.35 (from +), 0.18 · 0.4 · 0.35 (from -)
  Best: v_-(2) = 0.0252, traceback “← -”

Page 10: Choosing the “most probable state path”

The Viterbi Algorithm
Finding the most probable state path

(Same model and note as on the previous slide.)

  v_+(3), symbol G: candidates 0.0270 · 0.5 · 0.15 (from +), 0.0252 · 0.6 · 0.15 (from -)
  Best: v_+(3) = 0.002268, traceback “← -”

Page 11: Choosing the “most probable state path”

The Viterbi Algorithm
Finding the most probable state path

(Same model and note as on the previous slide.)

  v_-(3), symbol G: candidates 0.0270 · 0.5 · 0.25 (from +), 0.0252 · 0.4 · 0.25 (from -)
  Best: v_-(3) = 0.003375, traceback “← +”

The completed table:

  position:   1 (A)        2 (C)         3 (G)
  v_+:        0.03 ← S     0.027 ← -     0.002268 ← -
  v_-:        0.18 ← S     0.0252 ← -    0.003375 ← +
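As a quick sanity check (this snippet is not from the original slides), the whole table can be reproduced in a few lines of Python, computing each cell as emission · max(previous variable · transition); the values in the comments match the slides up to floating-point noise:

  v_plus_1  = 0.30 * max(1 * 0.1, 0 * 0.5, 0 * 0.6)        # 0.03
  v_minus_1 = 0.20 * max(1 * 0.9, 0 * 0.5, 0 * 0.4)        # 0.18
  v_plus_2  = 0.25 * max(v_plus_1 * 0.5, v_minus_1 * 0.6)  # 0.027
  v_minus_2 = 0.35 * max(v_plus_1 * 0.5, v_minus_1 * 0.4)  # 0.0252
  v_plus_3  = 0.15 * max(v_plus_2 * 0.5, v_minus_2 * 0.6)  # 0.002268
  v_minus_3 = 0.25 * max(v_plus_2 * 0.5, v_minus_2 * 0.4)  # 0.003375
  print(v_plus_3, v_minus_3)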

Page 12: Choosing the “most probable state path”

The Viterbi Algorithm
The Viterbi algorithm is the standard method for finding the most probable state path

Did the example illustrate how this helps?

v_k(i) = P(x_1 … x_i, p*_i = k)   “The Viterbi variable for state k at position i”

“the probability of the sequence from the beginning up to the symbol at position i, along the most probable partial path ending with state k at position i”

Page 13: Choosing the “most probable state path”

The Viterbi Algorithm
A recursive definition for Viterbi variables

Again, we recursively define the Viterbi variables in terms of their own values at prior positions in the sequence…

v_k(i) = P(x_1 … x_i, p*_i = k)

Note that a maximization step replaces the summation across states that we had in the forward algorithm. Termination is again assured by the fact that all paths must begin with Start.

v_l(i+1) = e_l(x_{i+1}) · max_k (v_k(i) · a_kl)

Equations of this latter form are sometimes known as Bellman equations.
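Plugging the example model into this recursion at position 2 reproduces the table entry computed earlier:

  v_+(2) = e_+(C) · max( v_+(1) · a(+→+), v_-(1) · a(-→+) )
         = 0.25 · max(0.03 · 0.5, 0.18 · 0.6)
         = 0.25 · 0.108
         = 0.027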

Page 14: Choosing the “most probable state path”

The Viterbi Algorithm
What if we had in our possession all of the Viterbi variables for L, the last position of the sequence?

That would be very useful indeed, since to get the joint probability of the observed sequence and the most probable state path we would need only ask which state k was associated with the maximum-valued Viterbi variable at the final position:

  P(x, p*) = max_k (v_k(L) · a_k,end)

The a_k,end term disappears if ends are not modelled!

Two problems:
• We don’t yet know those “final position” Viterbi variables
• This doesn’t tell us what the state path actually was...

Page 15: Choosing the “most probable state path”

The Viterbi Algorithm
Putting it all together

Note the similarity to forward, but with the addition of a traceback.

Initialization:
  v_start(0) = 1 and v_k(0) = 0 for all other states k

Recursion (i = 1 … L):
  v_l(i) = e_l(x_i) · max_k (v_k(i-1) · a_kl)
  ptr_i(l) = argmax_k (v_k(i-1) · a_kl)

Termination:
  P(x, p*) = max_k (v_k(L) · a_k,end)
  p*_L = argmax_k (v_k(L) · a_k,end)

Traceback (i = L … 1):
  p*_(i-1) = ptr_i(p*_i)
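For orientation, the whole procedure might be reduced to Python along the following lines. This is a sketch, not the course’s reference implementation: the dictionary-based model layout and the name viterbi are assumptions, and ends are not modelled, so the a_k,end terms drop out:

  start = {'+': 0.1, '-': 0.9}
  trans = {'+': {'+': 0.5, '-': 0.5}, '-': {'+': 0.6, '-': 0.4}}
  emit  = {'+': {'A': 0.30, 'C': 0.25, 'G': 0.15, 'T': 0.30},
           '-': {'A': 0.20, 'C': 0.35, 'G': 0.25, 'T': 0.20}}

  def viterbi(x):
      states = list(start)
      # Initialization: v_start(0) = 1 is handled implicitly via `start`.
      v = [{k: start[k] * emit[k][x[0]] for k in states}]
      ptr = []
      # Recursion: each cell records its best predecessor for the traceback.
      for symbol in x[1:]:
          row, back = {}, {}
          for l in states:
              best_k = max(states, key=lambda k: v[-1][k] * trans[k][l])
              row[l] = emit[l][symbol] * v[-1][best_k] * trans[best_k][l]
              back[l] = best_k
          v.append(row)
          ptr.append(back)
      # Termination (no end-state term, since ends are not modelled here).
      last = max(states, key=lambda k: v[-1][k])
      prob = v[-1][last]
      # Traceback.
      path = [last]
      for back in reversed(ptr):
          path.append(back[path[-1]])
      return prob, ''.join(reversed(path))

  print(viterbi("ACG"))   # ≈ (0.003375, '-+-')

On the example sequence ACG this reproduces the table built up on the preceding slides and recovers the path - + -.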

Page 16: Choosing the “most probable state path”

The Viterbi Algorithm
Reduction to Python code

The max function, along with list comprehensions, might come in handy for certain parts of your Viterbi implementation. For instance, here’s a one-liner that fulfills the termination conditions and uses both language features:

  return max(
      [(vtable[self.sequence_length][state], possible_paths[state])
       for state in self.transitions])

List comprehensions perform iterative operations on lists, and result in new lists. Here, the probability and the path list are handled together as a tuple. The max function compares tuples element by element, starting with the first, so unless two probabilities tie exactly the state path simply “comes along for the ride”.
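A tiny illustration (not from the original slides) of why the tuple trick works, using the final-column values and paths from the worked example:

  # Tuples compare element by element: max picks the larger probability,
  # and the paths would only be compared if the probabilities tied exactly.
  candidates = [(0.002268, ['-', '-', '+']), (0.003375, ['-', '+', '-'])]
  print(max(candidates))   # (0.003375, ['-', '+', '-'])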

Page 17: Choosing the “most probable state path”

Dynamic Programming
An anecdote from Richard Bellman

“I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, Where did the name, dynamic programming, come from? The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning is not a good word for various reasons. I decided therefore to use the word “programming”. I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities.”

Excerpt from Richard Bellman, “Eye of the Hurricane: An Autobiography”, 1984

Page 18: Choosing the “most probable state path”

Dynamic Programming
Dynamic programming algorithms solve problems by solving simpler subproblems and storing and reusing these subsolutions

• The problem must have the property of optimal substructure. Having an optimal substructure means that an optimal solution can be constructed efficiently from optimal solutions of its subproblems
• The problem must have overlapping subproblems, so that we benefit from the storage and reuse of earlier subsolutions

Why are dynamic programming approaches so common when dealing with DNA or amino acid sequences?

What kinds of problems are solved with dynamic programming?
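As a generic illustration of both properties (this example is not from the slides), memoizing a naive recursion turns exponential recomputation of overlapping subproblems into linear work:

  import functools

  @functools.lru_cache(maxsize=None)   # store and reuse subsolutions
  def fib(n):
      # Without the cache, fib(n-1) and fib(n-2) recompute the same
      # overlapping subproblems exponentially many times.
      return n if n < 2 else fib(n - 1) + fib(n - 2)

  print(fib(100))   # instant; the uncached version would never finish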

Page 19: Choosing the “most probable state path”

Dynamic Programming
A substring of a string is still a string

AGCTCAATTAGGAC
AGCTCAATTAGGA
AGCTCAATTAGG
AGCTCAATTAG
AGCTCAATTA
AGCTCAATT
AGCTCAAT
AGCTCAA
...
A

Problems relating to DNA sequences very frequently and quite naturally possess the optimal substructure property

Page 20: Choosing the “most probable state path”

Dynamic Programming
Problems relating to biological sequences can often be structured in terms of “overlapping subproblems”

AGCTCAAT

Biological sequences very often satisfy both the optimal-substructure and overlapping-subproblems criteria, and are therefore natural candidates for dynamic programming approaches

Page 21: Choosing the “most probable state path”

Dynamic Programming
Dynamic programming problems often have a natural representation as a trellis-like graph rather than as a tree

By exploiting the repetitive structure of the problem, dynamic programming converted our most probable state path problem from one with naïve complexity of O(|states|^L) to one with complexity O(L · |states|^2). For a two-state model and a sequence of length L = 100, that is on the order of 2^100 ≈ 10^30 candidate paths versus only 100 · 2^2 = 400 cell updates.

[Figure: a trellis graph rooted at Start, with one node per state at each of positions 0 through 6. The direction of the arrows shown corresponds to the possible traceback paths, not to the main forward iteration.]

Page 22: Choosing the “most probable state path”

Dynamic Programming
General procedure for dynamic programming

Can you identify each step of the dynamic programming approach in our descriptions of the forward and the Viterbi algorithms?

1. Characterize the structure of an optimal solution

2. Recursively define the value of an optimal solution

3. Compute the value of an optimal solution in a bottom-up fashion

4. (Optionally) construct the optimal solution from the computed information