advanced dynamic programming in...

156
Advanced Dynamic Programming in CL: Theory, Algorithms, and Applications Liang Huang University of Pennsylvania (S, 0, n) w 0 w 1 ... w n-1

Upload: others

Post on 24-Jun-2020

31 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Advanced Dynamic Programming in CL:

Theory, Algorithms, and Applications

Liang HuangUniversity of Pennsylvania

(S, 0, n)

w0 w1 ... wn-1

Page 2: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A Little Bit of History...

2

Page 3: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A Little Bit of History...• Who invented Dynamic Programming?

and when was it invented?

2

Page 4: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A Little Bit of History...• Who invented Dynamic Programming?

and when was it invented?

• R. Bellman (1940s-50s)

• A. Viterbi (1967)

• E. Dijkstra (1959)

• Hart, Nilsson, and Raphael (1968)

• Dijkstra => A* Algorithm

• D. Knuth (1977)

• Dijkstra on Grammar (Hypergraph)

2

Andrew Viterbi

Richard Bellman

Page 5: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A Little Bit of History...• Who invented Dynamic Programming?

and when was it invented?

• R. Bellman (1940s-50s)

• A. Viterbi (1967)

• E. Dijkstra (1959)

• Hart, Nilsson, and Raphael (1968)

• Dijkstra => A* Algorithm

• D. Knuth (1977)

• Dijkstra on Grammar (Hypergraph)

2

Andrew Viterbi

Richard BellmanA. Turing

Page 6: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dynamic Programming• Dynamic Programming is everywhere in NLP

• Viterbi Algorithm for Hidden Markov Models

• CKY Algorithm for Parsing and Machine Translation

• Forward-Backward and Inside-Outside Algorithms

• Also everywhere in AI/ML

• Reinforcement Learning, Planning (POMDP)

• AI Search: Uniform-cost, A*, etc.

• This tutorial: a unified theoretical view of DP

• Focusing on Optimization Problems3

Page 7: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Review: DP Basics• DP = Divide-and-Conquer + Two Principles:

• [required] Optimal Subproblem Property

• [recommended] Sharing of Common Subproblems

• Structure of the Search Space

• Incremental

• Graph

• Knapsack, Edit Dist., Sequence Alignment

• Branching

• Hypergraph

• Matrix-Chain, Polygon Triangulation, Optimal BST4

Page 8: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

5

topological(acyclic)

best-first(superior)

graphs with semirings

hypergraphs with weight functions

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 9: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Graphs in NLP

6

part-of-speech tagging

lattice in speech

Page 10: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Semirings on Graphs• in a weighted graph, we need two operators:

• extension (multiplicative) and summary (additive)

• the weight of a path is the product of edge weights

• the weight of a vertex is the summary of path weights

7

s

ue1 v

t

...

...

e2e3

d(!1) =!

ei!!1

w(ei) = w(e1) ! w(e2) ! w(e3)

d(t) =!

!i

w(!i)

= w(p1) ! w(p2) ! · · ·

Page 11: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Semiring Definitions

8

A monoid is a triple (A,!, 1) where

1. ! is a closed associative binary operator on the set A,

2. 1 is the identity element for !, i.e., for all a " A, a ! 1 = 1 ! a = a.

A monoid is commutative if ! is commutative.

Page 12: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Semiring Definitions

8

A monoid is a triple (A,!, 1) where

1. ! is a closed associative binary operator on the set A,

2. 1 is the identity element for !, i.e., for all a " A, a ! 1 = 1 ! a = a.

A monoid is commutative if ! is commutative. ([0, 1], +, 0)([0, 1], ×, 1)

([0, 1], max, 0)

Page 13: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Semiring Definitions

8

A monoid is a triple (A,!, 1) where

1. ! is a closed associative binary operator on the set A,

2. 1 is the identity element for !, i.e., for all a " A, a ! 1 = 1 ! a = a.

A monoid is commutative if ! is commutative.

A semiring is a 5-tuple R = (A,!,", 0, 1) such that

1. (A,!, 0) is a commutative monoid.

2. (A,", 1) is a monoid.

3. " distributes over !: for all a, b, c in A,

(a ! b) " c = (a " c) ! (b " c),

c " (a ! b) = (c " a) ! (c " b).

4. 0 is an annihilator for ": for all a in A, 0 " a = a " 0 = 0.

([0, 1], +, 0)([0, 1], ×, 1)

([0, 1], max, 0)

Page 14: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Semiring Definitions

8

A monoid is a triple (A,!, 1) where

1. ! is a closed associative binary operator on the set A,

2. 1 is the identity element for !, i.e., for all a " A, a ! 1 = 1 ! a = a.

A monoid is commutative if ! is commutative.

A semiring is a 5-tuple R = (A,!,", 0, 1) such that

1. (A,!, 0) is a commutative monoid.

2. (A,", 1) is a monoid.

3. " distributes over !: for all a, b, c in A,

(a ! b) " c = (a " c) ! (b " c),

c " (a ! b) = (c " a) ! (c " b).

4. 0 is an annihilator for ": for all a in A, 0 " a = a " 0 = 0.

([0, 1], +, 0)([0, 1], ×, 1)

([0, 1], max, 0)

([0, 1], max, ×, 0, 1)([0, 1], +, ×, 0, 1)

Page 15: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Semiring Definitions

8

A monoid is a triple (A,!, 1) where

1. ! is a closed associative binary operator on the set A,

2. 1 is the identity element for !, i.e., for all a " A, a ! 1 = 1 ! a = a.

A monoid is commutative if ! is commutative.

A semiring is a 5-tuple R = (A,!,", 0, 1) such that

1. (A,!, 0) is a commutative monoid.

2. (A,", 1) is a monoid.

3. " distributes over !: for all a, b, c in A,

(a ! b) " c = (a " c) ! (b " c),

c " (a ! b) = (c " a) ! (c " b).

4. 0 is an annihilator for ": for all a in A, 0 " a = a " 0 = 0.

([0, 1], +, 0)([0, 1], ×, 1)

([0, 1], max, 0)

([0, 1], max, ×, 0, 1)([0, 1], +, ×, 0, 1)

Page 16: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples

9

Semiring Set ! " 0 1 intuition/applicationBoolean {0, 1} # $ 0 1 logical deduction, recognition

Viterbi [0, 1] max % 0 1 prob. of the best derivation

Inside R+ & {+'} + % 0 1 prob. of a string

Real R & {+'} min + +' 0 shortest-distance

Tropical R+ & {+'} min + +' 0 with non-negative weights

Counting N + % 0 1 number of paths

Page 17: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Ordering

10

Page 18: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Ordering

• idempotent

10

A semiring (A,!,", 0, 1) is idempotent if for all a in A, a ! a = a.

Page 19: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Ordering

• idempotent

• comparison

• examples: boolean, viterbi, tropical, real, ...

10

A semiring (A,!,", 0, 1) is idempotent if for all a in A, a ! a = a.

(a ! b) " (a # b = a) defines a partial ordering.

({0, 1},!,", 0, 1) (R+ # {+$},min,+,+$, 0)

([0, 1],max,%, 0, 1) (R # {+$},min,+,+$, 0)

Page 20: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Ordering

• idempotent

• comparison

• examples: boolean, viterbi, tropical, real, ...

• total-order for optimization problems

• examples: all of the above10

A semiring (A,!,", 0, 1) is idempotent if for all a in A, a ! a = a.

(a ! b) " (a # b = a) defines a partial ordering.

A semiring is totally-ordered if ! defines a total ordering.

({0, 1},!,", 0, 1) (R+ # {+$},min,+,+$, 0)

([0, 1],max,%, 0, 1) (R # {+$},min,+,+$, 0)

Page 21: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity

11

Page 22: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

11

Page 23: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

Page 24: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

• optimal substructure in dynamic programming

11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

Page 25: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

• optimal substructure in dynamic programming

11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

B: b C: c

A: b ⊗ c

Page 26: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

• optimal substructure in dynamic programming

11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

B: b C: c

A: b ⊗ c

B: b’ ≤ b C: c

A: b’ ⊗ c≤ b ⊗ c

Page 27: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

• optimal substructure in dynamic programming

• idempotent => monotone (from distributivity)

11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

B: b C: c

A: b ⊗ c

B: b’ ≤ b C: c

A: b’ ⊗ c≤ b ⊗ c

Page 28: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

• optimal substructure in dynamic programming

• idempotent => monotone (from distributivity)

• (a+b)⊗c = (a⊗c)+(b⊗c); if a≤b, (a⊗c)=(a⊗c)+(b⊗c)

11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

B: b C: c

A: b ⊗ c

B: b’ ≤ b C: c

A: b’ ⊗ c≤ b ⊗ c

Page 29: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Monotonicity• monotonicity

• optimal substructure in dynamic programming

• idempotent => monotone (from distributivity)

• (a+b)⊗c = (a⊗c)+(b⊗c); if a≤b, (a⊗c)=(a⊗c)+(b⊗c)

• by def. of comparison, a⊗c ≤ b⊗c11

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is monotonic if for all a, b, c $ A

(a # b) % (a " c # b " c) (a # b) % (c " a # c " b)

B: b C: c

A: b ⊗ c

B: b’ ≤ b C: c

A: b’ ⊗ c≤ b ⊗ c

Page 30: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

DP on Graphs

• optimization problems on graphs=> generic shortest-path problem

• weighted directed graph G=(V, E) with a function w that assigns each edge a weight from a semiring

• compute the best weight of the target vertex t

• generic update along edge (u, v)

• how to avoid cyclic updates?

• only update when d(u) is fixed

12

vu w(u, v) d(v) ! = d(u) " w(u, v)

d(v) ! d(v) " (d(u) # w(u, v))

Page 31: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

13

topological(acyclic)

best-first(superior)

graphs with semirings

(e.g., FSMs)

hypergraphs with weight functions

(e.g., CFGs)

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 32: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi Algorithm for DAGs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each incoming edge (u, v) in E

• use d(u) to update d(v):

• key observation: d(u) is fixed to optimal at this time

• time complexity: O( V + E )14

v

u w(u, v)

d(v) ! = d(u) " w(u, v)

Page 33: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Variant 1: forward-update1. topological sort

2. visit each vertex v in sorted order and do updates

• for each outgoing edge (v, u) in E

• use d(v) to update d(u):

• key observation: d(v) is fixed to optimal at this time

• time complexity: O( V + E )15

d(u) ! = d(v) " w(v, u)

v

uw(v, u)

Page 34: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples

16

Page 35: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples• [Number of Paths in a DAG]

16

Page 36: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples• [Number of Paths in a DAG]

• just use the counting semiring (N, +, ×, 0, 1)

• note: this is not an optimization problem!

16

Page 37: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples• [Number of Paths in a DAG]

• just use the counting semiring (N, +, ×, 0, 1)

• note: this is not an optimization problem!

• [Longest Path in a DAG]

16

Page 38: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples• [Number of Paths in a DAG]

• just use the counting semiring (N, +, ×, 0, 1)

• note: this is not an optimization problem!

• [Longest Path in a DAG]

• just use the semiring

16

(R ! {"#},max,+,"#, 0)

Page 39: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples• [Number of Paths in a DAG]

• just use the counting semiring (N, +, ×, 0, 1)

• note: this is not an optimization problem!

• [Longest Path in a DAG]

• just use the semiring

• [Part-of-Speech Tagging with a Hidden Markov Model]

16

(R ! {"#},max,+,"#, 0)

Page 40: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Examples• [Number of Paths in a DAG]

• just use the counting semiring (N, +, ×, 0, 1)

• note: this is not an optimization problem!

• [Longest Path in a DAG]

• just use the semiring

• [Part-of-Speech Tagging with a Hidden Markov Model]

16

(R ! {"#},max,+,"#, 0)

Page 41: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: Speech Alignment

17

time complexity: O(n2)

also used in:edit distance

biological sequence alignment

Page 42: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: Word Alignment

18

• key difference

• reorderings in translation!

• sequence/speech alignment is always monotonic

• complexity under HMM

• word alignment is O(n3)

• for every (i, j)

• enumerate all (i-1, k)

• sequence alignment O(n2)

I love you .

Je

t’

aime

.

ii-1

j

k

Page 43: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Chinese Word Segmentation

19

下 雨 天 地 面 积 水 xia yu tian di mian ji shui

Page 44: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Chinese Word Segmentation

19

民主min-zhu

people-dominate

“democracy”

下 雨 天 地 面 积 水 xia yu tian di mian ji shui

Page 45: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Chinese Word Segmentation

19

民主min-zhu

people-dominate

“democracy”

江泽民 主席jiang-ze-min zhu-xi

... - ... - people dominate-podium

“President Jiang Zemin”

下 雨 天 地 面 积 水 xia yu tian di mian ji shui

Page 46: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Chinese Word Segmentation

19

民主min-zhu

people-dominate

“democracy”

江泽民 主席jiang-ze-min zhu-xi

... - ... - people dominate-podium

“President Jiang Zemin”

this was 5 years ago.

now Google isgood at segmentation!

下 雨 天 地 面 积 水 xia yu tian di mian ji shui

Page 47: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Chinese Word Segmentation

19

民主min-zhu

people-dominate

“democracy”

江泽民 主席jiang-ze-min zhu-xi

... - ... - people dominate-podium

“President Jiang Zemin”

this was 5 years ago.

now Google isgood at segmentation!

下 雨 天 地 面 积 水 xia yu tian di mian ji shui

Page 48: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Chinese Word Segmentation

19

民主min-zhu

people-dominate

“democracy”

江泽民 主席jiang-ze-min zhu-xi

... - ... - people dominate-podium

“President Jiang Zemin”

this was 5 years ago.

now Google isgood at segmentation!

下 雨 天 地 面 积 水 xia yu tian di mian ji shui

graph search

Page 49: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

20

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

yu Shalong juxing le huitanwith Sharon held a talk

talksSharon heldwith

Page 50: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

20

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

yu Shalong juxing le huitanwith Sharon held a talk

talksSharon heldwith

_ _ _ _ _

Page 51: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

20

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

yu Shalong juxing le huitanwith Sharon held a talk

talksSharon heldwith

_ _ _ _ _ _ _●●●

Page 52: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

20

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

yu Shalong juxing le huitanwith Sharon held a talk

talksSharon heldwith

_ _ _ _ _ _ _●●● ●●●●●

Page 53: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

21

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

yu Shalong juxing le huitanwith Sharon held a talk

talksSharon heldwith

_ _ _ _ _ _ _●●● ●●●●●

Page 54: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

21

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

yu Shalong juxing le huitanwith Sharon held a talk

talksSharon heldwith

_ _ _ _ _ _ _●●● ●●●●●

●_●●●

Page 55: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Phrase-based Decoding

22

yu Shalong juxing le huitan

与 沙龙 举行 了 会谈

held a talk with Sharon

_ _●●●held a talk held a talk with Sharon

_ _ _ _ _

...

...

...

●●●●●

...

_ _●●●held a talk

source-side: coverage vector

target-side: grow hypotheses strictly left-to-right

space: O(2n), time: O(2n n2) -- cf. traveling salesman problem

Page 56: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Traveling Salesman Problem & MT

• a classical NP-hard problem

• goal: visit each city once and only once

• exponential-time dynamic programming

• state: cities visited so far (bit-vector)

• search in this O(2n) transformed graph

• MT: each city is a source-language word

• restrictions in reordering can reduce complexity => distortion limit

• => syntax-based MT

23(Held and Karp, 1962; Knight, 1999)

Page 57: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Traveling Salesman Problem & MT

• a classical NP-hard problem

• goal: visit each city once and only once

• exponential-time dynamic programming

• state: cities visited so far (bit-vector)

• search in this O(2n) transformed graph

• MT: each city is a source-language word

• restrictions in reordering can reduce complexity => distortion limit

• => syntax-based MT

23(Held and Karp, 1962; Knight, 1999)

Page 58: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Adding a Bigram Model• “refined” graph: annotated with language model words

• still dynamic programming, just larger search space

24

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

Page 59: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Adding a Bigram Model• “refined” graph: annotated with language model words

• still dynamic programming, just larger search space

24

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

with Sharon

Page 60: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Adding a Bigram Model• “refined” graph: annotated with language model words

• still dynamic programming, just larger search space

24

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

with Sharon

bigram

Page 61: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Huang and Chiang Forest Rescoring

Adding a Bigram Model• “refined” graph: annotated with language model words

• still dynamic programming, just larger search space

24

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

with Sharon

bigram

space: O(2n), time: O(2n n2) => space: O(2n Vm-1), time: O(2n Vm-1 n2)

for m-gram language models

Page 62: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

25

topological(acyclic)

best-first(superior)

graphs with semirings

(e.g., FSMs)

hypergraphs with weight functions

(e.g., CFGs)

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 63: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

26

d(u) d(u) ⊗ w(e)w(e)

Page 64: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm• Dijkstra does not require acyclicity

• instead of topological order, we use best-first order

• but this requires superiority of the semiring

• intuition: combination always gets worse

26

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is superior if for all a, b $ A

a # a " b, b # a " b.

d(u) d(u) ⊗ w(e)w(e)

Page 65: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm• Dijkstra does not require acyclicity

• instead of topological order, we use best-first order

• but this requires superiority of the semiring

• intuition: combination always gets worse

• contrast: monotonicity: combination preserves order

26

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is superior if for all a, b $ A

a # a " b, b # a " b.

d(u) d(u) ⊗ w(e)w(e)

Page 66: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm• Dijkstra does not require acyclicity

• instead of topological order, we use best-first order

• but this requires superiority of the semiring

• intuition: combination always gets worse

• contrast: monotonicity: combination preserves order

26

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is superior if for all a, b $ A

a # a " b, b # a " b.

d(u) d(u) ⊗ w(e)w(e)

({0, 1},!,", 0, 1)([0, 1],max,#, 0, 1)

(R+ $ {+%},min,+,+%, 0)(R $ {+%},min,+,+%, 0)

Page 67: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm• Dijkstra does not require acyclicity

• instead of topological order, we use best-first order

• but this requires superiority of the semiring

• intuition: combination always gets worse

• contrast: monotonicity: combination preserves order

26

Let K = (A,!,", 0, 1) be a semiring, and # a partial ordering over A.We say K is superior if for all a, b $ A

a # a " b, b # a " b.

d(u) d(u) ⊗ w(e)w(e)

({0, 1},!,", 0, 1)([0, 1],max,#, 0, 1)

(R+ $ {+%},min,+,+%, 0)(R $ {+%},min,+,+%, 0)

Page 68: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

27

S V - S

s ...

d(u) ! = d(v) " w(v, u)

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

v

Page 69: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

27

S V - S

vs ...

d(u) ! = d(v) " w(v, u)

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

Page 70: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

27

uw(v, u)

S V - S

vs ...

d(u) ! = d(v) " w(v, u)

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

Page 71: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

Page 72: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

acyclic: Viterbi

Page 73: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

Page 74: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

many NLP

problems

Page 75: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

many NLP

problems

forward-backward(Inside semiring)

Page 76: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

many NLP

problems

forward-backward(Inside semiring) non-probabilistic

models

Page 77: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi vs. Dijkstra

• structural vs. algebraic constraints

• Dijkstra only applicable to optimization problems

28

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

many NLP

problems

forward-backward(Inside semiring) non-probabilistic

models

cyclic FSMs/grammars

Page 78: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

What if both fail?

29

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

many NLP

problems

generalized Bellman-Ford(CLR, 1990; Mohri, 2002)

or, first do strongly-connected components (SCC)which gives a DAG; use Viterbi globally on this SCC-DAG;

use Bellman-Ford locally within each SCC

Page 79: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

What if both work?

30

monotonic optimization problems

acyclic: Viterbi

superior: Dijkstra

many NLP

problems

full Dijkstra is slower than ViterbiO((V + E) lgV) vs. O(V + E)

but it can finish as early as the target vertex is poppeda (V + E) lgV vs. V + E

Q: how to (magically) reduce a?

Page 80: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A* Search: Intuition

• Dijkstra is “blind” about how far the target is

• may get “trapped” by obstacles

• can we be more intelligent about the future?

• idea: prioritize by s-v distance + v-t estimate

31

s

v

ut

Page 81: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A* Search: Intuition

• Dijkstra is “blind” about how far the target is

• may get “trapped” by obstacles

• can we be more intelligent about the future?

• idea: prioritize by s-v distance + v-t estimate

31

s

v

ut

Page 82: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A* Search: Intuition

• Dijkstra is “blind” about how far the target is

• may get “trapped” by obstacles

• can we be more intelligent about the future?

• idea: prioritize by s-v distance + v-t estimate

31

s

v

ut

Page 83: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

A* Heuristic

• h(v): the distance from v to target t

• ĥ(v) must be an optimistic estimate of h(v): ĥ(v)≤ h(v)

• Dijkstra is a special case where ĥ(v) = ī (0 for dist.)

• now, prioritize the queue by d(v) ⊗ ĥ(v)

• can stop when target gets popped -- why?

• optimal subpaths should pop earlier than non-optimal

• d(v) ⊗ ĥ(v) ≤ d(v) ⊗ h(v) ≤ d(t) ≤ non-optimal paths of t32

s v t

d(v) h(v)

ĥ(v)

Page 84: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

How to design a heuristic?• more of an art than science

• basic idea: projection into coarser space

• cluster: w’(U, V) = min { w(u, v) | u ∈ U, v ∈ V }

• exact cost in coarser graph is estimate of finer graph

33 (Raphael, 2001)

Page 85: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

How to design a heuristic?• more of an art than science

• basic idea: projection into coarser space

• cluster: w’(U, V) = min { w(u, v) | u ∈ U, v ∈ V }

• exact cost in coarser graph is estimate of finer graph

33

U V U V

(Raphael, 2001)

Page 86: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi or A*?• A* intuition: d(t) ⊗ ĥ(t) ranks higher among d(v) ⊗ ĥ(v)

• can finish early if lucky

• actually, d(t) ⊗ ĥ(t) = d(t) ⊗ h(t) = d(t) ⊗ ī = d(t)

• with the price of maintaining priority queue - O(log V)

• Q: how early? worth the price?

• if the rank is r, then A* is better when r/V log V < 1

34Dijkstra

d(v) pool

d(t)A*

d(v) ⊗ ĥ(v) pool

d(t) r

1

V

Page 87: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi or A*?• A* intuition: d(t) ⊗ ĥ(t) ranks higher among d(v) ⊗ ĥ(v)

• can finish early if lucky

• actually, d(t) ⊗ ĥ(t) = d(t) ⊗ h(t) = d(t) ⊗ ī = d(t)

• with the price of maintaining priority queue - O(log V)

• Q: how early? worth the price?

• if the rank is r, then A* is better when r/V log V < 1

34

r < V / log V

Dijkstra

d(v) pool

d(t)A*

d(v) ⊗ ĥ(v) pool

d(t) r

1

V

Page 88: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

35

topological(acyclic)

best-first(superior)

graphs with semirings

(e.g., FSMs)

hypergraphs with weight functions

(e.g., CFGs)

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 89: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

35

topological(acyclic)

best-first(superior)

graphs with semirings

(e.g., FSMs)

hypergraphs with weight functions

(e.g., CFGs)

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 90: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Background: CFG and Parsing

36

(S, 0, n)

w0 w1 ... wn-1

Page 91: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Background: CFG and Parsing

36

(S, 0, n)

w0 w1 ... wn-1

Page 92: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Background: CFG and Parsing

37

(S, 0, n)

w0 w1 ... wn-1

Page 93: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Background: CFG and Parsing

37

(S, 0, n)

w0 w1 ... wn-1

Page 94: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

(Directed) Hypergraphs• a generalization of graphs

• edge => hyperedge: several vertices to one vertex

• e = (T(e), h(e), fe). arity |e| = |T(e)|

• a totally-ordered weight set R

• we borrow the ⊕ operator to be the comparison

• weight function fe : R|e| to R

• generalizes the ⊗ operator in semirings

38

vu1

u2

fe

tails

headd(v) ! = fe(d(u1), d(u2))

simple case: fe(a, b) = a ⊗ b ⊗ w(e)

Yi,j e

Zj,kXi,k

Page 95: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Hypergraphs and Deduction

39

(Nederhof, 2003)

: b

v

u1 u2

fe

: a

: a × b × Pr(A → B C)

(A, i, j)

(C, k, j)(B, i, k) (B, i, k) (C, k, j)

(A, i, j)A→B C

Page 96: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Hypergraphs and Deduction

39

(Nederhof, 2003)

: b

v

u1 u2

fe

: a

: a × b × Pr(A → B C)

(A, i, j)

(C, k, j)(B, i, k) (B, i, k) (C, k, j)

(A, i, j)A→B C

v

u1 u2tails

head

fe

: a

: fe (a,b) v

u1 u2

fe

: a : b

: fe (a,b)

antecedents

consequent

: b

Page 97: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Related Formalisms

40

v

u1 u2

e

v

u1 u2

e AND-node

OR-node

OR-nodes

Page 98: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Packed Forests• a compact representation of many parses

• by sharing common sub-derivations

• polynomial-space encoding of exponentially large set

41

(Klein and Manning, 2001; Huang and Chiang, 2005)

0 I 1 saw 2 him 3 with 4 a 5 mirror 6

Page 99: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Packed Forests• a compact representation of many parses

• by sharing common sub-derivations

• polynomial-space encoding of exponentially large set

41

(Klein and Manning, 2001; Huang and Chiang, 2005)

0 I 1 saw 2 him 3 with 4 a 5 mirror 6

nodes hyperedges

a hypergraph

Page 100: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Weight Functions and Semirings

42

v

u1

u2

tails head

uk

fe

...

fe(a1, ..., ak)

Page 101: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Weight Functions and Semirings

42

v

u1

u2

tails head

uk

fe

...

fe(a1, ..., ak) = a1 ⊗ ... ⊗ ak ⊗ w(e)special case

Page 102: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Weight Functions and Semirings

42

d(u) d(u) ⊗ w(e)w(e)

d(u) fe(d(u))fe

fe(a) = a ⊗ w(e)

v

u1

u2

tails head

uk

fe

...

fe(a1, ..., ak) = a1 ⊗ ... ⊗ ak ⊗ w(e)special case

Page 103: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Weight Functions and Semirings

42

d(u) d(u) ⊗ w(e)w(e)

d(u) fe(d(u))fe

fe(a) = a ⊗ w(e)

v

u1

u2

tails head

uk

fe

...

fe(a1, ..., ak)

semiring-composed

= a1 ⊗ ... ⊗ ak ⊗ w(e)special case

Page 104: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Weight Functions and Semirings

42

d(u) d(u) ⊗ w(e)w(e)

d(u) fe(d(u))fe

fe(a) = a ⊗ w(e)

v

u1

u2

tails head

uk

fe

...

fe(a1, ..., ak)

semiring-composed

can also extend monotonicity and superiority to general weight functions

= a1 ⊗ ... ⊗ ak ⊗ w(e)special case

Page 105: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Generalizing Semiring Properties• monotonicity

• semiring: a ≤ b => a x c ≤ b x c

• for all weight function f, for all a1... ak, for all i, if a’i ≤ ai then f(a1... a’i ... ak) ≤ f(a1... ai ... ak)

• superiority

• semiring: a ≤ a x b, b ≤ a x b

• for all f, for all a1... ak, for all i, ai ≤ f(a1, ..., ak)

• acyclicity

• degenerate a hypergraph back into a graph43

Page 106: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

44

topological(acyclic)

best-first(superior)

graphs with semirings

(e.g., FSMs)

hypergraphs with weight functions

(e.g., CFGs)

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 107: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi Algorithm for DAGs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each incoming edge (u, v) in E

• use d(u) to update d(v):

• key observation: d(u) is fixed to optimal at this time

• time complexity: O( V + E )45

v

u w(u, v)

d(v) ! = d(u) " w(u, v)

Page 108: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi Algorithm for DAHs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each incoming hyperedge e = ((u1, .., u|e|), v, fe)

• use d(ui)’s to update d(v)

• key observation: d(ui)’s are fixed to optimal at this time

• time complexity: O( V + E ) (assuming constant arity)46

vu1

u2

fed(v) ! = fe(d(u1), · · · , d(u|e|))

Page 109: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: CKY Parsing• parsing with CFGs in Chomsky Normal Form (CNF)

• typical instance of the generalized Viterbi for DAHs

• many variants of CKY ~ various topological ordering

47

O(n3|P|)

(S, 0, n) (S, 0, n)

Page 110: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: CKY Parsing• parsing with CFGs in Chomsky Normal Form (CNF)

• typical instance of the generalized Viterbi for DAHs

• many variants of CKY ~ various topological ordering

47

O(n3|P|)

bottom-up

(S, 0, n) (S, 0, n)

Page 111: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: CKY Parsing• parsing with CFGs in Chomsky Normal Form (CNF)

• typical instance of the generalized Viterbi for DAHs

• many variants of CKY ~ various topological ordering

47

O(n3|P|)

bottom-up left-to-right

(S, 0, n) (S, 0, n)

Page 112: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: CKY Parsing• parsing with CFGs in Chomsky Normal Form (CNF)

• typical instance of the generalized Viterbi for DAHs

• many variants of CKY ~ various topological ordering

48

O(n3|P|)

bottom-up left-to-right

(S, 0, n) (S, 0, n) (S, 0, n)

Page 113: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: CKY Parsing• parsing with CFGs in Chomsky Normal Form (CNF)

• typical instance of the generalized Viterbi for DAHs

• many variants of CKY ~ various topological ordering

48

O(n3|P|)

bottom-up left-to-right right-to-left

(S, 0, n) (S, 0, n) (S, 0, n)

Page 114: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: Syntax-based MT

49

• synchronous context-free grammars (SCFGs)

• context-free grammar in two dimensions

• generating pairs of strings/trees simultaneously

• co-indexed nonterminal further rewritten as a unit

VP

PP

yu Shalong

VP

juxing le huitan

VP

VP

held a meeting

PP

with Sharon

VP ! PP(1) VP(2), VP(2) PP(1)

VP ! juxing le huitan, held a meetingPP ! yu Shalong, with Sharon

Page 115: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Translation as Parsing

50

• translation with SCFGs => monolingual parsing

• parse the source input with the source projection

• build the corresponding target sub-strings in parallel

PP1, 3 VP3, 6

VP1, 6

yu Shalong juxing le huitan

VP ! PP(1) VP(2), VP(2) PP(1)

VP ! juxing le huitan, held a meetingPP ! yu Shalong, with Sharon

Page 116: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Translation as Parsing

50

• translation with SCFGs => monolingual parsing

• parse the source input with the source projection

• build the corresponding target sub-strings in parallel

PP1, 3 VP3, 6

VP1, 6

yu Shalong juxing le huitan

VP ! PP(1) VP(2), VP(2) PP(1)

VP ! juxing le huitan, held a meetingPP ! yu Shalong, with Sharon

Page 117: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Translation as Parsing

50

• translation with SCFGs => monolingual parsing

• parse the source input with the source projection

• build the corresponding target sub-strings in parallel

PP1, 3 VP3, 6

VP1, 6

yu Shalong juxing le huitan

with Sharon held a talk

held a talk with Sharon

VP ! PP(1) VP(2), VP(2) PP(1)

VP ! juxing le huitan, held a meetingPP ! yu Shalong, with Sharon

Page 118: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Translation as Parsing

50

• translation with SCFGs => monolingual parsing

• parse the source input with the source projection

• build the corresponding target sub-strings in parallel

PP1, 3 VP3, 6

VP1, 6

yu Shalong juxing le huitan

with Sharon held a talk

held a talk with Sharon

VP ! PP(1) VP(2), VP(2) PP(1)

VP ! juxing le huitan, held a meetingPP ! yu Shalong, with Sharon

complexity: same as CKY parsing -- O(n3)

Page 119: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Adding a Bigram Model

51

PP1, 3 VP3, 6

VP1, 6

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

with ... Sharon

along ... Sharonwith ... Shalong

held ... talkheld ... meeting

hold ... talks

with Sharon

bigram

held ... talk

VP3, 6

with ... Sharon

PP1, 3

bigram

Page 120: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Adding a Bigram Model

51

PP1, 3 VP3, 6

VP1, 6

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

with ... Sharon

along ... Sharonwith ... Shalong

held ... talkheld ... meeting

hold ... talks

with Sharon

bigram

held ... talk

VP3, 6

with ... Sharon

PP1, 3

bigram

held ... Sharon

VP1, 6

Page 121: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Adding a Bigram Model

51

PP1, 3 VP3, 6

VP1, 6

_ _●●● ... talk_ _ _ _ _ ●●●●● ... Sharon

_ _●●● ... talks

_ _●●● ... meeting ●●●●● ... Shalong

with ... Sharon

along ... Sharonwith ... Shalong

held ... talkheld ... meeting

hold ... talks

with Sharon

bigram

complexity: O(n3 V4(m-1) )

held ... talk

VP3, 6

with ... Sharon

PP1, 3

bigram

held ... Sharon

VP1, 6

Page 122: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Two Dimensional Survey

52

topological(acyclic)

best-first(superior)

graphs with semirings

(e.g., FSMs)

hypergraphs with weight functions

(e.g., CFGs)

Viterbi Dijkstra

Generalized Viterbi Knuth

traversing order

sear

ch s

pace

Page 123: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Viterbi Algorithm for DAHs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each incoming hyperedge e = ((u1, .., u|e|), v, fe)

• use d(ui)’s to update d(v)

• key observation: d(ui)’s are fixed to optimal at this time

• time complexity: O( V + E ) (assuming constant arity)53

vu1

u2

fed(v) ! = fe(d(u1), · · · , d(u|e|))

Page 124: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Forward Variant for DAHs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each outgoing hyperedge e = ((u1, .., u|e|), h(e), fe)

• if d(ui)’s have all been fixed to optimal

• use d(ui)’s to update d(h(e))

• time complexity: O( V + E )54

v = ui

h(e)u1

v

fe

u2 =

h(e)fe

Page 125: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Forward Variant for DAHs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each outgoing hyperedge e = ((u1, .., u|e|), h(e), fe)

• if d(ui)’s have all been fixed to optimal

• use d(ui)’s to update d(h(e))

• time complexity: O( V + E )54

v = ui

h(e)u1

v

fe

u2 =

h(e)fe

Page 126: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Forward Variant for DAHs1. topological sort

2. visit each vertex v in sorted order and do updates

• for each outgoing hyperedge e = ((u1, .., u|e|), h(e), fe)

• if d(ui)’s have all been fixed to optimal

• use d(ui)’s to update d(h(e))

• time complexity: O( V + E )54

v = ui

h(e)u1

v

fe

u2 =

Q: how to avoid repeated checking?maintain a counter r[e] for each e: how many tails yet to be fixed?fire this hyperedge only if r[e]=0

h(e)fe

Page 127: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

55

S V - S

s ...

d(u) ! = d(v) " w(v, u)

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

v

Page 128: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

55

S V - S

vs ...

d(u) ! = d(v) " w(v, u)

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

Page 129: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Dijkstra Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

55

uw(v, u)

S V - S

vs ...

d(u) ! = d(v) " w(v, u)

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

Page 130: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Knuth (1977) Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

56

S V - S

vs ...

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

u1

v

Page 131: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Knuth (1977) Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

56

S V - S

vs ...

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

u1

v

Page 132: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Knuth (1977) Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

56

S V - S

vs ...

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

u1

vh(e)

fe

Page 133: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Knuth (1977) Algorithm

• keep a cut (S : V - S) where S vertices are fixed

• maintain a priority queue Q of V - S vertices

• each iteration choose the best vertex v from Q

• move v to S, and use d(v) to forward-update others

56

S V - S

vs ...

time complexity:O((V+E) lgV) (binary heap)

O(V lgV + E) (fib. heap)

u1

vh(e)

fe

Page 134: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: Best-First/A* Parsing

• Knuth for parsing: best-first (Caraballo & Charniak, 1998)

• further speed-up: use A* heuristics

• showed significant speed up with carefully designed heuristic functions (Klein and Manning, 2003)

• heuristic function: an estimate of outside cost

57

[open problem] can you still define heuristic function if weight functions are not semiring-composed?

(S, 0, n)

Page 135: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Example: Best-First/A* Parsing

• Knuth for parsing: best-first (Caraballo & Charniak, 1998)

• further speed-up: use A* heuristics

• showed significant speed up with carefully designed heuristic functions (Klein and Manning, 2003)

• heuristic function: an estimate of outside cost

57

[open problem] can you still define heuristic function if weight functions are not semiring-composed?

(S, 0, n)

Page 136: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Outside Cost in Hypergraph• outside cost: yet to pay to reach goal

• let’s only consider semiring-composed case

• and only acyclic hypergraphs

• after computing d(v) for all v from bottom-up

• backwards Viterbi from top-down (outside-in)

58

e

...

...h(S0,n) = īh(v) ⊕= h(u)⊗w(e)⊗d(v’)

v v’

u

d(v)

s

v

t

d(v)

h(v)

S0,n

h(v)

d(v)

Page 137: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Outside Cost in Hypergraph• outside cost: yet to pay to reach goal

• let’s only consider semiring-composed case

• and only acyclic hypergraphs

• after computing d(v) for all v from bottom-up

• backwards Viterbi from top-down (outside-in)

58

e

...

...h(S0,n) = īh(v) ⊕= h(u)⊗w(e)⊗d(v’)

v v’

u

d(v)

Q: d(v)⊗h(v) = ?

s

v

t

d(v)

h(v)

S0,n

h(v)

d(v)

Page 138: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Projection-based Heuristics

• how to guess? project onto a coarser-grained space

• and parse with the coarser grammar

• outside cost of of the coarser item as heuristics

59

(Klein and Manning, 2003)

Page 139: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Projection-based Heuristics

• how to guess? project onto a coarser-grained space

• and parse with the coarser grammar

• outside cost of of the coarser item as heuristics

59

(Klein and Manning, 2003)

Page 140: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Projection-based Heuristics

• how to guess? project onto a coarser-grained space

• and parse with the coarser grammar

• outside cost of of the coarser item as heuristics

60

(Klein and Manning, 2003)

Page 141: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Projection-based Heuristics

• how to guess? project onto a coarser-grained space

• and parse with the coarser grammar

• outside cost of of the coarser item as heuristics

61

(Klein and Manning, 2003)

Page 142: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Projection-based Heuristics

• how to guess? project onto a coarser-grained space

• and parse with the coarser grammar

• outside cost of of the coarser item as heuristics

61

(Klein and Manning, 2003)

Page 143: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Projection-based Heuristics

• how to guess? project onto a coarser-grained space

• and parse with the coarser grammar

• outside cost of of the coarser item as heuristics

61

(Klein and Manning, 2003)ĥ (VBD2,3) = h’ (V2,3)

Page 144: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Analogy with Graphs

62

Page 145: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Analogy with Graphs

62

Page 146: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

More on Coarse-to-Fine• multilevel coarse-to-fine A*

• heuristic = exact outside cost in previous stage

• ĥi (v) = hi-1 (proj i-1(v))

• VBD>V>X. ĥi (VBD1,5) = hi-1 (V1,5); ĥi-1 (V1,5) = hi-2 (X1,5)

• multilevel coarse-to-fine Viterbi w/ beam-search

• Viterbi + beam pruning in each stage

• prune according to merit: d(v)⊗h(v) ⊘ d(TOP)

• hard to derive a provably correct threshold

• in practice: use a preset threshold (but works well!)63

Page 147: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

More on Coarse-to-Fine• multilevel coarse-to-fine A*

• heuristic = exact outside cost in previous stage

• ĥi (v) = hi-1 (proj i-1(v))

• VBD>V>X. ĥi (VBD1,5) = hi-1 (V1,5); ĥi-1 (V1,5) = hi-2 (X1,5)

• multilevel coarse-to-fine Viterbi w/ beam-search

• Viterbi + beam pruning in each stage

• prune according to merit: d(v)⊗h(v) ⊘ d(TOP)

• hard to derive a provably correct threshold

• in practice: use a preset threshold (but works well!)63

Page 148: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

More on Coarse-to-Fine• multilevel coarse-to-fine A*

• heuristic = exact outside cost in previous stage

• ĥi (v) = hi-1 (proj i-1(v))

• VBD>V>X. ĥi (VBD1,5) = hi-1 (V1,5); ĥi-1 (V1,5) = hi-2 (X1,5)

• multilevel coarse-to-fine Viterbi w/ beam-search

• Viterbi + beam pruning in each stage

• prune according to merit: d(v)⊗h(v) ⊘ d(TOP)

• hard to derive a provably correct threshold

• in practice: use a preset threshold (but works well!)63

Page 149: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Same Picture Again

64

monotonic optimization problems

acyclic: Viterbi

superior: Knuth

many NLP

problems

Page 150: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Same Picture Again

64

monotonic optimization problems

acyclic: Viterbi

superior: Knuth

many NLP

problems

PCFG parsing with CNF

Page 151: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Same Picture Again

64

monotonic optimization problems

acyclic: Viterbi

superior: Knuth

many NLP

problems

Inside-Outside Alg.(Inside semiring)

PCFG parsing with CNF

Page 152: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Same Picture Again

64

monotonic optimization problems

acyclic: Viterbi

superior: Knuth

many NLP

problems

Inside-Outside Alg.(Inside semiring) non-prob.

(discriminative) parsing

PCFG parsing with CNF

Page 153: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Same Picture Again

64

monotonic optimization problems

acyclic: Viterbi

superior: Knuth

many NLP

problems

Inside-Outside Alg.(Inside semiring) non-prob.

(discriminative) parsing

cyclic grammars

PCFG parsing with CNF

Page 154: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Same Picture Again

64

monotonic optimization problems

acyclic: Viterbi

superior: Knuth

many NLP

problems

Inside-Outside Alg.(Inside semiring) non-prob.

(discriminative) parsing

cyclic grammars

PCFG parsing with CNF

generalized generalized

Bellman-Ford (open)

Page 155: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

Liang Huang (Penn) Dynamic Programming

Take Home Message

• Dynamic Programming is cool, easy, and universal!

• two frameworks and two types of algorithms

• monotonicity; acyclicity and/or superiority

• topological (Viterbi) vs. best-first style (Dijkstra/Knuth/A*)

• when to choose which: A* can finish early if lucky

• graph (lattice) vs. hypergraph (forest)

• incremental, finite-state vs. branching, context-free

• covered many typical NLP applications

• a better understanding of theory helps in practice65

Page 156: Advanced Dynamic Programming in CLweb.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdfLiang Huang (Penn) Dynamic Programming Dynamic Programming • Dynamic Programming

THE END - Thanks!Thanks!

66final slides will be available on my website.

(S, 0, n)

w0 w1 ... wn-1

Questions?Comments?