Structured Prediction: A Large Margin Approach Ben Taskar University of Pennsylvania Joint work with: V. Chatalbashev, M. Collins, C. Guestrin, M. Jordan, D. Klein, D. Koller, S. Lacoste-Julien, C. Manning

Post on 26-Dec-2015


Page 1

Structured Prediction: A Large Margin Approach

Ben Taskar, University of Pennsylvania

Joint work with:

V. Chatalbashev, M. Collins, C. Guestrin, M. Jordan, D. Klein, D. Koller, S. Lacoste-Julien, C. Manning

Page 2

“Don’t worry, Howard. The big questions are multiple choice.”

Page 3

Handwriting Recognition

brace

Sequential structure

x y

Page 4

Object Segmentation

Spatial structure

x y

Page 5

Natural Language Parsing

The screen was a sea of red

Recursive structure

x y

Page 6

Bilingual Word Alignment

What is the anticipated cost of collecting fees under the new proposal?

En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?

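x: the sentence pair above. y: an alignment between the English words

What is the anticipated cost of collecting fees under the new proposal ?

and the tokenized French words

En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?

Combinatorial structure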

Page 7

Protein Structure and Disulfide Bridges

Protein: 1IMT

AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRMHHTCPCAPNLACVQTSPKKFKCLSK

Page 8

Local Prediction

Classify using local information
Ignores correlations & constraints!

br ea c

Page 9

Local Prediction

Labels: building, tree, shrub, ground

Page 10

Structured Prediction

Use local information
Exploit correlations

br ea c

Page 11

Structured Prediction

Labels: building, tree, shrub, ground

Page 12

Outline

Structured prediction models
  Sequences (CRFs)
  Trees (CFGs)
  Associative Markov networks (Special MRFs)
  Matchings

Structured large margin estimation
  Margins and structure
  Min-max formulation
  Linear programming inference
  Certificate formulation

Page 13

Structured Models

Mild assumption:

linear combination

space of feasible outputs

scoring function
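
The three labels on this slide annotate a formula that did not survive extraction. A hedged reconstruction in the usual notation for linear structured models follows; the symbols $h_w$, $\mathcal{Y}(x)$, and $f$ are assumptions, not recovered from the slide.

```latex
h_w(x) = \arg\max_{y \in \mathcal{Y}(x)} w^\top f(x, y)
```

Here $\mathcal{Y}(x)$ is the space of feasible outputs, $w^\top f(x, y)$ is the scoring function, and the score is a linear combination of the features $f(x, y)$.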

Page 14

Chain Markov Net (aka CRF*)

[Figure: chain of five label nodes y, each ranging over a-z, each attached to its input x]

*Lafferty et al. 01
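
Prediction in a chain model means finding the highest-scoring label sequence, which dynamic programming (Viterbi) does exactly. The sketch below is illustrative only: the table names `node_scores` and `edge_scores` are assumptions standing in for the local scores of the model, and labels are integer indices.

```python
def viterbi(node_scores, edge_scores):
    """Return (best_score, best_labels) for a chain model.

    node_scores[t][a] : score of label a at position t
    edge_scores[a][b] : score of label a followed by label b
    """
    n, k = len(node_scores), len(node_scores[0])
    best = list(node_scores[0])      # best[a]: best prefix score ending in label a
    back = []                        # back[t-1][b]: best predecessor of label b at t
    for t in range(1, n):
        new_best, ptr = [], []
        for b in range(k):
            a = max(range(k), key=lambda a: best[a] + edge_scores[a][b])
            ptr.append(a)
            new_best.append(best[a] + edge_scores[a][b] + node_scores[t][b])
        best = new_best
        back.append(ptr)
    last = max(range(k), key=lambda a: best[a])
    labels = [last]
    for ptr in reversed(back):       # follow back-pointers from the end
        labels.append(ptr[labels[-1]])
    return best[last], labels[::-1]
```

The running time is linear in the chain length and quadratic in the number of labels.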

Page 15: [same content as Page 14]

Page 16

Associative Markov Nets

Point features (node j): spin-images, point height

Edge features (edge jk): length of edge, edge orientation

[Figure: neighboring nodes y_j and y_k joined by edge jk]

“associative” restriction

Page 17

CFG Parsing

Features are rule counts:

#(NP → DT NN)

#(PP → IN NP)

#(NN → 'sea')

Page 18

Bilingual Word Alignment

Features: position, orthography, association

[Figure: alignment graph between the English words "What is the anticipated cost of collecting fees under the new proposal ?" and the French words "En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?", with an edge between English word j and French word k]

Page 19

Disulfide Bonds: Non-bipartite Matching

RSCCPCYWGGCPWGQNCYPEGCSGPKV (cysteines numbered 1 2 3 4 5 6)

[Figure: graph on cysteine nodes 1-6, with the bonded pairs forming a matching]

Fariselli & Casadio '01, Baldi et al. '04

Page 20

Scoring Function

RSCCPCYWGGCPWGQNCYPEGCSGPKV (cysteines numbered 1 2 3 4 5 6)

[Figure: candidate bonds between cysteine nodes 1-6]

Features: amino acid identities, phys/chem properties

Page 21

Structured Models

Mild assumptions:

linear combination

sum of part scores

space of feasible outputs

scoring function

Page 22

Supervised Structured Prediction

Learning: from the data, estimate w for the model
  Likelihood (intractable)
  Margin
  Local (ignores structure)

Prediction: generally, combinatorial optimization
  Example: weighted matching
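
To make the weighted matching example concrete, here is a minimal brute-force sketch of prediction as combinatorial optimization. The total score is a sum of per-edge scores. The `scores` table is a hypothetical stand-in for the learned edge scores, and enumeration is only viable for tiny inputs; a practical system would use a polynomial-time matching algorithm instead.

```python
from itertools import permutations

def best_matching(scores):
    """Brute-force maximum-weight perfect bipartite matching.

    scores[j][k] is the score of linking left item j to right item k.
    Returns (total_score, assignment) with assignment[j] = k.
    """
    n = len(scores)

    def total(perm):
        # Score of a matching = sum of the scores of its edges.
        return sum(scores[j][perm[j]] for j in range(n))

    best_perm = max(permutations(range(n)), key=total)
    return total(best_perm), list(best_perm)
```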

Page 23

Outline (as on Page 12)

Page 24

OCR Example

We want: $\arg\max_{y} w^\top f(x, y) = \text{"brace"}$, where x is the image of the handwritten word

Equivalently:
  $w^\top f(x, \text{"brace"}) > w^\top f(x, \text{"aaaaa"})$
  $w^\top f(x, \text{"brace"}) > w^\top f(x, \text{"aaaab"})$
  …
  $w^\top f(x, \text{"brace"}) > w^\top f(x, \text{"zzzzz"})$

a lot!

Page 25

Parsing Example

We want: $\arg\max_{y} w^\top f(\text{'It was red'}, y)$ = the correct tree

Equivalently: $w^\top f(\text{'It was red'}, \text{correct tree}) > w^\top f(\text{'It was red'}, \text{other tree})$ for every other tree

a lot!

[Figure: the correct tree over nodes S, A, B, C, D compared against alternative trees, such as one over S, A, B, D, F and one over S, E, F, G, H]

Page 26

Alignment Example

We want: $\arg\max_{y} w^\top f(\text{'What is the' / 'Quel est le'}, y)$ = the correct alignment

Equivalently: $w^\top f(\cdot, \text{correct alignment}) > w^\top f(\cdot, \text{other alignment})$ for every other alignment

a lot!…

[Figure: the correct alignment between word positions 1, 2, 3 of 'What is the' and 1, 2, 3 of 'Quel est le', compared against alternative alignments]

Page 27

Structured Loss

OCR, true output "brace":

  b c a r e   2
  b r o r e   2
  b r o c e   1
  b r a c e   0

Alignment of 'What is the' / 'Quel est le': four candidate alignments with losses 0, 1, 2, 2

Parsing of 'It was red': four candidate trees with losses 0, 1, 2, 3
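
The OCR losses on this slide are Hamming distances to the true word "brace". A minimal sketch follows; the function name is an illustrative choice.

```python
def hamming_loss(y_true, y_pred):
    """Number of positions where two equal-length sequences differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(y_true, y_pred))
```

For example, `hamming_loss("brace", "bcare")` counts the swapped letters at the second and fourth positions.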

Page 28

Large margin estimation

Given training examples $(x_i, y_i)$, we want: $\arg\max_{y} w^\top f(x_i, y) = y_i$ for all $i$

Maximize margin $\gamma$

Mistake weighted margin: $\gamma\,\ell(y_i, y)$, with $\ell(y_i, y)$ = # of mistakes in y

*Collins 02, Altun et al 03, Taskar 03
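
The formulas on this slide were lost in extraction. A hedged reconstruction of the mistake-weighted margin problem in the standard max-margin notation follows; the symbols $\gamma$, $\ell$, and $\mathcal{Y}(x_i)$ are assumptions chosen to match the labels that survive.

```latex
\max_{\|w\| \le 1} \ \gamma
\quad \text{s.t.} \quad
w^\top f(x_i, y_i) \ \ge\ w^\top f(x_i, y) + \gamma\, \ell(y_i, y)
\quad \forall i,\ \forall y \in \mathcal{Y}(x_i)
```

The true output must beat every alternative by a margin that grows with the number of mistakes in that alternative.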

Page 29

Large margin estimation

Eliminate $\gamma$

Add slacks $\xi_i$ for inseparable case
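
A sketch of the resulting quadratic program, assuming the usual SVM-style rescaling that fixes the margin and minimizes the norm of $w$ instead. The slack variables $\xi_i$ and the trade-off constant $C$ are notation assumed here, not recovered from the slide.

```latex
\min_{w,\, \xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad
w^\top f(x_i, y_i) + \xi_i \ \ge\ w^\top f(x_i, y) + \ell(y_i, y)
\quad \forall i,\ \forall y \in \mathcal{Y}(x_i)
```

Taking $y = y_i$ gives $\xi_i \ge 0$, so non-negativity of the slacks is implied.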

Page 30

Large margin estimation

Brute force enumeration

Min-max formulation
  'Plug-in' linear program for inference

Page 31

Min-max formulation

LP Inference

Structured loss (Hamming):

Inference

discrete optim.

Key step:

continuous optim.
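
In the min-max view, each example contributes the quantity max over y of [score(y) + loss(y_i, y)] minus score(y_i). The sketch below computes that inner maximum by brute-force enumeration on a tiny label space, purely to show what is being maximized. The key step on the slide replaces this discrete search with a continuous LP, which this sketch does not attempt. The callables `score` and `loss` are hypothetical stand-ins.

```python
from itertools import product

def structured_hinge(score, loss, y_true, alphabet):
    """Loss-augmented inference by enumeration.

    Returns (hinge_value, worst_y) where
    hinge_value = max_y [score(y) + loss(y_true, y)] - score(y_true).
    """
    candidates = product(alphabet, repeat=len(y_true))
    worst = max(candidates, key=lambda y: score(y) + loss(y_true, y))
    return score(worst) + loss(y_true, worst) - score(y_true), worst
```

The enumeration is exponential in the sequence length, which is exactly why a compact inference LP matters.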

Page 32

Outline (as on Page 12)

Page 33

y → z Map for Markov Nets

[Figure: each node label y_j is encoded as a 0/1 indicator vector z_j over the letters a, b, ..., z, and each adjacent pair (y_j, y_k) as a 0/1 indicator matrix z_jk with rows and columns a, b, ..., z; exactly one entry of each vector and each matrix is 1]

Page 34

Markov Net Inference LP

[Figure: indicator vectors and matrices z, with the two families of LP constraints labeled]

Constraints: normalization, agreement

Has integral solutions z for chains, trees
Can be fractional for untriangulated networks
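
The LP itself did not survive extraction. A hedged reconstruction in the standard pairwise form follows, where $\theta_j(a)$ and $\theta_{jk}(a,b)$ are assumed names for the local node and edge scores:

```latex
\max_{z \ge 0} \ \sum_{j} \sum_{a} z_j(a)\, \theta_j(a)
  + \sum_{(j,k)} \sum_{a,b} z_{jk}(a,b)\, \theta_{jk}(a,b)
\quad \text{s.t.} \quad
\underbrace{\sum_{a} z_j(a) = 1}_{\text{normalization}}, \qquad
\underbrace{\sum_{a} z_{jk}(a,b) = z_k(b), \quad \sum_{b} z_{jk}(a,b) = z_j(a)}_{\text{agreement}}
```

Normalization makes each node pick one label in total; agreement makes each edge matrix consistent with the node vectors at its two ends.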

Page 35

Associative MN Inference LP

"associative" restriction

For K=2, solutions are always integral (optimal)
For K>2, within factor of 2 of optimal

Page 36

CFG Chart

CNF tree = set of two types of parts:
  Constituents (A, s, e)
  CF-rules (A → B C, s, m, e)
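
The decomposition of a CNF tree into parts can be sketched as below. The nested-tuple tree encoding and the half-open span convention are assumptions made for this illustration.

```python
def tree_parts(tree, start=0):
    """Return (end, constituents, rules) for a CNF tree.

    A tree is (label, word) for a preterminal, or (label, left, right)
    for a binary rule. Spans are half-open word ranges [s, e).
    Constituents are (A, s, e); rules are (A, B, C, s, m, e).
    """
    if len(tree) == 2:                       # preterminal over one word
        label, _word = tree
        return start + 1, [(label, start, start + 1)], []
    label, left, right = tree
    mid, cons_l, rules_l = tree_parts(left, start)
    end, cons_r, rules_r = tree_parts(right, mid)
    constituents = [(label, start, end)] + cons_l + cons_r
    rules = [(label, left[0], right[0], start, mid, end)] + rules_l + rules_r
    return end, constituents, rules
```

A feature vector that sums over these parts is what lets chart-based dynamic programming find the best tree.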

Page 37

CFG Inference LP

inside

outside

Has integral solutions z

root

Page 38

Matching Inference LP

Constraint: degree

Has integral solutions z

[Figure: alignment graph between the English words "What is the anticipated cost of collecting fees under the new proposal ?" and the French words "En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?", with an edge between English word j and French word k]

Page 39

LP Duality

Linear programming duality
  Variables ↔ constraints
  Constraints ↔ variables

Optimal values are the same
  When both feasible regions are bounded
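
For reference, a standard primal/dual pair in symmetric form. The symbols $A$, $b$, $c$, $z$, and $\lambda$ are generic notation assumed here, not taken from the slide.

```latex
\max_{z} \ c^\top z \quad \text{s.t.} \quad A z \le b,\ z \ge 0
\qquad \Longleftrightarrow \qquad
\min_{\lambda} \ b^\top \lambda \quad \text{s.t.} \quad A^\top \lambda \ge c,\ \lambda \ge 0
```

Each primal constraint (a row of $A$) becomes a dual variable $\lambda$, and each primal variable $z$ becomes a dual constraint.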

Page 40

Min-max Formulation

LP duality

Page 41

Min-max formulation summary

Formulation produces concise QP for
  Low-treewidth Markov networks
  Associative MNs (K=2)
  Context free grammars
  Bipartite matchings

Approximate for untriangulated MNs, AMNs with K>2

*Taskar et al 04

Page 42

Unfactored Primal/Dual

QP duality

Exponentially many constraints/variables

Page 43

Factored Primal/Dual

By QP duality

Dual inherits structure from problem-specific inference LP

Variables correspond to a decomposition of variables of the flat case

Page 44

The Connection

Flat dual variable and Hamming loss for each full output:

  y           dual   loss
  b c a r e   .2     2
  b r o r e   .15    2
  b r o c e   .25    1
  b r a c e   .4     0

Factored dual variables (marginals) per position:

  position 1: b 1
  position 2: c .2, r .8
  position 3: a .6, o .4
  position 4: r .35, c .65
  position 5: e 1
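
The numbers on this slide fit together as follows: the per-letter values are marginals of the weights attached to the four full outputs (.2, .15, .25, .4). A minimal sketch of that decomposition, with the dictionary layout chosen here for illustration:

```python
from collections import defaultdict

def position_marginals(alpha):
    """Sum flat weights alpha[y] into per-position, per-letter marginals."""
    marginals = defaultdict(float)
    for y, weight in alpha.items():
        for position, letter in enumerate(y):
            marginals[(position, letter)] += weight
    return dict(marginals)

alpha = {"bcare": 0.2, "brore": 0.15, "broce": 0.25, "brace": 0.4}
marginals = position_marginals(alpha)
```

With positions counted from zero, `marginals[(1, "r")]` sums the weights of the three outputs that have "r" in second place.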

Page 45

Duals and Kernels

Kernel trick works:
  Factored dual
  Local functions (log-potentials) can use kernels

Page 46

Alternatives: Perceptron

Simple iterative method

Unstable for structured output: fewer instances, big updates
  May not converge if non-separable
  Noisy

Voted / averaged perceptron [Freund & Schapire 99, Collins 02]
  Regularize / reduce variance by aggregating over iterations
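
A minimal sketch of the averaged structured perceptron, assuming sparse feature dictionaries. The callables `features` and `predict`, and the toy task used to exercise them, are hypothetical names introduced for illustration; in a real structured model `predict` would run inference such as Viterbi or matching.

```python
def averaged_perceptron(examples, features, predict, epochs=5):
    """examples: list of (x, y); features(x, y) -> dict; predict(w, x) -> y."""
    w, total, steps = {}, {}, 0
    for _ in range(epochs):
        for x, y in examples:
            y_hat = predict(w, x)
            if y_hat != y:
                # Move toward the true output and away from the prediction.
                for name, value in features(x, y).items():
                    w[name] = w.get(name, 0.0) + value
                for name, value in features(x, y_hat).items():
                    w[name] = w.get(name, 0.0) - value
            steps += 1
            for name, value in w.items():    # accumulate for averaging
                total[name] = total.get(name, 0.0) + value
    return {name: value / steps for name, value in total.items()}

# Toy task (hypothetical): one indicator feature per (input, label) pair.
def toy_features(x, y):
    return {(x, y): 1.0}

def toy_predict(w, x):
    return max([0, 1], key=lambda y: w.get((x, y), 0.0))

w_avg = averaged_perceptron([("p", 1), ("q", 0)], toy_features, toy_predict)
```

Returning the average of the weight vectors over all steps, rather than the final one, is the variance reduction the slide refers to.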

Page 47

Alternatives: Constraint Generation

[Collins 02; Altun et al, 03; Tsochantaridis et al, 04]

Add most violated constraint
  Handles more general loss functions
  Only polynomial # of constraints needed
  Need to re-solve QP many times
  Worst case # of constraints larger than factored

Page 48

Handwriting Recognition

Length: ~8 chars
Letter: 16x8 pixels
10-fold Train/Test: 5000/50000 letters, 600/6000 words

Models: Multiclass-SVMs*, CRFs, M^3 nets

*Crammer & Singer 01

[Figure: bar chart of test error (average per-character), axis from 0 to 30, lower is better, for CRFs, MC-SVMs, and M^3 nets using raw pixels, quadratic kernel, and cubic kernel]

45% error reduction over linear CRFs
33% error reduction over multiclass SVMs

Page 49

Hypertext Classification

WebKB dataset
  Four CS department websites: 1300 pages/3500 links
  Classify each page: faculty, course, student, project, other
  Train on three universities/test on fourth

[Figure: bar chart of test error, axis from 0 to 20, lower is better, for SVMs, RMNs*, and M^3Ns; annotations: loopy belief propagation, relaxed dual]

53% error reduction over SVMs
38% error reduction over RMNs

*Taskar et al 02

Page 50

3D Mapping

Laser Range Finder

GPS

IMU

Data provided by: Michael Montemerlo & Sebastian Thrun

Label: ground, building, tree, shrub
Training: 30 thousand points
Testing: 3 million points

Pages 51-54: [images only]

Page 55

Segmentation results

Hand labeled 180K test points

Model   Accuracy
SVM     68%
V-SVM   73%
M3N     93%

Page 56

Fly-through

Page 57

Word Alignment Results

Data: [Hansards – Canadian Parliament]
  Features induced on 1 mil unsupervised sentences
  Trained on 100 sentences (10,000 edges)
  Tested on 350 sentences (35,000 edges)

Model                       *Error
Local learning+matching     10.0
Our approach                8.5
GIZA/IBM4 [Och & Ney 03]    6.5
+Local learning+matching    5.4
+Our approach               4.9
+Our approach+QAP           4.5

*Error: weighted combination of precision/recall

[Taskar+al 05] [Lacoste-Julien+Taskar+al 06]

Page 58

Outline (as on Page 12)

Page 59

Certificate formulation

Non-bipartite matchings:
  O(n^3) combinatorial algorithm
  No polynomial-size LP known

Spanning trees
  No polynomial-size LP known
  Simple certificate of optimality

Intuition: Verifying optimality easier than optimizing

Compact optimality condition of $y_i$ wrt. the scoring function

[Figure: graph on nodes 1-6, with edges labeled ij and kl]

Page 60

Certificate for non-bipartite matching

Alternating cycle: Every other edge is in matching

Augmenting alternating cycle: Score of edges not in matching greater than edges in matching

Negate score of edges not in matching
  Augmenting alternating cycle = negative length alternating cycle

Matching is optimal ⇔ no negative alternating cycles

[Figure: graph on nodes 1-6]

Edmonds '65

Page 61

Certificate for non-bipartite matching

Pick any node r as root

$d_j$ = length of shortest alternating path from r to j

Triangle inequality:

Theorem: No negative length cycle ⇔ distance function d exists

Can be expressed as linear constraints: O(n) distance variables, O(n^2) constraints

[Figure: graph on nodes 1-6]

Page 62

Certificate formulation

Formulation produces compact QP for
  Spanning trees
  Non-bipartite matchings
  Any problem with compact optimality condition

*Taskar et al. '05

Page 63

Disulfide Bonding Prediction

Data [Swiss Prot 39]
  450 sequences (4-10 cysteines)

Features:
  windows around C-C pair
  physical/chemical properties

Model                                 *Acc
Local learning+matching               41%
Recursive Neural Net [Baldi+al '04]   52%
Our approach (certificate)            55%

*Accuracy: % proteins with all correct bonds

[Taskar+al 05]

Page 64

Formulation summary

Brute force enumeration

Min-max formulation
  'Plug-in' convex program for inference

Certificate formulation
  Directly guarantee optimality of $y_i$

Page 65

Omissions

Kernels
  Non-parametric models

Structured generalization bounds
  Bounds on Hamming loss

Scalable algorithms (no QP solver needed)
  Structured SMO (works for chains, trees) [Taskar 04]
  Structured ExpGrad (works for chains, trees) [Bartlett+al 04]
  Structured ExtraGrad (works for matchings, AMNs) [Taskar+al 06]

Page 66

Open questions

Statistical consistency
  Hinge loss not consistent for non-binary output [See Tewari & Bartlett 05, McAllester 07]

Learning with approximate inference
  Does constant factor approximate inference guarantee anything about learning?
  No [See Kulesza & Pereira 07]
  Perhaps other assumptions needed

Discriminative structure learning
  Using sparsifying priors

Page 67

Conclusion

Two general techniques for structured large-margin estimation

Exact, compact, convex formulations

Allow efficient use of kernels

Tractable when other estimation methods are not

Efficient learning algorithms

Empirical success on many domains

Page 68

References

Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov support vector machines. ICML03.

M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. EMNLP02.

K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR01.

J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML01.

More papers at http://www.cis.upenn.edu/~taskar

Page 69: [image only]

Page 70

Modeling First Order Effects

Monotonicity
Local inversion
Local fertility

QAP NP-complete
Sentences (30 words, 1k vars): few seconds (Mosek)
Learning: use LP relaxation
Testing: using LP, 83.5% sentences, 99.85% edges integral

Page 71

Segmentation Model → Min-Cut

Binary labels: 0, 1

Local evidence

Spatial smoothness

Computing the best labeling is hard in general, but if edge potentials attractive → min-cut algorithm
Multiway-cut for multiclass case → use LP relaxation

[Greig+al 89, Boykov+al 99, Kolmogorov & Zabih 02, Taskar+al 04]

Page 72

Scalable Algorithms

Batch and online
Linear in the size of the data

Iterate until convergence
  For each example in the training sample
    Run inference using current parameters (varies by method)
    Online: Update parameters using computed example values
  Batch: Update parameters using computed sample values

Structured SMO (Taskar et al, 03; Taskar 04)
Structured Exponentiated Gradient (Bartlett et al, 04)
Structured Extragradient (Taskar et al, 05)

Page 73

Experimental Setup

Standard Penn treebank split (2-21/22/23)

Generative baselines
  Klein & Manning 03 and Collins 99

Discriminative
  Basic = max-margin version of K&M 03
  Lexical & Lexical + Aux

Lexical features (on constituent parts only)
  $t_{s-1}$ [$t_s$ … $t_e$] $t_{e+1}$ predicted tags
  $x_{s-1}$ [$x_s$ … $x_e$] $x_{e+1}$

Auxiliary features
  Flat classifier using same features
  Prediction of K&M 03 on each span

Page 74

Results for sentences ≤40 words

Model          LP      LR      F1
Generative     86.37   85.27   85.82
Lexical+Aux*   87.56   86.85   87.20
Collins 99*    85.33   85.94   85.73

*Trained only on sentences ≤20 words

*Taskar et al 04

Page 75

Example

The Egyptian president said he would visit Libya today to resume the talks.

Generative model: Libya today is base NP

Lexical model: today is a one word constituent