structured prediction cascades ben taskar david weiss, ben sapp, alex toshev university of...
TRANSCRIPT
![Page 1: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/1.jpg)
Structured Prediction CascadesBen Taskar
David Weiss, Ben Sapp, Alex Toshev
University of Pennsylvania
![Page 2: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/2.jpg)
Supervised Learning
Learn from
• Regression:
• Binary Classification:
• Multiclass Classification:
• Structured Prediction:
![Page 3: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/3.jpg)
Handwriting Recognition
`structured’
x y
![Page 4: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/4.jpg)
Machine Translation
x y
‘Ce n'est pas un autreproblème de classification.’
‘This is not another classification problem.’
![Page 5: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/5.jpg)
Pose Estimation
x y
© Arthur Gretton
![Page 6: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/6.jpg)
Structured Models
space of feasible outputs
scoring function
parts = cliques, productions
Complexity of inference depends on “part” structure
![Page 7: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/7.jpg)
Supervised Structured Prediction
Learning Prediction
Discriminative estimation of θ
Data
Model:
Intractable/impracticalfor complex models
Intractable/impractical for complex models
3rd order OCR model = 1 mil states * lengthBerkeley English grammar = 4 mil productions * length^3Tree-based pose model = 100 mil states * # joints
![Page 8: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/8.jpg)
Approximation vs. Computation
• Usual trade-off: approximation vs. estimation – Complex models need more data– Error from over-fitting
• New trade-off: approximation vs. computation– Complex models need more time/memory – Error from inexact inference
• This talk: enable complex models via cascades
![Page 9: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/9.jpg)
An Inspiration: Viola Jones Face Detector
Scanning window at every location, scale and orientation
![Page 10: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/10.jpg)
Classifier Cascade
• Most patches are non-face• Filter out easy cases fast!!• Simple features first• Low precision, high recall• Learned layer-by-layer • Next layer more complex
C1
C2
C3
Cn
Non-face
Non-face
Non-face
Non-face Face
![Page 11: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/11.jpg)
Related Work
• Global Thresholding and Multiple-Pass Parsing. J Goodman, 97• A maximum-entropy-inspired parser. E. Charniak, 00.• Coarse-to-fine n-best parsing and MaxEnt discriminative
reranking, E Charniak & M Johnson, 05• TAG, dynamic pro- gramming, and the perceptron for efficient,
feature-rich parsing, X. Carreras, M. Collins, and T. Koo, 08• Coarse-to-Fine Natural Language Processing, S. Petrov, 09• Coarse-to-fine face detection. Fleuret, F., Geman, D, 01• Robust real-time object detection. Viola, P., Jones, M, 02• Progressive search space reduction for human pose estimation,
Ferrari, V., Marin-Jimenez, M., Zisserman, A, 08
![Page 12: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/12.jpg)
What’s a Structured Cascade?
• What to filter?– Clique assignments
• What are the layers?– Higher order models– Higher resol. models
• How to learn filters?– Novel convex loss– Simple online algorithm– Generalization bounds
F1
F2
F3
Fn
??????
??????
??????
?????? ‘structured’
![Page 13: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/13.jpg)
Trade-off in Learning Cascades
• Accuracy: Minimize the number of errors incurred by each level
• Efficiency: Maximize the number of filtered assignments at each level
Filter 1
Filter 2
Filter D
Predict
![Page 14: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/14.jpg)
Max-marginals (Sequences)
a
b
c
d
a
b
c
d
a
b
c
d
a
b
c
d
Score of an output:
Computemax (*bc*)
Max marginal:
![Page 15: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/15.jpg)
Filtering with Max-marginals
• Set threshold• Filter clique assignment if
a
b
c
d
a
b
c
d
a
b
c
d
a
b
c
d
![Page 16: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/16.jpg)
Filtering with Max-marginals
• Set threshold• Filter clique assignment if
a
b
c
d
a
b
c
d
a
b
c
d
a
b
c
dRemove edge
bc
![Page 17: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/17.jpg)
Why Max-marginals?
• Valid path guarantee– Filtering leaves at least one valid global assignment
• Faster inference – No exponentiation, O(k) vs. O(k log k) in some cases
• Convex estimation– Simple stochastic subgradient algorithm
• Generalization bounds for error/efficiency– Guarantees on expected trade-off
![Page 18: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/18.jpg)
Choosing a Threshold
• Threshold must be specific to input x– Max-marginal scores are not normalized
• Keep top-K max-marginal assignments?– Hard(er) to handle and analyze
• Convex alternative: max-mean-max function:
max score mean max marginalm = #edges
![Page 19: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/19.jpg)
Choosing a Threshold (cont)
Max marginal scores
Mean max marginalMax score
Score of truth
Range of possible thresholdsα = efficiency level
![Page 20: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/20.jpg)
Example OCR Cascade
a b c d e f g h i j k l …144 269 -137 80 52 -42 49 360 -27 -774 368 -24
Mean ≈ 0 Max = 368 α = 0.55 Threshold ≈ 204
a c d e f g i j l144 -137 80 52 -42 49 -27 -774 -24
b h k …b h k
![Page 21: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/21.jpg)
Example OCR Cascade
b h k
a e n u r
a b d g
a c e g n o u
a e m n u w
b h k
![Page 22: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/22.jpg)
b h k
Example OCR Cascade
a e n u r
a b d g
a c e g n o u
▪b ▪h ▪k
ba ha ka be he …
aa ab ad ag ea …
aa ac ae ag an …
a e m n u waa ae am an au …
![Page 23: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/23.jpg)
Example OCR Cascade
▪b ▪h ▪k
ba ha ka be he …
aa ab ad ag ea …
aa ac ae ag an …
aa ae am an au …
1440 1558 1575
1440 1558 1575 1413 1553
1400 1480 1575 1397 1393
1257 1285 1302 1294 1356
1336 1306 1390 1346 1306
Mean ≈ 1412
Max = 1575
α = 0.44
Threshold ≈ 1502
![Page 24: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/24.jpg)
▪b
ba be
aa ab ag ea
aa ac ae ag an
aa ae am an au
▪b
ba be
aa ab ag ea
aa ac ae ag an
aa ae am an au
Example OCR Cascade
▪h ▪k
ha ka he …
ad …
…
…
1440 1558 1575
1440 1558 1575 1413 1553
1400 1480 1575 1397 1393
1257 1285 1302 1294 1356
1336 1306 1390 1346 1306
Mean ≈ 1412
Max = 15751440
1440 1413
1400 1480 1397 1393
1257 1285 1302 1294 1356
1336 1306 1390 1346 1306
α = 0.44
Threshold ≈ 1502
![Page 25: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/25.jpg)
Example OCR Cascade
ka
▪h ▪k
ha he …
ad …
…
…
Mean ≈ 1412
Max = 1575▪h ▪k
ha ka he ke
ad ed
kn
do
ow om
nd
α = 0.44
Threshold ≈ 1502
![Page 26: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/26.jpg)
Example OCR Cascade
▪h ▪k
ha ka he ke
ad ed
kn
do
ow om
nd
…
…
▪▪h ▪▪k
▪ha ▪ka ▪he ▪ke
had kad hed ked
ado edo ndo
dow dom
![Page 27: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/27.jpg)
10x10x12 10x10x24 40x40x24 80x80x24 110×122×24
Example: Pictorial Structure Cascade
Upperarm
Lowerarm
H
T ULA
LLA
URA
LRA
k ≈ 320,000k2 ≈ 100mil
![Page 28: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/28.jpg)
• Filter loss
If score(truth) > threshold, all true states are safe• Efficiency loss
Proportion of unfiltered clique assignments
Quantifying Loss
![Page 29: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/29.jpg)
Learning One Cascade Level
• Fix α, solve convex problem for θ
Minimize filter mistakes at efficiency level α
No filter mistakes Margin w/ slack
![Page 30: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/30.jpg)
An Online Algorithm
• Stochastic sub-gradient update:
Features of truth
Convex combination: Features of best guess+ Average features of max marginal “witnesses”
![Page 31: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/31.jpg)
Generalization Bounds
• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data
Expected loss on true distribution
![Page 32: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/32.jpg)
)-ramp(
Generalization Bounds
• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data
-Empirical -ramp upper bound
0
1
![Page 33: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/33.jpg)
Generalization Bounds
• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data
n number of examples m number of clique assignments number of cliquesB
![Page 34: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/34.jpg)
Generalization Bounds
• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data
Similar bound holds for Le and all ® 2 [0,1]
![Page 35: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/35.jpg)
OCR Experiments
Dataset: http://www.cis.upenn.edu/~taskar/ocr
Character Error Word Error0
10
20
30
40
50
60
70
80
22.5
73.35
14.31
50.56
12.05
26.17
7.7515.54
1st order2nd order3rd order4rd order
% E
rror
![Page 36: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/36.jpg)
Efficiency Experiments
• POS Tagging (WSJ + CONLL datasets)– English, Bulgarian, Portuguese
• Compare 2nd order model filters– Structured perceptron (max-sum)– SCP w/ ® 2 [0, 0.2,0.4,0.6,0.8] (max-sum)– CRF log-likelihood (sum-product marginals)
• Tightly controlled– Use random 40% of training set– Use development set to fit regularization parameter ¸
and α for all methods
![Page 37: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/37.jpg)
POS Tagging Results
Error Cap (%) on Development Set
Max-sum (SCP)Max-sum (0-1)Sum-product (CRF)
English
![Page 38: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/38.jpg)
POS Tagging Results
Error Cap (%) on Development Set
Max-sum (SCP)Max-sum (0-1)Sum-product (CRF)
Portuguese
![Page 39: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/39.jpg)
POS Tagging Results
Error Cap (%) on Development Set
Max-sum (SCP)Max-sum (0-1)Sum-product (CRF)
Bulgarian
![Page 40: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/40.jpg)
English POS Cascade (2nd order)
Full SCP CRF Taglist
Accuracy (%) 96.83 96.82 96.84 ---
Filter Loss (%) 0 0.12 0.024 0.118
Test Time (ms) 173.28 1.56 4.16 10.6
Avg. Num States 1935.7 3.93 11.845 95.39
The cascade is efficient.
DT NN VRB ADJ
![Page 41: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/41.jpg)
method torso head upper arms
lower arms total
Ferrari et al 08 -- -- -- -- 61.4%
Ferrari et al. 09 -- -- -- -- 74.5%
Andriluka et al. 09 98.3% 95.7% 86.8% 51.7% 78.8%
Eichner et al. 09 98.72% 97.87% 92.8% 59.79% 80.1%
CPS (ours) 100.00% 99.57% 96.81% 62.34% 86.31%
Pose Estimation (Buffy Dataset)
“Ferrari score:” percent of parts correct within radius
![Page 42: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/42.jpg)
Pose Estimation
![Page 43: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/43.jpg)
Cascade Efficiency vs. Accuracy
![Page 44: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/44.jpg)
Features: Shape/Segmentation Match
![Page 45: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/45.jpg)
Features: Contour Continuation
![Page 46: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/46.jpg)
Conclusions
• Novel loss significantly improves efficiency
• Principled learning of accurate cascades
• Deep cascades focus structured inference and allow rich models
• Open questions: – How to learn cascade structure– Dealing with intractable models– Other applications: factored dynamic models, grammars
![Page 47: Structured Prediction Cascades Ben Taskar David Weiss, Ben Sapp, Alex Toshev University of Pennsylvania](https://reader033.vdocuments.net/reader033/viewer/2022051621/56649cc95503460f94991cc4/html5/thumbnails/47.jpg)
Thanks!
Structured Prediction Cascades, Weiss & Taskar, AISTATS10
Cascaded Models for Articulated Pose Estimation, Sapp, Toshev & Taskar, ECCV10