Efficient Large-Scale Structured Learning
Steve Branson, Oscar Beijbom, Serge Belongie
CVPR 2013, Portland, Oregon
UC San Diego, UC San Diego, Caltech

Page 2: Overview

• Structured prediction
• Learning from larger datasets

[Figure panels: Large datasets (Tiny Images); Deformable part models; Object detection; Cost-sensitive learning (class taxonomy: Mammal → Primate → Gorilla, Orangutan; Mammal → Hoofed Mammal → Odd-toed, Even-toed)]

Page 3: Overview

• Available tools for structured learning are not as refined as tools for binary classification
• Two sources of speed improvement:
  – Faster stochastic dual optimization algorithms
  – Application-specific importance sampling routine

Page 4: Summary

• Train time is usually 1-10 times test time
• Publicly available software package
  – Fast algorithms for multiclass SVMs, DPMs
  – API to adapt to new applications
  – Support datasets too large to fit in memory
  – Network interface for online & active learning

Page 5: Summary

• Cost-sensitive multiclass SVM
  – 10-50 times faster than SVMstruct
  – As fast as 1-vs-all binary SVM
• Deformable part models
  – 50-1000 times faster than:
    – SVMstruct
    – Mining hard negatives
    – SGD-PEGASOS

Page 6: Binary vs. Structured

[Figure: Structured Dataset → Binary Dataset → Binary Learner (SVM, Boosting, Logistic Regression, etc.) → Binary Output → Structured Output, for structured tasks such as object detection, pose registration and attribute prediction; example labels: structured $Y = (x, y, w, h)$ (a bounding box), binary $Y = -1$ / $Y = +1$]

Page 7: Binary vs. Structured

• Pros: the binary classifier is application independent
• Cons: what is lost in terms of:
  – Accuracy at convergence?
  – Computational efficiency?

Page 8: Binary vs. Structured

Structured prediction loss $\Delta(g(X), Y_{gt})$ ≈ binary loss $\Delta_{01}$ ≈ convex upper bound

Source of computational speed

Page 9: Binary vs. Structured

Binary reduction: structured prediction loss $\Delta(g(X), Y_{gt})$ ≈ binary loss $\Delta_{01}$ ≈ convex upper bound

Structured approach: $\ell(X; w) \ge \Delta(g(X), Y)$, a convex upper bound on the structured prediction loss itself
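The slide states that $\ell(X; w)$ upper-bounds $\Delta(g(X), Y)$ but does not give its form. A common choice, assumed here rather than taken from the slides, is the margin-rescaling structured hinge of Tsochantaridis et al.: $\ell(X; w) = \max_Y [\Delta(Y, Y_{gt}) + w^\top\psi(X, Y)] - w^\top\psi(X, Y_{gt})$ with $g(X) = \arg\max_Y w^\top\psi(X, Y)$. The sketch below checks the bound numerically for a multiclass problem, where `scores[y]` stands in for $w^\top\psi(X, y)$ and the cost matrix is synthetic.

```python
import numpy as np

def predict(scores):
    """g(X): the highest-scoring label; scores[y] stands in for w . psi(X, y)."""
    return int(np.argmax(scores))

def structured_hinge(scores, loss_row, y_gt):
    """Margin-rescaling hinge: max_Y [Delta(Y, Y_gt) + w . psi(X, Y)] - w . psi(X, Y_gt).
    loss_row[y] holds Delta(y, y_gt) and must be 0 at y_gt."""
    return float(np.max(loss_row + scores) - scores[y_gt])

def min_bound_gap(n_trials=1000, n_classes=6, seed=0):
    """Smallest observed value of hinge - Delta(g(X), Y_gt) over random trials."""
    rng = np.random.default_rng(seed)
    cost = rng.uniform(0.5, 4.0, size=(n_classes, n_classes))  # synthetic Delta matrix
    np.fill_diagonal(cost, 0.0)
    gaps = []
    for _ in range(n_trials):
        scores = rng.normal(size=n_classes)
        y_gt = int(rng.integers(n_classes))
        hinge = structured_hinge(scores, cost[y_gt], y_gt)
        gaps.append(hinge - cost[y_gt, predict(scores)])
    return min(gaps)

print(structured_hinge(np.array([2.0, 1.0, 0.0]), np.array([1.0, 0.0, 3.0]), 1))  # 2.0
print(min_bound_gap() >= -1e-12)  # True
```

The bound follows from plugging $Y = g(X)$ into the max: the hinge is at least $\Delta(g(X), Y_{gt})$ plus a score difference that is non-negative because $g(X)$ is the top-scoring label.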

Page 10: Binary vs. Structured

Application-specific optimization algorithms that:
  – Converge to lower test error than binary solutions
  – Achieve lower test error for all amounts of train time

Page 12: Structured SVM

• SVMs w/ structured output [Tsochantaridis et al. ICML’04]
• Max-margin MRF [Taskar et al. NIPS’03]
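For reference, one standard way to write the structured SVM as a regularized risk, together with the dual that stochastic dual methods operate on, is given below. The joint feature map $\psi(X, Y)$ and this particular scaling of the dual variables (following Lacoste-Julien et al.) are assumptions; the slides do not state the exact parameterization used in this work. Each example $i$ owns its own block of dual variables $\alpha_{i,\cdot}$, which is what allows updating one example at a time.

```latex
\min_{w}\; \frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_{i=1}^{n}\ell(X_i; w),
\qquad
\ell(X_i; w) = \max_{Y}\big[\Delta(Y, Y_i) + w^\top\psi(X_i, Y)\big] - w^\top\psi(X_i, Y_i),
\qquad
g(X) = \arg\max_{Y} w^\top\psi(X, Y)

\max_{\alpha}\; \frac{1}{n}\sum_{i=1}^{n}\sum_{Y}\alpha_{i,Y}\,\Delta(Y, Y_i) - \frac{\lambda}{2}\lVert w(\alpha)\rVert^2,
\qquad
w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n}\sum_{Y}\alpha_{i,Y}\big[\psi(X_i, Y_i) - \psi(X_i, Y)\big],
\qquad
\alpha_{i,Y} \ge 0,\ \sum_{Y}\alpha_{i,Y} = 1
```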

Page 13: Binary SVM Solvers

Faster linear SVM solvers (training time): SVMstruct ≫ SVMperf (cutting plane) > PEGASOS (SGD) ≥ LIBLINEAR

SVMstruct: $O\left(\frac{Tn}{\lambda\epsilon}\right)$

• Quadratic to linear in trainset size

Page 14: Binary SVM Solvers

Faster linear SVM solvers (training time): SVMstruct ≫ SVMperf (cutting plane) > PEGASOS (SGD) ≥ LIBLINEAR

SVMstruct: $O\left(\frac{Tn}{\lambda\epsilon}\right)$

• Quadratic to linear in trainset size
• Linear to independent in trainset size

Page 15: Binary SVM Solvers

Faster linear SVM solvers (training time): SVMstruct ≫ SVMperf (cutting plane) > PEGASOS (SGD) ≥ LIBLINEAR

SVMstruct: $O\left(\frac{Tn}{\lambda\epsilon}\right)$

• Quadratic to linear in trainset size
• Linear to independent in trainset size

• Faster on multiple passes
• Detect convergence
• Less sensitive to regularization/learning rate
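The last three bullets read as advantages of LIBLINEAR's dual coordinate descent over SGD; that attribution is our reading of the slide. A minimal sketch of such a binary solver (L1-loss linear SVM, no bias term, after Hsieh et al., ICML 2008) shows two of them: each coordinate update is an exact one-dimensional solve with no learning rate, and the duality gap gives a stopping test. `dual_cd_svm` and `make_blobs` are illustrative names, not LIBLINEAR's API.

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, max_epochs=100, tol=1e-3, seed=0):
    """Dual coordinate descent for an L1-loss linear SVM (no bias term),
    in the style of LIBLINEAR's solver; labels y must be +1/-1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                      # dual variables, 0 <= alpha_i <= C
    w = np.zeros(d)                          # maintained as sum_i alpha_i * y_i * x_i
    q_diag = np.einsum("ij,ij->i", X, X)     # Q_ii = ||x_i||^2
    for epoch in range(1, max_epochs + 1):
        for i in rng.permutation(n):         # one random pass over the data
            if q_diag[i] == 0.0:
                continue
            grad = y[i] * (w @ X[i]) - 1.0   # gradient of the (minimization-form) dual w.r.t. alpha_i
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)  # exact 1-D solve, clipped to the box
            w += (new_alpha - alpha[i]) * y[i] * X[i]
            alpha[i] = new_alpha
        # Duality gap: a convergence test that needs no learning rate or step schedule
        primal = 0.5 * w @ w + C * np.maximum(0.0, 1.0 - y * (X @ w)).sum()
        dual = alpha.sum() - 0.5 * w @ w
        gap = primal - dual
        if gap < tol:
            break
    return w, gap, epoch

def make_blobs(n=40, seed=1):
    """Two Gaussian blobs centered at (2, 2) and (-2, -2)."""
    rng = np.random.default_rng(seed)
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    X = y[:, None] * 2.0 + rng.normal(scale=0.5, size=(n, 2))
    return X, y

X, y = make_blobs()
w, gap, epochs = dual_cd_svm(X, y)
print("accuracy:", np.mean(np.sign(X @ w) == y), "gap:", gap, "epochs:", epochs)
```

Reusing `alpha` and `w` across further passes is what makes later epochs cheap, in contrast to an SGD step size that must be re-tuned when the regularization changes.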

Page 16: Structured SVM Solvers

Faster linear SVM solvers: SVMperf (cutting plane) > PEGASOS (SGD) ≥ LIBLINEAR

Applied to SSVMs: SVMstruct (cutting plane) > SGD

[Shalev-Shwartz et al. JMLR’13]

[Ratliff et al. AIStats’07]

Page 17: Our Approach

• Use faster stochastic dual algorithms
• Incorporate an application-specific importance sampling routine
  – Reduce train times when prediction time T is large
  – Incorporate tricks people use for binary methods

[Diagram: Random Example → Importance Sample → Maximize Dual SSVM objective w.r.t. samples]

Page 18: Our Approach

For t = 1, … do
  1. Choose a random training example $(X_i, Y_i)$
  2. Draw a set of sampled labels with ImportanceSample()
  3. Approximately maximize the dual SSVM objective w.r.t. example $i$
end

[Diagram: Random Example → Importance Sample → Maximize Dual SSVM objective w.r.t. samples]

(Provably fast convergence for simple approx. solver)
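The loop above can be sketched for a multiclass SSVM as follows. The slides do not spell out the inner solver, so this is an assumption: step 3 takes a few block-coordinate Frank-Wolfe steps with exact line search (the method of Lacoste-Julien et al. cited on Page 19), restricted to the sampled labels, and the hypothetical `importance_sample` simply returns the k labels with the highest loss-augmented score. It illustrates the structure of the method, not the authors' solver.

```python
import numpy as np

def psi(x, y_gt, y, n_classes):
    """psi_i(y) = Phi(x, y_gt) - Phi(x, y) for a one-block-per-class feature map."""
    out = np.zeros((n_classes, x.shape[0]))
    out[y_gt] += x
    out[y] -= x
    return out

def importance_sample(W, x, y_gt, cost, k):
    """Hypothetical sampler: the k labels with the highest loss-augmented score."""
    return np.argsort(-(cost[y_gt] + W @ x))[:k]

def train_ssvm(X, Y, cost, lam=0.01, n_iters=3000, k=2, inner_steps=3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = cost.shape[0]
    W = np.zeros((K, d))          # w = sum of the per-example blocks W_i
    W_i = np.zeros((n, K, d))     # per-example dual blocks (alpha_i starts at the vertex Y_i)
    l_i = np.zeros(n)             # per-example loss terms of the dual objective
    for _ in range(n_iters):
        i = int(rng.integers(n))                                # 1. random example
        samples = importance_sample(W, X[i], Y[i], cost, k)     # 2. importance sample
        for _ in range(inner_steps):                            # 3. approx. dual maximization
            aug = cost[Y[i], samples] + W[samples] @ X[i]
            y_hat = int(samples[np.argmax(aug)])
            w_s = psi(X[i], Y[i], y_hat, K) / (lam * n)
            l_s = cost[Y[i], y_hat] / n
            diff = W_i[i] - w_s
            denom = lam * np.sum(diff * diff)
            if denom == 0.0:
                break
            # Exact line search on the dual along the Frank-Wolfe direction, clipped to [0, 1]
            gamma = float(np.clip((lam * np.sum(diff * W) - l_i[i] + l_s) / denom, 0.0, 1.0))
            if gamma == 0.0:
                break
            new_block = (1.0 - gamma) * W_i[i] + gamma * w_s
            W += new_block - W_i[i]
            W_i[i] = new_block
            l_i[i] = (1.0 - gamma) * l_i[i] + gamma * l_s
    dual = l_i.sum() - 0.5 * lam * np.sum(W * W)
    return W, dual

def primal(W, X, Y, cost, lam):
    """Regularized structured hinge objective over all labels."""
    scores = X @ W.T
    hinge = (cost[Y] + scores).max(axis=1) - scores[np.arange(len(Y)), Y]
    return 0.5 * lam * np.sum(W * W) + hinge.mean()

rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-1.5, 2.6], [-1.5, -2.6]])
Y = np.repeat(np.arange(3), 20)
X = np.hstack([centers[Y] + rng.normal(scale=0.3, size=(60, 2)), np.ones((60, 1))])
cost = 1.0 - np.eye(3)
W, dual = train_ssvm(X, Y, cost)
print("train accuracy:", np.mean((X @ W.T).argmax(axis=1) == Y))
print("duality gap:", primal(W, X, Y, cost, 0.01) - dual)
```

Every step moves along a feasible direction with a step size in [0, 1], so the dual value never decreases, and the primal-minus-dual gap bounds how far `W` is from optimal.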

Page 19: Recent Papers w/ Similar Ideas

• Augmenting cutting plane SSVM w/ m-best solutions
  A. Guzman-Rivera, P. Kohli, D. Batra. “DivMCuts…” AISTATS’13.
• Applying stochastic dual methods to SSVMs
  S. Lacoste-Julien, et al. “Block-Coordinate Frank-Wolfe…” JMLR’13.

Page 20: Applying to New Problems

1. Define loss function
2. Implement feature extraction routine
3. Implement importance sampling routine
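These three steps amount to a small plug-in interface. The sketch below uses hypothetical names (`StructuredProblem`, `sampled_hinge`, `ToyMulticlass`); it is not the API of the authors' software package, only an illustration of what an application has to supply.

```python
from abc import ABC, abstractmethod
from typing import Hashable, List

import numpy as np

class StructuredProblem(ABC):
    """Hypothetical interface: the three routines a new application supplies."""

    @abstractmethod
    def loss(self, y_gt: Hashable, y: Hashable) -> float:
        """1. Task loss Delta(y, y_gt): non-negative, zero when y == y_gt."""

    @abstractmethod
    def features(self, x: np.ndarray, y: Hashable) -> np.ndarray:
        """2. Joint feature vector psi(x, y)."""

    @abstractmethod
    def importance_sample(self, w: np.ndarray, x: np.ndarray, y_gt: Hashable) -> List[Hashable]:
        """3. A small set of candidate labels to add to the dual problem."""

def sampled_hinge(problem: StructuredProblem, w: np.ndarray, x: np.ndarray, y_gt: Hashable) -> float:
    """Structured hinge evaluated only over the sampled labels plus y_gt,
    so it is never larger than the hinge over all labels."""
    labels = list(problem.importance_sample(w, x, y_gt)) + [y_gt]
    gt_score = float(w @ problem.features(x, y_gt))
    return max(problem.loss(y_gt, y) + float(w @ problem.features(x, y)) for y in labels) - gt_score

class ToyMulticlass(StructuredProblem):
    """Toy instance: 0/1 loss, one feature block per class, sampler returns all classes."""

    def __init__(self, n_classes: int, dim: int):
        self.n_classes, self.dim = n_classes, dim

    def loss(self, y_gt, y):
        return 0.0 if y == y_gt else 1.0

    def features(self, x, y):
        out = np.zeros(self.n_classes * self.dim)
        out[y * self.dim:(y + 1) * self.dim] = x
        return out

    def importance_sample(self, w, x, y_gt):
        return list(range(self.n_classes))

problem = ToyMulticlass(n_classes=3, dim=2)
x = np.array([1.0, 0.0])
print(sampled_hinge(problem, np.zeros(6), x, 0))                               # 1.0
print(sampled_hinge(problem, np.array([2.0, 0.0, 0.0, 0.0, 0.0, 0.0]), x, 0))  # 0.0
```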

Page 21: Applying to New Problems

3. Implement importance sampling routine that:
  a) Is fast
  b) Favors samples w/
    • High loss +
    • Uncorrelated features: small
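One way to act on these criteria, assumed here because the slide's formulas did not survive extraction: rank candidate labels by loss-augmented score, and greedily penalize candidates whose feature vectors are highly correlated with those already chosen. `diverse_importance_sample` and its `beta` trade-off are hypothetical.

```python
import numpy as np

def diverse_importance_sample(aug_scores, feats, k, beta=1.0):
    """Greedy sketch: pick labels with high loss-augmented score while penalizing
    feature correlation with labels already picked.
    aug_scores[j]: assumed Delta(Y_j, Y_gt) + w . psi(X, Y_j) for candidate j.
    feats[j]: feature vector psi(X, Y_j) of candidate j."""
    aug_scores = np.asarray(aug_scores, dtype=float)
    norms = np.linalg.norm(feats, axis=1)
    unit = feats / np.where(norms > 0, norms, 1.0)[:, None]   # unit-length features
    chosen = [int(np.argmax(aug_scores))]                     # most violated label first
    while len(chosen) < min(k, len(aug_scores)):
        corr = np.abs(unit @ unit[chosen].T).max(axis=1)      # max |cosine| to the chosen set
        penalized = aug_scores - beta * corr
        penalized[chosen] = -np.inf                           # never pick a label twice
        chosen.append(int(np.argmax(penalized)))
    return chosen

aug = np.array([5.0, 4.9, 3.0])
feats = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
print(diverse_importance_sample(aug, feats, k=2, beta=0.0))  # [0, 1]
print(diverse_importance_sample(aug, feats, k=2, beta=2.0))  # [0, 2]
```

With `beta=0` the sampler returns the two highest scores even though their features are nearly identical; raising `beta` swaps in the uncorrelated candidate, which adds more new information to the dual per sample.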

Page 22: Example: Object Detection

1. Loss function
2. Features
3. Importance sampling routine
  • Add sliding window & loss into dense score map
  • Greedy NMS
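A simplified sketch of this sampler for boxes $Y = (x, y, w, h)$: instead of a dense score map over all window positions, it takes a list of candidate windows, adds a loss term to each window score, and runs greedy NMS on the loss-augmented scores. The loss $1 - \text{IoU}$ with the ground-truth box, the NMS threshold, and the function names are assumptions; the slide does not give them.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one (x, y, w, h) box with each row of `boxes`."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[0] + box[2], boxes[:, 0] + boxes[:, 2])
    y2 = np.minimum(box[1] + box[3], boxes[:, 1] + boxes[:, 3])
    inter = np.clip(x2 - x1, 0.0, None) * np.clip(y2 - y1, 0.0, None)
    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / union

def detection_importance_sample(boxes, window_scores, gt_box, k=5, nms_thresh=0.5):
    """Return indices of up to k windows with high score + loss, after greedy NMS."""
    aug = window_scores + (1.0 - iou(gt_box, boxes))  # assumed loss: 1 - IoU with ground truth
    keep = []
    for j in np.argsort(-aug):                          # highest loss-augmented score first
        if len(keep) == k:
            break
        if keep and iou(boxes[j], boxes[keep]).max() > nms_thresh:
            continue                                    # suppress near-duplicate windows
        keep.append(int(j))
    return keep

boxes = np.array([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 10.0, 10.0], [50.0, 50.0, 10.0, 10.0]])
scores = np.array([1.0, 0.9, 0.5])
print(detection_importance_sample(boxes, scores, gt_box=np.array([0.0, 0.0, 10.0, 10.0])))  # [2, 1]
```

The far-away window ranks first despite its low score because its loss is high, which is exactly the kind of margin violation the dual update needs to see.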

Page 23: Example: Deformable Part Models

1. Loss function: sum of part losses
2. Features
3. Importance sampling routine
  • Dynamic programming
  • Modified NMS to return a diverse set of poses
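Because the loss is a sum of part losses, it can be folded into each part's unary score, so loss-augmented inference is ordinary dynamic programming. The sketch below assumes a star-shaped model (a depth-one tree) on a 1-D grid of locations with a quadratic deformation cost, and uses brute-force max-messages in place of distance transforms; all of these are simplifying assumptions. The per-root scores `total` are what a modified NMS would then pick a diverse set of poses from.

```python
import numpy as np

def loss_augmented_star_dp(root_unary, part_unary, anchors, def_weight=1.0):
    """root_unary[r]: root score + root loss at location r.
    part_unary[p][l]: part p score + part p loss at location l.
    anchors[p]: ideal offset of part p from the root.
    Returns (best root, best part locations, best total score, per-root scores)."""
    n_locs = len(root_unary)
    locs = np.arange(n_locs)
    total = np.asarray(root_unary, dtype=float).copy()
    best_part_locs = np.zeros((len(part_unary), n_locs), dtype=int)
    for p, unary in enumerate(part_unary):
        # disp[r, l]: displacement of part location l from its anchor, given root r
        disp = locs[None, :] - locs[:, None] - anchors[p]
        msg = np.asarray(unary, dtype=float)[None, :] - def_weight * disp ** 2
        best_part_locs[p] = msg.argmax(axis=1)   # best placement of part p for each root
        total += msg.max(axis=1)                 # max-message passed from part p to the root
    root = int(total.argmax())
    parts = [int(best_part_locs[p, root]) for p in range(len(part_unary))]
    return root, parts, float(total[root]), total

root, parts, score, _ = loss_augmented_star_dp(
    [0, 0, 3, 0, 0], [[0, 0, 0, 5, 0], [4, 0, 0, 0, 0]], anchors=[1, -2])
print(root, parts, score)  # 2 [3, 0] 12.0
```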

Page 24: Cost-Sensitive Multiclass SVM

1. Loss function: class confusion cost
2. Features: e.g., bag-of-words
3. Importance sampling routine
  • Return all classes
  • Exact solution using 1 dot product per class

[Figure: class confusion cost matrix over cat, dog, ant, fly, car, bus]
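A sketch of this sampler, assuming a class confusion cost equal to the number of edges between two classes in the Mammal / Primate / Hoofed Mammal taxonomy pictured on the overview slides (the slide's actual cost values are not recoverable). With one weight row per class, scoring all classes costs one dot product each, so returning every class ranked by loss-augmented score also yields the exact most-violated label. All names here are illustrative.

```python
import numpy as np

CLASSES = ["gorilla", "orangutan", "odd-toed", "even-toed"]
# Hypothetical taxonomy read off the overview figure: child -> parent
PARENT = {"gorilla": "primate", "orangutan": "primate",
          "odd-toed": "hoofed mammal", "even-toed": "hoofed mammal",
          "primate": "mammal", "hoofed mammal": "mammal"}

def ancestors(node):
    """Path from a node up to the root of the taxonomy."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def tree_distance(a, b):
    """Number of edges between two classes via their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    lca = next(node for node in pa if node in pb)
    return pa.index(lca) + pb.index(lca)

# Assumed class confusion cost: COST[y_gt, y] = tree distance between the two classes
COST = np.array([[tree_distance(a, b) for b in CLASSES] for a in CLASSES], dtype=float)

def importance_sample_all(W, x, y_gt):
    """Return every class ranked by loss-augmented score (one dot product per class);
    the first entry is the exact most-violated label."""
    aug = COST[y_gt] + W @ x
    return np.argsort(-aug, kind="stable"), aug

order, aug = importance_sample_all(np.zeros((4, 3)), np.array([1.0, 2.0, 3.0]), 0)
print([CLASSES[j] for j in order])  # ['odd-toed', 'even-toed', 'orangutan', 'gorilla']
```

With zero weights the ranking is driven by the cost alone: confusing a gorilla with a hoofed mammal (cost 4) is penalized more than confusing it with an orangutan (cost 2).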

Page 25: Results: CUB-200-2011

• Pose mixture model, 312 part/pose detectors
• Occlusion/visibility model
• Tree-structured DPM w/ exact inference

Page 26: Results: CUB-200-2011

[Figure panels: 5794 training examples; 400 training examples]

• ~100X faster than mining hard negatives and SVMstruct
• 10-50X faster than stochastic sub-gradient methods
• Close to convergence after 1 pass through the training set

Page 27: Results: ImageNet

[Figure panels: Comparison to other fast linear SVM solvers; Comparison to other methods for cost-sensitive SVMs]

• Faster than LIBLINEAR, PEGASOS
• 50X faster than SVMstruct

Page 28: Conclusion

• Orders of magnitude faster than SVMstruct
• Publicly available software package
  – Fast algorithms for multiclass SVMs, DPMs
  – API to adapt to new applications
  – Support datasets too large to fit in memory
  – Network interface for online & active learning

Page 29: Thanks!