Efficient Large-Scale Structured Learning
Steve Branson (Caltech), Oscar Beijbom (UC San Diego), Serge Belongie (UC San Diego)
CVPR 2013, Portland, Oregon
Overview
• Structured prediction: object detection, deformable part models, cost-sensitive learning
• Learning from larger datasets (e.g., Tiny Images)
[Figure: class taxonomy: Mammal → Primate (Gorilla, Orangutan), Hoofed Mammal (Odd-toed, Even-toed)]
Overview
• Available tools for structured learning are not as refined as tools for binary classification
• 2 sources of speed improvement
– Faster stochastic dual optimization algorithms
– Application-specific importance sampling routine
Summary
• Usually, train time = 1-10 times test time
• Publicly available software package
– Fast algorithms for multiclass SVMs, DPMs
– API to adapt to new applications
– Support datasets too large to fit in memory
– Network interface for online & active learning
Summary
Cost-sensitive multiclass SVM
• 10-50 times faster than SVMstruct
• As fast as 1-vs-all binary SVM
Deformable part models
• 50-1000 times faster than
– SVMstruct
– Mining hard negatives
– SGD-PEGASOS
Binary vs. Structured
[Diagram: Structured Dataset (object detection, pose registration, attribute prediction, etc.; e.g., $Y = (x, y, w, h)$) → Binary Dataset ($Y = -1$ or $Y = +1$) → Binary Learner (SVM, boosting, logistic regression, etc.) → Binary Output → Structured Output]
Binary vs. Structured
• Pros: binary classifier is application independent
• Cons: what is lost in terms of:
– Accuracy at convergence?
– Computational efficiency?
Binary vs. Structured
Structured Prediction Loss $\Delta(g(X), Y_{gt})$ ≈ Binary Loss $\Delta_{01}$ ≈ Convex Upper Bound
Source of Computational Speed
$\ell(X; w) \approx \Delta(g(X), Y)$: Convex Upper Bound on Structured Prediction Loss
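For reference, a convex upper bound on the structured prediction loss is commonly written as the margin-rescaled structured hinge loss. The formulas below are the standard ones from the structured SVM literature, not copied from the slides; the joint feature map $\Psi$, the regularization weight $\lambda$, and the example index $i$ are notation assumed here.

```latex
% Structured hinge loss: upper-bounds the task loss of the prediction g(X_i)
\ell(X_i, Y_i; w) = \max_{Y} \big[ \langle w, \Psi(X_i, Y) \rangle + \Delta(Y, Y_i) \big]
                    - \langle w, \Psi(X_i, Y_i) \rangle
                  \;\ge\; \Delta(g(X_i), Y_i),
\qquad g(X) = \arg\max_{Y} \langle w, \Psi(X, Y) \rangle

% Regularized training objective over n examples
\min_{w} \; \frac{\lambda}{2} \lVert w \rVert^2 + \frac{1}{n} \sum_{i=1}^{n} \ell(X_i, Y_i; w)
```

The inequality holds because $g(X_i)$ is one of the candidates inside the max and scores at least as high as $Y_i$ under $w$.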
Binary vs. Structured
Application-specific optimization algorithms that:
– Converge to lower test error than binary solutions
– Lower test error for all amounts of train time
Structured SVM
• SVMs w/ structured output [Tsochantaridis et al. ICML’04]
• Max-margin MRF [Taskar et al. NIPS’03]
Binary SVM Solvers
Faster Linear SVM Solvers: SVMstruct ≫ SVMperf (cutting plane) > PEGASOS (SGD) ≥ LIBLINEAR
$O\left(\frac{Tn}{\lambda\epsilon}\right)$
• SVMstruct to SVMperf: quadratic to linear in trainset size
• SVMperf to PEGASOS: linear to independent in trainset size
• PEGASOS to LIBLINEAR:
– Faster on multiple passes
– Detect convergence
– Less sensitive to regularization/learning rate
Structured SVM Solvers
Faster Linear SVM Solvers: SVMstruct ≫ SVMperf (cutting plane) > PEGASOS (SGD) ≥ LIBLINEAR
Applied to SSVMs: SVMstruct (cutting plane) > [Ratliff et al. AISTATS’07] (SGD) ≥ [Shalev-Shwartz et al. JMLR’13]
Our Approach
• Use faster stochastic dual algorithms
• Incorporate application-specific importance sampling routine
– Reduce train times when prediction time T is large
– Incorporate tricks people use for binary methods
Random Example → Importance Sample → Maximize Dual SSVM objective w.r.t. samples
Our Approach
For t=1… do
1. Choose random training example (Xi,Yi)
2. Draw a set of samples for example i with ImportanceSample()
3. Approx. maximize Dual SSVM objective w.r.t. the samples for example i
end
(Provably fast convergence for simple approx. solver)
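The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' solver: step 3 is swapped for one block-coordinate Frank-Wolfe style line-search step per sample, and the callables `psi`, `loss`, and `importance_sample`, as well as the parameters `lam`, `iters`, and `seed`, are hypothetical names introduced for illustration.

```python
import numpy as np

def train_ssvm(examples, psi, loss, importance_sample, dim,
               lam=0.01, iters=300, seed=0):
    """Stochastic dual SSVM training loop (illustrative sketch).

    psi(x, y) -> feature vector of length dim
    loss(y_bar, y) -> non-negative float
    importance_sample(x, y, w) -> list of candidate labels
    All three are application-specific, hypothetical callables.
    """
    rng = np.random.default_rng(seed)
    n = len(examples)
    w = np.zeros(dim)
    w_i = np.zeros((n, dim))   # contribution of each example to w
    l_i = np.zeros(n)          # dual loss term of each example
    for _ in range(iters):
        i = int(rng.integers(n))                  # 1. random example
        x, y = examples[i]
        for y_bar in importance_sample(x, y, w):  # 2. importance samples
            # 3. closed-form line search toward the vertex defined by y_bar
            w_s = (psi(x, y) - psi(x, y_bar)) / (lam * n)
            l_s = loss(y_bar, y) / n
            d = w_i[i] - w_s
            denom = lam * d.dot(d)
            if denom < 1e-12:
                continue                          # already at this vertex
            gamma = (lam * d.dot(w) - l_i[i] + l_s) / denom
            gamma = float(np.clip(gamma, 0.0, 1.0))
            new_w_i = (1.0 - gamma) * w_i[i] + gamma * w_s
            w += new_w_i - w_i[i]                 # keep w equal to sum of w_i
            w_i[i] = new_w_i
            l_i[i] = (1.0 - gamma) * l_i[i] + gamma * l_s
    return w
```

Because each step is an exact line search on the dual, no step can decrease the dual objective, which is one reason a simple approximate solver of this kind is a reasonable stand-in.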
Recent Papers w/ Similar Ideas
• Augmenting cutting plane SSVM w/ m-best solutions
A. Guzman-Rivera, P. Kohli, D. Batra. “DivMCuts…” AISTATS’13.
• Applying stochastic dual methods to SSVMs
S. Lacoste-Julien, et al. “Block-Coordinate Frank-Wolfe…” JMLR’13.
Applying to New Problems
1. Define loss function
2. Implement feature extraction routine
3. Implement importance sampling routine
Applying to New Problems
3. Implement importance sampling routine
a) Is fast
b) Favor samples w/
• High loss+
• Uncorrelated features: small
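One way to read criterion (b) is as a greedy selection: take candidates in order of decreasing loss-augmented score and skip any whose features are strongly correlated with a candidate already chosen. The sketch below assumes that reading; the cosine-similarity test, the threshold `max_cos`, and the function name are hypothetical choices, since the exact expressions on the slide did not survive.

```python
import numpy as np

def select_diverse_samples(aug_scores, feats, k, max_cos=0.5):
    """Greedily pick up to k high-scoring, mutually uncorrelated candidates.

    aug_scores: loss-augmented score of each candidate (assumed given)
    feats: one feature vector per candidate (rows)
    max_cos: hypothetical cap on |cosine similarity| between chosen samples
    """
    feats = np.asarray(feats, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)
    order = np.argsort(-np.asarray(aug_scores, dtype=float), kind="stable")
    chosen = []
    for j in order:
        # keep j only if it is weakly correlated with every chosen sample
        if all(abs(unit[j] @ unit[c]) <= max_cos for c in chosen):
            chosen.append(int(j))
        if len(chosen) == k:
            break
    return chosen
```

The routine is linear in the number of candidates times `k`, which keeps it cheap relative to prediction time.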
Example: Object Detection
1. Loss function
2. Features
3. Importance sampling routine
• Add sliding window & loss into dense score map
• Greedy NMS
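The importance sampling routine for detection can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the loss is taken to be 1 − IoU with the ground-truth box, all windows share the ground-truth box size, and `sample_detections`, `k`, and `nms_iou` are hypothetical names.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def sample_detections(score_map, gt_box, k=3, nms_iou=0.5):
    """Return up to k (augmented score, box) pairs after greedy NMS."""
    h, w = score_map.shape
    bw, bh = gt_box[2], gt_box[3]          # assumed: single window size
    # Dense loss map: assumed loss 1 - IoU for a window at each location
    loss_map = np.array([[1.0 - iou((x, y, bw, bh), gt_box)
                          for x in range(w)] for y in range(h)])
    aug = score_map + loss_map             # add loss into the score map
    kept = []
    for flat in np.argsort(-aug, axis=None, kind="stable"):
        y, x = np.unravel_index(flat, aug.shape)
        box = (int(x), int(y), bw, bh)
        # greedy NMS: skip windows overlapping an already kept window
        if all(iou(box, other) <= nms_iou for _, other in kept):
            kept.append((float(aug[y, x]), box))
        if len(kept) == k:
            break
    return kept
```

Reusing the detector's dense score map means sampling costs little more than one pass of ordinary detection.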
Example: Deformable Part Models
1. Loss function: sum of part losses
2. Features
3. Importance sampling routine
• Dynamic programming
• Modified NMS to return diverse set of poses
Cost-Sensitive Multiclass SVM
1. Loss function: class confusion cost
[Figure: class confusion cost matrix over classes cat, dog, ant, fly, car, bus]
2. Features: e.g., bag-of-words
3. Importance sampling routine
• Return all classes
• Exact solution using 1 dot product per class
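For the multiclass case, returning all classes is cheap because the loss-augmented score of every class needs only one dot product per class. A minimal sketch, assuming a weight matrix `W` with one row per class and a cost matrix indexed as `cost[y_true, y_pred]` (both conventions are assumptions made here, not taken from the slides):

```python
import numpy as np

def importance_sample_multiclass(W, x, y_true, cost):
    """Rank every class by loss-augmented score, highest first.

    W: (num_classes, dim) weights; cost[y_true, y_pred]: confusion cost.
    """
    scores = W @ x                        # one dot product per class
    augmented = scores + cost[y_true]     # add the confusion cost row
    order = np.argsort(-augmented, kind="stable")
    return order, augmented[order]
```

With a toy three-class problem, a high confusion cost pushes an otherwise mediocre class to the top of the ranking:

```python
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
cost = np.array([[0.0, 1.0, 4.0], [1.0, 0.0, 4.0], [4.0, 4.0, 0.0]])
order, aug = importance_sample_multiclass(W, np.array([1.0, 0.0]), 0, cost)
# order is [2, 0, 1]; aug is [4.5, 1.0, 1.0]
```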
Results: CUB-200-2011
• Pose mixture model, 312 part/pose detectors
• Occlusion/visibility model
• Tree-structured DPM w/ exact inference
Results: CUB-200-2011
[Plots: 5794 training examples; 400 training examples]
• ~100X faster than mining hard negatives and SVMstruct
• 10-50X faster than stochastic sub-gradient methods
• Close to convergence at 1 pass through training set
Results: ImageNet
[Plots: comparison to other fast linear SVM solvers; comparison to other methods for cost-sensitive SVMs]
• Faster than LIBLINEAR, PEGASOS
• 50X faster than SVMstruct
Conclusion
• Orders of magnitude faster than SVMstruct
• Publicly available software package
– Fast algorithms for multiclass SVMs, DPMs
– API to adapt to new applications
– Support datasets too large to fit in memory
– Network interface for online & active learning
Thanks!