vowpal wabbit
VOWPAL WABBIT Paul Mineiro
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
Vowpal Wabbit
Paul Mineiro, Microsoft
Vowpal Wabbit: What Is It?
Machine Learning Toolkit and Research Vehicle
Open Source: https://github.com/JohnLangford/vowpal_wabbit/
Commercially Supported: http://azure.microsoft.com/en-us/services/machine-learning/
Currently sponsored by Microsoft Research
Formerly sponsored by Yahoo! Research
It’s a Mindset!
Iterate quickly
Smash giant data sets
Go beyond classification
Iterate quickly
Sub-Linear Debugging
Key Technology: Online Learning
Key Concept: Progressive Validation Loss
Goal: Rapid Interactive Experimentation
Latency kills productivity.
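To make progressive validation loss concrete, here is a minimal sketch of an online learner that scores each example before it learns from it. It is a plain least-squares SGD written for illustration, not vw's actual learner; the function name, the learning rate, and the synthetic data stream are all assumptions.

```python
import random


def progressive_validation(examples, learning_rate=0.1):
    """Online least-squares SGD that scores each example before learning from it."""
    weights = {}
    total_loss = 0.0
    curve = []  # running average of the losses seen so far
    for t, (features, label) in enumerate(examples, start=1):
        # 1. Predict with the current weights; this example is still unseen.
        prediction = sum(weights.get(name, 0.0) * value
                         for name, value in features.items())
        # 2. Charge the loss on that not-yet-trained-on example.
        error = prediction - label
        total_loss += error ** 2
        curve.append(total_loss / t)
        # 3. Only now take the gradient step.
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights, curve


# Hypothetical noise-free stream: label = 2 * x + 1.
random.seed(0)
stream = []
for _ in range(2000):
    x = random.uniform(-1.0, 1.0)
    stream.append(({"x": x, "bias": 1.0}, 2.0 * x + 1.0))

weights, curve = progressive_validation(stream)
print("average loss after 10, 100, 2000 examples:", curve[9], curve[99], curve[-1])
```

Because every loss is measured before the corresponding update, the running average behaves like a held-out estimate and is available from the first example onward, which is what makes it usable for quick interactive checks on a prefix of the data.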
Sub-Linear Debugging: Pitfalls
Bias-Variance Tradeoffs (“Learning Curves Cross”)
Lower Bias: model class matches target better.
Lower Variance: fit less sensitive to training set.
Ideal: push on both.
Usually: pushing on just one, e.g.,
New features: lowering bias, increasing variance.
Regularizing: lowering variance.
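The "learning curves cross" pitfall can be illustrated with a toy calculation. The curves below are made up: the form floor + scale / sqrt(n) and every constant are assumptions chosen for illustration, not measurements from the talk.

```python
import math


def toy_error(bias_floor, variance_scale, n):
    """Hypothetical generalization error after n examples."""
    return bias_floor + variance_scale / math.sqrt(n)


def simple_model(n):
    # Higher bias floor, lower variance (e.g. heavily regularized).
    return toy_error(0.10, 1.0, n)


def rich_model(n):
    # Lower bias floor, higher variance (e.g. many new features).
    return toy_error(0.05, 3.0, n)


for n in (100, 10_000):
    better = "simple" if simple_model(n) < rich_model(n) else "rich"
    print(f"n={n}: simple={simple_model(n):.3f} rich={rich_model(n):.3f} -> {better}")
```

Under these assumed curves the simple model looks better on a small prefix while the rich model wins on the larger set, so a conclusion drawn from the first few thousand examples can reverse at full scale.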
Smash giant data sets
There’s no data like more data
Subject to the Bayes limit, larger training sets admit beneficial tradeoffs of bias for variance, potentially resulting in substantially lower generalization error.
Smash giant data sets
Strategy 1: Multinode
Multinode Training
Start cluster spanning daemon
Start (many) vw and point them at the daemon
Two strategies available:
iterative (SGD + Averaging)
L-BFGS
Both might work poorly for non-convex problems
such as matrix factorization
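The iterative strategy (SGD + averaging) can be sketched as a single-process simulation. This is not vw's cluster code: the nodes are simulated with a list of shards, the averaging function stands in for the cluster-wide reduction, and the function names, learning rate, and synthetic data are assumptions.

```python
import random


def sgd_pass(weights, shard, learning_rate=0.05):
    """One local pass of least-squares SGD over a single node's shard."""
    w = list(weights)
    for x, y in shard:
        error = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
    return w


def average(models):
    """Stand-in for the cluster-wide reduction: element-wise mean of node models."""
    return [sum(column) / len(models) for column in zip(*models)]


def train(shards, dims, passes=20):
    weights = [0.0] * dims
    for _ in range(passes):
        # On a real cluster each node would run this step in parallel.
        local_models = [sgd_pass(weights, shard) for shard in shards]
        weights = average(local_models)
    return weights


# Hypothetical noise-free linear data, split across four simulated nodes.
random.seed(0)
true_w = [2.0, -1.0, 0.5]
data = []
for _ in range(400):
    x = [random.uniform(-1.0, 1.0) for _ in true_w]
    data.append((x, sum(wi * xi for wi, xi in zip(true_w, x))))
shards = [data[i::4] for i in range(4)]

weights = train(shards, dims=3)
print("averaged weights:", [round(w, 4) for w in weights])
```

Averaging is safe here because the objective is convex, so the mean of good solutions is still a good solution. For a non-convex problem the node models can settle in different basins, and their average need not be good for any of them.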
Smash giant data sets
Strategy 2: Multicore
Multicore Training
Start several vw in daemon mode
Shared (lock-free!) state
Send data to children via netcat
… and then hope for the best.
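As a rough sketch of the "send data via netcat" step, the snippet below formats examples in vw's plain-text input format and writes them to a TCP socket, which is all netcat does. The helper names are hypothetical, and the host and port are assumptions about how the daemon was started; the actual send is left commented out because it needs a running daemon.

```python
import socket


def to_vw_line(label, features):
    """Format one example as 'label | name:value ...' (vw's plain-text input)."""
    # Feature names must not contain spaces, ':' or '|'.
    body = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{label} | {body}\n"


def send_examples(lines, host="localhost", port=26542):
    """Write example lines to a listening daemon over TCP, like `netcat host port`."""
    # host and port are assumptions; use whatever the daemon was started with.
    with socket.create_connection((host, port)) as conn:
        conn.sendall("".join(lines).encode("utf-8"))


lines = [
    to_vw_line(1, {"price": 0.23, "sqft": 0.25}),
    to_vw_line(-1, {"price": 0.18, "sqft": 0.15}),
]
print("".join(lines), end="")
# send_examples(lines)  # uncomment once a vw daemon is listening
```

A daemon may write one prediction back per example, so a client that streams a large file should also read from the socket rather than only write to it.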
Go beyond classification
Structured Prediction
Exploration Learning
Go beyond classification
Structured Prediction
Structured Prediction: What Is It?
Shit we understood first | Everything else
Linear Dynamics | Non-linear Dynamics
Equilibrium Thermodynamics | Non-equilibrium Thermodynamics
Classification | Structured Prediction
Structured Prediction: Examples
Task | Input | Output
Image Segmentation | [image not preserved] | [image not preserved]
Machine Translation | Ces deux principes se tiennent à la croisée de la philosophie, de la politique, de l’économie, de la sociologie et du droit. | Both principles lie at the crossroads of philosophy, politics, economics, sociology, and law.
Syntactic Analysis | The monster ate a big sandwich. | [parse tree over] The monster ate a big sandwich.
Structured Prediction Haiku
A joint prediction
Across a single input
Loss measured jointly
Hal Daumé III
Structured Prediction via Reduction
(Imperatively) Define Search Space:
Process your input
Make calls to predict
Inform vw about losses experienced
Testing uses exactly the same code as training
Example: Entity and Relation Extraction
James Earl Ray pleaded guilty in Memphis, Tenn. to the assassination of civil rights leader Martin Luther King Junior.
Entities: James Earl Ray (Person), Memphis, Tenn. (Location), Martin Luther King Junior (Person)
Relation: kill(James Earl Ray, Martin Luther King Junior)
ER Search Space Pseudocode
preds = {}
// Predict entities
foreach pos in input: // left to right
    thispred = predict(input, pos, preds, 'entity')
    preds = preds ∪ {(pos, thispred)}
    if (label) loss(label, thispred, 'entity')
// Predict relations
foreach pair in zip(preds, preds): // pairs of predicted entities
    thispred = predict(input, pair, preds, 'relation')
    preds = preds ∪ {(pair, thispred)}
    if (label) loss(label, thispred, 'relation')
// Slide callout (position in the code not preserved): enforce constraints here
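The pseudocode can be rendered as a small runnable sketch. Nothing here is vw's API: `run_search`, `stub_predict`, and `LossLog` are hypothetical stand-ins, the predictor is a hard-coded lookup rather than a learned model, and the sentence is pre-chunked by hand. The point is only the shape of the search space: one function makes the predict calls and reports losses, and the same function runs at test time with no labels.

```python
def run_search(tokens, predict, loss, labels=None):
    """Imperative search space: entity decisions left to right, then relations."""
    preds = {}
    # Predict entities
    for pos in range(len(tokens)):
        this_pred = predict(tokens, pos, preds, "entity")
        preds[pos] = this_pred
        if labels is not None:
            loss(labels["entity"][pos], this_pred, "entity")
    # Predict relations over ordered pairs of positions tagged as entities
    entity_positions = [p for p in range(len(tokens)) if preds[p] != "O"]
    for a in entity_positions:
        for b in entity_positions:
            if a == b:
                continue
            pair = (a, b)
            this_pred = predict(tokens, pair, preds, "relation")
            preds[pair] = this_pred
            if labels is not None:
                loss(labels["relation"].get(pair, "none"), this_pred, "relation")
    return preds


# Hypothetical stand-in for a learned predictor: a fixed lookup table.
GAZETTEER = {
    "James Earl Ray": "Person",
    "Memphis, Tenn.": "Location",
    "Martin Luther King Junior": "Person",
}


def stub_predict(tokens, where, preds, task):
    if task == "entity":
        return GAZETTEER.get(tokens[where], "O")
    a, b = where
    if (tokens[a], tokens[b]) == ("James Earl Ray", "Martin Luther King Junior"):
        return "kill"
    return "none"


class LossLog:
    """Counts mismatches; a learner would use these losses to update itself."""

    def __init__(self):
        self.total = 0

    def __call__(self, label, pred, task):
        self.total += int(label != pred)


tokens = ["James Earl Ray", "pleaded guilty in", "Memphis, Tenn.",
          "to the assassination of civil rights leader",
          "Martin Luther King Junior"]
labels = {"entity": ["Person", "O", "Location", "O", "Person"],
          "relation": {(0, 4): "kill"}}

log = LossLog()
train_preds = run_search(tokens, stub_predict, log, labels)  # training: labels given
test_preds = run_search(tokens, stub_predict, log)           # testing: same code, no labels
print(test_preds[(0, 4)], log.total)
```

A constraint such as "kill only holds between two Person entities" could be enforced inside the relation loop by restricting which pairs are visited or which labels the predictor may return; where the original slide placed that check is not recoverable from the transcript.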
Play with it. https://github.com/JohnLangford/vowpal_wabbit
Ask questions. https://groups.yahoo.com/neo/groups/vowpal_wabbit/info
Have fun.
FIN