Vowpal Wabbit 2017 Update
John Langford
http://hunch.net/~vw/
git clone git://github.com/JohnLangford/vowpal_wabbit.git
What is Vowpal Wabbit
1. Large Scale linear regression (*)
2. Online Learning (*)
3. Active Learning (*)
4. Learning Reduction (*)
5. Sparse Models
6. Baseline (Alberto)
7. Optimized Exploration algorithms (Alberto)
8. Cost Sensitive Active Learning (Akshay)
9. Active Learning to Search (Hal)
10. Java Interface (Jon Morra)
11. JSON/Decision Service (Markus)

(*) Old stuff
Community
1. BSD license.
2. Mailing list >500 members; GitHub >1K forks, >1K issues, >100 contributors.
Sparse Models
Scenario: You want to train a model with many potential parameters but use little RAM at test time.
Step 1: vw -b 26 --l1 1e-7 <training set> (memory footprint is 1GB)
Step 2: vw -t --sparse_weights <test set> (memory footprint is 100MB)
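The reason --l1 makes the saved model small is that L1 regularization drives many weights to exactly zero, so only the nonzero entries need to be stored at test time. A minimal sketch of the idea (not VW's actual truncated-gradient implementation): online SGD with L1 soft-thresholding after each update.

```python
# Online SGD for squared loss with L1 soft-thresholding: after each
# gradient step, each touched weight is shrunk toward zero and clipped
# at zero, so weakly predictive features end up exactly sparse.
def l1_sgd(examples, dim, lr=0.1, l1=0.05, epochs=20):
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:          # x is a sparse dict {index: value}
            pred = sum(w[i] * v for i, v in x.items())
            err = pred - y
            for i, v in x.items():
                w[i] -= lr * err * v
                # soft-threshold: shrink toward zero, clip at zero
                if w[i] > 0:
                    w[i] = max(0.0, w[i] - lr * l1)
                else:
                    w[i] = min(0.0, w[i] + lr * l1)
    return w

# Feature 0 is predictive; features 1 and 2 are noise and stay at zero.
data = [({0: 1.0, 1: 0.01}, 1.0), ({0: -1.0, 2: 0.01}, -1.0)]
w = l1_sgd(data, dim=3)
```

A model like this can be stored as a map from nonzero indices to values, which is what buys the smaller test-time footprint.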
Baseline
Setting: online regression
- Problem: range of targets (e.g. offset) is unknown
- Bias term (weight for “constant” features) can be slow to learn
- Hurts performance of learning / exploration algorithms

Goal: adapt quickly and automatically to the range of targets
Baseline
Solution:
- Learn baseline regressor separately from rest
  - From constant features on example: --baseline
  - Or separate global constant example: --baseline --global_only
- Residual regression on the other features

Note: learning rate multiplied by max label to converge faster than other normalized updates
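The split above can be sketched in a few lines (an assumption-laden simplification, not VW's exact update rule): a separately learned baseline term with a fast learning rate absorbs the unknown target offset, and a residual regressor fits the remaining signal on the other features.

```python
# Baseline + residual regression: the constant term gets its own fast
# update so it adapts quickly to the target range, while the feature
# weight does ordinary slow SGD on what is left over.
def fit_with_baseline(data, lr_base=0.5, lr=0.05, epochs=100):
    baseline, w = 0.0, 0.0            # one non-constant feature for clarity
    for _ in range(epochs):
        for x, y in data:
            err = y - (baseline + w * x)
            baseline += lr_base * err  # fast update for the constant term
            w += lr * err * x          # slow residual regression
    return baseline, w

# Targets have a large unknown offset: y = 100 + 2x.
data = [(x, 100.0 + 2.0 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
b, w = fit_with_baseline(data)
```

With a single shared slow rate, the bias would crawl toward 100 over many passes; giving the baseline its own fast update is the point of the feature.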
Contextual Bandits
Baseline: example for cb loss estimates
Reward/loss estimation is key (e.g. doubly robust)
> vw ds.txt --cbify 10 --cb_explore_adf --cb_type dr --epsilon 0.05
0.682315
> vw ... --loss0 9 --loss1 10
0.787594
> vw ... --loss0 9 --loss1 10 --baseline
0.710636
> vw ... --loss0 9 --loss1 10 --baseline --global_only
0.636140
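For intuition on why the baseline helps here, a sketch of the doubly robust (DR) estimator behind --cb_type dr: start from a regression estimate of each action's loss, then add an importance-weighted correction on the one action actually observed. The better the regression (e.g. with a baseline absorbing the loss offset), the smaller the high-variance correction term.

```python
# Doubly robust loss estimates for a K-action contextual bandit round.
def dr_estimates(predicted, chosen, observed_loss, prob):
    """predicted: regression loss estimate per action;
    prob: probability the logging policy chose `chosen`."""
    est = list(predicted)
    # importance-weighted correction, only on the observed action
    est[chosen] += (observed_loss - predicted[chosen]) / prob
    return est

# 3 actions; action 1 was played with probability 0.5 and incurred loss 0.8.
e = dr_estimates([0.2, 0.5, 0.9], chosen=1, observed_loss=0.8, prob=0.5)
```

The estimate is unbiased whenever the probabilities are correct, and low-variance whenever the regression is accurate, which is the "doubly robust" trade-off.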
Contextual bandits: bagging
Bagging: “bootstrapped Thompson sampling”
- Each update is performed Poisson(1) times
- For only one policy, greedy performs better (always update once)
- --bag n --greedify treats first policy like greedy
- Often works better, especially for small n
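The update scheme above can be sketched as follows (a simplified illustration, not VW's code): each of the n policies trains on each example a Poisson(1)-distributed number of times, which approximates bootstrap resampling of the stream online; with the --greedify behavior, policy 0 always updates exactly once, i.e. it stays greedy.

```python
import math
import random

def poisson1(rng):
    # Knuth's algorithm for a Poisson draw with mean 1
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def bagged_update(n_policies, example, update, rng, greedify=False):
    # Each policy sees the example Poisson(1) times; with greedify the
    # first policy is updated exactly once (plain greedy training).
    for i in range(n_policies):
        times = 1 if (greedify and i == 0) else poisson1(rng)
        for _ in range(times):
            update(i, example)

rng = random.Random(0)
# Sanity check: Poisson(1) averages to about one update per example.
mean = sum(poisson1(rng) for _ in range(10000)) / 10000.0
```

Since each policy ends up trained on a slightly different resampling of the data, picking a policy at random to act gives Thompson-sampling-style exploration without an explicit posterior.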
Contextual bandits: cover
Cover: maintains a set of diverse policies good for explore/exploit
- New parameterization: --cover n [--psi 0.01] [--nounif]
- ψ controls diversity cost for training policies (ψ = 0 → all ERM policies)
- ε_t = 1/√(Kt) always
- --nounif disables exploration on ε actions (not chosen by any policy)
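As a rough picture of the resulting action distribution (a hypothetical simplification for illustration only, not the actual vw implementation): each of the n policies votes for one action, and the ε_t = 1/√(Kt) mass is spread uniformly over all K actions; with --nounif, actions that no policy chose get zero probability instead.

```python
import math

def cover_distribution(policy_actions, K, t, nounif=False):
    # policy_actions: the action each of the n cover policies picked
    eps = min(1.0, 1.0 / math.sqrt(K * t))   # the eps_t = 1/sqrt(Kt) schedule
    p = [0.0] * K
    for a in policy_actions:                  # split 1 - eps over policy votes
        p[a] += (1.0 - eps) / len(policy_actions)
    if nounif:
        total = sum(p)                        # renormalize over chosen actions
        return [x / total for x in p]
    return [x + eps / K for x in p]           # uniform eps over all actions

# 3 policies over K = 4 actions at round t = 100 (so eps = 0.05).
d = cover_distribution([0, 0, 1], K=4, t=100)
d_nounif = cover_distribution([0, 0, 1], K=4, t=100, nounif=True)
```

The diversity of the cover policies is what makes this safe: exploration comes mostly from disagreement between policies rather than from the shrinking uniform term.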
![Page 28: JohnLangford vw/ · Baseline: example for cb loss estimates Reward/lossestimationiskey(e.g. doublyrobust) > vw ds.txt --cbify 10 --cb_explore_adf --cb_type dr --epsilon 0.05](https://reader034.vdocuments.net/reader034/viewer/2022050516/5fa05ab9cb8575091d4a7d2b/html5/thumbnails/28.jpg)
Contextual bandits: miscellaneous
- Most changes are only in the ADF code
- --cbify K --cb_explore_adf for using ADF code in cbify
- --loss0 [0] --loss1 [1] to specify different loss encodings
- Cover + MTR uses MTR for the first (ERM) policy, DR for the rest
- Upcoming: reduce uniform ε exploration in ε-greedy using a disagreement test (from active learning)