Alexey Natekin (DM Labs, OpenDataScience): "Gradient boosting: possibilities, features and tricks beyond..."

Gradient Boosting: New stuff, possibilities and tricks (Alex Natekin)


TRANSCRIPT

Page 1:

Gradient Boosting
New stuff, possibilities and tricks

Alex Natekin

Page 2:

Such boosting

wow much learning

Microsoft LightGBM

CRAN

Page 3:

Boost our plan for today:

GBM as of May 2017

Inside the black box

Lesser known capabilities

Page 4:

Main GBM libraries:

Microsoft LightGBM:
• Leaf-wise tree growth
• Histogram-based trees
• Feature & data parallel split search
• Common tasks

XGBoost:
• Regularized tree structure
• (new) histogram-based trees
• Feature parallel split search
• Common tasks + full customization

H2O:
• Vanilla + TONS of tweaks
• Histogram-based optimisation
• Feature parallel split search
• Common tasks, some extensions

CRAN:
• Vanilla
• Some tree implementations are plain bad
• As extensible as one wants
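To make the "full customization" point concrete, here is a minimal sketch of xgboost's custom-objective hook in Python. The toy data and the squared_error_obj name are illustrative; only the obj= argument of xgb.train is the library's actual API.

    import numpy as np
    import xgboost as xgb

    def squared_error_obj(preds, dtrain):
        # xgboost asks for the gradient and hessian of the loss
        # with respect to the raw prediction for each training row
        labels = dtrain.get_label()
        grad = preds - labels       # d/df of 0.5 * (f - y)^2
        hess = np.ones_like(preds)  # second derivative is constant
        return grad, hess

    X = np.random.rand(200, 5)
    y = X[:, 0] + 0.1 * np.random.randn(200)
    dtrain = xgb.DMatrix(X, label=y)

    booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                        num_boost_round=50, obj=squared_error_obj)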

Page 5:

Current competition:

• xgboost (CRAN) with tree_method = "hist"
• Microsoft LightGBM
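A minimal sketch of the two histogram-based contenders trained side by side, assuming standard xgboost and lightgbm Python installs; the toy data and parameter values are illustrative.

    import numpy as np
    import xgboost as xgb
    import lightgbm as lgb

    X = np.random.rand(10000, 20)
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    # xgboost's answer to LightGBM: histogram-based split search
    bst = xgb.train({"tree_method": "hist", "max_depth": 6,
                     "eta": 0.1, "objective": "binary:logistic"},
                    xgb.DMatrix(X, label=y), num_boost_round=100)

    # LightGBM: leaf-wise growth over feature histograms by default
    gbm = lgb.train({"objective": "binary", "num_leaves": 63,
                     "learning_rate": 0.1},
                    lgb.Dataset(X, label=y), num_boost_round=100)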

Page 6:

Next big boost:

Page 7:

Next big boost:

Page 8:

Such challenge

wow much kaggle

?


Page 9: GBM as of May 2017

• A lot of implementations: great, good and so-so:
  • Multi-platform solutions outperform all: xgboost, lightgbm and h2o
  • There are many niche packages with specialised boosters, losses and tweaks

• GBM benchmarks:
  • https://github.com/szilard/benchm-ml
  • https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5
  • https://medium.com/data-design/benchmarking-lightgbm-how-fast-is-lightgbm-vs-xgboost-7b5484746ac4

• Next big thing - GBM on GPU, currently in active development:
  • https://blog.h2o.ai/2017/05/machine-learning-on-gpus/
  • Xgboost also has its own GPU implementation, but H2O wrapped it under its framework (see the sketch below)
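As a taste of the GPU direction, a sketch of switching xgboost's tree construction to the GPU. This assumes a CUDA-enabled xgboost build with the GPU updater compiled in; the exact parameter spelling has varied across versions.

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(100000, 50)
    y = (X[:, 0] > 0.5).astype(int)

    # Same training API as before; only the histogram building moves to the GPU
    bst = xgb.train({"tree_method": "gpu_hist",
                     "objective": "binary:logistic",
                     "max_depth": 6, "eta": 0.1},
                    xgb.DMatrix(X, label=y), num_boost_round=100)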

Page 10:

Inside the black-box

Variable importance

Partial dependency plots

Distillation and GBM reconstruction

Page 11: Inside the black-box

• GBM variable importance:
  • Mostly implemented as gains and frequencies across splits. Don't trust them
  • A better approach for a black box: shuffle each variable and look at the change in loss (see the sketch below)
  • Nice packages: https://github.com/limexp/xgbfir/ + https://github.com/Far0n/xgbfi
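A minimal model-agnostic sketch of the shuffling approach; model stands for any fitted regressor with a predict() method, and the MSE metric is an arbitrary choice.

    import numpy as np
    from sklearn.metrics import mean_squared_error

    def permutation_importance(model, X, y, n_repeats=5, seed=0):
        rng = np.random.RandomState(seed)
        base = mean_squared_error(y, model.predict(X))
        scores = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            for _ in range(n_repeats):
                Xp = X.copy()
                rng.shuffle(Xp[:, j])  # break the link between feature j and y
                scores[j] += mean_squared_error(y, model.predict(Xp)) - base
        return scores / n_repeats      # larger loss increase = more important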

• Partial dependence plots:
  • Just fix all other variables at their mean values and plot prediction grids for the chosen variables (see the sketch below)
  • Useful for overall model validation and highlighting strong interactions
  • Very useful for validating key features and (chosen) interactions
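The mean-fixing recipe fits in a few lines; a sketch where model and the feature index are placeholders.

    import numpy as np

    def partial_dependence(model, X, feature, grid_size=50):
        # Hold every other feature at its mean, sweep the chosen one
        grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
        X_ref = np.tile(X.mean(axis=0), (grid_size, 1))
        X_ref[:, feature] = grid
        return grid, model.predict(X_ref)

    # usage: grid, pdp = partial_dependence(model, X, feature=0)
    # then plot pdp against grid to see the marginal effect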

• GBM distillation and reconstruction:
  • Use Xgboost's leaf-index prediction (pred_leaf) to get each sample's leaf assignments
  • Refit a sparse linear model on top: lasso, glmnet, glinternet
  • Can actually refit it all (see the sketch below)
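A sketch of this distillation pipeline in Python: the slide names lasso/glmnet/glinternet, and here scikit-learn's Lasso stands in for them, while xgboost's pred_leaf=True does the leaf-index dump. Data and alpha are toy values.

    import numpy as np
    import xgboost as xgb
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.linear_model import Lasso

    X = np.random.rand(500, 10)
    y = X[:, 0] * X[:, 1] + 0.1 * np.random.randn(500)
    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                        num_boost_round=100)

    # One leaf index per (sample, tree); each leaf becomes an indicator feature
    leaves = booster.predict(dtrain, pred_leaf=True)
    Z = OneHotEncoder().fit_transform(leaves)

    # Sparse linear refit on the leaf indicators = distilled GBM
    distilled = Lasso(alpha=0.01).fit(Z, y)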


Page 12:

Random cool stuff

Varying tree complexity

Tuning: Discrete random FTW

RL: boosting for Minecraft

Page 13: Random cool stuff

• You can tweak GBM a lot:
  • Change tree depth across iterations: smaller trees first, deeper ones afterwards (see the sketch below)
  • The same applies to other parameters (deeper trees might need more randomness)
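A sketch of such a depth schedule using xgboost's continued training; the round counts, depths and subsample value are arbitrary choices, while xgb_model= is the library's actual continuation mechanism.

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(1000, 10)
    y = X[:, 0] + X[:, 1] * X[:, 2] + 0.1 * np.random.randn(1000)
    dtrain = xgb.DMatrix(X, label=y)

    # Stage 1: shallow trees capture the main effects
    stage1 = xgb.train({"max_depth": 2, "eta": 0.1}, dtrain,
                       num_boost_round=50)

    # Stage 2: continue boosting the same model with deeper trees
    # (more randomness via subsample, per the note above)
    stage2 = xgb.train({"max_depth": 6, "eta": 0.1, "subsample": 0.8},
                       dtrain, num_boost_round=150, xgb_model=stage1)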

• Tuning GBM:
  • Better to tune alpha/eta/shrinkage with a fixed number of trees
  • Packages bring extra hyperparameter tweaks; histogram resolution is often useful
  • Discrete random search works really well and significantly decreases tuning time (see the sketch below)
  • H2O has it off the shelf: https://blog.h2o.ai/2016/06/h2o-gbm-tuning-tutorial-for-r/
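The same discrete random search is easy to roll by hand outside H2O; a sketch with an illustrative grid, where cross_validate_gbm is a hypothetical evaluation helper you would supply.

    import random

    grid = {"max_depth":        [3, 4, 5, 6, 8],
            "eta":              [0.01, 0.05, 0.1],
            "subsample":        [0.6, 0.8, 1.0],
            "colsample_bytree": [0.6, 0.8, 1.0],
            "max_bin":          [64, 128, 256]}  # histogram resolution

    def random_candidates(grid, n_trials, seed=0):
        # Sample full parameter sets uniformly from the discrete grid
        rng = random.Random(seed)
        for _ in range(n_trials):
            yield {k: rng.choice(v) for k, v in grid.items()}

    # usage: score each candidate by cross-validation, keep the best
    # for params in random_candidates(grid, n_trials=30):
    #     score = cross_validate_gbm(params)  # hypothetical helper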

• GBM for random cool tasks:
  • A strange yet working demo of RL in Minecraft: https://arxiv.org/pdf/1603.04119.pdf
  • Some custom GBMs for NER with a CRF: http://proceedings.mlr.press/v38/chen15b.pdf


Page 14: Summary

1. Lightgbm seems like the go-to 2017 GBM library

2. Expect lots of cool GPU news (especially H2O + Xgboost)

3. Don’t forget about model inspection and PDP

4. We have the distillation capabilities, why are we not using them?

5. Random search helps a lot with tuning


Page 15:

Thanks!