Looking Back at the Kaggle Bosch Competition


Bosch Production Line Performance 2017/1/20

hskksk


• Result

Bosch Production Line Performance

In this competition, Bosch is challenging Kagglers to predict internal failures using thousands of measurements and tests made for each component along the assembly line. This would enable Bosch to bring quality products at lower costs to the end user.


• Start: 2016/8/17

• End: 2016/11/12

• 2016/9

Submissions are evaluated on the Matthews correlation coefficient (MCC) between the predicted and the observed response. The MCC is given by:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
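As a reference, the formula translates directly into code (a plain-Python sketch; sklearn.metrics.matthews_corrcoef computes the same quantity; the counts below are illustrative, not competition results):

    import math

    def mcc(tp, tn, fp, fn):
        # Matthews correlation coefficient from confusion-matrix counts
        num = tp * tn - fp * fn
        den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return num / den if den else 0.0

    # illustrative counts only
    print(mcc(tp=3000, tn=1_170_000, fp=4000, fn=3879))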

Columns are named Lx_Sy_Fz (line x, station y, feature z); each date column Lx_Sy_Dz holds the timestamp for the measurement column Lx_Sy_F{z-1}.
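A minimal sketch of how such a column name decomposes (parse_column is a hypothetical helper, not from the slides):

    import re

    def parse_column(name):
        # split e.g. 'L3_S32_F3850' into (line, station, kind, index)
        m = re.fullmatch(r"L(\d+)_S(\d+)_([FD])(\d+)", name)
        if m is None:
            raise ValueError(f"unexpected column name: {name}")
        return int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4))

    print(parse_column("L3_S32_F3850"))  # (3, 32, 'F', 3850)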


0: 1,176,868 (99.4%)
1: 6,879 (0.6%)

Extremely imbalanced data.

Result


• g_votte

• tkm

• hskksk (me)

LB (hskksk only)

LB (team)

Public Leaderboard


Private Leaderboard


Top Ten!

• LB(CV )

• ( )


• GCP with R/Python

• Rmarkdown

• xgboost

• github

GCP: 1 instance

CV


LB

• ~30 submits used for LB probing

• LB

1. Cross-validation fold design

2. Train/test distribution checks

3. MCC threshold selection

1. Cross-validation fold design

• Lessons from Predicting Red Hat Business Value

Redhat

• CV

• CV CV ( )

• fold


• ID → assign CV folds grouped by ID, so related IDs stay in the same fold (a sketch follows)
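A sketch of that kind of fold design, using sklearn's GroupKFold as a stand-in for whatever grouping the team actually used (the grouping key below is hypothetical):

    import numpy as np
    from sklearn.model_selection import GroupKFold

    X = np.arange(20).reshape(10, 2)
    y = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 1])
    groups = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])  # hypothetical ID-based key

    # rows sharing a group ID never straddle the train/validation split
    for tr, va in GroupKFold(n_splits=5).split(X, y, groups):
        assert not set(groups[tr]) & set(groups[va])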


2. Train/test distribution checks

Checked whether train and test feature distributions match using QQ plots (sketch below)

• Station 32, 33: OK
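One way to draw such a comparison (a matplotlib sketch; the slides' actual plots are not reproduced here):

    import numpy as np
    import matplotlib.pyplot as plt

    def qq_compare(train_col, test_col, title):
        # quantile-quantile plot: points near the diagonal mean the
        # train and test distributions of this feature agree
        q = np.linspace(0.01, 0.99, 99)
        qt = np.nanquantile(train_col, q)
        qs = np.nanquantile(test_col, q)
        plt.scatter(qt, qs, s=8)
        lo, hi = min(qt.min(), qs.min()), max(qt.max(), qs.max())
        plt.plot([lo, hi], [lo, hi], linestyle="--", color="k")
        plt.xlabel("train quantiles"); plt.ylabel("test quantiles")
        plt.title(title)
        plt.show()

    qq_compare(np.random.normal(size=5000), np.random.normal(size=5000),
               "toy feature")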


3. MCC threshold selection

• Gaussian process + LB feedback

• MCC threshold tuning (a sketch of the basic search follows)
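The slide mentions a Gaussian process driven by LB feedback; the basic local ingredient is a cutoff search over out-of-fold predictions, sketched here on toy data:

    import numpy as np
    from sklearn.metrics import matthews_corrcoef

    def best_mcc_threshold(y_true, y_prob, n_grid=200):
        # scan probability cutoffs, keep the one maximizing MCC
        grid = np.linspace(y_prob.min(), y_prob.max(), n_grid)
        scores = [matthews_corrcoef(y_true, (y_prob > t).astype(int))
                  for t in grid]
        i = int(np.argmax(scores))
        return grid[i], scores[i]

    y = (np.random.rand(10000) < 0.006).astype(int)            # imbalanced toy labels
    p = np.clip(y * 0.3 + np.random.rand(10000) * 0.2, 0, 1)   # toy probabilities
    print(best_mcc_threshold(y, p))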


Feature engineering


• 25

• 3154


1. ID features

• the "magic feature" from the forum

2.

3.

• ID


• ID
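The preceding bullets point at ID-based features; the "magic feature" widely shared on the forum was, roughly, the Id gap to neighboring parts when rows are ordered by start time. A pandas sketch under that reading (toy data, hypothetical column names):

    import pandas as pd

    # toy stand-in: Id plus each part's earliest timestamp over its date columns
    df = pd.DataFrame({
        "Id": [4, 6, 7, 9, 11, 23],
        "start_time": [82.24, 1313.12, 1618.70, 1149.20, 602.64, 1331.66],
    })

    # order parts as they entered the line, then measure Id gaps to neighbors
    df = df.sort_values(["start_time", "Id"]).reset_index(drop=True)
    df["magic_prev_id_diff"] = df["Id"].diff().fillna(9999999)
    df["magic_next_id_diff"] = df["Id"].diff(-1).fillna(-9999999)
    print(df)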


Station 38

• Station 38 turned out to matter!!

• For each ID: whether the Station 38 values are NA


ID


• Encode each ID's NA pattern as a bitmap (17,017 distinct patterns)

• Features derived from the bitmap (sketch below)
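A sketch of the bitmap idea as I read it (toy columns; the slide reports ~17,017 distinct patterns on the real data):

    import pandas as pd

    # toy stand-in for the numeric table: NaN = the part skipped that test
    df = pd.DataFrame({
        "L0_S0_F0":     [1.2,  None, 0.5, None],
        "L0_S1_F4":     [None, 3.3,  0.1, None],
        "L3_S38_F3952": [None, None, 2.2, 2.4],
    })

    # one bit per column: 1 if measured, 0 if NA; the string identifies
    # the path the part took through the production line
    bitmap = df.notna().astype(int).astype(str).agg("".join, axis=1)
    print(bitmap.value_counts())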


• Stacking

• xgboost: base_margin / dart

• xgboost: GBDT features + Factorization Machines

• Custom objective

Stacking

• 2-level stacking

• 8 xgboost models stacked

• narrow-deep stacking: depth over width, like layers in deep learning (a scaled-down sketch follows)
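A scaled-down sketch of 2-level stacking with out-of-fold predictions (3 base models instead of 8; all parameters are illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

    # level 1: differently configured base models, predicted out-of-fold
    bases = [XGBClassifier(n_estimators=100, max_depth=d, random_state=s)
             for d, s in [(4, 0), (6, 1), (8, 2)]]
    oof = np.column_stack([
        cross_val_predict(m, X, y, cv=3, method="predict_proba")[:, 1]
        for m in bases])

    # level 2: a meta-model trained on the stacked level-1 predictions
    meta = LogisticRegression().fit(oof, y)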


xgboost

• base_margin

• dart (Dropouts meet Multiple Additive Regression Trees)


base_margin: start boosting from an existing model's predictions (the R xgboost API):

    learn <- xgb.DMatrix(...)
    base_margin <- qlogis(p)  # logit(p(y|x)) from a first-stage model
    setinfo(learn, 'base_margin', base_margin)
    m <- xgb.train(data = learn, ...)


base_margin

dart


Dart

• Dropouts meet Multiple Additive Regression Trees [1]

• Applies dropout to the boosted trees

• Drop rate: 0.5

[1] Rashmi, K. V., & Gilad-Bachrach, R. (2015). DART: Dropouts meet Multiple Additive Regression Trees. AISTATS.
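A minimal Python-API sketch of a dart booster on toy data; the slide only indicates dart with a drop rate around 0.5, so the remaining parameters are assumptions:

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(200, 5)
    y = (np.random.rand(200) < 0.1).astype(int)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        "booster": "dart",
        "objective": "binary:logistic",
        "rate_drop": 0.5,  # fraction of trees dropped each boosting round
        "max_depth": 6,
        "eta": 0.1,
    }
    bst = xgb.train(params, dtrain, num_boost_round=50)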


xgboost

• GBDT-features + Factorization Machines

• GBDT-features: the leaf index each sample reaches in every GBDT tree (sketch below)

• One-hot encoding → Factorization Machines

• libffm (OpenMP)

• only libFM
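A sketch of the GBDT-feature step on toy data (the FM training itself happens outside, in libFM/libffm):

    import numpy as np
    import xgboost as xgb
    from sklearn.preprocessing import OneHotEncoder

    X = np.random.rand(500, 10)
    y = (np.random.rand(500) < 0.1).astype(int)
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({"objective": "binary:logistic", "max_depth": 4},
                    dtrain, num_boost_round=30)

    # the leaf each sample lands in, one column per tree
    leaves = bst.predict(dtrain, pred_leaf=True).astype(int)

    # one-hot encode leaf indices; the sparse matrix is what would be
    # exported as input features for a factorization machine
    fm_input = OneHotEncoder().fit_transform(leaves)
    print(fm_input.shape)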


Custom objective

• Default objective: binary:logistic

smoothed-MCC

• Approximate MCC with a smooth function and plug it into xgboost as a custom objective, supplying the gradient and hessian (diagonal only); a sketch follows
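A sketch of what such an objective can look like: the hard confusion counts become sums of predicted probabilities, making MCC differentiable in the raw margins; the gradient below follows from the chain rule, while the hessian is a crude positive diagonal surrogate. This is my reconstruction of the idea, not the team's actual code:

    import numpy as np

    def smoothed_mcc_obj(preds, dtrain):
        # Soft confusion counts: replace hard 0/1 decisions with the
        # predicted probabilities, so MCC becomes differentiable.
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))      # sigmoid of the raw margins
        tp = np.sum(y * p)
        fp = np.sum((1 - y) * p)
        fn = np.sum(y * (1 - p))
        tn = np.sum((1 - y) * (1 - p))

        s1, s2, s3, s4 = tp + fp, tp + fn, tn + fp, tn + fn
        d = np.sqrt(s1 * s2 * s3 * s4) + 1e-12
        mcc = (tp * tn - fp * fn) / d

        # d(soft MCC)/dp_i via the chain rule on the four soft counts
        dnum = y * (tn + fp) - (1 - y) * (tp + fn)
        dmcc_dp = dnum / d - 0.5 * mcc * (1.0 / s1 - 1.0 / s4)

        grad = -dmcc_dp * p * (1 - p)          # minimize -softMCC
        hess = np.maximum(np.abs(grad), 1e-6)  # crude positive surrogate
        return grad, hess

    # usage: xgb.train(params, dtrain, num_boost_round=100, obj=smoothed_mcc_obj)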


• hskksk worked on Line 2, tkm on Line 0


• 3 fold 1

• MCC LB Feedback

• tkm g_votte

• LB Feedback

Public vs. Private

• tkm's submit: Public score / Private score

• Public: tkm

• submit

• mcc


Lessons from Kaggle

• When CV and LB disagree, trust CV

• fold design matters

• fold construction determines whether CV can be trusted


Lessons from Kaggle

• Accuracy vs. the confusion matrix

• MCC

• "think more, try less" [2]

[2] Owen Zhang (Kaggle)


Enjoy Kaggle!
