Kaggle Bosch Competition Retrospective

Bosch Production Line Performance, 2017/1/20, hskksk

Upload: keisuke-hosaka

Post on 15-Feb-2017


TRANSCRIPT

Page 1: Kaggle Bosch Competition Retrospective

Bosch Production Line Performance 2017/1/20

hskksk

Page 2: Kaggle Bosch Competition Retrospective

• Result

Page 3: Kaggle Bosch Competition Retrospective

Bosch Production Line Performance

Page 4: Kaggle Bosch Competition Retrospective

Page 5: Kaggle Bosch Competition Retrospective

In this competition, Bosch is challenging Kagglers to predict internal failures using thousands of measurements and tests made for each component along the assembly line. This would enable Bosch to bring quality products at lower costs to the end user.

Page 6: Kaggle Bosch Competition Retrospective

• Competition start: 2016/8/17

• Competition end: 2016/11/12

• 2016/9

Page 7: Kaggle Bosch Competition Retrospective

Submissions are evaluated on the Matthews correlation coefficient (MCC) between the predicted and the observed response. The MCC is given by:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
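As a minimal sketch of the metric described on this slide, the following Python computes MCC from the four confusion-matrix counts. The function names `confusion_counts` and `mcc` are illustrative only, and returning 0 when the denominator vanishes is a common convention assumed here, not something the slide states.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary 0/1 labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            # Remaining case for binary input: actual 1, predicted 0.
            fn += 1
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Under that convention, predicting every part as non-defective scores `mcc(0, tn, 0, fn) == 0.0` no matter how large `tn` is, which is why MCC is much stricter than accuracy on heavily imbalanced data.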

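Page 8: Kaggle Bosch Competition Retrospective

Date column Lx_Sy_Dz holds the timestamp for feature Lx_Sy_F{z-1} (x: line, y: station, z: column number).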
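The naming scheme on this slide can be sketched in code. This is a minimal illustration, assuming column names follow the `L<line>_S<station>_<D|F><number>` pattern exactly; `parse_column` and `feature_for_date` are hypothetical helper names, and the example column names in the usage note are illustrative.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: L<line>_S<station>_D<number> (date) or ..._F<number> (feature).
COLUMN_RE = re.compile(r"^L(\d+)_S(\d+)_([DF])(\d+)$")


class Column(NamedTuple):
    line: int
    station: int
    kind: str    # "D" for a date column, "F" for a feature column
    number: int


def parse_column(name: str) -> Optional[Column]:
    """Split a column name into its parts, or return None if it does not match."""
    match = COLUMN_RE.match(name)
    if match is None:
        return None
    line, station, kind, number = match.groups()
    return Column(int(line), int(station), kind, int(number))


def feature_for_date(name: str) -> Optional[str]:
    """Map a date column Lx_Sy_Dz to the feature column Lx_Sy_F{z-1}."""
    col = parse_column(name)
    if col is None or col.kind != "D":
        return None
    return f"L{col.line}_S{col.station}_F{col.number - 1}"
```

For instance, `feature_for_date("L0_S0_D1")` returns `"L0_S0_F0"`, while a feature column or an unrelated name such as `"Id"` returns `None`.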

Page 9: Kaggle Bosch Competition Retrospective

Page 10: Kaggle Bosch Competition Retrospective

0: 1,176,868 (99.4%)

1: 6,879 (0.6%)

extremely imbalanced data

Page 11: Kaggle Bosch Competition Retrospective

Result

Page 12: Kaggle Bosch Competition Retrospective

• g_votte

• tkm

• hskksk

Page 13: Kaggle Bosch Competition Retrospective

LB (hskksk only)

Page 14: Kaggle Bosch Competition Retrospective

LB

Page 15: Kaggle Bosch Competition Retrospective

Public Leaderboard

Page 16: Kaggle Bosch Competition Retrospective

Private Leaderboard

Page 17: Kaggle Bosch Competition Retrospective

Top Ten!

Page 18: Kaggle Bosch Competition Retrospective

Page 19: Kaggle Bosch Competition Retrospective

• LB (CV)

Page 20: Kaggle Bosch Competition Retrospective

Page 21: Kaggle Bosch Competition Retrospective

• GCP with R/Python

• R Markdown

• xgboost

• GitHub

GCP1

Page 22: Kaggle Bosch Competition Retrospective

CV

Page 23: Kaggle Bosch Competition Retrospective

LB

• 30 submit LB

• LB

Page 24: Kaggle Bosch Competition Retrospective

1. Cross-Validation fold

2.

3. MCC

Page 25: Kaggle Bosch Competition Retrospective

1

• Cross-Validation fold

• Predicting Red Hat Business Value

Page 26: Kaggle Bosch Competition Retrospective

Red Hat

• CV

• CV CV

• fold

Page 27: Kaggle Bosch Competition Retrospective

• ID → ID

Page 28: Kaggle Bosch Competition Retrospective

2

Page 29: Kaggle Bosch Competition Retrospective

qqplot

• Station 32, 33 OK

Page 30: Kaggle Bosch Competition Retrospective

Page 31: Kaggle Bosch Competition Retrospective

3

MCC

• Gaussian Process LB

• mcc

• LB
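Because MCC is defined on hard 0/1 labels, predicted probabilities have to be cut at some threshold before they can be scored. The sketch below shows a plain threshold sweep that picks the cutoff with the highest MCC on held-out data. It is a generic illustration under stated assumptions, not the procedure used in this deck: `mcc_at_threshold` and `best_threshold` are hypothetical names, and the Gaussian Process step mentioned on the slide is not reproduced.

```python
import math
from typing import Iterable, Sequence, Tuple


def mcc_at_threshold(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """MCC after labelling every row with prob >= threshold as positive."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if t == 1 and pred == 1:
            tp += 1
        elif t == 0 and pred == 0:
            tn += 1
        elif t == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an undefined MCC counts as 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], candidates: Iterable[float]
) -> Tuple[float, float]:
    """Return (threshold, mcc) for the candidate with the highest MCC."""
    best_t, best_score = None, -math.inf
    for t in candidates:
        score = mcc_at_threshold(y_true, probs, t)
        if score > best_score:  # ties keep the earliest candidate
            best_t, best_score = t, score
    if best_t is None:
        raise ValueError("candidates must not be empty")
    return best_t, best_score
```

A natural choice of candidates is the sorted set of predicted probabilities themselves, since MCC can only change at those values. With so few positives, the best cutoff found on one fold can shift noticeably on another, so it is worth checking how stable it is across folds.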

Page 32: Kaggle Bosch Competition Retrospective

Feature engineering

Page 33: Kaggle Bosch Competition Retrospective

• 25

• 3154

Page 34: Kaggle Bosch Competition Retrospective

1. ID

• Forum magic feature

2.

3.

Page 35: Kaggle Bosch Competition Retrospective

• ID

Page 36: Kaggle Bosch Competition Retrospective

• ID

Page 37: Kaggle Bosch Competition Retrospective

Station 38

• Station 38!!

• ID Station 38 NA

Page 38: Kaggle Bosch Competition Retrospective

ID

Page 39: Kaggle Bosch Competition Retrospective

• bitmap (17017)

• bitmap

Page 40: Kaggle Bosch Competition Retrospective

Page 41: Kaggle Bosch Competition Retrospective

Page 42: Kaggle Bosch Competition Retrospective

Page 43: Kaggle Bosch Competition Retrospective

Page 44: Kaggle Bosch Competition Retrospective

Page 45: Kaggle Bosch Competition Retrospective

• Stacking

• xgboost

• xgboost

• objective

Page 46: Kaggle Bosch Competition Retrospective

Stacking

• 2 stacking

• 8 xgboost stacking

• narrow-deep stacking

• deep learning

• Layer

Page 47: Kaggle Bosch Competition Retrospective

xgboost

• base_margin

• dart (Dropouts meet Multiple Additive Regression Trees)

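Page 48: Kaggle Bosch Competition Retrospective

base_margin

base_margin in xgboost:

    # Build the training matrix (arguments elided on the original slide).
    learn <- xgb.DMatrix(...)
    # p: a prior estimate of p(y|x) per row (pseudocode on the slide).
    # base_margin is on the margin (log-odds) scale; qlogis() is base R's logit.
    base_margin <- qlogis(p)
    # Attach the margin so boosting starts from the prior prediction.
    setinfo(learn, 'base_margin', base_margin)
    m <- xgb.train(data = learn, ...)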
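The key step on this slide is converting a prior probability into a margin. With a logistic objective the margin is a log-odds value, so the prior has to go through the logit before it is attached. A minimal Python sketch of that conversion follows; the `eps` clipping and the function names are assumptions added for illustration, not part of the slide.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(margin: float) -> float:
    """Inverse of logit: maps a margin back to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical prior predictions, e.g. from an earlier model.
prior = [0.0058, 0.5, 0.9]
base_margin = [logit(p) for p in prior]
# In the Python xgboost package, a list like this could then be attached
# with DMatrix.set_base_margin before training.
```

Since `sigmoid(logit(p))` recovers `p`, a booster with zero trees would reproduce the prior predictions, and each added tree only has to learn a correction to them.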

Page 49: Kaggle Bosch Competition Retrospective

base_margin

dart

Page 50: Kaggle Bosch Competition Retrospective

DART

• Dropouts meet Multiple Additive Regression Trees [1]

• dropout

• 0.5

[1] Rashmi, K. V., & Gilad-Bachrach, R. (n.d.). DART: Dropouts meet Multiple Additive Regression Trees, 38.

Page 51: Kaggle Bosch Competition Retrospective

xgboost

• GBDT-feature + Factorization Machines

• GBDT-feature: GBDT tree

• One-hot Encoding → Factorization Machines

• OpenMP libffm

• only libFM
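The GBDT-feature idea in the bullets above can be sketched as follows: each row is described by the leaf it falls into in every tree, and those (tree, leaf) pairs are one-hot encoded into a sparse file for a factorization machine. The sketch assumes the leaf indices are already available as one list per row (in the Python xgboost package they can be obtained with `predict(..., pred_leaf=True)`) and writes the LIBSVM-style `label index:value` text format that libFM reads. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_leaf_index(leaf_ids: Sequence[Sequence[int]]) -> Dict[Tuple[int, int], int]:
    """Assign a unique column to every (tree, leaf) pair seen in the data."""
    index: Dict[Tuple[int, int], int] = {}
    for row in leaf_ids:
        for tree, leaf in enumerate(row):
            index.setdefault((tree, leaf), len(index))
    return index


def to_libsvm_lines(
    labels: Sequence[int],
    leaf_ids: Sequence[Sequence[int]],
    index: Dict[Tuple[int, int], int],
) -> List[str]:
    """One-hot encode each row's leaves as 'label col:1 col:1 ...' lines."""
    lines = []
    for y, row in zip(labels, leaf_ids):
        cols = sorted(index[(tree, leaf)] for tree, leaf in enumerate(row))
        lines.append(" ".join([str(y)] + [f"{c}:1" for c in cols]))
    return lines


# Toy input: 3 rows, 2 trees; each entry is the leaf reached in that tree.
leaf_ids = [[3, 1], [3, 2], [4, 1]]
labels = [1, 0, 0]
index = build_leaf_index(leaf_ids)
lines = to_libsvm_lines(labels, leaf_ids, index)
```

Every row has exactly one active column per tree. For libffm, which uses `field:index:value` triples, the tree number would be a natural choice of field, though that mapping is an assumption here. The index should be built on the training rows and reused for test rows so the columns line up.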

Page 52: Kaggle Bosch Competition Retrospective

objective

binary:logistic

Page 53: Kaggle Bosch Competition Retrospective

smoothed-MCC

mcc smoothing for xgboost: gradient, hessian (diagonal only)
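The slide does not show which smoothing was used, so the following is only one plausible sketch: the hard confusion-matrix counts are replaced by sums of predicted probabilities, which makes MCC differentiable in the margins. The gradient and diagonal Hessian of the loss (negative soft MCC) are approximated here by central differences purely for illustration. A real custom objective would use analytic derivatives, and all names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(margin: float) -> float:
    return 1.0 / (1.0 + math.exp(-margin))


def soft_mcc(margins: Sequence[float], labels: Sequence[int]) -> float:
    """MCC with soft counts: each row contributes its probability, not a 0/1 vote."""
    tp = tn = fp = fn = 0.0
    for m, y in zip(margins, labels):
        p = sigmoid(m)
        tp += y * p
        fn += y * (1.0 - p)
        fp += (1 - y) * p
        tn += (1 - y) * (1.0 - p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def numeric_grad_hess(
    margins: Sequence[float], labels: Sequence[int], eps: float = 1e-4
) -> Tuple[List[float], List[float]]:
    """Central-difference gradient and diagonal Hessian of loss = -soft_mcc."""
    base = -soft_mcc(margins, labels)
    grad, hess = [], []
    for i in range(len(margins)):
        up = list(margins)
        down = list(margins)
        up[i] += eps
        down[i] -= eps
        loss_up = -soft_mcc(up, labels)
        loss_down = -soft_mcc(down, labels)
        grad.append((loss_up - loss_down) / (2 * eps))
        hess.append((loss_up - 2 * base + loss_down) / eps ** 2)
    return grad, hess


margins = [2.0, -1.0, 0.5, -0.5]
labels = [1, 0, 1, 0]
grad, hess = numeric_grad_hess(margins, labels)
```

In this toy example the gradient is negative for the positive rows and positive for the negative rows, so a descent step pushes each margin toward its label. The diagonal second derivative of this loss is not guaranteed to be positive, so a practical objective would likely need clipping or a similar safeguard before handing it to xgboost.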

Page 54: Kaggle Bosch Competition Retrospective

Page 55: Kaggle Bosch Competition Retrospective

Page 56: Kaggle Bosch Competition Retrospective

Page 57: Kaggle Bosch Competition Retrospective

• hskksk Line2 tkm Line0

Page 58: Kaggle Bosch Competition Retrospective

• 3 fold 1

• MCC LB Feedback

• tkm g_votte

• LB Feedback

Page 59: Kaggle Bosch Competition Retrospective

Public Private

• tkm submit Public Score Private

• Public tkm

Page 60: Kaggle Bosch Competition Retrospective

• submit

• mcc

Page 61: Kaggle Bosch Competition Retrospective

Kaggle

• CV LB CV

• fold

• fold CV

Page 62: Kaggle Bosch Competition Retrospective

Kaggle

• Accuracy confusion matrix

• mcc

• think more, try less [2]

[2] Kaggle (Owen Zhang)

Page 63: Kaggle Bosch Competition Retrospective

Enjoy Kaggle!