Looking Back at the Kaggle Bosch Competition


Bosch Production Line Performance 2017/1/20

hskksk


• Result

Bosch Production Line Performance

In this competition, Bosch is challenging Kagglers to predict internal failures using thousands of measurements and tests made for each component along the assembly line. This would enable Bosch to bring quality products at lower costs to the end user.


• Start: 2016/8/17

• End: 2016/11/12

• 2016/9

Submissions are evaluated on the Matthews correlation coefficient (MCC) between the predicted and the observed response. The MCC is given by:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
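As a reference, the formula translates directly into code (a plain-Python sketch; sklearn.metrics.matthews_corrcoef computes the same quantity; the counts below are illustrative, not competition results):

    import math

    def mcc(tp, tn, fp, fn):
        # Matthews correlation coefficient from confusion-matrix counts
        num = tp * tn - fp * fn
        den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return num / den if den else 0.0

    # illustrative counts only
    print(mcc(tp=3000, tn=1_170_000, fp=4000, fn=3879))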

Columns are named Lx_Sy_Fz (line x, station y, feature z); each date column Lx_Sy_Dz holds the timestamp for the measurement column Lx_Sy_F{z-1}.
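A minimal sketch of how such a column name decomposes (parse_column is a hypothetical helper, not from the slides):

    import re

    def parse_column(name):
        # split e.g. 'L3_S32_F3850' into (line, station, kind, index)
        m = re.fullmatch(r"L(\d+)_S(\d+)_([FD])(\d+)", name)
        if m is None:
            raise ValueError(f"unexpected column name: {name}")
        return int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4))

    print(parse_column("L3_S32_F3850"))  # (3, 32, 'F', 3850)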


0: 1,176,868 (99.4%)
1: 6,879 (0.6%)

Extremely imbalanced data.

Result


• g_votte

• tkm

• hskksk (me)

LB (hskksk only)

LB (team)

Public Leaderboard


Private Leaderboard


Top Ten!

• LB(CV )

• ( )


• GCP with R/Python

• Rmarkdown

• xgboost

• github

GCP: 1 instance

CV


LB

• ~30 submits used for LB probing

• LB

1. Cross-validation fold design

2. Train/test distribution checks

3. MCC threshold selection

1. Cross-validation fold design

• Lessons from Predicting Red Hat Business Value

Redhat

• CV

• CV CV ( )

• fold


• ID → assign CV folds grouped by ID, so related IDs stay in the same fold (a sketch follows)
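A sketch of that kind of fold design, using sklearn's GroupKFold as a stand-in for whatever grouping the team actually used (the grouping key below is hypothetical):

    import numpy as np
    from sklearn.model_selection import GroupKFold

    X = np.arange(20).reshape(10, 2)
    y = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 1])
    groups = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])  # hypothetical ID-based key

    # rows sharing a group ID never straddle the train/validation split
    for tr, va in GroupKFold(n_splits=5).split(X, y, groups):
        assert not set(groups[tr]) & set(groups[va])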


2. Train/test distribution checks

Checked whether train and test feature distributions match using QQ plots (sketch below)

• Station 32, 33: OK
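One way to draw such a comparison (a matplotlib sketch; the slides' actual plots are not reproduced here):

    import numpy as np
    import matplotlib.pyplot as plt

    def qq_compare(train_col, test_col, title):
        # quantile-quantile plot: points near the diagonal mean the
        # train and test distributions of this feature agree
        q = np.linspace(0.01, 0.99, 99)
        qt = np.nanquantile(train_col, q)
        qs = np.nanquantile(test_col, q)
        plt.scatter(qt, qs, s=8)
        lo, hi = min(qt.min(), qs.min()), max(qt.max(), qs.max())
        plt.plot([lo, hi], [lo, hi], linestyle="--", color="k")
        plt.xlabel("train quantiles"); plt.ylabel("test quantiles")
        plt.title(title)
        plt.show()

    qq_compare(np.random.normal(size=5000), np.random.normal(size=5000),
               "toy feature")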


3. MCC threshold selection

• Gaussian process + LB feedback

• MCC threshold tuning (a sketch of the basic search follows)
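The slide mentions a Gaussian process driven by LB feedback; the basic local ingredient is a cutoff search over out-of-fold predictions, sketched here on toy data:

    import numpy as np
    from sklearn.metrics import matthews_corrcoef

    def best_mcc_threshold(y_true, y_prob, n_grid=200):
        # scan probability cutoffs, keep the one maximizing MCC
        grid = np.linspace(y_prob.min(), y_prob.max(), n_grid)
        scores = [matthews_corrcoef(y_true, (y_prob > t).astype(int))
                  for t in grid]
        i = int(np.argmax(scores))
        return grid[i], scores[i]

    y = (np.random.rand(10000) < 0.006).astype(int)            # imbalanced toy labels
    p = np.clip(y * 0.3 + np.random.rand(10000) * 0.2, 0, 1)   # toy probabilities
    print(best_mcc_threshold(y, p))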


Feature engineering


• 25

• 3154


1. ID features

• the "magic feature" from the forum

2.

3.

• ID


• ID
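The preceding bullets point at ID-based features; the "magic feature" widely shared on the forum was, roughly, the Id gap to neighboring parts when rows are ordered by start time. A pandas sketch under that reading (toy data, hypothetical column names):

    import pandas as pd

    # toy stand-in: Id plus each part's earliest timestamp over its date columns
    df = pd.DataFrame({
        "Id": [4, 6, 7, 9, 11, 23],
        "start_time": [82.24, 1313.12, 1618.70, 1149.20, 602.64, 1331.66],
    })

    # order parts as they entered the line, then measure Id gaps to neighbors
    df = df.sort_values(["start_time", "Id"]).reset_index(drop=True)
    df["magic_prev_id_diff"] = df["Id"].diff().fillna(9999999)
    df["magic_next_id_diff"] = df["Id"].diff(-1).fillna(-9999999)
    print(df)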


Station 38

• Station 38 turned out to matter!!

• For each ID: whether the Station 38 values are NA


ID


• Encode each ID's NA pattern as a bitmap (17,017 distinct patterns)

• Features derived from the bitmap (sketch below)
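A sketch of the bitmap idea as I read it (toy columns; the slide reports ~17,017 distinct patterns on the real data):

    import pandas as pd

    # toy stand-in for the numeric table: NaN = the part skipped that test
    df = pd.DataFrame({
        "L0_S0_F0":     [1.2,  None, 0.5, None],
        "L0_S1_F4":     [None, 3.3,  0.1, None],
        "L3_S38_F3952": [None, None, 2.2, 2.4],
    })

    # one bit per column: 1 if measured, 0 if NA; the string identifies
    # the path the part took through the production line
    bitmap = df.notna().astype(int).astype(str).agg("".join, axis=1)
    print(bitmap.value_counts())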


• Stacking

• xgboost: base_margin / dart

• xgboost: GBDT features + Factorization Machines

• Custom objective

Stacking

• 2-level stacking

• 8 xgboost models stacked

• narrow-deep stacking: depth over width, like layers in deep learning (a scaled-down sketch follows)
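A scaled-down sketch of 2-level stacking with out-of-fold predictions (3 base models instead of 8; all parameters are illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

    # level 1: differently configured base models, predicted out-of-fold
    bases = [XGBClassifier(n_estimators=100, max_depth=d, random_state=s)
             for d, s in [(4, 0), (6, 1), (8, 2)]]
    oof = np.column_stack([
        cross_val_predict(m, X, y, cv=3, method="predict_proba")[:, 1]
        for m in bases])

    # level 2: a meta-model trained on the stacked level-1 predictions
    meta = LogisticRegression().fit(oof, y)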


xgboost

• base_margin

• dart (Dropouts meet Multiple Additive Regression Trees)


base_margin: start boosting from an existing model's predictions (the R xgboost API):

    learn <- xgb.DMatrix(...)
    base_margin <- qlogis(p)  # logit(p(y|x)) from a first-stage model
    setinfo(learn, 'base_margin', base_margin)
    m <- xgb.train(data = learn, ...)


base_margin

dart


Dart

• Dropouts meet Multiple Additive Regression Trees [1]

• Applies dropout to the boosted trees

• Drop rate: 0.5

[1] Rashmi, K. V., & Gilad-Bachrach, R. (2015). DART: Dropouts meet Multiple Additive Regression Trees. AISTATS.
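A minimal Python-API sketch of a dart booster on toy data; the slide only indicates dart with a drop rate around 0.5, so the remaining parameters are assumptions:

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(200, 5)
    y = (np.random.rand(200) < 0.1).astype(int)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        "booster": "dart",
        "objective": "binary:logistic",
        "rate_drop": 0.5,  # fraction of trees dropped each boosting round
        "max_depth": 6,
        "eta": 0.1,
    }
    bst = xgb.train(params, dtrain, num_boost_round=50)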


xgboost

• GBDT-features + Factorization Machines

• GBDT-features: the leaf index each sample reaches in every GBDT tree (sketch below)

• One-hot encoding → Factorization Machines

• libffm (OpenMP)

• only libFM
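A sketch of the GBDT-feature step on toy data (the FM training itself happens outside, in libFM/libffm):

    import numpy as np
    import xgboost as xgb
    from sklearn.preprocessing import OneHotEncoder

    X = np.random.rand(500, 10)
    y = (np.random.rand(500) < 0.1).astype(int)
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({"objective": "binary:logistic", "max_depth": 4},
                    dtrain, num_boost_round=30)

    # the leaf each sample lands in, one column per tree
    leaves = bst.predict(dtrain, pred_leaf=True).astype(int)

    # one-hot encode leaf indices; the sparse matrix is what would be
    # exported as input features for a factorization machine
    fm_input = OneHotEncoder().fit_transform(leaves)
    print(fm_input.shape)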


Custom objective

• Default objective: binary:logistic

smoothed-MCC

• Approximate MCC with a smooth function and plug it into xgboost as a custom objective, supplying the gradient and hessian (diagonal only); a sketch follows
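A sketch of what such an objective can look like: the hard confusion counts become sums of predicted probabilities, making MCC differentiable in the raw margins; the gradient below follows from the chain rule, while the hessian is a crude positive diagonal surrogate. This is my reconstruction of the idea, not the team's actual code:

    import numpy as np

    def smoothed_mcc_obj(preds, dtrain):
        # Soft confusion counts: replace hard 0/1 decisions with the
        # predicted probabilities, so MCC becomes differentiable.
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))      # sigmoid of the raw margins
        tp = np.sum(y * p)
        fp = np.sum((1 - y) * p)
        fn = np.sum(y * (1 - p))
        tn = np.sum((1 - y) * (1 - p))

        s1, s2, s3, s4 = tp + fp, tp + fn, tn + fp, tn + fn
        d = np.sqrt(s1 * s2 * s3 * s4) + 1e-12
        mcc = (tp * tn - fp * fn) / d

        # d(soft MCC)/dp_i via the chain rule on the four soft counts
        dnum = y * (tn + fp) - (1 - y) * (tp + fn)
        dmcc_dp = dnum / d - 0.5 * mcc * (1.0 / s1 - 1.0 / s4)

        grad = -dmcc_dp * p * (1 - p)          # minimize -softMCC
        hess = np.maximum(np.abs(grad), 1e-6)  # crude positive surrogate
        return grad, hess

    # usage: xgb.train(params, dtrain, num_boost_round=100, obj=smoothed_mcc_obj)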


• hskksk worked on Line 2, tkm on Line 0


• 3 fold 1

• MCC LB Feedback

• tkm g_votte

• LB Feedback

Public vs. Private

• tkm's submit: Public score / Private score

• Public: tkm

• submit

• mcc


Lessons from Kaggle

• When CV and LB disagree, trust CV

• fold design matters

• fold construction determines whether CV can be trusted


Lessons from Kaggle

• Accuracy vs. the confusion matrix

• MCC

• "think more, try less" [2]

[2] Owen Zhang (Kaggle)


Enjoy Kaggle!
