heuristic design of experiments w meta gradient search


Page 1: Heuristic design of experiments w meta gradient search

1

Heuristic Design of Experiments with Meta-Gradient Search

of Model Training Parameters

SF Bay ACM, Data Mining SIG, Feb 28, 2011
http://www.sfbayacm.org/?p=2464

[email protected]
www.LinkedIn.com/in/GregMakowski

Page 2: Heuristic design of experiments w meta gradient search

2

Choice is good…

But can be overwhelming

Page 3: Heuristic design of experiments w meta gradient search

Key Questions Discussed

• You (a data miner) have many algorithms or libraries you can use, with many choices…
  – How to stay organized among all the choices?
    • Algorithm parameters
    • Adjustments in Cost vs. Profit (Type I vs. II error bias)
    • Metric selection (Lift if acting on a top % vs. RMSE or ROC)
    • Ensemble Modeling, boosting, bagging, stacking
    • Data versions, preprocessing, trying new fields
  – How to plan, and learn as you go?
  – How simple should you stay, to keep descriptiveness vs. Occam's Razor?

3

Page 4: Heuristic design of experiments w meta gradient search

Outline

Model Training Parameters in SAS Enterprise Miner

Tracking Conservative Results in a "Model Notebook"
How to Measure Progress

Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt

How to Describe Any Complex System – Sensitivity Analysis

4

Page 5: Heuristic design of experiments w meta gradient search

Enterprise Miner
Sample Data Flow for a Project

5

(Boxes are expanded in later slides)

(Diagram: Stratified Sampling splits the data into Learning / Tuning and Validation sets.)

Page 6: Heuristic design of experiments w meta gradient search

Type I vs. II Error Weights
Profit-Loss Ratios

6

In the Data Source, NOT the Model Engines

In other software, may use a weight field

Need to stay organized regardless

Page 7: Heuristic design of experiments w meta gradient search

Regression

• It is always good to find the best linear solution early on
  – Like testing a null hypothesis: a (linear vs. non-linear) problem

• Can feed the "score" or "residual error" as a source field into non-linear models

7

Page 8: Heuristic design of experiments w meta gradient search

Neural Net Architecture and Parameters

8

(Figure: a scatter plot of two classes, "$" and "c", over field 1 vs. field 2, with the decision regions a neural net can carve out.)

A Neural Net Solution

"Non-Linear"

Several regions which are not adjacent

MLP

RBF

Page 9: Heuristic design of experiments w meta gradient search

A Comparison of a Neural Net and Regression

9

A Logistic regression formula:
Y = f( a0 + a1*X1 + a2*X2 + a3*X3 )

a* are coefficients

Backpropagation, cast in a similar form:
H1 = f( w0 + w1*I1 + w2*I2 + w3*I3 )
H2 = f( w4 + w5*I1 + w6*I2 + w7*I3 )
:
Hn = f( w8 + w9*I1 + w10*I2 + w11*I3 )

O1 = f( w12 + w13*H1 + .... + w15*Hn )
On = ....

w* are weights, AKA coefficients.
I1..In are input nodes or input variables.
H1..Hn are hidden nodes, which extract features of the data.
O1..On are the outputs, which group disjoint categories.
f() is the SIGMOID function, a non-linear "S" curve
(it is very noisy in the brain – chemical depletion of neurotransmitters)

(Diagrams: the regression drawn as inputs X1, X2, X3 feeding output Y with coefficients a0..a3; the neural net drawn as Input 1, I2, I3 plus a Bias feeding hidden nodes H1, Hidden 2, then an Output, with weights w1..w3 and a Direct connect from inputs to output.)
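To make the comparison concrete, here is a minimal sketch (not from the slides) of both forms in Python/NumPy: a logistic regression score and a one-hidden-layer network evaluated the same way. All weight values are made up for illustration.

import numpy as np

def sigmoid(z):
    # the non-linear "S" curve f()
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 2.0])                   # X1..X3 (also plays the role of I1..I3)

# Logistic regression: Y = f(a0 + a1*X1 + a2*X2 + a3*X3)
a0 = 0.1
a = np.array([0.4, -0.3, 0.2])                   # illustrative coefficients
y_regression = sigmoid(a0 + a @ x)

# One hidden layer: Hj = f(bias_j + sum_i w_ij * I_i), then O1 = f(v0 + sum_j v_j * H_j)
hidden_bias = np.array([0.1, -0.2])
W = np.array([[0.5, -0.4, 0.3],                  # weights into H1
              [-0.2, 0.6, 0.1]])                 # weights into H2
h = sigmoid(hidden_bias + W @ x)                 # hidden nodes extract features of the data
v0 = 0.05
v = np.array([0.7, -0.5])
y_neural = sigmoid(v0 + v @ h)

print(y_regression, y_neural)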

Page 10: Heuristic design of experiments w meta gradient search

Neural Net

• Network Architecture can be linear (MLP) or circular (many RBF)

• Network Direct Connection allows inputs to connect to the output (to find the simple, linear solution first)

• Network Hidden Units can go up to 64 (much better than 8)

• Profit/Loss uses settings in the Data Source

10

Page 11: Heuristic design of experiments w meta gradient search

11

What does a Decision Tree Look Like?

(Figure: a scatter plot of "$" and "c" records over Age vs. Income, partitioned by Split 1 on Age and Splits 2 and 3 on Income into Leaf 1 .. Leaf 4; below it, the same tree drawn with Split 1 at the root, Splits 2 and 3 beneath it, and Leaves 1–4 at the bottom. Tree Depth = 2.)

If (Age < Split1) then:
  … If (Income > Split2) then Leaf1 with dollar_avg1
  … If (Income < Split2) then Leaf2 with dollar_avg2
If (Age > Split1) then:
  … If (Income > Split3) then Leaf3 with dollar_avg3
  … If (Income < Split3) then Leaf4 with dollar_avg4
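The rule list above translates directly into code. A small illustrative sketch, with hypothetical split points and leaf dollar averages (the slide leaves those values unspecified):

def tree_depth_2(age, income,
                 split1=40, split2=50_000, split3=80_000,        # hypothetical split points
                 leaf_avgs=(120.0, 35.0, 300.0, 60.0)):          # hypothetical dollar averages
    """Score one record with the depth-2 tree from the diagram above."""
    if age < split1:
        return leaf_avgs[0] if income > split2 else leaf_avgs[1]   # Leaf 1 / Leaf 2
    else:
        return leaf_avgs[2] if income > split3 else leaf_avgs[3]   # Leaf 3 / Leaf 4

print(tree_depth_2(age=35, income=62_000))    # lands in Leaf 1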

Page 12: Heuristic design of experiments w meta gradient search

Decision Tree

• Primary Parameters to vary
  – Criterion
    • Probchisq (Default)
    • Entropy
    • Gini
  – Assessment (Decision vs. Lift)
  – Tree size (depth, leaf size, Xvalid)

12
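The same knobs exist outside Enterprise Miner. A hedged scikit-learn sketch on synthetic data (scikit-learn offers gini and entropy but has no probchisq criterion, and lift-based assessment would be computed separately):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a project data set with 27 offered variables
X, y = make_classification(n_samples=2000, n_features=27, random_state=0)

tree = DecisionTreeClassifier(
    criterion="entropy",      # or "gini"; SAS's probchisq has no direct equivalent here
    max_depth=6,              # tree depth
    min_samples_leaf=100,     # leaf size
    random_state=42,          # keep the experiment reproducible
).fit(X, y)

print(tree.get_depth(), tree.get_n_leaves())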

Page 13: Heuristic design of experiments w meta gradient search

Gradient Boosting (Tree Based)

Based on "Greedy Function Approximation: A Gradient Boosting Machine" by Jerome Friedman

Each new CART tree:
• is built on a 60% random sample
• is a small, general tree
• forecasts the error from the summed forecast of all previous trees
• May have 50 to 2,000 trees in a sequence
• Evaluate how far "back" in the sequence to prune

13
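A sketch of the same recipe with scikit-learn's gradient boosting on synthetic data: the 60% random sample per tree maps to subsample=0.6, and staged predictions give one way to judge how far back in the sequence to prune.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=5000, n_features=20, noise=10.0, random_state=0)

gbm = GradientBoostingRegressor(
    n_estimators=500,      # 50 to 2,000 small trees in a sequence
    max_depth=2,           # each tree is small and general
    subsample=0.6,         # each tree is fit on a 60% random sample
    learning_rate=0.07,    # shrinkage
    random_state=0,
).fit(X, y)

# Each new tree forecasts the residual error of the summed previous trees.
# staged_predict gives the forecast after 1, 2, ... trees, one way to judge how far
# "back" in the sequence to prune (in practice, score a hold-out set here, not X).
errors = [np.mean((y - pred) ** 2) for pred in gbm.staged_predict(X)]
best_n_trees = int(np.argmin(errors)) + 1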

Page 14: Heuristic design of experiments w meta gradient search

DM Algorithms Available in Packages

14

# Modules per Forecasting Family in DM Software
(one row per software package; the package names appear only in the original figure)

Regression  Lasso Reg  Decision Tree  Neural Net  Support Vector Mach  Other  | TOT
    2           1            0            0               0              1   |   4
    0           0            1            0               0              0   |   1
    3           0            3            3               0              3   |  12
    1           0            1            0               1              1   |   4
    0           0            4            0               0              0   |   4
    3           2            5            3               2              3   |  18
    0           0            0            0               0              5   |   5

Page 15: Heuristic design of experiments w meta gradient search

Feel Overwhelmed by Lots of Complex Algorithm Parameters? GOOD!

• A deep understanding of algorithms, math and assumptions helps significantly – Heuristics
  – i.e. typically, regression has a problem with correlating inputs, because the solution calculation uses matrix inversion (if you are worried about weight sign inversion)
  – SVMs or Bayesian Nets do not have this problem, because they are solved differently.
    • If you don't have a problem with correlating inputs, input selection becomes more random – but you still get a decent solution

• How can you manage the details?
  – I am glad you asked…. moving on to the next section

15

Page 16: Heuristic design of experiments w meta gradient search

Outline

Model Training Parameters in SAS Enterprise Miner

Tracking Conservative Results in a "Model Notebook"
How to Measure Progress

Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt

How to Describe Any Complex System – Sensitivity Analysis

16

Page 17: Heuristic design of experiments w meta gradient search

Model Exploration Process

• Scientific Method of Hypothesis Test
  – If you change ONE thing, then any change in the results is because of that one change
  – Design of Experiments (DOE), test plan
  – Best to compare model settings on the same data version
    • New data versions add new preprocessed fields, or new months (records)
  – Key design objective: all experiments are reproducible
    • SAME random split between Learning – Test – Validation, with a consistent random seed
    • LTV split before loading data into a tool, so the same partitioning is used for all tools/libraries/algorithms
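One way to make the split reproducible is to assign the Learning / Tuning / Validation flag once, from a fixed seed, and store it with the data before any tool loads it. A minimal sketch, assuming a pandas DataFrame with one row per record (file names are hypothetical):

import numpy as np
import pandas as pd

def assign_ltv(df: pd.DataFrame, seed: int = 20110228,
               frac=(0.6, 0.2, 0.2)) -> pd.DataFrame:
    """Add a persistent Learning / Tuning / Validation flag so every tool sees the same split."""
    rng = np.random.default_rng(seed)                # the consistent random seed
    u = rng.random(len(df))                          # one uniform draw per record
    out = df.copy()
    out["ltv"] = np.where(u < frac[0], "learn",
                 np.where(u < frac[0] + frac[1], "tune", "validate"))
    return out

# usage sketch:
# df = assign_ltv(pd.read_csv("project_data.csv"))
# df.to_csv("project_data_with_ltv.csv", index=False)   # the split travels with the data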

Page 18: Heuristic design of experiments w meta gradient search

Model Notebook

18

Input Parameters | Outcomes: Lift in Top 10% (Train, Val, Gap = Abs(Trn - Val), Consrv Result)

DataVer  Algor   ModNum  vars offerd  var selct  Vars Seltd  TrnTime  Train  Val    Gap   Consrv Result
1        Regrsn  1       27           stepw      9           12       5.77   5.94   0.17  5.60

DataVer  Algor   ModNum  vars offerd  Hidn Nodes  Direct Conn  Arch  VarSel  TrnTime  Train  Val    Gap   Consrv Result
1        Neural  1       27           3           n            MLP   all     77       6.65   10.89  4.24  2.41
1        Neural  2       27           10          n            MLP   all     40       6.88   6.73   0.15  6.58
1        Neural  3       27           10          Y            MLP   all     36       6.40   6.93   0.53  5.87
1        Neural  4       27           10          n            RBF   all     34       5.67   5.54   0.13  5.41
1        Neural  5       27           10          Y            RBF   all     35       5.95   7.92   1.97  3.98

Bad vs. Good

Page 19: Heuristic design of experiments w meta gradient search

Model Notebook
Outcome Details

• My Heuristic Design Objectives: (yours may be different)
  – Accuracy in deployment
  – Reliability and consistent behavior, a general solution
    • Use one or more hold-out data sets to check consistency
    • Penalize more, as the forecast becomes less consistent
  – No penalty for model complexity (if it validates consistently)
    • Let me drive a car to work, instead of limiting me to a bike
  – Message for the check writer
  – Don't consider only Occam's Razor: value consistent good results
  – Develop a "smooth, continuous metric" to sort and find models that perform "best" in future deployment

19

Page 20: Heuristic design of experiments w meta gradient search

Model Notebook
Outcome Details

• Training = results on the training set
• Validation = results on the validation hold-out
• Gap = abs( Training – Validation )
  – A bigger gap (volatility) is a bigger concern for deployment, a symptom
  – Minimize Senior VP heart attacks! (one penalty for volatility)
  – Set expectations & meet expectations
  – Regularization helps significantly
• Conservative Result = worst( Training, Validation ) + Gap_penalty
  – Corr / Lift / Profit, higher is better:  Cons Result = min(Trn, Val) - Gap
  – MAD / RMSE / Risk, lower is better:  Cons Result = max(Trn, Val) + Gap

Business Value or Pain ranking = function of( conservative result )

20
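A small sketch of that calculation in Python; higher_is_better covers Corr / Lift / Profit metrics, the other branch covers MAD / RMSE / Risk. The example values are the Neural 1 and Neural 2 rows from the Model Notebook.

def conservative_result(train: float, val: float, higher_is_better: bool = True) -> float:
    """Penalize the train/validation gap, then keep the worse side."""
    gap = abs(train - val)
    if higher_is_better:             # Corr / Lift / Profit
        return min(train, val) - gap
    else:                            # MAD / RMSE / Risk
        return max(train, val) + gap

# Lift in the top 10% for Neural 1 vs. Neural 2 from the Model Notebook:
print(conservative_result(6.65, 10.89))   # 2.41 – volatile, penalized
print(conservative_result(6.88, 6.73))    # 6.58 – consistent, preferred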

Page 21: Heuristic design of experiments w meta gradient search

Model Notebook

21

Input Parameters | Outcomes: Lift in Top 10% (Train, Val, Gap = Abs(Trn - Val), Consrv Result)

DataVer  Algor   ModNum  vars offerd  var selct  Vars Seltd  TrnTime  Train  Val    Gap   Consrv Result
1        Regrsn  1       27           stepw      9           12       5.77   5.94   0.17  5.60

DataVer  Algor   ModNum  vars offerd  Hidn Nodes  Direct Conn  Arch  VarSel  TrnTime  Train  Val    Gap   Consrv Result
1        Neural  1       27           3           n            MLP   all     77       6.65   10.89  4.24  2.41
1        Neural  2       27           10          n            MLP   all     40       6.88   6.73   0.15  6.58
1        Neural  3       27           10          Y            MLP   all     36       6.40   6.93   0.53  5.87
1        Neural  4       27           10          n            RBF   all     34       5.67   5.54   0.13  5.41
1        Neural  5       27           10          Y            RBF   all     35       5.95   7.92   1.97  3.98

Bad vs. Good

Page 22: Heuristic design of experiments w meta gradient search

Model Notebook Process
Tracking Detail – Training the Data Miner

(Detail table columns: DataVer, Author, Algor, ModNum, chng from prior, Param 1 .. Param 7 / vars offered / var selectn, VarSel, Trn time (sec); then Train, Val, Gap = Abs(Trn - Val), and Consrv Result, repeated for each of the three outcome groups. The raw spreadsheet rows follow.)

1 GM B logistic 1 0 27 stepws 10 12.04 8.12 3.92 4.20 7.59 4.85 2.74 2.111 GM B logistic 2 1 19 stepws 10 12.04 8.12 3.92 4.20 7.59 4.85 2.74 2.111 GM B logistic 3 1 6, no dbc stepws 4 7.51 1.98 5.53 -3.55 4.90 3.96 0.94 3.02 investigate inconsistency

1 GM B logistic 4 113, only

dbcstepws 7 9.58 7.33 2.25 5.08 6.59 5.25 1.34 3.91

DataVer

Author

AlgorModNum

chngfromprior

varsoffered

regr typevar

selectn2-factorinteract

polynomVarSel

TrnTime

Train Val GapConsrvResult

Train Val GapConsrvResult

Train Val GapConsrvResult

1 GM regr 1 0 27 logistic stepws n 9 12 5.77 5.94 0.17 5.60 3.35 4.46 1.11 2.24 2.25 3.02 0.77 1.481 GM regr 2 1 27 logistic stepws Yes 9 16 5.76 5.94 0.18 5.58 3.35 4.46 1.11 2.24 2.25 3.02 0.77 1.481 GM regr 3 1 27 logistic stepws n 2 10 57 5.86 6.93 1.07 4.79 3.48 5.03 1.55 1.93 2.32 2.61 0.29 2.031 GM regr 4 1 27 logistic stepws Yes 2 11 58 5.86 6.93 1.07 4.79 3.48 5.04 1.56 1.92 2.32 2.92 0.60 1.72

4 GM regr 5 4 3 logistic stepwise Yes 2 8 63 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.434 GM regr 6 5 28 logistic stepwise Yes 24 GM regr 7 5 3 logistic stepwise n 2 63 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.434 GM regr 8 5 3 logistic stepwise n 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.434 GM regr 9 5 3 logistic stepwise Yes 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.434 GM regr 10 8 28 logistic stepwise n 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.434 GM regr 11 5 3 logistic stepwise Yes 3 6 78 15.98 16.06 0.08 15.89 8.61 8.03 0.58 7.45 4.81 4.39 0.41 3.984 GM regr 12 5 3 logistic stepwise Yes 4 2 78 15.98 16.06 0.08 15.89 8.61 8.03 0.58 7.45 4.81 4.39 0.41 3.98

4n GM regr 13 11 3 logistic stepwise Yes 3 6 78 18.39 18.79 0.39 18.00 9.58 9.55 0.03 9.52 4.96 4.92 0.03 4.89

4n GM regr 14 11 3 6 78 12.49 12.12 0.36 11.76 7.63 7.42 0.20 7.22 4.29 4.47 0.18 4.12

1 GM DM Regr 1 0 27 logistic stepws 13 15 12.00 3.17 8.83 -5.66 7.21 4.16 3.05 1.11 4.28 3.07 1.21 1.86

4 GM DM Regr 2 0 28max v3000

min rsq0.005

useaov16 var

YES6 72 16.27 15.76 0.52 15.24 8.67 8.03 0.64 7.39 4.58 4.24 0.34 3.90

1 GM PLS 1 01 GM PLS 2 1 27 default default default default 4 18 11.26 3.08 8.18 -5.10 7.12 4.85 2.27 2.58 4.28 3.12 1.16 1.961 GM PLS 3 1 Test Set Cros Val didn't finish, don't use Xvalidation4 GM PLS 4 0 28 PLS NIPALS 200 28 122 16.63 15.76 0.87 14.89 8.93 8.03 0.90 7.13 4.76 4.32 0.45 3.87

DataVer

Author

AlgorModNum

chngfromprior

varsoffered

hiddenDirectConn ?

archVarSel

TrnTime

Train Val GapConsrvResult

Train Val GapConsrvResult

Train Val GapConsrvResult

1 GM AutoNrl 1 0 27 2 n MLP all 35 4.19 3.76 0.43 3.33 2.47 2.57 0.10 2.37 1.77 1.88 0.11 1.661 GM AutoNrl 2 1 27 6 n MLP all 189 4.37 2.77 1.60 1.17 2.82 1.78 1.04 0.74 1.98 1.93 0.05 1.88

1 GM AutoNrl 3 1 27 8 n MLPtrn action= search

all 532 0.83 0.56 0.27 0.29 0.83 0.56 0.27 0.29 0.83 0.56 0.27 0.29

1 GM AutoNrl 4 1 27 8 n MLPactiv =logistic

all 356 5.12 2.97 2.15 0.82 3.02 3.37 0.35 2.67 1.90 2.57 0.67 1.23

1 GM AutoNrl 5 1 27 6 n MLParch =block

all 130 0.89 0.97 0.08 0.81

1 GM AutoNrl 6 1 27 6 n MLParch =funnel

all 595 1.36 1.08 0.28 0.80

4 GM AutoNrl 7 1 28 6 n MLP all 1201 16.2722 15.76 0.51 15.24 8.65 7.88 0.77 7.11 4.46 4.24 0.22 4.03

DataVer

Author

AlgorModNum

chngfromprior

varsoffered

hiddenDirectConn ?

arch DecayDecisionWeight

VarSel

TrnTime

Train Val GapConsrvResult

Train Val GapConsrvResult

Train Val GapConsrvResult

1 GM Neural 1 0 27 3 n MLP all 77 6.65 10.89 4.24 2.41 3.90 6.53 2.63 1.27 2.52 3.96 1.44 1.081 GM Neural 2 1 27 10 n MLP all 40 6.88 6.73 0.15 6.58 3.97 4.55 0.58 3.39 2.56 3.02 0.46 2.101 GM Neural 3 1 27 10 Y MLP all 36 6.40 6.93 0.53 5.87 3.49 5.45 1.96 1.53 2.32 3.22 0.90 1.421 GM Neural 4 1 27 10 n RBF (orbfeq) all 34 5.67 5.54 0.13 5.41 3.25 4.85 1.60 1.65 2.20 3.22 1.02 1.181 GM Neural 5 1 27 10 Y RBF all 35 5.95 7.92 1.97 3.98 3.48 4.85 1.37 2.11 2.31 3.17 0.86 1.45

js1 JS Neural 6 0 17 5 n MLP Softmax 10,-5,-1,0 all 6.03 6.53 0.50 5.53 3.40 4.55 1.15 2.25 2.67 3.36 0.69 1.98js1 JS Neural 7 6 15 5 Y MLP Softmax 10,-5,-1,0 all 6.14 5.74 0.40 5.34 3.59 2.97 0.62 2.35 2.77 2.37 0.40 1.97js1 JS Neural 8 6 15 3 Y MLP Softmax 0.5 10,-5,-1,0 all 6.27 7.13 0.86 5.41 3.54 3.56 0.02 3.52 2.74 2.57 0.17 2.40js1 JS Neural 9 6 15 3 n MLP Softmax 0.5 10,-5,-1,0 all 6.27 6.33 0.06 6.21 3.57 4.65 1.08 2.49 2.76 2.82 0.06 2.702 GM Neural 10 2 35 12 Y MLP 20,0,-1,0 all3 GM Neural 11 2 45 20 n MLP 20,0,-1,0 all 18 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.913 GM Neural 12 11 45 20 n MLP 0.8 20,0,-1,0 all 16 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.913 GM Neural 13 11 45 20 n MLP 0.6 20,0,-1,0 all 16 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.914 GM Neural 14 11 3 20 n MLP 0.01 20,0,-1,0 all 204 16.39 15.15 1.24 13.91 8.67 8.03 0.64 7.39 4.82 4.39 0.43 3.974 GM Neural 15 11 28 20 n MLP 0.01 20,0,-1,0 all 713 16.39 15.76 0.63 15.12 8.54 7.88 0.66 7.22 4.40 4.25 0.15 4.114 GM Neural 16 15 31 40 n MLP 0.01 20,0,-1,0 all 782 18.02 18.18 0.16 17.86 9.21 9.55 0.34 8.87 4.60 4.77 0.17 4.444 GM Neural 17 15 same, max iter 20 --> 50 all 1754 18.02 18.18 0.16 17.86 9.21 9.55 0.34 8.87 4.66 4.77 0.11 4.55

4 GM Neural 18 1629 (notwoYr)

40 0 0 all 18.386 18.98 18.18 0.80 17.38 9.25 9.59 0.34 8.90 4.67 4.86 0.20 4.47

4n GM DMNeural 19 0 13 3 n all 19 10.60 2.57 8.03 -5.46 6.93 4.36 2.57 1.79 4.14 2.57 1.57 1.00

didn't finish, out of memory

recent_serrtrn_dbc changed to recent_serrtrn_flag (does DBC on ser patt help? YES)

add Feb & Mar to recent*

same, max iter 20 --> 50

Model Notebook – Project = Transit, Last Update 5/6/2010

Outcomes: Status, Lift in Top 10% Over File Avg

Outcomes: Lift in Top 20% Over File Avg

Input Parameters | Outcomes: Lift in Top 5% Over File Avg

Input / Test Outcome

Regression

Top5%

Top10%

Top20%

AutoNeural

Neural

Yippeee!

More

Heuristic Strategy:
1) Try a few models of many algorithm types (seed the search)
2) Opportunistically spend more effort on what is working (invest in top stocks)
3) Still try a few trials on medium success (diversify, limited by the project time-box)
4) Try ensemble methods, combining model forecasts & top source vars w/ the model

The Data Mining Battle Field

Page 23: Heuristic design of experiments w meta gradient search

Model Notebook Process
Tracking Detail – Training the Data Miner

(Detail table columns: Mcnt, DataVer, Author, Algor, ModNum, chng from prior, vars offered, criterion, max depth, leaf size, assess = 5% Lift, Decision Weight, VarSel, Trn Time; then Train, Val, Gap = Abs(Trn - Val), and Consrv Result, repeated for each of the three outcome groups. The raw spreadsheet rows follow.)

47 1 GM Dec Tree 1 0 27 default 6 5 20,0,-5,0 7 13 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.2748 1 GM Dec Tree 2 1 27 probchisq 6 5 20,0,-5,0 7 16 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.2749 1 GM Dec Tree 3 1 27 entropy 6 5 20,0,-5,0 6 16 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9150 1 GM Dec Tree 4 1 27 gini 6 5 20,0,-5,0 10 22 13.76 11.28 2.48 8.80 7.70 6.10 1.60 4.50 4.32 3.71 0.61 3.1051 1 GM Dec Tree 5 3 27 entropy 12 5 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9152 1 GM Dec Tree 6 3 27 entropy 6 10 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9153 1 GM Dec Tree 7 3 27 entropy 6 100 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9154 1 GM Dec Tree 8 3 27 entropy 6 100 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.5455 1 GM Dec Tree 9 3 27 entropy 6 5 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54

56 1 GM Dec Tree 10 3 27 entropy 6 5obs

import =Y

20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91

57 1 GM Dec Tree 11 3 27 entropy 6 5asses =5% Lift

20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91

58 1 GM Dec Tree 12 3 27 entropy 10 2 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9146 2 GM Dec Tree 13 3 33 entropy 6 5 a=5% lift 20,0,-5,0 7 16 15.92 14.96 0.96 14.00 8.29 7.84 0.45 7.39 4.40 4.17 0.23 3.9447 2 GM Dec Tree 14 13 33 entropy 6 5 a=5% lift 10,-2.5,-1,0 13 15 16.32 15.05 1.27 13.78 9.07 8.00 1.07 6.93 4.63 4.08 0.55 3.5348 2 GM Dec Tree 15 13 33 entropy 6 5 a=5% lift 1,-1,1,-1 8 15 15.30 14.34 0.96 13.38 7.98 7.53 0.45 7.08 4.25 4.05 0.20 3.8549 2 GM Dec Tree 16 13 33 entropy 6 5 a=5% lift 10,-1,1,-1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.8450 2 GM Dec Tree 17 13 33 entropy 6 5 a=5% lift 20,-5,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.9551 2 GM Dec Tree 18 13 33 entropy 6 5 a=5% lift 20,-1,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.9552 2 GM Dec Tree 19 13 33 entropy 6 5 a=5% lift xval = no 20,0,-1,0 6 15 15.87 15.52 0.35 15.17 8.26 8.12 0.14 7.98 4.40 4.32 0.08 4.2453 2 GM Dec Tree 20 13 33 entropy 6 5 a=5% lift 20,-5,-1,1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.8454 2 GM Dec Tree 21 13 33 entropy 6 5 a=5% lift xval = no 20,0,0,1 9 16 16.17 15.57 0.60 14.97 8.74 8.25 0.49 7.76 4.44 4.21 0.23 3.9855 2 GM Dec Tree 22 19 33 gini 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.1256 2 GM Dec Tree 23 19 33 probchisq 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.1257 2 GM Dec Tree 24 19 33 entropy 20 5 a=5% lift 20,0,-1,0 19 26 18.94 15.42 3.52 11.90 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.2258 2 GM Dec Tree 25 19 33 entropy 20 20 a=5% lift 20,0,-1,0 19 26 18.94 13.80 5.14 8.66 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.2259 2 GM Dec Tree 26 19 33 entropy 20 40 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.0560 2 GM Dec Tree 27 19 33 entropy 20 60 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.0561 2 GM Dec Tree 28 19 33 entropy 7 5 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.5262 2 GM Dec Tree 29 19 33 entropy 7 10 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.5263 2 GM Dec Tree 30 19 33 entropy 7 20 a=5% lift 20,0,-1,0 7 37 16.04 14.66 1.38 13.28 8.35 7.69 0.66 7.03 4.41 4.07 0.34 3.73

64 2 GM Dec Tree 31 19 35 entropy 7 40 a=5% lift itmledratioitm_to_led

20,0,-1,0 7 36 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14

65 2 GM Dec Tree 32 19 35 entropy 7 60 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.1466 2 GM Dec Tree 33 19 35 entropy 7 80 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.1467 2 GM Dec Tree 34 19 35 entropy 7 100 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.1468 2 GM Dec Tree 35 19 35 entropy 7 150 a=5% lift 20,0,-1,0 5 37 14.53 13.08 1.45 11.63 7.75 7.19 0.56 6.63 4.36 4.29 0.07 4.2264 2 GM Dec Tree 36 19 35 entropy 6 5 a=5% lift 20,0,-1,0 7 29 15.91 14.95 0.96 13.99 8.29 7.83 0.46 7.37 4.40 4.17 0.23 3.94

ex=20knode smp

= 30k

65 2 GM Dec Tree 37 1914, raw

onlyentropy 6 5 a=5% lift 0 20,0,-1,0 7 16 13.92 11.81 2.11 9.69 7.46 6.54 0.93 5.61 4.24 3.91 0.33 3.57

5.28 2.15 0.41

66 3 GM Dec Tree 38 19 45 entropy 8 5 a=5% lift xval = no 20,0,-5,1 3 39 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.5867 3 GM Dec Tree 39 38 45 gini 8 5 a=5% lift xval = no 20,0,-5,1 3 71 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.5868 3 GM Dec Tree 40 38 45 propchi 8 5 a=5% lift xval = no 20,0,-5,1 3 42 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.5869 3 GM Dec Tree 41 38 45 entropy 20 5 a=5% lift subtr= 20,0,-5,1 33 91 20.00 14.81 5.19 9.61 10.00 7.54 2.46 5.08 5.00 3.90 1.10 2.8070 3 GM Dec Tree 42 38 45 entropy 20 100 a=5% lift sub=lrg 20,0,-5,1 25 70 19.09 16.25 2.84 13.42 10.00 8.17 1.83 6.35 5.00 4.19 0.81 3.3871 3 GM Dec Tree 43 38 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 23 64 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.6772 3 GM Dec Tree 44 38 45 entropy 20 400 a=5% lift sub=lrg 20,0,-5,1 21 59 15.87 17.08 1.21 14.67 9.02 8.96 0.06 8.89 4.97 4.69 0.28 4.4173 3 GM Dec Tree 45 38 45 entropy 20 800 a=5% lift sub=lrg 20,0,-5,1 16 52 14.35 16.16 1.81 12.53 8.46 8.96 0.50 7.96 4.78 4.79 0.01 4.7874 3 GM Dec Tree 46 38 45 entropy 20 1600 a=5% lift sub=lrg 20,0,-5,1 16 47 14.25 16.02 1.78 12.47 8.26 8.59 0.34 7.92 4.58 4.42 0.17 4.2575 3 GM Dec Tree 47 38 45 entropy 20 3200 a=5% lift sub=lrg 20,0,-5,1 10 39 12.45 14.35 1.91 10.54 7.49 8.31 0.82 6.67 4.36 4.48 0.12 4.2476 3 GM Dec Tree 48 43 45 entropy 20 150 a=5% lift sub=lrg 20,0,-5,1 23 68 18.57 16.25 2.32 13.93 10.00 8.14 1.86 6.27 5.00 4.17 0.83 3.3477 3 GM Dec Tree 49 43 45 entropy 20 300 a=5% lift sub=lrg 20,0,-5,1 23 62 16.45 17.86 1.41 15.03 9.31 8.96 0.35 8.61 5.00 4.60 0.40 4.2078 3 GM Dec Tree 50 43 45 entropy 20 250 a=5% lift sub=lrg 20,0,-5,1 24 65 16.64 17.71 1.07 15.57 9.56 8.96 0.60 8.36 5.00 4.61 0.39 4.2179 3 GM Dec Tree 51 43 45 entropy 20 350 a=5% lift sub=lrg 20,0,-5,1 24 67 16.07 17.50 1.43 14.64 9.19 8.96 0.23 8.73 5.00 4.59 0.41 4.1880 3 GM Dec Tree 52 43 45 entropy 20 225 a=5% lift sub=lrg 20,0,-5,1 23 63 17.85 16.67 1.18 15.49 9.83 8.96 0.87 8.09 5.00 4.53 0.48 4.0581 3 GM Dec Tree 53 43 45 entropy 20 175 a=5% lift sub=lrg 20,0,-5,1 26 68 18.15 16.25 1.90 14.35 9.97 8.13 1.84 6.28 5.00 4.16 0.84 3.3282 3 GM Dec Tree 54 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5.0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.6783 3 GM Dec Tree 55 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-1,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.6784 3 GM Dec Tree 56 43 45 entropy 20 200 a=5% lift sub=lrg 20,-5,0,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67

85 4 GM Dec Tree 57 43 146 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 9 149 20.00 14.09 5.91 8.19 10.00 7.20 2.80 4.40 5.00 3.76 1.24 2.5186 4 GM Dec Tree 58 57 107 18 115 20.00 16.09 3.91 12.18 10.00 8.15 1.85 6.29 5.00 4.18 0.82 3.3587 4 GM Dec Tree 59 57 107 entropy 20 500 a=5% lift sub=lrg 20,0,-5,1 13 110 19.46 14.79 4.68 10.11 10.00 7.64 2.36 5.29 5.00 3.95 1.05 2.9188 4 GM Dec Tree 60 57 107 entropy 20 1000 a=5% lift sub=lrg 20,0,-5,1 10 89 18.94 14.47 4.47 10.00 10.00 7.44 2.56 4.88 5.00 3.86 1.14 2.7389 4 GM Dec Tree 61 57 107 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 81 14.41 13.91 0.50 13.41 9.54 8.02 1.51 6.51 6.61 4.25 2.36 1.9090 4 GM Dec Tree 62 57 107 entropy 20 3000 a=5% lift sub=lrg 20,0,-5,1 5 71 9.89 7.91 1.98 5.94 8.74 6.39 2.35 4.04 5.00 3.70 1.30 2.4091 4 GM Dec Tree 63 57 107 entropy 20 1500 a=5% lift sub=lrg 20,0,-5,1 9 60 16.17 14.66 1.50 13.16 9.89 8.18 1.71 6.47 5.00 3.38 1.62 1.7692 4 GM Dec Tree 64 57 107 entropy 20 1750 a=5% lift sub=lrg 20,0,-5,1 7 60 15.23 14.32 0.92 13.40 9.68 8.07 1.61 6.46 5.00 4.26 0.75 3.5193 4 GM Dec Tree 65 57 107 entropy 20 2250 a=5% lift sub=lrg 20,0,-5,1 5 60 15.43 11.00 4.43 6.56 9.55 6.30 3.25 3.05 5.00 3.70 1.30 2.4094 4 GM Dec Tree 66 61 58 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 105 14.07 13.92 0.15 13.77 8.45 7.88 0.57 7.30 4.74 4.02 0.73 3.2995 4 GM Dec Tree 67 61 80 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 97 14.25 13.94 0.30 13.64 9.25 7.88 1.37 6.51 5.00 4.25 0.75 3.4996 4 GM Dec Tree 68 61 103 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 103 14.41 13.72 0.69 13.03 9.54 8.02 1.52 6.50 5.00 4.25 0.75 3.50

97 4n GM Dec Tree 69 61 3 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 7 14.61 15.54 0.93 13.68 8.83 8.99 0.16 8.67 4.88 4.73 0.15 4.5898 4n GM Dec Tree 70 0 20 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 10 11.50 11.12 0.38 10.74 7.08 7.29 0.21 6.87 4.24 3.94 0.30 3.64

use RAW vars ONLY, to test value of my preprocessing

Mcnt

DataVer

Author

AlgorModNum

chngfromprior

binarymodel

cleanupmodel

max numrips

VarSel

TrnTime

Train Val GapConsrvResult

Train Val GapConsrvResult

Train Val GapConsrvResult

94 1 GM Rule Ind 1 0 tree neural 16 32 10.77 9.92 0.85 9.07 6.28 5.60 0.68 4.92 3.35 3.09 0.26 2.8395 1 GM Rule Ind 2 1 regr neural 16 36 5.95 7.52 1.57 4.38 3.55 4.85 1.30 2.25 2.35 3.17 0.82 1.5396 1 GM Rule Ind 3 1 neural tree 16 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.3797 1 GM Rule Ind 4 3 neural tree 4 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.3798 1 GM Rule Ind 5 3 neural tree 32 121 5.95 7.92 1.97 3.98 3.53 5.64 2.11 1.42 2.34 3.32 0.98 1.3699 1 GM Rule Ind 6 1 tree neural 32 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75

100 1 GM Rule Ind 7 1 tree neural 4 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75

101 3 GM Rule Ind 8 1 tree neural 16 107 12.29 14.45 2.16 10.13 6.35 7.37 1.02 5.32 3.38 3.83 0.45 2.92102 3 GM Rule Ind 9 1 tree neural 8 107 12.29 14.45 2.16 10.13 6.35 7.37 1.02 5.32 3.38 3.83 0.45 2.92103 3 GM Rule Ind 10 1 tree neural 32 107 12.29 14.45 2.16 10.13 6.35 7.37 1.02 5.32 3.38 3.83 0.45 2.92

Mcnt

DataVer

Author

AlgorModNum

chngfromprior

N Iter Shrinkmax

depthPerf

Huber M-Regr

DecisionWeight

VarSel

TrnTime

Train Val GapConsrvResult

Train Val GapConsrvResult

Train Val GapConsrvResult

104 1 GM Grad Boost 1 0 50 0.10 2 disk No 20,0,-5,0 6 12 11.42 12.61 1.19 10.23 7.57 6.96 0.61 6.35 4.29 4.00 0.29 3.71105 1 GM Grad Boost 2 1 50 0.10 2 RAM No 20,0,-5,0 6 59 11.42 12.61 1.19 10.23 7.57 6.96 0.61 6.35 4.29 4.00 0.29 3.71106 1 GM Grad Boost 3 1 200 0.10 2 disk No 20,0,-5,0 19 429 13.77 12.67 1.10 11.57 8.28 7.23 1.05 6.18 4.53 4.06 0.47 3.59107 1 GM Grad Boost 4 1 50 0.10 4 disk No 20,0,-5,0 18 139 11.69 12.67 0.98 10.71 7.67 6.86 0.81 6.05 4.30 4.01 0.29 3.72108 1 GM Grad Boost 5 1 200 0.10 4 disk No 20,0,-5,0 13 401 11.69 12.67 0.98 10.71 7.67 6.86 0.81 6.05 4.30 4.01 0.29 3.72109 1 GM Grad Boost 6 1 1000 0.10 2 disk No 20,0,-5,0 13 2003 11.60 12.67 1.07 10.53 7.67 6.86 0.81 6.05 4.30 4.01 0.29 3.72110 1 GM Grad Boost 7 3 200 0.05 2 disk No 20,0,-5,0 16 367 13.49 12.67 0.82 11.85 7.85 7.33 0.52 6.81 4.49 4.11 0.38 3.73111 1 GM Grad Boost 8 3 200 0.20 2 disk No 20,0,-5,0 13 263 13.49 12.48 1.01 11.47 7.80 7.43 0.37 7.06 4.53 4.08 0.45 3.63112 1 GM Grad Boost 9 3 200 0.02 2 (stop No 20,0,-5,0 10 285 11.26 12.67 1.41 9.85 7.53 7.08 0.45 6.63 4.42 4.12 0.30 3.82113 1 GM Grad Boost 10 3 200 0.07 2 invest- No 20,0,-5,0 19 410 13.77 12.87 0.90 11.97 7.91 7.23 0.68 6.55 4.51 4.11 0.40 3.71114 1 GM Grad Boost 11 10 400 0.07 2 igating) No 20,0,-5,0 19 618 13.77 12.87 0.90 11.97 7.91 7.23 0.68 6.55 4.51 4.11 0.40 3.71115 1 GM Grad Boost 12 10 400 0.07 2 Huber .6 20,0,-5,0 17 569 13.46 12.28 1.18 11.10 8.13 7.43 0.70 6.73 4.45 4.06 0.39 3.67116 1 GM Grad Boost 13 10 200 0.07 2 Huber .8 20,0,-5,0 17 364 13.46 12.28 1.18 11.10 8.13 7.43 0.70 6.73 4.45 4.06 0.39 3.67117 1 GM Grad Boost 14 10 200 0.07 2 Huber 1 20,0,-5,0 17 366 13.46 12.28 1.18 11.10 8.13 7.43 0.70 6.73 4.45 4.06 0.39 3.67

Mcnt

DataVer

Author

AlgorModNum

chngfromprior

N Iter Shrinkmax

depthLeaf

FractionHuber M-

RegrDecisionWeight

VarSel

TrnTime

Train Val GapConsrvResult

Train Val GapConsrvResult

Train Val GapConsrvResult

118 JS 1 JS Grad Boost 15 1 50 0.10 2 0.05 No 10,-2.5,-1,0 ? ? 13.22 13.77 0.55 12.67 8.14 7.64 0.50 7.14 4.41 4.44 0.03 4.38119 JS 1 JS Grad Boost 16 15 100 0.08 2 0.05 No 10,-2.5,-1,0 ? ? 16.27 13.33 2.94 10.39 8.47 7.33 1.14 6.19 4.49 4.22 0.27 3.95120 JS 1 JS Grad Boost 17 15 100 0.15 2 0.05 No 10,-2.5,-1,0 ? ? 15.93 13.33 2.60 10.73 9.15 7.78 1.37 6.41 4.54 4.29 0.25 4.04121 JS 1 JS Grad Boost 18 15 100 0.20 2 0.05 No 10,-2.5,-1,0 ? ? 15.93 13.77 2.16 11.61 8.57 8.22 0.35 7.87 4.53 4.44 0.09 4.35122 2 GM Grad Boost 19 10 400 0.07 2 default No 20,0,-5,0 23 1717 16.09 14.79 1.30 13.49 8.84 8.13 0.71 7.42 4.56 3.50 1.06 2.44123 2 GM Grad Boost 20 15 50 0.10 2 0.05 No 10,-2.5,-1,0 14 239 14.31 13.75 0.56 13.19 8.36 7.71 0.65 7.06 4.53 4.43 0.10 4.33

124 2 GM Grad Boost 21 10 200 0.10 2 0.1ex=20k

node smp= 30k

20,0,-1,0 19 818 15.38 14.38 1.00 13.38 8.62 7.81 0.81 7.00 4.44 4.38 0.06 4.32

125 2 GM Grad Boost 22 19 50 0.10 3 0.05 20,0,-5,0 21 355 14.93 13.96 0.97 12.99 8.44 8.02 0.42 7.60 4.56 4.43 0.13 4.30126 2 GM Grad Boost 23 19 500 0.10 3 0.05 20,0,-5,0 28 4058 17.33 15.42 1.91 13.51 9.11 8.54 0.57 7.97 4.71 4.53 0.18 4.35

127 3 GM Grad Boost 24 23 200 0.07 3 0.05 20,0,-1,0 20 2915 13.83 16.67 2.84 10.99 7.36 8.54 1.18 6.18 4.01 4.58 0.58 3.43128 3 GM Grad Boost 25 23 500 0.07 3 0.05 20,0,-5,0 21 8012 14.10 16.25 2.15 11.94 7.66 9.38 1.72 5.94 4.08 4.69 0.61 3.47129 3 GM Grad Boost 26 23 1000 0.07 3 0.05 20,0,-5,0 22 14243 14.10 16.25 2.15 11.96 7.66 9.38 1.72 5.94 4.10 4.79 0.69 3.42130 3 GM Grad Boost 27 23 200 0.10 3 0.05 20,0,-5,0 19 3100 13.88 16.67 2.78 11.10 7.58 8.85 1.28 6.30 3.98 4.69 0.71 3.27131 3 GM Grad Boost 28 23 200 0.15 3 0.05 20,0,-5,0 19 3088 13.88 15.83 1.95 11.94 7.52 8.75 1.23 6.29 4.02 4.69 0.67 3.35132 3 GM Grad Boost 29 23 200 0.20 3 0.05 20,0,-5,0 17 2570 14.16 16.25 2.09 12.07 7.44 9.17 1.73 5.71 4.03 4.69 0.66 3.37133 3 GM Grad Boost 30 23 200 0.25 3 0.05 20,0,-5,0 17 2798 14.16 16.25 2.09 12.07 7.63 9.17 1.54 6.10 4.16 4.90 0.74 3.42134 3 GM Grad Boost 31 23 200 0.30 3 0.05 20,0,-5,0 17 2882 14.38 17.08 2.70 11.68 7.69 8.96 1.27 6.41 4.27 4.90 0.62 3.65135 3 GM Grad Boost 32 23 600 0.20 3 0.05 20,0,-5,0 27 10956 15.70 17.08 1.38 14.32 8.29 9.58 1.29 7.00 4.44 5.00 0.57 3.87

136 4 GM Grad Boost 33 32 50 0.20 3 0.05 20,0,-5,0 1 852 0.44 0.49 0.05 0.39 0.44 0.49 0.05 0.39 0.44 0.49 0.05 0.39137 4 GM Grad Boost 34 32 1

138 4 GM Grad Boost 35 32 15 980 14.62 13.94 0.68 13.26 8.14 7.42 0.71 6.71 4.22 4.03 0.19 3.84

(tree settings the same, dropped INT* categorical vars, not DBC)

(same Grad Boost settings. Exclude 3 & 4 way INT*dbc vars, like the same bad results, 1 var =

(same Grad Boost settings. Exclude all INT*dbc vars, using just 62)

improvement gain in Conservative Lift from new variables (vs. DecTree-d2-m19)

interactions are getting selected, improve Trn results but decrease Val results. Perhaps I should regen the INT*dbc with a larger number of min records.

Decision Tree – Data Version 1

More

Decision Tree – Data Version 2

Decision Tree – Data Version 3

Decision Tree – Data Version 4

"Agile Software Design" – Get something simple, fully working and tested early on (Data Version 1)

Data Version 2 … 4
Working, incremental improvements
Incremental complexity

Different preprocessing
Add more fields, records
Add & test more complexity

Page 24: Heuristic design of experiments w meta gradient search

Model Notebook Process
Tracking Detail – Training the Data Miner

(The Decision Tree, Rule Induction and Gradient Boosting detail rows shown on the previous slide are repeated here.)

More

Can treat the model notebook table as meta-data (i.e. 144 records, or models)

Train models on the meta-data:
Source vars = model training parameters
Target 1 = conservative result, or
Target 2 = training time

Perform sensitivity analysis to answer questions:
Q) Searching which model training parameters leads to the best results?
Q) … to the most training time?
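A sketch of that meta-analysis, assuming the notebook has been exported to a CSV; the file and column names here are illustrative, not from the slides. A tree-ensemble meta-model ranks which training parameters drive the conservative result (or the training time).

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

notebook = pd.read_csv("model_notebook.csv")             # hypothetical export of the notebook
param_cols = ["criterion", "max_depth", "leaf_size", "vars_offered"]   # illustrative column names
X = pd.get_dummies(notebook[param_cols])                 # encode categorical settings
y = notebook["consrv_result"]                            # or "trn_time" for the second question

meta = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importance = (pd.Series(meta.feature_importances_, index=X.columns)
                .sort_values(ascending=False))
print(importance.head(10))   # which training parameters drive the conservative result?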

Page 25: Heuristic design of experiments w meta gradient search

Outline

Model Training Parameters in SAS Enterprise Miner

Tracking Conservative Results in a "Model Notebook"
How to Measure Progress

Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt

How to Describe Any Complex System – Sensitivity Analysis

25

Page 26: Heuristic design of experiments w meta gradient search

Design Of Experiments (DOE)
Parameter Search

• Ideally, vary one parameter at a time, and quantify the results
  – A bigger challenge in BIG DATA – compute per model

• Exhaustive Grid Search O(3^P)
  – for Param A = Low, Med, High (test 3 settings)
  – for Param B = Low, Med, High
  – for Param C = Low, Med, High
  – easy to implement, not the most efficient (a sketch follows below)
  – Can use a Fractional Factorial design (i.e. 10%)

• Scales less effectively for many parameters

• Stochastic Search (Genetic Algorithms) O(100^2)
  – Directed Random Search is more efficient than Grid Search, but…
  – Can be overkill in complexity: (100 models / generation) * (100's of generations)

• Taguchi Analysis (works with this DOE approach)
  – Efficient multivariate orthogonal search
  – Tested landing pages w/ Offermatica (acquired by Omniture in 2007 for DOE)
  – http://en.wikipedia.org/wiki/Taguchi_methods
  – Does not use domain knowledge of parameter interactions – OPPORTUNITY

(Diagram: a grid over three parameter axes, A, B and C.)
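A minimal sketch of the exhaustive O(3^P) grid above: every combination of the Low / Med / High settings is trained and scored. The parameter names and the train_and_score body are placeholders.

from itertools import product

grid = {                                     # Low / Med / High for three hypothetical parameters
    "max_depth": [4, 6, 8],
    "leaf_size": [5, 100, 800],
    "shrinkage": [0.05, 0.10, 0.20],
}

def train_and_score(settings):
    # placeholder: fit a model with these settings and return its conservative result
    return 0.0

results = []
for combo in product(*grid.values()):        # 3**3 = 27 models for P = 3 parameters
    settings = dict(zip(grid.keys(), combo))
    results.append((train_and_score(settings), settings))

best_score, best_settings = max(results, key=lambda r: r[0])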

Page 27: Heuristic design of experiments w meta gradient search

Taguchi Design

• Not a full grid search

• Can we improve with experience and a heuristic process?

27
http://www.itl.nist.gov/div898/handbook/pri/section5/pri56.htm
http://www.jmp.com/support/downloads/pdf/jmp_design_of_experiments.pdf

Page 28: Heuristic design of experiments w meta gradient search

Model Parameters

                 Algorithm Searches    Meta-Search by a Data Miner – Design of Experiments (DOE) Over Your Choices
Algorithm        Model Parameters      Model Training Parameters
Regression       weights               variable selct (forward, step)
Neural net       weights               step size; learning rate
Decision Tree    (spend < $1000)       max depth; (Gini, Entropy)

28

Page 29: Heuristic design of experiments w meta gradient search

Model Parameters vs.
Model Training Parameters

                 Algorithm Searches    Meta-Search by a Data Miner – Design of Experiments (DOE) Over Your Choices
Algorithm        Model Parameters      Model Training Parameters
Regression       weights               variable select (forward, step)
Neural net       weights               step size; learning rate
Decision Tree    (spend < $1000)       max depth; (Gini, Entropy)

29

Page 30: Heuristic design of experiments w meta gradient search

Heuristic Planning Your
Design of Experiments (DOE)

• Assumptions about the Data Mining Project
  – May be on BIG DATA, with practical constraints
  – May be training 4 to 400 models (not 4000+ like GA)
  – Want diversity, to investigate different algorithms
  – Want to generalize the process to future deployments

• Heuristic Strategies
  – Use knowledge of interacting parameters (parallel tests)
    • (Cost+profit weights) and (boosting weights) fight each other
  – Delay searching compute-intensive parameters
    • First stabilize most other "computationally reasonable" params
    • Large decision tree depth,
    • neural nets w/ lots of connections
  – Opportunistically spend time by algorithm success

30

Page 31: Heuristic design of experiments w meta gradient search

Gradient Descent Numerical Methods
Searching to Find Minima

31

(Figure: an error surface over Weight Parameter 1 and Weight Parameter 2, shaded like a topographic map – Forest / Fields / Beach / Water / Deep Water – with High Error at the hill tops, Low Error in the water, and several local minima marked "Min".)

Page 32: Heuristic design of experiments w meta gradient search

Gradient Descent Numerical Methods
Searching to Find Minima

32

(The same error-surface figure: "Ski Down" from the mountains to Lake Tahoe. Moving = adjust a parameter, X = starting position, M = a local minimum; the axes are Weight Parameter 1 and Weight Parameter 2.)

Page 33: Heuristic design of experiments w meta gradient search

Conservative Result with Respect to
Model Training Parameters

33

(The same figure, but the axes are now Model Parameter 1 and Model Parameter 2: "Ski Down" from the mountains to Lake Tahoe; moving = adjust a parameter, X = starting position, M = a local minimum.)

Page 34: Heuristic design of experiments w meta gradient search

Heuristic Planning Your
Design of Experiments (DOE)

• Start with a reasonable default setting of parameters
  – the "center of the daisy" – the gradient check

• Vary one parameter at a time from the center
  – "each petal of the daisy" – a gradient search trial

• Move to the next "reasonable multivariate start"
  – the "stem of the daisy" – steepest descent

34
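A sketch of the daisy pattern as a loop (essentially a coordinate search): score the center, score one "petal" per alternative parameter value, then move the center to the best petal and repeat. train_and_score and the candidate values are placeholders for whatever fits a model and returns its conservative result.

def meta_gradient_search(center, candidates, train_and_score, max_rounds=5):
    """center: dict of current settings; candidates: dict param -> list of alternative values."""
    best_score = train_and_score(center)                      # the "center of the daisy"
    for _ in range(max_rounds):
        petals = []
        for param, values in candidates.items():              # one petal per single-parameter change
            for value in values:
                if value == center[param]:
                    continue
                petals.append(dict(center, **{param: value}))
        scored = [(train_and_score(p), p) for p in petals]
        score, best_petal = max(scored, key=lambda sp: sp[0])  # steepest ascent on conservative result
        if score <= best_score:                                # no single change helped: stop
            break
        best_score, center = score, best_petal                 # the "stem": move to the best petal
    return center, best_score

# usage sketch:
# center = {"max_depth": 6, "leaf_size": 5, "criterion": "entropy"}
# candidates = {"max_depth": [6, 12, 20], "leaf_size": [5, 100, 800], "criterion": ["entropy", "gini"]}
# best_settings, best_result = meta_gradient_search(center, candidates, train_and_score)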

Page 35: Heuristic design of experiments w meta gradient search

Heuristic "Meta-Gradient Search" of
Model Training Parameters

35

(Figure: the error surface over Parameter 1 and Parameter 2, High Error to Low Error, with the search moving toward the minimum M.)

Page 36: Heuristic design of experiments w meta gradient search

Heuristic "Meta-Gradient Search" of
Model Training Parameters

36

(The same error-surface figure over Parameter 1 and Parameter 2, with the minimum marked M.)

Page 37: Heuristic design of experiments w meta gradient search

Heuristic "Meta-Gradient Search" of
Model Training Parameters

37

(Figure: the meta-gradient search path toward the minimum M over Parameter 1 and Parameter 2, shown vs. a Taguchi DOE design.)

Art vs. Science?

No, a practical complement using existing numerical methods

Page 38: Heuristic design of experiments w meta gradient search

Heuristic "Meta-Gradient Search" of
Model Training Parameters

38

ModNum  chng from prior  vars offered  criterion  max depth  leaf size
 1      0                27            default    6          5
 2      1                27            probchisq  6          5
 3      1                27            entropy    6          5
 4      1                27            gini       6          5
 5      3                27            entropy    12         5
 6      3                27            entropy    6          10
 7      3                27            entropy    6          100
 8      3                27            entropy    6          100
 9      3                27            entropy    6          5
10      3                27            entropy    6          5
11      3                27            entropy    6          5
12      3                27            entropy    10         2

"Can you give a more tangible example? This sounds a bit vague."

Change from Prior Model – tracks the change from the "center of a daisy" (Model 1 or 3)

Page 39: Heuristic design of experiments w meta gradient search

Heuristic "Meta-Gradient Search" of
Model Training Parameters

• After stabilizing most of the "fast" and "medium" compute-time parameters, search the "long compute time" settings

• With the final parameter settings, if 2x or 10x more data is available, perform a "final bake in," long training run

• Then try Ensemble Methods
  – Stacking, boosting, bagging – combining many of the best models
  – Gradient Boosting over residual error
  – Select models whose residual errors correlate the least
  – Use a 2nd-stage model to combine 1st-stage models and top preprocessed fields (for context switching)
  – Last year's KDD Cup winners and the Netflix winners used Ensemble methods
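A hedged sketch of the 2nd-stage idea using scikit-learn's StackingClassifier on synthetic data: several diverse 1st-stage models combined by a simple 2nd-stage model, with passthrough=True also feeding the source fields to the 2nd stage.

from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=25, random_state=0)

stack = StackingClassifier(
    estimators=[                                     # diverse 1st-stage models
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),            # the 2nd-stage combiner
    passthrough=True,                                 # also feed source fields to the 2nd stage
    cv=5,
).fit(X, y)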

Page 40: Heuristic design of experiments w meta gradient search

Outline

Model Training Parameters in SAS Enterprise Miner

Tracking Conservative Results in a "Model Notebook"
How to Measure Progress

Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt

How to Describe Any Complex System – Sensitivity Analysis

40

Page 41: Heuristic design of experiments w meta gradient search

Needs to Describe the Forecast Algorithm

• Many Data Mining solutions need description
  – To the check writer (the SVP, owner, business unit, …) – a business reality check before deployment
  – "What if" analysis, to fine-tune a larger system
    • Feed Operations Research or Revenue Management systems
  – Need a modeling "descriptive simulation" (political donations)
  – When evaluating credit, you are required by law to offer 4 "reason codes" for each person scored – when they are declined

• Should the Data Miner cut algorithm choices?
  – NO! "I understand how a bike works, but I drive a car to work"
  – How much detailed understanding is needed?
  – Provide enough info to "drive the car" vs. "build the car"
    • The check writer does not need to understand a B-tree to buy SQL

41

Page 42: Heuristic design of experiments w meta gradient search

Sensitivity Analysis
(OAT) One At a Time*

42

(Diagram: S source fields feed an Arbitrarily Complex Data Mining System, which produces a target field; the test measures the delta in the forecast.)

Present record N, S times, each time with one input 5% bigger (a fixed input delta).
Record the delta change in the output, S times per record.
Aggregate: average(abs(delta)) – the target change per input-field delta.

* Some variants catch interactions

For source fields with binned ranges, sensitivity tells you the importance of the range, i.e. "low", …, "high".

Can put sensitivity values in Pivot Tables, or Cluster them.

Record-level "Reason codes" can be extracted from the most important bins that apply to the given record.
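A sketch of the one-at-a-time test for any black-box scorer: bump each of the S source fields by 5% in turn, re-score, and aggregate the absolute deltas. predict stands in for the trained model's scoring function.

import numpy as np

def oat_sensitivity(predict, X: np.ndarray, bump: float = 0.05) -> np.ndarray:
    """Average |delta in forecast| per source field (one-at-a-time, fixed 5% input delta)."""
    base = predict(X)                                  # baseline forecasts
    sens = np.zeros(X.shape[1])
    for j in range(X.shape[1]):                        # present each record S times, one field bumped
        X_bumped = X.copy()
        X_bumped[:, j] = X_bumped[:, j] * (1.0 + bump)
        sens[j] = np.mean(np.abs(predict(X_bumped) - base))
    return sens                                        # rank fields by this for reason codes

# usage sketch: sens = oat_sensitivity(model.predict, X_validation)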

Page 43: Heuristic design of experiments w meta gradient search

43

Descriptions of Predictive Models
Reason Codes – Ranked by Sensitivity Analysis

• Reason codes are specific to the model and the record

• Ranked predictive fields        Mr. Smith   Mr. Jones
  max_late_payment_120d               0           1
  max_late_payment_90d                1           0
  bankrupt_in_last_5_yrs              1           1
  max_late_payment_60d                0           0

• Mr. Smith's reason codes include:
  max_late_payment_90d                1
  bankrupt_in_last_5_yrs              1

Page 44: Heuristic design of experiments w meta gradient search

Summary

• Conservative Result (How to Measure)
  – A continuous metric to select accurate and general models

• Heuristic Meta-Gradient Search (How to Plan)
  – An automated or human process to plan a Design of Experiments (DOE)
  – Searches the training parameters that a data miner adjusts in data mining software ("meta-parameter search")
  – Heuristic DOE improvements

• Most systems can be "reasonably described"
  – Focus on repeatable business benefit (accuracy) over description, or a blind Occam's Razor on a tech metric

44

SF Bay ACM, Data Mining SIG, Feb 28, 2011
http://www.sfbayacm.org/?p=2464

[email protected]
www.LinkedIn.com/in/GregMakowski

Take Away: The process of going from design objectives to heuristic design