TRANSCRIPT
My Five Predictive Analytics Pet Peeves
Dean Abbott, Abbott Analytics, Inc.
Predictive Analytics World, San Francisco, CA (#pawcon)
April 16, 2013
Email: [email protected]
Blog: http://abbottanalytics.blogspot.com
Twitter: @deanabb
© Abbott Analytics, Inc. 2001-2013
Topics
• Why Pet Peeves? A call for humility for predictive modelers
• The Five Pet Peeves:
  1. Machine Learning Skills > Domain Expertise
  2. Just Build the Most Accurate Model!
  3. Significance?… What do you mean by Significance?
  4. My Algorithm is Better than Your Algorithm
  5. My classifier calls everything 0… time to resample!
Peeve 1: Which is Better, Machine Learning Expertise or Domain Expertise?
• Question: who is more important in the process of building predictive models?
  • The Data Scientist / Predictive Modeler / Data Miner, or
  • The Domain Expert / Business Stakeholder?
Photo from http://despair.com/superioritee.html
Which is Better? The 2012 Strata Conference Debate
From Strata Conference: http://radar.oreilly.com/2012/03/machine-learning-expertise-google-analytics.html
“I think you can get pretty far with some common sense, maybe Google-ing the basic information you need to know about a domain, and a lot of statistical intuition”
Formula for Success?
Conclusion: Frame the Problem First
• Mike Driscoll, moderator of the Strata debate:
  • “Could you currently prepare your data for a Kaggle competition? If so, then hire a machine learner. If not, hire a data scientist who has the domain expertise and the data hacking skills to get you there.”
    – http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine
• But even this may not work, which brings me to the second pet peeve…
Peeve 2: Just Build Accurate Models
• The Problems with Model Accuracy:
1. There’s More to Success than “Accuracy”
2. Which Accuracy?
The Winner is… Best Accuracy
http://www.netflixprize.com/leaderboard
Why Model Accuracy is Not Enough: Netflix Prize
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
Why Data Science is Not Enough: Netflix Prize
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
There’s more to a solution than accuracy—you have to be able to use it!
Peeve 3: The Best Model Wins
• We select the “winning model”, but is there a significant difference in model performance?
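One way to check whether the “winning” model is really better is a paired bootstrap over a shared holdout set. A minimal sketch, using synthetic per-case correctness indicators (the 5,000-case holdout and the ~0.4-point accuracy gap are hypothetical, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical held-out results for two models scored on the SAME 5,000 cases:
# 1 = correct prediction, 0 = incorrect.
n = 5_000
model_a = (rng.random(n) < 0.801).astype(int)
model_b = (rng.random(n) < 0.797).astype(int)

# Paired bootstrap: resample the same cases for both models,
# recompute the accuracy gap each time.
gaps = []
for _ in range(2_000):
    idx = rng.integers(0, n, n)
    gaps.append(model_a[idx].mean() - model_b[idx].mean())
gaps = np.array(gaps)

# If the 95% interval on the gap straddles zero, the "winner"
# is not distinguishable from the runner-up on this sample.
lo, hi = np.percentile(gaps, [2.5, 97.5])
print(f"accuracy gap 95% CI: [{lo:.4f}, {hi:.4f}]")
```

Resampling cases (rather than bootstrapping each model separately) keeps the comparison paired, which is what makes small gaps testable.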
KDD Cup 98 Results
Calculator from http://www.answersresearch.com/means.php
Example: Statistical Significance without Practical Significance

Measure                  | Control     | Campaign (based on model)
Number Mailed            | 5,000,000   | 4,000,000
Response Rate            | 1%          | 1.011%
Outside margin of error,
i.e., statistically
significant?             | —           | yes
Expected responders      | 50,000      | 40,000
Actual responders        | 50,000      | 40,440
Difference               | 0           | 440

Revenue per responder: $100
Total revenue expected: $4,000,000
Total revenue actual: $4,044,000
Difference in revenue: $44,000

Significance based on z = 2 (95.45% confidence)
• Cost per contact: negligible (email)
• Cost for analysts to build the model: $80,000
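The slide's arithmetic can be sketched as follows. One assumption on my part: the margin of error is taken as a z = 2 interval on the campaign response rate; the $100/responder revenue and $80,000 modeling cost are from the slide:

```python
from math import sqrt

# Slide numbers: control vs. model-targeted campaign
n_control, n_campaign = 5_000_000, 4_000_000
rate_control, rate_campaign = 0.01, 0.01011

# Margin of error on the campaign response rate at z = 2 (~95.45% confidence)
moe = 2 * sqrt(rate_campaign * (1 - rate_campaign) / n_campaign)
statistically_significant = (rate_campaign - rate_control) > moe  # True

# Practical significance: extra revenue vs. the cost of building the model
extra_responders = n_campaign * (rate_campaign - rate_control)    # 440
extra_revenue = extra_responders * 100                            # $44,000
net = extra_revenue - 80_000                                      # -$36,000

print(statistically_significant, round(extra_revenue), round(net))
```

The lift is statistically significant, yet the campaign loses $36,000 once the modeling cost is counted: statistical significance without practical significance.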
Peeve 4: My Algorithm is Better than Your Algorithm
From 2011 Rexer Analytics Data Mining Survey
http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html
Every Algorithm Has its Day
Elder, IV, J. F., and Lee, S. S. (1997), “Bundling Heterogeneous Classifiers with Advisor Perceptrons,” Technical Report, University of Idaho, October, 14.
Modeling Technique | Modeling Implementation | Participant Affiliation Location | Participant Affiliation Type | AUC-ROC (Trapezoidal Rule) | AUC-ROC Rank | Top Decile Response Rate | Top Decile Response Rate Rank
TreeNet + Logistic Regression | Salford Systems | Mainland China | Practitioner | 70.01% | 1 | 13.00% | 7
Probit Regression | SAS | USA | Practitioner | 69.99% | 2 | 13.13% | 6
MLP + n-Tuple Classifier | — | Brazil | Practitioner | 69.62% | 3 | 13.88% | 1
TreeNet | Salford Systems | USA | Practitioner | 69.61% | 4 | 13.25% | 4
TreeNet | Salford Systems | Mainland China | Practitioner | 69.42% | 5 | 13.50% | 2
Ridge Regression | Rank | Belgium | Practitioner | 69.28% | 6 | 12.88% | 9
2-Layer Linear Regression | — | USA | Practitioner | 69.14% | 7 | 12.88% | 9
Log. Regr. + Decision Stump + AdaBoost + VFI | — | Mainland China | Academia | 69.10% | 8 | 13.25% | 4
Logistic Average of Single Decision Functions | — | Australia | Practitioner | 68.85% | 9 | 12.13% | 17
Logistic Regression | Weka | Singapore | Academia | 68.69% | 10 | 12.38% | 16
Logistic Regression | — | Mainland China | Practitioner | 68.58% | 11 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | — | Singapore | — | 68.54% | 12 | 13.00% | 7
Scorecard Linear Additive Model | Xeno | USA | Practitioner | 68.28% | 13 | 11.75% | 20
Random Forest | Weka | USA | — | 68.04% | 14 | 12.50% | 14
Expanding Regression Tree + RankBoost + Bagging | Weka | Mainland China | Academia | 68.02% | 15 | 12.50% | 14
Logistic Regression | SAS + Salford | India | Practitioner | 67.58% | 16 | 12.00% | 19
J48 + BayesNet | Weka | Mainland China | Academia | 67.56% | 17 | 11.63% | 21
Neural Network + General Additive Model | Tiberius | USA | Practitioner | 67.54% | 18 | 11.63% | 21
Decision Tree + Neural Network | — | Mainland China | Academia | 67.50% | 19 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | SAS | USA | Academia | 66.71% | 20 | 13.50% | 2
PAKDD Cup 2007 Results: Look at all them Algorithms!
• 18 different algorithms used in the top 20 solutions
http://lamda.nju.edu.cn/conf/pakdd07/dmc07/results.htm
Peeve 5: You Must Stratify Data to Balance the Target Class
• For example: 93% non-responders (N), 7% responders (R)
• What's the problem? (The usual justification for resampling)
  • “The sample is biased toward non-responders”
  • “Models will learn non-responders better”
  • “Most algorithms will generate models that say ‘call everything a non-responder’ and get 93% correct classification!” (I used to say this too)
• Most common solution: stratify the sample to get 50%/50% (some will argue that one only needs 20-30% responders)
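The “most common solution” above can be sketched in a few lines: keep every responder and downsample non-responders to match. The population is synthetic; only the 7% responder rate comes from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 7% responders (1), 93% non-responders (0)
y = (rng.random(100_000) < 0.07).astype(int)

# The common "fix": keep all responders, downsample non-responders to match
responders = np.flatnonzero(y == 1)
non_responders = rng.choice(np.flatnonzero(y == 0),
                            size=responders.size, replace=False)
balanced = np.concatenate([responders, non_responders])

print(y.mean(), y[balanced].mean())  # ~0.07 before, exactly 0.5 after
```

Note what the downsampling costs: roughly 92% of the non-responder records are thrown away before modeling even starts.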
Neural Network Results on Same Data
[Figure: Distribution of Target]
NOTE: all models built using JMP 10, SAS Institute, Inc.
Sample Decision Tree Built on Imbalanced Population
[Figure: Distribution of Target]
But… the ROC Curve looks like this:
[Figure: ROC curve for the predictions of the target variable, Sensitivity vs. 1-Specificity]
Why do we get a ROC Curve that looks OK, but the confusion matrix says “everything is N (No)”?
[Decision tree structure, as reported by JMP:]

All Rows (Count 5388, G² 2778.72, LogWorth 114.85)
  AVG_DON >= 12.6 (Count 2462, G² 1906.14, LogWorth 16.44)
    REC_DON_AMT >= 22 (Count 1110, G² 1073.88, LogWorth 1.04)
      RFA_2 in (L3F, L2F, L3G) (Count 91, G² 115.38)
      RFA_2 in (L4G, L2G, L1F, L1G, L1E, L2E, L4F) (Count 1019, G² 947.16, LogWorth 0.78)
        MAX_DON_DT < 9110 (Count 32, G² 41.18)
        MAX_DON_DT >= 9110 (Count 987, G² 900.59)
    REC_DON_AMT < 22 (Count 1352, G² 772.40, LogWorth 1.52)
      CARDPM12 >= 8 (Count 26, G² 30.29)
      CARDPM12 < 8 (Count 1326, G² 734.01)
  AVG_DON < 12.6 (Count 2926, G² 623.46, LogWorth 14.98)
    REC_DON_AMT >= 15 (Count 1256, G² 463.93, LogWorth 5.32)
      MAX_DON_AMT >= 21 (Count 155, G² 122.97)
      MAX_DON_AMT < 21 (Count 1101, G² 317.08)
    REC_DON_AMT < 15 (Count 1670, G² 101.42, LogWorth 1.74)
      MAX_DON_AMT >= 20 (Count 132, G² 35.85, LogWorth 1.66)
        CARDGIFT_LIFE < 4 (Count 15, G² 15.01)
        CARDGIFT_LIFE >= 4 (Count 117, G² 11.52)
      MAX_DON_AMT < 20 (Count 1538, G² 55.61)
So What Happened?
• Note: no algorithm predicts decisions (N or R) directly; they all produce probabilities/likelihoods/confidences
• Every data mining tool creates decisions (and, by extension, confusion matrices) by thresholding the predicted probability at 0.5 (i.e., taking equal class likelihoods as the baseline)
• When the imbalance is large, algorithms will rarely produce probabilities/likelihoods above 0.5; a score that large is far too unlikely for an algorithm to be “that sure”
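A sketch of the thresholding point, with synthetic, roughly calibrated scores for a 7%-responder problem (the score distributions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical well-calibrated scores on a 7%-responder problem:
# responders score higher on average, but almost no score exceeds 0.5
n = 10_000
y = (rng.random(n) < 0.07).astype(int)
scores = np.clip(rng.normal(0.05 + 0.06 * y, 0.03), 0, 1)

preds_half = (scores >= 0.5).astype(int)        # nearly everything 0
preds_prior = (scores >= y.mean()).astype(int)  # threshold at the base rate

print(preds_half.sum(), "predicted R at 0.5;",
      preds_prior.sum(), "predicted R at the base rate")
```

The model is ranking responders above non-responders the whole time; only the 0.5 cutoff makes it look like it “calls everything 0.”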
What the Predictions Look Like
Confusion Matrices For the Decision Tree: Before and After
Decision Tree: Threshold at 0.5

Response_STR | Predicted N | Predicted R | Total
Actual N     | 5,002       | 0           | 5,002
Actual R     | 386         | 0           | 386
Total        | 5,388       | 0           | 5,388

Decision Tree: Threshold at 0.071

Response_STR | Predicted N | Predicted R | Total
Actual N     | 2,798       | 2,204       | 5,002
Actual R     | 45          | 341         | 386
Total        | 2,843       | 2,545       | 5,388
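Working from the counts in the two confusion matrices on this slide, the change of threshold trades specificity for sensitivity rather than leaving the model useless:

```python
# Counts from the slide's confusion matrices (rows = actual, columns = predicted)
# Threshold 0.5: every case predicted N
tn_05, fp_05, fn_05, tp_05 = 5002, 0, 386, 0
# Threshold 0.071 (roughly the responder base rate, 386/5388)
tn, fp, fn, tp = 2798, 2204, 45, 341

sensitivity_05 = tp_05 / (tp_05 + fn_05)  # 0.0 -- no responders found
sensitivity = tp / (tp + fn)              # ~0.88 of responders found
specificity = tn / (tn + fp)              # ~0.56 of non-responders kept

print(round(sensitivity_05, 3), round(sensitivity, 3), round(specificity, 3))
```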
Conclusions
• The Rant is Done!
• The Five Pet Peeves:
  1. Machine Learning Skills > Domain Expertise
     • Be humble; we need both data scientists and domain experts!
  2. Just Build the Most Accurate Model!
     • Select the model that addresses your business metric
  3. Significance?… What do you mean by Significance?
     • Don't get hung up on “best” when many models will do well
     • Learn from differences in the patterns found by these models
  4. My Algorithm is Better than Your Algorithm
     • Don't stress about the algorithm; learn to use a few very well
  5. My classifier calls everything 0… time to resample!
     • Don't throw away 0s needlessly; only do it when there are enough of them that you won't miss them