Uplift Modeling with ROC: An SRL Case Study
Houssam Nassif, Finn Kuusisto, Elizabeth S. Burnside, and Jude Shavlik
University of Wisconsin, Madison, USA
Thursday, August 29, 2013
Introduction
The Task
What are we trying to accomplish?
- Identify patients with breast cancer who may be good candidates for watchful waiting.
- Use ILP to take advantage of our relational dataset and produce interpretable classifiers.
- Use metrics that are understandable to medical experts.
In Situ vs. Invasive Breast Cancer
Breast Cancer Stages
There are two main stages of breast cancer.
In Situ
- Earlier stage
- Cancer is localized
Invasive
- Later stage
- Cancer has invaded surrounding tissue
Breast Cancer Age Differences
Breast cancer differs between older and younger patients.
Older
- Cancer tends to progress less aggressively
- Patient has less time remaining for progression
Younger
- Cancer tends to progress more aggressively
- Patient has more time remaining for progression
Overtreatment Problem
Who is treated?
Everyone
Can we reduce costly and risky overtreatment in older patients with in situ cancer?
That is the goal
Watchful Waiting
Who are our most viable candidates for watchful waiting?
- Older
- In situ
- Cancer sufficiently different from that of younger patients
Our Dataset
         Older                  Younger
  In Situ    Invasive    In Situ    Invasive
    132        401         110        264
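The class balance within each age group follows directly from the counts above. A minimal Python sketch of that arithmetic; the dictionary layout and the function name `in_situ_fraction` are illustrative and not from the slides:

```python
# Counts copied from the dataset table above.
counts = {
    "older": {"in_situ": 132, "invasive": 401},
    "younger": {"in_situ": 110, "invasive": 264},
}


def in_situ_fraction(group):
    """Fraction of a group's cases that are in situ."""
    c = counts[group]
    return c["in_situ"] / (c["in_situ"] + c["invasive"])


for group in counts:
    total = sum(counts[group].values())
    print(f"{group}: {total} cases, {in_situ_fraction(group):.1%} in situ")
# older: 533 cases, 24.8% in situ
# younger: 374 cases, 29.4% in situ
```

Both groups are skewed toward invasive cases, and the skew differs somewhat between them.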
Learning Method
Uplift Modeling
Uplift modeling: a predictive modeling technique that attempts to specifically characterize a particular subgroup of a population.
Lift and Uplift
[Figure, left panel: lift curves Lift_t and Lift_c; x-axis: Examples Labeled Positive (0.0 to 1.0); y-axis: True Positives.]
[Figure, right panel: uplift curve; x-axis: Examples Labeled Positive (0.0 to 1.0); y-axis: Uplift.]
Uplift = Lift_t − Lift_c
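The relation Uplift = Lift_t − Lift_c can be sketched in a few lines of Python. This is a minimal illustration under two assumptions: lift at a given fraction is the number of true positives among the top-ranked examples labeled positive (matching the figure axes), and t and c denote two groups scored by the same model. The function names and toy data are hypothetical.

```python
def lift_curve(scores, labels, fractions):
    """True positives among the top-scored fraction of examples, per fraction."""
    # Rank labels from highest to lowest score (the sort is stable for ties).
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    curve = []
    for f in fractions:
        k = int(round(f * len(ranked)))  # number of examples labeled positive
        curve.append(sum(ranked[:k]))
    return curve


def uplift_curve(scores_t, labels_t, scores_c, labels_c, fractions):
    """Pointwise difference Lift_t - Lift_c at each fraction."""
    lift_t = lift_curve(scores_t, labels_t, fractions)
    lift_c = lift_curve(scores_c, labels_c, fractions)
    return [a - b for a, b in zip(lift_t, lift_c)]


# Toy data (hypothetical): model scores and true labels for each group.
scores_t, labels_t = [0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]
scores_c, labels_c = [0.9, 0.7, 0.4, 0.2], [0, 1, 0, 1]
fractions = [0.0, 0.5, 1.0]

print(uplift_curve(scores_t, labels_t, scores_c, labels_c, fractions))
# [0, 1, 0]
```

A model with high uplift ranks the positives of group t early while leaving the positives of group c spread out or late.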
Understandable Metrics
What’s the problem?
Lift isn’t a common metric
Can we achieve the same characterization using a different metric?
That is the goal
Differential ILP
How do we get an ILP algorithm to consider metrics like lift and ROC?
Start with Score as You Use (SAYU)
Now how do we make it differential?
Train two classifiers instead of one
SAYL Algorithm
SAYL
Initialize naïve classifiers and theory
while Stop criteria not met do
    Select seed example
    Construct bottom clause
    while Clause space not exhausted do
        Select new clause
        Train classifiers with theory and new clause
        if New clause improves ROC difference then
            Add new clause to theory
            break
        end if
    end while
end while
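The greedy loop above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: clauses are stood in for by plain Python predicates, the seed example and bottom-clause construction are replaced by a fixed candidate list, and the trained classifiers are replaced by a simple stand-in scorer (the number of theory clauses that cover an example). All names and data here are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC as the probability a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_difference(theory, target, control):
    """AUC on the target group minus AUC on the control group."""
    def group_auc(group):
        examples, labels = group
        # Stand-in classifier: score = number of theory clauses covering x.
        scores = [sum(clause(x) for clause in theory) for x in examples]
        return auc(scores, labels)
    return group_auc(target) - group_auc(control)


def sayl_roc_sketch(candidates, target, control):
    """Greedily add clauses that improve the ROC difference."""
    theory = []
    best = roc_difference(theory, target, control)
    improved = True
    while improved:                    # stop criterion: no clause helps
        improved = False
        for clause in candidates:      # stand-in for the clause space
            if clause in theory:
                continue
            score = roc_difference(theory + [clause], target, control)
            if score > best:
                theory.append(clause)  # add new clause to theory
                best = score
                improved = True
                break
    return theory, best


# Hypothetical candidate clauses over toy feature dictionaries.
def clause_a(x):
    return x["a"] == 1


def clause_b(x):
    return x["b"] == 1


examples = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}, {"a": 0, "b": 0}]
target = (examples, [1, 1, 0, 0])   # feature a predicts the target labels
control = (examples, [0, 1, 1, 0])  # feature b predicts the control labels

theory, best = sayl_roc_sketch([clause_b, clause_a], target, control)
print([c.__name__ for c in theory], best)
# ['clause_a'] 0.5
```

In this toy run the clause that separates the target group but not the control group is kept, and the clause that helps the control group is rejected because it lowers the difference.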
Evaluation
SAYL-ROC Performance
[Figure: uplift curves for SAYL-ROC, SAYL, DPS, MF, and Baseline; x-axis: Fraction of total mammograms (0 to 1); y-axis: Uplift (nb of positives) (0 to 100).]
Conclusions and Future Work
Conclusions
- No significant difference between SAYL and SAYL-ROC
- SAYL-ROC training may be easier to understand outside of marketing
- SAYL-ROC tends to construct much larger theories
- SAYL-ROC theories may be more difficult to interpret
Future Work
- Experiment with different class skews
- Experiment with different domains
Appendix
Learned Rules
Some example rules.
1. Patient had prior in situ biopsy, BI-RADS score of prior biopsy was 1
2. Patient has low breast density, principal finding is calcification or single dilated duct
SAYL TAN Models
TAN model learned on older population
[Figure: TAN network over the following nodes]
- breast category
- combined BI-RADS increased up to 3 points over previous mammogram
- had previous in situ biopsy at same location
- breast BI-RADS score = 4
- no family history of cancer, and no prior surgery
- breast has mass size ≤ 13 mm
TAN model learned on younger population
[Figure: TAN network over the same six nodes as the older-population model]
Marketing Customer Groups
Persuadables: Customers who will respond only when targeted.
Sure Things: Customers who will respond even when not targeted.
Lost Causes: Customers who will not respond, regardless of whether they were targeted or not.
Sleeping Dogs: Customers who will not respond as a result of being targeted.
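One way to read the four definitions above is as the four combinations of two hypothetical outcomes per customer: whether they would respond if targeted, and whether they would respond if not targeted. A minimal Python sketch under that reading; the function and parameter names are illustrative and not from the slides:

```python
def customer_group(responds_if_targeted, responds_if_not_targeted):
    """Map a customer's two hypothetical outcomes to a marketing group."""
    if responds_if_targeted and not responds_if_not_targeted:
        return "Persuadable"   # responds only when targeted
    if responds_if_targeted and responds_if_not_targeted:
        return "Sure Thing"    # responds either way
    if not responds_if_targeted and not responds_if_not_targeted:
        return "Lost Cause"    # never responds
    return "Sleeping Dog"      # targeting suppresses the response


print(customer_group(True, False))   # Persuadable
print(customer_group(False, True))   # Sleeping Dog
```

In practice only one of the two outcomes is observed for any customer, since each customer is either targeted or not, so the group cannot be read off directly from the data.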
Marketing Ideal Ranking
Ranked by increasing probability of response from targeting:
Sleeping Dogs; Sure Things, Lost Causes; Persuadables
Marketing Dataset
              Target                              Control
  Response         No Response        Response         No Response
  Persuadables     Sleeping Dogs      Sleeping Dogs    Persuadables
  Sure Things      Lost Causes        Sure Things      Lost Causes