Active Cost-sensitive Learning
(Intelligent Test Strategies)
Charles X. Ling, PhD
Department of Computer Science
University of Western Ontario, Ontario, Canada
[email protected]
www.csd.uwo.ca/faculty/cling
Joint work with Victor Sheng, Qiang Yang, …
Outline
• Introduction
• Cost-sensitive decision trees
• Test strategies
  • Sequential Test
  • Single Batch Test
  • Sequential Batch Test
• Conclusions and future work
Everything has a cost/benefit!
• Materials, products, services
• Disease, working/living condition, waiting, …
• Happiness, love, life, …
  • "Money, Sex and Happiness: An Empirical Study", by David G. Blanchflower & Andrew J. Oswald, The Scandinavian Journal of Economics, 106:3, 2004, pages 393-415
  • Lasting/happy marriage is worth about $100,000 in happiness
• Utility-based learning: optimization; unifies many issues and is the ultimate goal
Everything has a cost/benefit!
In medical diagnosis…
• Tests have costs: temperature ($1), X-ray ($30), biopsy ($900)
• Diseases have costs: flu ($100), diabetes ($100k), cancer ($10^8)
• Misdiagnosis has (different) costs: cost of false alarm ($500) << cost of missing a cancer ($500,000)
• Doctors balance the cost of tests and misdiagnosis
Our goal: to minimize the total cost
Many other similar applications…
Model this process:
• Cost-sensitive learning
• Intelligent test strategies
Patient   Test 1   Test 2   …   Test n   Cancer?
(Cost)    $1       $30      …   $900     FP/FN = 100/300k
001       39       Low      …   High     1
002       35       Med      …   ?        0
003       42       ?        …   ?        0
…         …        …        …   …        …
New1      ?        Med      …   ?        ?
Review of Previous Work
• Cost-sensitive learning: a survey (Turney 2000)
  • Active research, also for the imbalanced data problem
  • CS meta learning (wrapper): thresholding, sampling, weighting, …
  • CS learning algorithms: CSNB, our CS trees, …
  • …but all consider misclassification costs only
• Some work considers test costs only
• A few previous works consider both test costs and misclassification costs (Turney 1995, Zubek and Dietterich 2002, Lizotte et al 2003); all are computationally expensive
Review of Previous Work
• Active learning: actively seeking extra information
  • Pool-based: given a pool of unlabeled examples, which ones to label?
  • Membership query: is this instance positive?
  • Feature value acquisition
    • During training. But "missing is useful!"
    • During testing: our work
• Human learning is active in many ways
Review of Previous Work
• Diagnosis: wide applications in medicine, mechanical systems, software, …
• Most previous AI-based diagnosis systems…
  • Are manually built (at least partially)
  • Do not incorporate costs/benefits
  • Cannot actively suggest the processes
• Our work: cost-sensitive and active; useful for diagnosis and policy setting
Cost-sensitive Decision Tree
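Patient   Test 1   Test 2   …   Test n   Cancer?
(Cost)    $1       $30      …   $900     FP/FN = 100/300k
001       39       Low      …   High     1
002       35       Med      …   ?        0
003       42       ?        …   ?        0
…         …        …        …   …        …
[Figure: example cost-sensitive decision tree built from this data; internal nodes are tests (T1, T2, T3, …), branches carry test outcomes such as <36 / >=36 and Low / Med, and leaves are labeled 0 or 1]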
Advantages: tree structure, comprehensibility
Objective: minimizing the total cost of tests and misclassification.
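Attribute Splitting Criteria
• Previous methods: C4.5 reduces the entropy (randomness); performs badly on cost-sensitive tasks
  [Figure: a node with entropy E split by test T into branches 1, 2, 3 with entropies E1, E2, E3]
  Choose T such that E – (E1+E2+E3) is max
• New (ICML'04): we reduce the total expected cost
  [Figure: a node with cost C split by test T into branches 1, 2, 3 with costs C1, C2, C3]
  Choose T such that C – (C1+C2+C3+C_Test) is max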
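The cost-based criterion can be illustrated with a small sketch. It assumes (these details are not on the slide) that a node's cost is the cheaper of labeling everything positive or everything negative, that correct predictions cost nothing by default, and that C_Test is the test cost charged once per example reaching the node. The function names and the numbers in the example are hypothetical.

```python
def leaf_cost(pos, neg, fp_cost, fn_cost, tp_cost=0.0, tn_cost=0.0):
    """Misclassification cost of stopping at a node with `pos` positive
    and `neg` negative examples, using the cheaper of the two labels."""
    cost_if_labeled_pos = pos * tp_cost + neg * fp_cost
    cost_if_labeled_neg = neg * tn_cost + pos * fn_cost
    return min(cost_if_labeled_pos, cost_if_labeled_neg)


def cost_reduction(parent, children, test_cost, fp_cost, fn_cost):
    """C - (C1 + C2 + ... + C_Test) for one candidate test T.

    parent   -- (pos, neg) counts at the node
    children -- list of (pos, neg) counts, one per outcome of T
    """
    pos, neg = parent
    c = leaf_cost(pos, neg, fp_cost, fn_cost)
    c_children = sum(leaf_cost(p, n, fp_cost, fn_cost) for p, n in children)
    c_test = test_cost * (pos + neg)  # assumption: every example pays for T
    return c - (c_children + c_test)


# Hypothetical node: 10 positives, 10 negatives, FP = 100, FN = 300.
reduction = cost_reduction((10, 10), [(9, 1), (1, 9)],
                           test_cost=1.0, fp_cost=100.0, fn_cost=300.0)
print(reduction)
```

A test is worth splitting on only when this value is non-negative; among candidate tests, the one with the largest value is chosen.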
Case Study: Heart Disease
• Predict coronary artery disease
  • Class 0: less than 50% artery narrowing
  • Class 1: more than 50% artery narrowing
• ~300 patients, collected from hospitals
• 13 non-invasive tests on patients
13 Tests (Heart Disease)
Tests      Costs     Meaning
age        $1        age of the patient
sex        $1        sex
cp         $1        chest pain type
trestbps   $1        resting blood pressure
chol       $7.27     cholesterol in mg/dl
fbs        $5.20     fasting blood sugar
restecg    $15.50    resting electrocardiography results
thalach    $102.90   maximum heart rate
thal       $102.90   maximum heart rate reached
exang      $87.30    exercise induced angina
oldpeak    $87.30    ST depression induced by exercise
slope      $87.30    slope of the peak exercise ST segment
ca         $100.90   number of major vessels colored by fluoroscopy
Cost-sensitive tree for Heart Disease
[Figure: cost-sensitive decision tree for Heart Disease; internal nodes are tests labeled with their costs, including thal ($102.9), fbs ($5.2), restecg ($15.5), sex ($1), chol ($7.27), cp ($1), slope ($87.3) and age ($1); leaves are labeled 0 or 1]
• Naturally prefer tests with small cost
• Balance cost and discriminating power
• Local heart-failure specialist thinks this tree is reasonable.
Considering Group Discount
Tests      Costs     Meaning
age        $1        age of the patient
sex        $1        sex
cp         $1        chest pain type
trestbps   $1        resting blood pressure
chol       $7.27     cholesterol in mg/dl
fbs        $5.20     fasting blood sugar
restecg    $15.50    resting electrocardiography results
thalach    $102.90   maximum heart rate
thal       $102.90   finishing heart rate
exang      $87.30    exercise induced angina
oldpeak    $87.30    ST depression induced by exercise
slope      $87.30    slope of the peak exercise ST segment
ca         $100.90   number of major vessels colored by fluoroscopy
Group discounts marked on the table: $2.10, $101.90, $86.30
[Figure: two cost-sensitive trees for Heart Disease, labeled "Before" and "After". The trees differ at one node: restecg ($15.5) in the "Before" tree is replaced by thalach ($1 after discount; individual cost: $102.9) in the "After" tree.]
Different trees without/with group discount
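A group discount can be modeled as a cost update that depends on which tests have already been taken. The sketch below is an assumption about the mechanics, not the authors' code: once any member of a group has been performed, every remaining member of that group is priced at its base cost minus the group discount. The only grouping used here, thal with thalach at a $101.90 discount, is inferred from the thalach ($1) node in the "After" tree.

```python
def discounted_costs(base_costs, groups, taken):
    """Return test costs after applying group discounts.

    base_costs -- dict: test name -> individual cost
    groups     -- list of (set of test names, discount) pairs
    taken      -- set of tests already performed
    """
    costs = dict(base_costs)
    for members, discount in groups:
        if any(t in taken for t in members):
            for t in members:
                if t not in taken:
                    # Always discount from the base cost, so repeated
                    # calls never apply the same discount twice.
                    costs[t] = max(0.0, base_costs[t] - discount)
    return costs


base = {"age": 1.0, "thal": 102.90, "thalach": 102.90}
groups = [({"thal", "thalach"}, 101.90)]  # grouping inferred, see above

after_thal = discounted_costs(base, groups, taken={"thal"})
print(round(after_thal["thalach"], 2))
```

Because a cheaper thalach can now beat restecg on cost reduction, the tree built with discounts may choose different tests below thal.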
Algorithm of Cost-sensitive Decision Tree
CSDT(Examples, Attributes, TestCosts)
  If all examples are positive, return root with label = +
  If all examples are negative, return root with label = -
  If maximum cost reduction < 0, return root with label according to min(P×TP + N×FP, N×TN + P×FN)
  Let A be an attribute with maximum cost reduction
  root ← A
  Update TestCosts if discount applies
  For each possible value vi of the attribute A
    Add a new branch A = vi below root
    Segment the training examples Examples_vi into the new branch
    Call CSDT(Examples_vi, Attributes − A, TestCosts) to build subtree
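The recursion above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: attributes are nominal, there are no missing values, TP and TN costs are zero, the test cost is charged per example at the node, and the discount update step is omitted. The dictionary-based tree format and the FP/FN values are hypothetical.

```python
from collections import defaultdict

FP_COST, FN_COST = 100.0, 300.0  # illustrative misclassification costs


def leaf_cost(examples):
    """Return (cost, label) for the cheaper of the two possible labels."""
    pos = sum(1 for _, y in examples if y == 1)
    neg = len(examples) - pos
    cost_if_pos, cost_if_neg = neg * FP_COST, pos * FN_COST
    return (cost_if_pos, 1) if cost_if_pos <= cost_if_neg else (cost_if_neg, 0)


def split(examples, attr):
    """Partition examples by their value of `attr`."""
    parts = defaultdict(list)
    for x, y in examples:
        parts[x[attr]].append((x, y))
    return parts


def csdt(examples, attributes, test_costs):
    cost, label = leaf_cost(examples)
    if len({y for _, y in examples}) <= 1 or not attributes:
        return {"label": label}
    best_attr, best_reduction = None, None
    for a in attributes:
        parts = split(examples, a)
        total = (sum(leaf_cost(p)[0] for p in parts.values())
                 + test_costs[a] * len(examples))
        if best_reduction is None or cost - total > best_reduction:
            best_attr, best_reduction = a, cost - total
    if best_reduction < 0:
        return {"label": label}
    rest = [a for a in attributes if a != best_attr]
    children = {v: csdt(p, rest, test_costs)
                for v, p in split(examples, best_attr).items()}
    # Internal nodes keep a label so prediction can stop there if needed.
    return {"test": best_attr, "label": label, "children": children}


data = [({"a": "x", "b": "p"}, 1), ({"a": "x", "b": "q"}, 1),
        ({"a": "y", "b": "p"}, 0), ({"a": "y", "b": "q"}, 0)]
tree = csdt(data, ["a", "b"], {"a": 1.0, "b": 1.0})
print(tree)
```

Raising the cost of attribute `a` far enough makes every split unprofitable, and the function returns a single labeled leaf, which is how test costs prune the tree.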
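Patient   Test 1   Test 2   …   Test n   Cancer?
(Cost)    $1       $30      …   $900     FP/FN = 100/300k
001       39       Low      …   High     1
002       35       Med      …   ?        0
003       42       ?        …   ?        0
…         …        …        …   …        …
New1      ?        ?        …   ?        ?
[Figure: the example cost-sensitive decision tree (tests T1, T2, T3, …; leaves labeled 0 or 1), to be applied to the new test case New1, whose test values are all unknown]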
Three categories of intelligent test strategies
1. Sequential Test: one test, wait, … then predict
2. Single Batch Test: one batch of tests, then predict
3. Sequential Batch Test: batch 1, batch 2, … then predict
Minimizing the total cost of tests and misclassification is not trivial
Our methods: utilizing the minimum-cost tree structure
Sequential Test
Use tree structure to guide test sequence
“Optimal” because tree is (locally) optimal
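A minimal sketch of this idea: walk the tree from the root, and whenever the current node's test value is unknown, perform that test, pay for it, and continue down the matching branch. The dictionary tree format, the `perform_test` callback and the tiny example tree are hypothetical; falling back to the internal node's label for an unseen value is also an assumption.

```python
def sequential_test(tree, known, perform_test, test_costs):
    """Classify one case, requesting unknown tests one at a time.

    Returns (predicted label, total test cost spent).
    """
    values = dict(known)
    spent = 0.0
    node = tree
    while "test" in node:
        t = node["test"]
        if t not in values:
            values[t] = perform_test(t)  # one test, then wait for the result
            spent += test_costs[t]
        child = node["children"].get(values[t])
        if child is None:  # value never seen in training
            break          # predict with the internal node's label
        node = child
    return node["label"], spent


# Hypothetical two-level tree; thal is already known for this case.
tree = {"test": "cp", "label": 0, "children": {
    "1": {"label": 0},
    "2": {"test": "thal", "label": 1, "children": {
        "n": {"label": 0}, "r": {"label": 1}}}}}
truth = {"cp": "2"}
label, spent = sequential_test(tree, {"thal": "r"}, truth.__getitem__,
                               {"cp": 1.0, "thal": 102.9})
print(label, spent)
```

Only tests that lie on the path actually taken are ever paid for, which is why this strategy has low test cost but requires waiting after each test.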
[Figure: the cost-sensitive tree for Heart Disease (with group discounts, e.g. thalach at $1), used to guide the sequence of tests]
Experimental Comparison
Using 10 datasets from UCI
Dataset       No. of Attributes   No. of Examples   Class dist. (N/P)
Ecoli         6                   332               230/102
Breast        9                   683               444/239
Heart         8                   161               98/163
Thyroid       24                  2000              1762/238
Australia     15                  653               296/357
Tic-tac-toe   9                   958               332/626
Mushroom      21                  8124              4208/3916
Kr-vs-kp      36                  3196              1527/1669
Voting        16                  232               108/124
Cars          6                   446               328/118
Comparing Sequential Test
• Eager learning: Sequential Test (OST) (ICML'04)
• Lazy learning: Lazy Sequential Test (LazyOST) (TKDE'05)
• Cost-sensitive Naïve Bayes (CSNB) (ICDM'04)
[Figure: total cost (y-axis, 40 to 100) vs. ratio of unknown attributes (x-axis, 0.2 to 1) for CSNB, OST and LazyOST]
Single Batch Test
• Only one batch: not an easy task
  • If too few, important tests are not requested; prediction is not accurate; total cost is high
  • If too many, some tests are wasted; total cost is high
• The test example may not be classified by a leaf
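Single Batch Test
• Expected cost reduction: if a test is done, what are the possible outcomes and the cost reduction?
  $$E(i) = \mathit{misc}(i) - \left[ c(i) + \sum p(R(i)) \times \mathit{misc}(R(i)) \right]$$
• R(.): all reachable unknown nodes and leaves
[Figure: node i with branches 1, 2, 3 leading to child nodes j1, j2, j3]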
Single Batch Test
• A*-like search algorithm
  • Form a candidate list (L) and a batch list (B)
  • Choose a test with maximum positive expected cost reduction from L, add it to B
  • Update L: add all reachable unknowns to L
  • Repeat until the expected cost reduction is 0
• Efficient with the tree structure

L = empty   /* list of reachable and unknown attributes */
B = empty   /* the batch of tests */
u = the first unknown attribute when classifying a test case
Add u into L
Loop
  For each i ∈ L, calculate E(i): E(i) = misc(i) – [c(i) + Σ p(R(i)) × misc(R(i))]
  E(t) = max E(i)   /* t has the maximum cost reduction */
  If E(t) > 0 then add t into B, delete t from L, add r(t) into L
  else exit Loop   /* No positive cost reduction */
Until L is empty
Output B as the batch of tests
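The loop above can be sketched in Python under one reading of the symbols, which is an assumption: misc(i) is the expected misclassification cost of predicting at node i, c(i) is the cost of the test at node i, and R(i) is found by following each branch of i through tests whose values are already known until a leaf or an unknown test is reached. The node format (`test`, `misc`, and `children` mapping a value to a `(probability, node)` pair) is hypothetical, and every known value is assumed to have a branch.

```python
def resolve(node, known):
    """Follow known test values down to a leaf or an unknown test."""
    while node["test"] is not None and node["test"] in known:
        _, node = node["children"][known[node["test"]]]
    return node


def reachable(node, known):
    """R(i): (probability, node) pairs reachable once node's test is done."""
    return [(p, resolve(child, known)) for p, child in node["children"].values()]


def expected_reduction(node, known, cost, batch):
    """E(i) = misc(i) - [c(i) + sum of p(R(i)) * misc(R(i))]."""
    c = 0.0 if node["test"] in batch else cost[node["test"]]  # pay once
    after = sum(p * r["misc"] for p, r in reachable(node, known))
    return node["misc"] - (c + after)


def single_batch(root, known, cost):
    batch = []
    first = resolve(root, known)
    candidates = [] if first["test"] is None else [first]
    while candidates:
        best = max(candidates,
                   key=lambda n: expected_reduction(n, known, cost, batch))
        if expected_reduction(best, known, cost, batch) <= 0:
            break  # no positive cost reduction left
        candidates = [n for n in candidates if n is not best]
        if best["test"] not in batch:
            batch.append(best["test"])
        candidates += [r for _, r in reachable(best, known)
                       if r["test"] is not None]
    return batch


def leaf(misc):
    return {"test": None, "misc": misc, "children": {}}

# Hypothetical tree: t1 at the root, t2 below its "x" branch.
n2 = {"test": "t2", "misc": 50.0,
      "children": {"a": (0.5, leaf(0.0)), "b": (0.5, leaf(10.0))}}
root = {"test": "t1", "misc": 100.0,
        "children": {"x": (0.6, n2), "y": (0.4, leaf(5.0))}}
print(single_batch(root, {}, {"t1": 10.0, "t2": 20.0}))
```

In this example t2 joins the batch only while its cost stays below the misclassification cost it is expected to save; at a cost of 60 it is left out and the batch contains t1 alone.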
Single Batch Test (worked example)
[Figure: the cost-sensitive tree for Heart Disease, repeated on each slide of the example with the current step highlighted]
1. cp is unknown. cp has positive expected cost reduction. cp is added to the batch. cp's reachable unknown nodes are added to the candidate list.
2. From the candidate list, choose the test with maximum positive expected cost reduction. Add it to the batch, and update the candidate list. Repeat. After 7 steps, the expected cost reduction is 0.
3. Do all tests in the batch.
4. Make a prediction (predict by an internal node if no leaf is reached). Some tests are wasted.
Comparing Single Batch Tests
• Naïve Single Batch (NSB) (ICML'04)
• Cost-sensitive Naïve Bayes Single Batch (CSNB-SB) (ICDM'04)
• Greedy Single Batch (GSB) (TKDE'05)
• Single Batch Test (OSB) (TKDE'05)
[Figure: total cost (y-axis, 350 to 750) vs. ratio of unknown attributes (x-axis, 0.2 to 1) for CSNB-SB, NSB, GSB and OSB]
Sequential Batch
• Batch 1, batch 2, …, prediction
• Must include the cost of waiting in tests
• Wait cost of a batch: the maximum wait cost in the batch (less than the sum)
• Combines Sequential Test and Single Batch Test
  • If all waiting costs = 0, it becomes Sequential Test
  • If all waiting costs are very large, it becomes Single Batch
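The batch wait cost rule is easy to state in code. The sketch assumes that tests in one batch run in parallel, so the patient waits only for the slowest one, and that wait cost is proportional to wait time through a hypothetical `cost_per_hour` rate; the wait times in the example are illustrative.

```python
def batch_wait_cost(batch, wait_hours, cost_per_hour=1.0):
    """Wait cost of a batch: the largest wait among its tests, not the sum."""
    longest = max((wait_hours[t] for t in batch), default=0.0)
    return longest * cost_per_hour


hours = {"age": 0.001, "restecg": 0.5, "chol": 4.0}  # illustrative values
print(batch_wait_cost(["restecg", "chol"], hours, cost_per_hour=10.0))
```

Taking the maximum rather than the sum is what makes batching attractive: adding a fast test to a batch that already contains a slow one adds no extra wait cost.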
Sequential Batch
The wait cost is derived from wait time
Test       Wait time (hours)
age        0.001
sex        0.001
cp         0.001
trestbps   0.01
chol       4
fbs        4
restecg    0.5
thalach    1
exang      1
oldpeak    1
slope      1
ca         1
thal       1
Sequential Batch
• Extending the Single Batch to include the batch cost
• An additional constraint: cumulative ROI
  $$\mathit{ROI} = \frac{\mathit{CostReduction}}{\mathit{testCost} + \mathit{BatchCost}}$$

Loop
  L = empty   /* list of reachable and unknown attributes */
  B = empty   /* the batch of tests */
  u = the first unknown attribute when classifying a test case
  Add u into L
  Loop
    For each i ∈ L, calculate E(i): E(i) = misc(i) – [c(i) + Σ p(R(i)) × misc(R(i))]
    E(t) = max E(i)   /* t has the maximum cost reduction */
    If E(t) > 0 & ROI increases then add t into B, delete t from L, add r(t) into L
    else exit Loop   /* No positive cost reduction */
  Until L is empty
  If (B is not empty) then
    Output B as the current batch of tests; obtain their values at a cost
    Classify the test example further, until encountering another unknown test
  Else exit the first Loop   /* No more batches! */
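The ROI constraint can be illustrated in isolation from the tree search. The sketch below is a simplification under loud assumptions: the denominator is the sum of the test costs in the batch plus the batch wait cost (the largest wait cost among its tests), candidates arrive with precomputed cost reductions, and "ROI increases" is checked against the ROI of the batch built so far, starting from 0. The tuple layout and all numbers are hypothetical.

```python
def cumulative_roi(reductions, test_costs, wait_costs):
    """Total cost reduction divided by (total test cost + batch wait cost)."""
    denom = sum(test_costs) + max(wait_costs, default=0.0)
    return sum(reductions) / denom if denom > 0 else float("inf")


def grow_batch(candidates):
    """candidates: list of (name, reduction, test_cost, wait_cost).

    Adds tests in order of decreasing reduction while the reduction is
    positive and the cumulative ROI keeps increasing.
    """
    batch, best_roi = [], 0.0
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        if cand[1] <= 0:
            break
        trial = batch + [cand]
        roi = cumulative_roi([c[1] for c in trial],
                             [c[2] for c in trial],
                             [c[3] for c in trial])
        if roi <= best_roi:
            break  # ROI no longer increases: close this batch
        batch, best_roi = trial, roi
    return [c[0] for c in batch], best_roi


names, roi = grow_batch([("a", 50.0, 5.0, 1.0),
                         ("b", 40.0, 2.0, 1.0),
                         ("c", 10.0, 20.0, 4.0)])
print(names, roi)
```

Here test c is left for a later batch: it still has a positive cost reduction, but its high test and wait costs would pull the cumulative ROI down.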
Comparing Sequential Batch Test
[Figure: total cost (y-axis, 120 to 470) vs. unknown attribute ratio (x-axis, 0.2 to 1) for SingB, SeqT and SBT]
Future Work
• Deal with different test examples differently
• Consider more costs: acquiring new examples
  • If $10 for each new example, how many do I need?
  • For $10, tell me if this patient has cancer
• If a test is not accurate (e.g. 90%), how to build trees and how to do tests (will I do it again)?
• From cost-sensitive trees, derive medical policy for expensive/risky or cheap/effective tests
Conclusions
• Cost-sensitive decision tree: effective for learning with minimal total cost
  • Can be used to model learning from data with costs
• Design and compare various test strategies
  • Sequential Test: one test, wait, …: low cost but long wait
  • Single Batch Test: one batch of tests: quick but higher cost
  • Sequential Batch Test: batch, wait, batch, …: best tradeoff
• Our methods perform better than previous ones
• Can be readily applied to real-world diagnoses
References
• C.X. Ling, Q. Yang, J. Wang, and S. Zhang. Decision Trees with Minimal Costs. ICML'2004.
• X. Chai, L. Deng, Q. Yang, and C.X. Ling. Test-Cost Sensitive Naive Bayes Classification. ICDM'2004.
• C.X. Ling, S. Sheng, and Q. Yang. Intelligent Test Strategies for Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
• S. Zhang, Z. Qin, C.X. Ling, and S. Sheng. "Missing is Useful": Missing Values in Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
• P.D. Turney. Types of Cost in Inductive Concept Learning. Workshop on Cost-Sensitive Learning at ICML'2000.
• V.B. Zubek and T. Dietterich. Pruning Improves Heuristic Search for Cost-sensitive Learning. ICML'2002.
• P.D. Turney. Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. JAIR, 2:369-409, 1995.
• D. Lizotte, O. Madani, and R. Greiner. Budgeted Learning of Naïve-Bayes Classifiers. In Uncertainty in AI, 2003.