Make every interaction count™
Decision Trees: Profiling and Segmentation
Sachin Chincholi, Professional Services

Dial-in numbers:
– USA: 1 866 793 4279
– Austria: 0800 28 1673
– Belgium: 0800 505 60
– Canada: 1 866 270 8076
– India: 000800 100 6558
– Republic of Ireland: 1800 944 607
– Netherlands: 0800 0233 593
– Norway: 800 164 90
– Spain: 900 801 508
– Sweden: 0200 125 679
– UK: 0808 109 1441
– International: +44 20 8609 1476
– Access code: 131716 #


DESCRIPTION

Portrait Software Copyright 2007
Decision Trees: Profiling and Segmentation
– Presenter: Sachin Chincholi, Professional Services
– Audience: Existing Quadstone Users

TRANSCRIPT


Page 2

Portrait Software Copyright 2007 CUSTOMER CONFIDENTIAL

How to ask a Question

Page 3

Decision Trees: Profiling and Segmentation

– Presenter: Sachin Chincholi, Professional Services

– Audience: Existing Quadstone Users

Page 4

Decision Trees for insight

+ Transparent
  – Easily understandable by non-statisticians
  – Sanity check your modelling framework
    – Is your objective defined correctly?
    – Are the initial splits plausible?
+ Fast to build
  – Quick alert to possible contamination

Page 5

Decision Trees for Modeling

+ Transparent
  – Easier to get buy-in from the business
  – Easy to code
+ Non-parametric
  – No assumptions about underlying distributions of Analysis Candidates
+ Non-linear
  – Allow easy discovery of non-linear patterns (age vs. income)
– ‘Unstable’
  – Different populations give very different trees

Page 6

Interpreting a decision tree

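[Figure: example decision tree]
– Root node #1: Objective: Response match = 26.2% of 100000 (the match rate for the objective over the entire population)
– The split at Age = 40 is the most predictive; the branches are Age < 40 and Age ≥ 40
– Child nodes #2 and #3: 50.2% of 20302 and 20.1% of 79698
– Fields shown at the next level: Age, Income
– Color is used to show match rates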
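As a quick consistency check on the example tree, the match rates of the two child nodes, weighted by node size, should reproduce the root match rate. A minimal Python sketch using only the figures quoted on the slide (the variable names are illustrative):

```python
# Child nodes from the example tree: (match rate, node size).
children = [(0.502, 20302), (0.201, 79698)]

# The child sizes should add up to the root population.
total_size = sum(size for _, size in children)

# Weighted average of the child match rates.
total_matches = sum(rate * size for rate, size in children)
root_rate = total_matches / total_size

print(f"population = {total_size}")
print(f"root match rate = {root_rate:.1%}")
```

The two child nodes add up to the 100000 records in the root, and the weighted rate rounds to the 26.2% shown for node #1.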

Page 7

Decision tree build process

– Given an objective, Decision Tree Builder will find the most predictive split among all possible splits, with all analysis candidates, given the current binnings

– The population is then split into two segments based on this
– The same method splits each of the two segments into two further segments
– This process continues until the tree is finished, as determined by the tree constraints
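The build process above can be sketched as a short recursive routine. This is an illustrative Python sketch, not Decision Tree Builder's implementation: the toy data, the `boundaries` dictionary standing in for the current binnings, and the `max_depth` and `min_size` constraints are all assumptions, and information gain is assumed as the measure of how predictive a split is.

```python
import math

def information(rows):
    """Entropy (in bits) of the binary objective over (fields, outcome) rows."""
    p = sum(outcome for _, outcome in rows) / len(rows)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def best_split(rows, boundaries, min_size):
    """Return (gain, field, cut) for the most predictive 'field < cut' split, or None."""
    parent_info = information(rows)
    best = None
    for field, cuts in boundaries.items():
        for cut in cuts:
            below = [r for r in rows if r[0][field] < cut]
            above = [r for r in rows if r[0][field] >= cut]
            if len(below) < min_size or len(above) < min_size:
                continue  # hypothetical constraint: no very small nodes
            children_info = (len(below) * information(below)
                             + len(above) * information(above)) / len(rows)
            gain = parent_info - children_info
            if best is None or gain > best[0]:
                best = (gain, field, cut)
    return best

def build_tree(rows, boundaries, max_depth=2, min_size=2):
    """Split recursively until the (assumed) tree constraints stop the process."""
    node = {"size": len(rows),
            "match_rate": sum(outcome for _, outcome in rows) / len(rows)}
    split = best_split(rows, boundaries, min_size) if max_depth > 0 else None
    if split is None or split[0] <= 0:
        return node  # leaf: no allowed split improves on the parent
    _, field, cut = split
    node["split"] = f"{field} < {cut}"
    node["below"] = build_tree([r for r in rows if r[0][field] < cut],
                               boundaries, max_depth - 1, min_size)
    node["at_or_above"] = build_tree([r for r in rows if r[0][field] >= cut],
                                     boundaries, max_depth - 1, min_size)
    return node

# Toy data, invented for illustration: ({field: value}, outcome).
rows = [
    ({"Age": 25, "Income": 20000}, 1), ({"Age": 30, "Income": 45000}, 1),
    ({"Age": 35, "Income": 30000}, 1), ({"Age": 45, "Income": 25000}, 0),
    ({"Age": 50, "Income": 60000}, 0), ({"Age": 60, "Income": 35000}, 0),
]
boundaries = {"Age": [30, 40, 50], "Income": [30000, 40000]}
tree = build_tree(rows, boundaries)
print(tree["split"])
```

On this toy data the first split chosen is `Age < 40`, which separates the outcomes completely, so neither child is split further.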

Page 8

Choice of a decision tree split

– Each possible split is assigned a quality value
– The splits are ranked:

Split                      Quality Value
Income < 40000?            0.205
MaritalStatus = Single?    0.201
Income < 30000?            0.199
…                          …

– The quality value depends on the tree type:
  – Binary outcome tree and classification tree: Information gain
  – Regression tree: R²
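To illustrate the ranking for the regression-tree case, the sketch below scores a few candidate splits by R² and sorts them. It assumes that the R² of a split means the share of the objective's variance explained by the two segment means; the slide does not define it, and the records and the `Spend` objective are invented for illustration.

```python
def split_r2(values, in_first_segment):
    """R^2 of a binary split: 1 - (within-segment sum of squares / total sum of squares)."""
    mean = sum(values) / len(values)
    ss_total = sum((v - mean) ** 2 for v in values)
    ss_within = 0.0
    for side in (True, False):
        segment = [v for v, flag in zip(values, in_first_segment) if flag == side]
        if segment:
            seg_mean = sum(segment) / len(segment)
            ss_within += sum((v - seg_mean) ** 2 for v in segment)
    return 1 - ss_within / ss_total if ss_total else 0.0

# Toy records, invented for illustration; Spend is a hypothetical continuous objective.
records = [
    {"Income": 20000, "MaritalStatus": "Single", "Spend": 100},
    {"Income": 25000, "MaritalStatus": "Married", "Spend": 120},
    {"Income": 35000, "MaritalStatus": "Single", "Spend": 150},
    {"Income": 45000, "MaritalStatus": "Married", "Spend": 300},
    {"Income": 60000, "MaritalStatus": "Single", "Spend": 320},
    {"Income": 80000, "MaritalStatus": "Widow", "Spend": 310},
]
candidates = {
    "Income < 40000?": lambda r: r["Income"] < 40000,
    "MaritalStatus = Single?": lambda r: r["MaritalStatus"] == "Single",
    "Income < 30000?": lambda r: r["Income"] < 30000,
}

spend = [r["Spend"] for r in records]
ranked = sorted(
    ((label, split_r2(spend, [test(r) for r in records]))
     for label, test in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for label, quality in ranked:
    print(f"{label:26s} {quality:.3f}")
```

The split at the top of the ranked list is the one the tree would use at this node.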

Page 9

Choice of a decision tree split (2)

[Figure: quality value of each candidate split for Objective: Response, Level: 1. Candidate split points by field:
– Age: 18, 20, 30, 40, 50, 60, 65
– Income: 0, 10000, 20000, 30000, 40000, 50000, 100000
– LoanAmount: 0, 2000, 10000, 20000, 30000, 50000, 100000
– MaritalStatus: Single, Married, Widow, Misc.
The quality values shown range from 0.098 to 0.205; the highest is 0.205.]

Page 10

Splitting criterion

– Information = $\sum_{c=1}^{n} p(c)\,\log p(c)$
  – Sum of (proportion(C) × log(proportion(C))) for all C’s
– Equivalent to likelihood-ratio test for comparing two populations
– Seeks to separate out classes, while minimising small nodes
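For reference, one common way to turn this quantity into a split quality is sketched below. The leading minus sign, the symbols $H$, $N$, $N_k$ and $G$, and the definition of the gain are standard textbook conventions assumed here; the slide does not spell them out.

```latex
\begin{align*}
H(\text{node}) &= -\sum_{c=1}^{n} p(c)\,\log p(c) \\
\text{Gain} &= H(\text{parent}) - \sum_{k \in \{\text{left},\,\text{right}\}} \frac{N_k}{N}\, H(\text{child}_k) \\
G &= 2N \cdot \text{Gain} \qquad \text{(with natural logarithms)}
\end{align*}
```

Here $N$ is the number of records in the parent node and $N_k$ the number in child $k$. $G$ is the likelihood-ratio statistic for independence between the side of the split and the class, which is one reading of the equivalence noted above. Because each child is weighted by $N_k/N$, splitting off a very small node adds little to the gain.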

Page 11

Is the decision tree any good (binary case)?

[Figure: Gini “curve”. Records are sorted by predicted propensity; the axes are the proportion of actual non-matches (0 to 1) and the proportion of actual matches (0 to 1).]

Page 12

Calculating the Gini value

Gini = A/B x 100%

[Figure: Gini “curve” with the areas A and B marked]
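The calculation can be sketched in Python as follows. The figure defining A and B is not recoverable here, so the sketch assumes the usual reading: A is the area between the Gini curve and the diagonal, and B is the same area for a perfect model (0.5 on unit axes). The function name and the example scores are illustrative, and both outcomes are assumed to be present.

```python
def gini_percent(scores, outcomes):
    """Gini = A / B x 100%, with A and B as ASSUMED in the text above."""
    # Sort by predicted propensity, highest first.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: -pair[0])
    n_match = sum(outcomes)
    n_non_match = len(outcomes) - n_match

    x = y = 0.0          # x: proportion of non-matches, y: proportion of matches
    area_under = 0.0
    i = 0
    while i < len(ranked):
        # Records with the same score (e.g. the same leaf node) move together.
        j = i
        matches = non_matches = 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                matches += 1
            else:
                non_matches += 1
            j += 1
        new_x = x + non_matches / n_non_match
        new_y = y + matches / n_match
        area_under += (new_x - x) * (y + new_y) / 2   # trapezoid rule
        x, y = new_x, new_y
        i = j

    a = area_under - 0.5   # area between the curve and the diagonal
    b = 0.5                # the same area for a perfect model
    return a / b * 100

print(gini_percent([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # perfect ordering
print(gini_percent([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))   # no discrimination
```

Under these assumptions a perfect ordering gives 100% and a model that cannot separate matches from non-matches gives 0%.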

Page 13

Gini “curves”

[Figure: Gini “curves” for a perfect model and for a totally unpredictive model]

Page 14

Overfitting

[Figure: predictive power plotted against complexity (relative to dataset size), with curves labeled “apparent” and “actual”, a region labeled “overfitting”, and one point marked *]

Page 15

Best Practice

– Derive a Training-Test field

– Group “too small” categories

– Reduce number of categories

– Watch number of responses per node

– (Watch confidence intervals of prediction)

– Auto-pruning
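The first two items can be illustrated outside the Quadstone System with a short, generic Python sketch. The function names, the 30% test fraction, the seed, the `min_count` threshold and the "Other" label are all assumptions made for illustration; none of them are Quadstone settings.

```python
import random
from collections import Counter

def derive_training_test(n_records, test_fraction=0.3, seed=2007):
    """Return a 'Training' or 'Test' label per record, reproducible for a given seed."""
    rng = random.Random(seed)
    return ["Test" if rng.random() < test_fraction else "Training"
            for _ in range(n_records)]

def group_small_categories(values, min_count, other="Other"):
    """Merge categories seen fewer than min_count times into a single group."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Toy categorical field, invented for illustration.
marital = ["Single", "Married", "Single", "Widow", "Married", "Separated", "Single"]

grouped = group_small_categories(marital, min_count=2)
labels = derive_training_test(len(marital))

print(grouped)
print(labels)
```

Grouping the rare categories also reduces the number of categories, which is the third item on the list. The tree would then be built on the Training records and checked on the Test records.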


Page 18

Confidence interval for 100 responses…

[Figure: Mean, Upper and Lower series for populations of 1000, 10,000 and 100,000. Y axis: percent of mean (0% to 140%); X axis: population.]

Page 19

Confidence intervals

Upper and lower confidence intervals, 95% confidence. E.g. 50 responses out of N suggests that the true (population) mean is 95% likely to be between 75% and 130% of the observed (sample) mean.

[Figure: confidence intervals. Y axis: percent of observed mean (20 to 200); X axis: number of responses (ticks at 10, 100, 1000, 10000), with points at 10, 15, 25, 50, 100, 300, 500, 1000, 3000, 5000 and 10000 responses. Data labels shown: 162, 146, 131, 121, 112, 109, 106, 104, 103, 102.]
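The way the interval narrows with the number of responses can be approximated with a few lines of Python. This sketch uses a symmetric normal approximation, which is an assumption: the slide does not say how its figures were computed, and its asymmetric bounds suggest an exact method. The function name and parameters are illustrative.

```python
import math

def ci_percent_of_mean(responses, population=None, z=1.96):
    """Approximate 95% interval for the true rate, as (lower, upper) percent of the observed rate.

    Normal approximation to the binomial; if no population is given, the rate is
    treated as small (Poisson limit).
    """
    p = responses / population if population else 0.0
    relative_half_width = z * math.sqrt((1 - p) / responses)
    return (100 * (1 - relative_half_width), 100 * (1 + relative_half_width))

for k in (10, 50, 100, 1000, 10000):
    lower, upper = ci_percent_of_mean(k)
    print(f"{k:>6} responses: {lower:5.1f}% to {upper:5.1f}% of the observed mean")

# Same 100 responses in populations of different size.
for n in (1000, 10000, 100000):
    lower, upper = ci_percent_of_mean(100, population=n)
    print(f"population {n:>6}: {lower:5.1f}% to {upper:5.1f}%")
```

For 50 responses this approximation gives roughly 72% to 128%, close to the 75% to 130% quoted above. With 100 responses the interval changes only slightly as the population grows from 1000 to 100,000, so the number of responses in a node matters far more than the size of the node.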

Page 20

What makes a good segment?

[Figure: response rate (0% to 12%) by node (1, 2, 3), annotated “If this is the average…”, “Is this worth knowing?” and “Is this?”]


Page 22

Possible splits scale exponentially

[Figure: number of splits (1 to 1000000) against number of categories (1 to 20)]
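The growth can be reproduced with a one-line count. The sketch assumes that a split divides the categories of an unordered field into two non-empty groups, which gives 2^(k-1) − 1 possible splits for k categories; the slide does not state the formula behind its chart, and the function name is illustrative.

```python
def binary_splits(categories):
    """Number of ways to divide unordered categories into two non-empty groups."""
    return 2 ** (categories - 1) - 1 if categories >= 1 else 0

for k in (2, 5, 10, 20):
    print(k, binary_splits(k))
```

Under this assumption 10 categories already allow 511 splits and 20 categories allow 524287, which is why grouping or reducing categories speeds up the search and limits the scope for chance findings.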


Page 26

Reporting on your model

– Audit the model you build

– Monitor future ‘through the door’ populations

Page 27

Where to find out more

– Quadstone System Support website: http://support.quadstone.com/info/releases/#qs5.3
– Documentation
  – What’s new in the Quadstone System 5.3 release notes
  – Updated Quadstone System help (F1)
  – Updated Quadstone System data-build command and TML reference
  – Updated Data Build Manager reference
  – Updated Quadstone System administration reference
  – Customer-specific release notes
– Quadstone System Support
  – Web Site: http://support.quadstone.com/
  – Email: [email protected]
  – Tel: US 1-800-335-3860; All +44 131 240 3140

Page 28

Questions?

Portrait Software Copyright 2008
www.portraitsoftware.com

EMEA (Headquarters)
The Smith Centre, The Fairmile
Henley-on-Thames, Oxfordshire, RG9 6AB, United Kingdom
T: +44 (0)1491 416600
F: +44 (0)1491 416601

The Americas
125 Summer Street, 16th Floor
Boston MA 02110, USA
T: +1 617 457-5200
F: +1 617 457-5299

Asia Pacific
Level 7, 15-17 Young Street
Sydney NSW 2000, Australia
F: +61 2 8004 9600