data mining process source: crisp-dm (spss.com website)
TRANSCRIPT
Elementising
Ralph B and Julianne Kimball Trustees for Kimball Fred CSte. 11613150 Hiway 9Box 1234 Boulder CrkColo 95006
Addressee First Name(1): RalphAddressee Middle Initial(1): BAddressee Last Name(1): KimballAddressee First Name(2): JulianneAddressee Last Name(2): KimballAddressee Relationship: Trustees forRelationship Person First Name: FredRelationship Person Middle Name: CRelationship Person Last Name: KimballStreet Address Number: 13150Street Name: Hiway 9Suite Number: 116Post Office Box Number: 1234City: Boulder CrkState: ColoFive Digit Zip: 95006
Matching/Householding
Match record with other customer records containing Ralph and Julianne Kimball
Establish that they are part of the same household
Analyst Issues
Physical data problemsData DictionariesValidation (Frequencies)Missing DataThe “zero” value problemInappropriate (Future) data for modelingUnavailable data
Physical
Cannot access data ASCII vs EBCDIC On a medium that you can’t use (certain type of tape,
for instance)
Missing Data
IgnoreFind the right values if you canUse Average for that variableReplace with number that matches its
characteristics(What do the missing people look like in terms of the dependent? Who else looks like that?
The zero problem
What does 0 mean?If “Number of Revolving Bankcard Trades
Currently Past Due” = 0, what does that mean?
# of Bank Rev. Trds Currently Past Due
Cumulative Cumulative
BRPSTD Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
0 13485 96.0 13485 96.0
1 486 3.5 13971 99.5
2 57 0.4 14028 99.9
3 12 0.1 14040 100.0
4 2 0.0 14042 100.0
# of Trds
Cumulative Cumulative
TRADES Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
0 1606 11.4 1606 11.4
1 1080 7.7 2686 19.1
2 1056 7.5 3742 26.6
3 1007 7.2 4749 33.8
4 949 6.8 5698 40.6
5 911 6.5 6609 47.1
6 849 6.0 7458 53.1
7 793 5.6 8251 58.8
8 682 4.9 8933 63.6
9 622 4.4 9555 68.0
10+ 4487 32.0 14042 100.0
# of Bank Rev. Trds
Cumulative Cumulative
BRTRDS Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
INQS. & PR ONLY 64 0.5 64 0.5
PR ONLY 22 0.2 86 0.6
INQS. ONLY 960 6.8 1046 7.4
NO RECORD 560 4.0 1606 11.4
0 6183 44.0 7789 55.5
1 2616 18.6 10405 74.1
2 1427 10.2 11832 84.3
3 831 5.9 12663 90.2
4 496 3.5 13159 93.7
5 287 2.0 13446 95.8
6 188 1.3 13634 97.1
7 142 1.0 13776 98.1
8 92 0.7 13868 98.8
9 60 0.4 13928 99.2
10+ 114 0.8 14042 100.0
# of Bank Rev. Trds Currently Past Due
Cumulative Cumulative
BRPSTD Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
NO TRADES OF THIS TYPE 6183 44.0 6183 44.0
INQS. & PR ONLY 64 0.5 6247 44.5
PR ONLY 22 0.2 6269 44.6
INQS. ONLY 960 6.8 7229 51.5
NO RECORD 560 4.0 7789 55.5
MISSING 3475 24.7 11264 80.2
0 2221 15.8 13485 96.0
1 486 3.5 13971 99.5
2 57 0.4 14028 99.9
3 12 0.1 14040 100.0
4 2 0.0 14042 100.0
Inappropriate Data Used
Future data used to build great looking model. Used payments till month end instead of payments
until cycle date.