Targeting solar customers using Data Mining Techniques

Upload: merisa-lee

Post on 11-Jul-2015

Category:

Data & Analytics


TRANSCRIPT

Page 1: Targeting solar customers using Data Mining Techniques

MSCI 446 – Data Mining

Targeting potential Complete Solar customers using data mining algorithms

Wendy D’Souza, Jesse Feld, Sharan Gurkar, Merisa Lee

Page 2: Targeting solar customers using Data Mining Techniques

DATA

Collected through .

260,358 data sets for homeowners in California.

125 Complete Solar customer data sets.

Page 3: Targeting solar customers using Data Mining Techniques

UNBALANCED DATA

Because of the large discrepancy between the two class sizes, classification algorithms ignored the small subset of customers.

For each algorithm, random sampling of 125 non-customers was performed 5-7 times; each sample was combined with the 125 customers to create a full, balanced data set.

125 customers + 125 non-customers, × 7
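The balancing step can be sketched as random undersampling of the larger class. This is a minimal sketch, not the authors' tooling: the function name `balanced_samples`, the fixed seed, and the stand-in records are all assumptions made for illustration.

```python
import random

def balanced_samples(customers, non_customers, n_samples=7, seed=0):
    """Build n_samples balanced data sets by random undersampling.

    Each data set keeps every customer and adds an equally sized random
    draw (without replacement) from the much larger non-customer pool.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so draws are repeatable
    samples = []
    for _ in range(n_samples):
        drawn = rng.sample(non_customers, k=len(customers))
        samples.append(list(customers) + drawn)
    return samples

# Stand-in records only: 125 customers and a larger pool of non-customers.
customers = [("customer", i) for i in range(125)]
non_customers = [("non-customer", i) for i in range(5000)]
data_sets = balanced_samples(customers, non_customers)
```

Each resulting data set holds 250 records, half from each class, so a classifier can no longer score well by ignoring the customers.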

Page 4: Targeting solar customers using Data Mining Techniques

DATA SETS

Name

Address

City

Pool owner?

Age

Net worth

Education

Marital status

Length of residence

Household income

Home value

Credit rating

Class variable = Customer / Not a customer

Page 5: Targeting solar customers using Data Mining Techniques

PRISM ON NON-CUSTOMERS

PRISM was run on 7 data sets, generating approximately 70 rules for each class attribute. As a result, only the top 6 rules were chosen for each sample test.

Rule (Occurrences)

If household income is between $50,000 and $54,999, then household does not have solar power (7)

If home market value is between $225,000 and $249,999, then household does not have solar power (6)

If household income is between $35,000 and $39,999, then household does not have solar power (6)

If city of residence is Vista, then household does not have solar power (5)

If city of residence is Coronado, then household does not have solar power (5)

If city of residence is Novato, then household does not have solar power (4)
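The occurrence counts above amount to tallying how many of the sample runs produced each rule. A minimal sketch of that tally, assuming each run's selected rules are available as a list of strings; the function name `rule_occurrences` and the placeholder rule names are hypothetical.

```python
from collections import Counter

def rule_occurrences(rules_per_run):
    """Count in how many runs each rule appears (at most once per run)."""
    counts = Counter()
    for rules in rules_per_run:
        counts.update(set(rules))  # set() so a rule counts once per run
    return counts.most_common()    # most frequent rule first

# Placeholder rule names, not the rules from the slides.
runs = [["rule A", "rule B"], ["rule A"], ["rule A", "rule C"]]
tally = rule_occurrences(runs)
```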

Page 6: Targeting solar customers using Data Mining Techniques

PRISM ON NON-CUSTOMERS

Average Kappa: 0.303

On average, 52.5% of the instances were classified correctly.
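For reference, Cohen's kappa compares the observed agreement between actual and predicted classes with the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The sketch below shows the calculation on made-up labels; the function name `cohens_kappa` and the label lists are illustrative assumptions, not the project's data.

```python
from collections import Counter

def cohens_kappa(actual, predicted):
    """Cohen's kappa for two equally long lists of class labels.

    Note: undefined (division by zero) when chance agreement equals 1.
    """
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal proportions, summed per class.
    expected = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up labels for illustration only.
actual = ["yes"] * 4 + ["no"] * 4
predicted = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
kappa = cohens_kappa(actual, predicted)  # observed 0.75, expected 0.5
```

A kappa of 0 means the classifier does no better than chance, and 1 means perfect agreement.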

PRISM was also run on the set of Complete Solar customers:

-> the results achieved were not as promising,

-> likely due to variety in the data set.

Page 7: Targeting solar customers using Data Mining Techniques

1R

Ran on the same 7 data sets. Best predictor:

4 out of 7 sets returned city

2 out of 7 sets returned home market value

1 out of 7 sets returned household income
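1R picks the single attribute whose one-level rule (predict the majority class for each attribute value) makes the fewest errors on the training data. The following is a minimal sketch of that idea for nominal attributes; the function name `one_r` and the toy rows are assumptions for illustration and do not come from the project data.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (best_attribute, rule, errors) for the 1R method.

    rule maps each value of the chosen attribute to its majority class.
    """
    best = None
    for attr in attributes:
        # Class counts for every value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in by_value.items()}
        # Errors: instances that are not in the majority class of their value.
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Toy rows for illustration only.
rows = [
    {"city": "A", "pool": "yes", "customer": "yes"},
    {"city": "A", "pool": "no", "customer": "yes"},
    {"city": "B", "pool": "yes", "customer": "no"},
    {"city": "B", "pool": "no", "customer": "no"},
]
best = one_r(rows, ["pool", "city"], "customer")
```

In this toy data "city" separates the classes perfectly while "pool" does not, so 1R returns "city" as the best predictor.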

Page 8: Targeting solar customers using Data Mining Techniques

Removed the attribute “city”, and the kappa value increased almost every time, with home market value as the best predictor.

Page 9: Targeting solar customers using Data Mining Techniques

Removed the attribute “home market value”, and the kappa value decreased every time. This shows the importance of home market value.
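The attribute-removal experiment can be expressed as a small loop: score the full attribute set, then rescore with one attribute left out at a time. This is only a sketch under assumptions; `ablation_scores` is a hypothetical helper, and `evaluate` stands for whatever routine trains the classifier on the given attributes and returns its kappa.

```python
def ablation_scores(attributes, evaluate):
    """Score all attributes, then each set with one attribute removed."""
    scores = {"all attributes": evaluate(list(attributes))}
    for dropped in attributes:
        kept = [a for a in attributes if a != dropped]
        scores["without " + dropped] = evaluate(kept)
    return scores

def toy_evaluate(attrs):
    """Placeholder scorer: only counts attributes.

    A real version would train the classifier and return its kappa.
    """
    return len(attrs)

scores = ablation_scores(
    ["city", "home market value", "household income"], toy_evaluate
)
```

A score that drops when an attribute is removed suggests the attribute carries useful signal; a score that rises suggests it was hurting the model.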

Page 10: Targeting solar customers using Data Mining Techniques

1R

[Chart: kappa value for Tests 1-7 in three series: with city, without city, and without home market value; vertical axis from -0.1 to 0.5]

Page 11: Targeting solar customers using Data Mining Techniques

1R

[Chart: % correctly classified instances for Tests 1-7 in three series: with city, without city, and without home market value; vertical axis from 0 to 80]

Page 12: Targeting solar customers using Data Mining Techniques

CLUSTERING

Weak attributes: marital status, gender, age, pool and education

Strong attributes: income, home market value and city

Attribute               Cluster 1     Cluster 2     Cluster 3
City                    SAN DIEGO     SAN JOSE      SAN DIEGO
Pool Owner?             No            No            No
Age                     44.5-53.4     44.5-53.4     44.5-53.4
Education Level         Unknown       Unknown       Grad School
Marital Status          Married       Married       Married
Length of Residence     13.5+ years   1.5-3 years   13.5+ years
Gender                  Male          Male          Male
Income                  100k-149k     250k+         100k-149k
Home Market Value       500k-749k     1M+           500k-749k
Credit Rating           750-799       700-749       750-799
Solar Customer?         No            Yes           No
# of Points in Cluster  70            87            93
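A cluster summary like this one can be produced by taking the most common value of each attribute within each cluster. The slides do not name the clustering algorithm, so the sketch below assumes cluster assignments already exist and that the table reports per-cluster modes; `cluster_profiles` and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_profiles(rows, labels):
    """Return, per cluster, its size and the most common value of each attribute."""
    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    profiles = {}
    for label, group in members.items():
        modes = {attr: Counter(r[attr] for r in group).most_common(1)[0][0]
                 for attr in group[0]}
        profiles[label] = {"size": len(group), "modes": modes}
    return profiles

# Toy rows and cluster labels for illustration only.
rows = [
    {"city": "A", "pool": "no"},
    {"city": "A", "pool": "yes"},
    {"city": "B", "pool": "no"},
    {"city": "A", "pool": "no"},
]
labels = [0, 0, 1, 0]
profiles = cluster_profiles(rows, labels)
```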

Page 13: Targeting solar customers using Data Mining Techniques

CONCLUSION

Given marketing initiatives, Complete Solar should target consumers in San Jose and San Dimas, consumers with medium to high income and consumers with large homes.

Page 14: Targeting solar customers using Data Mining Techniques

Thank you for your time!

Questions?