targeting solar customers using data mining techniques

Post on 11-Jul-2015

MSCI 446 – Data Mining

Targeting potential Complete Solar customers using data mining algorithms

Wendy D’Souza, Jesse Feld, Sharan Gurkar, Merisa Lee

DATA

Collected through .

260,358 data sets for homeowners in California.

125 Complete Solar customer data sets.

UNBALANCED DATA

Because customers were vastly outnumbered by non-customers, classification algorithms ignored the small subset of customers.

For each algorithm, random samples of 125 non-customers were drawn 5 to 7 times, each combined with the 125 customers to create a full, balanced data set.

125 customers + 125 non-customers, x 7
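The balancing step on the unbalanced-data slide can be sketched roughly as follows. This is a minimal illustration, assuming the records are held in two Python lists; the function name `balanced_samples`, the `seed` parameter, and the tuple layout are hypothetical and not taken from the slides.

```python
import random

def balanced_samples(customers, non_customers, n_samples=7, seed=0):
    """Build n_samples balanced data sets by undersampling non-customers.

    Each data set holds every customer plus an equally sized random
    draw of non-customers (drawn without replacement within a sample).
    """
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    k = len(customers)
    samples = []
    for _ in range(n_samples):
        drawn = rng.sample(non_customers, k)
        data = [(row, "Customer") for row in customers]
        data += [(row, "Not a customer") for row in drawn]
        rng.shuffle(data)  # avoid all customers appearing first
        samples.append(data)
    return samples
```

Calling it with 125 customer records and the full list of non-customers would give 7 data sets of 250 labeled records each, matching the sizes on the slide.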

DATA SETS

Name

Address

City

Pool owner?

Age

Net worth

Education

Marital status

Length of residence

Household income

Home value

Credit rating

Class variable = Customer / Not a customer

PRISM ON NON-CUSTOMERS

PRISM was run on the 7 data sets, generating approximately 70 rules for each class value. As a result, only the top 6 rules were chosen for each sample test.

Rule    Occurrences

If household income is between $50,000 and $54,999, then household does not have solar power    7

If home market value is between $225,000 and $249,999, then household does not have solar power    6

If household income is between $35,000 and $39,999, then household does not have solar power    6

If city of residence is Vista, then household does not have solar power    5

If city of residence is Coronado, then household does not have solar power    5

If city of residence is Novato, then household does not have solar power    4
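The slides do not say how the rules were generated or which tool was used. As a rough illustration only, the sketch below follows the covering loop that PRISM is usually described with: for one class value, keep adding the attribute-value condition with the highest precision until the rule covers only that class, then remove the covered records and repeat. The dict-per-record format, the function name `prism_rules`, and the tie-breaking choices are assumptions made for this example.

```python
def prism_rules(rows, target, cls):
    """Learn 'if conditions then cls' rules with a PRISM-style covering loop.

    rows is a list of dicts with nominal values; target is the class key.
    Each returned rule is a dict mapping attribute -> required value.
    """
    attrs = [a for a in rows[0] if a != target]
    remaining = list(rows)
    rules = []
    while any(r[target] == cls for r in remaining):
        covered, conds = remaining, {}
        # Refine the rule until it is pure or no attributes are left.
        while any(r[target] != cls for r in covered) and len(conds) < len(attrs):
            best = None
            for a in attrs:
                if a in conds:
                    continue
                for v in sorted({r[a] for r in covered}):
                    subset = [r for r in covered if r[a] == v]
                    pos = sum(r[target] == cls for r in subset)
                    # Prefer higher precision, then higher coverage.
                    key = (pos / len(subset), pos)
                    if best is None or key > best[0]:
                        best = (key, a, v, subset)
            _, a, v, covered = best
            conds[a] = v
        rules.append(conds)
        # Drop every record the new rule covers before learning the next one.
        covered_ids = {id(r) for r in covered}
        remaining = [r for r in remaining if id(r) not in covered_ids]
    return rules
```

Running this once per balanced sample and counting how often each rule reappears would give a tally in the spirit of the occurrences column, although the exact procedure behind the slide is not stated.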

PRISM ON NON-CUSTOMERS

Average Kappa: 0.303

On average, 52.5% of the instances were classified correctly.

PRISM was also run on the set of Complete Solar customers:

-> Results were not as promising.

-> This is likely due to the variety in the data set.
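For reference, the kappa statistic reported above compares the observed agreement with the agreement expected by chance. The sketch below is a minimal version, assuming a square confusion matrix given as nested lists (rows are actual classes, columns are predicted classes). The function name and any numbers used with it are hypothetical and are not results from the slides.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    kappa = (observed - expected) / (1 - expected), where observed is the
    fraction on the diagonal and expected is the chance agreement implied
    by the row and column totals. Undefined when expected equals 1.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)
```

A kappa of 0 means the classifier does no better than chance on the given class totals, and 1 means perfect agreement.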

1R

Ran on the same 7 data sets. Best predictor:

4 out of 7 sets returned city

2 out of 7 sets returned home market value

1 out of 7 sets returned household income
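As an illustration of what "best predictor" means for 1R, the sketch below builds one rule per attribute (each value predicts its most frequent class) and returns the attribute whose rule makes the fewest errors on the data. It assumes records are dicts with nominal values; the function name `one_r` and the return layout are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (attribute, value -> class rule, error count) for the best 1R rule."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[attr]][r[target]] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors are the records outside the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best
```

An attribute with many distinct values, such as city, can score well on the data it was built from simply because each value covers few records, which is one reason to also test the result with that attribute removed.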

With the attribute “city” removed, the kappa value increased almost every time, with home market value as the best predictor.

With the attribute “home market value” removed, the kappa value decreased every time. This shows the importance of home market value.
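The attribute-removal comparison can be sketched as a small helper that rescores the data with each attribute left out in turn. This is only an outline under assumptions: records are dicts, and `score` is a hypothetical callback that trains and evaluates a classifier (for example, returning kappa) on the rows it is given.

```python
def without_attribute(rows, name):
    """Copy the records with one attribute dropped."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]

def ablation_scores(rows, target, score):
    """Score the full data, then the data with each attribute removed.

    score(rows, target) is a caller-supplied function returning a number.
    """
    results = {"all attributes": score(rows, target)}
    for attr in rows[0]:
        if attr != target:
            reduced = without_attribute(rows, attr)
            results[f"without {attr}"] = score(reduced, target)
    return results
```

A drop in the score when an attribute is removed suggests the classifier relied on it, which is the reasoning applied to home market value above.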

1R

[Chart: 1R kappa value for Tests 1 to 7, with city, without city, and without home market value; kappa axis from -0.1 to 0.5]

1R

[Chart: 1R percentage of correctly classified instances for Tests 1 to 7, with city, without city, and without home market value; axis from 0 to 80]

CLUSTERING

Weak attributes: marital status, gender, age, pool and education

Strong attributes: income, home market value and city

Attribute               Cluster 1    Cluster 2    Cluster 3
City                    SAN DIEGO    SAN JOSE     SAN DIEGO
Pool Owner?             No           No           No
Age                     44.5-53.4    44.5-53.4    44.5-53.4
Education Level         Unknown      Unknown      Grad School
Marital Status          Married      Married      Married
Length of Residence     13.5+ years  1.5-3 years  13.5+ years
Gender                  Male         Male         Male
Income                  100k-149k    250k+        100k-149k
Home Market Value       500k-749k    1M+          500k-749k
Credit Rating           750-799      700-749      750-799
Solar Customer?         No           Yes          No
# of Points in Cluster  70           87           93

CONCLUSION

In its marketing initiatives, Complete Solar should target consumers in San Jose and San Dimas, consumers with medium to high income, and consumers with large homes.

Thank you for your time!

Questions?
