7. data mining and its applications


Data Mining and Its Applications 1

Data Mining and Its Applications

References

– Michael J.A. Berry and Gordon Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, Inc., 1997.
– Cabena, Hadjinian, Stadler, Verhees and Zanasi, Discovering Data Mining: From Concept to Implementation, Prentice Hall, 1997.
– Alex Berson, Stephen Smith and Kurt Thearling, Building Data Mining Applications for CRM, McGraw-Hill, 1999.
– Olivia Parr Rud, Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, John Wiley & Sons, Inc., 2001.
– Michael J.A. Berry and Gordon S. Linoff, Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley & Sons, Inc., 2000.
– Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
– Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
– Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2005.

Why Mine Data?

– Lots of data is being collected and warehoused
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit card transactions
– Computers have become cheaper and more powerful
– Competitive pressure is strong
  – Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Mining Large Data Sets - Motivation

– There is often information “hidden” in the data that is not readily evident
– Human analysts may take weeks to discover useful information
– Much of the data is never analyzed at all

[Figure: The Data Gap. Total new disk (TB) since 1995 plotted against the number of analysts, 1995 to 1999; vertical axis from 0 to 4,000,000.]

From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”

What is Data Mining?

Many definitions:
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data
– Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is (not) Data Mining?

What is Data Mining?

– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)

– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)

What is not Data Mining?

– Look up phone number in phone directory

– Query a Web search engine for information about “Amazon”

Origins of Data Mining

Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.

Traditional techniques may be unsuitable due to
– Enormity of data
– High dimensionality of data
– Heterogeneous, distributed nature of data

[Figure: Data Mining shown at the overlap of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems.]

Data Mining Tasks

– Prediction methods: use some variables to predict unknown or future values of other variables.
– Description methods: find human-interpretable patterns that describe the data.

From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996

Data Mining Tasks...

– Classification
– Clustering
– Association Rule Discovery
– Sequential Pattern Discovery

The Virtuous Cycle of Data Mining

– Identify business problems and areas where analyzing data can provide value
– Transform data into actionable information using data mining techniques
– Act on the information
– Measure the results of your efforts to provide insight on how to exploit your data

Taken from a talk given by Michael J.A. Berry on Data Mining for CRM.

Some Typical Business Problems

– Customer profiling
– Customer segmentation
– Customer retention
– Basket analysis (retail)
– Direct marketing
– Cross selling
– Fraud detection

Customer Profiling

Question
– What kinds of customers were profitable in the last year?

Data
– Customer details such as age, gender, occupation, salary level, account, etc.
– Earnings from customers in the last year.

Data Mining
– Divide customers into profitability categories according to earnings, such as highly profitable, profitable, non-profitable, loss.
– Find rules using data mining techniques.
– Analyze the rules and take actions.
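The first data mining step above, dividing customers into profitability categories by earnings, can be sketched as follows. This is only an illustration: the thresholds, the function name `profitability_category`, and the sample customers are hypothetical and do not come from the slides.

```python
def profitability_category(earnings, high=5000.0, low=1000.0):
    """Map last year's earnings from one customer to a category label.

    The thresholds `high` and `low` are illustrative assumptions.
    """
    if earnings < 0:
        return "loss"
    if earnings < low:
        return "non-profitable"
    if earnings < high:
        return "profitable"
    return "highly profitable"


# Hypothetical customer records: (customer id, earnings last year)
customers = [("c1", 7200.0), ("c2", 1500.0), ("c3", 300.0), ("c4", -450.0)]

# The resulting label becomes the target that rule mining tries to explain.
labels = {cid: profitability_category(e) for cid, e in customers}
```

In practice the cut-off values would be chosen by the business, for example from the earnings distribution, rather than fixed in code.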

Customer Profiling: Rules

IF age > 30 and age <= 45 and
   occupation is professional and
   salary level is between 50,000 and 70,000
THEN this customer is profitable

Each rule comes with statistical measures such as support and confidence.


Customer Segmentation

Customer segmentation is a process to divide customers into different groups or segments. Customers in the same segment have similar needs or behaviors so that similar marketing strategies or service policies can be applied to them.

Customer segments are required in several business areas including
– Marketing
– Customer services
– Products and service development
– Sales promotion
– Customer retention
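The slides do not name a segmentation algorithm. As one common way to group customers with similar behavior, the sketch below uses plain k-means; the algorithm choice, the two features, and the sample customers are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples.

    Initial centroids are simply the first k points, which keeps the
    sketch deterministic; real tools pick starting points more carefully.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each customer to the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a segment is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# Hypothetical customers: (purchases per month, average basket value)
customers = [(1, 20), (2, 25), (1, 22), (9, 210), (10, 190), (8, 200)]
segments, centers = kmeans(customers, k=2)
```

Here the first three customers end up in one segment and the last three in the other. With real data the features would normally be scaled first, since otherwise the attribute with the largest range dominates the distance.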


Life Cycle of a Loan Product

Business Objectives

Mellon Bank Corporation is a major financial services company headquartered in Pittsburgh.
– Build an extendible loan secured by the value of a client’s own property.
– Achieve the highest possible return on investment.
– Based on customers with a DDA (demand deposit account), build a model for HELOC (home equity line of credit).

Data Preparation

The primary data source was the approximately 40,000 Mellon customers who had (or once had) HELOCs and DDAs.

Data
– Demographic data sourced both internally and externally (age, income, length of residence, and other indicators of economic condition)
– DDA data (history of loan balance over 3, 6, 9, 12, 18 months, history of returned checks, history of interest rates)
– Property data sourced externally (home purchase price, loan-to-value ratio)
– Other data related to creditworthiness

120 variables were used.


Responders


Classification

Customer Retention

Question:
– Find out what kinds of customers tend to churn and build a model which can predict the likely-to-churn customers.

Data mining solution:
– Collect data about the customers who have churned.
– Select a set of customers who have been loyal.
– Merge the two data sets to form training, testing and evaluation data sets.
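The merge-and-split step can be sketched as below. This is a minimal illustration, not the procedure from the slides: the function name `build_datasets`, the 60/20/20 proportions, the fixed random seed, and the customer ids are all hypothetical.

```python
import random


def build_datasets(churned, loyal, seed=0, train=0.6, test=0.2):
    """Merge churned and loyal customers, label them, and split three ways.

    The 60/20/20 proportions and the fixed seed are illustrative assumptions.
    """
    # Label 1 marks a churned customer, label 0 a loyal one.
    labelled = [(c, 1) for c in churned] + [(c, 0) for c in loyal]
    rng = random.Random(seed)
    rng.shuffle(labelled)  # mix the two sources before splitting
    n_train = round(len(labelled) * train)
    n_test = round(len(labelled) * test)
    training = labelled[:n_train]
    testing = labelled[n_train:n_train + n_test]
    evaluation = labelled[n_train + n_test:]
    return training, testing, evaluation


# Hypothetical customer ids
churned = [f"churn-{i}" for i in range(10)]
loyal = [f"loyal-{i}" for i in range(10)]
training, testing, evaluation = build_datasets(churned, loyal)
```

Shuffling before the split matters: without it the training set would contain mostly churned customers and the evaluation set mostly loyal ones.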

Understanding Customers

[Figure in two builds: axes Time and Revenue; regions Loss, Less Loss, Profit; annotations More Efficient Acquisition, Longer Lasting Relationship, More Frequent Up/Cross Sell; outcome More Profit in the first build and Even More Profit in the second.]

Taken from SPSS talk.

Basket Analysis

Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

Rule          Support   Confidence
A => D        2/5       2/3
C => A        2/5       2/4
A => C        2/5       2/3
B & C => D    1/5       1/3
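Support and confidence for a rule X => Y can be computed directly from the transactions: support is the fraction of all baskets that contain both X and Y, and confidence is that fraction divided by the fraction of baskets that contain X. The sketch below reads the item sequence on the slide as five three-item baskets (an assumption, though it reproduces the listed values) and reads the four rules as A => D, C => A, A => C and B & C => D.

```python
from fractions import Fraction

# The five baskets, read from the slide as groups of three items.
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]


def support(itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)


rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    print(sorted(lhs), "=>", sorted(rhs), support(lhs | rhs), confidence(lhs, rhs))
```

`Fraction` keeps the values exact, so the confidence of C => A is shown in lowest terms as 1/2 rather than 2/4.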

The Impact of Fraud

GAO (the United States General Accounting Office) cited $19.1 billion in improper government payments in 17 major programs for fiscal year 1998.
– Medicare: $12.6 billion
– Supplemental Security Income: $1.6 billion
– The Food Stamp Program: $1.4 billion
– Old Age and Survivors Insurance: $1.2 billion
– Disability Insurance: $941 million
– Housing Subsidies: $847 million
– Veterans’ Benefits, Unemployment Insurance and Others: $514 million

Background

HIC (the Health Insurance Commission) in Australia is a federal government agency.

HIC pays insurance claims more than 20 million Australian dollars and pays out about A$8 billion in funds every year.

More than 300 million transactions are processed and stored every year, amounting to 1.3 TB in five years.

Preventing Fraud and Abuse

Business Objectives
– The focus of the HIC project was on the recent and steady 10% annual rise in the cost of pathology claims for clinical tests.

Approaches
– To identify potential fraudulent claims or claims arising from inappropriate practice, and
– To develop general profiles of the GP practices in order to compare practice behaviors of individual GPs.

Data Preprocessing

Two databases
– Episode database
  • One episode record records a patient visit.
  • In total, 6.8 million records.
  • There were 227 different pathology tests.
– GP (doctor) database
  • There are 17,000 records related to active GPs.

– The behavior of 10,409 GPs was to be studied.
– A matrix of 10,409 by 227 elements.
– The elements were then scaled from 0 to 1 with respect to the total number of tests of each kind.
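The slides do not give the scaling formula. One plausible reading, taken here as an assumption, is that each GP's count for a test is divided by the total number of tests of that kind across all GPs, so every element lands between 0 and 1 (min-max scaling per column would be another reading). A small sketch with a hypothetical 3 by 2 matrix in place of the real 10,409 by 227 one:

```python
def scale_by_test_total(counts):
    """Divide each count by the column total for that test kind.

    `counts[g][t]` is the number of times GP `g` ordered test `t`.
    Columns whose total is zero are left as zeros.
    """
    n_tests = len(counts[0])
    totals = [sum(row[t] for row in counts) for t in range(n_tests)]
    return [
        [row[t] / totals[t] if totals[t] else 0.0 for t in range(n_tests)]
        for row in counts
    ]


# Hypothetical counts: 3 GPs (rows) by 2 test kinds (columns).
counts = [
    [10, 0],
    [30, 5],
    [60, 15],
]
scaled = scale_by_test_total(counts)
```

Under this reading each column of the scaled matrix sums to 1, so a GP's row shows that GP's share of every test kind.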


Input to Segmentation


Overview

Data Mining

The team conducted association rule mining with support = 0.25% and decided that the presence of some tests in the input database, the Pathology Episode Initiation (PEI) tests, was causing spurious rules to be revealed.

PEI tests depend on who ordered them and where they were ordered.

When the PEI tests were removed, the number of rules dropped significantly.

Result Analysis

– A request for a microscopic examination of feces for parasites (OCP) was associated with a culture examination of feces (FCS) in 0.85% of cases.
– There was a 92.6% chance that if OCP tests were requested, they would be done with FCS.
– There was a 0.61% chance that OCP was associated with a different, more expensive test called MCS32, which costs A$13.55 per test.
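If the 0.85% figure is read as the support of the rule OCP => FCS and the 92.6% figure as its confidence (an interpretation, not something the slides state), the two together imply how often OCP was requested at all, roughly 0.92% of cases:

```python
support_ocp_fcs = 0.0085    # OCP and FCS requested together (share of cases)
confidence_ocp_fcs = 0.926  # share of OCP requests that also include FCS

# confidence(OCP => FCS) = support(OCP and FCS) / support(OCP),
# so support(OCP) follows by rearranging.
support_ocp = support_ocp_fcs / confidence_ocp_fcs
print(f"{support_ocp:.2%}")  # share of cases that request OCP at all
```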


GP Profiles

Discussions

Segment 13:
– Represents the majority of traditional GPs who are practicing conventionally.
– 5,450 GPs, 52% of all GPs.
– Only 6.2% of the medical pathology tests.

Segment 4:
– 54 GPs, only 0.51% of all GPs.
– 2.7% of the medical pathology tests.

Ming Pao (明报), 21 April 2004