business intelligence data mining techniques as tools for business intelligence

38
Business Intelligence Business Intelligence Data Mining Techniques As Tools for Data Mining Techniques As Tools for Business Intelligence Business Intelligence

Upload: august-kelley

Post on 16-Jan-2016

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

Business IntelligenceBusiness Intelligence

Data Mining Techniques As Tools for Data Mining Techniques As Tools for

Business IntelligenceBusiness Intelligence

Page 2: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

2

IntroductionIntroduction

Motivation: Why data mining?

What is data mining?

Data Mining: On what kind of data?

Data mining functionality

Are all the patterns interesting?

Classification of data mining systems

Page 3: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

3

What Is Data Mining?What Is Data Mining?

Data mining (knowledge discovery in databases): Extraction of interesting (non-trivial, implicit,

previously unknown and potentially useful) information or patterns from data in large databases

Alternative names and their “inside stories”: Data mining: a misnomer? Knowledge discovery(mining) in databases

(KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Page 4: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

4

Why Data Mining? — Why Data Mining? — Potential ApplicationsPotential Applications

Database analysis and decision support Market analysis and management

target marketing, customer relation management, market basket analysis, cross selling, market segmentation

Risk analysis and management Forecasting, customer retention, improved

underwriting, quality control, competitive analysis Fraud detection and management

Other Applications Text mining (news group, email, documents) and

Web analysis. Intelligent query answering

Page 5: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

5

Market Analysis and Management Market Analysis and Management (1)(1)

Where are the data sources for analysis? Credit card transactions, loyalty cards, discount coupons,

customer complaint calls, plus (public) lifestyle studies

Target marketing Find clusters of “model” customers who share the same

characteristics: interest, income level, spending habits, etc.

Determine customer purchasing patterns over time Conversion of single to a joint bank account: marriage, etc.

Cross-market analysis Associations/co-relations between product sales Prediction based on the association information

Page 6: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

6

Market Analysis and Management Market Analysis and Management (2)(2)

Customer profiling data mining can tell you what types of customers

buy what products (clustering or classification) Identifying customer requirements

identifying the best products for different customers

use prediction to find what factors will attract new customers

Provides summary information various multidimensional summary reports statistical summary information (data central

tendency and variation)

Page 7: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

7

Corporate Analysis and Risk Corporate Analysis and Risk ManagementManagement Finance planning and asset evaluation

cash flow analysis and prediction contingent claim analysis to evaluate assets cross-sectional and time series analysis (financial-

ratio, trend analysis, etc.) Resource planning:

summarize and compare the resources and spending

Competition: monitor competitors and market directions group customers into classes and a class-based

pricing procedure set pricing strategy in a highly competitive market

Page 8: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

8

Fraud Detection and Management Fraud Detection and Management (1)(1) Applications

widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.

Approach use historical data to build models of fraudulent behavior

and use data mining to help identify similar instances Examples

auto insurance: detect a group of people who stage accidents to collect on insurance

money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)

medical insurance: detect professional patients and ring of doctors and ring of references

Page 9: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

9

Fraud Detection and Management Fraud Detection and Management (2)(2) Detecting inappropriate medical treatment

Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (save Australian $1m/yr).

Detecting telephone fraud Telephone call model: destination of the call,

duration, time of day or week. Analyze patterns that deviate from an expected norm.

British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.

Retail Analysts estimate that 38% of retail shrink is due to

dishonest employees.

Page 10: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

10

Data mining: the core of knowledge discovery process.

Data Mining: A KDD ProcessData Mining: A KDD Process

DB-03

DB-01DB-01

DATA SOURCES

DATAWAREHOUSEData

Pre-Processing

DataSelection

Data Integration

Task RelevantData

DataMining

100%

90%

80%

70%

60%

50%

40%

30%

40%

50%

DM Models

ModelEvaluation

KNOWLEDGE

Feedback: Knowledge Integration

Page 11: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

11

Learning the application domain: relevant prior knowledge and goals of application

Creating a target data set: data selection Data cleaning and preprocessing: (may take 60% of effort!) Data reduction and transformation:

Find useful features, dimensionality/variable reduction, invariant representation.

Choosing functions of data mining Summarization, classification, regression, association,

clustering. Choosing the mining algorithm(s) Data mining: search for patterns of interest Pattern evaluation and knowledge presentation

Visualization, transformation, removing redundant patterns, etc.

Deployement: Use of discovered knowledge

Steps of a KDD ProcessSteps of a KDD Process

Page 12: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

12

Standardized Data Mining Standardized Data Mining ProcessesProcesses

Step 1: Business Understanding

Determine the business objectives

Assess the situation Determine the data

mining goals Produce a project plan

Cross-Industry Standard Process for Data Mining CRISP-DM

Page 13: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

13

Standardized Data Mining Standardized Data Mining ProcessesProcesses

Step 2: Data Understanding

Collect the initial data

Describe the data Explore the data Verify the data

Cross-Industry Standard Process for Data Mining CRISP-DM

Page 14: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

14

Standardized Data Mining Standardized Data Mining ProcessesProcesses

Step 3: Data Preparation Select data Clean data Construct data Integrate data Format data

Cross-Industry Standard Process for Data Mining CRISP-DM

Page 15: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

15

Standardized Data Mining Standardized Data Mining ProcessesProcesses

Step 4: Modeling Select the modeling

technique Generate test

design Build the model Assess the model

Cross-Industry Standard Process for Data Mining CRISP-DM

Page 16: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

16

Standardized Data Mining Standardized Data Mining ProcessesProcesses

Step 5: Evaluation Evaluate results Review process Determine next step

Cross-Industry Standard Process for Data Mining CRISP-DM

Page 17: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

17

Standardized Data Mining Standardized Data Mining ProcessesProcesses

Step 6: Deployment Plan deployment Plan monitoring and

maintenance Produce final report Review the project

Cross-Industry Standard Process for Data Mining CRISP-DM

Page 18: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

18

Architecture of a Architecture of a Typical Data Mining SystemTypical Data Mining System

USER

INTERFACE

Best Data Mining Tool

DATA

WAREHOUSE

Statistical Components:

. Data Cleaning

. Data Transformation

. Exploratory Analysis

. Factor Analysis

. ...

Data Mining Components:

. Decision Trees

. Association Rules

. Clustering

. Visualization

. ...

Output

Input

User

DataSources

DomainKnowledge

Base

Data . Cleaning . Integration . Transformatin

Data . Cleaning . Integration . Transformatin

Page 19: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

19

Data Mining Functionalities (1)Data Mining Functionalities (1)

Concept description: Characterization and discrimination

Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions

Association (correlation and causality) Multi-dimensional vs. single-dimensional association age(X, “20..29”) ^ income(X, “20..29K”) buys(X,

“PC”) [support = 2%, confidence = 60%] contains(T, “computer”) contains(x, “software”)

[1%, 75%]

Page 20: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

20

Classification and Prediction Finding models (functions) that describe and

distinguish classes or concepts for future prediction E.g., classify countries based on climate, or classify

cars based on gas mileage Presentation: decision-tree, classification rule, ANN Prediction: Predict some unknown or missing numerical

values Cluster analysis

Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns

Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity

Data Mining Functionalities (2)Data Mining Functionalities (2)

Page 21: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

21

Data Mining Functionalities (3)Data Mining Functionalities (3)

Outlier analysis Outlier: a data object that does not comply with the

general behavior of the data It can be considered as noise or exception but is

quite useful in fraud detection, rare events analysis

Trend and evolution analysis Trend and deviation: regression analysis Sequential pattern mining, periodicity analysis Similarity-based analysis

Other pattern-directed or statistical analyses

Page 22: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

22

Data Mining: Data Mining: Combination of Multiple DisciplinesCombination of Multiple Disciplines

STATISTICS

MACHINELEARNING

DATABASETECHNOLOGY

VISUALIMAGING

OTHERDISCIPLINES

INFORMATIONSCIENCE

DATAMINING

Page 23: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

23

A Multi-Dimensional View of Data A Multi-Dimensional View of Data Mining ClassificationMining Classification Databases to be mined

Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc.

Knowledge to be extracted Characterization, discrimination, association, classification,

clustering, trend, deviation and outlier analysis, etc. Multiple/integrated functions and mining at multiple levels

Techniques to utilized Database-oriented, data warehouse (OLAP), machine

learning, statistics, visualization, neural network, etc. Applications adapted

Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

Page 24: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

24

Poll: Which data mining Poll: Which data mining technique..?technique..?

Page 25: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

25

1. Association1. Association

Market Basket Analysis

Page 26: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

26

Association RulesAssociation Rules

A.K.A. Association rule mining Mining association rules from transactional

databases using Apriory algorithm Other Methods…

Mining multilevel association rules from transactional databases

Mining multidimensional association rules from transactional databases and data warehouse

From association mining to correlation analysis Constraint-based association mining

Page 27: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

27

What Is Association Mining?What Is Association Mining?

Association rule mining: Finding frequent patterns, associations,

correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

Applications: Market basket analysis, cross-marketing, catalog

design, loss-leader analysis, clustering, classification, etc.

Examples: Rule form: “Body Head [support, confidence]”

buys(x, “diapers”) buys(x, “beers”) [0.5%, 60%] major(x, “CS”) ^ takes(x, “DB”) grade(x, “A”) [1%,

75%]

Page 28: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

28

2. What is Cluster Analysis?2. What is Cluster Analysis?

Cluster: a collection of data objects… Similar to one another within the same

cluster Dissimilar to the objects in other clusters

Cluster analysis Grouping a set of data objects into clusters

Clustering is unsupervised classification: no predefined classes

Typical applications As a stand-alone tool to get insight into data

distribution As a preprocessing step for other algorithms

Page 29: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

29

General Applications of ClusteringGeneral Applications of Clustering

Pattern Recognition Spatial Data Analysis Image Processing Economic Science WWW

Document classification Cluster Weblog data to discover

groups of similar access patterns

Page 30: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

30

Examples of Clustering ApplicationsExamples of Clustering Applications

Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs

Land use: Identification of areas of similar land use in an earth observation database

Insurance: Identifying groups of motor insurance policy holders with a high average claim cost

City-planning: Identifying groups of houses according to their house type, value, and geographical location

Earth-quake studies: Observed earth quake epicenters should be clustered along continent faults

Page 31: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

31

The K-Means Clustering MethodThe K-Means Clustering Method

Example

0

1

2

3

4

5

6

7

8

9

10

0 1 2 3 4 5 6 7 8 9 10

0

1

2

3

4

5

6

7

8

9

10

0 1 2 3 4 5 6 7 8 9 10

0

1

2

3

4

5

6

7

8

9

10

0 1 2 3 4 5 6 7 8 9 10

0

1

2

3

4

5

6

7

8

9

10

0 1 2 3 4 5 6 7 8 9 10

Page 32: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

32

Decision tree… A flow-chart-like tree structure Internal node denotes a test on an attribute Branch represents an outcome of the test Leaf nodes represent class labels or class distribution

Decision tree generation consists of two phases Tree construction

At start, all the training examples are at the root Examples are recursively partitioned based on selected

attributes Tree pruning

Identify and remove branches that reflect noise or outliers

Use of decision tree: Classifying an unknown sample

3. Decision Tree Induction3. Decision Tree Induction

Page 33: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

33

age income student credit_rating buys_computer<=30 high no fair no<=30 high no excellent no31…40 high no fair yes>40 medium no fair yes>40 low yes fair yes>40 low yes excellent no31…40 low yes excellent yes<=30 medium no fair no<=30 low yes fair yes>40 medium yes fair yes<=30 medium yes excellent yes31…40 medium no excellent yes31…40 high yes fair yes>40 medium no excellent no

This follows an example from Quinlan’s ID3

Training DatasetTraining Dataset

Page 34: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

34

age?

overcast

student? credit rating?

no yes fairexcellent

<=30 >40

no noyes yes

yes

30..40

Output:Output:A Decision Tree for Credit ApprovalA Decision Tree for Credit Approval

Page 35: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

35

4. Neural Networks4. Neural Networks

Advantages prediction accuracy is generally high robust, works when training examples

contain errors output may be discrete, real-valued, or a

vector of several discrete or real-valued attributes

fast evaluation of the learned target function Criticism

long training time difficult to understand the learned function not easy to incorporate domain knowledge

Page 36: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

36

The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping

k-

f

weighted sum

Inputvector x

output y

Activationfunction

weightvector w

w0

w1

wn

x0

x1

xn

A NeuronA Neuron

Page 37: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

37

Multi-Layer PerceptronMulti-Layer Perceptron

INPUTLAYER

(4 Neurons)

HIDDENLAYER I(4 PEs)

HIDDENLAYER II(3 PEs)

OUTPUTLAYER

(2 Neurons)

Page 38: Business Intelligence Data Mining Techniques As Tools for Business Intelligence

38

Applications of Neural NetworksApplications of Neural Networks

Financial Decision making Fraud Detection Bankruptcy Problem Weather Forcasting Feature Detection Voice Recognition