When to use Data Mining

Post on 19-Dec-2015


Introduction

• An important question that should be answered before you commence any data mining project is whether data mining techniques are, in fact, necessary.

• In determining this, it is important to understand what level of data mining sophistication is required. For instance, do you just need a few standardized printed reports, or do you need interactive ROI analysis or OLAP analysis to see what your data looks like?

• Or do you need true data mining techniques that build predictive models to search through your database for useful patterns?

The Data Mining Process

What all Data Mining techniques have in common

• Each data mining algorithm has the following in common:

– Model structure. The structure that defines the model (is it a tree, a neural network, or a nearest-neighbor model?)

– Search. How does the algorithm amend and modify the model over time as more data is made available?

– Validation. When does the algorithm terminate because it has created a valid model?
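
The three shared components can be illustrated with a deliberately tiny model. The sketch below is only an illustration, not taken from the slides: the `Stump` class, the helper functions, the `max_error` cutoff, and the data are all hypothetical, and a one-split tree stands in for whatever model structure a real tool would use.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Model structure: a one-split decision tree (hypothetical example)."""
    threshold: float

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


def error_rate(model: Stump, rows: list[tuple[float, int]]) -> float:
    """Fraction of rows the model labels incorrectly."""
    wrong = sum(1 for x, y in rows if model.predict(x) != y)
    return wrong / len(rows)


def search(train: list[tuple[float, int]]) -> Stump:
    """Search: try each observed value as a threshold and keep the best."""
    candidates = sorted({x for x, _ in train})
    return min((Stump(t) for t in candidates),
               key=lambda m: error_rate(m, train))


def validate(model: Stump, holdout: list[tuple[float, int]],
             max_error: float = 0.25) -> bool:
    """Validation: accept the model only if held-out error is low enough.

    The 0.25 cutoff is an arbitrary assumption for this sketch.
    """
    return error_rate(model, holdout) <= max_error


# Made-up data: (predictor value, target)
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
holdout = [(1.5, 0), (2.5, 0), (6.5, 1), (7.5, 1)]

model = search(train)
is_valid = validate(model, holdout)
print(model, is_valid)
```

Real algorithms differ mainly in how rich the structure is and how clever the search is; the held-out check at the end plays the same role in all of them.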

Data Mining in the Business Process

• When data mining is used for non-exploratory purposes, or whenever supervised learning techniques are used, the customer reaction provides a fairly well-defined target column within the database, one that relates to the business process. For data mining to succeed, the target must have the following attributes:

– The target has value

– The target is actionable

– The effect of the action can be captured

Avoiding some big mistakes in Data Mining

• The technology-centered view of the data mining process emphasizes getting the model right, with the assumption that the predictive product has been well-defined and that the data that has been captured to date is well understood.

• This is not always the case.

Three measures for Data Mining Tools

• Accuracy. The data mining tool must produce a model that is as accurate as possible.

• Explanation. The data mining tool must be able to ‘explain’ how the model works to the end user in a clear way.

• Integration. The data mining tool must integrate with the current business process, and data and information flow in the company.

• When these three requirements are well met, the data mining tools will produce highly profitable models that are likely to remain stable over long periods of time.

Embedded Data Mining for business

How to measure Accuracy, Explanation, and Integration

• Measuring Accuracy:

– Accuracy

– Error rate

– Error rate at rejection

– Mean squared error

– Lift

– Profit/ROI
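
As a rough illustration of the accuracy measures listed above, the sketch below computes each one on a small made-up scoring run. The slides do not give formulas, so the definitions here are assumptions: in particular, "error rate at rejection" is taken to mean the error rate on the records that remain after the model declines to score those whose score lies too close to 0.5, and lift is measured on the top-scored half. All function names, parameters, and data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of records classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Share of records classified incorrectly."""
    return 1.0 - accuracy(y_true, y_pred)


def error_rate_at_rejection(y_true, scores, reject_band=0.15):
    """Error rate after rejecting records scored within reject_band of 0.5."""
    kept = [(t, s) for t, s in zip(y_true, scores)
            if abs(s - 0.5) >= reject_band]
    if not kept:
        return 0.0
    wrong = sum(1 for t, s in kept if (s >= 0.5) != (t == 1))
    return wrong / len(kept)


def mean_squared_error(y_true, scores):
    """Average squared gap between the score and the actual outcome."""
    return sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true)


def lift(y_true, scores, fraction=0.5):
    """Response rate in the top-scored fraction divided by the overall rate."""
    n_top = max(1, int(len(y_true) * fraction))
    ranked = sorted(zip(scores, y_true), reverse=True)
    top_rate = sum(t for _, t in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


def profit_and_roi(y_true, y_pred, value_per_response, cost_per_contact):
    """Profit and ROI of contacting everyone the model flags."""
    contacted = sum(y_pred)
    responders = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    cost = contacted * cost_per_contact
    profit = responders * value_per_response - cost
    return profit, profit / cost


# Made-up outcomes (1 = responded) and model scores
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("error rate:", error_rate(y_true, y_pred))
print("error rate at rejection:", error_rate_at_rejection(y_true, scores))
print("mean squared error:", mean_squared_error(y_true, scores))
print("lift (top half):", lift(y_true, scores))
print("profit, ROI:", profit_and_roi(y_true, y_pred, 10.0, 2.0))
```

The first four measures describe how often the model is right; lift and profit/ROI describe how much that correctness is worth to the business, which is why a tool is usually judged on both kinds.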

• Measuring Explanation:

– Automated rule generation

– OLAP integration

– Model validation

• Measuring Integration:

– Proprietary data extracts

– Metadata

– Predictor preprocessing

– Predictor/prediction types

– Dirty data

– Missing values

– Scalability
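
Two of the integration criteria, dirty data and missing values, are easy to make concrete. The sketch below is a hypothetical check, not something from the slides: `profile_column`, `is_valid_age`, the 0 to 120 age range, and the sample column are all assumptions chosen for illustration.

```python
def profile_column(values, is_valid):
    """Return the share of missing and dirty entries in one column."""
    n = len(values)
    missing = sum(1 for v in values if v is None or v == "")
    dirty = sum(1 for v in values
                if v not in (None, "") and not is_valid(v))
    return {"missing": missing / n, "dirty": dirty / n}


def is_valid_age(v):
    """Assumed rule: an age is clean if it parses to a number in 0..120."""
    try:
        return 0 <= float(v) <= 120
    except (TypeError, ValueError):
        return False


# Made-up predictor column with gaps and bad entries
ages = [34, None, "41", "unknown", 250, "", 58, 29]
report = profile_column(ages, is_valid_age)
print(report)
```

A tool that integrates well would handle both kinds of entry without a hand-built extract; a profile like this only shows how much of that work a given data source demands.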

What the Future holds for Embedded Data Mining

• Once the data mining process becomes easy enough to use and is seamlessly integrated into business processes and the general data and information flow around the enterprise, there will be new applications and synergies that will make data mining an even more critical requirement for any fully functioning data warehouse:

– Use data mining to improve the multidimensional database

– Use data mining to improve the data warehouse structure

– Multidimensional databases and summary data will enhance data mining performance. In general, the more data that is available, the better a data mining technique tends to perform