Part II: Tools for Knowledge Discovery. Knowledge Discovery in Databases, Chapter 5

Post on 20-Dec-2015

Part II: Tools for Knowledge Discovery

Chapter 5: Knowledge Discovery in Databases

5.1 A KDD Process Model

Figure 5.1 A seven-step KDD process model

[Figure: the seven steps and the data each one produces. Data sources shown: data warehouse, transactional database, flat file.]

Step 1: Goal Identification (produces defined goals)

Step 2: Create Target Data (produces target data)

Step 3: Data Preprocessing (produces cleansed data)

Step 4: Data Transformation (produces transformed data)

Step 5: Data Mining (produces a data model)

Step 6: Interpretation & Evaluation

Step 7: Taking Action

Figure 5.2 Applying the scientific method to data mining

The Scientific Method

• Define the Problem

• Formulate a Hypothesis

• Perform an Experiment

• Draw Conclusions

• Verify Conclusions

A KDD Process Model

• Identify the Goal

• Create Target Data, Data Preprocessing, Data Transformation, Data Mining (grouped by a brace in the figure)

• Interpretation / Evaluation

• Take Action

Step 1: Goal Identification

• Define the Problem.

• Choose a Data Mining Tool.

• Estimate Project Cost.

• Estimate Project Completion Time.

• Address Legal Issues.

• Develop a Maintenance Plan.

Step 2: Creating a Target Dataset

Figure 5.3 The Acme credit card database

Step 3: Data Preprocessing

• Noisy Data

• Missing Data

Noisy Data

• Locate Duplicate Records.

• Locate Incorrect Attribute Values.

• Smooth Data.
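
The first and third bullets above can be illustrated with a minimal sketch. It assumes records are stored as dictionaries and uses smoothing by bin means, which is only one common smoothing technique; the function names are hypothetical and not taken from the chapter.

```python
from statistics import mean


def drop_duplicates(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for rec in records:
        # A record's identity is its full set of (attribute, value) pairs.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into consecutive bins, and replace
    every value with the mean of its bin (assumed smoothing method)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed
```

For example, `smooth_by_bin_means([4, 8, 15, 21, 21, 24], 3)` replaces the first three values with 9 and the last three with 22.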

Preprocessing Missing Data

• Discard Records With Missing Values.

• Replace Missing Real-valued Items With the Class Mean.

• Replace Missing Values With Values Found Within Highly Similar Instances.
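
As a rough illustration of the second option above, the sketch below fills a missing real-valued item with the mean of that attribute over instances of the same class. It assumes missing values are stored as `None` and that records are dictionaries; the function and parameter names are hypothetical.

```python
from collections import defaultdict


def impute_with_class_mean(records, attribute, class_attribute):
    """Return copies of the records in which a missing (None) value of
    `attribute` is replaced by the mean of that attribute within the
    record's class. Records whose class has no known value stay as-is."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        value = rec[attribute]
        if value is not None:
            sums[rec[class_attribute]] += value
            counts[rec[class_attribute]] += 1

    filled = []
    for rec in records:
        new_rec = dict(rec)  # leave the caller's records untouched
        cls = rec[class_attribute]
        if new_rec[attribute] is None and counts[cls] > 0:
            new_rec[attribute] = sums[cls] / counts[cls]
        filled.append(new_rec)
    return filled
```

Returning copies rather than editing in place keeps the original target data available if a different replacement strategy is tried later.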

Processing Missing Data While Learning

• Ignore Missing Values.

• Treat Missing Values As Equal Compares.

• Treat Missing Values As Unequal Compares.
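
The three options above can be sketched as a single similarity function over categorical attributes. This is an illustrative assumption about how a learner might compare two instances, not the method of any particular tool; `None` marks a missing value and the `missing` parameter name is hypothetical.

```python
def similarity(a, b, missing="ignore"):
    """Fraction of attribute positions on which two instances agree.

    missing="ignore":  skip any position where either value is missing
    missing="equal":   count such a position as a match
    missing="unequal": count such a position as a mismatch
    """
    if missing not in ("ignore", "equal", "unequal"):
        raise ValueError("missing must be 'ignore', 'equal' or 'unequal'")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            if missing == "ignore":
                continue
            compared += 1
            if missing == "equal":
                matches += 1
        else:
            compared += 1
            if x == y:
                matches += 1
    return matches / compared if compared else 0.0
```

With `a = ["yes", None, "male"]` and `b = ["yes", "no", "female"]`, the three policies give 1/2, 2/3, and 1/3 respectively, which shows how much the choice can shift a similarity score.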

Step 4: Data Transformation

• Data Normalization

• Data Type Conversion

• Attribute and Instance Selection

Data Normalization

• Decimal Scaling

• Min-Max Normalization

• Normalization using Z-scores

• Logarithmic Normalization
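
The four normalization methods listed above can be sketched as follows. The function names are hypothetical, the z-score version assumes the population standard deviation, and the logarithmic version assumes strictly positive inputs and a caller-chosen base.

```python
import math
from statistics import mean, pstdev


def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map the observed range onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_scores(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant attribute
    return [(v - mu) / sigma for v in values]


def log_normalize(values, base=2):
    """Replace each (positive) value with its base-`base` logarithm."""
    return [math.log(v, base) for v in values]
```

Min-max normalization needs the attribute's minimum and maximum in advance, while z-scores are a common fallback when those bounds are unknown.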

Attribute and Instance Selection

• Eliminating Attributes

• Creating Attributes

• Instance Selection

Table 5.1 • An Initial Population for Genetic Attribute Selection

Population  Income  Magazine   Watch      Credit Card
Element     Range   Promotion  Promotion  Insurance    Sex  Age

1           1       0          0          1            1    1
2           0       0          0          1            0    1
3           0       0          0          0            1    1
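
One way to read Table 5.1 is as a set of bit strings, one bit per attribute. The sketch below decodes each population element into an attribute list, assuming a 1 marks an attribute as selected for model building and a 0 marks it as excluded; the names `ATTRIBUTES`, `POPULATION`, and `selected_attributes` are hypothetical.

```python
# Column order of Table 5.1.
ATTRIBUTES = [
    "Income Range",
    "Magazine Promotion",
    "Watch Promotion",
    "Credit Card Insurance",
    "Sex",
    "Age",
]

# The three population elements of Table 5.1, keyed by element number.
POPULATION = {
    1: [1, 0, 0, 1, 1, 1],
    2: [0, 0, 0, 1, 0, 1],
    3: [0, 0, 0, 0, 1, 1],
}


def selected_attributes(bits, names=ATTRIBUTES):
    """Return the attribute names whose bit is 1 (assumed to mean 'selected')."""
    return [name for name, bit in zip(names, bits) if bit == 1]
```

Under that assumption, element 2 would build a model from Credit Card Insurance and Age only.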

Step 5: Data Mining

1. Choose training and test data.

2. Designate a set of input attributes.

3. If learning is supervised, choose one or more output attributes.

4. Select learning parameter values.

5. Invoke the data mining tool.
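
The five steps above can be sketched end to end. A tiny k-nearest-neighbor classifier stands in for the data mining tool here; that choice, the function names, and the sample instances are all illustrative assumptions rather than anything specified in the chapter.

```python
import random


def split_train_test(instances, test_fraction, seed=0):
    """Shuffle a copy of the instances and split off a test set."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, query, inputs, output, k):
    """Predict `output` for `query` by majority vote of the k nearest
    training instances (squared Euclidean distance over `inputs`)."""
    def distance(a, b):
        return sum((a[name] - b[name]) ** 2 for name in inputs)

    nearest = sorted(train, key=lambda inst: distance(inst, query))[:k]
    votes = {}
    for inst in nearest:
        votes[inst[output]] = votes.get(inst[output], 0) + 1
    return max(votes, key=votes.get)


# Hypothetical instances, invented for illustration only.
instances = [
    {"age": 25, "income": 30, "accept": "no"},
    {"age": 27, "income": 32, "accept": "no"},
    {"age": 30, "income": 35, "accept": "no"},
    {"age": 33, "income": 38, "accept": "no"},
    {"age": 45, "income": 70, "accept": "yes"},
    {"age": 48, "income": 75, "accept": "yes"},
    {"age": 52, "income": 80, "accept": "yes"},
    {"age": 55, "income": 85, "accept": "yes"},
]

train, test = split_train_test(instances, test_fraction=0.25)  # step 1
inputs = ["age", "income"]                                      # step 2
output = "accept"                                               # step 3 (supervised)
k = 3                                                           # step 4
predictions = [knn_predict(train, q, inputs, output, k) for q in test]  # step 5
```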

Step 6: Interpretation and Evaluation

• Statistical analysis.

• Heuristic analysis.

• Experimental analysis.

• Human analysis.

Step 7: Taking Action

• Create a report.

• Relocate retail items.

• Mail promotional information.

• Detect fraud.

• Fund new research.

5.9 The CRISP-DM Process Model

1. Business understanding

2. Data understanding

3. Data preparation

4. Modeling

5. Evaluation

6. Deployment

5.10 Experimenting with ESX

A Four-Step Model for Knowledge Discovery

1. Identify the goal.

2. Prepare the data.

3. Apply data mining.

4. Interpret and evaluate the results.

Experiment 1: Attribute Evaluation

*Applying the Four-Step Process Model to the Credit Screening Dataset*

Table 5.2 • A Confusion Matrix for Credit Card Screening

          Computed  Computed
          Accept    Reject

Accept    115       38
Reject    35        152

Table 5.3 • Test Set Results for a Most Typical Training Model

          Computed  Computed
          Accept    Reject

Accept    98        55
Reject    25        162
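
To compare Tables 5.2 and 5.3, overall accuracy can be computed as the diagonal total divided by the number of test instances. The sketch below assumes each matrix is stored row by row (Accept row first, then Reject), with columns in the order computed Accept, computed Reject; the variable and function names are hypothetical.

```python
def accuracy(matrix):
    """Correct classifications (the diagonal) divided by all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


TABLE_5_2 = [[115, 38], [35, 152]]
TABLE_5_3 = [[98, 55], [25, 162]]

acc_5_2 = round(accuracy(TABLE_5_2), 3)  # 267 / 340, about 0.785
acc_5_3 = round(accuracy(TABLE_5_3), 3)  # 260 / 340, about 0.765
```

Both tables cover the same 340 test instances (153 in the Accept row, 187 in the Reject row), so the two accuracies are directly comparable.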

Experiment 2: Parameter Evaluation

*Applying the Four-Step Process Model to the Satellite Image Dataset*

Figure 5.4 Satellite image data
