
Intelligent Data Mining

Ethem Alpaydın
Department of Computer Engineering
Boğaziçi University
alpaydin@boun.edu.tr

What is Data Mining?

• Search for very strong patterns (correlations, dependencies) in big data that can generalise to accurate future decisions.

• Also known as knowledge discovery in databases (KDD) and business intelligence.

Example Applications

• Association (basket analysis): "30% of customers who buy diapers also buy beer." (Support and confidence are sketched below.)

• Classification: "Young women buy small inexpensive cars." "Older wealthy men buy big cars."

• Regression: credit scoring
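The association rule above is quantified by its support and confidence. A minimal sketch of both measures, assuming a toy list of market baskets (the data is illustrative, not from the deck):

```python
# Support and confidence of an association rule A -> B over toy baskets.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    # Fraction of baskets that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Estimated P(consequent | antecedent).
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(confidence({"diapers"}, {"beer"}, transactions))  # 2/3 of diaper buyers also buy beer
```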

Example Applications

• Sequential patterns: "Customers who pay late on two or more of the first three installments have a 60% probability of defaulting."

• Similar time sequences: "The stock price of company X has moved similarly to company Y's."

Example Applications

• Exceptions (deviation detection): "Are any of my customers behaving differently than usual?"

• Text mining (web mining): "Which documents on the internet are similar to this document?"

IDIS – US Forest Service

• Identifies forest stands (areas similar in age, structure, and species composition)

• Predicts how different stands would react to fire and what preventive measures should be taken

GTE Labs

• KEFIR (Key Findings Reporter)
• Evaluates health-care utilization costs
• Isolates groups whose costs are likely to increase in the next year
• Finds medical conditions for which there is a known procedure that improves the health condition and decreases costs

Lockheed

• RECON: stock portfolio selection
• Creates a portfolio of 150-200 securities from an analysis of a database of the performance of 1,500 securities over a 7-year period

VISA

• Credit card fraud detection
• CRIS: neural-network software that learns to recognize the spending patterns of card holders and scores transactions by risk. "If a card holder normally buys gas and groceries and the account suddenly shows a purchase of stereo equipment in Hong Kong, CRIS sends a notice to the bank, which in turn can contact the card holder."

ISL Ltd (Clementine) - BBC

• Audience prediction
• Program schedulers must be able to predict the likely audience for a program and the optimum time to show it.
• Type of program, time, competing programs, and other events affect audience figures.

Data Mining is NOT Magic!

Data mining draws on the concepts and methods of databases, statistics, and machine learning.

From the Warehouse to the Mine

[Diagram: transactional databases are extracted, transformed, and cleansed into a standard-form data warehouse; goals and data transformations are then defined before mining.]

How to mine?

• Verification: computer-assisted, user-directed, top-down. Tools: query and report, OLAP (online analytical processing).

• Discovery: automated, data-driven, bottom-up.

Steps: 1. Define Goal

• Associations between products?
• New market segments or potential customers?
• Buying patterns over time or product sales trends?
• Discriminating among classes of customers?

Steps: 2. Prepare Data

• Integrate, select, and preprocess existing data (already done if there is a warehouse)

• Add any other data relevant to the objective that might supplement the existing data

Steps: 2. Prepare Data (Cont'd)

• Select the data: identify relevant variables
• Data cleaning: errors, inconsistencies, duplicates, missing data
• Data scrubbing: mappings, data conversions, new attributes
• Visual inspection: data distribution, structure, outliers, correlations between attributes
• Feature analysis: clustering, discretization

Steps: 3. Select Tool

• Identify the task class: clustering/segmentation, association, classification, pattern detection/prediction in time series
• Identify the solution class: explanation (decision trees, rules) vs. black box (neural networks)
• Model assessment, validation, and comparison: k-fold cross-validation, statistical tests
• Combination of models

Steps: 4. Interpretation

• Are the results (explanations/predictions) correct, significant?

• Consultation with a domain expert

Example

• Data as a table of attributes:

  Name   Income     Owns a house?   Marital status   Default
  Ali    25,000 $   Yes             Married          No
  Veli   18,000 $   No              Married          Yes

We would like to be able to explain the value of one attribute (here, Default) in terms of the values of the other attributes that are relevant.

Modelling Data

• Attributes x are observable

• y = f(x), where f is unknown and probabilistic

[Diagram: x → f → y]

Building a Model for Data

[Diagram: the data-generating f produces y from x; a model f* is fit to approximate f, and the difference between their outputs is the error.]

Learning from Data

Given a sample $X = \{x^t, y^t\}_t$, we build a predictor $f^*(x^t)$ approximating $f(x^t)$ that minimizes the difference between the prediction and the actual value:

$E = \sum_t \left[ y^t - f^*(x^t) \right]^2$

Types of Applications

• Classification: $y \in \{C_1, C_2, \ldots, C_K\}$
• Regression: $y \in \mathbb{R}$
• Time-series prediction: $x$ is temporally dependent
• Clustering: group $x$ according to similarity

Example

[Scatter plot: customers plotted by yearly income (x1) and savings (x2); OK and DEFAULT customers fall in different regions.]

Example Solution

RULE: IF yearly-income > θ1 AND savings > θ2 THEN OK ELSE DEFAULT

[Same plot with threshold θ1 on x1 (yearly-income) and θ2 on x2 (savings) separating OK from DEFAULT.]

Decision Trees

x1: yearly income, x2: savings; y = 0: DEFAULT, y = 1: OK

x1 > θ1?
  yes → x2 > θ2?
          yes → y = 1 (OK)
          no  → y = 0 (DEFAULT)
  no  → y = 0 (DEFAULT)
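The tree is just nested threshold tests. A minimal Python sketch; the thresholds θ1 and θ2 are left unspecified on the slide, so the default values below are illustrative:

```python
def classify(yearly_income, savings, theta1=1.0, theta2=2.0):
    # Root node tests yearly income; the left branch then tests savings.
    if yearly_income > theta1:
        if savings > theta2:
            return 1  # OK
        return 0      # DEFAULT
    return 0          # DEFAULT

print(classify(yearly_income=2.5, savings=3.0))  # 1 (OK)
print(classify(yearly_income=0.5, savings=3.0))  # 0 (DEFAULT)
```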

Clustering

[Scatter plot: the same customers grouped by similarity into Type 1, Type 2, and Type 3 clusters in the (yearly-income, savings) plane, cutting across the OK/DEFAULT labels.]

Time-Series Prediction

[Plot: monthly values from Jan through Dec (past and present) are used to predict the next Jan (future).]

• Discovery of frequent episodes

Methodology

[Flow diagram: initial standard form → data reduction (value and feature reductions) → train alternative predictors (Predictor 1 ... Predictor L) on the train set → test the trained predictors on the test set and choose the best → accept the best predictor if it is good enough.]

Data Visualisation

• Plot data in fewer dimensions (typically 2) to allow visual analysis

• Visualisation of structure, groups and outliers

Data Visualisation

[Scatter plot: yearly income vs. savings; most points follow the rule region, while a few exceptions lie outside it.]

Techniques for Training Predictors

• Parametric multivariate statistics
• Memory-based (case-based) models
• Decision trees
• Artificial neural networks

Classification

• $\mathbf{x}$: d-dimensional vector of attributes

• $C_1, C_2, \ldots, C_K$: K classes

• Reject or doubt as a possible outcome

• Compute $P(C_j|\mathbf{x})$ from data and choose k such that $P(C_k|\mathbf{x}) = \max_j P(C_j|\mathbf{x})$

Bayes' Rule

$P(C_j|\mathbf{x}) = \frac{p(\mathbf{x}|C_j)\, P(C_j)}{p(\mathbf{x})}$

• $p(\mathbf{x}|C_j)$: likelihood that an object of class j has features x
• $P(C_j)$: prior probability of class j
• $p(\mathbf{x})$: probability of an object (of any class) having features x
• $P(C_j|\mathbf{x})$: posterior probability that an object with features x is of class j

Statistical Methods

• Parametric model (e.g., Gaussian) for the class densities $p(\mathbf{x}|C_j)$:

Univariate:

$p(x|C_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[ -\frac{(x-\mu_j)^2}{2\sigma_j^2} \right]$

Multivariate ($\mathbf{x} \in \mathbb{R}^d$):

$p(\mathbf{x}|C_j) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}_j|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}_j^{-1} (\mathbf{x}-\boldsymbol{\mu}_j) \right]$

Training a Classifier

• Given data $\{x^t\}_t$ of class $C_j$:

Univariate: $p(x|C_j)$ is $\mathcal{N}(\mu_j, \sigma_j^2)$, estimated by

$\hat{\mu}_j = \frac{\sum_{x^t \in C_j} x^t}{n_j}, \qquad \hat{\sigma}_j^2 = \frac{\sum_{x^t \in C_j} (x^t - \hat{\mu}_j)^2}{n_j}, \qquad \hat{P}(C_j) = \frac{n_j}{n}$

Multivariate: $p(\mathbf{x}|C_j)$ is $\mathcal{N}_d(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$, estimated by

$\hat{\boldsymbol{\mu}}_j = \frac{\sum_{\mathbf{x}^t \in C_j} \mathbf{x}^t}{n_j}, \qquad \hat{\boldsymbol{\Sigma}}_j = \frac{\sum_{\mathbf{x}^t \in C_j} (\mathbf{x}^t - \hat{\boldsymbol{\mu}}_j)(\mathbf{x}^t - \hat{\boldsymbol{\mu}}_j)^T}{n_j}$
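A minimal sketch of these estimators and the resulting classifier in the univariate case, assuming numpy; the toy data and names are illustrative:

```python
import numpy as np

def fit_gaussian_classes(x, labels):
    # Per-class sample mean, variance, and prior, as on the slide.
    params, n = {}, len(x)
    for c in np.unique(labels):
        xc = x[labels == c]
        params[c] = (xc.mean(), xc.var(), len(xc) / n)
    return params

def posterior_scores(x0, params):
    # Unnormalized posteriors p(x|Cj) * P(Cj); the argmax gives the class.
    scores = {}
    for c, (mu, var, prior) in params.items():
        likelihood = np.exp(-(x0 - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        scores[c] = likelihood * prior
    return scores

x = np.array([1.0, 1.2, 0.8, 3.0, 3.3, 2.9])
labels = np.array([0, 0, 0, 1, 1, 1])
scores = posterior_scores(2.8, fit_gaussian_classes(x, labels))
print(max(scores, key=scores.get))  # 1
```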

[Figure slides: example fits for the 1D case; different variances; many classes; the 2D case with equal spherical classes; shared covariances; different covariances.]

Actions and Risks

• $\alpha_i$: action i

• $\lambda(\alpha_i|C_j)$: loss of taking action $\alpha_i$ when the situation is $C_j$

• Expected risk: $R(\alpha_i|\mathbf{x}) = \sum_j \lambda(\alpha_i|C_j)\, P(C_j|\mathbf{x})$

• Choose k such that $R(\alpha_k|\mathbf{x}) = \min_i R(\alpha_i|\mathbf{x})$
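A minimal sketch of the risk computation, assuming the posteriors P(Cj|x) have already been computed; the loss matrix is illustrative:

```python
# Expected risk R(a_i|x) = sum_j loss[i][j] * P(C_j|x); pick the min-risk action.
posteriors = [0.7, 0.3]   # P(C1|x), P(C2|x)
loss = [
    [0.0, 10.0],          # action 1: no loss if C1, costly if C2
    [1.0, 1.0],           # action 2 (e.g., reject): constant loss
]

risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss]
best_action = min(range(len(risks)), key=risks.__getitem__)
print(risks, best_action)  # [3.0, 1.0] -> action index 1
```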

Function Approximation (Scoring)

Regression

$y^t = f(x^t) + \epsilon$, where $\epsilon$ is noise.

In linear regression,

$f(x^t|w, w_0) = w x^t + w_0$

Find $w, w_0$ that minimize

$E(w, w_0) = \sum_t \left[ y^t - (w x^t + w_0) \right]^2$

by setting $\partial E / \partial w = 0$ and $\partial E / \partial w_0 = 0$.

Linear Regression

[Figure: a fitted line through the data points.]

Polynomial Regression

• E.g., quadratic:

$f(x^t|w_2, w_1, w_0) = w_2 (x^t)^2 + w_1 x^t + w_0$

$E(w_2, w_1, w_0) = \sum_t \left[ y^t - \left( w_2 (x^t)^2 + w_1 x^t + w_0 \right) \right]^2$

Polynomial Regression

[Figure: a quadratic fit to the data points.]

Multiple Linear Regression

• d inputs:

$f(x_1^t, x_2^t, \ldots, x_d^t | w_0, w_1, \ldots, w_d) = w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + w_0 = \mathbf{w}^T \mathbf{x}^t$

$E(w_0, w_1, \ldots, w_d) = \sum_t \left[ y^t - f(x_1^t, \ldots, x_d^t | w_0, w_1, \ldots, w_d) \right]^2$
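Minimizing E over all the weights has a closed-form least-squares solution; a minimal numpy sketch with illustrative data:

```python
import numpy as np

# Toy data: y = 2*x1 - x2 + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + 1 + 0.1 * rng.normal(size=100)

# Append a column of ones so w0 is learned like any other weight.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(w)  # approximately [2, -1, 1]
```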

Feature Selection

• Subset selection: forward and backward methods

• Linear projection: principal components analysis (PCA), linear discriminant analysis (LDA)

Sequential Feature Selection

• Forward selection: start from single features (x1), (x2), (x3), (x4) and grow the best subset one feature at a time, e.g., (x1 x3), (x2 x3), (x3 x4), then (x1 x2 x3), (x2 x3 x4), ...

• Backward selection: start from the full set (x1 x2 x3 x4) and drop one feature at a time, e.g., (x1 x2 x3), (x1 x2 x4), (x1 x3 x4), (x2 x3 x4), then (x2 x4), (x1 x4), (x1 x2), ...
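A minimal sketch of forward selection, assuming a caller-supplied score(features) function that returns validation performance (higher is better); that helper is hypothetical:

```python
def forward_select(all_features, score):
    # Greedy forward selection: repeatedly add the single feature that
    # improves the validation score the most; stop when none helps.
    selected, best_score = [], float("-inf")
    while len(selected) < len(all_features):
        candidates = [f for f in all_features if f not in selected]
        top_f = max(candidates, key=lambda f: score(selected + [f]))
        top_score = score(selected + [top_f])
        if top_score <= best_score:
            break
        selected.append(top_f)
        best_score = top_score
    return selected
```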

Principal Components Analysis (PCA)

[Figure: data in the (x1, x2) plane projected onto principal axes z1 and z2; the whitening transform rotates and rescales the data so the components are uncorrelated with unit variance.]
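A minimal PCA sketch via the eigendecomposition of the sample covariance, assuming numpy; the mixing matrix is illustrative:

```python
import numpy as np

def pca(X, k):
    # Center the data, then project onto the k leading eigenvectors
    # of the covariance matrix (the principal components z1..zk).
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]   # k largest eigenvalues
    return Xc @ eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Z = pca(X, 1)  # 1-D projection capturing most of the variance
```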

Linear Discriminant Analysis (LDA)

[Figure: two classes in the (x1, x2) plane projected onto the direction z1 that best separates them.]

Memory-based Methods

• Case-based reasoning
• Nearest-neighbor algorithms
• Keep a list of known instances and interpolate the response from those

Nearest Neighbor

[Figure: the (x1, x2) plane partitioned by the stored instances; a query point takes the response of its nearest stored instance.]
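A minimal 1-nearest-neighbor sketch, assuming numpy; the stored instances are illustrative:

```python
import numpy as np

def nearest_neighbor(x0, X, labels):
    # Return the label of the stored instance closest to the query x0.
    distances = np.linalg.norm(X - x0, axis=1)
    return labels[np.argmin(distances)]

X = np.array([[1.0, 1.0], [1.2, 0.9], [3.0, 3.1], [2.9, 3.3]])
labels = np.array([0, 0, 1, 1])
print(nearest_neighbor(np.array([2.8, 3.0]), X, labels))  # 1
```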

Local Regression

[Figure: y vs. x; the response at a query point is interpolated from nearby training points.]

Mixture of Experts

Missing Data

• Ignore cases with missing data
• Mean imputation (sketched below)
• Imputation by regression
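A minimal sketch of mean imputation, assuming missing values are encoded as NaN in a numpy array:

```python
import numpy as np

def mean_impute(X):
    # Replace each NaN with its column's mean over the observed values.
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

X = np.array([[1.0, np.nan], [2.0, 4.0], [np.nan, 6.0]])
print(mean_impute(X))  # NaNs replaced by 1.5 and 5.0
```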

Training Decision Trees

[Figure: the decision tree from before (test x1 > θ1, then x2 > θ2) shown alongside the corresponding axis-aligned partition of the (x1, x2) plane at thresholds θ1 and θ2.]

Measuring Disorder

[Figure: alternative splits on x1 and x2 produce partitions with different class mixtures; the purer the partitions, the lower the disorder.]

Entropy

$e = -\frac{n_{\mathrm{left}}}{n} \log \frac{n_{\mathrm{left}}}{n} - \frac{n_{\mathrm{right}}}{n} \log \frac{n_{\mathrm{right}}}{n}$
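A minimal sketch of this split entropy (natural log here; any base works when comparing splits):

```python
from math import log

def split_entropy(n_left, n_right):
    # e = -(nl/n) log(nl/n) - (nr/n) log(nr/n); zero when one side is empty.
    n = n_left + n_right
    e = 0.0
    for part in (n_left, n_right):
        if part:
            p = part / n
            e -= p * log(p)
    return e

print(split_entropy(7, 0))  # 0.0: a pure split, no disorder
print(split_entropy(5, 5))  # ~0.693: maximum disorder for two sides
```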

Artificial Neural Networks

[Figure: a single unit with inputs x1, ..., xd, bias input x0 = +1, weights w0, w1, ..., wd, and output y through activation g.]

$y = g(w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + w_0) = g(\mathbf{w}^T \mathbf{x})$

• Regression: g is the identity
• Classification: g is the sigmoid (outputs between 0 and 1)

Training a Neural Network

• d inputs: $o = g(\mathbf{w}^T \mathbf{x}) = g\left( \sum_{i=0}^{d} w_i x_i \right)$

• Training set: $X = \{\mathbf{x}^t, y^t\}_t$

• Error: $E(\mathbf{w}|X) = \sum_t \left[ y^t - o^t \right]^2 = \sum_t \left[ y^t - g(\mathbf{w}^T \mathbf{x}^t) \right]^2$

• Find $\mathbf{w}$ that minimizes E on X

Nonlinear Optimization

• Gradient descent: iterative learning, starting from a random $\mathbf{w}$, where $\eta$ is the learning factor:

$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$
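A minimal gradient-descent sketch for the single linear unit above (g is the identity); η, the iteration count, and the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(50, 2)), np.ones((50, 1))])  # bias input x0 = +1
y = X @ np.array([2.0, -1.0, 0.5])                           # targets from true weights

w = rng.normal(size=3)   # start from a random w
eta = 0.1                # learning factor
for _ in range(1000):
    o = X @ w                            # o = w^T x for every example
    grad = -2 * X.T @ (y - o) / len(y)   # gradient of the mean squared error
    w -= eta * grad
print(w)  # approaches [2, -1, 0.5]
```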

Neural Networks for Classification

• K outputs $o_j$, j = 1, ..., K
• Each $o_j$ estimates $P(C_j|\mathbf{x})$

$o_j = \mathrm{sigmoid}(\mathbf{w}_j^T \mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}_j^T \mathbf{x})}$

Multiple Outputs

[Figure: inputs x1, ..., xd with bias x0 = +1 feed K output units o1, ..., oK through weights wji.]

$o_j^t = g(\mathbf{w}_j^T \mathbf{x}^t) = g\left( \sum_{i=0}^{d} w_{ji} x_i^t \right)$

Iterative Training

On the training set $X = \{\mathbf{x}^t, y^t\}_t$ with error

$E(\{\mathbf{w}_j\}|X) = \sum_t \sum_j \left[ y_j^t - o_j^t \right]^2, \qquad o_j^t = g(\mathbf{w}_j^T \mathbf{x}^t)$

gradient descent gives the update

$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta \sum_t \left( y_j^t - o_j^t \right) g'(\mathbf{w}_j^T \mathbf{x}^t)\, x_i^t$

• Linear (g is the identity): $\Delta w_{ji} = \eta \sum_t (y_j^t - o_j^t)\, x_i^t$

• Nonlinear (g is the sigmoid): $\Delta w_{ji} = \eta \sum_t (y_j^t - o_j^t)\, o_j^t (1 - o_j^t)\, x_i^t$

Nonlinear Classification

[Figure: a linearly separable case vs. a case that is not linearly separable and requires a nonlinear discriminant.]

Multi-Layer Networks

[Figure: inputs x1, ..., xd with bias x0 = +1 feed hidden units h1, ..., hH with bias h0 = +1, which feed output units o1, ..., oK through weights tjp.]

$h_p^t = \mathrm{sigmoid}\left( \sum_{i=0}^{d} w_{pi} x_i^t \right), \qquad o_j^t = g\left( \sum_{p=0}^{H} t_{jp} h_p^t \right)$
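A minimal sketch of the two-layer forward pass, assuming numpy; the layer sizes and random weights are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W1, W2):
    # Hidden layer: h_p = sigmoid(sum_i w_pi x_i), bias via a prepended +1.
    h = sigmoid(W1 @ np.concatenate(([1.0], x)))
    # Output layer: o_j = g(sum_p t_jp h_p); here g is the identity.
    return W2 @ np.concatenate(([1.0], h))

d, H, K = 3, 5, 2
rng = np.random.default_rng(0)
W1 = rng.normal(size=(H, d + 1))   # first-layer weights w_pi (incl. bias)
W2 = rng.normal(size=(K, H + 1))   # second-layer weights t_jp (incl. bias)
print(mlp_forward(rng.normal(size=d), W1, W2))
```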

Probabilistic Networks

[Figure: a probabilistic (belief) network; nodes carry probabilities such as p(·) = 0.1 and conditional probabilities such as p(·|·) = 0.05, p(·|·) = 0.1, ...]

Evaluating Learners

1. Given a model M, how can we assess its performance on real (future) data?

2. Given M1, M2, ..., ML, which one is the best?

Cross-validation

[Figure: the data is split into k parts; in each fold, k−1 parts train the model and the held-out part tests it.]

• Repeat k times and average
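A minimal sketch of the k-fold loop, assuming numpy and hypothetical train(X, y) and evaluate(model, X, y) helpers:

```python
import numpy as np

def k_fold_scores(X, y, k, train, evaluate):
    # Shuffle once, split into k parts; each part serves as the test set once.
    idx = np.random.default_rng(0).permutation(len(X))
    scores = []
    for fold in np.array_split(idx, k):
        test_mask = np.zeros(len(X), dtype=bool)
        test_mask[fold] = True
        model = train(X[~test_mask], y[~test_mask])   # fit on the other k-1 parts
        scores.append(evaluate(model, X[test_mask], y[test_mask]))
    return np.mean(scores)                            # repeat k times and average
```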

Combining Learners: Why?

[Flow diagram: from the initial standard form, the train set fits Predictor 1 ... Predictor L; the validation set is used to choose the best predictor.]

Combining Learners: How?

[Flow diagram: the same predictors are trained, but their outputs are combined by voting instead of choosing a single best predictor.]
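A minimal sketch of combining by voting, assuming each trained predictor maps an input to a class label; the predictors below are illustrative stand-ins:

```python
from collections import Counter

def vote(predictors, x):
    # Each predictor casts one vote; the majority class wins.
    votes = [predict(x) for predict in predictors]
    return Counter(votes).most_common(1)[0][0]

# Stand-ins for Predictor 1..L from the diagram:
predictors = [lambda x: x > 0, lambda x: x > 1, lambda x: x > -1]
print(vote(predictors, 0.5))  # True wins 2-1
```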

Conclusions:The Importance of Data

• Extract valuable information from large amounts of raw data

• A large amount of reliable data is a must; the quality of the solution depends heavily on the quality of the data

• Data mining is not alchemy; we cannot turn stone into gold

Conclusions: The Importance of the Domain Expert

• Joint effort of human experts and computers

• Any information (symmetries, constraints, etc.) about the application should be used to help the learning system

• Results should be checked for consistency by domain experts

Conclusions: The Importance of Being Patient

• Data mining is not straightforward; repeated trials are needed before the system is fine-tuned.

• Mining may be lengthy and costly. Large expectations lead to large disappointments!

Once again: Important Requirements for Mining

• Large amounts of high-quality data
• Devoted and knowledgeable experts on:
  1. The application domain
  2. Databases (the data warehouse)
  3. Statistics and machine learning
• Time and patience

That’s all folks!
