
Page 1:

GhostMiner Wine example

Włodzisław Duch

Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland

http://www.phys.uni.torun.pl/~duch

ISEP Porto, 8-12 July 2002

Page 2:

GhostMiner Philosophy

• There is no free lunch – provide different types of tools for knowledge discovery: decision tree, neural, neurofuzzy, similarity-based methods, and committees.

• Provide tools for visualization of data.

• Support the process of knowledge discovery/model building and evaluation, organizing it into projects.

• Separate the process of model building and knowledge discovery from model use => GhostMiner Developer & GhostMiner Analyzer.

GhostMiner, data mining tools from our lab: http://www.fqspl.com.pl/ghostminer/

Page 3:

GM summary

GhostMiner combines 4 basic tools for predictive data mining and understanding of data, avoiding too many choices of parameters (like network structure specs):

• IncNet ontogenic neural network using Kalman filter learning, separating each class from all other classes;

• Feature Space Mapping neurofuzzy system producing logical rules of crisp and fuzzy types;

• Separability Split Value decision tree;

• Weighted nearest neighbor method;

• K-classifiers and committees of models;

• MDS visualization.

Page 4:

Wine data example

Chemical analysis of wine from grapes grown in the same region in Italy, but derived from three different cultivars. Task: recognize the source of a wine sample. 13 quantities measured, continuous features:

• alcohol content
• malic acid content
• ash content
• alkalinity of ash
• magnesium content
• total phenols content
• flavanoids content
• nonflavanoid phenols content
• proanthocyanins
• color intensity
• hue
• OD280/OD315 of diluted wines
• proline.

Page 5:

Exploration and visualization

Load data (using load icon) and look at general info about the data.

Page 6:

Exploration: data

Inspect the data itself in the raw form.

Page 7:

Exploration: data statistics

Look at the distribution of feature values.

Note that Proline has very large values, therefore the data should be standardized before further processing.

Page 8:

Exploration: data standardized

Standardized data: unit standard deviation; about 2/3 of all data should fall within [mean−std, mean+std].

Other options: normalize to fit in [-1,+1], or normalize rejecting some extreme values.
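The two options above can be sketched in a few lines of NumPy (a minimal illustration; the function names are mine, not GhostMiner's):

```python
import numpy as np

def standardize(X):
    """Z-score: subtract the mean, divide by the standard deviation per feature."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalize_minmax(X):
    """Rescale each feature linearly into [-1, +1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0
```

After `standardize`, a feature like Proline no longer dominates distance-based methods simply because of its scale.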

Page 9:

Exploration: 1D histograms

Distribution of feature values in classes

Some features are more useful than others.

Page 10:

Exploration: 1D/3D histograms

Distribution of feature values in classes, 3D

Page 11:

Exploration: 2D projections

Projections (cuboids) onto selected 2D subspaces.

Page 12:

Visualize data

Relations in more than 3D are hard to imagine.

SOM mappings: popular for visualization, but rather inaccurate, no measure of distortions.

Measure of topographical distortions: map all X_i points from R^n to x_i points in R^m, m < n, and ask: how well are the distances R_ij = D(X_i, X_j) reproduced by the distances r_ij = d(x_i, x_j)?

Use m = 2 for visualization, use higher m for dimensionality reduction.

Page 13:

Visualize data: MDS

Multidimensional scaling: invented in psychometrics by Torgerson (1952), re-invented by Sammon (1969) and myself (1994) … Minimize a measure of topographical distortions by moving the x coordinates.

Three measures are used:

$$S_1(x) = \sum_{i>j} \big(R_{ij} - r_{ij}(x)\big)^2 \qquad \text{(MDS)}$$

$$S_2(x) = \frac{1}{\sum_{i>j} R_{ij}} \sum_{i>j} \frac{\big(R_{ij} - r_{ij}(x)\big)^2}{R_{ij}} \qquad \text{(Sammon)}$$

$$S_3(x) = \sum_{i>j} \left(1 - \frac{r_{ij}(x)}{R_{ij}}\right)^2 \qquad \text{(MDS, more local)}$$
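The stress measures (absolute MDS, Sammon, and the more local variant) can be computed directly from the two distance matrices; a minimal NumPy sketch, with function names of my own choosing:

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix between all row vectors of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(X_high, x_low, kind="mds"):
    """Topographical distortion of a map: compare the original distances
    R_ij = D(X_i, X_j) with the map distances r_ij = d(x_i, x_j)."""
    R, r = pairwise(X_high), pairwise(x_low)
    iu = np.triu_indices(len(R), k=1)          # each pair i < j counted once
    R, r = R[iu], r[iu]
    if kind == "mds":                          # absolute stress S1
        return ((R - r) ** 2).sum()
    if kind == "sammon":                       # Sammon's stress S2
        return ((R - r) ** 2 / R).sum() / R.sum()
    if kind == "local":                        # more local measure S3
        return ((1.0 - r / R) ** 2).sum()
    raise ValueError(f"unknown stress kind: {kind}")
```

A perfect embedding gives zero stress under all three measures; minimizing any of them by gradient descent over the `x_low` coordinates yields an MDS map.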

Page 14:

Visualize data: Wine

The green outlier can be identified easily.

3 clusters are clearly distinguished, 2D is fine.

Page 15:

Decision trees

Simplest things first: use a decision tree to find logical rules.

Test a single attribute, find a good point to split the data, separating vectors from different classes. DT advantages: fast, simple, easy to understand, easy to program, many good algorithms.

4 attributes used, 10 errors, 168 correct, 94.4% correct.

Page 16:

Decision borders

Univariate trees: test the value of a single attribute, x < a.

Multivariate trees: test combinations of attributes, hyperplanes.

Result: feature space is divided into cuboids.

Wine data: univariate decision tree borders for proline and flavanoids.

Page 17:

Separability Split Value (SSV)

SSV criterion:

• select attribute and split value that maximizes the number of correctly separated pairs from different classes;

• if several equivalent split values exist select one that minimizes the number of pairs split from the same class.

Works on raw data, including symbolic values.

Search for splits using best-first or beam-search method.

Tests are A(x) < T or x ∈ {s_i}.

Create tree that classifies all data correctly.

Use crossvalidation to determine how many nodes to prune, or what the pruning level should be.
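The SSV criterion can be sketched for a single attribute (a simplified illustration of the split criterion only, not the GhostMiner implementation; the best-first/beam search over whole trees is omitted):

```python
from collections import Counter

def ssv_value(values, labels, threshold):
    """Count pairs of vectors from different classes separated by the split
    (to be maximized) and same-class pairs the split tears apart (tiebreaker)."""
    left  = [c for v, c in zip(values, labels) if v < threshold]
    right = [c for v, c in zip(values, labels) if v >= threshold]
    nl, nr = Counter(left), Counter(right)
    separated = sum(nl[a] * nr[b] for a in nl for b in nr if a != b)
    torn = sum(nl[c] * nr[c] for c in nl)
    return separated, torn

def best_split(values, labels):
    """Scan midpoints between consecutive attribute values; maximize separated
    pairs, break ties by the fewest same-class pairs split."""
    vs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(vs, vs[1:])]
    def key(t):
        separated, torn = ssv_value(values, labels, t)
        return (separated, -torn)
    return max(candidates, key=key)
```

For two well-separated classes the criterion picks the threshold between them, since that split separates every cross-class pair while tearing no same-class pair.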

Page 18:

Wine – SSV 5 rules

Lower pruning leads to more complex tree.

7 nodes, corresponding to 5 rules;

10 errors, mostly Class2/3 wines mixed; check the confusion matrix in “results”.

Page 19:

Wine – SSV optimal rules

Various solutions may be found, depending on the search: 5 rules with 12 premises, making 6 errors, 6 rules with 16 premises and 3 errors, 8 rules, 25 premises, and 1 error.

if OD280/OD315 > 2.505 and proline > 726.5 and color > 3.435 then class 1

if OD280/OD315 > 2.505 and proline > 726.5 and color < 3.435 then class 2

if OD280/OD315 < 2.505 and hue > 0.875 and malic-acid < 2.82 then class 2

if OD280/OD315 > 2.505 and proline < 726.5 then class 2

if OD280/OD315 < 2.505 and hue < 0.875 then class 3

if OD280/OD315 < 2.505 and hue > 0.875 and malic-acid > 2.82 then class 3

What is the optimal complexity of rules? Use crossvalidation to estimate generalization.
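Rules of this kind are directly executable; a sketch of the six-rule classifier (the dictionary keys are hypothetical names I chose for the four attributes the rules use):

```python
def classify_wine(sample):
    """Apply the six SSV rules; `sample` is a dict of attribute values."""
    od, pro = sample["od280_od315"], sample["proline"]
    col, hue, mal = sample["color"], sample["hue"], sample["malic_acid"]
    if od > 2.505:
        if pro > 726.5:
            return 1 if col > 3.435 else 2
        return 2                        # high OD280/OD315 but low proline
    # OD280/OD315 below 2.505: decided by hue, then malic acid
    if hue < 0.875:
        return 3
    return 2 if mal < 2.82 else 3
```

This makes explicit why such rule sets are easy to understand: every prediction traces back to a handful of threshold comparisons.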

Page 20:

Neurofuzzy systems

MLP: discrimination, finds separating surfaces as combinations of sigmoidal functions.

Fuzzy approach: define membership functions (MF), replacing the crisp yes/no test on x by a degree of membership μ(x).

Typically triangular, trapezoidal, Gaussian ... MFs are used.

MFs in many dimensions are constructed using products of one-dimensional MFs.

Advantage: easy to add a priori knowledge (proper bias); may work well for very small datasets!
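The product construction of multidimensional membership functions can be sketched as follows (a minimal illustration with Gaussian MFs; not GhostMiner's internal code):

```python
import math

def gauss_mf(x, center, sigma):
    """One-dimensional Gaussian membership degree in (0, 1]."""
    return math.exp(-((x - center) / sigma) ** 2)

def product_mf(x, centers, sigmas):
    """Multidimensional membership as a product of 1-D MFs (a separable function):
    full membership only when every coordinate is near its center."""
    mu = 1.0
    for xi, ci, si in zip(x, centers, sigmas):
        mu *= gauss_mf(xi, ci, si)
    return mu
```

Because the product factorizes over dimensions, a priori knowledge about any single feature (a plausible center and spread) can be injected without touching the others.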

Page 21:

Feature Space Mapping

Feature Space Mapping (FSM) neurofuzzy system: describe the joint probability density p(X,C). Neural adaptation using RBF-like algorithms. Good for logical rules and NN predictive models.

$$F(X;P) = \sum_i W_i \, G_i(X;P_i), \qquad G_i(X;P_i) = \prod_j g_j(x_j; p_{ij})$$

Find the best network architecture (number of nodes and feature selection) using an ontogenic network (growing and shrinking) with one hidden layer. Use separable rectangular, triangular, and Gaussian MFs.

Initialize using clusterization techniques.

Allow for rotation of Gaussian functions.

Page 22:

Wine – FSM rules

Complexity of rules depends on desired accuracy.

Use rectangular functions for crisp rules. Optimal accuracy may be evaluated using crossvalidation.

FSM discovers simpler rules, for example:

if proline > 929.5 then class 1 (48 cases, 45 correct, 2 recovered by other rules).

if color < 3.79285 then class 2 (63 cases, 60 correct)

SSV: hierarchical rules. FSM: density estimation with feature selection.

Page 23:

IncNet

Incremental Neural Network (IncNet): ontogenic NN with a single hidden layer, adding, removing and merging neurons.

Transfer functions: Gaussians or combinations of sigmoids (bi-central functions).

Training: use the Kalman filter approach to estimate network parameters. Fast Kalman filter training is usually sufficient.

Always creates one network per class, separating it from other samples.

Creates predictive models equivalent to fuzzy rules.

Page 24:

k-nearest neighbors

Similarity functions include Minkowski and related functions.

Optimize k, the number of neighbors included.

Optimize the scaling factors of features, W_i|X_i − Y_i|: this goes beyond feature selection.

Use search-based techniques to find good scaling parameters for features.

Use various similarity functions to evaluate how similar a new case is to all reference (training) cases; use p(Ci|X) = k(Ci)/k.

Notice that for k=1, 100% accuracy on the training set is always obtained! To evaluate accuracy on training data, use the leave-one-out procedure.
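The posterior estimate p(Ci|X) = k(Ci)/k and the leave-one-out evaluation can be sketched as follows (a minimal illustration assuming the Euclidean special case of the Minkowski metric, with per-feature scaling weights W_i):

```python
from collections import Counter

def knn_posteriors(x, data, labels, k, weights=None):
    """Estimate p(Ci|X) = k(Ci)/k from the k nearest reference cases,
    using a weighted Euclidean distance with per-feature factors W_i."""
    w = weights or [1.0] * len(x)
    def dist2(a):
        return sum((wi * (ai - xi)) ** 2 for wi, ai, xi in zip(w, a, x))
    nearest = sorted(range(len(data)), key=lambda i: dist2(data[i]))[:k]
    counts = Counter(labels[i] for i in nearest)
    return {c: n / k for c, n in counts.items()}

def loo_accuracy(data, labels, k):
    """Leave-one-out: classify each case with itself removed from the references,
    avoiding the trivial 100% that k=1 gives on the training set."""
    hits = 0
    for i in range(len(data)):
        p = knn_posteriors(data[i], data[:i] + data[i + 1:],
                           labels[:i] + labels[i + 1:], k)
        hits += max(p, key=p.get) == labels[i]
    return hits / len(data)
```

Search-based optimization of k and of the weights W_i would wrap `loo_accuracy` in an outer loop, scoring each candidate setting.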

Page 25:

Committees and K-classifiers

Committees combine results from different classification models:

• create different models using the same method (for example a decision tree) on different data samples (bootstrapping);

• combine several different models, including other committees, into one model;

• use majority voting to decide on the predicted class.

No rules, but stable and accurate classification models.

K-classifiers: in K-class problems create K classifiers, one for each class.
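Both schemes reduce to a few lines; a sketch in which committee members and per-class classifiers are arbitrary callables (not GhostMiner's internal interface):

```python
from collections import Counter

def committee_predict(models, x):
    """Majority vote over committee members; each member maps x to a class label."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def k_classifier_predict(scorers, x):
    """K-classifiers: one scorer per class (e.g. an IncNet network trained to
    separate that class from the rest); pick the class whose scorer is most
    confident about x."""
    return max(scorers, key=lambda c: scorers[c](x))
```

The committee yields no rules, but averaging over members built on bootstrap samples stabilizes the prediction.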

Page 26:

Summary

Please get your copy from http://www.fqspl.com.pl/ghostminer/

GhostMiner combines 4 basic tools for predictive data mining and understanding of data.

GM includes K-classifiers and committees of models.

GM includes MDS visualization/dimensionality reduction.

Model building is separated from model use.

GM provides tools for easy testing of statistical accuracy.

Many new classification models are coming.