TRANSCRIPT
Knowledge discovery & data mining: Classification & fraud detection
5/24/00
Table of Contents
Knowledge discovery & data mining: Classification & fraud detection
Module outline
The classification task
Classification systems and inductive learning
Train & test
Train & test
Training step
Test step
Prediction
Machine learning terminology
Comparing classifiers
Classical example: play tennis?
Module outline
Bayesian classification
Estimating a-posteriori probabilities
Naïve Bayesian Classification
Play-tennis example: estimating P(xi|C)
Author: Dino Pedreschi
Play-tennis example: classifying X
The independence hypothesis…
Module outline
Decision trees
Classical example: play tennis?
Decision tree obtained with ID3 (Quinlan 86)
From decision trees to classification rules
Decision tree induction
Generate_DT(samples, attribute_list) =
Criteria for finding the best split
Information gain (ID3 – C4.5)
Information gain (ID3 – C4.5)
Information gain - play tennis example
Gini index (CART)
Gini index - play tennis example
Entropy vs. Gini (on continuous attributes)
Other criteria in decision tree construction
The overfitting problem
Stopping vs. pruning
If dataset is large
If dataset is not so large
Categorical vs. continuous attributes
Summarizing …
Scalability to large databases
Module outline
Backpropagation
Backpropagation
Prediction and (statistical) regression
Other methods (not covered)
Classification with decision trees
What have we achieved?
References - classification
References - classification
Slide 1 of 50
Knowledge discovery & data mining
Slide 2 of 50
Module outline
Slide 3 of 50
The classification task
Slide 4 of 50
Classification systems and inductive learning
Slide 5 of 50
Train & test
Slide 6 of 50
Train & test
Slide 7 of 50
Training step
Slide 8 of 50
Test step
Slide 9 of 50
Prediction
Slide 10 of 50
Machine learning terminology
Slide 11 of 50
Comparing classifiers
Slide 12 of 50
Classical example: play tennis?
Slide 13 of 50
Module outline
Slide 14 of 50
Bayesian classification
Slide 15 of 50
Estimating a-posteriori probabilities
Slide 16 of 50
Naïve Bayesian Classification
Slide 17 of 50
Play-tennis example: estimating P(xi|C)
Slide 18 of 50
Play-tennis example: classifying X
Slide 19 of 50
The independence hypothesis…
Slide 20 of 50
Module outline
Slide 21 of 50
Decision trees
Slide 22 of 50
Classical example: play tennis?
Slide 23 of 50
Decision tree obtained with ID3 (Quinlan 86)
Slide 24 of 50
From decision trees to classification rules
Slide 25 of 50
Decision tree induction
Slide 26 of 50
Generate_DT(samples, attribute_list) =
Slide 27 of 50
Criteria for finding the best split
Slide 28 of 50
Information gain (ID3 – C4.5)
Slide 29 of 50
Information gain (ID3 – C4.5)
Slide 30 of 50
Information gain - play tennis example
Slide 31 of 50
Gini index (CART)
Slide 32 of 50
Gini index - play tennis example
Slide 33 of 50
Entropy vs. Gini (on continuous attributes)
Slide 34 of 50
Other criteria in decision tree construction
Slide 35 of 50
The overfitting problem
Slide 36 of 50
Stopping vs. pruning
Slide 37 of 50
If dataset is large
Slide 38 of 50
If dataset is not so large
Slide 39 of 50
Categorical vs. continuous attributes
Slide 40 of 50
Summarizing …
Slide 41 of 50
Scalability to large databases
Slide 42 of 50
Module outline
Slide 43 of 50
Backpropagation
Slide 44 of 50
Backpropagation
Slide 45 of 50
Prediction and (statistical) regression
Slide 46 of 50
Other methods (not covered)
Slide 47 of 50
Classification with decision trees
Slide 48 of 50
What have we achieved?
Slide 49 of 50
References - classification