Issues with Data Mining

Page 1: Issues with Data Mining

Page 2: Data Mining involves Generalization

• Data mining (machine learning) learns generalizations of the instances in the training data
– E.g. a decision tree learnt from the weather data captures generalizations about how the value of the Play attribute is predicted (see the sketch below)
– This means generalizations predict (or describe) the behaviour of instances beyond the training data
– This in turn means data mining extracts knowledge from raw data
• This knowledge drives the end-user's decision-making process
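To make this concrete, here is a minimal sketch (not from the lecture) that fits a decision tree to the standard 14-instance weather data, trimmed to three attributes to stay short. scikit-learn stands in for Weka, and the nominal attributes are one-hot encoded because its trees need numeric input:

```python
# A minimal sketch: fit a decision tree to the toy weather data
# (assumed to be the standard 14-instance dataset, temperature omitted).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "outlook":  ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
                 "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"],
    "humidity": ["high", "high", "high", "high", "normal", "normal", "normal",
                 "high", "normal", "normal", "normal", "high", "normal", "high"],
    "windy":    [False, True, False, False, False, True, True,
                 False, False, False, True, True, False, True],
    "play":     ["no", "no", "yes", "yes", "yes", "no", "yes",
                 "no", "yes", "yes", "yes", "yes", "yes", "no"],
})

X = pd.get_dummies(data.drop(columns="play"))  # one-hot encode nominal attributes
y = data["play"]
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# The fitted tree generalizes: it predicts Play for a day
# that never appeared in the training data.
new_day = pd.DataFrame([{"outlook": "overcast", "humidity": "normal", "windy": True}])
print(tree.predict(pd.get_dummies(new_day).reindex(columns=X.columns, fill_value=0)))
```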

Page 3: Generalization as Search

• The process of generalization can be viewed as searching a space of all possible patterns or models for a pattern that fits the data
• This view provides a standard framework for understanding all data mining techniques
• E.g. decision tree learning involves searching through all possible decision trees
– Lecture 4 shows two example decision trees that fit the weather data
– One of them (Example 2) is a better generalization than the other

Page 4: Bias

• Important choices made in a data mining system are
– the representation language: the language chosen to represent the patterns or models
– the search method: the order in which the space is searched
– the model pruning method: the way overfitting to the training data is avoided
• This means each data mining scheme involves
– a language bias
– a search bias
– an overfitting-avoidance bias

Page 5: Language Bias

• Different languages are used for representing patterns and models
– E.g. rules and decision trees
• A concept fits a subset of the training data
– That subset can be described as a disjunction of rules
– E.g. a classifier for the weather data can be represented as a disjunction of rules (see the sketch below)
• Languages differ in their ability to represent patterns and models
– This means that when a language with lower representational ability is used, the data mining system may not achieve good performance
• Domain knowledge (external to the training data) helps to cut down the search space
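As an illustration of the rule representation, the sketch below encodes a disjunction of rules for the weather data as a plain Python function. These are the rules commonly derived for that dataset, not necessarily the exact ones shown in the lecture:

```python
# Illustrative only: a rule-set classifier for the weather data written as a
# disjunction of rules. The lecture's exact rule set may differ.
def play(outlook: str, humidity: str, windy: bool) -> str:
    if outlook == "overcast":                        # rule 1
        return "yes"
    if outlook == "rainy" and not windy:             # rule 2
        return "yes"
    if outlook == "sunny" and humidity == "normal":  # rule 3
        return "yes"
    return "no"                                      # default class

print(play("sunny", "high", False))  # -> "no"
```

Each `if` is one rule; the classifier is their disjunction, with "no" as the default class.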

Page 6: Search Bias

• An exhaustive search over the search space is computationally expensive
• Search is sped up by using heuristics
– E.g. pure child nodes indicate good tree stumps in decision tree learning
• By definition, heuristics cannot guarantee optimal patterns or models
– Using information gain may mislead us into selecting a suboptimal attribute at the root (see the sketch below)
• More complex search strategies are possible
– those that pursue several alternatives in parallel
– those that allow backtracking
• A high-level search bias:
– General-to-specific: start with a root node and grow the decision tree to fit the specific data
– Specific-to-general: choose specific examples in each class and then generalize the class by including k-nearest-neighbour examples
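To see the heuristic in numbers, here is a minimal sketch (the helper names are my own) that computes the information gain of the Outlook attribute on the weather data; the value of about 0.247 bits is what makes Outlook the usual root choice:

```python
# A minimal sketch of the information-gain heuristic used to pick the
# root attribute of a decision tree.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Expected reduction in entropy from splitting on `attr`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Outlook column of the standard weather data and the Play labels:
rows = [{"outlook": o} for o in ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5]
labels = (["no", "no", "no", "yes", "yes"]      # sunny: 2 yes, 3 no
          + ["yes"] * 4                          # overcast: 4 yes
          + ["yes", "yes", "yes", "no", "no"])   # rainy: 3 yes, 2 no
print(round(info_gain(rows, "outlook", labels), 3))  # ~0.247
```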

Page 7: Overfitting-avoidance Bias

• We want to search for the 'best' patterns and models
• Other things being equal, simpler models are preferred (Occam's razor)
• Two strategies (contrasted in the sketch below)
– Start with the simplest model and stop building when the model starts to become complex
– Start with a complex model and prune it to make it simpler
• Each strategy biases the search in a different way
• Biases are unavoidable in practice
– Each data mining scheme involves a particular configuration of biases
– These biases may serve some problems well
• There is no universal best learning scheme!
– We saw this in our practicals with Weka
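The two strategies map directly onto familiar tree parameters; the sketch below contrasts them using scikit-learn (illustrative only, since the lectures use Weka):

```python
# Contrasting the two overfitting-avoidance strategies with decision trees.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Strategy 1: stop early while growing (pre-pruning) -
# refuse to grow past a fixed depth or below a minimum leaf size.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Strategy 2: grow a complex tree, then prune it back (post-pruning) -
# cost-complexity pruning removes subtrees that buy little accuracy.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```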

Page 8: Combining Multiple Models

• Because there is no ideal data mining scheme, it is useful to combine multiple models
– The idea of democracy: decisions are made based on collective wisdom
– Each data mining scheme acts like an expert, using its knowledge to make decisions
• Three general approaches
– Bagging
– Boosting
– Stacking
• Bagging and boosting follow the same basic approach
– Take a vote on the class prediction from all the different schemes
– Bagging uses a simple average of the votes, while boosting uses a weighted average
– Boosting gives more weight to the more knowledgeable experts
• Boosting is generally considered the most effective

Page 9: Bias-Variance Decomposition

• Assume
– infinitely many training data sets of the same size, n
– infinitely many classifiers trained on the above data sets
• For any learning scheme
– Bias = the expected error of the classifier that remains even after increasing the training data infinitely
– Variance = the expected error due to the particular training set used
• Total expected error = bias + variance (see the decomposition below)
• Combining multiple classifiers decreases the expected error by reducing the variance component
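Written out for squared error (the common textbook form; the slide states the classification version informally, and the noise term is omitted here), with $f_D$ the model learned from training set $D$ and $\bar f(x) = \mathbb{E}_D[f_D(x)]$ the average model:

$$
\mathbb{E}_{D}\big[(f_{D}(x)-y)^{2}\big]
= \underbrace{(\bar f(x)-y)^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}_{D}\big[(f_{D}(x)-\bar f(x))^{2}\big]}_{\text{variance}}
$$

Averaging many models, as bagging does, leaves the bias term untouched but shrinks the variance term, which is exactly the claim in the last bullet.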

Page 10: Bagging

• Bagging stands for bootstrap aggregating
– It combines equally weighted predictions from multiple models
• Bagging exploits instability in learning schemes
– Instability: a small change in the training data results in a big change in the model
• Idealized version for a classifier
– Collect several independent training sets
– Build a classifier from each training set
• E.g. learn a decision tree from each training set
– The class of a test instance is the prediction that received the most votes from all the classifiers
• In practice it is not feasible to obtain several independent training sets

Page 11: Bagging Algorithm

• Involves two stages (a sketch in code follows below)
• Model generation
– Let n be the number of instances in the training data
– For each of t iterations:
• Sample n instances with replacement from the training data
• Apply the learning algorithm to the sample
• Store the resulting model
• Classification
– For each of the t models:
• Predict the class of the instance using the model
– Return the class that has been predicted most often
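A from-scratch sketch of this two-stage algorithm (illustrative; `sklearn.ensemble.BaggingClassifier` packages the same idea):

```python
# Bagging: t bootstrap samples, one model each, majority vote at prediction.
import random
from collections import Counter
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, base=DecisionTreeClassifier(), t=25, seed=0):
    """Model generation: t bootstrap samples of size n, one model per sample."""
    rng = random.Random(seed)
    n, models = len(X), []
    for _ in range(t):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(clone(base).fit([X[i] for i in idx],
                                      [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Classification: return the class predicted most often."""
    votes = [m.predict([x])[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```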

Page 12: Boosting

• Multiple data mining methods might complement each other
– each method performing well on a subset of the data
• Boosting combines complementary models
– using weighted voting
• Boosting is iterative
– Each new model is built to overcome the deficiencies of the earlier models
• There are several variants of boosting
– AdaBoost.M1, based on the idea of giving weights to instances
• Boosting involves two stages
– Model generation
– Classification

Page 13: Boosting

• Model generation (a sketch in code follows below)
– Assign equal weight to each training instance
– For each of t iterations:
• Apply the learning algorithm to the weighted dataset and store the resulting model
• Compute the error e of the model on the weighted dataset and store it
• If e = 0 or e >= 0.5, terminate model generation
• For each instance in the dataset:
– If the instance is classified correctly by the model, multiply its weight by e/(1-e)
• Normalize the weights of all instances
• Classification
– Assign a weight of zero to all classes
– For each of the t (or fewer) models:
• Add -log(e/(1-e)) to the weight of the class predicted by the model
– Return the class with the highest weight
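The sketch below follows this pseudocode step by step (illustrative; it assumes a base learner that accepts per-instance weights, as scikit-learn's decision trees do, and that at least one model survives generation):

```python
# AdaBoost.M1 as described above: reweight instances, then weighted voting.
from math import log
from collections import defaultdict
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, base=DecisionTreeClassifier(max_depth=1), t=10):
    n = len(y)
    w = [1.0 / n] * n                          # equal initial weights
    models = []
    for _ in range(t):
        m = clone(base).fit(X, y, sample_weight=w)
        pred = m.predict(X)
        e = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        if e == 0 or e >= 0.5:                 # terminate model generation
            break
        models.append((m, e))
        for i in range(n):                     # downweight correct instances
            if pred[i] == y[i]:
                w[i] *= e / (1 - e)
        s = sum(w)
        w = [wi / s for wi in w]               # normalize weights
    return models

def adaboost_predict(models, x):
    score = defaultdict(float)                 # weight of zero for all classes
    for m, e in models:
        score[m.predict([x])[0]] += -log(e / (1 - e))  # weighted vote
    return max(score, key=score.get)
```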

Page 14: Stacking

• Bagging and boosting combine models of the same type
– E.g. a set of decision trees
• Stacking is applied to models of different types
– because voting may not work when the different models do not perform comparably well
– voting is problematic when, say, two out of three classifiers perform poorly
• Stacking uses a metalearner to combine the different base learners
– Base learners: level-0 models
– Metalearner: level-1 model
– The predictions of the base learners are fed as inputs to the metalearner (see the sketch below)
• Base learner predictions on the training data cannot be fed directly to the metalearner
– Instead, use the base learners' cross-validation predictions
• Because most of the work is done by the base learners, the metalearner can use a simple learning scheme
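A minimal stacking sketch with scikit-learn, whose `StackingClassifier` implements the same level-0/level-1 design, including the cross-validated base predictions (illustrative; the lectures use Weka's Stacking metaclassifier instead):

```python
# Stacking: heterogeneous level-0 models, a simple level-1 metalearner.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stack = StackingClassifier(
    estimators=[                   # level-0 models of different types
        ("tree", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(),  # simple level-1 metalearner
    cv=5,   # metalearner inputs come from cross-validated base predictions
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```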

Page 15: Combining Models using Weka

• Weka offers methods to perform bagging, boosting and stacking over classifiers
• In the Explorer, under the Classify tab, expand the 'meta' section of the hierarchical menu
• AdaBoostM1 (one of the boosting methods) on the Iris data classifies only 7 out of 150 instances incorrectly
• You are encouraged to try these methods on your own (a rough scikit-learn analogue follows below)
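For readers without Weka to hand, here is a rough scikit-learn analogue of the same experiment; the exact error count depends on the base learner and evaluation setup, so it need not reproduce the 7/150 figure:

```python
# Boosting on Iris, evaluated by 10-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
pred = cross_val_predict(AdaBoostClassifier(n_estimators=10), X, y, cv=10)
print((pred != y).sum(), "of", len(y), "misclassified")
```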