# Machine Learning

Post on 31-Oct-2014


1. Machine Learning, Data Mining (INFO 629, Dr. R. Weber)

2. The picnic game

- How did you reason to find the rule?

- According to Michalski (1983), A Theory and Methodology of Inductive Learning (in Machine Learning, chapter 4), inductive learning is "a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances".

3. Learning

- Rote Learning

- Learn multiplication tables

- Supervised Learning

- Examples are used to help a program identify a concept

- Examples are typically represented with attribute-value pairs

- The notion of supervision comes from the guidance the examples provide

- Unsupervised Learning

- Human efforts at scientific discovery, theory formation

4. Inductive Learning

- Learning by generalization

- Performance of classification tasks

- Classification, categorization, clustering

- Rules indicate categories

- Goal:

- Characterize a concept

5. Concept Learning is a Form of Inductive Learning

- Learner uses:

- positive examples (instances ARE examples of a concept) and

- negative examples (instances ARE NOT examples of a concept)

6. Concept Learning

- Needs empirical validation

- Whether data are dense or sparse determines which methods perform well

7. Validation of Concept Learning i

- The learned concept should be able to correctly classify new instances of the concept

- When it succeeds on a real instance of the concept, it produces a true positive

- When it fails on a real instance of the concept, it produces a false negative

8. Validation of Concept Learning ii

- The learned concept should be able to correctly classify new instances of the concept

- When it succeeds on a counterexample, it produces a true negative

- When it fails on a counterexample, it produces a false positive
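The four validation outcomes described on slides 7 and 8 can be tallied in a short sketch; the test results below are hypothetical illustrations:

```python
# Minimal sketch of validating a learned concept (slides 7-8).
# Each pair holds the true label (is this really an instance of the
# concept?) and the learned classifier's prediction.

def tally(results):
    """Count true/false positives/negatives from (actual, predicted) pairs."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for actual, predicted in results:
        if actual and predicted:
            counts["TP"] += 1      # real instance, correctly recognized
        elif actual and not predicted:
            counts["FN"] += 1      # real instance, missed
        elif not actual and not predicted:
            counts["TN"] += 1      # counterexample, correctly rejected
        else:
            counts["FP"] += 1      # counterexample, wrongly accepted
    return counts

results = [(True, True), (True, False), (False, False), (False, True), (True, True)]
print(tally(results))  # {'TP': 2, 'FN': 1, 'TN': 1, 'FP': 1}
```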

9. Basic classification tasks

- Classification

- Categorization

- Clustering

10. Categorization

11. Classification

12. Clustering

13. Clustering

- A data analysis method

- Data should naturally possess groupings

- Goal: group data into clusters

- Resulting clusters are collections where objects within a cluster are similar to each other

- Objects outside the cluster are dissimilar to objects inside

- Objects from one cluster are dissimilar to objects in other clusters

- Distance measures are used to compute similarity
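The distance-based grouping described above can be sketched as a bare-bones k-means; the points and starting centroids are hypothetical:

```python
import math

# Minimal sketch of distance-based clustering (slide 13): group points
# around centroids so that objects within a cluster are similar (close)
# and objects in different clusters are dissimilar (far apart).

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])  # Euclidean distance

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
clusters, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
# The two natural groupings (near the origin, near (9, 9)) are recovered.
```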

14. Rule Learning

- A form of learning widely used in data mining

- Version Space Learning is a search method to learn rules

- Decision Trees

15. Version Space i

- A=1, B=1, C=1 → Outcome=1

- A=0, B=.5, C=.5 → Outcome=0

- A=0, B=0, C=.3 → Outcome=.5

- Creates a tree that includes all possible combinations

- Does not learn rules with disjunctions (i.e., OR statements)

- Incremental method: additional data can be trained on without retraining on all past data
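One half of version-space learning, the specific boundary, can be sketched in a Find-S style: maintain the most specific conjunctive hypothesis covering all positive examples seen so far. This matches the slide's two points: it is incremental, and a single conjunction cannot express OR rules. The attribute data below are hypothetical:

```python
# Minimal sketch of the specific boundary in version-space learning
# (Find-S style). "?" is a wildcard meaning "any value".

def generalize(hypothesis, example):
    """Minimally generalize: keep agreeing values, wildcard the rest."""
    if hypothesis is None:                      # first positive example
        return dict(example)
    return {attr: (val if example.get(attr) == val else "?")
            for attr, val in hypothesis.items()}

h = None
positives = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 1},
]
for ex in positives:
    h = generalize(h, ex)                       # incremental update per example
print(h)  # {'A': 1, 'B': '?', 'C': 1}
```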

16. Decision trees

- Knowledge representation formalism

- Represent mutually exclusive rules (a disjunction of paths)

- A way of breaking up a data set into classes or categories

- Classification rules that determine, for each instance with given attribute values, whether it belongs to one class or another

17. Decision trees

- Decision trees consist of:

- leaf nodes (classes)

- decision nodes (tests on attribute values)

- from each decision node, branches grow for each possible outcome of the test

- From Cawsey, 1997

18. Decision tree induction

- Goal is to correctly classify all example data

- Several algorithms induce decision trees: ID3 (Quinlan 1979), CLS, ACLS, ASSISTANT, IND, C4.5

- Constructs decision tree from past data

- Not incremental

- Attempts to find the simplest tree (not guaranteed, because it relies on heuristics)

19. ID3 algorithm

- From:

- a set of target classes

- training data containing objects of more than one class

- ID3 uses tests to refine the training data set into subsets that contain objects of only one class each

- Choosing the right test is the key

20. How does ID3 choose tests

- Information gain or minimum entropy

- Maximizing information gain corresponds to minimizing entropy

- Predictive features (good indicators of the outcome)

21.-27. ID3 algorithm (worked example; slides 22-27 repeat the same student table as the algorithm steps through it)

| Student No. | Student | First last year? | Male? | Works hard? | Drinks? | First this year? |
|---|---|---|---|---|---|---|
| 1 | Richard | yes | yes | no | yes | yes |
| 2 | Alan | yes | yes | yes | no | yes |
| 3 | Alison | no | no | yes | no | yes |
| 4 | Jeff | no | yes | no | yes | no |
| 5 | Gail | yes | no | yes | yes | yes |
| 6 | Simon | no | yes | yes | yes | no |

28. Explanation-based learning

- Incorporates domain knowledge into the learning process

- Features are assigned a relevance factor when their values are consistent with domain knowledge

- Only features that are assigned relevance factors are considered in the learning process

29. Familiar Learning Task

- Learn relative importance of features

- Goal: learn individual weights

- Commonly used in case-based reasoning

- Feedback methods: use a similarity measure to obtain feedback that verifies the features' relative importance

- Search methods: gradient descent

- ID3
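ID3's entropy-based choice of test (slides 20-27) can be sketched as follows, using the student table from the worked example; the attribute names are shortened for code:

```python
import math

# Sketch of ID3's attribute selection: pick the attribute with maximum
# information gain, i.e., minimum weighted entropy after the split.
# Rows reproduce the student table; target attribute is "First this year?".

STUDENTS = [
    {"first_last": "yes", "male": "yes", "works_hard": "no",  "drinks": "yes", "first_this": "yes"},  # Richard
    {"first_last": "yes", "male": "yes", "works_hard": "yes", "drinks": "no",  "first_this": "yes"},  # Alan
    {"first_last": "no",  "male": "no",  "works_hard": "yes", "drinks": "no",  "first_this": "yes"},  # Alison
    {"first_last": "no",  "male": "yes", "works_hard": "no",  "drinks": "yes", "first_this": "no"},   # Jeff
    {"first_last": "yes", "male": "no",  "works_hard": "yes", "drinks": "yes", "first_this": "yes"},  # Gail
    {"first_last": "no",  "male": "yes", "works_hard": "yes", "drinks": "yes", "first_this": "no"},   # Simon
]

def entropy(rows, target):
    """H(S) = -sum(p * log2(p)) over the target value distribution."""
    counts = {}
    for r in rows:
        counts[r[target]] = counts.get(r[target], 0) + 1
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr, target):
    """Entropy before the split minus weighted entropy after it."""
    total = len(rows)
    split = {}
    for r in rows:
        split.setdefault(r[attr], []).append(r)
    remainder = sum(len(s) / total * entropy(s, target) for s in split.values())
    return entropy(rows, target) - remainder

gains = {a: information_gain(STUDENTS, a, "first_this")
         for a in ("first_last", "male", "works_hard", "drinks")}
best = max(gains, key=gains.get)
print(best)  # "first_last" has the highest gain (about 0.46 bits)
```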

30. Classification using Naive Bayes

- A Naive Bayes classifier uses two sources of information to classify a new instance

- The distribution of the training dataset (prior probability)

- The region surrounding the new instance in the dataset (likelihood)

- Naive because it assumes conditional independence, which is not always applicable

- The assumption is made to simplify the computation, and in this sense the method is considered naive

- Conditional independence reduces the need for a large number of observations

- Bias in estimating probabilities often may not make a difference in practice: it is the order of the probabilities, not their exact values, that determines the classifications

- Comparable in performance with classification trees and with neural networks

- Highly accurate and fast when applied to large databases

- Some links:

- http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm

- http://www.statsoft.com/textbook/stnaiveb.html
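The two sources of information above, prior and likelihood, can be combined in a minimal sketch for categorical features. Add-one smoothing is an assumption made here to avoid zero probabilities, not necessarily what the linked tools use; the data are hypothetical:

```python
import math
from collections import Counter, defaultdict

# Minimal Naive Bayes sketch: score(class) = log prior + sum of log
# likelihoods, treating features as conditionally independent given
# the class (the "naive" assumption).

def train(rows, labels):
    priors = Counter(labels)                 # class frequencies (prior)
    likelihood = defaultdict(Counter)        # (class, feature) -> value counts
    for row, label in zip(rows, labels):
        for feat, value in row.items():
            likelihood[(label, feat)][value] += 1
    return priors, likelihood

def classify(row, priors, likelihood):
    total = sum(priors.values())
    scores = {}
    for label, prior in priors.items():
        # Work in log space; add-one smoothing in each likelihood factor.
        score = math.log(prior / total)
        for feat, value in row.items():
            counts = likelihood[(label, feat)]
            score += math.log((counts[value] + 1) /
                              (sum(counts.values()) + len(counts) + 1))
        scores[label] = score
    return max(scores, key=scores.get)

rows = [{"drinks": "yes", "works_hard": "no"},
        {"drinks": "no",  "works_hard": "yes"},
        {"drinks": "yes", "works_hard": "no"},
        {"drinks": "no",  "works_hard": "yes"}]
labels = ["fail", "pass", "fail", "pass"]
priors, likelihood = train(rows, labels)
print(classify({"drinks": "no", "works_hard": "yes"}, priors, likelihood))  # "pass"
```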

31. KDD : definition

- Knowledge Discovery in Databases (KDD) is "the non-trivial process of identifying valid, novel, and potentially useful and understandable patterns in data" (R. Feldman, 2000).

- KDD is "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad, Piatetsky-Shapiro, Smyth 1996, p. 6).

- Data mining is one of the steps in the KDD process.

- Text mining concerns applying data mining techniques to unstructured text.

32. The KDD Process

(Diagram: DATA → SELECTED DATA → PROCESSED DATA → TRANSFORMED DATA → patterns → KNOWLEDGE, via the steps of selection, preprocessing, transformation, data mining, and interpretation, with browsing and filtering along the way)
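The KDD process on slide 32 can be sketched as a chain of steps; every function below is a hypothetical stand-in for a real KDD step:

```python
# Illustrative sketch of the KDD process as a pipeline:
# DATA -> SELECTED -> PROCESSED -> TRANSFORMED -> patterns -> KNOWLEDGE.

def select(data):          # DATA -> SELECTED DATA (drop irrelevant records)
    return [r for r in data if r is not None]

def preprocess(rows):      # SELECTED -> PROCESSED (cleaning, normalization)
    return [r.strip().lower() for r in rows]

def transform(rows):       # PROCESSED -> TRANSFORMED (feature extraction)
    return [len(r) for r in rows]

def mine(features):        # TRANSFORMED DATA -> patterns (data mining step)
    return {"max": max(features), "min": min(features)}

def interpret(patterns):   # patterns -> KNOWLEDGE (interpretation)
    return f"values range from {patterns['min']} to {patterns['max']}"

data = [" Apple ", None, "Banana", " fig "]
knowledge = interpret(mine(transform(preprocess(select(data)))))
print(knowledge)  # values range from 3 to 6
```

The point of the sketch is the shape, not the toy steps: each stage consumes the previous stage's output, and data mining is only one link in the chain, matching the definition on slide 31.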
