Machine Learning


Posted on 31-Oct-2014


1. Machine Learning, Data Mining (INFO 629, Dr. R. Weber)

2. The picnic game

  • How did you reason to find the rule?
  • According to Michalski (1983), "A theory and methodology of inductive learning" (in Machine Learning, chapter 4), inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances.

3. Learning

  • Rote Learning
    • Learn multiplication tables
  • Supervised Learning
    • Examples are used to help a program identify a concept
    • Examples are typically represented with attribute-value pairs
    • The notion of supervision comes from the guidance the labeled examples provide
  • Unsupervised Learning
    • Human efforts at scientific discovery, theory formation

4. Inductive Learning

  • Learning by generalization
  • Performance of classification tasks
    • Classification, categorization, clustering
  • Rules indicate categories
  • Goal:
    • Characterize a concept

5. Concept Learning is a Form of Inductive Learning

  • Learner uses:
    • positive examples (instances ARE examples of a concept) and
    • negative examples (instances ARE NOT examples of a concept)

6. Concept Learning

  • Needs empirical validation
  • Whether the data are dense or sparse determines which methods perform well

7. Validation of Concept Learning i

  • The learned concept should be able to correctly classify new instances of the concept
    • When it correctly classifies a real instance of the concept, it produces a true positive
    • When it fails on a real instance of the concept, it produces a false negative

8. Validation of Concept Learning ii

  • The learned concept should be able to correctly classify new instances of the concept
    • When it correctly rejects a counterexample, it produces a true negative
    • When it fails on a counterexample, it produces a false positive
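The four validation outcomes on slides 7 and 8 can be tallied directly. A minimal sketch (the function name and data layout are illustrative), where True means "is a real instance of the concept":

```python
def confusion_counts(y_true, y_pred):
    """Tally validation outcomes for a learned binary concept.
    y_true: whether each instance really belongs to the concept;
    y_pred: whether the learned concept classified it as belonging."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)          # true positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)     # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p) # true negatives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)     # false positives
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}
```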

9. Basic classification tasks

  • Classification
  • Categorization
  • Clustering

10. Categorization  11. Classification  12. Clustering  (no transcript text for these slides)

13. Clustering

  • An exploratory data analysis method
  • Data should naturally possess groupings
  • Goal: group data into clusters
  • Resulting clusters are collections where objects within a cluster are similar to each other
  • Objects outside the cluster are dissimilar to objects inside
  • Objects from one cluster are dissimilar to objects in other clusters
  • Distance measures are used to compute similarity
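The distance-based grouping described above can be sketched with k-means, one common clustering method (the slides do not name a specific algorithm; initializing centers from the first k points is an illustrative shortcut, not how production implementations do it):

```python
import math

def euclidean(a, b):
    """Euclidean distance, a common choice of distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Minimal k-means sketch: assign each point to its nearest center,
    then move each center to the mean of its cluster, and repeat."""
    centers = [points[i] for i in range(k)]  # simplistic initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            clusters[nearest].append(p)
        # recompute each center as the mean of its cluster (keep old if empty)
        centers = [tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters
```

On two well-separated groups of points, the clusters converge to the natural groupings within a few iterations.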

14. Rule Learning

  • Rule learning is widely used in data mining
  • Version Space Learning is a search method to learn rules
  • Decision Trees

15. Version Space i

  • A=1,B=1,C=1 Outcome=1
  • A=0,B=.5,C=.5 Outcome=0
  • A=0,B=0,C=.3 Outcome=.5
  • Considers a space that includes all possible combinations of attribute values
  • Cannot learn rules with disjunctions (i.e., OR statements)
  • Incremental method: additional training data can be incorporated without retraining on all the data
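Version-space learning proper maintains general and specific boundary sets; as a hedged sketch of the same generalize-from-examples idea, the simpler Find-S algorithm generalizes a most-specific hypothesis over positive examples only (the attribute values below are invented for illustration):

```python
def find_s(examples):
    """Find-S: generalize the most specific hypothesis over positive examples.
    examples: list of (attribute_tuple, label) pairs; label True = positive."""
    hypothesis = None
    for attrs, positive in examples:
        if not positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attrs)      # start from the first positive example
        else:
            for i, v in enumerate(attrs):
                if hypothesis[i] != v:
                    hypothesis[i] = "?"   # generalize mismatching attributes

    return hypothesis

# Two positive examples differing only in the second attribute,
# plus one negative example (ignored by Find-S):
h = find_s([(("sunny", "warm", "high"), True),
            (("sunny", "cold", "high"), True),
            (("rainy", "cold", "high"), False)])
```

Like the version-space method on the slide, this is incremental: each new positive example refines the current hypothesis without reprocessing earlier data.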

16. Decision trees

  • Knowledge representation formalism
  • Represent mutually exclusive rules (disjunction)
  • A way of breaking up a data set into classes or categories
  • Classification rules that determine, for each instance described by its attribute values, which class it belongs to

17. Decision trees

  • Decision trees consist of:
    • leaf nodes (classes)
    • decision nodes (tests on attribute values)
    • branches growing from each decision node, one for each possible outcome of the test
  • From Cawsey, 1997

18. Decision tree induction

  • Goal is to correctly classify all example data
  • Several algorithms induce decision trees: ID3 (Quinlan, 1979), CLS, ACLS, ASSISTANT, IND, C4.5
  • Constructs decision tree from past data
  • Not incremental
  • Attempts to find the simplest tree (not guaranteed because it is based on heuristics)
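ID3's choice of test, by information gain, can be sketched over the student table from the worked-example slides that follow. The identifiers and dictionary layout below are an illustrative reconstruction, not from the slides; the target class is "First this year?":

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    ps = [labels.count(v) / n for v in set(labels)]
    return -sum(p * math.log2(p) for p in ps)

def info_gain(rows, attr, target):
    """Entropy of the whole set minus the weighted entropy of the
    subsets produced by splitting on attr (maximizing gain = minimizing entropy)."""
    total = entropy([r[target] for r in rows])
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return total - remainder

# The student table from the worked-example slides:
students = [
    {"name": "Richard", "first_last_year": "yes", "male": "yes", "works_hard": "no",  "drinks": "yes", "first_this_year": "yes"},
    {"name": "Alan",    "first_last_year": "yes", "male": "yes", "works_hard": "yes", "drinks": "no",  "first_this_year": "yes"},
    {"name": "Alison",  "first_last_year": "no",  "male": "no",  "works_hard": "yes", "drinks": "no",  "first_this_year": "yes"},
    {"name": "Jeff",    "first_last_year": "no",  "male": "yes", "works_hard": "no",  "drinks": "yes", "first_this_year": "no"},
    {"name": "Gail",    "first_last_year": "yes", "male": "no",  "works_hard": "yes", "drinks": "yes", "first_this_year": "yes"},
    {"name": "Simon",   "first_last_year": "no",  "male": "yes", "works_hard": "yes", "drinks": "yes", "first_this_year": "no"},
]

gains = {a: info_gain(students, a, "first_this_year")
         for a in ("first_last_year", "male", "works_hard", "drinks")}
best = max(gains, key=gains.get)  # the test ID3 would choose at the root
```

On this data, "First last year?" has the highest information gain, so ID3 would test it first.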

19. ID3 algorithm

  • From:
    • a set of target classes
    • training data containing objects of more than one class
  • ID3 uses tests to refine the training data set into subsets that contain objects of only one class each
  • Choosing the right test is the key

20. How does ID3 choose tests?

  • Information gain or minimum entropy
  • Maximizing information gain corresponds to minimizing entropy
  • Predictive features (good indicators of the outcome)

21.-27. ID3 algorithm (worked example: the same student table is repeated on each of these slides, with a different test highlighted)

  Student No.  Student  First last year?  Male?  Works hard?  Drinks?  First this year?
  1            Richard  yes               yes    no           yes      yes
  2            Alan     yes               yes    yes          no       yes
  3            Alison   no                no     yes          no       yes
  4            Jeff     no                yes    no           yes      no
  5            Gail     yes               no     yes          yes      yes
  6            Simon    no                yes    yes          yes      no

28. Explanation-based learning

  • Incorporates domain knowledge into the learning process
  • Feature values are assigned a relevance factor if their values are consistent with domain knowledge
  • Features that are assigned relevance factors are considered in the learning process

29. Familiar Learning Task

  • Learn relative importance of features
  • Goal: learn individual weights
  • Commonly used in case-based reasoning
  • Feedback methods: use a similarity measure to obtain feedback that verifies the features' relative importance
  • Search methods: gradient descent
  • ID3
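The gradient-descent route to learning individual feature weights can be sketched as follows; the squared-error objective, learning rate, and toy data are illustrative assumptions, not from the slides:

```python
def learn_weights(examples, labels, lr=0.05, epochs=500):
    """Learn one weight per feature by stochastic gradient descent,
    minimizing squared error of a weighted-sum score against feedback labels."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y  # prediction error
            # gradient step: move each weight against the error it contributed
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w
```

On data where only the first feature predicts the label, the learned weights approach 1 for that feature and 0 for the irrelevant one, capturing their relative importance.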

30. Classification using Naive Bayes

  • The Naive Bayes classifier uses two sources of information to classify a new instance
    • The distribution of the training dataset (prior probability)
    • The region surrounding the new instance in the dataset (likelihood)
  • "Naive" because it assumes conditional independence, which is not always applicable
  • The assumption is made to simplify the computation, and in this sense the classifier is considered naive
  • Conditional independence reduces the requirement for a large number of observations
  • Bias in estimating probabilities often may not make a difference in practice -- it is the order of the probabilities, not their exact values, that determines the classifications
  • Comparable in performance with classification trees and with neural networks
  • Highly accurate and fast when applied to large databases
  • Some links:
    • http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm
    • http://www.statsoft.com/textbook/stnaiveb.html
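A minimal sketch of the prior-times-likelihood computation described above, for categorical features; the Laplace smoothing and the tiny dataset are illustrative choices, not from the slides:

```python
import math

def predict(x, rows, labels):
    """Naive Bayes: for each class, combine the prior probability with
    per-feature likelihoods (multiplied independently, per the naive
    conditional-independence assumption); return the most probable class."""
    best_class, best_logp = None, float("-inf")
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        logp = math.log(len(idx) / len(labels))  # log prior
        for j, v in enumerate(x):
            matches = sum(1 for i in idx if rows[i][j] == v)
            # likelihood of feature value v given class c, Laplace-smoothed
            logp += math.log((matches + 1) / (len(idx) + 2))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class
```

Only the ordering of the class probabilities matters for the decision, which is why the biased smoothed estimates still classify well in practice.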

31. KDD : definition

  • Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, and potentially useful and understandable patterns in data (R. Feldman, 2000)
  • KDD is "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad, Piatetsky-Shapiro, Smyth 1996, p. 6)
  • Data mining is one of the steps in the KDD process
  • Text mining concerns applying data mining techniques to unstructured text.

32. The KDD Process (figure): DATA -> [selection, filtering] -> SELECTED DATA -> [preprocessing] -> PROCESSED DATA -> [transformation] -> TRANSFORMED DATA -> [data mining] -> patterns -> [interpretation, browsing] -> KNOWLEDGE
