TRANSCRIPT
CpSc 810: Machine Learning
Concept Learning and General to Specific Ordering
2
Copyright Notice
Most slides in this presentation are adapted from the textbook slides and various other sources. The copyright belongs to the original authors. Thanks!
3
Review
Choose the training experience
Choose the target function
Choose a representation
Choose the parameter fitting algorithm
Evaluate the entire system
4
What is a Concept?
A Concept is a subset of objects or events defined over a larger set
Example: The concept of a bird is the subset of all objects (i.e., the set of all things or all animals) that belong to the category of bird.
Alternatively, a concept is a Boolean-valued function defined over this larger set
Example: a function defined over all animals whose value is true for birds and false for every other animal.
Things
Animals
Birds
Cars
5
What is Concept-Learning?
Given a set of examples labeled as members or non-members of a concept, concept-learning consists of automatically inferring the general definition of this concept.
In other words, concept-learning is inferring a Boolean-valued function from training examples of its input and output.
6
Example of a Concept Learning task
Concept: Good Days for Water Sports (values: Yes, No)
Attributes/Features:
Sky (values: Sunny, Cloudy, Rainy)
AirTemp (values: Warm, Cold)
Humidity (values: Normal, High)
Wind (values: Strong, Weak)
Water (values: Warm, Cool)
Forecast (values: Same, Change)
A training example:
<Sunny, Warm, High, Strong, Warm, Same, Yes>
(the last value, Yes, is the class label)
7
Example of a Concept Learning task
Training Examples
The task is to learn to predict the value of EnjoySport for an arbitrary day.
Learn the general concept of EnjoySport
Day  Sky    AirTemp  Humidity  Wind    Water  Forecast  WaterSport
1    Sunny  Warm     Normal    Strong  Warm   Same      Yes
2    Sunny  Warm     High      Strong  Warm   Same      Yes
3    Rainy  Cold     High      Strong  Warm   Change    No
4    Sunny  Warm     High      Strong  Cool   Change    Yes
8
Terminology and Notation
Instances X: the set of items over which the concept is defined is called the set of instances
The concept to be learned is called the Target Concept (denoted by c: X--> {0,1})
The set of Training Examples D is a set of instances, x, along with their target concept value c(x).
Members of the concept (instances for which c(x)=1) are called positive examples.
Nonmembers of the concept (instances for which c(x)=0) are called negative examples.
9
Terminology and Notation
Given a set of training examples of the target concept c, the problem faced by the learner is to hypothesize, or estimate, c.
Hypotheses Space H represents the set of all possible hypotheses.
H is determined by the human designer's choice of a hypothesis representation. Each hypothesis h in H is described by a conjunction of constraints on the attributes.
The goal of concept-learning is to find a hypothesis h:X --> {0,1} such that h(x)=c(x) for all x in X.
10
Representing Hypotheses
For each attribute, the constraint can be:
A specific value (e.g., Water = Warm)
"?", meaning any value is acceptable
"0", meaning no value is acceptable
Each hypothesis consists of a conjunctions of constraints on attributes.
Example of a hypothesis: <?, Cold, High, ?, ?, ?>
(If the air temperature is cold and the humidity is high, then it is a good day for water sports.)
Selecting a Hypothesis Representation is an important step since it restricts (or biases) the space that can be searched.
For example, the hypothesis “If the air temperature is cold or the humidity high then it is a good day for water sports” cannot be expressed in our chosen representation.
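As a minimal sketch (not from the slides), this representation can be encoded as a tuple of per-attribute constraints and evaluated against an instance; the function name `satisfies` is our own:

```python
# Sketch (not from the slides): evaluating a conjunctive hypothesis.
# A hypothesis is a tuple of per-attribute constraints: a specific value,
# "?" (any value is acceptable), or "0" (no value is acceptable).

def satisfies(hypothesis, instance):
    """Return True iff every attribute constraint is met by the instance."""
    # A "0" constraint never matches, so any hypothesis containing "0"
    # rejects every instance.
    return all(c == "?" or c == value
               for c, value in zip(hypothesis, instance))

h = ("?", "Cold", "High", "?", "?", "?")
x = ("Sunny", "Cold", "High", "Strong", "Cool", "Change")
print(satisfies(h, x))  # True: AirTemp is Cold and Humidity is High
```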
11
Prototypical Concept Learning Task
12
The inductive learning hypothesis
The inductive learning hypothesis: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
13
Concept Learning as Search
Concept Learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
Goal of the search is to find the hypothesis that best fits the training examples.
What are the possible hypotheses for the EnjoySport learning task?
Syntactically distinct hypotheses: 5 * 4 * 4 * 4 * 4 * 4 = 5120
Semantically distinct hypotheses (every hypothesis containing one or more "0"s represents the empty set of instances): 1 + (4 * 3 * 3 * 3 * 3 * 3) = 973
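These counts can be verified with a few lines of arithmetic (a sketch; the per-attribute value counts are taken from the task description above):

```python
# Number of values per attribute: Sky=3, AirTemp=2, Humidity=2, Wind=2,
# Water=2, Forecast=2.
values = [3, 2, 2, 2, 2, 2]

# Syntactically distinct: each attribute also allows "?" and "0" (+2 choices).
syntactic = 1
for v in values:
    syntactic *= v + 2
print(syntactic)  # 5120

# Semantically distinct: any "0" yields the empty concept, so count
# hypotheses using only specific values or "?" (+1 choice each), then add
# the single empty hypothesis.
semantic = 1
for v in values:
    semantic *= v + 1
semantic += 1
print(semantic)  # 973
```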
14
General to Specific Ordering of Hypotheses
Most General Hypothesis: Everyday is a good day for water sports <?,?,?,?,?,?>
Most Specific Hypothesis: No day is a good day for water sports <0,0,0,0,0,0>
Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk iff for all x in X, [(hk(x) = 1) --> (hj(x) = 1)].
Example:
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Every instance that is classified as positive by h1 is also classified as positive by h2. Therefore h2 is more general than h1.
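For conjunctive hypotheses, the definition above reduces to an attribute-wise check. This sketch (our own encoding, with "?" and "0" as on the earlier representation slide) illustrates it:

```python
# Sketch: attribute-wise test of more-general-than-or-equal-to for
# conjunctive hypotheses ("?" = any value, "0" = no value).

def more_general_or_equal(hj, hk):
    """True iff every instance satisfying hk also satisfies hj."""
    def covers(cj, ck):
        # cj admits everything ck admits: cj is "?", or the constraints
        # agree, or ck is "0" (hk matches nothing, so the claim is vacuous).
        return cj == "?" or cj == ck or ck == "0"
    return all(covers(cj, ck) for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False
```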
15
Instances, Hypotheses, more_general_than
16
Find-S, a Maximally Specific Hypothesis Learning Algorithm
Initialize h to the most specific hypothesis in H
For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
then do nothing
else replace ai in h by the next more general constraint that is satisfied by x
Output hypothesis h
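The steps above can be sketched in Python on the EnjoySport data (a hedged sketch; the tuple encoding and the name `find_s` are our own):

```python
# Sketch of Find-S for conjunctive hypotheses.
# "0" = no value acceptable, "?" = any value acceptable.

def find_s(examples, n_attributes):
    h = ["0"] * n_attributes                 # most specific hypothesis
    for instance, label in examples:
        if label != "Yes":                   # Find-S ignores negative examples
            continue
        for i, value in enumerate(instance):
            if h[i] == "0":                  # first positive: copy its values
                h[i] = value
            elif h[i] != value:              # conflict: generalize to "?"
                h[i] = "?"
    return tuple(h)

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```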
17
Hypothesis Space Search by Find-S
18
Shortcomings of Find-S
Although Find-S finds a hypothesis consistent with the training data, it gives no indication of whether that hypothesis is the only consistent one.
Is it a good strategy to prefer the most specific hypothesis?
What if the training set is inconsistent (noisy)?
What if there are several maximally specific consistent hypotheses? Find-S cannot backtrack!
19
Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c iff h(x) = c(x) for each example <x,c(x)> in D.
The version space, denoted VSH,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
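These two definitions can be sketched directly (our own encoding; the names `h_of` and `consistent` are hypothetical):

```python
# Sketch: consistency of a conjunctive hypothesis h with training data D,
# where D is a list of pairs (instance, c(x)) with c(x) in {0, 1}.

def h_of(h, x):
    # h(x) in {0, 1}: 1 iff instance x satisfies every constraint of h.
    return int(all(c == "?" or c == v for c, v in zip(h, x)))

def consistent(h, D):
    # h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D.
    return all(h_of(h, x) == cx for x, cx in D)

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
     (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0)]
print(consistent(("Sunny", "?", "?", "?", "?", "?"), D))  # True
```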
20
The List-Then-Eliminate Algorithm
1. VersionSpace <- a list containing every hypothesis in H
2. For each training example <x, c(x)>, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x).
3. Output the list of hypotheses in VersionSpace
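The three steps can be sketched directly (a hedged illustration; feasible only because H is tiny for this task):

```python
# Sketch of List-Then-Eliminate: enumerate every hypothesis in H, then
# discard those inconsistent with some training example.
from itertools import product

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(examples, attribute_values):
    n = len(attribute_values)
    # H: per attribute, every specific value or "?", plus the single
    # all-"0" (empty) hypothesis.
    hypotheses = [tuple(["0"] * n)]
    hypotheses += list(product(*[vals + ["?"] for vals in attribute_values]))
    # Keep only hypotheses consistent with every example.
    return [h for h in hypotheses
            if all(satisfies(h, x) == (label == "Yes")
                   for x, label in examples)]

attrs = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
         ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
for h in list_then_eliminate(data, attrs):
    print(h)  # prints the 6 hypotheses of the version space
```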
21
Example Version Space
Version Space for our training examples.
Day  Sky    AirTemp  Humidity  Wind    Water  Forecast  WaterSport
1    Sunny  Warm     Normal    Strong  Warm   Same      Yes
2    Sunny  Warm     High      Strong  Warm   Same      Yes
3    Rainy  Cold     High      Strong  Warm   Change    No
4    Sunny  Warm     High      Strong  Cool   Change    Yes
22
Discussions on List-Then-Eliminate
It works whenever the hypothesis space H is finite.
It is guaranteed to output all hypotheses consistent with the training data.
Shortcoming: exhaustively enumerating all hypotheses in H is an unrealistic requirement.
23
A Compact Representation for Version Spaces
Instead of enumerating all the hypotheses consistent with a training set, we can represent its most specific and most general boundaries. The hypotheses included in-between these two boundaries can be generated as needed.
Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
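Given an explicitly enumerated version space, the two boundaries can be extracted with the more-general-than-or-equal relation (a sketch; the version space listed below is the one for the EnjoySport training examples):

```python
# Sketch: extracting the S and G boundaries from an explicitly
# enumerated version space.

def mge(hj, hk):
    # hj is more-general-than-or-equal-to hk (conjunctive hypotheses).
    return all(cj == "?" or cj == ck or ck == "0" for cj, ck in zip(hj, hk))

def boundaries(version_space):
    # G: members with no strictly more general member in the space.
    G = [h for h in version_space
         if not any(g != h and mge(g, h) for g in version_space)]
    # S: members with no strictly more specific member in the space.
    S = [h for h in version_space
         if not any(s != h and mge(h, s) for s in version_space)]
    return S, G

vs = [("Sunny", "Warm", "?", "Strong", "?", "?"),
      ("Sunny", "?", "?", "Strong", "?", "?"),
      ("Sunny", "Warm", "?", "?", "?", "?"),
      ("?", "Warm", "?", "Strong", "?", "?"),
      ("Sunny", "?", "?", "?", "?", "?"),
      ("?", "Warm", "?", "?", "?", "?")]
S, G = boundaries(vs)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```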
24
Version space representation theorem
Every member of the version space lies between these boundaries:
VSH,D = { h in H | (there exists s in S)(there exists g in G) such that g >= h >= s }
where x >= y means x is more-general-than-or-equal-to y.
Proof sketch:
1. Every h satisfying the right-hand side of the above expression is in VSH,D.
2. Every member of VSH,D satisfies the right-hand side of the expression.
25
Candidate-Elimination Learning Algorithm
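The algorithm itself appears as a figure in the slides and is not reproduced in this transcript. As a hedged sketch following the standard formulation (initialize G to the most general hypothesis and S to the most specific, then refine both boundaries with each example), for conjunctive hypotheses:

```python
# Hedged sketch of Candidate-Elimination for conjunctive hypotheses;
# "?" = any value, "0" = no value. Simplified for this representation.

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(cj == "?" or cj == ck or ck == "0" for cj, ck in zip(hj, hk))

def min_generalization(s, x):
    # Unique minimal generalization of s that covers instance x.
    return tuple(v if c == "0" else (c if c == v else "?")
                 for c, v in zip(s, x))

def min_specializations(g, x, domains):
    # All minimal specializations of g that exclude instance x.
    return [g[:i] + (value,) + g[i + 1:]
            for i, (c, v) in enumerate(zip(g, x)) if c == "?"
            for value in domains[i] if value != v]

def candidate_elimination(examples, domains):
    n = len(domains)
    G = {tuple(["?"] * n)}   # general boundary
    S = {tuple(["0"] * n)}   # specific boundary
    for x, label in examples:
        if label == "Yes":
            G = {g for g in G if satisfies(g, x)}
            S = {min_generalization(s, x) if not satisfies(s, x) else s
                 for s in S}
            # Keep only S members still covered by some g in G.
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not satisfies(s, x)}
            newG = set()
            for g in G:
                if satisfies(g, x):
                    for h in min_specializations(g, x, domains):
                        # Keep specializations covering some member of S.
                        if any(more_general_or_equal(h, s) for s in S):
                            newG.add(h)
                else:
                    newG.add(g)
            # Drop members strictly less general than another member of G.
            G = {g for g in newG
                 if not any(h != g and more_general_or_equal(h, g)
                            for h in newG)}
    return S, G

domains = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data, domains)
print("S:", S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print("G:", G)  # <Sunny,?,?,?,?,?> and <?,Warm,?,?,?,?> (set order may vary)
```

On the four EnjoySport examples this reproduces the boundaries traced on the following slides.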
26
Candidate-Elimination Trace
27
Candidate-Elimination Trace
28
Candidate-Elimination Trace
29
Candidate-Elimination Trace
30
Candidate-Elimination Trace
31
Remarks on Version Spaces and Candidate-Elimination
The Candidate-Elimination algorithm computes the version space containing all (and only those) hypotheses from H that are consistent with an observed sequence of training examples.
The version space learned by the Candidate-Elimination Algorithm will converge toward the hypothesis that correctly describes the target concept provided: (1) There are no errors in the training examples; (2) There is some hypothesis in H that correctly describes the target concept.
Convergence can be sped up by presenting the data in a strategic order. The best examples are those that satisfy exactly half of the hypotheses in the current version space.
Version-Spaces can be used to assign certainty scores to the classification of new examples
32
How Should These Be Classified?
33
A Biased Hypothesis Space
Given our previous choice of the hypothesis space representation, no hypothesis is consistent with the following examples: we have BIASED the learner to consider only conjunctive hypotheses
In this case, we need a more expressive hypothesis space
34
An Un-Biased Learner
In order to solve the problem caused by the bias of the hypothesis space, we can remove this bias and allow hypotheses to represent every possible subset of instances, by allowing arbitrary disjunctions, conjunctions, and negations of the previous hypotheses. The previous examples could then be expressed as:
<Sunny, ?, ?, ?, ?, ?> v <Cloudy, ?, ?, ?, ?, ?>
However, such an unbiased learner is not able to generalize beyond the observed examples! Every unobserved example will be classified as positive by half the hypotheses of the version space and as negative by the other half.
35
An Example of an Un-Biased Learner
Rote-Learner: Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise, the system refuses to classify the new instance.
36
The Futility of Bias-Free Learning
Fundamental Property of Inductive Learning: A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances. In EnjoySport, there is an implicit assumption that the target concept can be represented by a conjunction of attribute values.
We constantly have recourse to inductive biases
Example: we all know that the sun will rise tomorrow. Although we cannot deduce that it will do so based on the fact that it rose today, yesterday, the day before, etc., we do take this leap of faith or use this inductive bias, naturally!
37
The Futility of Bias-Free Learning
Notation:
A |- B indicates that B follows deductively from A (B is provable from A).
A |> B indicates that B follows inductively from A.
The inductive inference step:
(Dc ^ xi) |> L(xi, Dc)
38
Inductive Bias
Consider:
Concept learning algorithm L
Instances X, target concept c
Training examples Dc = {<x, c(x)>}
Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on data Dc.
Definition: The inductive bias of L is any minimal set of additional assumptions B such that, for any target concept c and corresponding training examples Dc, B is sufficient to justify L's inductive inferences as deductive inferences.
Inductive bias of Candidate-Elimination algorithm: The target concept c is contained in the given hypothesis space H.
(For all xi in X) [ (B ^ Dc ^ xi) |- L(xi, Dc) ]
39
Inductive Systems and Equivalent Deductive Systems
40
Ranking Inductive Learners according to their Biases
Rote-Learner: This system simply memorizes the training data and their classifications; no generalization is involved.
Candidate-Elimination: New instances are classified only if all the hypotheses in the version space agree on the classification.
Find-S: New instances are classified using the most specific hypothesis consistent with the training data.
Bias strength increases from Rote-Learner (weak) through Candidate-Elimination to Find-S (strong).
41
Summary Points
Concept learning as search through H
General to specific ordering over H
Version space candidate elimination algorithm
S and G boundaries characterize learner’s uncertainty
Learner can generate useful queries
Inductive leaps possible only if learner is biased
Inductive learners can be modeled by equivalent deductive systems.