Basics of Machine Learning

Advanced Artificial Intelligence
Lecture 3: Learning
Bob McKay
School of Computer Science and Engineering
College of Engineering
Seoul National University

Uploaded by butest, 11 May 2015.


Outline

• Defining Learning
• Kinds of Learning
• Generalisation and Specialisation
• Some Simple Learning Algorithms

References

• Mitchell, Tom M.: Machine Learning, McGraw-Hill, 1997, ISBN 0-07-115467-1

Defining a Learning System (Mitchell)

• "A program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"

Specifying a Learning System

• Specifying the task T, the performance P and the experience E defines the learning problem. Specifying the learning system requires us to define:
– Exactly what knowledge is to be learnt
– How this knowledge is to be represented
– How this knowledge is to be learnt

Specifying What is to be Learnt

• Usually, the desired knowledge can be represented as a target valuation function V: I → D
– It takes in information about the problem and gives back a desired decision
• Often, it is unrealistic to expect to learn the ideal function V
– All that is required is a 'good enough' approximation, V': I → D
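As a concrete sketch, V and V' are simply functions from problem instances to decisions; the type names and the example hypothesis below are illustrative assumptions of mine, not part of the lecture.

```python
from typing import Callable

# A problem instance: attribute values describing one situation.
Instance = tuple[str, ...]
# A decision: here, a simple yes/no answer.
Decision = bool

# Both the ideal target function V: I -> D and a learnt approximation
# V': I -> D have this type; learning searches a language L of candidates.
ValuationFunction = Callable[[Instance], Decision]

def v_prime(x: Instance) -> Decision:
    # One candidate hypothesis: 'the decision is yes whenever the sky is sunny'.
    return x[0] == "Sun"

print(v_prime(("Sun", "Wrm", "Nml", "Str", "Wrm", "Sam")))  # True
```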

Specifying How Knowledge is to be Represented

• The function V' must be represented symbolically, in some language L
– The language may be a well-known language
• Boolean expressions
• Arithmetic functions
• …
– Or, for some systems, the language may be defined by a grammar

Specifying How the Knowledge is to be Learnt

• If the learning system is to be implemented, we must specify an algorithm A, which defines the way in which the system is to search the language L for an acceptable V'
– That is, we must specify a search algorithm

Structure of a Learning System

• Four modules
– The Performance System
– The Critic
– The Generaliser (or sometimes Specialiser)
– The Experiment Generator

Performance Module

• This is the system which actually uses the function V' as we learn it
– Learning task
• Learning to play checkers
– Performance module
• A system for playing checkers (i.e. one that makes the checkers moves)

Critic Module

• The critic module evaluates the performance of the current V'
– It produces a set of data from which the system can learn further

Generaliser/Specialiser Module

• Takes a set of data and produces a new V' for the system to run again

Experiment Generator

• Takes the new V'
– Maybe also uses the previous history of the system
• Produces a new experiment for the performance system to undertake
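A minimal sketch of how the four modules fit together as a loop (all function bodies here are toy stand-ins of mine; the lecture specifies only the module roles, not any code):

```python
# Toy wiring of the four modules. Each is a plain function; in a real
# system (e.g. a checkers learner) each would be far richer.

def performance_system(v_prime, experiment):
    # Use the current V' on each instance and record the trace.
    return [(x, v_prime(x)) for x in experiment]

def critic(trace):
    # Evaluate the trace and produce training data. Toy criterion: the
    # 'true' concept (unknown to the learner) is that the first field is "Sun".
    return [(x, x[0] == "Sun") for x, _ in trace]

def generaliser(training_data):
    # Produce a new V'. Toy rule: always predict the majority label.
    majority = 2 * sum(label for _, label in training_data) >= len(training_data)
    return lambda x: majority

def experiment_generator(v_prime, history):
    # Propose the next experiment (fixed here; could use v_prime and history).
    return [("Sun",), ("Rain",)]

v_prime = lambda x: False       # initial hypothesis: never
history = []
for _ in range(3):              # the learning loop
    experiment = experiment_generator(v_prime, history)
    trace = performance_system(v_prime, experiment)
    data = critic(trace)
    v_prime = generaliser(data)
    history.append(trace)

print(v_prime(("Sun",)))  # True
```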

The Importance of Bias

• Important theoretical results from learning theory (PAC learning) tell us that learning without some presuppositions is infeasible.
– Practical experience, of both machine and human learning, confirms this.
• To learn effectively, we must limit the class of V's.
• Two approaches are used in machine learning:
– Language bias
– Search bias
• Combined bias: language and search bias are not mutually exclusive; most learning systems feature both

Language Bias

• The language L is restricted so that it cannot represent all possible target functions V
– This is usually on the basis of some knowledge we have about the likely form of V'
– It introduces risk
• Our system will fail if L does not contain an acceptable V'

Search Bias

• The order in which the system searches L is controlled, so that promising areas for V' are searched first

The Downside: No Free Lunches

• Wolpert and Macready's No Free Lunch Theorem states, in effect, that averaged over all problems, all biases are equally good (or bad).
• Conventional view
– The choice of a learning system cannot be universal
• It must be matched to the problem being solved
• In most systems, the bias is not explicit
– The ability to identify the language and search biases of a particular system is an important aspect of machine learning
• Some more recent systems permit the explicit and flexible specification of both language and search biases

No Free Lunch: Does it Matter?

• Alternative view
– We aren't interested in all problems
• We are only interested in problems which have solutions of less than some bounded complexity
– (so that we can understand the solutions)
– The No Free Lunch Theorem may not apply in this case

Some Dimensions of Learning

• Induction vs Discovery
• Guided learning vs learning from raw data
• Learning How vs Learning That (vs Learning a Better That)
• Stochastic vs Deterministic; Symbolic vs Subsymbolic
• Clean vs Noisy Data
• Discrete vs continuous variables
• Attribute vs Relational Learning
• The Importance of Background Knowledge

Induction vs Discovery

• Has the target concept been previously identified?
– Pearson: cloud classifications from satellite data
• vs
– Autoclass and H-R diagrams
– AM and prime numbers
– BACON and Boyle's Law

Guided Learning vs Learning from Raw Data

• Does the learning system require carefully selected examples and counterexamples, as in a teacher–student situation?
– (allows fast learning)
– CIGOL learning sort/merge
• vs
– Garvan Institute's thyroid data

Learning How vs Learning That vs Learning a Better That

– Classifying handwritten symbols
– Distinguishing vowel sounds (Sejnowski & Rosenberg)
– Learning to fly a (simulated!) plane
• vs
– Michalski & learning diagnosis of soy diseases
• vs
– Mitchell & learning about chess forks

Stochastic vs Deterministic; Symbolic vs Subsymbolic

– Classifying handwritten symbols (stochastic, subsymbolic)
• vs
– Predicting plant distributions (stochastic, symbolic)
• vs
– Cloud classification (deterministic, symbolic)
• vs
– ? (deterministic, subsymbolic)

Clean vs Noisy Data

– Learning to diagnose errors in programs
• vs
– Greater gliders in the Coolangubra

Discrete vs Continuous Variables

– Quinlan's chess end games
• vs
– Pearson's clouds (eg cloud heights)

Attribute vs Relational Learning

– Predicting plant distributions
• vs
– Predicting animal distributions
• (because plants can't move, they don't care - much - about spatial relationships)

The Importance of Background Knowledge

• Learning about faults in a satellite power supply
– general electric circuit theory
– knowledge about the particular circuit

Generalisation and Learning

• What do we mean when we say of two propositions, S and G, that G is a generalisation of S?
– Suppose Skippy is a grey kangaroo.
– We would regard 'Kangaroos are grey' as a generalisation of 'Skippy is grey'.
– In any world in which 'Kangaroos are grey' is true, 'Skippy is grey' will also be true.
• In other words, if G is a generalisation of specialisation S, then S is 'at least as true' as G,
– That is, S is true in all states of the world in which G is, and perhaps in other states as well.

Generalisation and Inference

• In logic, we assume that if S is true in all worlds in which G is, then
– G → S
• That is, G is a generalisation of S exactly when G implies S
– So we can think of learning from S as a search for a suitable G for which G → S
• In propositional learning, this is often used as a definition:
– G is more general than S if and only if G → S

Issues

• Equating generalisation and logical implication is only useful if the validity of an implication can be readily computed
– In the propositional calculus, validity is an exponential problem
– In the predicate calculus, validity is an undecidable problem
• So the definition is not universally useful
– (although for some parts of logic - eg learning rules - it is perfectly adequate).

A Common Misunderstanding

• Suppose we have two rules:
– 1) A ∧ B → G
– 2) A ∧ B ∧ C → G
• Clearly, we would want 1 to be a generalisation of 2
• This is OK with our definition, because
– ((A ∧ B → G) → (A ∧ B ∧ C → G))
• is valid
– But the confusing thing is that ((A ∧ B ∧ C) → (A ∧ B)) is valid
• If you only look at the hypotheses of the rules, rather than the whole rules, the implication is the wrong way around
• Note that some textbooks are themselves confused about this
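These validity claims can be checked mechanically with a small truth-table sketch (my own illustration, not from the slides):

```python
from itertools import product

def valid(f):
    """Propositional validity over four atoms A, B, C, G, by truth table."""
    return all(f(a, b, c, g) for a, b, c, g in product([False, True], repeat=4))

def implies(p, q):
    return (not p) or q

# ((A ∧ B → G) → (A ∧ B ∧ C → G)) is valid: rule 1 generalises rule 2.
assert valid(lambda a, b, c, g: implies(implies(a and b, g),
                                        implies(a and b and c, g)))

# Between the hypotheses alone, the implication runs the other way:
# (A ∧ B ∧ C) → (A ∧ B) is valid, while (A ∧ B) → (A ∧ B ∧ C) is not.
assert valid(lambda a, b, c, g: implies(a and b and c, a and b))
assert not valid(lambda a, b, c, g: implies(a and b, a and b and c))
```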

Defining Generalisation

• We could try to define the properties that generalisation must satisfy,
• So let's write down some axioms. We need some notation.
– We will write 'S <G G' as shorthand for 'S is less general than G'.
• Axioms:
– Transitivity: If A <G B and B <G C then also A <G C
– Antisymmetry: If A <G B then it's not true that B <G A
– Top: there is a unique element, ⊤, for which it is always true that A <G ⊤.
– Bottom: there is a unique element, ⊥, for which it is always true that ⊥ <G A.

Picturing Generalisation

• We can draw a 'picture' of a generalisation hierarchy satisfying these axioms:
[Figure: a generalisation hierarchy, with the most general element ⊤ at the top and the most specific element ⊥ at the bottom]

Specifying Generalisation

• In a particular domain, the generalisation hierarchy may be defined in either of two ways:
– By giving a general definition of what generalisation means in that domain
• Example: our earlier definition in terms of implication
– By directly specifying the specialisation and generalisation operators that may be used to climb up and down the links in the generalisation hierarchy

Learning and Generalisation

• How does learning relate to generalisation?
– We can view most learning as an attempt to find an appropriate hypothesis that generalises the examples.
– In noise-free domains, we usually want the generalisation to cover all the examples.
– Once we introduce noise, we want the generalisation to cover 'enough' examples, and the interesting bit is in defining what 'enough' is.
• In our picture of a generalisation hierarchy, most learning algorithms can be viewed as methods for searching the hierarchy.
– The examples can be pictured as locations low down in the hierarchy, and the learning algorithm attempts to find a location that is above all (or 'enough') of them in the hierarchy, but usually no higher than it needs to be

Searching the Generalisation Hierarchy

• The commonest approaches are:
– generalising search
• the search is upward from the original examples, towards the more general hypotheses
– specialising search
• the search is downward from the most general hypothesis, towards the more special examples
– Some algorithms use different approaches. Mitchell's version space approach, for example, tries to 'home in' on the right generalisation from both directions at once.

Completeness and Generalisation

• Many approaches to axiomatising generalisation add an extra axiom:
– Completeness: For any set Σ of members of the generalisation hierarchy, there is a unique 'least general generalisation' L, which satisfies two properties:
• 1) for every S in Σ, S <G L
• 2) if any other L' satisfies 1), then L <G L'
– If this definition is hard to understand, compare it with the definition of 'Least Upper Bound' in set theory, or of 'Least Common Multiple' in arithmetic
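For the conjunctive attribute-tuple language introduced later in this lecture, the least general generalisation can be computed attribute by attribute. A sketch (the function names are my own):

```python
from functools import reduce

def lgg2(h1, h2):
    """Least general generalisation of two conjunctive hypotheses.
    'φ' accepts no value, '?' accepts any value, a constant accepts itself."""
    out = []
    for a, b in zip(h1, h2):
        if a == "φ":
            out.append(b)            # anything generalises 'φ'
        elif b == "φ":
            out.append(a)
        elif a == b:
            out.append(a)            # agreement is kept
        else:
            out.append("?")          # disagreement generalises to '?'
    return tuple(out)

def lgg(hypotheses):
    return reduce(lgg2, hypotheses)

print(lgg([("Sun", "Wrm", "Nml"), ("Sun", "Wrm", "High")]))  # ('Sun', 'Wrm', '?')
```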

Restricting Generalisation

• Let's go back to our original definition of generalisation:
– G generalises S iff G → S
• In the general predicate calculus case, this relation is uncomputable, so it's not very useful
• One approach to avoiding the problem is to limit the implications allowed

Generalisation and Substitution

• Very commonly, the generalisations we want to make involve turning a constant into a variable.
– So if we see a particular black crow, fred, we notice:
• crow(fred) → black(fred)
– and we may wish to generalise this to
• ∀X (crow(X) → black(X))
• Notice that the original proposition can be recovered from the generalisation by substituting 'fred' for the variable 'X'
– The original is a substitution instance of the generalisation
– So we could define a new, restricted generalisation:
• G subsumes S if S is a substitution instance of G
• This is an instance of our earlier definition, because a substitution instance is always implied by the original proposition.
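A toy subsumption check, flattening each rule to a tuple of symbols and treating capitalised names as variables (a simplifying encoding of mine, not the lecture's notation):

```python
def subsumes(general, specific):
    """Is `specific` a substitution instance of `general`?
    Names starting with an upper-case letter are variables, which
    must bind to the same constant everywhere they occur."""
    if len(general) != len(specific):
        return False
    binding = {}
    for g, s in zip(general, specific):
        if g[0].isupper():                     # a variable
            if binding.setdefault(g, s) != s:
                return False                   # inconsistent binding
        elif g != s:
            return False                       # constant mismatch
    return True

# ∀X (crow(X) → black(X)) subsumes crow(fred) → black(fred):
print(subsumes(("crow", "X", "black", "X"),
               ("crow", "fred", "black", "fred")))   # True
print(subsumes(("crow", "X", "black", "X"),
               ("crow", "fred", "black", "joe")))    # False
```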

Learning Algorithms

• For the rest of this lecture, we will work with a specific learning dataset (due to Mitchell):

  Item  Sky   AirT  Hum   Wnd  Wtr   Fcst  Enjy
  1     Sun   Wrm   Nml   Str  Wrm   Sam   Yes
  2     Sun   Wrm   High  Str  Wrm   Sam   Yes
  3     Rain  Cold  High  Str  Wrm   Chng  No
  4     Sun   Wrm   High  Str  Cool  Chng  Yes

• First, we look at a really simple algorithm, Maximally Specific Learning

Maximally Specific Learning

• The learning language consists of sets of tuples, representing the values of these attributes
– A '?' represents that any value is acceptable for this attribute
– A particular value represents that only that value is acceptable for this attribute
– A 'φ' represents that no value is acceptable for this attribute
– Thus (?, Cold, High, ?, ?, ?) represents the hypothesis that water sport is enjoyed only on cold, moist days.
• Note that our language is already heavily biased: only conjunctive hypotheses (hypotheses built with '∧') are allowed.

Find-S

• Find-S is a simple algorithm: its initial hypothesis is that water sport is never enjoyed
– It expands the hypothesis as positive data items are noted

Running Find-S

• Initial Hypothesis
– The most specific hypothesis (water sports are never enjoyed):
– h ← (φ, φ, φ, φ, φ, φ)
• After First Data Item
– Water sport is enjoyed only under the conditions of the first item:
– h ← (Sun, Wrm, Nml, Str, Wrm, Sam)
• After Second Data Item
– Water sport is enjoyed only under the common conditions of the first two items:
– h ← (Sun, Wrm, ?, Str, Wrm, Sam)

Running Find-S

• After Third Data Item
– Since this item is negative, it has no effect on the learning hypothesis:
– h ← (Sun, Wrm, ?, Str, Wrm, Sam)
• After Final Data Item
– Further generalises the conditions encountered:
– h ← (Sun, Wrm, ?, Str, ?, ?)
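The trace above can be reproduced with a short Find-S implementation over Mitchell's dataset (a sketch in my own naming):

```python
# Find-S for conjunctive hypotheses over attribute tuples
# ('φ' = no value acceptable, '?' = any value acceptable).

DATA = [  # Mitchell's dataset: (Sky, AirT, Hum, Wnd, Wtr, Fcst) -> Enjy
    (("Sun",  "Wrm",  "Nml",  "Str", "Wrm",  "Sam"),  True),
    (("Sun",  "Wrm",  "High", "Str", "Wrm",  "Sam"),  True),
    (("Rain", "Cold", "High", "Str", "Wrm",  "Chng"), False),
    (("Sun",  "Wrm",  "High", "Str", "Cool", "Chng"), True),
]

def find_s(data):
    h = ("φ",) * 6                      # most specific hypothesis
    for x, positive in data:
        if not positive:
            continue                    # negative items are ignored
        h = tuple(                      # minimally generalise h to cover x
            xi if hi == "φ" else (hi if hi == xi else "?")
            for hi, xi in zip(h, x)
        )
    return h

print(find_s(DATA))  # ('Sun', 'Wrm', '?', 'Str', '?', '?')
```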

Discussion

• We have found the most specific hypothesis corresponding to the dataset and the restricted (conjunctive) language
• It is not clear it is the best hypothesis
– If the best hypothesis is not conjunctive (eg if we enjoy swimming if it's warm or sunny), it will not be found
– Find-S will not handle noise and inconsistencies well.
– In other languages (not using pure conjunction) there may be more than one maximally specific hypothesis; Find-S will not work well here

Version Spaces

• One possible improvement on Find-S is to search many possible solutions in parallel
• Consistency
– A hypothesis h is consistent with a dataset D of training examples iff h gives the same answer on every element of the dataset as the dataset does
• Version Space
– The version space with respect to the language L and the dataset D is the set of hypotheses h in the language L which are consistent with D

List-then-Eliminate

• Obvious algorithm
– The list-then-eliminate algorithm aims to find the version space in L for the given dataset D
– It can thus return all hypotheses which could explain D
• It works by beginning with L as its set of hypotheses H
– As each item d of the dataset D is examined in turn, any hypotheses in H which are inconsistent with d are eliminated
• The language L is usually large, and often infinite, so this algorithm is computationally infeasible as it stands

Version Space Representation

• One of the problems with the previous algorithm is the representation of the search space
– We need to represent version spaces efficiently
• General Boundary
– The general boundary G with respect to language L and dataset D is the set of hypotheses h in L which are consistent with D, and for which there is no more general hypothesis in L which is consistent with D
• Specific Boundary
– The specific boundary S with respect to language L and dataset D is the set of hypotheses h in L which are consistent with D, and for which there is no more specific hypothesis in L which is consistent with D

Version Space Representation 2

• A version space may be represented by its general and specific boundary
• That is, given the general and specific boundaries, the whole version space may be recovered
• The Candidate Elimination Algorithm traces the general and specific boundaries of the version space as more examples and counter-examples of the concept are seen
– Positive examples are used to generalise the specific boundary
– Negative examples permit the general boundary to be specialised.

Candidate Elimination Algorithm

Set G to the set of most general hypotheses in L
Set S to the set of most specific hypotheses in L
For each example d in D:

Candidate Elimination Algorithm

If d is a positive example
  Remove from G any hypothesis inconsistent with d
  For each hypothesis s in S that is not consistent with d
    Remove s from S
    Add to S all minimal generalisations h of s such that h is consistent with d, and some member of G is more general than h
  Remove from S any hypothesis that is more general than another hypothesis in S

Candidate Elimination Algorithm

If d is a negative example
  Remove from S any hypothesis inconsistent with d
  For each hypothesis g in G that is not consistent with d
    Remove g from G
    Add to G all minimal specialisations h of g such that h is consistent with d, and some member of S is more specific than h
  Remove from G any hypothesis that is less general than another hypothesis in G
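Putting the two branches together, a compact implementation for the conjunctive tuple language (a sketch; the specialisation step draws candidate values from the attribute domains observed in the data):

```python
DATA = [  # Mitchell's dataset: (Sky, AirT, Hum, Wnd, Wtr, Fcst) -> Enjy
    (("Sun",  "Wrm",  "Nml",  "Str", "Wrm",  "Sam"),  True),
    (("Sun",  "Wrm",  "High", "Str", "Wrm",  "Sam"),  True),
    (("Rain", "Cold", "High", "Str", "Wrm",  "Chng"), False),
    (("Sun",  "Wrm",  "High", "Str", "Cool", "Chng"), True),
]

def matches(h, x):
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

def general_ge(h1, h2):
    # h1 is more general than, or equal to, h2 ('φ' is below everything).
    return all(a == "?" or a == b or b == "φ" for a, b in zip(h1, h2))

def candidate_elimination(data):
    n = len(data[0][0])
    domains = [sorted({x[i] for x, _ in data}) for i in range(n)]
    G = [("?",) * n]                 # most general boundary
    S = [("φ",) * n]                 # most specific boundary
    for x, positive in data:
        if positive:
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                # minimal generalisation of s covering x
                h = tuple(xi if si == "φ" else (si if si == xi else "?")
                          for si, xi in zip(s, x))
                if any(general_ge(g, h) for g in G):
                    new_S.append(h)
            # keep only the maximally specific members
            S = [h for h in new_S
                 if not any(h != k and general_ge(h, k) for k in new_S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # minimal specialisations of g excluding x
                for i in range(n):
                    if g[i] != "?":
                        continue
                    for v in domains[i]:
                        if v != x[i]:
                            h = g[:i] + (v,) + g[i + 1:]
                            if any(general_ge(h, s) for s in S):
                                new_G.append(h)
            # keep only the maximally general members
            G = [h for h in new_G
                 if not any(h != k and general_ge(k, h) for k in new_G)]
    return S, G

S, G = candidate_elimination(DATA)
print(S)          # [('Sun', 'Wrm', '?', 'Str', '?', '?')]
print(sorted(G))  # [('?', 'Wrm', '?', '?', '?', '?'), ('Sun', '?', '?', '?', '?', '?')]
```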

Summary

• Defining Learning
• Kinds of Learning
• Generalisation and Specialisation
• Some Simple Learning Algorithms
– Find-S
– Version Spaces
• List-then-Eliminate
• Candidate Elimination

Thank you