
1

Machine Learning

• What is learning?

2

Machine Learning

• What is learning?

• “That is what learning is. You suddenly understand something you've understood all your life, but in a new way.” (Doris Lessing, 2007 Nobel Prize in Literature)

3

Machine Learning

• How to construct programs that automatically improve with experience.

4

Machine Learning

• How to construct programs that automatically improve with experience.

• Learning problem:

– Task T

– Performance measure P

– Training experience E

5

Machine Learning

• Chess game:

– Task T: playing chess games

– Performance measure P: percent of games won against opponents

– Training experience E: playing practice games against itself

6

Machine Learning

• Handwriting recognition:

– Task T: recognizing and classifying handwritten words

– Performance measure P: percent of words correctly classified

– Training experience E: handwritten words with given classifications

7

Designing a Learning System

• Choosing the training experience:

– Direct or indirect feedback

– Degree of learner's control

– Representative distribution of examples

8

Designing a Learning System

• Choosing the target function:

– Type of knowledge to be learned

– Function approximation

9

Designing a Learning System

• Choosing a representation for the target function:

– Expressive representation for a close function approximation

– Simple representation for simple training data and learning algorithms

10

Designing a Learning System

• Choosing a function approximation algorithm (learning algorithm)

11

Designing a Learning System

• Chess game:

– Task T: playing chess games

– Performance measure P: percent of games won against opponents

– Training experience E: playing practice games against itself

– Target function: V: Board → ℝ

12

Designing a Learning System

• Chess game:

– Target function representation:

V^(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

x1: the number of black pieces on the board

x2: the number of red pieces on the board

x3: the number of black kings on the board

x4: the number of red kings on the board

x5: the number of black pieces threatened by red

x6: the number of red pieces threatened by black
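As a concrete illustration, the representation above is a plain linear combination of board features. A minimal Python sketch; extract_features is a hypothetical stand-in (any function returning the six counts [x1, ..., x6] for a board would do):

```python
# extract_features is a hypothetical stand-in: any function returning the
# six board counts [x1, ..., x6] described above would do.

def v_hat(board, w, extract_features):
    """V^(b) = w0 + w1*x1 + ... + w6*x6, with w a 7-element weight vector."""
    x = extract_features(board)
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```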

13

Designing a Learning System

• Chess game:

– Function approximation algorithm:

(<x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0>, 100)

x1: the number of black pieces on the board

x2: the number of red pieces on the board

x3: the number of black kings on the board

x4: the number of red kings on the board

x5: the number of black pieces threatened by red

x6: the number of red pieces threatened by black
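The slide leaves the algorithm itself unspecified; one standard choice (the LMS weight-update rule used in the ML textbook this material follows) is sketched below with the slide's training example. The learning rate of 0.1 is an illustrative value, not from the slides:

```python
# LMS rule: w_i <- w_i + lr * (V_train(b) - V^(b)) * x_i, with x0 = 1.

def lms_update(w, features, v_train, lr=0.1):
    x = [1.0] + list(features)                    # prepend the bias feature x0
    v_hat = sum(wi * xi for wi, xi in zip(w, x))  # current estimate V^(b)
    return [wi + lr * (v_train - v_hat) * xi for wi, xi in zip(w, x)]

# the slide's training example: (<x1=3, x2=0, x3=1, x4=0, x5=0, x6=0>, 100)
w = lms_update([0.0] * 7, [3, 0, 1, 0, 0, 0], v_train=100)
```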

14

Designing a Learning System

[Diagram: the four modules of the final design and the data passed between them]

Experiment Generator → new problem (initial board) → Performance System → solution trace (game history) → Critic → training examples {(b1, V1), (b2, V2), ...} → Generalizer → hypothesis (V^) → back to the Experiment Generator
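Sketched as code, the loop implied by the diagram might look like the following; all four components are hypothetical stand-ins passed in as functions, not textbook code:

```python
# Shows only how data cycles through the diagram; every component here
# is an assumed callable, not an implementation from the slides.

def training_loop(experiment_generator, performance_system, critic,
                  generalizer, hypothesis, rounds=100):
    for _ in range(rounds):
        problem = experiment_generator(hypothesis)       # new problem (initial board)
        trace = performance_system(problem, hypothesis)  # solution trace (game history)
        examples = critic(trace)                         # {(b1, V1), (b2, V2), ...}
        hypothesis = generalizer(examples)               # updated hypothesis V^
    return hypothesis
```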

15

Issues in Machine Learning

• Which learning algorithms should be used?

• How much training data is sufficient?

• When and how can prior knowledge guide the learning process?

• What is the best strategy for choosing the next training experience?

• What is the best way to reduce the learning task to one or more function approximation problems?

• How can the learner automatically alter its representation to improve its learning ability?

16

Concept Learning

• Inferring a Boolean-valued function from training examples of its input (instances) and output (classifications).

17

Concept Learning

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
5        Sunny  Cold     Normal    Strong  Warm   Same      ?
6        Rainy  Warm     High      Strong  Warm   Change    ?

(Examples 1-4 are the training experience; examples 5-6 are to be predicted.)

18

Concept Learning

• Learning problem:

– Task T: classifying days on which my friend enjoys water sports

– Performance measure P: percent of days correctly classified

– Training experience E: days with given attributes and classifications

19

Concept Learning

• Learning problem:

– Concept: a subset of the set of instances X

c: X → {0, 1}

– Target function:

Sky × AirTemp × Humidity × Wind × Water × Forecast → {Yes, No}

– Hypothesis:

Characteristics of all instances of the concept to be learned = constraints on instance attributes

h: X → {0, 1}

20

Concept Learning

• Satisfaction:

h(x) = 1 iff x satisfies all the constraints of h

h(x) = 0 otherwise

• Consistency:

h(x) = c(x) for every instance x of the training examples

• Correctness:

h(x) = c(x) for every instance x of X

21

Concept Learning

• Hypothesis representation:

<Sky, AirTemp, Humidity, Wind, Water, Forecast>

– ?: any value is acceptable

– a single required value (e.g., Warm)

– ∅: no value is acceptable
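The satisfaction and consistency tests from the previous slide are direct to code under this representation. A minimal sketch, with hypotheses and instances as tuples:

```python
# '?' accepts any value, a literal must match exactly, and "∅" (which
# never equals an attribute value) accepts nothing.

def satisfies(h, x):
    """h(x) = 1 iff x meets every attribute constraint of h."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """h(x) = c(x) for every training example (x, c(x))."""
    return all(satisfies(h, x) == label for x, label in examples)

h = ("Sunny", "?", "?", "Strong", "?", "?")
print(satisfies(h, ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")))  # True
```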

22

Concept Learning

• General-to-specific ordering of hypotheses:

hj ≥g hk iff ∀x ∈ X: hk(x) = 1 → hj(x) = 1

h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>

[Diagram: H ordered from specific to general; h2 is more general than both h1 and h3, while h1 and h3 are not comparable]

23

FIND-S

• Initialize h to the most specific hypothesis in H:

h = <∅, ∅, ∅, ∅, ∅, ∅>

• For each positive training instance x:

For each attribute constraint ai in h:

If the constraint is not satisfied by x

Then replace ai by the next more general constraint satisfied by x

• Output hypothesis h
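A runnable sketch of FIND-S in Python; "∅" marks the empty constraint, and the data is the EnjoySport training set from slide 17:

```python
def find_s(examples):
    h = ["∅"] * len(examples[0][0])    # most specific hypothesis in H
    for x, label in examples:
        if label != "Yes":             # FIND-S ignores negative examples
            continue
        for i, (c, v) in enumerate(zip(h, x)):
            if c == "∅":
                h[i] = v               # first positive example: copy its values
            elif c != v:
                h[i] = "?"             # next more general constraint
    return h

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(examples))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The printed hypothesis matches the trace on the next slide.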

24

FIND-S

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

h = <∅, ∅, ∅, ∅, ∅, ∅>
h = <Sunny, Warm, Normal, Strong, Warm, Same>
h = <Sunny, Warm, ?, Strong, Warm, Same>
h = <Sunny, Warm, ?, Strong, ?, ?>

25

FIND-S

• The output hypothesis is the most specific one that satisfies all positive training examples.

26

FIND-S

• The result is consistent with the positive training examples.

• The result is consistent with the negative training examples if the correct target concept is contained in H (and the training examples are correct).

27

FIND-S

• Questions:

– Has the learner converged to the correct target concept, as there can be several consistent hypotheses?

– Why prefer the most specific hypothesis?

– What if there are several maximally specific consistent hypotheses?

– Are the training examples correct?

28

List-then-Eliminate Algorithm

• Version space: the set of all hypotheses that are consistent with the training examples.

• Algorithm:

– Initial version space = set containing every hypothesis in H

– For each training example <x, c(x)>, remove from the version space any hypothesis h for which h(x) ≠ c(x)

– Output the hypotheses in the version space
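The enumeration is still feasible for the EnjoySport space (973 semantically distinct hypotheses, counted on slide 38). A brute-force sketch:

```python
# Brute-force List-then-Eliminate over the EnjoySport hypothesis space:
# every combination of one concrete value or '?' per attribute, plus the
# single empty hypothesis ("∅" everywhere) that rejects all instances.
from itertools import product

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
H = [("∅",) * 6] + list(product(*[d + ("?",) for d in domains]))
print(len(H))  # 973 semantically distinct hypotheses

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
version_space = [h for h in H
                 if all(covers(h, x) == label for x, label in examples)]
print(len(version_space))  # 6 hypotheses remain (the version space on slide 36)
```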

29

List-then-Eliminate Algorithm

• Requires an exhaustive enumeration of all hypotheses in H

30

Compact Representation of Version Space

• G (the general boundary): the set of the most general hypotheses of H consistent with the training data D:

G = {g ∈ H | consistent(g, D) ∧ ¬∃g' ∈ H: (g' >g g) ∧ consistent(g', D)}

• S (the specific boundary): the set of the most specific hypotheses of H consistent with the training data D:

S = {s ∈ H | consistent(s, D) ∧ ¬∃s' ∈ H: (s >g s') ∧ consistent(s', D)}

31

Compact Representation of Version Space

• Version space = <G, S> = {h ∈ H | ∃g ∈ G, ∃s ∈ S: g ≥g h ≥g s}

[Diagram: the version space lies between the general boundary G and the specific boundary S]

32

Candidate-Elimination Algorithm

• Initialize G to the set of maximally general hypotheses in H

• Initialize S to the set of maximally specific hypotheses in H

33

Candidate-Elimination Algorithm

• For each positive example d:

– Remove from G any hypothesis inconsistent with d

– For each s in S that is inconsistent with d:

Remove s from S

Add to S all least generalizations h of s, such that h is consistent with d and some hypothesis in G is more general than h

Remove from S any hypothesis that is more general than another hypothesis in S

34

Candidate-Elimination Algorithm

• For each negative example d:

– Remove from S any hypothesis inconsistent with d

– For each g in G that is inconsistent with d:

Remove g from G

Add to G all least specializations h of g, such that h is consistent with d and some hypothesis in S is more specific than h

Remove from G any hypothesis that is more specific than another hypothesis in G
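Putting the positive-example step (previous slide) and the negative-example step above together, a compact Python sketch for this conjunctive representation; None stands for the ∅ constraint, and the demo reuses the EnjoySport data:

```python
def covers(h, x):
    """h(x) = 1 iff every constraint in h is satisfied by instance x."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_eq(h1, h2):
    """h1 >=g h2: every instance covered by h2 is also covered by h1."""
    return all(a == "?" or b is None or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    """The least generalization of s that covers the positive instance x."""
    return tuple(v if c is None else c if c in ("?", v) else "?"
                 for c, v in zip(s, x))

def min_specializations(g, domains, x):
    """All least specializations of g that exclude the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = {(None,) * len(domains)}       # maximally specific boundary
    G = {("?",) * len(domains)}        # maximally general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            S = {min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_eq(g, s) for g in G)}
            # (S stays a singleton for conjunctive hypotheses, so the
            #  "more general than another member of S" pruning is a no-op)
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):   # g already consistent with d: keep it
                    new_G.add(g)
                    continue
                for h in min_specializations(g, domains, x):
                    if any(more_general_eq(h, s) for s in S):
                        new_G.add(h)   # keep h only if it stays above S
            G = {g for g in new_G      # drop non-maximal members of G
                 if not any(g2 != g and more_general_eq(g2, g) for g2 in new_G)}
    return S, G

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples, domains)
# S -> {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G -> {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```

The resulting boundaries match the trace on the next slide.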

35

Candidate-Elimination Algorithm

S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}
G0 = {<?, ?, ?, ?, ?, ?>}

E1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
G1 = {<?, ?, ?, ?, ?, ?>}

E2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}
G2 = {<?, ?, ?, ?, ?, ?>}

E3: <Rainy, Cold, High, Strong, Warm, Change>, No
S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}
G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

E4: <Sunny, Warm, High, Strong, Cool, Change>, Yes
S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

36

Candidate-Elimination Algorithm

The resulting version space, from the specific boundary S4 to the general boundary G4:

S4 = {<Sunny, Warm, ?, Strong, ?, ?>}

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

New instances to classify:

5  Sunny  Cold  Normal  Strong  Warm  Same    ?
6  Rainy  Warm  High    Strong  Warm  Change  ?

37

Candidate-Elimination Algorithm

• The version space will converge toward the correct target concept if:

– H contains the correct target concept

– There are no errors in the training examples

• A training instance to be requested next should discriminate among the alternative hypotheses in the current version space.

• A partially learned concept can be used to classify new instances by the majority rule.

38

Inductive Bias

• Size of the instance space: |X| = 3·2·2·2·2·2 = 96

• Number of possible concepts: 2^|X| = 2^96

• Size of H: (4·3·3·3·3·3) + 1 = 973 << 2^96

39

Inductive Bias

• Size of the instance space: |X| = 3·2·2·2·2·2 = 96

• Number of possible concepts: 2^|X| = 2^96

• Size of H: (4·3·3·3·3·3) + 1 = 973 << 2^96

→ a biased hypothesis space
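These counts can be checked with a few lines of arithmetic (Python):

```python
x_size = 3 * 2 * 2 * 2 * 2 * 2        # |X| = 96 distinct instances
num_concepts = 2 ** x_size            # 2**96 possible concepts over X
h_size = 4 * 3 * 3 * 3 * 3 * 3 + 1    # 973 semantically distinct hypotheses;
                                      # the "+ 1" is the single empty hypothesis
print(x_size, h_size)                 # 96 973
```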

40

Inductive Bias

• An unbiased hypothesis space H' that can represent every subset of the instance space X: propositional logic sentences

• Positive examples: x1, x2, x3

Negative examples: x4, x5

h = x1 ∨ x2 ∨ x3 ∨ x6 (a consistent hypothesis that also classifies the unseen instance x6 as positive)

41

Inductive Bias

[Diagram: the version space in H' contains both x1 ∨ x2 ∨ x3 ∨ x6 and x1 ∨ x2 ∨ x3, i.e., the hypotheses that include the positive examples x1, x2, x3 and exclude the negative examples x4, x5]

Any new instance x is classified positive by half of the version space, and negative by the other half.
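This half-and-half split can be verified by brute force on a toy version of the setup; in the sketch below, instances are just the labels 1 to 6, and each hypothesis in the unbiased H' is the subset of X it classifies positive:

```python
# Enumerate all 2**6 subsets of X, keep those consistent with the
# examples, and tally the votes for the unseen instance x6.
from itertools import chain, combinations

X = [1, 2, 3, 4, 5, 6]
positives, negatives = {1, 2, 3}, {4, 5}

subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
version_space = [set(h) for h in subsets
                 if positives <= set(h) and not set(h) & negatives]

votes_positive = sum(6 in h for h in version_space)
print(votes_positive, len(version_space))  # 1 2: exactly half classify x6 positive
```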

42

Inductive Bias

Example  Day  Actor     Price      EasyTicket
1        Mon  Famous    Expensive  Yes
2        Sat  Famous    Moderate   No
3        Sun  Infamous  Cheap      No
4        Wed  Infamous  Moderate   Yes
5        Sun  Famous    Expensive  No
6        Thu  Infamous  Cheap      Yes
7        Tue  Famous    Expensive  Yes
8        Sat  Famous    Cheap      No
9        Wed  Famous    Cheap      ?
10       Sat  Infamous  Expensive  ?

43

Inductive Bias

Example  Quality  Price  Buy
1        Good     Low    Yes
2        Bad      High   No
3        Good     High   ?
4        Bad      Low    ?

44

Inductive Bias

• A learner that makes no prior assumptions regarding the identity of the target concept cannot classify any unseen instances.

45

Homework

Exercises 2.1 - 2.5 (Chapter 2, ML textbook)

Exercises 3.1 - 3.4 (Chapter 3, ML textbook)