
6.867 Machine learning: lecture 1

Tommi S. Jaakkola

MIT CSAIL

[email protected]


6.867 Machine learning: administrivia

• Course staff ([email protected])

– Prof. Tommi Jaakkola ([email protected])

– Adrian Corduneanu ([email protected])

– Biswajit (Biz) Bose ([email protected])

• General info

– lectures MW 2.30-4pm in 32-141

– tutorials/recitations, initially F11-12.30 (4-145) / F2.30-4 (4-159)

– website http://www.ai.mit.edu/courses/6.867/

• Grading

– midterm (15%), final (25%)

– 5 (≈ bi-weekly) problem sets (30%)

– final project (30%)

Tommi Jaakkola, MIT CSAIL 2


Why learning?

• Example problem: face recognition

[Figure: face images labeled Jaakkola, Indyk, Barzilay, Collins]

Training data: a collection of images and labels (names)

Evaluation criterion: correct labeling of new images


Why learning?

• Example problem: text/document classification

[Figure: example webpages labeled faculty, faculty, faculty, course]

– a few labeled training documents (webpages)

– goal to label yet unseen documents


Why learning?

• There are already a number of applications of this type

– face, speech, handwritten character recognition

– fraud detection (e.g., credit card)

– recommender problems (e.g., which movies/products/etc. you’d like)

– annotation of biological sequences, molecules, or assays

– market prediction (e.g., stock/house prices)

– finding errors in computer programs, computer security

– defense applications

– etc.


Learning

[Figure: labeled face images (Jaakkola, Indyk, Barzilay, Collins) and labeled webpages (faculty, faculty, faculty, course)]

• Steps

– entertain a (biased) set of possibilities (hypothesis class)

– adjust predictions based on available examples (estimation)

– rethink the set of possibilities (model selection)

• Principles of learning are “universal”

– society (e.g., scientific community)

– animal (e.g., human)

– machine


Learning, biases, representation

[Figure sequence: images labeled “yes”, “yes”, “no”]

[Figure sequence: images labeled “yes”, “yes”, “no” (oops)]


Representation

• There are many ways of presenting the same information

0111111001110010000000100000001001111110111011111001110111110001

• The choice of representation may determine whether the learning task is very easy or very difficult
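As a small illustration, the 64-bit string above reads very differently as a flat string and as a small binary image; treating it as an 8x8 row-major bitmap is an assumption about how the slide's pictures were produced.

```python
# Sketch: one representation choice for the 64-bit string is to view it as
# an 8x8 binary image rather than a flat string.  That the slide's bitmaps
# are 8x8 row-major is an assumption.

bits = "0111111001110010000000100000001001111110111011111001110111110001"
assert len(bits) == 64

rows = [bits[i:i + 8] for i in range(0, 64, 8)]       # row-major 8x8 reshape
for row in rows:
    print(row.replace("0", ".").replace("1", "#"))    # crude ASCII rendering
```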


Representation

0111111001110010000000100000001001111110111011111001110111110001 “yes”

0001111100000011000001110000011001111110111111001111111101111111 “yes”

1111111000000110000011000111111000000111100000111110001101110001 “no”


Representation

0111111001110010000000100000001001111110111011111001110111110001 “yes”

0001111100000011000001110000011001111110111111001111111101111111 “yes”

1111111000000110000011000111111000000111100000111110001101110001 “no”

[Figure: the same three bit strings rendered as images, labeled “yes”, “yes”, “no”]


Hypothesis class

• Representation: examples are binary vectors of length d = 64

x = [111 . . . 0001]T

and labels y ∈ {−1, 1} (“no”, “yes”)

• The mapping from examples to labels is a “linear classifier”

y = sign ( θ · x ) = sign ( θ1x1 + . . . + θdxd )

where θ is a vector of parameters we have to learn from examples.
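This classifier is simple enough to sketch directly. The bit string below is the example vector from the earlier slides, while the helper names and the all-zero parameter vector are illustrative choices, not part of the lecture.

```python
# A minimal sketch of the linear classifier y = sign(theta . x) for
# binary vectors x of length d = 64 and labels y in {-1, +1}.
# The all-zero theta below is an illustrative placeholder, not a learned model.

def sign(z: float) -> int:
    """Return +1 ("yes") for non-negative input, -1 ("no") otherwise."""
    return 1 if z >= 0 else -1

def predict(theta: list[float], x: list[int]) -> int:
    """Classify x by the sign of the inner product theta . x."""
    return sign(sum(t * xi for t, xi in zip(theta, x)))

d = 64
x = [int(b) for b in
     "0111111001110010000000100000001001111110111011111001110111110001"]
assert len(x) == d

theta = [0.0] * d
print(predict(theta, x))   # -> 1, since sign(0) = +1 by our convention
```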


Linear classifier/experts

• We can understand the simple linear classifier

y = sign ( θ · x ) = sign ( θ1x1 + . . . + θdxd )

as a way of combining expert opinion (in this case simple binary features)

[Figure: experts x1, x2, . . . , xd cast binary “votes”; the votes are weighted by θ1, θ2, . . . , θd, and the combined “votes” y = sign ( θ1x1 + . . . + θdxd ) implement a majority rule]


Estimation

x                                                                 y

0111111001110010000000100000001001111110111011111001110111110001  +1

0001111100000011000001110000011001111110111111001111111100000011  +1

1111111000000110000011000111111000000111100000111110001101111111  -1

. . .                                                             . . .

• How do we adjust the parameters θ based on the labeled examples?

y = sign ( θ · x )

For example, we can simply refine/update the parameters whenever we make a mistake:

θi ← θi + y xi,   i = 1, . . . , d   (if the prediction was wrong)
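This mistake-driven rule is the classical perceptron update. A minimal sketch follows; the toy training set and epoch count are invented for illustration.

```python
# Sketch of the mistake-driven (perceptron) update from the slide:
# whenever the prediction sign(theta . x) differs from the label y,
# set theta_i <- theta_i + y * x_i for every i.
# The toy training set below is invented for illustration.

def sign(z: float) -> int:
    return 1 if z >= 0 else -1

def perceptron(examples: list[tuple[list[int], int]], epochs: int = 10) -> list[float]:
    d = len(examples[0][0])
    theta = [0.0] * d
    for _ in range(epochs):
        for x, y in examples:
            if sign(sum(t * xi for t, xi in zip(theta, x))) != y:   # mistake
                theta = [t + y * xi for t, xi in zip(theta, x)]     # update
    return theta

# Toy data: label +1 iff the first bit is set (a linearly separable rule).
data = [([1, 0, 1], 1), ([1, 1, 0], 1), ([0, 1, 1], -1), ([0, 0, 1], -1)]
theta = perceptron(data)
print([sign(sum(t * xi for t, xi in zip(theta, x))) for x, _ in data])
# -> [1, 1, -1, -1], i.e. every training example is classified correctly
```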


Evaluation

• Does the simple mistake driven algorithm work?

[Plot: average classification error (0–1) as a function of the number of examples and labels seen so far (0–1400)]
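The spirit of this curve can be reproduced by running the mistake-driven update online, predicting on each incoming example before updating, and tracking the running fraction of mistakes. The random data generator below is an assumption, not the dataset behind the slide.

```python
# Sketch: online evaluation of the mistake-driven (perceptron) classifier.
# For each incoming example we first predict, record whether the prediction
# was a mistake, and only then update the parameters; the running fraction
# of mistakes is the "average error" of the plot.  The random linearly
# separable data below is an assumption made for illustration.
import random

random.seed(0)
d = 64
true_theta = [random.choice([-1.0, 1.0]) for _ in range(d)]   # hidden target rule

def sign(z: float) -> int:
    return 1 if z >= 0 else -1

theta = [0.0] * d
mistakes = 0
avg_error = []
for n in range(1, 1401):                      # 1400 examples, as on the plot's axis
    x = [random.randint(0, 1) for _ in range(d)]
    y = sign(sum(t * xi for t, xi in zip(true_theta, x)))    # label from hidden rule
    if sign(sum(t * xi for t, xi in zip(theta, x))) != y:    # predict, then check
        mistakes += 1
        theta = [t + y * xi for t, xi in zip(theta, x)]      # mistake-driven update
    avg_error.append(mistakes / n)

print(round(avg_error[-1], 3))   # running error after 1400 examples
```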


Model selection

• The simple linear classifier cannot solve all the problems (e.g., XOR)

[Figure: the four XOR points (0,0), (0,1), (1,0), (1,1) in the (x1, x2) plane, with labels −1 and 1 assigned so that diagonally opposite points share a label; no line separates the two classes]

• Can we rethink the approach to do even better?

We can, for example, add “polynomial experts”

y = sign ( θ1x1 + . . . + θdxd + θ12x1x2 + . . . )
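A single product feature x1x2 is already enough to separate XOR. In the sketch below, the ±1 input encoding and the hand-picked weights are illustrative choices, not part of the lecture.

```python
# Sketch: the XOR labels cannot be produced by sign(theta1*x1 + theta2*x2)
# alone, but adding the "polynomial expert" x1*x2 makes them linearly
# separable.  We use the +/-1 encoding of the inputs; the weights below
# are chosen by hand for illustration.

def sign(z: float) -> int:
    return 1 if z >= 0 else -1

# XOR in the +/-1 encoding: label is -1 when the inputs agree, +1 otherwise.
xor_data = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

def phi(x1: int, x2: int) -> tuple[int, int, int]:
    """Expand (x1, x2) with the degree-2 product feature x1*x2."""
    return (x1, x2, x1 * x2)

theta = (0.0, 0.0, -1.0)   # weight only on the product expert
preds = [sign(sum(t * f for t, f in zip(theta, phi(*x)))) for x, _ in xor_data]
print(preds)               # -> [-1, 1, 1, -1], matching the XOR labels
```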


Model selection cont’d

[Figure: four panels showing decision boundaries fit to the same data: linear, 2nd order polynomial, 4th order polynomial, 8th order polynomial]


Types of learning problems (not exhaustive)

• Supervised learning: explicit feedback in the form of examples and target labels

– goal to make predictions based on examples (classify them, predict prices, etc.)

• Unsupervised learning: only examples, no explicit feedback

– goal to reveal structure in the observed data

• Semi-supervised learning: limited explicit feedback, mostly only examples

– tries to improve predictions based on examples by making use of the additional “unlabeled” examples

• Reinforcement learning: delayed and partial feedback, no explicit guidance

– goal to minimize the cost of a sequence of actions (policy)
