Machine Learning Lecture


Upload: eric-larson

Posted on 25-Dec-2014


DESCRIPTION

A lecture I gave for CSE/EE599 on the basics of machine learning and different toolkits.

TRANSCRIPT

Page 1: Machine Learning Lecture

Machine Learning

Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set, and the “capacity” of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before; a machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree. Neither can generalize well. The exploration and formalization of these concepts has resulted in one of the shining peaks of the theory of statistical learning.

(Vapnik, 1979)

Page 2: Machine Learning Lecture

What is machine learning?

Data (examples) → Model (training) → Output (predictions, classifications, clusters, ordinals)

Why: face recognition?

Page 3: Machine Learning Lecture

Categories of problems

By output: clustering, classification, regression, ordinal regression, prediction

By input: vector x, or time series x(t)

Page 4: Machine Learning Lecture

One size never fits all…

• Improving an algorithm:

– First option: better features

• Visualize classes

• Trends

• Histograms

– Next: make the algorithm smarter (more complicated)

• Interaction of features

• Better objective and training criteria

Tools: WEKA or GGobi

Page 5: Machine Learning Lecture

Categories of ML algorithms

By model: non-parametric (keeps the raw data only; e.g., kernel methods) vs. parametric (keeps model parameters only)

By training: supervised (labeled data) vs. unsupervised (unlabeled data)

[Figure: example regression fits of output vs. input for y = 1 + 0.5t + 4t^2 - t^3]

Page 6: Machine Learning Lecture

[Figures: example plots of output vs. input and a histogram]

Page 7: Machine Learning Lecture

Training a ML algorithm

• Choose data

• Optimize model parameters according to:

– Objective function

[Figure: regression example (mean square error objective) and two-class classification example (max margin objective)]
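As a concrete, hedged illustration of the mean-square-error objective from the regression example above (the curve is the one used elsewhere in these slides; the noise level and sample count are assumptions), a minimal numpy sketch:

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-4, 6, 100)
y = 1 + 0.5 * t + 4 * t**2 - t**3 + rng.normal(scale=5.0, size=t.shape)  # noisy samples of the example curve

# Design matrix [1, t, t^2, t^3]; least squares minimizes the mean square error.
X = np.vander(t, N=4, increasing=True)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", w)               # close to [1, 0.5, 4, -1]
print("training MSE:", np.mean((X @ w - y) ** 2))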

Page 8: Machine Learning Lecture

Pitfalls of ML algorithms

• Clean your features:

– Training volume: more is better

– Outliers: remove them!

– Dynamic range: normalize it!

• Generalization

– Overfitting

– Underfitting

• Speed: parametric vs. non

• What are you learning? …features, features, features…

Page 9: Machine Learning Lecture

Outliers

[Figures: regression examples of output vs. input illustrating the effect of outlier points]

Keep a “good” percentile range! 5-95 or 1-99: depends on your data.
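A minimal numpy sketch of keeping a “good” percentile range; the specific percentiles are whatever suits your data, as the slide says:

import numpy as np

def clip_to_percentiles(x, lo=5, hi=95):
    """Clip a feature to its [lo, hi] percentile range (e.g., 5-95 or 1-99)."""
    low, high = np.percentile(x, [lo, hi])
    return np.clip(x, low, high)

x = np.concatenate([np.random.normal(size=1000), [50.0, -40.0]])  # a few gross outliers
x_clipped = clip_to_percentiles(x, lo=1, hi=99)

Dropping the offending rows instead of clipping them is the other common choice.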

Page 10: Machine Learning Lecture

Dynamic range

[Figures: scatter plots of feature f2 vs. f1 for two classes, shown at different dynamic ranges]
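A minimal numpy sketch of putting features on a common dynamic range (z-scoring each column is one standard choice; min-max scaling to [0, 1] is another):

import numpy as np

def zscore(X):
    """Standardize each column to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12   # guard against constant features
    return (X - mu) / sigma

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1000, 500),   # f1: large range
                     rng.uniform(0, 1, 500)])     # f2: small range
X_norm = zscore(X)                                # f1 and f2 now on comparable scales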

Page 11: Machine Learning Lecture

Overfitting and comparing algorithms

• Early stopping

• Regularization

• Validation sets
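A minimal numpy sketch of the regularization-plus-validation-set recipe (the polynomial degree, candidate lambda values, and split sizes are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-4, 6, 60)
y = 1 + 0.5 * t + 4 * t**2 - t**3 + rng.normal(scale=5.0, size=t.shape)

X = np.vander(t, N=10, increasing=True)   # deliberately over-flexible model
idx = rng.permutation(len(t))
train, val = idx[:40], idx[40:]           # held-out validation set

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: the penalty lam shrinks the weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Pick the regularization strength with the smallest validation error.
best_mse, best_lam = min(
    (np.mean((X[val] @ ridge_fit(X[train], y[train], lam) - y[val]) ** 2), lam)
    for lam in [1e-3, 1e-2, 0.1, 1.0, 10.0])
print("best lambda:", best_lam, "validation MSE:", best_mse)

Early stopping follows the same idea: monitor validation error during iterative training and stop when it starts to rise.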

Page 12: Machine Learning Lecture

Underfitting

Curse of dimensionality

Page 13: Machine Learning Lecture

Underfitting

Curse of dimensionality

Page 14: Machine Learning Lecture

K-Means clustering

• Planar decision boundaries (depending on the space you are in)

• Highly efficient

• Not always great (but usually pretty good)

• Needs good starting criteria (initialization)
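A minimal sketch using scikit-learn (an assumed toolkit choice, not one named in the lecture); k-means++ initialization addresses the “needs good starting criteria” point:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
               rng.normal(loc=4.0, size=(100, 2))])   # two synthetic blobs

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)
labels = km.labels_             # cluster assignment for each point
centers = km.cluster_centers_   # the planar (Voronoi) boundaries sit between these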

Page 15: Machine Learning Lecture

K-Nearest Neighbor

• Arbitrary decision boundaries

• Not so efficient (every prediction scans the stored training data)

• With enough data in each class, approaches the optimal error rate

• Easy to train; known as a lazy classifier
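A minimal numpy sketch of the “lazy classifier” idea: training is just storing the data, and prediction does all the work:

import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """y_train is assumed to hold non-negative integer class labels."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

The per-query scan over all training points is why the slide calls it not so efficient.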

Page 16: Machine Learning Lecture

Mixture of Gaussians

• Arbitrary decision boundaries, with enough components

• Efficient, depending on the number of models and Gaussians

• Can represent more than just Gaussian distributions

• Generative; sometimes tough to train

• Spurious singularities

• Can get a distribution for a specific class and feature(s), and so get a Bayesian classifier
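A minimal sketch with scikit-learn's GaussianMixture (an assumed toolkit choice): fit one mixture per class and compare class-conditional likelihoods, which is the Bayesian-classifier use the slide mentions (equal class priors assumed here):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, size=(200, 2))   # training data for class 0
X1 = rng.normal(loc=3.0, size=(200, 2))   # training data for class 1

gmm0 = GaussianMixture(n_components=2, random_state=0).fit(X0)
gmm1 = GaussianMixture(n_components=2, random_state=0).fit(X1)

X_test = np.array([[0.5, 0.5], [3.2, 2.8]])
# score_samples returns per-point log-likelihoods; pick the likelier class.
pred = (gmm1.score_samples(X_test) > gmm0.score_samples(X_test)).astype(int)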

Page 17: Machine Learning Lecture

Components Analysis (principal or independent)

• Reduces dimensionality

• All other classifiers then work in a rotated space

• Remember eigenvalues and eigenvectors?
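A minimal numpy sketch of principal components analysis via the eigenvectors of the covariance matrix, echoing the slide's eigenvalue/eigenvector reminder:

import numpy as np

def pca_transform(X, n_components=2):
    Xc = X - X.mean(axis=0)                   # center the data
    cov = np.cov(Xc, rowvar=False)            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]         # largest-variance directions first
    W = eigvecs[:, order[:n_components]]
    return Xc @ W                             # data in the rotated, reduced space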

Page 18: Machine Learning Lecture

Tree Classifiers

• Arbitrary decision boundaries

• Can be quite efficient (or not!)

• Needs good criteria for splitting

• Easy to visualize
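A minimal sketch with scikit-learn (an assumed toolkit choice); the splitting criterion ("gini" here, entropy being the usual alternative) is the "good criteria for splitting" the slide asks for, and export_text shows why trees are easy to visualize:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))   # prints the learned if/else splits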

Page 19: Machine Learning Lecture

Multi-Layer Perceptron

• Arbitrary decision boundaries (built up from layers of linear units plus nonlinearities)

• Can be quite efficient (or not!)

• What did it learn?
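A minimal sketch with scikit-learn's MLPClassifier (an assumed toolkit choice; the layer sizes and dataset are illustrative assumptions):

from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)
print("training accuracy:", mlp.score(X, y))
# "What did it learn?" -- the weights live in mlp.coefs_, but they are hard to read.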

Page 20: Machine Learning Lecture

Support Vector Machines

• Arbitrary decision boundaries

• Efficiency depends on the number of support vectors and the feature size
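A minimal sketch with scikit-learn's SVC (an assumed toolkit choice); prediction cost grows with the number of support vectors and the feature dimension, which is the efficiency point above:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("support vectors per class:", svm.n_support_)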

Page 21: Machine Learning Lecture

Hidden Markov Models

• Arbitrary decision boundaries

• Efficiency depends on the state space and the number of models

• Generalizes to incorporate features that change over time
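A minimal numpy sketch of the HMM forward algorithm (the prior, transition, and emission values below are made up for illustration); the sequence likelihood it computes is what you compare across per-class models when classifying time series:

import numpy as np

def sequence_likelihood(pi, A, B, obs):
    """pi: initial state probs (S,), A: state transitions (S, S),
    B: emission probs (S, V), obs: list of observed symbol indices."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate through transitions, weight by emission
    return alpha.sum()

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(sequence_likelihood(pi, A, B, obs=[0, 1, 0]))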

Page 22: Machine Learning Lecture

More sophisticated approaches

• Graphical models (like an HMM)

– Bayesian networks

– Markov random fields

• Boosting

– AdaBoost

• Voting

• Cascading

• Stacking…
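A minimal sketch of AdaBoost with scikit-learn (an assumed toolkit choice); its default base learner is a depth-1 decision tree (a stump), and boosting combines many of them into a stronger classifier:

from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", ada.score(X, y))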