Machine Learning Lecture


DESCRIPTION

A lecture I gave for CSE/EE599 on the basics of machine learning and different toolkits.

TRANSCRIPT

Machine Learning

Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set, and the “capacity” of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before; a machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree. Neither can generalize well. The exploration and formalization of these concepts has resulted in one of the shining peaks of the theory of statistical learning.

(Vapnik, 1979)

What is machine learning?

• Data: examples
• Model: training
• Output: predictions, classifications, clusters, ordinals

Why: Face Recognition?

Categories of problems

By output: clustering, classification, regression, ordinal regression, prediction
By input: vector x, time series x(t)

One size never fits all…

• Improving an algorithm:

– First option: better features

• Visualize classes

• Trends

• Histograms

– Next: make the algorithm smarter (more complicated)

• Interaction of features

• Better objective and training criteria

Tools: WEKA or GGobi

[Plots: example data generated from y = 1 + 0.5t + 4t^2 - t^3; axes: input vs. output]

Categories of ML algorithms

By model: non-parametric (keeps the raw data only) vs. parametric (keeps model parameters only)
By training: supervised (labeled) vs. unsupervised (unlabeled)


Kernel methods

[Plots: example fits to the input/output data]

Training an ML algorithm

• Choose data

• Optimize model parameters according to:

– Objective function

[Plots: a regression fit trained with a mean square error objective, and a two-class classification boundary trained with a max margin objective]
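To make the "optimize model parameters against an objective function" step concrete, here is a minimal sketch, assuming NumPy, that fits a cubic model to noisy samples of the example curve y = 1 + 0.5t + 4t^2 - t^3 by minimizing mean square error in closed form; the data and noise level are illustrative assumptions, not the lecture's.

import numpy as np

# Noisy samples of the example curve y = 1 + 0.5t + 4t^2 - t^3
rng = np.random.default_rng(0)
t = np.linspace(-4, 6, 50)
y = 1 + 0.5 * t + 4 * t**2 - t**3 + rng.normal(scale=3.0, size=t.shape)

# Design matrix for a cubic model; least squares minimizes the
# mean-square-error objective ||X w - y||^2 in closed form.
X = np.vander(t, N=4, increasing=True)          # columns: 1, t, t^2, t^3
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients:", w)             # should land near [1, 0.5, 4, -1]
print("training MSE:", np.mean((X @ w - y) ** 2))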

Pitfalls of ML algorithms

• Clean your features:
– Training volume: more is better
– Outliers: remove them!
– Dynamic range: normalize it!
• Generalization
– Overfitting
– Underfitting
• Speed: parametric vs. non-parametric
• What are you learning? …features, features, features…

Outliers

[Plots: regression fits with and without an outlier; axes: input vs. output]

Keep a “good” percentile range! 5–95 or 1–99: depends on your data.
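One way to act on the percentile-range advice is to clip each feature to a chosen percentile range before training. A minimal sketch assuming NumPy; the clip_to_percentiles helper and the 1–99 / 5–95 choices are illustrative, not prescribed by the lecture.

import numpy as np

def clip_to_percentiles(x, lo=1.0, hi=99.0):
    """Clip a 1-D feature to its [lo, hi] percentile range to tame outliers."""
    low, high = np.percentile(x, [lo, hi])
    return np.clip(x, low, high)

# Example: one wild outlier gets pulled back toward the bulk of the data
x = np.array([0.1, 0.3, 0.2, 0.4, 0.25, 50.0])
print(clip_to_percentiles(x, lo=5, hi=95))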

Dynamic range

[Plots: two-class data in features f1 vs. f2 shown at several different scales]
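A minimal sketch, assuming NumPy, of normalizing dynamic range so that features like f1 and f2 become comparable; whether to z-score or min-max scale is a design choice, and these helpers are illustrative rather than what the lecture used.

import numpy as np

def zscore(X):
    """Standardize each feature column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def minmax(X):
    """Rescale each feature column to the [0, 1] range."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

# Two features on wildly different scales (f1 roughly in [0, 1], f2 in [0, 400])
X = np.column_stack([np.random.rand(100), 400 * np.random.rand(100)])
print(zscore(X).std(axis=0))   # ~[1, 1]
print(minmax(X).max(axis=0))   # [1, 1]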

Overfitting and comparing algorithms

• Early stopping
• Regularization
• Validation sets (see the sketch below)
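A hedged sketch of using a validation set to compare model capacities: polynomial degree stands in for capacity here, and the held-out split, not training error, picks the degree. The data, noise, and split sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-4, 6, 60)
y = 1 + 0.5 * t + 4 * t**2 - t**3 + rng.normal(scale=5.0, size=t.shape)

# Hold out a third of the data as a validation set
idx = rng.permutation(t.size)
train, val = idx[:40], idx[40:]

def fit_and_score(degree):
    """Fit a polynomial of the given degree on the training split, score on validation."""
    coeffs = np.polyfit(t[train], y[train], degree)
    return np.mean((np.polyval(coeffs, t[val]) - y[val]) ** 2)

for d in range(1, 8):
    print(d, fit_and_score(d))   # both under-fit and over-fit degrees score worse than degree 3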

Underfitting

Curse of dimensionality

K-Means clustering

• Planar decision boundaries, depending on the space you are in…
• Highly efficient
• Not always great (but usually pretty good)
• Needs good starting criteria
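A minimal NumPy sketch of the K-Means loop (assign each point to its nearest centroid, then recompute the means); the random initialization is why good starting criteria matter. The kmeans helper is illustrative, not the lecture's code, and has no empty-cluster handling.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random init from the data
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (planar boundaries between centroids)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
print(centroids)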

K-Nearest Neighbor

• Arbitrary decision boundaries
• Not so efficient…
• With enough data in each class… optimal
• Easy to train; known as a lazy classifier
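A minimal sketch of k-nearest-neighbor classification, assuming NumPy: training just stores the data (hence "lazy classifier"), while prediction pays the cost of comparing against every stored example. The knn_predict helper is illustrative.

import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)    # distance to every stored example
        nearest = y_train[np.argsort(dists)[:k]]        # labels of the k closest
        preds.append(np.bincount(nearest).argmax())     # majority vote
    return np.array(preds)

X_train = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4])
y_train = np.array([0] * 30 + [1] * 30)
print(knn_predict(X_train, y_train, np.array([[0.0, 0.0], [4.0, 4.0]])))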

Mixture of Gaussians

• Arbitrary decision boundaries, given enough Gaussians
• Efficient, depending on the number of models and Gaussians
• Can represent more than just Gaussian distributions
• Generative; sometimes tough to train up
• Spurious singularities
• Can get a distribution for a specific class and feature(s)… and get a Bayesian classifier
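As a sketch of the last point, here is a minimal Bayes classifier that fits one Gaussian per class (the single-Gaussian special case of a mixture; a full mixture would be trained with EM, which is not shown) and classifies by the larger class-conditional log density, assuming equal priors. NumPy is assumed and the helpers are illustrative.

import numpy as np

def fit_gaussian(X):
    """Mean and covariance of one class (a single-Gaussian 'mixture')."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_density(x, mean, cov):
    """Log of the multivariate Gaussian density at x."""
    d = x - mean
    return -0.5 * (d @ np.linalg.inv(cov) @ d
                   + np.log(np.linalg.det(cov))
                   + len(x) * np.log(2 * np.pi))

# One Gaussian per class; classify by the larger log density (equal priors assumed)
X0 = np.random.randn(100, 2)
X1 = np.random.randn(100, 2) + 3
params = [fit_gaussian(X0), fit_gaussian(X1)]

x = np.array([2.5, 2.5])
scores = [log_density(x, m, c) for m, c in params]
print("predicted class:", int(np.argmax(scores)))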

Components Analysis (principal or independent)

• Reduces dimensionality
• All other classifiers then work in a rotated space
• Remember eigenvalues and eigenvectors?
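A minimal PCA sketch via the eigenvectors of the covariance matrix (the "remember eigenvalues and eigenvectors?" point); downstream classifiers then work in this rotated, reduced space. NumPy is assumed and the pca helper is illustrative.

import numpy as np

def pca(X, n_components):
    """Project X onto the top principal components of its covariance matrix."""
    Xc = X - X.mean(axis=0)                       # center the data
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                        # data in the rotated, reduced space

X = np.random.randn(200, 5) @ np.random.randn(5, 5)   # correlated 5-D data
print(pca(X, n_components=2).shape)                    # (200, 2)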

Tree Classifiers

• Arbitrary decision boundaries
• Can be quite efficient (or not!)
• Needs good criteria for splitting
• Easy to visualize
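A sketch of "good criteria for splitting": a single threshold (a decision stump) chosen by minimizing weighted Gini impurity; a full tree applies this recursively. NumPy is assumed and the helpers are illustrative.

import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)

def best_split(x, y):
    """Best threshold on a single feature by weighted Gini impurity of the two halves."""
    best_t, best_score = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.85])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))   # a threshold near 0.4 separates the classes cleanly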

Multi-Layer Perceptron

• Arbitrary (but linear) decision boundaries
• Can be quite efficient (or not!)
• What did it learn?

Support Vector Machines

• Arbitrary decision boundaries
• Efficiency depends on the number of support vectors and the feature size
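A hedged sketch using scikit-learn's SVC (an assumed dependency, not necessarily one of the toolkits from the lecture) to show that prediction cost tracks the number of support vectors.

import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 2])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
# Prediction cost scales with the number of support vectors times the feature size
print("support vectors per class:", clf.n_support_)
print("prediction for a new point:", clf.predict([[1.0, 1.0]]))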

Hidden Markov Models

• Arbitrary decision boundaries
• Efficiency depends on the state space and the number of models
• Generalizes to incorporate features that change over time

More sophisticated approaches

• Graphical models (like an HMM)
– Bayesian networks
– Markov random fields
• Boosting
– AdaBoost
• Voting (see the sketch after this list)
• Cascading
• Stacking…
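As a minimal illustration of voting, here is a sketch that combines three different base classifiers by majority vote, assuming scikit-learn as an illustrative dependency; boosting, cascading, and stacking combine models in more structured ways than this simple vote.

import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 3])
y = np.array([0] * 100 + [1] * 100)

# Majority vote over three different base classifiers
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
    ],
    voting="hard",
).fit(X, y)
print(vote.predict([[0.0, 0.0], [3.0, 3.0]]))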
