
Page 1:

Machine Learning Overview

Adapted from Sargur N. Srihari
University at Buffalo, State University of New York, USA

Page 2:

Outline

1. What is Machine Learning (ML)?
2. Types of Information Processing Problems Solved
   1. Regression
   2. Classification
   3. Clustering
   4. Modeling Uncertainty/Inference
3. New Developments
   1. Fully Bayesian Approach
4. Summary

Page 3:

What is Machine Learning?

• Programming computers to:
  – perform tasks that humans perform well but that are difficult to specify algorithmically
• A principled way of building high-performance information processing systems:
  – search engines, information retrieval
  – adaptive user interfaces, personalized assistants (information systems)
  – scientific applications (computational science)
  – engineering

Page 4:

Example Problem: Handwritten Digit Recognition

• Handcrafted rules will result in a large number of rules and exceptions
• Better to have a machine that learns from a large training set

Wide variability of the same numeral

Page 5:

ML History

• ML has its origins in Computer Science
• PR (Pattern Recognition) has its origins in Engineering
• They are different facets of the same field
• The methods have been around for over 50 years
• Revival of Bayesian methods
  – due to the availability of computational methods

Page 6:

Some Successful Applications of Machine Learning

• Learning to recognize spoken words
  – Speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
  – Neural networks and methods for learning HMMs for customizing to individual speakers, vocabularies and microphone characteristics

Table 1.1

Page 7:

Some Successful Applications of Machine Learning

• Learning to drive an autonomous vehicle
  – Train computer-controlled vehicles to steer correctly
  – Drive at 70 mph for 90 miles on public highways
  – Associate steering commands with image sequences

Page 8:

Handwriting Recognition

• Task T
  – recognizing and classifying handwritten words within images
• Performance measure P
  – percent of words correctly classified
• Training experience E
  – a database of handwritten words with given classifications
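The performance measure P above is just classification accuracy, which can be sketched in a few lines (the words and predictions below are hypothetical, not from the slide's database):

```python
def accuracy(predicted, actual):
    """Performance measure P: percent of items correctly classified."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical recognizer output vs. ground-truth word labels
predicted = ["the", "cat", "sat", "mat"]
actual    = ["the", "cat", "sat", "hat"]
print(accuracy(predicted, actual))  # 75.0
```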

Page 9:

The ML Approach (training, then generalization)

• Data Collection: samples
• Model Selection: probability distribution to model the process
• Parameter Estimation: values/distributions
• Search: find the optimal solution to the problem
• Decision: inference or testing

Page 10:

Types of Problems Machine Learning Is Used For

1. Classification

2. Regression

3. Clustering (Data Mining)

4. Modeling/Inference

Page 11:

Classification

• Three classes (Stratified, Annular, Homogeneous)
• Two variables shown
• 100 points

Which class should x belong to?

Page 12:

Cell-based Classification

• The naïve approach of cell-based voting will fail
  – exponential growth of the number of cells with dimensionality
  – 12 dimensions discretized into 6 gives 3 million cells
• Hardly any points fall in each cell

Page 13:

Popular Statistical Models

• Generative
  – Naïve Bayes
  – Mixtures of multinomials
  – Mixtures of Gaussians
  – Hidden Markov Models (HMM)
  – Bayesian networks
  – Markov random fields
• Discriminative
  – Logistic regression
  – SVMs
  – Traditional neural networks
  – Nearest neighbor
  – Conditional Random Fields (CRF)

Page 14:

Regression Problems

• Forward problem data set: the red curve is the result of fitting a two-layer neural network by minimizing sum-of-squared error
• Corresponding inverse problem, obtained by reversing x and t: a very poor fit to the data; GMMs are used here

Page 15:

Clustering

• A single Gaussian has limitations in modeling real data sets; Gaussian Mixture Models give very complex densities
• Old Faithful (hydrothermal geyser in Yellowstone)
  – 272 observations
  – duration (mins, horizontal axis) vs time to next eruption (vertical axis)
  – a simple Gaussian is unable to capture the structure
  – a linear superposition of two Gaussians is better
• One dimension: three Gaussians in blue, their sum in red
• The πk are mixing coefficients that sum to one
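The linear superposition described above, p(x) = Σk πk N(x | μk, σk²), can be sketched directly; the three component parameters below are made up, not the Old Faithful fit:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Three hypothetical components as (pi_k, mu_k, sigma_k);
# the mixing coefficients pi_k sum to one
components = [(0.5, -2.0, 0.6), (0.3, 0.0, 1.0), (0.2, 3.0, 0.8)]

def mixture_pdf(x):
    # Linear superposition of Gaussians
    return sum(pi * normal_pdf(x, mu, s) for pi, mu, s in components)

# Because the pi_k sum to one, the mixture is itself a valid density:
# it integrates (numerically) to ~1
dx = 0.01
total = sum(mixture_pdf(-10 + i * dx) * dx for i in range(int(20 / dx)))
print(round(total, 3))  # ~1.0
```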

Page 16:

Bayesian Representation of Uncertainty

• The use of probability to represent uncertainty is not an ad-hoc choice
• If numerical values represent degrees of belief, then simple axioms for manipulating degrees of belief lead to the sum and product rules of probability
• Probability is not just the frequency of a random, repeatable event; it is a quantification of uncertainty
• Example: whether the Arctic ice cap will disappear by the end of the century
  – We have some idea of how quickly polar ice is melting
  – We revise it on the basis of fresh evidence (satellite observations)
  – The assessment will affect actions we take (to reduce greenhouse gases)
• All of this is handled by the general Bayesian interpretation

Page 17:

Modeling Uncertainty

• A causal Bayesian network
• Example of inference: Cancer is independent of Age and Gender given exposure to Toxics and Smoking
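The stated conditional independence can be checked numerically on a toy network with the same structure, using exhaustive enumeration of the joint distribution. Every probability below is a hypothetical placeholder, not data from the slide:

```python
# Hypothetical CPTs for a toy causal network with structure
#   Age -> Toxics, (Age, Gender) -> Smoking, (Toxics, Smoking) -> Cancer
p_age = {"old": 0.3, "young": 0.7}
p_gender = {"m": 0.5, "f": 0.5}
p_toxics = {"old": 0.4, "young": 0.1}            # P(toxics=True | age)
p_smoking = {("old", "m"): 0.4, ("old", "f"): 0.3,
             ("young", "m"): 0.25, ("young", "f"): 0.2}
p_cancer = {(True, True): 0.5, (True, False): 0.2,
            (False, True): 0.15, (False, False): 0.01}

def joint(age, gender, toxics, smoking, cancer):
    # The BN factorization: product of each node's CPT given its parents
    p = p_age[age] * p_gender[gender]
    p *= p_toxics[age] if toxics else 1 - p_toxics[age]
    p *= p_smoking[(age, gender)] if smoking else 1 - p_smoking[(age, gender)]
    p *= p_cancer[(toxics, smoking)] if cancer else 1 - p_cancer[(toxics, smoking)]
    return p

def p_cancer_given(age, toxics, smoking):
    # P(Cancer=True | age, toxics, smoking) by summing out Gender
    num = sum(joint(age, g, toxics, smoking, True) for g in p_gender)
    den = sum(joint(age, g, toxics, smoking, c)
              for g in p_gender for c in (True, False))
    return num / den

# Given Toxics and Smoking, Age carries no further information about Cancer
print(round(p_cancer_given("old", True, True), 6))    # 0.5
print(round(p_cancer_given("young", True, True), 6))  # 0.5
```

The independence holds by construction: Cancer's CPT depends only on its parents Toxics and Smoking, so conditioning on them screens off Age and Gender.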

Page 18:

The Fully Bayesian Approach

• Bayes Rule
• Bayesian Probabilities
• Concept of Conjugacy
• Monte Carlo Sampling

Page 19:

Rules of Probability

• Given random variables X and Y
• Sum Rule gives the marginal probability:
  p(X) = Σ_Y p(X, Y)
• Product Rule gives the joint probability in terms of a conditional and a marginal:
  p(X, Y) = p(Y | X) p(X)
• Combining the two gives Bayes Rule:
  p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
• Viewed as: posterior ∝ likelihood × prior
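Bayes Rule is easy to exercise on a small numeric instance; the diagnostic-test numbers below are hypothetical:

```python
# Bayes Rule: p(Y | X) = p(X | Y) p(Y) / p(X),
# where p(X) = sum over Y of p(X | Y) p(Y)  (sum rule applied to the product rule)
prior = {"disease": 0.01, "healthy": 0.99}          # p(Y), hypothetical
likelihood = {"disease": 0.95, "healthy": 0.05}     # p(positive test | Y), hypothetical

evidence = sum(likelihood[y] * prior[y] for y in prior)   # p(positive test)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

print(round(posterior["disease"], 3))  # 0.161
```

Even a 95%-accurate test leaves a low posterior here because the prior is so small; this is the posterior ∝ likelihood × prior view in action.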

Page 20:

Probability Distributions: Relationships

• Discrete, binary:
  – Bernoulli: single binary variable
  – Binomial: N samples of a Bernoulli (N = 1 gives the Bernoulli)
  – Conjugate prior: Beta, a continuous variable on [0, 1]
• Discrete, multi-valued:
  – Multinomial: one of K values = K-dimensional binary vector (K = 2 gives the Binomial)
  – Conjugate prior: Dirichlet, K random variables on [0, 1]
• Continuous:
  – Gaussian (the Binomial approaches it for large N)
  – Student's-t: generalization of the Gaussian robust to outliers; an infinite mixture of Gaussians
  – Gamma: conjugate prior of the univariate Gaussian precision
  – Exponential: special case of the Gamma
  – Wishart: conjugate prior of the multivariate Gaussian precision matrix
  – Gaussian-Gamma: conjugate prior of the univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of the multivariate Gaussian with unknown mean and precision matrix
  – Uniform
• Angular: Von Mises
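The Beta–Binomial pair in the chart is the simplest case of conjugacy: a Beta prior updated with Binomial data stays a Beta, so the update is pure bookkeeping. A sketch with hypothetical counts:

```python
# A Beta(a, b) prior on a coin's heads probability; observing k heads in
# n flips (a Binomial likelihood) gives the posterior Beta(a + k, b + n - k).
a, b = 2.0, 2.0        # hypothetical prior pseudo-counts
k, n = 7, 10           # hypothetical data: 7 heads in 10 flips

a_post, b_post = a + k, b + (n - k)          # conjugate update
posterior_mean = a_post / (a_post + b_post)  # mean of a Beta distribution
print(a_post, b_post, round(posterior_mean, 3))  # 9.0 5.0 0.643
```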

Page 21:

Fully Bayesian Approach

• Feasible with increased computational power
• Intractable posterior distributions are handled using either
  – variational Bayes, or
  – stochastic sampling (e.g., Markov Chain Monte Carlo, Gibbs sampling)
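A minimal stochastic-sampling sketch: a Metropolis sampler (the simplest MCMC scheme) targeting a standard normal as a toy stand-in for an intractable posterior. The proposal width and sample counts are arbitrary choices:

```python
import math
import random

def log_target(x):
    # Log of the unnormalized N(0, 1) density; only ratios of the target
    # are needed, so the normalizing constant can be ignored
    return -0.5 * x * x

random.seed(0)
x, samples = 0.0, []
for _ in range(20000):
    proposal = x + random.uniform(-1.0, 1.0)
    # Accept with probability min(1, target(proposal) / target(x))
    if random.random() < math.exp(min(0.0, log_target(proposal) - log_target(x))):
        x = proposal
    samples.append(x)

burned = samples[5000:]  # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(abs(mean) < 0.3, 0.6 < var < 1.4)  # True True
```

Gibbs sampling replaces the accept/reject step with exact draws from each variable's full conditional, which is possible when those conditionals are tractable.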

Page 22:

Summary

• ML is a systematic approach, instead of ad-hockery, to designing information processing systems
• Useful for solving problems of classification, regression, clustering and modeling
• The fully Bayesian approach, together with Monte Carlo sampling, is leading to high-performing systems
• Great practical value in many applications
  – Computer science
    • search engines
    • text processing
    • document recognition
  – Engineering