Machine Learning Overview
Adapted from Sargur N. Srihari, University at Buffalo, State University of New York, USA
Outline
1. What is Machine Learning (ML)?
2. Types of Information Processing Problems Solved
   1. Regression
   2. Classification
   3. Clustering
   4. Modeling Uncertainty/Inference
3. New Developments
   1. Fully Bayesian Approach
4. Summary
What is Machine Learning?
• Programming computers to:
– perform tasks that humans perform well but that are difficult to specify algorithmically
• A principled way of building high-performance information processing systems
– search engines, information retrieval
– adaptive user interfaces, personalized assistants (information systems)
– scientific applications (computational science)
– engineering
Example Problem: Handwritten Digit Recognition
• Handcrafted rules will result in a large number of rules and exceptions
• Better to have a machine that learns from a large training set
• Wide variability of the same numeral
ML History
• ML has origins in Computer Science
• PR (pattern recognition) has origins in Engineering
• They are different facets of the same field
• Methods have been around for over 50 years
• Revival of Bayesian methods
– due to availability of computational methods
Some Successful Applications of Machine Learning
• Learning to recognize spoken words
– speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
– neural networks and methods for learning HMMs for customizing to individual speakers, vocabularies and microphone characteristics (Table 1.1)
Some Successful Applications of Machine Learning
• Learning to drive an autonomous vehicle
– train computer-controlled vehicles to steer correctly
– drive at 70 mph for 90 miles on public highways
– associate steering commands with image sequences
Handwriting Recognition
• Task T
– recognizing and classifying handwritten words within images
• Performance measure P
– percent of words correctly classified
• Training experience E
– a database of handwritten words with given classifications
The ML Approach
• Data Collection – samples
• Model Selection – probability distribution to model the process
• Parameter Estimation – values/distributions
• Search – find the optimal solution to the problem
• Decision – inference (or testing)
(Training covers the steps through parameter estimation; generalization is exercised at decision time.)
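The pipeline above can be sketched end-to-end on a toy problem. This is a hypothetical coin-flip example: the Bernoulli model and its maximum-likelihood estimate stand in for the model-selection and parameter-estimation steps, and are not from the slides.

```python
import random

# 1. Data Collection: samples (simulated flips from an unknown process)
random.seed(0)
true_p = 0.7
data = [1 if random.random() < true_p else 0 for _ in range(1000)]

# 2. Model Selection: assume a Bernoulli(p) distribution models the process
# 3. Parameter Estimation: the maximum-likelihood estimate of p is the sample mean
p_hat = sum(data) / len(data)

# 4. Search: here the optimum has a closed form, so no iterative search is needed
# 5. Decision (Inference/Testing): predict the more probable outcome for a new flip
prediction = 1 if p_hat >= 0.5 else 0
print(round(p_hat, 2), prediction)
```

With the seed fixed, the estimate lands close to the true value 0.7, illustrating generalization from training samples to a decision rule.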
Types of Problems Machine Learning Is Used For
1. Classification
2. Regression
3. Clustering (Data Mining)
4. Modeling/Inference
Classification
• Three classes (Stratified, Annular, Homogeneous)
• Two variables shown
• 100 points
• Which class should x belong to?
Cell-based Classification
• Naïve approach of cell-based voting will fail
– exponential growth of cells with dimensionality
– 12 dimensions discretized into 6 bins each gives 6^12 ≈ 2 × 10^9 cells
• Hardly any points in each cell
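The exponential blow-up is easy to check directly: with b bins per dimension and D dimensions, the grid has b**D cells (the bin counts below are illustrative):

```python
def n_cells(bins: int, dims: int) -> int:
    """Number of cells when each of `dims` dimensions is split into `bins` intervals."""
    return bins ** dims

for d in (1, 3, 6, 12):
    print(d, n_cells(6, d))

# With 6 bins, 12 dimensions already give over 2 billion cells --
# far more cells than points in any realistic training set.
```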
Popular Statistical Models
• Generative
– Naïve Bayes
– Mixtures of multinomials
– Mixtures of Gaussians
– Hidden Markov Models (HMM)
– Bayesian networks
– Markov random fields
• Discriminative
– Logistic regression
– SVMs
– Traditional neural networks
– Nearest neighbor
– Conditional Random Fields (CRF)
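As a concrete instance of the generative column, a minimal from-scratch Gaussian naïve Bayes can be sketched as follows: it models a per-class prior and Gaussian likelihood, then classifies via Bayes rule. The one-feature, two-class data are invented for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(xs, ys):
    """Estimate per-class priors, means and variances (a generative model)."""
    by_class = defaultdict(list)
    for x, y in zip(xs, ys):
        by_class[y].append(x)
    params = {}
    for y, pts in by_class.items():
        mean = sum(pts) / len(pts)
        var = sum((p - mean) ** 2 for p in pts) / len(pts)
        params[y] = (len(pts) / len(xs), mean, var)
    return params

def predict(params, x):
    """Pick the class maximizing log prior + log Gaussian likelihood (unnormalized Bayes rule)."""
    def score(y):
        prior, mean, var = params[y]
        return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return max(params, key=score)

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
ys = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_nb(xs, ys)
print(predict(params, 1.1), predict(params, 5.1))  # → 0 1
```

A discriminative model such as logistic regression would instead model p(class | x) directly, without a density over x.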
Regression Problems
• Forward problem data set: the red curve is the result of fitting a two-layer neural network by minimizing sum-of-squared error
• The corresponding inverse problem is obtained by reversing x and t: the same fit is very poor, so Gaussian Mixture Models (GMMs) are used here
• The π_k are mixing coefficients that sum to one
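"Fitting by minimizing sum-of-squared error" can be illustrated with the simplest possible model, a straight line, where the minimizer has a closed form (ordinary least squares stands in for the slide's two-layer network; the toy data are invented):

```python
def least_squares_line(xs, ys):
    """Minimize sum((y - (w*x + b))^2) over w and b, in closed form."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
w, b = least_squares_line(xs, ys)
print(w, b)  # → 2.0 1.0
```

For the inverse (multi-valued) problem, no single curve minimizing squared error can fit, which is why mixture models are brought in.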
Clustering
• The Gaussian has limitations in modeling real data sets
• Gaussian Mixture Models give very complex densities
• In one dimension: three Gaussians in blue, their sum in red
• Old Faithful (hydrothermal geyser in Yellowstone)
– 272 observations
– Duration (mins, horizontal axis) vs time to next eruption (vertical axis)
– A single Gaussian is unable to capture the structure
– A linear superposition of two Gaussians is better
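A linear superposition of Gaussians, p(x) = Σ_k π_k N(x | μ_k, σ_k²) with the π_k summing to one, can be written directly. The component values below are illustrative, loosely echoing the two-cluster eruption-duration structure, and are not fitted to the Old Faithful data.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2); weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Two components, e.g. short vs long eruption durations (illustrative values)
weights, mus, sigmas = [0.35, 0.65], [2.0, 4.3], [0.3, 0.4]

# The mixture is itself a valid density: a crude Riemann sum over [0, 10]
# captures essentially all the mass and comes out close to 1
dx = 0.01
total = sum(mixture_pdf(i * dx, weights, mus, sigmas) * dx for i in range(1000))
print(round(total, 3))
```

The density is bimodal: it is high near both component means and low in between, which is exactly the structure a single Gaussian cannot represent.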
Bayesian Representation of Uncertainty
• Use of probability to represent uncertainty is not an ad-hoc choice
• If numerical values represent degrees of belief, then simple axioms for manipulating degrees of belief lead to the sum and product rules of probability

Modeling Uncertainty
• Probability is not just the frequency of a random, repeatable event
• It is a quantification of uncertainty
• Example: whether the Arctic ice cap will disappear by the end of the century
– We have some idea of how quickly polar ice is melting
– We revise it on the basis of fresh evidence (satellite observations)
– Our assessment will affect the actions we take (e.g., to reduce greenhouse gases)
• Handled by the general Bayesian interpretation
• A Causal Bayesian Network
• Example of inference: Cancer is independent of Age and Gender given exposure to Toxics and Smoking
The Fully Bayesian Approach
• Bayes Rule
• Bayesian Probabilities
• Concept of Conjugacy
• Monte Carlo Sampling
Rules of Probability
• Given random variables X and Y
• Sum Rule gives the marginal probability: p(X) = Σ_Y p(X, Y)
• Product Rule gives the joint probability in terms of conditional and marginal: p(X, Y) = p(Y | X) p(X)
• Combining them gives Bayes Rule:
p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
• Viewed as: posterior ∝ likelihood × prior
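A numeric instance of Bayes rule, using a hypothetical diagnostic-test example (the probabilities are invented for illustration):

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """p(Y|X) = p(X|Y) p(Y) / p(X), with p(X) from the sum rule over Y and not-Y."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# p(disease) = 0.01; p(positive | disease) = 0.99; p(positive | no disease) = 0.05
posterior = bayes_posterior(0.01, 0.99, 0.05)
print(round(posterior, 3))  # → 0.167
```

Even a quite accurate test leaves the disease fairly unlikely after one positive result, because the prior is low; this is the prior-times-likelihood trade-off the slide describes.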
Probability Distributions: Relationships
• Discrete, binary
– Bernoulli: single binary variable
– Binomial: N samples of a Bernoulli (N = 1 recovers the Bernoulli); approaches a Gaussian for large N
– Beta: continuous variable on [0, 1]; conjugate prior of the Bernoulli/Binomial
• Discrete, multi-valued
– Multinomial: one of K values, as a K-dimensional binary vector (K = 2 recovers the Binomial)
– Dirichlet: K random variables on [0, 1]; conjugate prior of the Multinomial
• Continuous
– Gaussian
– Gamma: conjugate prior of the univariate Gaussian precision
– Exponential: special case of the Gamma
– Student's-t: generalization of the Gaussian, robust to outliers; an infinite mixture of Gaussians
– Wishart: conjugate prior of the multivariate Gaussian precision matrix
– Gaussian-Gamma: conjugate prior of the univariate Gaussian with unknown mean and precision
– Gaussian-Wishart: conjugate prior of the multivariate Gaussian with unknown mean and precision matrix
– Uniform
• Angular
– Von Mises
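Conjugacy, as used throughout the chart above, means the posterior has the same functional form as the prior. For the Beta–Bernoulli pair the update reduces to counting (a minimal sketch with invented observation counts):

```python
def beta_bernoulli_update(a, b, data):
    """Beta(a, b) prior + Bernoulli observations -> Beta(a + #ones, b + #zeros) posterior."""
    heads = sum(data)
    tails = len(data) - heads
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Uniform prior Beta(1, 1), then observe 7 ones and 3 zeros
a, b = beta_bernoulli_update(1, 1, [1] * 7 + [0] * 3)
print(a, b, round(beta_mean(a, b), 2))  # → 8 4 0.67
```

The posterior is again a Beta distribution, so further data can be absorbed by repeating the same update, which is exactly what makes conjugate priors computationally convenient.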
Fully Bayesian Approach
• Feasible with increased computational power
• Intractable posterior distributions are handled using either
– variational Bayes, or
– stochastic sampling, e.g., Markov Chain Monte Carlo, Gibbs sampling
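A stochastic-sampling sketch: a minimal random-walk Metropolis algorithm (one MCMC variant) drawing from a standard normal target. The proposal width, seed, and sample count are arbitrary illustrative choices.

```python
import math
import random

def metropolis(log_target, n_samples, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with prob min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal  # accept; otherwise keep the current state
        samples.append(x)
    return samples

# Target: standard normal up to a constant, so log p(x) = -x^2 / 2
samples = metropolis(lambda x: -0.5 * x * x, 20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 1), round(var, 1))  # sample mean near 0, variance near 1
```

Note the target only needs to be known up to a normalizing constant, which is precisely why MCMC can handle the intractable posteriors mentioned above.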
Summary
• ML is a systematic approach, rather than ad-hockery, to designing information processing systems
• Useful for solving problems of classification, regression, clustering and modeling
• The fully Bayesian approach, together with Monte Carlo sampling, is leading to high-performing systems
• Great practical value in many applications
– Computer science
• search engines
• text processing
• document recognition
– Engineering