TRANSCRIPT
Machine Learning
Professor Sridhar Mahadevan
Lecture 1
Home page: www-edlab.cs.umass.edu/cs689
Quizzes, mini-projects: moodle.umass.edu
Discussion forum: piazza.com
CMPSCI 689 – p. 1/35
What is "Learning"?
Motor skills: walk, ride a bicycle, drive, play tennis or golf, play the piano.
Language: speech recognition, reading and writing natural languages.
Spatial knowledge: Navigate between spatial locations,
physical layout of a room.
Symbolic knowledge: algebra, arithmetic, calculus.
Social rules: how to interact with people, animals,
machines....
CMPSCI 689 – p. 2/35
What Activity is Shown Here?
CMPSCI 689 – p. 3/35
The Challenge of Learning
How is it possible that animals and humans are able to learn
so much knowledge from a relatively small number of
examples?
Most of what is learned is already built-in (The Blank Slate, Steven Pinker).
The brain is hardwired to learn specific classes of
functions (e.g., language, faces, motor control).
Evolution has equipped the brain with some
amazingly clever algorithms.
The brain is massively parallel, with about 100 billion “slow”, unreliable computing units (neurons).
CMPSCI 689 – p. 4/35
Abstract Definition of"Learning"
"Learning" denotes changes in a system thatare adaptive in that they enable the system toperform the same task or similar tasks drawnfrom the same population better over time(Herbert Simon, 1980).
“Learning” denotes knowledge acquisition in the absence of explicit programming (Valiant, 1986).
CMPSCI 689 – p. 5/35
Why should Machines “Learn”?
“Learning” can be viewed as a form of implicit programming.
Imagine a robot that learns to play tennis by observing people play, and by trial and error.
If the task changes over time, learning can make a machine adaptive.
Learning may enable a machine to outperform human programming.
CMPSCI 689 – p. 6/35
Why Study MachineLearning?
“If you invent a breakthrough in artificial intelligence, so machines can learn, that is worth 10 Microsofts.”
Bill Gates, quoted in the NY Times, Monday, March 3, 2004.
CMPSCI 689 – p. 7/35
IBM Jeopardy Quiz Program
CMPSCI 689 – p. 8/35
Speech Recognition on Smart Phones
CMPSCI 689 – p. 9/35
ImageNet Vision Challenge
CMPSCI 689 – p. 10/35
Mapping Images to Text
CMPSCI 689 – p. 11/35
Autonomous Driving
CMPSCI 689 – p. 12/35
Machine Learning on Mars
CMPSCI 689 – p. 13/35
First Machine Learning Program
CMPSCI 689 – p. 14/35
Work done at the ALL Lab
CMPSCI 689 – p. 15/35
Google DeepMind
CMPSCI 689 – p. 16/35
Reinforcement Learning in the Brain
CMPSCI 689 – p. 17/35
Related Fields
Biology: Brain, Development, Evolution, Genetics, Neuroscience.
Information Theory: Coding Theory, Entropy.
Linguistics: Grammars, Language Acquisition.
Mathematics: Calculus, Linear Algebra, Optimization.
Psychology: Analogy, Concept Learning, Curiosity, Discovery, Memory, Reinforcement.
Philosophy: Causality, Induction, Theory Formation.
Statistics: Probability Distributions, Estimation, Hypothesis Testing.
CMPSCI 689 – p. 18/35
Learning as Search
The process of learning can be viewed as one of searching through a space of hypotheses H for one that “best fits” the data (see the sketch below).
The data can be viewed as samples from a (known or unknown) probability distribution.
The data can be discrete (e.g., rooms in a building, words, web pages) or continuous (sensor measurements).
The data may be “labeled” (with a category or reward signal) or “unlabeled”.
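To make the search view concrete, here is a minimal Python sketch (an illustration with assumed toy data and helper names, not from the lecture): the hypothesis space H is the set of one-dimensional threshold classifiers h_t(x) = 1 if x > t else 0, and learning searches H for the hypothesis with the lowest training error.

# Illustrative sketch of "learning as search" over a hypothesis space.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)   # inputs
y = (x > 6.5).astype(int)          # labels generated by a "true" threshold

def training_error(t):
    """Fraction of training examples misclassified by h_t(x) = 1[x > t]."""
    return np.mean((x > t).astype(int) != y)

# Search: evaluate every candidate hypothesis on a grid of thresholds.
candidates = np.linspace(0, 10, 1001)
best_t = min(candidates, key=training_error)
print(f"best threshold {best_t:.2f}, training error {training_error(best_t):.3f}")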
CMPSCI 689 – p. 19/35
Data Modeling
Data from a known distribution:
Assumes that the data comes from a specific class of distributions P(x|θ) (e.g., Multinomial, Normal, Poisson).
Models: Logistic regression, Mixture model, Hidden Markov Model, Dynamic Bayes Nets.
Distribution-free learning:
Examples: Deep learning, Decision trees, Nearest
Neighbor, Support Vector Machines, Manifold
learning.
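A toy illustration of the two modeling styles above (a sketch with assumed data and function names of my own choosing, not from the lecture): a maximum-likelihood fit of a Normal family P(x|θ) on one hand, and a distribution-free nearest-neighbor predictor on the other.

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# Parametric: maximum-likelihood estimates for a Normal family P(x | mu, sigma).
mu_hat, sigma_hat = data.mean(), data.std()
print(f"MLE fit: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")

# Distribution-free: 1-nearest-neighbor prediction, no density assumptions.
train_x = np.array([1.0, 2.0, 8.0, 9.0])
train_y = np.array([0, 0, 1, 1])

def nn_predict(query):
    """Label of the closest training point to the query."""
    return train_y[np.argmin(np.abs(train_x - query))]

print(nn_predict(7.5))  # prints 1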
CMPSCI 689 – p. 20/35
Problem Formulations
Density estimation: “Unsupervised” learning
Estimate the (joint) distribution of the data, P(X).
Classification: “Supervised” learning
Estimate the conditional distribution P(Y|X).
Regression: Function approximation
Estimate the conditional mean E(Y|X) (see the sketch below).
Reinforcement Learning: Control learning
Learn a policy π mapping states (S) to actions (A) that maximizes long-term reward (R).
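As a concrete instance of the regression formulation, here is a minimal Python sketch (toy data assumed, not from the lecture): the conditional mean E(Y|X) is estimated with a closed-form linear least-squares fit.

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=200)
Y = 2.0 * X + 1.0 + rng.normal(scale=0.5, size=200)  # noisy linear relation

# Closed-form least squares on the design matrix [x, 1].
A = np.column_stack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, Y, rcond=None)[0]
print(f"estimated E(Y|X=x) = {w:.2f} x + {b:.2f}")   # close to 2x + 1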
CMPSCI 689 – p. 21/35
The Indus Script
[Fig. 1: An example of an Indus seal, showing the 4000-year-old undeciphered script.]
CMPSCI 689 – p. 22/35
Deciphering the Indus Script
CMPSCI 689 – p. 23/35
Markov Model
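As a rough illustration of what such a model computes (a hypothetical sketch, not the published Indus-script analysis): a first-order Markov model estimates transition probabilities P(next symbol | current symbol) from bigram counts over a symbol sequence.

from collections import Counter, defaultdict

seq = list("ABABBAABAB")                 # toy symbol sequence
bigrams = Counter(zip(seq, seq[1:]))     # counts of adjacent symbol pairs

totals = defaultdict(int)                # occurrences of each first symbol
for (a, _), c in bigrams.items():
    totals[a] += c

# Transition probabilities P(b | a) = count(a, b) / count(a, *).
P = {(a, b): c / totals[a] for (a, b), c in bigrams.items()}
print(P[("A", "B")])                     # P(next = B | current = A) = 0.8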
CMPSCI 689 – p. 24/35
Is the Indus Script a Language?
CMPSCI 689 – p. 25/35
Limitations of Learning
Computational learning theory (Gold, 1960s; Valiant,
1986; Vapnik and Chervonenkis, 1974)
A "complexity"-theory distribution-free model of
learning.
This theory identifies conditions under which
reliable learning is possible.
Makes rich connections to algorithmic hardness
results (complexity classes).
Led to some of the best machine learning
algorithms (support vector machines).
CMPSCI 689 – p. 26/35
PAC Learning
Given a class H of functions on a space of instances X, and a fixed but unknown distribution P on X, how many examples are needed to “learn” any f ∈ H?
The learner outputs an approximation h whose true error w.r.t. P is ≤ ε, 0 < ε < 1.
The learner converges to a good approximation with probability ≥ 1 − δ, 0 < δ < 1.
Finite H: the learner needs m ≤ (1/ε) (log(1/δ) + log |H|) examples (evaluated in the sketch below).
General H: m ≈ (1/ε) (log(1/δ) + VC(H)), where VC(H) is the VC dimension of H.
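Plugging numbers into the finite-H bound gives a feel for the sample sizes involved (a sketch assuming natural logarithms; the function name is illustrative).

import math

def pac_sample_bound(h_size, eps, delta):
    """m <= (1/eps) * (log(1/delta) + log|H|) examples suffice."""
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + math.log(h_size)))

# |H| = 2^10 hypotheses, error 0.1, confidence 0.95 -> 100 examples.
print(pac_sample_bound(h_size=2**10, eps=0.1, delta=0.05))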
CMPSCI 689 – p. 27/35
Administrivia
Class lectures: M/Wed 2:30-3:45, Room 142
My office hours: M/Wed 1:30-2:30, Room 204
TAs: Clemens Rosenbaum, Francisco Garcia
Get a class account on piazza.com
Ed lab account on elnux*.cs.umass.edu (MATLAB)
CMPSCI 689 – p. 28/35
Recommended Texts
Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, Springer-Verlag (2nd edition). (available online)
David MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press. (available online)
CMPSCI 689 – p. 29/35
Background Material
Linear algebra (e.g., Strang)
Statistics (e.g., Casella and Berger)
Optimization (e.g., Boyd and Vandenberghe, available online)
Multivariate calculus (e.g., Lagrange multipliers)
CMPSCI 689 – p. 30/35
Many Software Resources
MATLAB (available on edlab machines)
Python ML packages
R and RStudio: statistics package
Weka: Java-based ML package
Theano, Torch, Caffe, Mocha: deep learning packages
CMPSCI 689 – p. 31/35
Course Outline
September: Unsupervised Learning
October: Supervised Learning
November: Reinforcement Learning
CMPSCI 689 – p. 32/35
Weekly Readings and Course Project
Readings: See class web page
Final project:
Oct 19th: Preliminary project proposal
Dec 7th, 9th: Final project presentations.
CMPSCI 689 – p. 33/35
Course Grading
Component              Weight
Mini projects          30%
Quizzes                30%
Final Project          30%
Independent Activities 10%
CMPSCI 689 – p. 34/35
Reading for Next Week
Read the survey article "A Few Useful Things to Know
About Machine Learning" by Pedro Domingos (see
Moodle for paper or class web page).
Read Chapter 1 of the Murphy textbook.
Review basic concepts from linear algebra: matrices,
vector spaces, subspaces, eigenvalues/eigenvectors,
orthogonality.
Review basic probability/statistics: random variables,
distributions, moments (means, variances).
CMPSCI 689 – p. 35/35