TRANSCRIPT
Machine Learning
Professor Sridhar Mahadevan
Lecture 1
Home page: www-edlab.cs.umass.edu/cs689
Quizzes, mini-projects: moodle.umass.edu
Discussion forum: piazza.com
CMPSCI 689 – p. 1/35
What is "Learning"?
Motor skills: walk, ride a bicycle, drive, play tennis or golf, play the piano.
Language: speech recognition, reading and writing natural languages.
Spatial knowledge: Navigate between spatial locations,
physical layout of a room.
Symbolic knowledge: algebra, arithmetic, calculus.
Social rules: how to interact with people, animals,
machines....
CMPSCI 689 – p. 2/35
What Activity is Shown Here?
CMPSCI 689 – p. 3/35
The Challenge of Learning
How is it possible that animals and humans are able to learn
so much knowledge from a relatively small number of
examples?
Most of what is learned is already built-in (The Blank Slate, Steven Pinker).
The brain is hardwired to learn specific classes of
functions (e.g., language, faces, motor control).
Evolution has equipped the brain with some
amazingly clever algorithms.
The brain is massively parallel, with about 100 billion “slow”, unreliable computing units (neurons).
CMPSCI 689 – p. 4/35
Abstract Definition of"Learning"
"Learning" denotes changes in a system thatare adaptive in that they enable the system toperform the same task or similar tasks drawnfrom the same population better over time(Herbert Simon, 1980).
“Learning” denotes knowledge acquisition in the absence of explicit programming (Valiant, 1986).
CMPSCI 689 – p. 5/35
Why should Machines “Learn”?
“Learning” can be viewed as a form of implicit programming.
Imagine a robot that learns to play tennis by observing people play, and by trial and error.
If the task changes over time, learning can make a machine adaptive.
Learning may enable a machine to outperform human programming.
CMPSCI 689 – p. 6/35
Why Study MachineLearning?
“If you invent a breakthrough in artificial intelligence, so machines can learn, that is worth 10 Microsofts.”
Bill Gates, quoted in the NY Times, Monday, March 3, 2004.
CMPSCI 689 – p. 7/35
IBM Jeopardy Quiz Program
CMPSCI 689 – p. 8/35
Speech Recognition on Smart Phones
CMPSCI 689 – p. 9/35
ImageNet Vision Challenge
CMPSCI 689 – p. 10/35
Mapping Images to Text
CMPSCI 689 – p. 11/35
Autonomous Driving
CMPSCI 689 – p. 12/35
Machine Learning on Mars
CMPSCI 689 – p. 13/35
First Machine Learning Program
CMPSCI 689 – p. 14/35
Work done at the ALL Lab
CMPSCI 689 – p. 15/35
Google DeepMind
CMPSCI 689 – p. 16/35
Reinforcement Learning in the Brain
CMPSCI 689 – p. 17/35
Related Fields
Biology: Brain, Development, Evolution, Genetics, Neuroscience.
Information Theory: Coding Theory, Entropy.
Linguistics: Grammars, Language Acquisition.
Mathematics: Calculus, Linear Algebra, Optimization.
Psychology: Analogy, Concept Learning, Curiosity, Discovery, Memory, Reinforcement.
Philosophy: Causality, Induction, Theory Formation.
Statistics: Probability Distributions, Estimation, Hypothesis Testing.
CMPSCI 689 – p. 18/35
Learning as Search
The process of learning can be viewed as one of searching through a space of hypotheses H for one that “best fits” the data (see the sketch below).
The data can be viewed as samples from a (known or unknown) probability distribution.
The data can be discrete (e.g., rooms in a building, words, web pages) or continuous (sensor measurements).
The data may be “labeled” (with a category or reward signal) or “unlabeled”.
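To make the search view concrete, here is a minimal Python sketch (an illustration with assumed toy data and helper names, not from the lecture): the hypothesis space H is the set of one-dimensional threshold classifiers h_t(x) = 1 if x > t else 0, and learning searches H for the hypothesis with the lowest training error.

# Illustrative sketch of "learning as search" over a hypothesis space.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)   # inputs
y = (x > 6.5).astype(int)          # labels generated by a "true" threshold

def training_error(t):
    """Fraction of training examples misclassified by h_t(x) = 1[x > t]."""
    return np.mean((x > t).astype(int) != y)

# Search: evaluate every candidate hypothesis on a grid of thresholds.
candidates = np.linspace(0, 10, 1001)
best_t = min(candidates, key=training_error)
print(f"best threshold {best_t:.2f}, training error {training_error(best_t):.3f}")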
CMPSCI 689 – p. 19/35
Data Modeling
Data from a known distribution:
Assumes that the data comes from a specific class of distributions P(x|θ) (e.g., Multinomial, Normal, Poisson).
Models: Logistic regression, Mixture model, Hidden Markov Model, Dynamic Bayes Nets.
Distribution-free learning:
Examples: Deep learning, Decision trees, Nearest
Neighbor, Support Vector Machines, Manifold
learning.
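A toy illustration of the two modeling styles above (a sketch with assumed data and function names of my own choosing, not from the lecture): a maximum-likelihood fit of a Normal family P(x|θ) on one hand, and a distribution-free nearest-neighbor predictor on the other.

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# Parametric: maximum-likelihood estimates for a Normal family P(x | mu, sigma).
mu_hat, sigma_hat = data.mean(), data.std()
print(f"MLE fit: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")

# Distribution-free: 1-nearest-neighbor prediction, no density assumptions.
train_x = np.array([1.0, 2.0, 8.0, 9.0])
train_y = np.array([0, 0, 1, 1])

def nn_predict(query):
    """Label of the closest training point to the query."""
    return train_y[np.argmin(np.abs(train_x - query))]

print(nn_predict(7.5))  # prints 1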
CMPSCI 689 – p. 20/35
Problem Formulations
Density estimation: “Unsupervised” learning
Estimate the (joint) distribution of the data, P(X).
Classification: “Supervised” learning
Estimate the conditional distribution P(Y|X).
Regression: Function approximation
Estimate the conditional mean E(Y|X) (see the sketch below).
Reinforcement Learning: Control learning
Learn a policy π mapping states (S) to actions (A) that maximizes long-term reward (R).
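As a concrete instance of the regression formulation, here is a minimal Python sketch (toy data assumed, not from the lecture): the conditional mean E(Y|X) is estimated with a closed-form linear least-squares fit.

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=200)
Y = 2.0 * X + 1.0 + rng.normal(scale=0.5, size=200)  # noisy linear relation

# Closed-form least squares on the design matrix [x, 1].
A = np.column_stack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, Y, rcond=None)[0]
print(f"estimated E(Y|X=x) = {w:.2f} x + {b:.2f}")   # close to 2x + 1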
CMPSCI 689 – p. 21/35
The Indus Script
[Fig. 1: An example of an Indus seal, showing the 4000-year-old undeciphered script.]
CMPSCI 689 – p. 22/35
Deciphering the Indus Script
CMPSCI 689 – p. 23/35
Markov Model
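As a rough illustration of what such a model computes (a hypothetical sketch, not the published Indus-script analysis): a first-order Markov model estimates transition probabilities P(next symbol | current symbol) from bigram counts over a symbol sequence.

from collections import Counter, defaultdict

seq = list("ABABBAABAB")                 # toy symbol sequence
bigrams = Counter(zip(seq, seq[1:]))     # counts of adjacent symbol pairs

totals = defaultdict(int)                # occurrences of each first symbol
for (a, _), c in bigrams.items():
    totals[a] += c

# Transition probabilities P(b | a) = count(a, b) / count(a, *).
P = {(a, b): c / totals[a] for (a, b), c in bigrams.items()}
print(P[("A", "B")])                     # P(next = B | current = A) = 0.8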
CMPSCI 689 – p. 24/35
Is the Indus Script a Language?
CMPSCI 689 – p. 25/35
Limitations of Learning
Computational learning theory (Gold, 1960s; Valiant,
1986; Vapnik and Chervonenkis, 1974)
A "complexity"-theory distribution-free model of
learning.
This theory identifies conditions under which
reliable learning is possible.
Makes rich connections to algorithmic hardness
results (complexity classes).
Led to some of the best machine learning
algorithms (support vector machines).
CMPSCI 689 – p. 26/35
PAC Learning
Given a class H of functions on a space of instances X, and a fixed but unknown distribution P on X, how many examples are needed to “learn” any f ∈ H?
The learner outputs an approximation h whose true error w.r.t. P is ≤ ε, 0 < ε < 1.
The learner converges to a good approximation with probability ≥ 1 − δ, 0 < δ < 1.
Finite H: the learner needs m ≤ (1/ε) (log(1/δ) + log |H|) examples (evaluated in the sketch below).
General H: m ≈ (1/ε) (log(1/δ) + VC(H)), where VC(H) is the VC dimension of H.
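Plugging numbers into the finite-H bound gives a feel for the sample sizes involved (a sketch assuming natural logarithms; the function name is illustrative).

import math

def pac_sample_bound(h_size, eps, delta):
    """m <= (1/eps) * (log(1/delta) + log|H|) examples suffice."""
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + math.log(h_size)))

# |H| = 2^10 hypotheses, error 0.1, confidence 0.95 -> 100 examples.
print(pac_sample_bound(h_size=2**10, eps=0.1, delta=0.05))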
CMPSCI 689 – p. 27/35
Administrivia
Class lectures: M/Wed 2:30-3:45, Room 142
My office hours: M/Wed 1:30-2:30, Room 204
TAs: Clemens Rosenbaum, Francisco Garcia
Get a class account on piazza.com
Ed lab account on elnux*.cs.umass.edu (MATLAB)
CMPSCI 689 – p. 28/35
Recommended Texts
Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, Springer-Verlag (2nd edition). (available online)
David MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press. (available online)
CMPSCI 689 – p. 29/35
Background Material
Linear algebra (e.g., Strang)
Statistics (e.g., Casella and Berger)
Optimization (e.g., Boyd and Vandenberghe, available online)
Multivariate calculus (e.g., Lagrange multipliers)
CMPSCI 689 – p. 30/35
Many Software Resources
MATLAB (available on edlab machines)
Python ML packages
R and RStudio: statistics package
Weka: Java-based ML package
Theano, Torch, Caffe, Mocha: deep learning packages
CMPSCI 689 – p. 31/35
Course Outline
September: Unsupervised Learning
October: Supervised Learning
November: Reinforcement Learning
CMPSCI 689 – p. 32/35
Weekly Readings and Course Project
Readings: See class web page
Final project:
Oct 19th: Preliminary project proposal
Dec 7th, 9th: Final project presentations.
CMPSCI 689 – p. 33/35
Course Grading
Component              Weight
Mini projects          30%
Quizzes                30%
Final Project          30%
Independent Activities 10%
CMPSCI 689 – p. 34/35
Reading for Next Week
Read the survey article "A Few Useful Things to Know
About Machine Learning" by Pedro Domingos (see
Moodle for paper or class web page).
Read Chapter 1 of the Murphy textbook.
Review basic concepts from linear algebra: matrices,
vector spaces, subspaces, eigenvalues/eigenvectors,
orthogonality.
Review basic probability/statistics: random variables,
distributions, moments (means, variances).
CMPSCI 689 – p. 35/35