Machine Learning and the Big Data Challenge Max Welling UC Irvine 1

Upload: sophie-hines

Post on 16-Dec-2015



Machine Learning and the Big Data Challenge

Max Welling, UC Irvine


AI’s Promise 60 years ago

• Robots that behave and think like humans
• Marvin Minsky: Computer Vision will be easy, chess will be hard


What we got

• Deep Blue beat Kasparov in the game of chess […]
• Watson won Jeopardy


What is hard for AI?

• Computer Vision and Scene Understanding […]


But we are making Progress


Another Example: Machine Translation

• Tremendous progress has been made

• Main reason: more and better data (e.g. documents from EU)


Language Processing

• Google’s spelling and query correction

• Main reason for progress: Google’s massive datasets


Ingredients of Progress

Better AI Systems

Better Models

More Data

Faster Computation


Computation: Moore’s Law

• Computational power is doubling every two years (approximately).


Trends in Computing: Cloud Computing

• Computing will become similar to electricity: take it as you need it.

• We need global wifi coverage to make this work well.


Trends in Computing: Distributed Computing (e.g. GPUs)

• Cheap and massively parallel computing (up to 300 processing units)
• First developed for the gaming community
• Now adopted by machine learning for very fast learning (3rd ReNNaissance)


Big Data

That’s 38 images every second if you live for 100 years (that’s more visual data than anyone will actually see …)

• Current data volume: ~2 zettabytes (2 trillion GB), doubling every 1.5 years.

• Data volume has its own Moore’s law!
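
To make the two doubling rates concrete, here is a minimal sketch comparing computation (doubling every 2 years, per the Moore's law slide) with data volume (doubling every 1.5 years). The function name and the 10-year horizon are assumptions chosen for illustration, not figures from the talk.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Return the multiplicative growth after `years` for a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)


# Doubling times are the ones quoted in the slides; the horizon is an assumption.
HORIZON_YEARS = 10.0
compute_growth = growth_factor(HORIZON_YEARS, 2.0)  # computation: doubles every 2 years
data_growth = growth_factor(HORIZON_YEARS, 1.5)     # data volume: doubles every 1.5 years

print(f"compute grows {compute_growth:.0f}x, data grows {data_growth:.0f}x")
print(f"data per unit of compute grows {data_growth / compute_growth:.1f}x")
```

Under these assumptions, compute grows about 32-fold over a decade while data grows about 100-fold, so the amount of data per unit of computation keeps rising.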


Big Data: The McKinsey 2011 Report


Sensors Everywhere

• Internet.

• There are around 1.85 million surveillance cameras in the UK alone.
• That is 1 camera for every 32 people!
• On a typical day every person will be recorded on around 70 CCTV cameras.

• There are about 5.6 billion cellphone users worldwide


Machine Learning

• Algorithms that learn to make predictions from examples (data)


Generalization

• Consider the following regression problem:
• Predict the real value on the y-axis from the real value on the x-axis.
• You are given 6 examples: {Xi, Yi}.
• What is the y-value for a new query point X*?


Generalization

Which curve is best?


Generalization

• Ockham’s razor: prefer the simplest hypothesis consistent with data.


Generalization

Learning is concerned with accurate prediction of future data, not accurate prediction of training data.
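
The difference between training error and error on future data can be checked numerically. The sketch below is an illustration, not the experiment behind the slides: it assumes a hypothetical linear ground truth with Gaussian noise, draws 6 training examples, and compares a straight-line fit with the polynomial that passes exactly through all 6 points. The names `true_function`, `fit_line`, `interpolate`, and `average_errors` are invented for this example.

```python
import random


def true_function(x):
    """Hypothetical ground truth, used only for this illustration."""
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate(x, xs, ys):
    """Evaluate at x the polynomial through all (xs, ys), in Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mse(predictions, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def average_errors(trials=500, n_test=50, noise=1.0, seed=0):
    """Average train/test error of both models over many random datasets."""
    rng = random.Random(seed)
    train_x = [float(i) for i in range(6)]  # 6 examples, as on the slide
    totals = {"line_train": 0.0, "line_test": 0.0, "poly_train": 0.0, "poly_test": 0.0}
    for _ in range(trials):
        train_y = [true_function(x) + rng.gauss(0.0, noise) for x in train_x]
        test_x = [rng.uniform(0.0, 5.0) for _ in range(n_test)]
        test_y = [true_function(x) + rng.gauss(0.0, noise) for x in test_x]
        slope, intercept = fit_line(train_x, train_y)
        totals["line_train"] += mse([slope * x + intercept for x in train_x], train_y)
        totals["line_test"] += mse([slope * x + intercept for x in test_x], test_y)
        totals["poly_train"] += mse([interpolate(x, train_x, train_y) for x in train_x], train_y)
        totals["poly_test"] += mse([interpolate(x, train_x, train_y) for x in test_x], test_y)
    return {name: total / trials for name, total in totals.items()}


errors = average_errors()
for name, value in errors.items():
    print(f"{name}: {value:.3f}")
```

The interpolating polynomial reproduces the training data exactly, yet on fresh points it does worse on average than the simpler line, because it has also fitted the noise.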


Learning as Compression

• Imagine a game where Bob needs to send a dataset to Alice.

• They are allowed to meet once before they see the data.

• They agree on a precision level (quantization level).

• Bob learns a model (red line).

• Bob sends the model parameters (offset and slant) only once.

• For every datapoint, Bob sends:
  - the distance along the line (large number)
  - the orthogonal distance from the line (small number)
  (small numbers are cheaper to encode than large numbers)
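
The game can be sketched in code under a few stated assumptions. The slides do not specify how numbers are encoded, so `code_length_bits` below uses an Elias-gamma-style length plus a sign bit as a stand-in for "small numbers are cheaper"; the 32 bits per model parameter, the toy dataset, and all function names are likewise hypothetical.

```python
import math


def code_length_bits(k: int) -> int:
    """Bits for a signed integer under a simple universal code (an assumption):
    Elias-gamma length of |k| + 1, plus one sign bit. Small |k| is cheaper."""
    return 2 * (abs(k) + 1).bit_length()


def fit_line(xs, ys):
    """Ordinary least-squares line fit; returns (slope, offset)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def raw_cost(points, precision):
    """Cost if Bob sends every quantized coordinate as-is, without a model."""
    return sum(
        code_length_bits(round(x / precision)) + code_length_bits(round(y / precision))
        for x, y in points
    )


def model_cost(points, precision, bits_per_parameter=32):
    """Cost if Bob sends the line once, then per-point positions relative to it."""
    xs, ys = zip(*points)
    slope, offset = fit_line(xs, ys)
    norm = math.hypot(1.0, slope)
    total = 2 * bits_per_parameter  # offset and slope, sent only once
    for x, y in points:
        along = (x + slope * (y - offset)) / norm       # distance along the line
        orthogonal = ((y - offset) - slope * x) / norm  # signed distance from the line
        total += code_length_bits(round(along / precision))
        total += code_length_bits(round(orthogonal / precision))
    return total


# Toy dataset: points close to a line, with a small deterministic wiggle.
points = [(float(i), 3.0 + 0.5 * i + 0.05 * ((7 * i) % 5 - 2)) for i in range(50)]
precision = 0.01

print("bits without model:", raw_cost(points, precision))
print("bits with model:   ", model_cost(points, precision))
```

Because the orthogonal distances are small, they take fewer bits than the raw y-values, and the one-time cost of the two parameters is quickly repaid when the model fits well.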


Generalization

learning = compression = abstraction

• The man who couldn’t forget …


Types of Learning

• Supervised learning
  • Labels are provided; there is a strong learning signal.
  • e.g. classification, regression.

• Semi-supervised learning
  • Only part of the data have labels.
  • e.g. a child growing up.

• Reinforcement learning
  • The learning signal is a (scalar) reward and may come with a delay.
  • e.g. trying to learn to play chess, a mouse in a maze.

• Unsupervised learning
  • There is no direct learning signal. We are simply trying to find structure in data.
  • e.g. clustering, dimensionality reduction.


Classification: nearest neighbor

Example: Imagine you want to classify monkeys versus humans.

Data: 100 monkey images and 200 human images, each labeled as monkey or human.

Task: Here is a new image: monkey or human?


1 nearest neighbor

Idea:

1. Find the picture in the database which is closest to your query image.

2. Check its label.

3. Declare the class of your query image to be the same as that of the closest picture.

(Figure: a query image and its closest database image)
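
The three steps on this slide can be sketched as follows. This is a minimal illustration, assuming each image has already been turned into a numeric feature vector and that "closest" means Euclidean distance; the slides fix neither choice, and the tiny 2-D database below is made up.

```python
import math


def classify_1nn(query, database):
    """Return the label of the database entry closest to `query`.

    `database` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; any other distance could be substituted.
    """
    _, label = min(database, key=lambda entry: math.dist(query, entry[0]))
    return label


# Toy stand-in for the image database: hypothetical 2-D feature vectors.
database = [
    ((0.9, 0.2), "monkey"),
    ((0.8, 0.3), "monkey"),
    ((0.2, 0.9), "human"),
    ((0.1, 0.7), "human"),
]

print(classify_1nn((0.85, 0.25), database))
print(classify_1nn((0.15, 0.80), database))
```

Note that there is no training step: all the work happens at query time, which is why the cost of a lookup grows with the size of the database.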


kNN Decision Surface

decision curve


Unsupervised Learning: Dimensionality Reduction

(LLE – Roweis & Saul)


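Collaborative Filtering

(Netflix Dataset)

• movies (+/- 17,770)
• users (+/- 240,000)
• total of +/- 400,000,000 nonzero entries (99% sparse)

(Figure: user-by-movie rating matrix with a few known ratings, e.g. 4 and 1, and unknown entries marked “?”)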


Bayes Rule(s)

Riddle: Joe goes to the doctor and tells the doctor he has a stiff neck and a rash. The doctor is worried about meningitis and performs a test that is 80% correct, that is, for 80% of the people that have meningitis it will turn out positive. If 1 in 100,000 people in the population have meningitis and 1 in 1000 people will test positive (sick or not sick), what is the probability that Joe has meningitis?

Answer: Bayes Rule.

P(meningitis | positive test)
  = P(positive test | meningitis) P(meningitis) / P(positive test)
  = 0.8 * 0.00001 / 0.001
  = 0.008 < 1%
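
The same calculation as a short script, using only the numbers given in the riddle; the function and variable names are chosen here for illustration.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence


p_positive_given_meningitis = 0.8  # the test is positive for 80% of the sick
p_meningitis = 1 / 100_000         # prior: 1 in 100,000 people
p_positive = 1 / 1_000             # 1 in 1000 people test positive overall

posterior = bayes_posterior(p_positive_given_meningitis, p_meningitis, p_positive)
print(f"P(meningitis | positive test) = {posterior:.3f}")
```

The posterior is small despite the positive test because the prior is tiny compared with the overall rate of positive tests.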


Bayesian Networks & Graphical Models

• Main modeling tool for modern machine learning
• Reasoning over large collections of random variables with intricate relations

(Figure: graphical model with nodes “meningitis”, “test result”, and “stiff neck, rash”)


Nonparametric Bayes

• Assumption: Real world data is infinitely complex.
• Consequence: as the dataset grows, so should the model complexity.
• Nonparametric Bayesian models do exactly that.

(Figure: hierarchical clustering of 10,000 birds)


Trends in ML: Human Computation

• Old paradigm: Computers assist humans

• New paradigm: Humans assist Computers to learn (The raising of the machines)

(Luis von Ahn)


HC I: Useful Games

ESP game to label images

“LabelMe” to segment & label images


HC II: Crowd-sourced Marketplaces

• Split a problem into many small and simple problems and sell them on a crowd-sourced marketplace such as Amazon’s “Mechanical Turk”.


MT Example


HC III: Online Competitions

• Netflix organized an online competition to improve their movie recommender system

• Prize money: 1 million dollars if 10% improvement was achieved.

• It lasted 3 years; at least 20,000 teams from 150 countries registered.

• Kaggle has turned this into a business and hosts numerous competitions.

• Latest: Heritage Health Prize at $3M!


What won the Netflix Prize?

• Ensemble learning: learn many models (e.g. 200) and average their predictions.

• Algorithmic equivalent of “Wisdom of the Crowds”


Wisdom of the Crowds

• Estimate the weight of the Space Shuttle (in tons)

• Take mean or median of answers.

• Does surprisingly well.

• Time for experiment?

• Mechanism: canceling of independent errors

Answer: 2030 tons
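
The "canceling of independent errors" mechanism, which is also what makes averaging many models work, can be simulated. The sketch below assumes each guess is the true value plus independent, unbiased Gaussian noise; the number of guessers (200, echoing the "e.g. 200" models mentioned for the Netflix Prize) and the spread are arbitrary choices for illustration, not data from an actual experiment.

```python
import random
import statistics


def crowd_experiment(true_value=2030.0, n_guessers=200, spread=500.0, seed=0):
    """Simulate independent, unbiased guesses and compare error levels.

    Returns (mean individual error, error of the crowd mean, error of the crowd median).
    """
    rng = random.Random(seed)
    guesses = [true_value + rng.gauss(0.0, spread) for _ in range(n_guessers)]
    individual_error = statistics.mean([abs(g - true_value) for g in guesses])
    mean_error = abs(statistics.mean(guesses) - true_value)
    median_error = abs(statistics.median(guesses) - true_value)
    return individual_error, mean_error, median_error


individual_error, mean_error, median_error = crowd_experiment()
print(f"typical individual error: {individual_error:.0f} tons")
print(f"error of the crowd mean:  {mean_error:.0f} tons")
print(f"error of the crowd median: {median_error:.0f} tons")
```

The benefit depends on the independence assumption: if all guessers share the same bias, averaging cannot remove it.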


Prediction Markets

• “Idea Futures”

• Use magnitude of the bet to express confidence.


AI Assisted Learning

• Stanford is offering 15 courses online to +/- 100,000 students. The courses involve homework, exams, and a “certificate of achievement”.

• Flipping the classroom: watch lecture video at home, do homework in class.

• AI can find the right set of exercises/hints or a cyber-partner for each individual student and track progress.


Outlook

• Volume/diversity of data and computing power are growing exponentially.

• Proliferation of sensors, internet and “human computing” allows for AI systems that are very different from human intelligence.

• Future AIs will:
  • Sense your location, intention, mood, needs.
  • Anticipate your next action (order nonfat cappuccino from Starbucks at 9am).
  • Monitor your health.
  • Monitor the environment.

(Figures: now: Google glasses; past vision; a virtual, connected world)