
Machine Learning Kaushik Sinha Fall 2013

Administrative Stuff

Introduction

✤ Instructor: Asst. Prof. Kaushik Sinha

✤ 2 lectures per week MW 12:30-1:45 pm

✤ Office Hours: MW 11:30-12:30, Jabara Hall 243

Study Groups (2-3 people)

✤ This course will cover non-trivial material; learning in a group makes it easier and more fun!

✤ It is recommended (but not required)

Prerequisites

✤ Three pillars of ML:

✤ Statistics / Probability

✤ Linear Algebra

✤ Multivariate Calculus

✤ You should be confident in at least one of the three, ideally two.

Grades ...

✤ Your grade is a composite of:

✤ (Homework)

✤ Exams (Mid-term + Final)

✤ Final Project

Homework

✤ You can discuss homework with your peers but your submitted answer should be your own!

✤ Make an honest attempt on all questions

Exams

✤ Exams will be (to some degree) based on homework assignments

✤ Best preparation: Make sure you really really understand the homework assignments

✤ 2 Exams: Final + Midterm

✤ Will be 40% of your grade.

Final Project

✤ 40% of your grade.

✤ 4-page write-up. Joint effort of 2-3 people.

✤ Come up with your own ideas.

✤ Application: Some interesting application of machine learning

✤ In-depth study: Reproduce results from a high-acclaimed paper

✤ Research: Incorporate ML into a research project

✤ Extra credit is given for working systems (e.g. an iPhone or web app)

✤ Details will be posted on course website later

Cheating

✤ Don’t cheat!

✤ Use your common sense.

✤ I won’t be your friend anymore!

MACHINE LEARNING!!!

What is Machine Learning?

✤ Formally (Mitchell 1997): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

✤ Informally: Algorithms that improve on some task with experience.
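Mitchell's T/E/P definition can be made concrete with a toy sketch. Everything below is made up for illustration: the task T is classifying 1-D points into two classes, the experience E is a set of labeled training points, and the performance P is test accuracy, which we can watch as the learner gets more experience.

```python
import random

random.seed(0)

def make_example():
    # Two hypothetical classes: class 0 centered near 0.0, class 1 near 3.0.
    label = random.randint(0, 1)
    return random.gauss(3.0 * label, 1.0), label

def train(examples):
    # "Learning": estimate each class mean from the experience E.
    means = {}
    for lbl in (0, 1):
        vals = [x for x, y in examples if y == lbl]
        means[lbl] = sum(vals) / len(vals) if vals else 0.0
    return means

def accuracy(means, test_set):
    # Performance measure P: fraction of test points classified correctly
    # by assigning each point to the nearest class mean.
    correct = sum(
        1 for x, y in test_set
        if min(means, key=lambda lbl: abs(x - means[lbl])) == y
    )
    return correct / len(test_set)

test_set = [make_example() for _ in range(500)]
train_pool = [make_example() for _ in range(200)]

acc_small = accuracy(train(train_pool[:5]), test_set)  # little experience
acc_large = accuracy(train(train_pool), test_set)      # more experience
print(acc_small, acc_large)
```

With more experience (200 labeled examples instead of 5), the estimated class means stabilize and performance on T, as measured by P, typically improves.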

When should we use ML?

✤ Not ML problems: Traveling Salesman, 3-SAT, etc.

✤ ML Problems: Hard to formalize, but human expert can provide examples / feedback.

✤ Computer needs to learn from feedback.

✤ Is there a sign of cancer in this fMRI scan?

✤ What will the Dow Jones be tomorrow?

✤ Teach a robot to ride a unicycle.

Sometimes easy for humans, hard for computers

✤ Even one-year-old children can identify gender pretty reliably

✤ Easy to come up with examples.

✤ But impossible to formalize as a CS problem.

✤ You need machine learning!

Male or Female?

Example: Problem: Given an image of a handwritten digit, what digit is it?

✤ Input: an image of a digit. Output: "2".

✤ Problem: You have absolutely no idea how to write a "Clever Algorithm" for this by hand!

✤ Good news: You have examples, i.e. images labeled 0 through 9.

The Machine Learning Approach:

✤ Training: Feed the labeled examples (0-9) into a Machine Learning Algorithm, which produces a Learned Algorithm.

✤ Testing: The Learned Algorithm maps a new input image to its digit (e.g. "2").
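The training/testing idea can be sketched with a toy nearest-neighbor learner. All the patterns below are invented for illustration: two 3x3 bitmaps stand in for training images of "1" and "0", and testing labels a new image by its nearest training example.

```python
# Toy training set: 3x3 bitmaps for "1" (vertical bar) and "0" (ring),
# flattened into 9-dimensional feature vectors. All patterns are made up.
TRAIN = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "1"),
    ((1, 1, 1, 1, 0, 1, 1, 1, 1), "0"),
]

def distance(a, b):
    # Squared Euclidean distance between two flattened images.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def learned_algorithm(image):
    # "Testing": label a new image with its nearest training example.
    return min(TRAIN, key=lambda ex: distance(ex[0], image))[1]

# A slightly noisy "1" (one pixel flipped) is still recognized.
prediction = learned_algorithm((0, 1, 0, 0, 1, 0, 1, 1, 0))
print(prediction)  # -> "1"
```

Real digit recognizers are far more sophisticated, but the split is the same: training consumes labeled examples, testing applies the learned rule to unseen inputs.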

Handwritten Digits Recognition

✤ (1990-1995) Pretty much solved in the mid-nineties (LeCun et al.)

✤ Convolutional Neural Networks

✤ Now used by the USPS for ZIP codes, by ATMs for automatic check cashing, etc.

TD-Gammon (1994)

✤ Gerry Tesauro (IBM) teaches a neural network to play Backgammon. The net plays 100K+ games against itself and beats the world champion [Neural Computation 1994]

✤ The algorithm teaches itself to play this well!

Deep Blue (1997)

✤ IBM’s Deep Blue wins against Kasparov in chess. A crucial winning move is made thanks to Machine Learning (G. Tesauro).

Watson (2011)

✤ IBM’s Watson wins the game show Jeopardy! against former winners Brad Rutter and Ken Jennings.

✤ Extensive Machine Learning techniques were used.

Face Detection (2001)

✤ Viola-Jones “solves” face detection

✤ Previously very hard problem in computer vision

✤ Now a commodity in off-the-shelf cellphones / cameras

Grand Challenge (2005)

✤ DARPA Grand Challenge: The vehicle must drive 150 miles autonomously through the desert along a difficult route.

✤ The 2004 DARPA Grand Challenge was a huge disappointment; the best team made 11.78 of 150 miles.

✤ The 2005 DARPA Grand Challenge 2 was completed by several ML-powered teams.

Speech, Netflix, ...

✤ iPhone ships with built-in speech recognition

✤ Google mobile search is speech-based (very reliable)

✤ Automatic translation

✤ ....

ML is the engine for many fields...

✤ Computer Vision

✤ Robotics

✤ Computational Biology

✤ Natural Language Processing

Internet companies

✤ Collecting massive amounts of data

✤ Hoping that some smart Machine Learning person makes money out of it.

✤ Your future job!

Example: Webmail

Spam

filtering Given Email,

predict if it is

spam or not.

Ad -

matching Given user

info predict

which ad

will be

clicked on.

Example: Web Search

✤ Ad matching: Given a query, predict which ad will be clicked on.

✤ Web-search ranking: Given a query, predict which document will be clicked on.

Example: Google News

✤ Document clustering: Given news articles, automatically identify and sort them by topic.

When will it stop?

✤ The human brain is one big learning machine

✤ We know that we can still do a lot better!

✤ However, it is hard. Very few people can design new ML algorithms.

✤ But many people can use them!

What types of ML are there?

✤ supervised learning: Given labeled examples, learn to predict the label of an unlabeled example. (e.g. given annotated images, learn to detect faces.)

✤ unsupervised learning: Given unlabeled data, try to discover patterns, structure, or low-dimensional representations. (e.g. automatically cluster news articles by topic.)

As far as this course is concerned:

Basic Setup

✤ Pre-processing: Clean up the data. Boring but necessary.

✤ Feature Extraction: Use expert knowledge to get a representation of the data.

✤ Learning: Focus of this course.

✤ (Post-processing): Whatever you do when you are done.
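The pipeline stages can be sketched as plain functions. This is a hypothetical skeleton, not code from the course: every function name and the toy "majority label" learner are invented to show how the stages hand data to one another.

```python
def preprocess(raw):
    # Pre-processing: clean up the data (here, drop malformed records).
    return [r for r in raw if r is not None]

def extract_features(records):
    # Feature extraction: an expert-designed representation.
    # Illustrative choice: (word length, vowel count) per word.
    return [(len(w), sum(c in "aeiou" for c in w)) for w in records]

def learn(features, labels):
    # Learning: a trivial stand-in model that predicts the majority label.
    return max(set(labels), key=labels.count)

raw = ["spam", None, "ham", "eggs"]           # made-up input data
feats = extract_features(preprocess(raw))
model = learn(feats, ["spam", "ham", "ham"])  # made-up labels
print(feats, model)
```

The point is only the shape of the flow: raw data in, cleaned records, feature vectors, then a learned model out.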

Feature Extraction

✤ Represent data in terms of vectors: map Real World Data into a Vector Space, where each dimension is one feature.

✤ Features are statistics that describe the data.

✤ Features are statistics that describe the data

✤ Feature: width/height

✤ Pretty good for “1” vs. “2”

✤ Not so good for “2” vs. “3”

✤ Feature: raw pixels (flatten the 16x16 image into a 256x1 vector)

✤ Works for digits (to some degree)

✤ Does not work for trickier stuff

Handwritten digits
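Both featurizers above can be sketched for a binary digit image, represented as a 16x16 grid of 0/1 pixels. The functions and the test image below are illustrative, not from the slides' own code.

```python
def raw_pixels(image):
    # Raw-pixel feature: flatten the 16x16 grid row by row into 256 values.
    return [px for row in image for px in row]

def width_over_height(image):
    # Width/height feature: aspect ratio of the bounding box of ink pixels.
    rows = [i for i, row in enumerate(image) if any(row)]
    cols = [j for j in range(len(image[0])) if any(row[j] for row in image)]
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return width / height

# A skinny vertical stroke (a "1"-like shape): tall box, small ratio.
image = [[1 if j == 8 else 0 for j in range(16)] for i in range(16)]
n_dims = len(raw_pixels(image))
ratio = width_over_height(image)
print(n_dims, ratio)  # 256, 0.0625
```

The tiny ratio is exactly why width/height separates "1" from "2" reasonably well: a "2" fills a much wider bounding box than a "1".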

Bag of Words for Images

✤ Extract interest points and represent the image as a bag of interest points.

✤ Using a dictionary of possible interest points, each image becomes a sparse count vector, e.g. [0, 1, 0, 0, 0, 3, 0, 0, 0, 0].
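A minimal sketch of the counting step, assuming the dictionary entries have already been reduced to integer IDs (the IDs and the detected points below are made up):

```python
from collections import Counter

# Hypothetical dictionary of 10 interest-point types, identified as 0..9.
DICT_SIZE = 10

# Interest points detected in one image (made-up example: one point of
# type 1 and three points of type 5).
detected = [1, 5, 5, 5]

# The image's sparse count vector over the dictionary.
counts = Counter(detected)
vector = [counts.get(i, 0) for i in range(DICT_SIZE)]
print(vector)  # [0, 1, 0, 0, 0, 3, 0, 0, 0, 0]
```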

Text (Bag of Words)

✤ Take a dictionary with n words (in, into, ..., is, ...). Represent a text document as an n-dimensional vector, where the i-th dimension contains the number of times word i appears in the document, e.g. [0, 1, 0, 0, 0, 2, 0, 0, 0, 0].
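The same recipe in code, with a made-up five-word dictionary and a made-up document (real systems use dictionaries of tens of thousands of words):

```python
# Toy dictionary of n = 5 words; dimension i counts occurrences of word i.
dictionary = ["in", "into", "is", "learning", "machine"]

doc = "machine learning is fun and machine learning is everywhere"
words = doc.split()

# Bag-of-words vector: one count per dictionary word.
vector = [words.count(w) for w in dictionary]
print(vector)  # [0, 0, 2, 2, 2]
```

Note how sparse this already is: two of the five dimensions are zero, and with a realistic dictionary almost all dimensions would be.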

Audio? Movies?

✤ Audio: Use a sliding window and the Fast Fourier Transform


✤ Movies: Treat them as a sequence of images
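The sliding-window idea for audio can be sketched with the standard library. A naive DFT stands in for the FFT here (the window sizes and the test tone are made-up values):

```python
import cmath
import math

def dft_magnitudes(window):
    # Naive discrete Fourier transform; an FFT computes the same values
    # faster. We keep only the first half of the spectrum.
    n = len(window)
    return [
        abs(sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2)
    ]

def spectrogram(signal, win=8, hop=4):
    # Slide a window over the signal; each window becomes one
    # frequency-domain feature vector.
    return [dft_magnitudes(signal[i:i + win])
            for i in range(0, len(signal) - win + 1, hop)]

# A pure tone with 2 cycles per 8-sample window: its energy should
# concentrate in frequency bin 2.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(32)]
frames = spectrogram(signal)
peak_bin = max(range(4), key=lambda k: frames[0][k])
print(len(frames), peak_bin)  # 7 frames, peak at bin 2
```

Each frame's magnitude vector is then a feature vector, so the audio stream becomes a sequence of points in feature space.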

Feature Space

✤ Everything that can be stored on a computer can be stored as a vector

✤ Representation is critical for successful learning. [Not in this course, though.]

✤ Throughout this course we will assume data is just points in a Feature Space

✤ Important distinction: dense (every feature is present) vs. sparse (most features are zero)
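The dense/sparse distinction is just a storage choice for the same vector. A minimal sketch with a made-up 10-dimensional vector:

```python
# Dense storage: every dimension is written out, zeros included.
dense = [0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]

# Sparse storage: keep only the nonzero dimensions as {index: value}.
sparse = {i: v for i, v in enumerate(dense) if v != 0.0}
print(sparse)  # {1: 1.0, 5: 2.0}

# The dense form is fully recoverable from the sparse one.
restored = [sparse.get(i, 0.0) for i in range(len(dense))]
assert restored == dense
```

For bag-of-words vectors with huge dictionaries, the sparse form is the only practical one.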

Mini-Quiz

✤ T/F: Every traditional CS problem is also an ML problem. FALSE

✤ T/F: Image features are always dense. FALSE

✤ T/F: The feature space can be very high dimensional. TRUE

✤ T/F: Bag of words features are sparse. TRUE
