
Page 1

Lecture 5: Statistical Methods for Classification

CAP 5415: Computer Vision, Fall 2006

Page 2

Classifiers: The Swiss Army Tool of Vision

A HUGE number of vision problems can be reduced to: Is this a _____ or not?

The next two lectures will focus on making that decision.

Classifiers that we will cover:
Bayesian classification
Logistic regression
Boosting
Support Vector Machines
Nearest-Neighbor Classifiers

Page 3

Motivating Problem

Which pixels in this image are “skin pixels”?

Useful for tracking, finding people, finding images with too much skin.

Page 4

How could you find skin pixels?

Step 1: Get Data

Label every pixel as skin or not skin

Page 5

Getting Probabilities

Now that I have a bunch of examples, I can create probability distributions.

P([r,g,b] | skin) = probability of an [r,g,b] tuple given that the pixel is skin
P([r,g,b] | ~skin) = probability of an [r,g,b] tuple given that the pixel is not skin
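As a rough sketch of how these two distributions could be estimated from the labeled pixels (the bin count and all names here are assumptions, not details from the lecture):

```python
import numpy as np

def color_histograms(pixels, labels, bins=32):
    """Estimate P([r,g,b] | skin) and P([r,g,b] | ~skin) with 3D color histograms.

    pixels: (N, 3) array of [r, g, b] values in 0-255
    labels: (N,) boolean array, True where the pixel was labeled as skin
    bins:   histogram bins per color channel (32 is an arbitrary choice here)
    """
    edges = [np.linspace(0, 256, bins + 1)] * 3
    h_skin, _ = np.histogramdd(pixels[labels], bins=edges)
    h_not, _ = np.histogramdd(pixels[~labels], bins=edges)
    # Normalize the counts into probability distributions over color bins.
    return h_skin / h_skin.sum(), h_not / h_not.sum()
```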

Page 6

(From Jones and Rehg)

Page 7

Using Bayes Rule

x – the observation
y – some underlying cause (skin / not skin)

Bayes rule: P(y | x) = P(x | y) P(y) / P(x)

Page 8

Using Bayes Rule

In this rule, P(x | y) is the likelihood, P(y) is the prior, and P(x) is the normalizing constant.

Page 9

Classification

In this case, P(skin | x) = 1 - P(~skin | x), so the classifier reduces to the test

P(skin | x) > 0.5

We can change this to P(skin | x) > c and vary c.
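A minimal sketch of this decision rule, assuming the histogram estimates from the earlier sketch; the prior value 0.1 and all names are illustrative, not numbers from the lecture:

```python
def rgb_to_bin(rgb, bins=32):
    # Map an [r, g, b] pixel (0-255 per channel) to its histogram bin index.
    step = 256 // bins
    return (rgb[0] // step, rgb[1] // step, rgb[2] // step)

def p_skin_given_rgb(rgb_bin, p_rgb_given_skin, p_rgb_given_not_skin, p_skin=0.1):
    """Posterior P(skin | rgb) from Bayes rule for one color bin.

    p_skin is the prior probability that a pixel is skin; 0.1 is an
    illustrative value only.
    """
    num = p_rgb_given_skin[rgb_bin] * p_skin
    den = num + p_rgb_given_not_skin[rgb_bin] * (1.0 - p_skin)
    return num / den if den > 0 else 0.0

def is_skin(rgb, p_rgb_given_skin, p_rgb_given_not_skin, c=0.5):
    # Declare the pixel "skin" when the posterior exceeds the threshold c.
    b = rgb_to_bin(rgb)
    return p_skin_given_rgb(b, p_rgb_given_skin, p_rgb_given_not_skin) > c
```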

Page 10

The effect of varying c

This is called a Receiver Operating Characteristic (ROC) curve.

From Jones and Rehg
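One way to trace out such a curve is to sweep c over held-out labeled pixels and record the detection rate against the false-positive rate. A sketch, with assumed inputs:

```python
import numpy as np

def roc_points(posteriors, labels, num_thresholds=101):
    """Sweep the threshold c and collect (false-positive rate, detection rate) pairs.

    posteriors: (N,) array of P(skin | x) for held-out labeled pixels
    labels:     (N,) boolean array of ground-truth skin labels
    """
    points = []
    for c in np.linspace(0.0, 1.0, num_thresholds):
        predicted = posteriors > c
        tpr = np.sum(predicted & labels) / max(labels.sum(), 1)      # detection rate
        fpr = np.sum(predicted & ~labels) / max((~labels).sum(), 1)  # false-positive rate
        points.append((fpr, tpr))
    return points
```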

Page 11

Application: Finding Adult Pictures

Let's say you needed to build a web filter for a library

Could look at a few simple measurements based on the skin model
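As one illustrative measurement (an assumption here, not necessarily what the original system computed), the fraction of pixels that the skin model labels as skin:

```python
import numpy as np

def skin_fraction(posterior_map, c=0.5):
    """Fraction of image pixels whose skin posterior exceeds the threshold c.

    posterior_map: (H, W) array of per-pixel P(skin | pixel) values.
    An image-level filter could flag images whose skin fraction exceeds
    a cutoff tuned on labeled training images.
    """
    return float(np.mean(posterior_map > c))
```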

Page 12

Example of Misclassified Image

Page 13

Example of Correctly Classified Image

Page 14

ROC Curve

Page 15

Generative versus Discriminative Models

The classifier that I have just described is known as a generative model

Once you know all of the probabilities, you can generate new samples of the data

That may be too much work.

You could also optimize a function to just discriminate skin and not skin.

Page 16

Discriminative Classification using Logistic Regression

Imagine we had two measurements and we plotted each sample on a 2D chart

Page 17

Discriminative Classification using Logistic Regression

Imagine we had two measurements and we plotted each sample on a 2D chart

To separate the two groups, we'll project each point onto a line

Some points will be projected to positive values and some will be projected to negative values

Page 18

Discriminative Classification using Logistic Regression

This defines a separating line: each point is classified based on where its projection falls on the line.

Page 19

How do we get the line?

Common Option: Logistic Regression

Logistic Function: g(x) = 1 / (1 + e^(-x))

Page 20

The logistic function

Notice that g(x) goes from 0 to 1. We can use this to estimate the probability of something being an x or an o. We need to find a function that will have large positive values for x's and large negative values for o's.
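As a small illustration of that mapping (the function and variable names are mine, not the lecture's), the probability estimate for a feature vector x under a weight vector w:

```python
import numpy as np

def logistic(t):
    # g(t) = 1 / (1 + exp(-t)), which ranges from 0 to 1.
    return 1.0 / (1.0 + np.exp(-t))

def p_positive(x, w):
    """Estimated probability that feature vector x belongs to the 'x' (+1) class.

    A large positive score w . x gives a probability near 1 (an x),
    and a large negative score gives a probability near 0 (an o).
    """
    return logistic(np.dot(w, x))
```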

Page 21

Fitting the Line

Remember, we want a line. For the diagram below, x = +1 and o = -1; y = label of the point (-1 or +1).

Page 22

Fitting the line

The logistic function gives us an estimate of the probability of an example being either +1 or -1

We can fit the line by maximizing the conditional probability of the correct labeling of the training set

The measurements that make up each example are also called features.

Page 23

Fitting the Line

We have multiple samples that we assume are independent, so the probability of the whole training set is the product of the per-example probabilities. With labels y_i in {-1, +1}, this is the standard logistic-regression likelihood:

P(all labels | all features) = ∏_i g(y_i w·x_i)

Page 24

Fitting the line

It is usually easier to optimize the log conditional probability:

L(w) = Σ_i log g(y_i w·x_i)

Page 25

Optimizing

Lots of options. Easiest option: gradient ascent

w ← w + η ∇L(w)

η: the learning-rate parameter; many ways to choose this
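A minimal sketch of this update for the log likelihood above, assuming labels in {-1, +1}; the learning rate and step count are illustrative, and the gradient formula follows from differentiating L(w):

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(w, X, y):
    # Log conditional probability of the labels y (in {-1, +1}) given features X.
    return np.sum(np.log(logistic(y * (X @ w))))

def gradient_ascent(X, y, learning_rate=0.1, num_steps=1000):
    """Fit the weight vector w by gradient ascent on the log likelihood.

    X: (N, D) feature matrix, y: (N,) labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(num_steps):
        # d/dw sum_i log g(y_i w.x_i) = sum_i (1 - g(y_i w.x_i)) y_i x_i
        g = logistic(y * (X @ w))
        w = w + learning_rate * (X.T @ ((1.0 - g) * y))
    return w
```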

Page 26

Choosing η

My (current) personal favorite method:
Choose some value for η
Update w and compute the new probability
If the new probability does not rise, divide η by 2
Otherwise multiply it by 1.1 (or something similar)

Called the “Bold-Driver” heuristic
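A sketch of this heuristic wrapped around the gradient-ascent code above. It reuses logistic() and log_likelihood() from the previous sketch; rejecting the failed step (rather than only shrinking η) is a common variant and an assumption here, not something stated on the slide:

```python
import numpy as np

def bold_driver_ascent(X, y, eta=0.1, num_steps=1000):
    """Gradient ascent with the "Bold-Driver" learning-rate heuristic.

    If a step does not raise the log likelihood, eta is halved (and the
    step is rejected here); otherwise eta is multiplied by 1.1.
    """
    w = np.zeros(X.shape[1])
    best = log_likelihood(w, X, y)
    for _ in range(num_steps):
        g = logistic(y * (X @ w))
        candidate = w + eta * (X.T @ ((1.0 - g) * y))
        new_ll = log_likelihood(candidate, X, y)
        if new_ll > best:
            w, best = candidate, new_ll
            eta *= 1.1   # the step helped: grow the learning rate
        else:
            eta /= 2.0   # the step hurt: shrink the rate and retry from old w
    return w
```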

Page 27

Faster Option

Computing the gradient requires summing over every training example

Could be slow for a large training set.

Speed-up: Stochastic Gradient Ascent
Instead of computing the gradient over the whole training set, choose one point at random.
Do the update based on that one point.
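A self-contained sketch of the stochastic variant; the learning rate, step count, and seed are illustrative values:

```python
import numpy as np

def stochastic_gradient_ascent(X, y, learning_rate=0.01, num_steps=10000, seed=0):
    """Logistic-regression fit where each update uses one randomly chosen example.

    X: (N, D) feature matrix, y: (N,) labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(num_steps):
        i = rng.integers(len(y))                      # pick one training point at random
        g = 1.0 / (1.0 + np.exp(-y[i] * (X[i] @ w)))  # g(y_i w.x_i)
        w = w + learning_rate * (1.0 - g) * y[i] * X[i]
    return w
```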

Page 28

Limitations

Remember, we are only separating the two classes with a line

Separate this data with a line:

This is a fundamental problem: most things can't be separated by a line.

Page 29

Overcoming these limitations

Two options:
Train on a more complicated function (quadratic, cubic)
Make a new set of features (a sketch follows below)
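A sketch of the second option for two measurements; the particular quadratic terms chosen here are an illustration, not the lecture's exact feature set:

```python
import numpy as np

def quadratic_features(x):
    """Map a 2D measurement [x1, x2] into a quadratic feature vector.

    A linear classifier trained on this longer vector can carve out curved
    (quadratic) decision boundaries in the original 2D measurement space.
    """
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])
```

The learning code from the earlier sketches runs unchanged on these expanded vectors; only the feature-generation step differs.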

Page 30

Advantages

We achieve non-linear classification by doing linear classification on non-linear transformations of the features

Only have to rewrite the feature-generation code; the learning code stays the same.

Page 31

Nearest Neighbor Classifier

Is the “?” an x or an o?



Page 34

Basic idea

For your new example, find the k nearest neighbors in the training set

Each neighbor casts a vote; the label with the most votes wins.

Disadvantages:
Have to find the nearest neighbors
Can be slow for a large training set
Good approximate methods available (LSH – Indyk)
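A brute-force sketch of this voting rule; Euclidean distance, k = 3, and the function names are assumptions, and LSH-style approximate search would replace the full scan for large training sets:

```python
import numpy as np

def knn_classify(query, X_train, y_train, k=3):
    """Label a query point by a majority vote among its k nearest training points.

    query:   (D,) feature vector
    X_train: (N, D) training features
    y_train: (N,) training labels, e.g. +1 for 'x' and -1 for 'o'
    """
    # Brute force: compute the distance from the query to every training example.
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = y_train[nearest]
    # With labels in {-1, +1}, the sign of the vote total gives the majority label.
    return 1 if votes.sum() >= 0 else -1
```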