Image Processing — Carnegie Mellon University, 16-720 (Deva Ramanan)
Source: 16720.courses.cs.cmu.edu/lec/convolution_lec1.pdf


Image Processing

Outline

• Logistics

• Motivation

• Convolution

• Filtering

Waitlist

• We are at 103 enrolled with 158 students on wait list. This room holds 107.

• I’m getting numerous requests of the form “how likely is it that I’ll get registered?” The honest answer: unlikely :(

• If you are considering dropping, please do so quickly

Some final class philosophies

• The diverse background of the class means some topics will be redundant for some of you and new to others (e.g., EE folks might be bored by today’s signal processing)

• I think one-way lectures are boring (and that content can easily be found elsewhere). Discussions are way more fun! I encourage you to come to class.

• I hate PowerPoint. I’d rather write on the board, but this room is not conducive to it. I still encourage you to take notes.

• If you are going to come and check e-mail / Facebook, I’d rather you drop now to make room for someone else who’d get more out of lecture.

Outline

• Logistics

• Motivation

• Convolution

• Filtering

(Slide credit: Fei-Fei Li & Andrej Karpathy)

David Marr, 1970s

David Marr, 1982

Computational perspective: credited with an early computational approach to vision

David Marr

Low-level Mid-level High-level

Low-level vision

Finding edges, blobs, bars, etc….

Consider family of low-level image processing operations

Photoshop / Instagram filters: blur, sharpen, colorize, etc.

Are certain combinations redundant? Is there a mathematical way to characterize them?

Recall: what is a digital (grayscale) image?

Matrix of integer values

Let’s think of an image as a zero-padded function

Images as height fields

F[i,j]

Characterizing image transformations

F[i,j] → T → G[i,j]    (2D)
F[i]   → T → G[i]      (1D)

G[i] = T(F[i])

Linearity:         T(αF1 + βF2) = αG1 + βG2
Shift invariance:  G[i − j] = T(F[i − j])

(Abuse of notation: [i] does not mean the transformation is applied at each pixel separately; compactly, G = T(F).)

Example signal: 5 4 2 3 7 4 6 5 3 6

How do we characterize image processing operations?

Properties of “nice” functional transformations

• Additivity:       T(F1 + F2) = T(F1) + T(F2)
• Scaling:          T(αF) = αT(F)
• Shift invariance: G[i − j] = T(F[i − j])

Direct consequence: linearity, T(αF1 + βF2) = αG1 + βG2
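A minimal numerical sketch of these properties in MATLAB (the filter H and the test signals below are arbitrary choices, not from the lecture):

H  = [1 2 1] / 4;
T  = @(F) conv(F, H, 'same');          % candidate transformation T
F1 = [5 4 2 3 7 4 6 5 3 6];
F2 = [1 0 2 0 3 0 4 0 5 0];
a = 2;  b = -3;
% Linearity: T(a*F1 + b*F2) should equal a*T(F1) + b*T(F2)
max(abs(T(a*F1 + b*F2) - (a*T(F1) + b*T(F2))))   % 0 here (exact integer/quarter arithmetic)
% Shift invariance: shifting the zero-padded input shifts the output
F  = [0 0 F1 0 0];
G  = T(F);
Gs = T(circshift(F, 1));               % input shifted right by one sample
isequal(Gs(2:end), G(1:end-1))         % true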

Impulse response: δ[i] = 1 for i = 0, and 0 otherwise

[also called the delta function]

What does this look like for an image?

Any function can be written as a linear combination of shifted and scaled impulse responses:

[Figure: a signal decomposed into a sum of shifted, scaled impulses]

Figure 1: Staircase approximation to a continuous-time signal.

Representing signals with impulses. Any signal can be expressed as a sum of scaled and shifted unit impulses. We begin with the pulse or “staircase” approximation to a continuous signal, as illustrated in Fig. 1. Conceptually, this is trivial: for each discrete sample of the original signal, we make a pulse signal. Then we add up all these pulse signals to make up the approximate signal. Each of these pulse signals can in turn be represented as a standard pulse scaled by the appropriate value and shifted to the appropriate place. In mathematical notation, the approximation is a sum of standard pulses, each scaled by a sample value and shifted to the sample location.

As we let the pulse width Δ approach zero, the approximation becomes better and better, and in the limit it equals the original signal. Also, as Δ → 0, the summation approaches an integral and the pulse approaches the unit impulse:

x(t) = ∫ x(τ) δ(t − τ) dτ        (1)

In other words, we can represent any signal as an infinite sum of shifted and scaled unit impulses. A digital compact disc, for example, stores whole complex pieces of music as lots of simple numbers representing very short impulses, and then the CD player adds all the impulses back together one after another to recreate the complex musical waveform.

This no doubt seems like a lot of trouble to go to, just to get back the same signal that we originally started with, but in fact, we will very shortly be able to use Eq. 1 to perform a marvelous trick.

Linear Systems

A system or transform maps an input signal F into an output signal G, written G = T(F), where T denotes the transform, a function from input signals to output signals.

Systems come in a wide variety of types. One important class is known as linear systems. To see whether a system is linear, we need to test whether it obeys certain rules that all linear systems obey. The two basic tests of linearity are homogeneity and additivity.

F[i] = ?

F[i] = F[0] δ[i] + F[1] δ[i − 1] + …

F[i] = Σ_u F[u] δ[i − u]

Applying T and using additivity and scaling:

T(F[i]) = Σ_u F[u] T(δ[i − u])

By shift invariance, T(δ[i − u]) = H[i − u], so

G[i] = Σ_u F[u] H[i − u],   where H[i] = T(δ[i]) and G[i] = T(F[i])

G = F ∗ H
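A small sketch of this derivation in code: superposing shifted, scaled copies of the impulse response reproduces exactly what conv computes (F and H are the lecture's running example; the loop itself is illustrative):

F = [5 4 2 3 7 4 6 5 3 6];
H = [1 2 3];                              % impulse response H[i] = T(delta[i])
G = zeros(1, numel(F) + numel(H) - 1);
for u = 1:numel(F)
    G(u:u+numel(H)-1) = G(u:u+numel(H)-1) + F(u) * H;   % add F[u] * (shifted H)
end
isequal(G, conv(F, H))                    % true: this superposition is convolution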

Convolution


G[i] = Σ_u F[u] H[i − u],   where H[i] = T(δ[i]) is called the impulse response, filter, or kernel

G = F ∗ H

Example

F = 5 4 2 3 7 4 6 5 3 6,   H = 1 2 3,   G = F ∗ H

1D convolution:
G[i] = F[i] ∗ H[i] = Σ_u F[u] H[i − u]
     = H[i] ∗ F[i] = Σ_u H[u] F[i − u]

1D cross-correlation:
G[i] = F[i] ⊗ H[i] = Σ_u H[u] F[i + u] = F[i] ∗ H[−i]

2D convolution:
G[i,j] = F ∗ H = Σ_u Σ_v F[u,v] H[i − u, j − v]
G[i,j] = F ∗ H = H ∗ F = Σ_u Σ_v H[u,v] F[i − u, j − v]

2D cross-correlation (finite-support filter of radius k):
G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v] F[i + u, j + v] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v]

Positions: F at indices 0 … 9;  H at indices 0 1 2

G[0] = ?   G[1] = ?

Example:  F = 5 4 2 3 7 4 6 5 3 6,  H = 1 2 3  (flipped: 3 2 1)

G[0] = 5·1 = 5
G[1] = 5·2 + 4·1 = 14
G[2] = 5·3 + 4·2 + 2·1 = 25
…

(slide the flipped filter 3 2 1 across positions −3 … 9)

Preview of 2D

[Figure: image f filtered with kernel h]

Properties of convolution

• Commutative:  F ∗ H = H ∗ F
• Associative:  (F ∗ H) ∗ G = F ∗ (H ∗ G)
• Distributive: (F ∗ G) + (H ∗ G) = (F + H) ∗ G

Implies that we can efficiently implement complex operations.

Powerful way to think about any image transformation that satisfies additivity, scaling, and shift invariance.
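A quick sketch of associativity in practice: two small kernels can be combined into one before filtering a large image. The image and kernels below are arbitrary stand-ins:

F  = rand(480, 640);                             % stand-in image
H1 = ones(3) / 9;                                % box blur
H2 = [0 -1 0; -1 5 -1; 0 -1 0];                  % a sharpening-style kernel (example)
G1 = conv2(conv2(F, H1, 'full'), H2, 'full');    % (F * H1) * H2
G2 = conv2(F, conv2(H1, H2, 'full'), 'full');    % F * (H1 * H2): one combined filter
max(abs(G1(:) - G2(:)))                          % ~1e-15, equal up to round-off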

Proof: commutativity

H ∗ F = Σ_u H[u] F[i − u]
      = Σ_{u′} H[i − u′] F[u′]    (substituting u′ = i − u)
      = Σ_u F[u] H[i − u] = F ∗ H

Conceptually wacky: this allows us to interchange the filter and the image.

Size

Given F of length N and H of length M, what's the size of G = F ∗ H?

>> conv(F,H,'full')     % length N+M-1
>> conv(F,H,'valid')    % length N-M+1
>> conv(F,H,'same')     % length N
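Checking the three output sizes on the running example (a sketch using base MATLAB):

F = [5 4 2 3 7 4 6 5 3 6];    % N = 10
H = [1 2 3];                  % M = 3
numel(conv(F, H, 'full'))     % 12 = N + M - 1
numel(conv(F, H, 'valid'))    % 8  = N - M + 1
numel(conv(F, H, 'same'))     % 10 = N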

A simpler approach

F = 5 4 2 3 7 4 6 5 3 6 at positions 0 … 9;  H = 1 2 3 at positions −1, 0, 1

Correlation:  G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v] F[i + u, j + v]
Convolution:  G[i,j] = F ∗ H = Σ_u Σ_v H[u,v] F[i − u, j − v]

Scan the original F instead of the flipped version. What's the math?

(Cross) correlation

F = 5 4 2 3 7 4 6 5 3 6 at positions 0 … 9;  H = 1 2 3 at positions −1, 0, 1

Scan the original F instead of the flipped version:

G[i] = F[i] ⊗ H[i] = Σ_{u=−k}^{k} H[u] F[i + u]

Properties

The associative and commutative properties do not hold for correlation

… but correlation is easier to think about
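A one-line sketch of the relationship: correlation is convolution with the flipped filter (H is the running 1-2-3 example, centered at position 0):

F = [5 4 2 3 7 4 6 5 3 6];
H = [1 2 3];                           % taps at positions -1, 0, 1
G = conv(F, fliplr(H), 'same');        % F (x) H  ==  F * H[-i]
G(2)                                   % 1*F(1) + 2*F(2) + 3*F(3) = 19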

Convolution vs correlation (1-d)

Convolution:
G[i] = F[i] ∗ H[i] = Σ_u F[u] H[i − u]
     = H[i] ∗ F[i] = Σ_u H[u] F[i − u]    (commutative property)

Cross-correlation:
G[i] = F[i] ⊗ H[i] = Σ_u H[u] F[i + u]
     = F[i] ∗ H[−i]                        (exercise for reader!)

2D correlation

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

0 10 20 30 30 30 20 10

0 20 40 60 60 60 40 20

0 30 60 90 90 90 60 30

0 30 50 80 80 90 60 30

0 30 50 80 80 90 60 30

0 20 30 50 50 60 40 20

10 20 30 30 30 30 20 10

10 10 10 0 0 0 0 0

f[·,·] (input, above left)    g[·,·] (output, above right)

Image filtering with a 3×3 averaging (box) filter, h[·,·] = (1/9) · [1 1 1; 1 1 1; 1 1 1]:

g[m,n] = Σ_{k,l} h[k,l] f[m + k, n + l]

Credit: S. Seitz
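A sketch that reproduces the 8×8 output table above with filter2 (correlation, 'valid' shape); only base MATLAB is assumed:

f = zeros(10);
f(3:7, 4:8) = 90;   f(6,5) = 0;   f(9,3) = 90;    % the input image above
h = ones(3) / 9;                                   % 3x3 averaging filter
g = filter2(h, f, 'valid');                        % 8x8 result matching the table
round(g)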

Gaussian filtering

A Gaussian kernel gives less weight to pixels further from the center of the window

This kernel is an approximation of a Gaussian function:

(same input image f[·,·] as above)

(1/16) · [1 2 1; 2 4 2; 1 2 1]

Slide by Steve Seitz

G[i,j] = F ⊗ H = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v]
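Applying the 3×3 Gaussian approximation to the same input, as a sketch (filter2 again; since the kernel is symmetric, correlation and convolution give the same result):

f = zeros(10);  f(3:7, 4:8) = 90;  f(6,5) = 0;  f(9,3) = 90;
h = [1 2 1; 2 4 2; 1 2 1] / 16;    % Gaussian approximation
g = filter2(h, f, 'same');         % center-weighted average of each neighborhood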

Convolution vs correlation (2-d)

>> conv2(H,F)      % convolution
>> filter2(H,F)    % correlation

Convolution:   G[i,j] = F ∗ H = H ∗ F = Σ_u Σ_v H[u,v] F[i − u, j − v]
Correlation:   G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v] F[i + u, j + v]
               (for a finite-support filter of radius k: Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v])

Can we compute correlation with convolution?
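One answer, sketched in code: rotate the filter by 180° and convolve; filter2 and conv2 then agree exactly (the asymmetric filter below is an arbitrary example):

F = magic(6);                          % stand-in image
H = [1 2 3; 4 5 6; 7 8 9];             % deliberately asymmetric filter
C1 = filter2(H, F, 'same');            % correlation
C2 = conv2(F, rot90(H, 2), 'same');    % convolution with the flipped filter
max(abs(C1(:) - C2(:)))                % 0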

[Figure: image f and filter h]

Border effects

Annoying details: what is the size of the output?

• MATLAB: filter2(g, f, shape)
• shape = 'full':  output size is the sum of the sizes of f and g
• shape = 'same':  output size is the same as f
• shape = 'valid': output size is the difference of the sizes of f and g

[Figure: the filter g placed at the corners of image f for the 'full', 'same', and 'valid' cases]

Border padding

From Szeliski, Computer Vision, 2010
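A sketch of common padding choices, assuming the Image Processing Toolbox's imfilter (the matrix f is a stand-in image):

f = magic(8);
h = ones(3) / 9;
g_zero = imfilter(f, h, 0);             % pad with zeros (the default)
g_rep  = imfilter(f, h, 'replicate');   % repeat the edge values outward
g_sym  = imfilter(f, h, 'symmetric');   % mirror the image across the border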

Examples of correlation — practice with linear filters (Source: D. Lowe)

Original ⊗ (1/9) · [1 1 1; 1 1 1; 1 1 1]  →  blur (with a mean filter)

Original ⊗ [0 0 0; 0 1 0; 0 0 0]  →  filtered image: no change (the unit impulse)

Original ⊗ [0 0 0; 0 0 1; 0 0 0]  →  ?  →  shifted left by 1 pixel. What would this look like for convolution?

Original ⊗ (1/16) · [1 2 1; 2 4 2; 1 2 1]  →  Gaussian blur. What would this look like for convolution?

Sharpening ("unsharp") filter:

[0 0 0; 0 2 0; 0 0 0] − (1/16) · [1 2 1; 2 4 2; 1 2 1]
  = [0 0 0; 0 1 0; 0 0 0] + ( [0 0 0; 0 1 0; 0 0 0] − (1/16) · [1 2 1; 2 4 2; 1 2 1] )

i.e., sharpened image = image + (image − blurred image): the unit impulse is the identity, the Gaussian gives the blurred image, and the scaled impulse minus the Gaussian acts like a Laplacian-of-Gaussian detail filter.
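A sketch that builds the sharpening kernel above and applies it; cameraman.tif is the sample image shipped with the Image Processing Toolbox, an arbitrary choice here:

gauss3  = [1 2 1; 2 4 2; 1 2 1] / 16;
impulse = [0 0 0; 0 1 0; 0 0 0];
sharpen = 2*impulse - gauss3;                    % = impulse + (impulse - gauss3)
f = im2double(imread('cameraman.tif'));
g = imfilter(f, sharpen, 'replicate');
% Equivalently: g = f + (f - imfilter(f, gauss3, 'replicate'));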

Example: image rotation

Is there a kernel h[m,n] such that g[m,n] = f[m,n] ∗ h[m,n] rotates the image?  [kernel: ? ? ?; ? ? ?; ? ? ?]

Can rotations be represented with a convolution? Are they linear shift-invariant (LSI) operations G[i,j] = T(F[i,j])?

Rotation is linear, but it is not a spatially invariant operation, so it is not a convolution.

Derivative filters (correlation)

Horizontal: [−1 1]        Vertical: [−1; 1]
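A sketch applying these two finite-difference filters by correlation (cameraman.tif and imshowpair assume the Image Processing Toolbox):

f  = im2double(imread('cameraman.tif'));
dx = filter2([-1 1],  f, 'same');    % horizontal differences (responds to vertical edges)
dy = filter2([-1; 1], f, 'same');    % vertical differences (responds to horizontal edges)
imshowpair(dx, dy, 'montage');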

Question: what happens as we repeatedly convolve an image F with filter H?

[Figure: the image F and the result F ∗ H of repeated filtering]

Aside for the probability junkies: the PDF of the sum of two independent random variables is the convolution of their PDFs. Repeated convolutions ⇒ repeated sums ⇒ the central limit theorem (CLT), so repeated filtering tends toward a Gaussian.
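A sketch of that CLT intuition: convolving a box filter with itself a few times already produces a bell-shaped kernel (the filter length and iteration count are arbitrary):

h = ones(1, 5) / 5;          % 1-D box filter
g = h;
for t = 1:3
    g = conv(g, h);          % convolve the filter with itself a few times
end
plot(g, 'o-');               % the result is already close to a bell curve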

Gaussian

(1/16) · [1 2 1; 2 4 2; 1 2 1]

Gaussian filters

[Figure: Gaussian-blurred results for σ = 1, 5, 10, and 30 pixels]

Implementation


Gaussian Kernel

• Standard deviation σ: determines extent of smoothing

Source: K. Grauman

σ = 2 with 30 x 30 kernel

σ = 5 with 30 x 30 kernel

MATLAB: >> G = fspecial('gaussian', HSIZE, SIGMA)

(1/16) · [1 2 1; 2 4 2; 1 2 1]
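A usage sketch for the kernel sizes quoted above (fspecial, imfilter, imshowpair, and cameraman.tif are from the Image Processing Toolbox):

f  = im2double(imread('cameraman.tif'));
G2 = fspecial('gaussian', 30, 2);     % sigma = 2 with a 30 x 30 kernel
G5 = fspecial('gaussian', 30, 5);     % sigma = 5 with a 30 x 30 kernel
imshowpair(imfilter(f, G2, 'replicate'), imfilter(f, G5, 'replicate'), 'montage');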

Finite-support filters


Choosing kernel width

• The Gaussian function has infinite support, but discrete filters use finite kernels

Source: K. Grauman

What should HSIZE be?

Rule of thumb

Set the radius of the filter to 3σ (so HSIZE ≈ 6σ + 1).
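The rule of thumb in code (a sketch; fspecial is from the Image Processing Toolbox):

sigma  = 5;
radius = ceil(3 * sigma);
hsize  = 2 * radius + 1;                     % 31 for sigma = 5
G = fspecial('gaussian', hsize, sigma);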

Useful representation: Gaussian pyramid

Figure 1: Gaussian pyramid. Depicted are four levels of the Gaussian pyramid, levels 0 to 3, presented from left to right.

[2] P.J. Burt. Fast filter transforms for image processing. Computer Graphics and Image Processing, 1981.

[3] P.J. Burt. Fast algorithms for estimating local image properties. Computer Graphics and Image Processing, 1983.

[4] P.J. Burt and E.H. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communication, 31(4):532–540, April 1983.

[5] L.I. Larkin and P.J. Burt. Multi-resolution texture energy measures. In IEEE Conference on Computer Vision and Pattern Recognition, 1983.


Filter + subsample (to exploit redundancy in output)

Burt & Adelson 83 — http://persci.mit.edu/pub_pdfs/pyramid83.pdf
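A sketch of the filter-and-subsample recipe for a small Gaussian pyramid (the 5×5, σ = 1 kernel is an arbitrary choice; imfilter, fspecial, and impyramid are from the Image Processing Toolbox):

f   = im2double(imread('cameraman.tif'));
pyr = {f};
for level = 2:4
    blurred    = imfilter(pyr{level-1}, fspecial('gaussian', 5, 1), 'replicate');
    pyr{level} = blurred(1:2:end, 1:2:end);  % keep every other row and column
end
% impyramid(pyr{1}, 'reduce') performs one such filter-and-subsample step.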

Smoothing vs edge filters

[Figure: Gaussian filters with σ = 1, 5, 10, and 30 pixels]

How should filters behave on a flat region with value 'v'?

Smoothing filter: output 'v', which requires Σ_{i,j} H[i,j] = 1
Edge filter:      output 0,   which requires Σ_{i,j} H[i,j] = 0

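A quick sketch checking the flat-region behavior of the two filter families (fspecial assumes the Image Processing Toolbox; the kernel sizes are arbitrary):

smooth_h = fspecial('gaussian', 7, 1);
edge_h   = [-1 1];
sum(smooth_h(:))                                      % 1: a flat region of value v stays at v
sum(edge_h(:))                                        % 0: a flat region maps to 0
flat = 10 * ones(20);
unique(round(filter2(smooth_h, flat, 'valid'), 6))    % 10
unique(filter2(edge_h, flat, 'valid'))                % 0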

Template matching with filters

Goal: find the template (an eye patch) in the image.

Main challenge: what is a good similarity or distance measure between two patches?
• Correlation
• Zero-mean correlation
• Sum of squared differences (SSD)
• Normalized cross-correlation

Slide by Derek Hoiem

Can we use filtering to build detectors?

H[i,j]

F[i,j]

Attempt 1: correlate with the eye patch

Method 0: filter the image with the eye patch.

[Figure: input image and filtered image]

h[m,n] = Σ_{k,l} g[k,l] f[m + k, n + l],   where f = image and g = filter

What went wrong?

Slide by Derek Hoiem

Viewing correlation as a dot product between the filter and each image patch:

G[i,j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i+u, j+v] = Hᵀ F_ij = ||H|| ||F_ij|| cos θ_ij,   where H, F_ij ∈ R^((2k+1)²)

A useful way to think about correlation and convolution.

[Figure: vectors H and F_ij separated by angle θ_ij]
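A sketch of Method 0. The template here is simply a patch cropped from cameraman.tif at arbitrary coordinates, standing in for the eye patch; the point is that raw correlation scores track image brightness:

f = im2double(imread('cameraman.tif'));
H = f(60:80, 100:120);                   % stand-in template (arbitrary crop)
R = filter2(H, f, 'same');               % raw correlation scores
imagesc(R); axis image; colorbar;        % bright image regions dominate the map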

Attempt 1.5: correlate with a transformed eye patch

Let's transform the filter such that its response on a flat region is 0.

Matching with filters — Method 1: filter the image with a zero-mean eye patch.

[Figure: input, filtered image (scaled), thresholded image — true detections and false detections]

Subtract the template's mean before correlating:

Attempt 1.5: correlate with the zero-mean eye patch

G[i,j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} (H[u,v] − H̄) F[i+u, j+v]
       = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i+u, j+v]  −  H̄ Σ_{u=−k}^{k} Σ_{v=−k}^{k} F[i+u, j+v]

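The zero-mean variant, as a sketch (same stand-in template as before; the 0.9 threshold is purely illustrative):

f  = im2double(imread('cameraman.tif'));
H  = f(60:80, 100:120);                  % stand-in template
Hz = H - mean(H(:));                     % subtract the template mean
Rz = filter2(Hz, f, 'same');             % flat regions now score near zero
detections = Rz > 0.9 * max(Rz(:));      % crude threshold, for illustration only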

Attempt 2: SSD (Method 2: sum of squared differences)

[Figure: input, 1 − sqrt(SSD), thresholded image — true detections]

h[m,n] = Σ_{k,l} (g[k,l] − f[m + k, n + l])²

Can this be implemented with filtering?

SSD[i,j] = ||H − F_ij||² = (H − F_ij)ᵀ(H − F_ij)

Expanding, SSD[i,j] = HᵀH − 2 HᵀF_ij + F_ijᵀF_ij: the middle term is a correlation with H, and the last term is a box filter applied to F².
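A sketch of SSD implemented with filtering, using that expansion (same stand-in template; borders are zero-padded by filter2):

f   = im2double(imread('cameraman.tif'));
H   = f(60:80, 100:120);                 % stand-in template
box = ones(size(H));                     % sums the values inside each window
ssd = sum(H(:).^2) - 2 * filter2(H, f, 'same') + filter2(box, f.^2, 'same');
% Low SSD means a good match.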

What's the potential downside of SSD?

Slide by Derek Hoiem

What will SSD find here? (the eyes have been darkened by a 0.5 scale factor)

SSD will fire on the shirt.

Normalized cross-correlation

Matching with filters — Method 3: normalized cross-correlation.

[Figure: input, normalized cross-correlation, thresholded image — true detections]

Slide by Derek Hoiem

NCC[i,j] = Hᵀ F_ij / (||H|| ||F_ij||) = Hᵀ F_ij / ( sqrt(HᵀH) · sqrt(F_ijᵀ F_ij) ) = cos θ_ij,   where H and F_ij are mean-centered.
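A sketch using the built-in normxcorr2 from the Image Processing Toolbox (same stand-in template; normxcorr2 mean-centers and normalizes internally):

f = im2double(imread('cameraman.tif'));
H = f(60:80, 100:120);                   % stand-in template
R = normxcorr2(H, f);                    % normalized cross-correlation map
[~, idx] = max(R(:));
[ypeak, xpeak] = ind2sub(size(R), idx);
row = ypeak - size(H,1) + 1;             % top-left corner of the best match in f
col = xpeak - size(H,2) + 1;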

(From Serge Belongie's notes, CS 6670: Computer Vision)

One example of a nonlinear filter is the median filter, which replaces each pixel with the median value of a window of the image around it. Another example of a nonlinear filter is "Non-Local Means," which we describe next.

In Non-Local Means, for every pixel p we look for patches elsewhere in the image that look similar to the patch surrounding p. We then average this set of patches to determine the filtered value of p.

One nice feature of NL-Means is that it is "edge preserving," while other methods of smoothing/de-noising can result in blurry edges.

8.5. Looking ahead: modern applications of filter banks

The above approaches to filtering were largely hand designed. This is partly due to limitations in computing power and lack of access to large datasets in the 80s and 90s. In modern approaches to image recognition, the convolution kernels/filtering operations are often learned from huge amounts of training data.

In 1998 Yann LeCun created a Convolutional Network (named "LeNet") that could recognize hand-written digits using a sequence of filtering operations, subsampling, and assorted nonlinearities, the parameters of which were learned via stochastic gradient descent on a large, labeled training set. Rather than hand-selecting the filters to use, part of LeNet's training was to pick for itself the most effective set of filters. Modern ConvNets use basically the same structure as LeNet, but because of richer training sets and greater computing power we can recognize far more complex objects than handwritten digits (see, for example, GoogLeNet in 2014 and other submissions to the ImageNet Large-Scale Visual Recognition Challenge).

Modern filter banks: learn filters from training data to look for low-, mid-, and high-level features

Convolutional Neural Nets (CNNs), LeCun et al. 98


A look back

• Any linear shift-invariant operation can be characterized by a convolution
• Correlation (convolution) intuitively corresponds to matched filters (flipped matched filters)
• Derive filters from continuous operations (derivative, Gaussian, …)
• Contemporary application: convolutional neural networks
