Semi-supervised learning
COMP 875: Machine learning techniques in image analysis

Learning from both labeled and unlabeled data.

Motivation: labeled data may be hard or expensive to get, but unlabeled data is usually cheaply available in much greater quantity.
How can unlabeled data help?
Example: Text classification (Source: J. Zhu)
Classify astronomy vs. travel articles
Similarity measured by word overlap
When labeled data alone fails:
What if there are no overlapping words?
Unlabeled data as stepping stones:
Labels “propagate” via similar unlabeled articles
Another example (Source: J. Zhu)

Handwritten digit recognition with pixel-wise Euclidean distance.

(Figure: two digits that are not directly similar become indirectly similar through a chain of unlabeled “stepping stone” examples.)
Types of semi-supervised learning
Inductive learning: given a training set L of labeled data and U of unlabeled data, learn a predictor that can be applied to a brand-new unlabeled point not in U.

Transductive learning: given L and U, learn a predictor that can be applied only to U (i.e., the predictor cannot be easily extended to previously unseen data).
Simplest semi-supervised learning algorithm: Self-training (Source: J. Zhu)

Input: labeled data L and unlabeled data U.
Repeat:
1. Learn a predictor f from the labeled data L using supervised learning.
2. Apply f to the unlabeled instances in U.
3. Remove a subset from U and add that subset, with its inferred labels, to L.

How might we select this subset?
Advantages/disadvantages of this scheme?
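A minimal sketch of this loop, under the common heuristic of selecting the unlabeled instances on which f is most confident (the base learner, threshold, and use of scikit-learn are illustrative assumptions, not part of the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_iter=20):
    """Self-training: repeatedly move confidently labeled points from U to L."""
    for _ in range(max_iter):
        f = LogisticRegression().fit(X_l, y_l)   # 1. supervised learning on L
        if len(X_u) == 0:
            break
        proba = f.predict_proba(X_u)             # 2. apply f to U
        pick = proba.max(axis=1) >= threshold    # 3. choose a confident subset ...
        if not pick.any():
            break
        X_l = np.vstack([X_l, X_u[pick]])        # ... and add it, with its
        y_l = np.concatenate([y_l, f.classes_[proba[pick].argmax(axis=1)]])
        X_u = X_u[~pick]                         # inferred labels, to L
    return f
```

One disadvantage is visible in the sketch: any early mistake is added to L and reinforced in all later iterations.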
Self-training with nearest-neighbor classifier (Source: J. Zhu)

Input: labeled data L and unlabeled data U.
Repeat:
1. Find the unlabeled point x that is closest to a labeled point x′ and assign to x the label of x′.
2. Remove x from U; add it and its estimated label to L.
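A sketch of this propagation in plain NumPy (Euclidean distance, matching the digits example; all names here are illustrative):

```python
import numpy as np

def nn_propagate(X_l, y_l, X_u):
    """Label U one point at a time, always taking the point nearest to L."""
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    while X_u:
        # pairwise distances from each unlabeled point to each labeled point
        D = np.linalg.norm(np.asarray(X_u)[:, None, :] - np.asarray(X_l)[None, :, :],
                           axis=2)
        u, l = np.unravel_index(D.argmin(), D.shape)  # closest (x, x') pair
        X_l.append(X_u.pop(u))                        # move x from U to L
        y_l.append(y_l[l])                            # x inherits the label of x'
    return np.asarray(X_l), np.asarray(y_l)
```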
Propagating nearest-neighbor: Example (Source: J. Zhu)

(Figure: snapshots of the propagation at iteration 1, iteration 25, iteration 74, and the final labeling.)
Another simple approach: Cluster-and-label (Source: J. Zhu)

Input: labeled data L and unlabeled data U.
1. Cluster L ∪ U.
2. For each cluster, let S be the set of labeled instances in that cluster.
3. Learn a supervised predictor from S and apply it to all the unlabeled instances in that cluster.

What is the underlying assumption here?
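A minimal sketch, assuming k-means for the clustering step and a majority vote as the within-cluster predictor (both are illustrative choices; the example on the next slide uses hierarchical clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(X_l, y_l, X_u, n_clusters=2):
    """Cluster L ∪ U, then label each cluster by a vote over its labeled members."""
    X = np.vstack([X_l, X_u])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)  # 1. cluster
    a_l, a_u = assign[:len(X_l)], assign[len(X_l):]
    y_u = np.empty(len(X_u), dtype=int)
    for c in range(n_clusters):
        S = y_l[a_l == c]           # 2. labeled instances in this cluster
        # 3. majority vote (global majority if the cluster has no labeled points);
        #    labels are assumed to be integers 0, 1, ...
        y_u[a_u == c] = np.bincount(S).argmax() if len(S) else np.bincount(y_l).argmax()
    return y_u
```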
Cluster-and-label: Examples (Source: J. Zhu)

Hierarchical clustering, with a majority-vote predictor within each cluster.
Generative models (Source: J. Zhu)

Labeled data (X_l, Y_l): assuming each class has a Gaussian distribution, how do we find the decision boundary?

(Figure: the most likely model and its decision boundary.)

Labeled data (X_l, Y_l) and unlabeled data X_u: what is the most likely decision boundary now?

The two boundaries are different because they maximize different quantities:

    p(X_l, Y_l | θ)    versus    p(X_l, Y_l, X_u | θ)

Gaussian mixture model: θ consists of the component weights, means, and covariances.
Only labeled data:

    p(X_l, Y_l | θ) = ∏_i p(x_i, y_i | θ) = ∏_i p(y_i | θ) p(x_i | y_i, θ)

ML estimate for θ: sample means, covariances, and proportions for each of the classes.

Labeled and unlabeled data:

    p(X_l, Y_l, X_u | θ) = p(X_l, Y_l | θ) ∑_{Y_u} p(X_u, Y_u | θ)
                         = ( ∏_{i labeled} p(y_i | θ) p(x_i | y_i, θ) ) ( ∏_{j unlabeled} ∑_c p(c | θ) p(x_j | c, θ) )

ML estimate for θ: use EM (Y_u are hidden variables).
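A sketch of evaluating the two likelihoods above for a Gaussian model (SciPy's multivariate_normal is an assumed dependency; θ is passed as class proportions p, means mu, and covariances Sigma):

```python
import numpy as np
from scipy.stats import multivariate_normal

def loglik_labeled(X_l, y_l, p, mu, Sigma):
    """log p(X_l, Y_l | θ) = Σ_i [ log p(y_i | θ) + log p(x_i | y_i, θ) ]"""
    return sum(np.log(p[y]) + multivariate_normal.logpdf(x, mu[y], Sigma[y])
               for x, y in zip(X_l, y_l))

def loglik_semi(X_l, y_l, X_u, p, mu, Sigma):
    """Adds Σ_j log Σ_c p(c | θ) p(x_j | c, θ) for the unlabeled points."""
    unlabeled = sum(np.log(sum(p[c] * multivariate_normal.pdf(x, mu[c], Sigma[c])
                               for c in range(len(p))))
                    for x in X_u)
    return loglik_labeled(X_l, y_l, p, mu, Sigma) + unlabeled
```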
The EM algorithm for Gaussian mixtures (Source: J. Zhu)

1. Start from the MLE θ = {p_c, μ_c, Σ_c} on (X_l, Y_l):
   p_c: proportion of class c
   μ_c: sample mean of class c
   Σ_c: sample covariance matrix of class c
Repeat:
2. The E-step: compute the expected label p(y | x, θ) for all x in X_u.
3. The M-step: update the MLE θ with the “softly labeled” X_u.

This is a special case of EM for Gaussian mixtures in which the component assignments of the labeled data are fixed. It can also be viewed as a special case of self-training.
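A compact sketch of this loop (NumPy/SciPy; covariance regularization and convergence checks are omitted, and the uniform initialization of the unlabeled responsibilities is an illustrative choice):

```python
import numpy as np
from scipy.stats import multivariate_normal

def semi_supervised_em(X_l, y_l, X_u, n_classes, n_iter=50):
    X = np.vstack([X_l, X_u])
    # responsibilities: one-hot and fixed for labeled rows, soft for unlabeled rows
    R = np.full((len(X), n_classes), 1.0 / n_classes)
    R[:len(X_l)] = 0.0
    R[np.arange(len(X_l)), y_l] = 1.0
    for _ in range(n_iter):
        # M-step: weighted proportions, means, covariances (the first pass plays
        # the role of step 1, the MLE with the labeled assignments fixed)
        p = R.sum(axis=0) / len(X)
        mu = (R.T @ X) / R.sum(axis=0)[:, None]
        Sigma = [np.cov(X.T, aweights=R[:, c]) for c in range(n_classes)]
        # E-step: expected labels p(y | x, θ), updated only for the unlabeled rows
        dens = np.column_stack([p[c] * multivariate_normal.pdf(X_u, mu[c], Sigma[c])
                                for c in range(n_classes)])
        R[len(X_l):] = dens / dens.sum(axis=1, keepdims=True)
    return p, mu, Sigma
```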
Limitations of mixture models (Source: J. Zhu)

Assumption: mixture components correspond to class-conditional distributions.

When the assumption is wrong, the most likely mixture fit can place the decision boundary badly. (Figure: an example where the fitted components do not line up with the true classes.)
Discriminative approach: Semi-supervised SVMs (Source: J. Zhu)

Idea: try to keep both labeled and unlabeled points outside the margin, while maximizing the margin.
Review: Standard SVMs
Classification function: f(x) = w^T x + w_0.

Standard SVM objective function:

    min_{w, w_0}  ‖w‖² + λ_1 ∑_i (1 − y_i f(x_i))_+
Semi-supervised SVMs (Source: J. Zhu)

Classification function: f(x) = w^T x + w_0.

To incorporate unlabeled points, assign to them putative labels sgn(f(x)).

Semi-supervised SVM objective function:

    min_{w, w_0}  ‖w‖² + λ_1 ∑_{i labeled} (1 − y_i f(x_i))_+ + λ_2 ∑_{j unlabeled} (1 − |f(x_j)|)_+
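A sketch of evaluating this objective, with the hinge term for labeled points and the “hat” term (1 − |f(x_j)|)_+ for unlabeled ones; lam1 and lam2 stand in for λ_1, λ_2:

```python
import numpy as np

def s3vm_objective(w, w0, X_l, y_l, X_u, lam1=1.0, lam2=1.0):
    """‖w‖² + λ1 Σ hinge(labeled) + λ2 Σ hat(unlabeled); y_l ∈ {-1, +1}."""
    f_l = X_l @ w + w0
    f_u = X_u @ w + w0
    hinge = np.maximum(0.0, 1.0 - y_l * f_l)    # (1 − y_i f(x_i))_+
    hat = np.maximum(0.0, 1.0 - np.abs(f_u))    # (1 − |f(x_j)|)_+
    return w @ w + lam1 * hinge.sum() + lam2 * hat.sum()
```

Note that the hat loss makes the objective non-convex, which is why minimizing it in practice relies on heuristics or relaxations.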
Graph-based semi-supervised learning (Source: J. Zhu)

Idea: construct a graph whose nodes are the labeled and unlabeled examples and whose edges are weighted by the similarity of the examples. Unlabeled data can help “glue” objects of the same class together.

Assumption: items connected by “heavy” edges are likely to have the same label.
The mincut algorithm:
- Assume binary classification (class labels are 0, 1).
- Approach: fix Y_l and find Y_u to minimize ∑_{i∼j} w_ij |y_i − y_j|.
- This is a combinatorial problem, but it has a polynomial-time solution (see the mincut sketch below).

Harmonic functions:
- Relax the discrete labels to continuous values in ℝ.
- We want to find the harmonic function f that satisfies f(x) = y for all x in X_l and minimizes the energy ∑_{i∼j} w_ij (f(x_i) − f(x_j))².
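For the mincut formulation above, one concrete realization (the Blum-Chawla construction, sketched here with networkx as an assumed dependency) clamps the labeled nodes to a source and sink with infinite-capacity edges and reads the labels off an s-t mincut:

```python
import networkx as nx

def mincut_labels(W, labeled, y):
    """Fix Y_l; choose Y_u minimizing Σ_{i∼j} w_ij |y_i − y_j| via an s-t mincut."""
    n = len(W)
    G = nx.DiGraph()
    for i in range(n):
        for j in range(n):
            if i != j and W[i][j] > 0:
                G.add_edge(i, j, capacity=W[i][j])
    for i in range(n):
        if labeled[i]:  # infinite-capacity edges clamp the labeled nodes
            if y[i] == 1:
                G.add_edge("s", i, capacity=float("inf"))
            else:
                G.add_edge(i, "t", capacity=float("inf"))
    _, (side_s, _) = nx.minimum_cut(G, "s", "t")
    return [1 if i in side_s else 0 for i in range(n)]
```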
A random walk interpretation (Source: J. Zhu)

Randomly walk from node i to node j with probability w_ij / ∑_k w_ik. Stop if we hit a labeled node.

The harmonic function then has the interpretation f(x_i) = P(hit a node with label 1 | start from i).
The harmonic solution (Source: J. Zhu)

We want to find the harmonic function f that satisfies f(x) = y for all labeled points x and minimizes the energy ∑_{i∼j} w_ij (f(x_i) − f(x_j))².

It can be shown that at every unlabeled point x_i,

    f(x_i) = ∑_{j∼i} w_ij f(x_j) / ∑_{j∼i} w_ij.

Iterative algorithm to compute the harmonic function:
- Initially, fix f(x) = y for all labeled data and set f to arbitrary values for all unlabeled data.
- Repeat until convergence: for each unlabeled x_i, set f(x_i) to its weighted neighborhood average ∑_{j∼i} w_ij f(x_j) / ∑_{j∼i} w_ij.
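A sketch of that iteration with a dense similarity matrix W (a simplifying assumption; `labeled` is a boolean mask and y holds 0/1 labels on the labeled entries):

```python
import numpy as np

def harmonic_iterative(W, labeled, y, n_iter=1000):
    f = np.where(labeled, y, 0.5).astype(float)  # clamp labeled nodes; 0.5 is arbitrary
    for _ in range(n_iter):
        avg = (W @ f) / W.sum(axis=1)            # weighted neighborhood averages
        f = np.where(labeled, y, avg)            # update only the unlabeled nodes
    return f
```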
The graph Laplacian (Source: J. Zhu)

Let W be a symmetric weight matrix with entries w_ij, and let D be a diagonal matrix with entries D_ii = ∑_j w_ij.

The graph Laplacian matrix is defined as L = D − W. Then we can write

    ∑_{i∼j} w_ij (f(x_i) − f(x_j))² = f^T L f.

We want to minimize f^T L f subject to the constraints f(x_i) = y_i on the labeled data.

Solution: f_u = −L_uu^{−1} L_ul y_l, where y_l are the labels of the labeled data and

    L = [ L_ll  L_lu ]
        [ L_ul  L_uu ].
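The block solution translates directly into code (dense NumPy sketch, same conventions as in the iterative version above):

```python
import numpy as np

def harmonic_closed_form(W, labeled, y_l):
    L = np.diag(W.sum(axis=1)) - W             # L = D − W
    u = ~labeled
    L_uu = L[np.ix_(u, u)]
    L_ul = L[np.ix_(u, labeled)]
    return -np.linalg.solve(L_uu, L_ul @ y_l)  # f_u = −L_uu⁻¹ L_ul y_l
```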
Alternative approach: allow f(x_i) to be different from y_i on the labeled data, but penalize the difference:

    min_f ∑_{i labeled} c (f(x_i) − y_i)² + f^T L f.

Let C be a diagonal matrix where C_ii = c if i is a labeled point, and C_ii = 0 otherwise. Then we can write the objective function as

    min_f (f − y)^T C (f − y) + f^T L f,

where y is a vector whose entries correspond to the labels of labeled points, and are arbitrary otherwise.

The solution is then given by the linear system

    (C + L) f = C y.
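A sketch of this regularized variant (the penalty weight c is an arbitrary illustrative value):

```python
import numpy as np

def harmonic_regularized(W, labeled, y, c=100.0):
    L = np.diag(W.sum(axis=1)) - W
    C = np.diag(np.where(labeled, c, 0.0))  # C_ii = c on labeled points, 0 otherwise
    return np.linalg.solve(C + L, C @ y)    # solve (C + L) f = C y
```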
Graph spectrum (Source: J. Zhu)

The spectrum of the graph represented by W is given by the eigenvalues and eigenvectors (λ_i, φ_i), i = 1, …, n, of the Laplacian L.

Properties of the graph spectrum:
- A graph has k connected components if and only if λ_1 = λ_2 = … = λ_k = 0. The corresponding eigenvectors are constant on individual connected components, and zero elsewhere.
- L = ∑_{i=1}^n λ_i φ_i φ_i^T.
- Any function f on the graph can be written as a linear combination of eigenvectors: f = ∑_{i=1}^n a_i φ_i.
- The “smoothness” of f can be written as f^T L f = ∑_{i=1}^n a_i² λ_i.
Using the graph spectrum

Objective function:

    min_f ∑_{i labeled} c (f(x_i) − y_i)² + f^T L f = (f − y)^T C (f − y) + f^T L f.

We can restrict the solution to “smooth” functions f, i.e., linear combinations of the first k eigenvectors, those associated with the smallest eigenvalues: f = ∑_{i=1}^k a_i φ_i.

Now we can obtain f by solving a k × k linear system instead of an n × n linear system.
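A sketch of this spectral shortcut: substituting f = Φ_k a into the objective and setting the gradient to zero gives the k × k system (Φ_k^T C Φ_k + diag(λ_1, …, λ_k)) a = Φ_k^T C y.

```python
import numpy as np

def harmonic_spectral(W, labeled, y, k, c=100.0):
    L = np.diag(W.sum(axis=1)) - W
    lam, Phi = np.linalg.eigh(L)              # eigh: eigenvalues in ascending order
    Phi_k, lam_k = Phi[:, :k], lam[:k]        # the k smoothest eigenvectors
    C = np.diag(np.where(labeled, c, 0.0))
    A = Phi_k.T @ C @ Phi_k + np.diag(lam_k)  # k × k system matrix
    a = np.linalg.solve(A, Phi_k.T @ C @ y)
    return Phi_k @ a                          # f = Σ_i a_i φ_i
```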
References

J. Zhu, Semi-supervised learning survey, University of Wisconsin technical report, 2008. http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html

J. Zhu, Semi-supervised learning tutorial, Chicago Machine Learning Summer School, 2009. http://pages.cs.wisc.edu/~jerryzhu/pub/sslchicago09.pdf