Semi-supervised learning
COMP 875: Machine learning techniques in image analysis

Learning from both labeled and unlabeled data.

Motivation: labeled data may be hard or expensive to get, but unlabeled data is usually cheaply available in much greater quantity.
How can unlabeled data help?
Example: Text classification (Source: J. Zhu)
Classify astronomy vs. travel articles
Similarity measured by word overlap
When labeled data alone fails:
What if there are no overlapping words?
Unlabeled data as stepping stones:
Labels “propagate” via similar unlabeled articles
Another example (Source: J. Zhu)

Handwritten digit recognition with pixel-wise Euclidean distance.

(Figure: two digits that are not directly similar become indirectly similar through a chain of unlabeled “stepping stone” examples.)
Types of semi-supervised learning
Inductive learning: given a training set L of labeled data and U of unlabeled data, learn a predictor that can be applied to a brand-new unlabeled point not in U.

Transductive learning: given L and U, learn a predictor that can be applied only to U (i.e., the predictor cannot be easily extended to previously unseen data).
Simplest semi-supervised learning algorithm: Self-training (Source: J. Zhu)

Input: labeled data L and unlabeled data U.
Repeat:
1. Learn a predictor f from the labeled data L using supervised learning.
2. Apply f to the unlabeled instances in U.
3. Remove a subset from U and add that subset, with its inferred labels, to L.

How might we select this subset?
Advantages/disadvantages of this scheme?
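A minimal sketch of this loop, under the common heuristic of selecting the unlabeled instances on which f is most confident (the base learner, threshold, and use of scikit-learn are illustrative assumptions, not part of the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_iter=20):
    """Self-training: repeatedly move confidently labeled points from U to L."""
    for _ in range(max_iter):
        f = LogisticRegression().fit(X_l, y_l)   # 1. supervised learning on L
        if len(X_u) == 0:
            break
        proba = f.predict_proba(X_u)             # 2. apply f to U
        pick = proba.max(axis=1) >= threshold    # 3. choose a confident subset ...
        if not pick.any():
            break
        X_l = np.vstack([X_l, X_u[pick]])        # ... and add it, with its
        y_l = np.concatenate([y_l, f.classes_[proba[pick].argmax(axis=1)]])
        X_u = X_u[~pick]                         # inferred labels, to L
    return f
```

One disadvantage is visible in the sketch: any early mistake is added to L and reinforced in all later iterations.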
Self-training with nearest-neighbor classifier (Source: J. Zhu)

Input: labeled data L and unlabeled data U.
Repeat:
1. Find the unlabeled point x that is closest to a labeled point x′ and assign to x the label of x′.
2. Remove x from U; add it and its estimated label to L.
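A sketch of this propagation in plain NumPy (Euclidean distance, matching the digits example; all names here are illustrative):

```python
import numpy as np

def nn_propagate(X_l, y_l, X_u):
    """Label U one point at a time, always taking the point nearest to L."""
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    while X_u:
        # pairwise distances from each unlabeled point to each labeled point
        D = np.linalg.norm(np.asarray(X_u)[:, None, :] - np.asarray(X_l)[None, :, :],
                           axis=2)
        u, l = np.unravel_index(D.argmin(), D.shape)  # closest (x, x') pair
        X_l.append(X_u.pop(u))                        # move x from U to L
        y_l.append(y_l[l])                            # x inherits the label of x'
    return np.asarray(X_l), np.asarray(y_l)
```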
Propagating nearest-neighbor: Example (Source: J. Zhu)

(Figure: snapshots of the propagation at iteration 1, iteration 25, iteration 74, and the final labeling.)
Another simple approach: Cluster-and-label (Source: J. Zhu)

Input: labeled data L and unlabeled data U.
1. Cluster L ∪ U.
2. For each cluster, let S be the set of labeled instances in that cluster.
3. Learn a supervised predictor from S and apply it to all the unlabeled instances in that cluster.

What is the underlying assumption here?
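A minimal sketch, assuming k-means for the clustering step and a majority vote as the within-cluster predictor (both are illustrative choices; the example on the next slide uses hierarchical clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(X_l, y_l, X_u, n_clusters=2):
    """Cluster L ∪ U, then label each cluster by a vote over its labeled members."""
    X = np.vstack([X_l, X_u])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)  # 1. cluster
    a_l, a_u = assign[:len(X_l)], assign[len(X_l):]
    y_u = np.empty(len(X_u), dtype=int)
    for c in range(n_clusters):
        S = y_l[a_l == c]           # 2. labeled instances in this cluster
        # 3. majority vote (global majority if the cluster has no labeled points);
        #    labels are assumed to be integers 0, 1, ...
        y_u[a_u == c] = np.bincount(S).argmax() if len(S) else np.bincount(y_l).argmax()
    return y_u
```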
Cluster-and-label: Examples (Source: J. Zhu)

Hierarchical clustering, with a majority-vote predictor within each cluster.
Generative models (Source: J. Zhu)

Labeled data (X_l, Y_l): assuming each class has a Gaussian distribution, how do we find the decision boundary?

(Figure: the most likely model and its decision boundary.)

Labeled data (X_l, Y_l) and unlabeled data X_u: what is the most likely decision boundary now?

The two boundaries are different because they maximize different quantities:

    p(X_l, Y_l | θ)    versus    p(X_l, Y_l, X_u | θ)

Gaussian mixture model: θ consists of the component weights, means, and covariances.
Only labeled data:

    p(X_l, Y_l | θ) = ∏_i p(x_i, y_i | θ) = ∏_i p(y_i | θ) p(x_i | y_i, θ)

ML estimate for θ: sample means, covariances, and proportions for each of the classes.

Labeled and unlabeled data:

    p(X_l, Y_l, X_u | θ) = p(X_l, Y_l | θ) ∑_{Y_u} p(X_u, Y_u | θ)
                         = ( ∏_{i labeled} p(y_i | θ) p(x_i | y_i, θ) ) ( ∏_{j unlabeled} ∑_c p(c | θ) p(x_j | c, θ) )

ML estimate for θ: use EM (Y_u are hidden variables).
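A sketch of evaluating the two likelihoods above for a Gaussian model (SciPy's multivariate_normal is an assumed dependency; θ is passed as class proportions p, means mu, and covariances Sigma):

```python
import numpy as np
from scipy.stats import multivariate_normal

def loglik_labeled(X_l, y_l, p, mu, Sigma):
    """log p(X_l, Y_l | θ) = Σ_i [ log p(y_i | θ) + log p(x_i | y_i, θ) ]"""
    return sum(np.log(p[y]) + multivariate_normal.logpdf(x, mu[y], Sigma[y])
               for x, y in zip(X_l, y_l))

def loglik_semi(X_l, y_l, X_u, p, mu, Sigma):
    """Adds Σ_j log Σ_c p(c | θ) p(x_j | c, θ) for the unlabeled points."""
    unlabeled = sum(np.log(sum(p[c] * multivariate_normal.pdf(x, mu[c], Sigma[c])
                               for c in range(len(p))))
                    for x in X_u)
    return loglik_labeled(X_l, y_l, p, mu, Sigma) + unlabeled
```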
The EM algorithm for Gaussian mixtures (Source: J. Zhu)

1. Start from the MLE θ = {p_c, μ_c, Σ_c} on (X_l, Y_l):
   p_c: proportion of class c
   μ_c: sample mean of class c
   Σ_c: sample covariance matrix of class c
Repeat:
2. The E-step: compute the expected label p(y | x, θ) for all x in X_u.
3. The M-step: update the MLE θ with the “softly labeled” X_u.

This is a special case of EM for Gaussian mixtures in which the component assignments of the labeled data are fixed. It can also be viewed as a special case of self-training.
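A compact sketch of this loop (NumPy/SciPy; covariance regularization and convergence checks are omitted, and the uniform initialization of the unlabeled responsibilities is an illustrative choice):

```python
import numpy as np
from scipy.stats import multivariate_normal

def semi_supervised_em(X_l, y_l, X_u, n_classes, n_iter=50):
    X = np.vstack([X_l, X_u])
    # responsibilities: one-hot and fixed for labeled rows, soft for unlabeled rows
    R = np.full((len(X), n_classes), 1.0 / n_classes)
    R[:len(X_l)] = 0.0
    R[np.arange(len(X_l)), y_l] = 1.0
    for _ in range(n_iter):
        # M-step: weighted proportions, means, covariances (the first pass plays
        # the role of step 1, the MLE with the labeled assignments fixed)
        p = R.sum(axis=0) / len(X)
        mu = (R.T @ X) / R.sum(axis=0)[:, None]
        Sigma = [np.cov(X.T, aweights=R[:, c]) for c in range(n_classes)]
        # E-step: expected labels p(y | x, θ), updated only for the unlabeled rows
        dens = np.column_stack([p[c] * multivariate_normal.pdf(X_u, mu[c], Sigma[c])
                                for c in range(n_classes)])
        R[len(X_l):] = dens / dens.sum(axis=1, keepdims=True)
    return p, mu, Sigma
```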
Limitations of mixture models (Source: J. Zhu)

Assumption: mixture components correspond to class-conditional distributions.

When the assumption is wrong, the most likely mixture fit can place the decision boundary badly. (Figure: an example where the fitted components do not line up with the true classes.)
Discriminative approach: Semi-supervised SVMs (Source: J. Zhu)

Idea: try to keep both labeled and unlabeled points outside the margin, while maximizing the margin.
Review: Standard SVMs
Classification function: f(x) = w^T x + w_0.

Standard SVM objective function:

    min_{w, w_0}  ‖w‖² + λ_1 ∑_i (1 − y_i f(x_i))_+
Semi-supervised SVMs (Source: J. Zhu)

Classification function: f(x) = w^T x + w_0.

To incorporate unlabeled points, assign to them putative labels sgn(f(x)).

Semi-supervised SVM objective function:

    min_{w, w_0}  ‖w‖² + λ_1 ∑_{i labeled} (1 − y_i f(x_i))_+ + λ_2 ∑_{j unlabeled} (1 − |f(x_j)|)_+
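A sketch of evaluating this objective, with the hinge term for labeled points and the “hat” term (1 − |f(x_j)|)_+ for unlabeled ones; lam1 and lam2 stand in for λ_1, λ_2:

```python
import numpy as np

def s3vm_objective(w, w0, X_l, y_l, X_u, lam1=1.0, lam2=1.0):
    """‖w‖² + λ1 Σ hinge(labeled) + λ2 Σ hat(unlabeled); y_l ∈ {-1, +1}."""
    f_l = X_l @ w + w0
    f_u = X_u @ w + w0
    hinge = np.maximum(0.0, 1.0 - y_l * f_l)    # (1 − y_i f(x_i))_+
    hat = np.maximum(0.0, 1.0 - np.abs(f_u))    # (1 − |f(x_j)|)_+
    return w @ w + lam1 * hinge.sum() + lam2 * hat.sum()
```

Note that the hat loss makes the objective non-convex, which is why minimizing it in practice relies on heuristics or relaxations.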
Graph-based semi-supervised learning (Source: J. Zhu)

Idea: construct a graph whose nodes are the labeled and unlabeled examples and whose edges are weighted by the similarity of the examples. Unlabeled data can help “glue” objects of the same class together.

Assumption: items connected by “heavy” edges are likely to have the same label.
The mincut algorithm:
- Assume binary classification (class labels are 0, 1).
- Approach: fix Y_l and find Y_u to minimize ∑_{i∼j} w_ij |y_i − y_j|.
- This is a combinatorial problem, but it has a polynomial-time solution (see the mincut sketch below).

Harmonic functions:
- Relax the discrete labels to continuous values in ℝ.
- We want to find the harmonic function f that satisfies f(x) = y for all x in X_l and minimizes the energy ∑_{i∼j} w_ij (f(x_i) − f(x_j))².
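For the mincut formulation above, one concrete realization (the Blum-Chawla construction, sketched here with networkx as an assumed dependency) clamps the labeled nodes to a source and sink with infinite-capacity edges and reads the labels off an s-t mincut:

```python
import networkx as nx

def mincut_labels(W, labeled, y):
    """Fix Y_l; choose Y_u minimizing Σ_{i∼j} w_ij |y_i − y_j| via an s-t mincut."""
    n = len(W)
    G = nx.DiGraph()
    for i in range(n):
        for j in range(n):
            if i != j and W[i][j] > 0:
                G.add_edge(i, j, capacity=W[i][j])
    for i in range(n):
        if labeled[i]:  # infinite-capacity edges clamp the labeled nodes
            if y[i] == 1:
                G.add_edge("s", i, capacity=float("inf"))
            else:
                G.add_edge(i, "t", capacity=float("inf"))
    _, (side_s, _) = nx.minimum_cut(G, "s", "t")
    return [1 if i in side_s else 0 for i in range(n)]
```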
A random walk interpretation (Source: J. Zhu)

Randomly walk from node i to node j with probability w_ij / ∑_k w_ik. Stop if we hit a labeled node.

The harmonic function then has the interpretation f(x_i) = P(hit a node with label 1 | start from i).
The harmonic solution (Source: J. Zhu)

We want to find the harmonic function f that satisfies f(x) = y for all labeled points x and minimizes the energy ∑_{i∼j} w_ij (f(x_i) − f(x_j))².

It can be shown that at every unlabeled point x_i,

    f(x_i) = ∑_{j∼i} w_ij f(x_j) / ∑_{j∼i} w_ij.

Iterative algorithm to compute the harmonic function:
- Initially, fix f(x) = y for all labeled data and set f to arbitrary values for all unlabeled data.
- Repeat until convergence: for each unlabeled x_i, set f(x_i) to its weighted neighborhood average ∑_{j∼i} w_ij f(x_j) / ∑_{j∼i} w_ij.
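A sketch of that iteration with a dense similarity matrix W (a simplifying assumption; `labeled` is a boolean mask and y holds 0/1 labels on the labeled entries):

```python
import numpy as np

def harmonic_iterative(W, labeled, y, n_iter=1000):
    f = np.where(labeled, y, 0.5).astype(float)  # clamp labeled nodes; 0.5 is arbitrary
    for _ in range(n_iter):
        avg = (W @ f) / W.sum(axis=1)            # weighted neighborhood averages
        f = np.where(labeled, y, avg)            # update only the unlabeled nodes
    return f
```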
The graph Laplacian (Source: J. Zhu)

Let W be a symmetric weight matrix with entries w_ij, and let D be a diagonal matrix with entries D_ii = ∑_j w_ij.

The graph Laplacian matrix is defined as L = D − W. Then we can write

    ∑_{i∼j} w_ij (f(x_i) − f(x_j))² = f^T L f.

We want to minimize f^T L f subject to the constraints f(x_i) = y_i on the labeled data.

Solution: f_u = −L_uu^{−1} L_ul y_l, where y_l are the labels of the labeled data and

    L = [ L_ll  L_lu ]
        [ L_ul  L_uu ].
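The block solution translates directly into code (dense NumPy sketch, same conventions as in the iterative version above):

```python
import numpy as np

def harmonic_closed_form(W, labeled, y_l):
    L = np.diag(W.sum(axis=1)) - W             # L = D − W
    u = ~labeled
    L_uu = L[np.ix_(u, u)]
    L_ul = L[np.ix_(u, labeled)]
    return -np.linalg.solve(L_uu, L_ul @ y_l)  # f_u = −L_uu⁻¹ L_ul y_l
```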
Alternative approach: allow f(x_i) to be different from y_i on the labeled data, but penalize the difference:

    min_f ∑_{i labeled} c (f(x_i) − y_i)² + f^T L f.

Let C be a diagonal matrix where C_ii = c if i is a labeled point, and C_ii = 0 otherwise. Then we can write the objective function as

    min_f (f − y)^T C (f − y) + f^T L f,

where y is a vector whose entries correspond to the labels of labeled points, and are arbitrary otherwise.

The solution is then given by the linear system

    (C + L) f = C y.
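A sketch of this regularized variant (the penalty weight c is an arbitrary illustrative value):

```python
import numpy as np

def harmonic_regularized(W, labeled, y, c=100.0):
    L = np.diag(W.sum(axis=1)) - W
    C = np.diag(np.where(labeled, c, 0.0))  # C_ii = c on labeled points, 0 otherwise
    return np.linalg.solve(C + L, C @ y)    # solve (C + L) f = C y
```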
Graph spectrum (Source: J. Zhu)

The spectrum of the graph represented by W is given by the eigenvalues and eigenvectors (λ_i, φ_i), i = 1, …, n, of the Laplacian L.

Properties of the graph spectrum:
- A graph has k connected components if and only if λ_1 = λ_2 = … = λ_k = 0. The corresponding eigenvectors are constant on individual connected components, and zero elsewhere.
- L = ∑_{i=1}^n λ_i φ_i φ_i^T.
- Any function f on the graph can be written as a linear combination of eigenvectors: f = ∑_{i=1}^n a_i φ_i.
- The “smoothness” of f can be written as f^T L f = ∑_{i=1}^n a_i² λ_i.
Using the graph spectrum

Objective function:

    min_f ∑_{i labeled} c (f(x_i) − y_i)² + f^T L f = (f − y)^T C (f − y) + f^T L f.

We can restrict the solution to “smooth” functions f, i.e., linear combinations of the first k eigenvectors, those associated with the smallest eigenvalues: f = ∑_{i=1}^k a_i φ_i.

Now we can obtain f by solving a k × k linear system instead of an n × n linear system.
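A sketch of this spectral shortcut: substituting f = Φ_k a into the objective and setting the gradient to zero gives the k × k system (Φ_k^T C Φ_k + diag(λ_1, …, λ_k)) a = Φ_k^T C y.

```python
import numpy as np

def harmonic_spectral(W, labeled, y, k, c=100.0):
    L = np.diag(W.sum(axis=1)) - W
    lam, Phi = np.linalg.eigh(L)              # eigh: eigenvalues in ascending order
    Phi_k, lam_k = Phi[:, :k], lam[:k]        # the k smoothest eigenvectors
    C = np.diag(np.where(labeled, c, 0.0))
    A = Phi_k.T @ C @ Phi_k + np.diag(lam_k)  # k × k system matrix
    a = np.linalg.solve(A, Phi_k.T @ C @ y)
    return Phi_k @ a                          # f = Σ_i a_i φ_i
```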
References

J. Zhu, Semi-supervised learning survey, University of Wisconsin technical report, 2008. http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html

J. Zhu, Semi-supervised learning tutorial, Chicago Machine Learning Summer School, 2009. http://pages.cs.wisc.edu/~jerryzhu/pub/sslchicago09.pdf