TRANSCRIPT
Semi-supervised learning Deep clustering
Semi-supervised learning and clustering using deep structure: literature review
IDEA Seminar
Speaker: Dongha Kim
Department of Statistics, Seoul National University, South Korea
December 3, 2016
1 Semi-supervised learning
2 Deep clustering
1 Semi-supervised learning
Notation
• Labeled data: $T = \{(x_l^{(1)}, y^{(1)}), \ldots, (x_l^{(n)}, y^{(n)})\}$
• Unlabeled data: $U = \{x_u^{(1)}, \ldots, x_u^{(m)}\}$
Pseudo-Label (D. Lee, 2013)
• Similar to self-training.
1. Assign to each unlabeled example the class with the maximum predicted probability (the pseudo-label).
2. Update the deep classifier using the objective function:
$$J = \frac{1}{n}\sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \alpha \cdot \frac{1}{m}\sum_{j=1}^{m} l(x_u^{(j)}, y'^{(j)})$$
where $y'^{(j)}$ is the predicted class (pseudo-label) of the unlabeled example $x_u^{(j)}$.
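As a concrete sketch, the combined objective can be computed as below (NumPy, with a toy softmax classifier; the function names and data are illustrative, not from the slides):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the given labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def pseudo_label_objective(logits_l, y_l, logits_u, alpha):
    """J = supervised loss + alpha * loss on pseudo-labelled unlabeled data."""
    probs_l = softmax(logits_l)
    probs_u = softmax(logits_u)
    y_pseudo = probs_u.argmax(axis=1)  # step 1: class with max predicted probability
    return cross_entropy(probs_l, y_l) + alpha * cross_entropy(probs_u, y_pseudo)
```

In the paper, $\alpha$ is ramped up over training so the pseudo-labels only start to dominate once the classifier is reasonably accurate.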
Compact representation (M. Ranzato and M. Szummer, 2008)
• Developed to classify documents.
• Uses semi-supervised learning layer by layer:
$$J = E_R + \alpha \cdot E_C$$
where $E_R$ and $E_C$ are terms measuring the reconstruction and classification error, respectively.
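A minimal single-layer sketch of this objective in NumPy (the linear encoder/decoder, the tanh nonlinearity, and the toy shapes are assumptions for illustration):

```python
import numpy as np

def layer_objective(X, y, W_enc, W_dec, W_cls, alpha):
    """J = E_R + alpha * E_C for one layer: reconstruction error of the
    code plus classification error of a softmax head on the same code."""
    code = np.tanh(X @ W_enc)                                  # hidden representation
    E_R = np.mean(np.sum((code @ W_dec - X) ** 2, axis=1))     # reconstruction error
    logits = code @ W_cls
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    E_C = -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))  # classification error
    return E_R + alpha * E_C
```

Training greedily, one layer at a time, then feeds each layer's code into the next.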
Self-taught learning (R. Raina et al., 2007)
• Does not assume that the unlabeled data can be assigned to the supervised learning task's class labels.
Self-taught learning (R. Raina et al., 2007)
1. Learn higher-level representations using unlabeled data.
• Sparse coding on the unlabeled data $U$:
$$\min_{b,a} \sum_i \Big\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \Big\|^2 + \lambda \|a^{(i)}\|_1 \quad \text{s.t. } \|b_j\|_2 \le 1 \text{ for all } j$$
Self-taught learning (R. Raina et al., 2007)
2. Extract higher-level features of the labeled data.
• Extract features of the labeled data using the learned bases $b_1, \ldots, b_s$:
$$a(x_l^{(i)}) = \operatorname*{argmin}_{a^{(i)}} \Big\| x_l^{(i)} - \sum_j a_j^{(i)} b_j \Big\|_2^2 + \lambda \|a^{(i)}\|_1$$
3. Learn a classifier $C$ by applying a supervised learning algorithm.
• Use the modified labeled training data $\hat{T}$:
$$\hat{T} = \{(a(x_l^{(1)}), y^{(1)}), \ldots, (a(x_l^{(n)}), y^{(n)})\}$$
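Given learned bases, the feature-extraction step can be sketched with a simple proximal-gradient (ISTA) solver in NumPy; the step size, iteration count, and the choice of ISTA itself are implementation assumptions, not from the slides:

```python
import numpy as np

def sparse_codes(X, B, lam=0.1, lr=0.01, iters=500):
    """Solve a(x) = argmin_a ||x - B a||^2 + lam * ||a||_1 per example
    via ISTA (gradient step + soft-thresholding).
    X: (n, d) examples; B: (d, s) matrix of unit-norm bases b_j."""
    A = np.zeros((X.shape[0], B.shape[1]))
    for _ in range(iters):
        grad = (A @ B.T - X) @ B * 2                  # gradient of squared error
        A_half = A - lr * grad
        # soft-threshold: proximal operator of the L1 penalty
        A = np.sign(A_half) * np.maximum(np.abs(A_half) - lr * lam, 0.0)
    return A
```

The resulting codes for the labeled examples are then fed to any off-the-shelf supervised classifier.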
Semi-supervised embedding (J. Weston et al., 2012)
Embedding algorithms
• An optimization problem: given the data, find an embedding $f(x_i)$ of each point $x_i$ by
$$\min_f \sum_{i,j} L(f(x_i), f(x_j), W_{ij})$$
where the matrix $W$ of weights $W_{ij}$ specifies the similarity or dissimilarity between examples $x_i$ and $x_j$.
• Examples: MDS (multi-dimensional scaling), PCA, etc.
Semi-supervised embedding (J. Weston et al., 2012)
(a) Add an unsupervised loss to the supervised loss on the entire network's output:
$$\min \sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \lambda \sum_{i,j=1}^{n+m} L(f(x_i), f(x_j), W_{ij})$$
(b) Regularize the $k$th hidden layer directly:
$$\min \sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \lambda \sum_{i,j=1}^{n+m} L(h^{(k)}(x_i), h^{(k)}(x_j), W_{ij})$$
Semi-supervised embedding (J. Weston et al., 2012)
(c) Create an auxiliary network which shares the first $k$ layers of the original network but has a new final set of weights:
$$g(x) = W^{(AUX)} h^{(k)}(x) + b^{(AUX)}$$
$$\min \sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \lambda \sum_{i,j=1}^{n+m} L(g(x_i), g(x_j), W_{ij})$$
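A margin-based choice of $L$ can be sketched in NumPy; the exact hinge form below is one common variant, and the margin value is an illustrative assumption:

```python
import numpy as np

def embedding_loss(fi, fj, wij, margin=1.0):
    """Pull neighbors (wij = 1) together; push non-neighbors (wij = 0)
    at least `margin` apart in embedding space."""
    d = np.linalg.norm(fi - fj)
    if wij == 1:
        return d ** 2                     # neighbors: squared distance
    return max(0.0, margin - d) ** 2      # non-neighbors: hinge on distance
```

With this loss, pairs of non-neighbors already farther apart than the margin contribute nothing, so the regularizer focuses on separating confusable pairs.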
Using generative models (D. Kingma et al., 2014)
Model-based semi-supervised learning methods
M1 Latent-feature discriminative model:
$$p(z) = \mathcal{N}(z \mid 0, I), \quad p_\theta(x \mid z) = f(x; z, \theta)$$
where $f$ is a suitable likelihood function whose parameters are a non-linear function of $z$, given by a DNN.
M2 Generative semi-supervised model:
$$p(y) = \mathrm{Cat}(y \mid \pi), \quad p(z) = \mathcal{N}(z \mid 0, I), \quad p_\theta(x \mid y, z) = f(x; y, z, \theta)$$
where $f$ is a suitable likelihood function whose parameters are a non-linear function of $z$ and $y$, given by a DNN.
Using generative models (D. Kingma et al., 2014)
Use variational inference.
M1:
$$q_\phi(z \mid x) = \mathcal{N}(z \mid \mu_\phi(x), \mathrm{diag}(\sigma_\phi^2(x)))$$
where $\mu_\phi(x)$ and $\sigma_\phi(x)$ are represented as DNNs.
M2: Assume that $q_\phi(z, y \mid x) = q_\phi(z \mid x)\, q_\phi(y \mid x)$ with
$$q_\phi(z \mid x) = \mathcal{N}(z \mid \mu_\phi(x), \mathrm{diag}(\sigma_\phi^2(x))), \quad q_\phi(y \mid x) = \mathrm{Cat}(y \mid \pi_\phi(x))$$
where $\mu_\phi(x)$, $\sigma_\phi(x)$ and $\pi_\phi(x)$ are represented as DNNs.
Using generative models (D. Kingma et al., 2014)
M1:
$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}[q_\phi(z \mid x) \,\|\, p(z)] = -\mathcal{J}(x)$$
• Minimize the following function using a gradient-based method:
$$\min_{\theta,\phi} \frac{1}{m} \sum_{j=1}^{m} \mathcal{J}(x_u^{(j)})$$
• Get $z_l^{(i)}$ from $q_\phi(z \mid x_l^{(i)})$ and train a classifier $C$ using $\{(z_l^{(i)}, y^{(i)})\}_{i=1}^{n}$.
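The two terms of $\mathcal{J}(x)$ can be sketched in NumPy for a Gaussian posterior and standard-normal prior; the unit-variance Gaussian likelihood and the single-sample Monte Carlo estimate are assumptions for illustration:

```python
import numpy as np

def kl_to_std_normal(mu, logvar):
    """Closed-form KL[ N(mu, diag(exp(logvar))) || N(0, I) ]."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))

def neg_elbo(x, mu, logvar, decode, rng):
    """One-sample Monte Carlo estimate of J(x) = -ELBO."""
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps           # reparameterization trick
    recon = -0.5 * np.sum((x - decode(z)) ** 2)   # log p(x|z) up to a constant
    return -(recon - kl_to_std_normal(mu, logvar))
```

The reparameterization $z = \mu + \sigma \epsilon$ is what lets gradients flow through the sampling step to $\phi$.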
Using generative models (D. Kingma et al., 2014)
M2:
$$\log p_\theta(x, y) \ge \mathbb{E}[\log p_\theta(x \mid y, z) + \log p(y) + \log p(z) - \log q_\phi(z \mid x, y)] = -\mathcal{J}_1(x, y)$$
$$\log p_\theta(x) \ge -\sum_y q_\phi(y \mid x)\, \mathcal{J}_1(x, y) + \mathcal{H}(q_\phi(y \mid x)) = -\mathcal{J}_2(x)$$
• Minimize the following function using a gradient-based method:
$$\min_{\theta,\phi} \frac{1}{n} \sum_{i=1}^{n} \mathcal{J}_1(x_l^{(i)}, y^{(i)}) + \frac{1}{m} \sum_{j=1}^{m} \mathcal{J}_2(x_u^{(j)})$$
• Use $q_\phi(y \mid x)$ to predict the class of an input $x$.
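For an unlabeled example, $\mathcal{J}_2$ marginalizes $\mathcal{J}_1$ over the classifier's soft predictions; given precomputed per-class $\mathcal{J}_1$ values this is a short NumPy sketch (the helper name is illustrative):

```python
import numpy as np

def j2_unlabeled(q_y, j1_per_class):
    """J2(x) = sum_y q(y|x) * J1(x, y) - H(q(y|x))."""
    entropy = -np.sum(q_y * np.log(q_y + 1e-12))
    return np.sum(q_y * j1_per_class) - entropy
```

The entropy term rewards confident class predictions on unlabeled data while the weighted sum keeps the generative bound tight for every plausible label.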
2 Deep clustering
Deterministic clustering 1 (Y. Ma et al., 2014)
• Calculate the reconstruction error (with $G$ the encoder and $H$ the decoder):
$$\mathcal{J}_1 = \frac{1}{m} \sum_{j=1}^{m} \| H(G(x_u^{(j)})) - x_u^{(j)} \|^2$$
Deterministic clustering 1 (Y. Ma et al., 2014)
• Assume that $\mu_k$ is the center of cluster $C_k$ and $T_k$ is the number of elements in $C_k$:
$$\mathcal{J}_2 = \frac{1}{K} \sum_{k=1}^{K} \max_{l} \left\{ \frac{S_k + S_l}{M_{kl}} \,\middle|\, l \ne k \right\}$$
where $S_k = \sqrt{\frac{1}{T_k} \sum_{G(x_u^{(j)}) \in C_k} \| G(x_u^{(j)}) - \mu_k \|^2}$ and $M_{kl} = \| \mu_k - \mu_l \|$.
• Minimize the following function using a gradient-based method:
$$\mathcal{J} = \mathcal{J}_1 + \lambda \cdot \mathcal{J}_2$$
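The cluster term $\mathcal{J}_2$ can be computed directly from the embedded points and hard cluster assignments; a NumPy sketch (how the assignments are obtained, e.g. by k-means, is not specified on the slides and is assumed here):

```python
import numpy as np

def cluster_separation(Z, labels, K):
    """J2: mean over clusters of the worst (S_k + S_l) / M_kl ratio.
    Z: (n, d) embedded points G(x); labels: (n,) cluster indices in 0..K-1."""
    mus = np.array([Z[labels == k].mean(axis=0) for k in range(K)])
    # S_k: root mean squared distance of cluster k's points to its center
    S = np.array([np.sqrt(np.mean(np.sum((Z[labels == k] - mus[k]) ** 2, axis=1)))
                  for k in range(K)])
    total = 0.0
    for k in range(K):
        ratios = [(S[k] + S[l]) / np.linalg.norm(mus[k] - mus[l])
                  for l in range(K) if l != k]
        total += max(ratios)  # worst-separated neighbor of cluster k
    return total / K
```

Tight, well-separated clusters give a small value, so minimizing it pushes the embedding toward cluster-friendly structure.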
Deterministic clustering 2 (J. Xie et al., 2016)
• Given an initial estimate of the DNN mapping $f_\theta$ and the initial cluster centroids $\{\mu_k\}_{k=1}^{K}$, calculate
$$q_{jk} = \frac{(1 + \|z_j - \mu_k\|^2/\alpha)^{-\frac{\alpha+1}{2}}}{\sum_l (1 + \|z_j - \mu_l\|^2/\alpha)^{-\frac{\alpha+1}{2}}}$$
where $z_j = f_\theta(x_j)$.
• Minimize the following function:
$$\sum_j \mathrm{KL}(P_j \| Q_j) = \sum_j \sum_k p_{jk} \log \frac{p_{jk}}{q_{jk}}$$
where $P$ is the target distribution, with $p_{jk} = \frac{q_{jk}^2 / f_k}{\sum_l q_{jl}^2 / f_l}$ and $f_k = \sum_j q_{jk}$.
• Similar to self-training.
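Both distributions are cheap to compute from the embeddings and centroids; a NumPy sketch with $\alpha = 1$ as in the paper:

```python
import numpy as np

def soft_assign(Z, mu, alpha=1.0):
    """Student-t soft assignment q_jk of embedded points to centroids.
    Z: (n, d) embeddings z_j = f_theta(x_j); mu: (K, d) centroids."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_dist(q):
    """Sharpened target p_jk = (q_jk^2 / f_k) / sum_l (q_jl^2 / f_l)."""
    f = q.sum(axis=0)          # soft cluster frequencies f_k
    p = q ** 2 / f
    return p / p.sum(axis=1, keepdims=True)
```

Squaring and renormalizing sharpens the assignments, which is the self-training flavor noted above: the network is trained toward a more confident version of its own predictions.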
Stochastic clustering (G. Chen, 2015)
• DNN + nonparametric mixture model:
$$P(z, \{\theta_k\}_{k=1}^{K} \mid \{h_i^{(L)}\}_{i=1}^{n}) \propto p(z) \prod_{k=1}^{K} p(\theta_k) \left[ \prod_{i=1}^{n} p(x_i \mid \theta_{z_i}) \right]$$
→ Chinese Restaurant Process mixture
• Gibbs sampling ($z$) + online maximum-margin learning ($\theta$) + fine-tuning ($W_L$, $\theta$)
References
• G. Chen. (2015). Deep Learning with Nonparametric Clustering. arXiv:1501.03084.
• D. P. Kingma, D. J. Rezende, S. Mohamed and M. Welling. (2014). Semi-supervised learning with deep generative models. Advances in Neural Information Processing Systems. pp. 3581-3589.
• Y. Ma, C. Shang, F. Yang and D. Huang. (2014). Latent Subspace Clustering based on Deep Neural Networks. Proceedings of the fifth ADCONIP.
• D. Lee. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. Workshop on Challenges in Representation Learning, ICML. 3(2).
• R. Raina, A. Battle, H. Lee, B. Packer and A. Y. Ng. (2007). Self-taught learning: transfer learning from unlabeled data. Proceedings of the 24th International Conference on Machine Learning. pp. 759-766.
• M. Ranzato and M. Szummer. (2008). Semi-supervised learning of compact document representations with deep networks. Proceedings of the 25th International Conference on Machine Learning. pp. 792-799.
• J. Weston, F. Ratle, H. Mobahi and R. Collobert. (2012). Deep learning via semi-supervised embedding. Neural Networks: Tricks of the Trade. pp. 639-655.
• J. Xie, R. Girshick and A. Farhadi. (2016). Unsupervised Deep Embedding for Clustering Analysis. arXiv:1511.06335.