TRANSCRIPT
Semi-supervised learning Deep clustering
Semi-supervised learning and clustering using deep structure: literature review
IDEA Seminar
Speaker: Dongha Kim
Department of Statistics, Seoul National University, South Korea
December 3, 2016
1 Semi-supervised learning
2 Deep clustering
1 Semi-supervised learning
Notation
• Labeled data: $T = \{(x_l^{(1)}, y^{(1)}), \ldots, (x_l^{(n)}, y^{(n)})\}$
• Unlabeled data: $U = \{x_u^{(1)}, \ldots, x_u^{(m)}\}$
Pseudo-Label (D. Lee, 2013)
• Similar to self-training.
1. Assign to each unlabeled example the class with the maximum predicted probability (the pseudo-label).
2. Update the deep classifier using the objective function:
$$J = \frac{1}{n}\sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \alpha \cdot \frac{1}{m}\sum_{j=1}^{m} l(x_u^{(j)}, y'^{(j)})$$
where $y'^{(j)}$ is the predicted class (pseudo-label) of the unlabeled example $x_u^{(j)}$.
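As a concrete sketch, the combined objective can be computed as below (NumPy, with a toy softmax classifier; the function names and data are illustrative, not from the slides):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the given labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def pseudo_label_objective(logits_l, y_l, logits_u, alpha):
    """J = supervised loss + alpha * loss on pseudo-labelled unlabeled data."""
    probs_l = softmax(logits_l)
    probs_u = softmax(logits_u)
    y_pseudo = probs_u.argmax(axis=1)  # step 1: class with max predicted probability
    return cross_entropy(probs_l, y_l) + alpha * cross_entropy(probs_u, y_pseudo)
```

In the paper, $\alpha$ is ramped up over training so the pseudo-labels only start to dominate once the classifier is reasonably accurate.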
Compact representation (M. Ranzato and M. Szummer, 2008)
• Developed to classify documents.
• Uses semi-supervised learning layer by layer:
$$J = E_R + \alpha \cdot E_C$$
where $E_R$ and $E_C$ are terms measuring the reconstruction and classification error, respectively.
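A minimal single-layer sketch of this objective in NumPy (the linear encoder/decoder, the tanh nonlinearity, and the toy shapes are assumptions for illustration):

```python
import numpy as np

def layer_objective(X, y, W_enc, W_dec, W_cls, alpha):
    """J = E_R + alpha * E_C for one layer: reconstruction error of the
    code plus classification error of a softmax head on the same code."""
    code = np.tanh(X @ W_enc)                                  # hidden representation
    E_R = np.mean(np.sum((code @ W_dec - X) ** 2, axis=1))     # reconstruction error
    logits = code @ W_cls
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    E_C = -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))  # classification error
    return E_R + alpha * E_C
```

Training greedily, one layer at a time, then feeds each layer's code into the next.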
Self-taught learning (R. Raina et al., 2007)
• Does not assume that the unlabeled data can be assigned to the supervised learning task's class labels.
Self-taught learning (R. Raina et al., 2007)
1. Learn higher-level representations using unlabeled data.
• Sparse coding on the unlabeled data $U$:
$$\min_{b,a} \sum_i \Big\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \Big\|^2 + \lambda \|a^{(i)}\|_1 \quad \text{s.t. } \|b_j\|_2 \le 1 \text{ for all } j$$
Self-taught learning (R. Raina et al., 2007)
2. Extract higher-level features of the labeled data.
• Extract features of the labeled data using the learned bases $b_1, \ldots, b_s$:
$$a(x_l^{(i)}) = \operatorname*{argmin}_{a^{(i)}} \Big\| x_l^{(i)} - \sum_j a_j^{(i)} b_j \Big\|_2^2 + \lambda \|a^{(i)}\|_1$$
3. Learn a classifier $C$ by applying a supervised learning algorithm.
• Use the modified labeled training data $\hat{T}$:
$$\hat{T} = \{(a(x_l^{(1)}), y^{(1)}), \ldots, (a(x_l^{(n)}), y^{(n)})\}$$
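Given learned bases, the feature-extraction step can be sketched with a simple proximal-gradient (ISTA) solver in NumPy; the step size, iteration count, and the choice of ISTA itself are implementation assumptions, not from the slides:

```python
import numpy as np

def sparse_codes(X, B, lam=0.1, lr=0.01, iters=500):
    """Solve a(x) = argmin_a ||x - B a||^2 + lam * ||a||_1 per example
    via ISTA (gradient step + soft-thresholding).
    X: (n, d) examples; B: (d, s) matrix of unit-norm bases b_j."""
    A = np.zeros((X.shape[0], B.shape[1]))
    for _ in range(iters):
        grad = (A @ B.T - X) @ B * 2                  # gradient of squared error
        A_half = A - lr * grad
        # soft-threshold: proximal operator of the L1 penalty
        A = np.sign(A_half) * np.maximum(np.abs(A_half) - lr * lam, 0.0)
    return A
```

The resulting codes for the labeled examples are then fed to any off-the-shelf supervised classifier.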
Semi-supervised embedding (J. Weston et al., 2012)
Embedding algorithms
• An optimization problem: given the data, find an embedding $f(x_i)$ of each point $x_i$ by
$$\min_f \sum_{i,j} L(f(x_i), f(x_j), W_{ij})$$
where the matrix $W$ of weights $W_{ij}$ specifies the similarity or dissimilarity between examples $x_i$ and $x_j$.
• Examples: MDS (multi-dimensional scaling), PCA, etc.
Semi-supervised embedding (J. Weston et al., 2012)
(a) Add an unsupervised loss to the supervised loss on the entire network's output:
$$\min \sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \lambda \sum_{i,j=1}^{n+m} L(f(x_i), f(x_j), W_{ij})$$
(b) Regularize the $k$th hidden layer directly:
$$\min \sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \lambda \sum_{i,j=1}^{n+m} L(h^{(k)}(x_i), h^{(k)}(x_j), W_{ij})$$
Semi-supervised embedding (J. Weston et al., 2012)
(c) Create an auxiliary network which shares the first $k$ layers of the original network but has a new final set of weights:
$$g(x) = W^{(AUX)} h^{(k)}(x) + b^{(AUX)}$$
$$\min \sum_{i=1}^{n} l(x_l^{(i)}, y^{(i)}) + \lambda \sum_{i,j=1}^{n+m} L(g(x_i), g(x_j), W_{ij})$$
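A margin-based choice of $L$ can be sketched in NumPy; the exact hinge form below is one common variant, and the margin value is an illustrative assumption:

```python
import numpy as np

def embedding_loss(fi, fj, wij, margin=1.0):
    """Pull neighbors (wij = 1) together; push non-neighbors (wij = 0)
    at least `margin` apart in embedding space."""
    d = np.linalg.norm(fi - fj)
    if wij == 1:
        return d ** 2                     # neighbors: squared distance
    return max(0.0, margin - d) ** 2      # non-neighbors: hinge on distance
```

With this loss, pairs of non-neighbors already farther apart than the margin contribute nothing, so the regularizer focuses on separating confusable pairs.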
Using generative models (D. Kingma et al., 2014)
Model-based semi-supervised learning methods
M1 Latent-feature discriminative model:
$$p(z) = \mathcal{N}(z \mid 0, I), \quad p_\theta(x \mid z) = f(x; z, \theta)$$
where $f$ is a suitable likelihood function whose parameters are a non-linear function of $z$, given by a DNN.
M2 Generative semi-supervised model:
$$p(y) = \mathrm{Cat}(y \mid \pi), \quad p(z) = \mathcal{N}(z \mid 0, I), \quad p_\theta(x \mid y, z) = f(x; y, z, \theta)$$
where $f$ is a suitable likelihood function whose parameters are a non-linear function of $z$ and $y$, given by a DNN.
Using generative models (D. Kingma et al., 2014)
Use variational inference.
M1:
$$q_\phi(z \mid x) = \mathcal{N}(z \mid \mu_\phi(x), \mathrm{diag}(\sigma_\phi^2(x)))$$
where $\mu_\phi(x)$ and $\sigma_\phi(x)$ are represented as DNNs.
M2: Assume that $q_\phi(z, y \mid x) = q_\phi(z \mid x)\, q_\phi(y \mid x)$ with
$$q_\phi(z \mid x) = \mathcal{N}(z \mid \mu_\phi(x), \mathrm{diag}(\sigma_\phi^2(x))), \quad q_\phi(y \mid x) = \mathrm{Cat}(y \mid \pi_\phi(x))$$
where $\mu_\phi(x)$, $\sigma_\phi(x)$ and $\pi_\phi(x)$ are represented as DNNs.
Using generative models (D. Kingma et al., 2014)
M1:
$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}[q_\phi(z \mid x) \,\|\, p(z)] = -\mathcal{J}(x)$$
• Minimize the following function using a gradient-based method:
$$\min_{\theta,\phi} \frac{1}{m} \sum_{j=1}^{m} \mathcal{J}(x_u^{(j)})$$
• Get $z_l^{(i)}$ from $q_\phi(z \mid x_l^{(i)})$ and train a classifier $C$ using $\{(z_l^{(i)}, y^{(i)})\}_{i=1}^{n}$.
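The two terms of $\mathcal{J}(x)$ can be sketched in NumPy for a Gaussian posterior and standard-normal prior; the unit-variance Gaussian likelihood and the single-sample Monte Carlo estimate are assumptions for illustration:

```python
import numpy as np

def kl_to_std_normal(mu, logvar):
    """Closed-form KL[ N(mu, diag(exp(logvar))) || N(0, I) ]."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))

def neg_elbo(x, mu, logvar, decode, rng):
    """One-sample Monte Carlo estimate of J(x) = -ELBO."""
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps           # reparameterization trick
    recon = -0.5 * np.sum((x - decode(z)) ** 2)   # log p(x|z) up to a constant
    return -(recon - kl_to_std_normal(mu, logvar))
```

The reparameterization $z = \mu + \sigma \epsilon$ is what lets gradients flow through the sampling step to $\phi$.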
Using generative models (D. Kingma et al., 2014)
M2:
$$\log p_\theta(x, y) \ge \mathbb{E}[\log p_\theta(x \mid y, z) + \log p(y) + \log p(z) - \log q_\phi(z \mid x, y)] = -\mathcal{J}_1(x, y)$$
$$\log p_\theta(x) \ge -\sum_y q_\phi(y \mid x)\, \mathcal{J}_1(x, y) + \mathcal{H}(q_\phi(y \mid x)) = -\mathcal{J}_2(x)$$
• Minimize the following function using a gradient-based method:
$$\min_{\theta,\phi} \frac{1}{n} \sum_{i=1}^{n} \mathcal{J}_1(x_l^{(i)}, y^{(i)}) + \frac{1}{m} \sum_{j=1}^{m} \mathcal{J}_2(x_u^{(j)})$$
• Use $q_\phi(y \mid x)$ to predict the class of an input $x$.
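For an unlabeled example, $\mathcal{J}_2$ marginalizes $\mathcal{J}_1$ over the classifier's soft predictions; given precomputed per-class $\mathcal{J}_1$ values this is a short NumPy sketch (the helper name is illustrative):

```python
import numpy as np

def j2_unlabeled(q_y, j1_per_class):
    """J2(x) = sum_y q(y|x) * J1(x, y) - H(q(y|x))."""
    entropy = -np.sum(q_y * np.log(q_y + 1e-12))
    return np.sum(q_y * j1_per_class) - entropy
```

The entropy term rewards confident class predictions on unlabeled data while the weighted sum keeps the generative bound tight for every plausible label.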
2 Deep clustering
Deterministic clustering 1 (Y. Ma et al., 2014)
• Calculate the reconstruction error (with $G$ the encoder and $H$ the decoder):
$$\mathcal{J}_1 = \frac{1}{m} \sum_{j=1}^{m} \| H(G(x_u^{(j)})) - x_u^{(j)} \|^2$$
Deterministic clustering 1 (Y. Ma et al., 2014)
• Assume that $\mu_k$ is the center of cluster $C_k$ and $T_k$ is the number of elements in $C_k$:
$$\mathcal{J}_2 = \frac{1}{K} \sum_{k=1}^{K} \max_{l} \left\{ \frac{S_k + S_l}{M_{kl}} \,\middle|\, l \ne k \right\}$$
where $S_k = \sqrt{\frac{1}{T_k} \sum_{G(x_u^{(j)}) \in C_k} \| G(x_u^{(j)}) - \mu_k \|^2}$ and $M_{kl} = \| \mu_k - \mu_l \|$.
• Minimize the following function using a gradient-based method:
$$\mathcal{J} = \mathcal{J}_1 + \lambda \cdot \mathcal{J}_2$$
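The cluster term $\mathcal{J}_2$ can be computed directly from the embedded points and hard cluster assignments; a NumPy sketch (how the assignments are obtained, e.g. by k-means, is not specified on the slides and is assumed here):

```python
import numpy as np

def cluster_separation(Z, labels, K):
    """J2: mean over clusters of the worst (S_k + S_l) / M_kl ratio.
    Z: (n, d) embedded points G(x); labels: (n,) cluster indices in 0..K-1."""
    mus = np.array([Z[labels == k].mean(axis=0) for k in range(K)])
    # S_k: root mean squared distance of cluster k's points to its center
    S = np.array([np.sqrt(np.mean(np.sum((Z[labels == k] - mus[k]) ** 2, axis=1)))
                  for k in range(K)])
    total = 0.0
    for k in range(K):
        ratios = [(S[k] + S[l]) / np.linalg.norm(mus[k] - mus[l])
                  for l in range(K) if l != k]
        total += max(ratios)  # worst-separated neighbor of cluster k
    return total / K
```

Tight, well-separated clusters give a small value, so minimizing it pushes the embedding toward cluster-friendly structure.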
Deterministic clustering 2 (J. Xie et al., 2016)
• Given an initial estimate of the DNN mapping $f_\theta$ and the initial cluster centroids $\{\mu_k\}_{k=1}^{K}$, calculate
$$q_{jk} = \frac{(1 + \|z_j - \mu_k\|^2/\alpha)^{-\frac{\alpha+1}{2}}}{\sum_l (1 + \|z_j - \mu_l\|^2/\alpha)^{-\frac{\alpha+1}{2}}}$$
where $z_j = f_\theta(x_j)$.
• Minimize the following function:
$$\sum_j \mathrm{KL}(P_j \| Q_j) = \sum_j \sum_k p_{jk} \log \frac{p_{jk}}{q_{jk}}$$
where $P$ is the target distribution, with $p_{jk} = \frac{q_{jk}^2 / f_k}{\sum_l q_{jl}^2 / f_l}$ and $f_k = \sum_j q_{jk}$.
• Similar to self-training.
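Both distributions are cheap to compute from the embeddings and centroids; a NumPy sketch with $\alpha = 1$ as in the paper:

```python
import numpy as np

def soft_assign(Z, mu, alpha=1.0):
    """Student-t soft assignment q_jk of embedded points to centroids.
    Z: (n, d) embeddings z_j = f_theta(x_j); mu: (K, d) centroids."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_dist(q):
    """Sharpened target p_jk = (q_jk^2 / f_k) / sum_l (q_jl^2 / f_l)."""
    f = q.sum(axis=0)          # soft cluster frequencies f_k
    p = q ** 2 / f
    return p / p.sum(axis=1, keepdims=True)
```

Squaring and renormalizing sharpens the assignments, which is the self-training flavor noted above: the network is trained toward a more confident version of its own predictions.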
Stochastic clustering (G. Chen, 2015)
• DNN + nonparametric mixture model:
$$P(z, \{\theta_k\}_{k=1}^{K} \mid \{h_i^{(L)}\}_{i=1}^{n}) \propto p(z) \prod_{k=1}^{K} p(\theta_k) \left[ \prod_{i=1}^{n} p(x_i \mid \theta_{z_i}) \right]$$
→ Chinese Restaurant Process mixture
• Gibbs sampling ($z$) + online maximum-margin learning ($\theta$) + fine-tuning ($W_L$, $\theta$)
References
• G. Chen. (2015). Deep Learning with Nonparametric Clustering. arXiv:1501.03084.
• D. P. Kingma, D. J. Rezende, S. Mohamed and M. Welling. (2014). Semi-supervised learning with deep generative models. Advances in Neural Information Processing Systems. pp. 3581-3589.
• Y. Ma, C. Shang, F. Yang and D. Huang. (2014). Latent Subspace Clustering based on Deep Neural Networks. Proceedings of the fifth ADCONIP.
• D. Lee. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. Workshop on Challenges in Representation Learning, ICML. 3(2).
• R. Raina, A. Battle, H. Lee, B. Packer and A. Y. Ng. (2007). Self-taught learning: transfer learning from unlabeled data. Proceedings of the 24th International Conference on Machine Learning. pp. 759-766.
• M. Ranzato and M. Szummer. (2008). Semi-supervised learning of compact document representations with deep networks. Proceedings of the 25th International Conference on Machine Learning. pp. 792-799.
• J. Weston, F. Ratle, H. Mobahi and R. Collobert. (2012). Deep learning via semi-supervised embedding. Neural Networks: Tricks of the Trade. pp. 639-655.
• J. Xie, R. Girshick and A. Farhadi. (2016). Unsupervised Deep Embedding for Clustering Analysis. arXiv:1511.06335.