various applications of restricted boltzmann machines for ...aldous/pitman_conference/slides/... ·...
TRANSCRIPT
Wrocław University of Technology
Various applications of restricted Boltzmannmachines for bad quality training data
Maciej Zięba
Wroclaw University of Technology
20.06.2014
MotivationBig data - 7 dimensions1
Volume: size of data.
Velocity: speed, displacement of data.
Variety: diversity of data.
Viscosity: measures the resistance to flow inthe volume of data.
Virality: measures how fast data is distributedunique and shared between nodes in a network(e.g. the Internet).
Veracity: trust and quality of the data.
Value: what is the added value that Big Datashould bring?
1According to ATOS company2/7
MotivationBig data - 7 dimensions1
Volume: size of data.
Velocity: speed, displacement of data.
Variety: diversity of data.
Viscosity: measures the resistance to flow inthe volume of data.
Virality: measures how fast data is distributedunique and shared between nodes in a network(e.g. the Internet).
Veracity: trust and quality of the data.
Value: what is the added value that Big Datashould bring?
1According to ATOS company2/7
Veracity of DataTypical problems with data - training context
Imbalanced data problem. One classdominates another in the training data.
Noisy labels problem. Some of the examplesin training data contain incorrectly assignedlabels.
Missing values issue. Values of somefeatures are unknown.
Unstructured data. The data is representedin unprocessed form: images, videos,documents, XML structures.
Semi-supervised data. Some portion oftraining data is unlabelled.
Example ofimbalanced data
3/7
Veracity of DataTypical problems with data - training context
Imbalanced data problem. One classdominates another in the training data.
Noisy labels problem. Some of the examplesin training data contain incorrectly assignedlabels.
Missing values issue. Values of somefeatures are unknown.
Unstructured data. The data is representedin unprocessed form: images, videos,documents, XML structures.
Semi-supervised data. Some portion oftraining data is unlabelled.
Example ofimbalanced data
3/7
MethodsRestricted Boltzmann Machines (RBM)
RBM is a bipartie Markov Random Field withvisible and hidden units.
The joint distribution of visible and hidden unitsis the Gibbs distribution:
p(x,h|θ) =1Z
exp(− E(x,h|θ)
) For binary visible x ∈ {0, 1}D and hidden units
h ∈ {0, 1}M th energy function is as follows:
E(x,h|θ) = −x>Wh− b>x− c>h,
Because of no visible to visible, or hidden tohidden connection we have:
p(xi = 1|h,W,b) = sigm(Wi·h + bi
)p(hj = 1|x,W, c) = sigm
((W·j)>x + cj
)
984Chapter
27.Latent
variablem
odelsfor
discretedata
(a)
HV
(b)
Figure
27.30(a)
Ageneral
Boltzmann
machine,
with
anarbitrary
graphstructure.
Theshaded
(visible)nodes
arepartitioned
intoinput
andoutput,although
them
odelis
actuallysym
metric
anddefines
ajoint
densityon
allthe
nodes.(b)
Arestricted
Boltzmann
machine
with
abipartite
structure.Note
thelack
ofintra-layer
connections.
andthen
normalizing,
whereas
ina
mixture
ofexperts,
we
takea
convexcom
binationof
normalized
distributions.The
intuitivereason
why
PoEm
odelsm
ightw
orkbetter
thana
mixture
isthat
eachexpert
canenforce
aconstraint
(ifthe
experthas
avalue
which
is!1
or"1)
ora
“don’tcare”
condition(if
theexpert
hasvalue
1).By
multiplying
theseexperts
togetherin
di!erent
ways
we
cancreate
“sharp”distributions
which
predictdata
which
satisfiesthe
specifiedconstraints
(Hinton
andTeh
2001).For
example,
considera
distributedm
odelof
text.A
givendocum
entm
ighthave
thetopics
“government”,
“mafia”
and“playboy”.
Ifw
e“m
ultiply”the
predictionsof
eachtopic
together,the
model
may
givevery
highprobability
tothe
word
“Berlusconi” 9(Salakhutdinov
andH
inton2010).
Bycontrast,
addingtogether
expertscan
onlym
akethe
distributionbroader
(seeFigure
14.17).Typically
thehidden
nodesin
anRBM
arebinary,so
hspecifies
which
constraintsare
active.It
isw
orthcom
paringthis
with
thedirected
models
we
havediscussed.
Ina
mixture
model,w
ehave
onehidden
variableq#
{1,...,K
}.W
ecan
representthis
usinga
setof
Kbits,w
iththe
restrictionthat
exactlyone
bitis
onat
atim
e.This
iscalled
alocalist
encod
ing,
sinceonly
onehidden
unitis
usedto
generatethe
responsevector.
Thisis
analogousto
thehypothetical
notionof
grandm
other
cellsin
thebrain,
thatare
ableto
recognizeonly
onekind
ofobject.
Bycontrast,an
RBMuses
adistribu
teden
codin
g,where
many
unitsare
involvedin
generatingeach
output.M
odelsthat
usedvector-valued
hiddenvariables,
suchas
!#
SK
,as
inm
PCA/
LDA,or
z#
RK
,asin
ePCA
alsouse
distributedencodings.
Them
aindi!
erencebetw
eenan
RBMand
directedtw
o-layerm
odelsis
thatthe
hiddenvariables
areconditionally
independentgiven
thevisible
variables,sothe
posteriorfactorizes:
p(h|v
,")
=!
k
p(h
k |v,"
)(27.96)
Thism
akesinference
much
simpler
thanin
adirected
model,
sincew
ecan
estimate
eachh
k
9.Silvio
Berlusconiis
thecurrent
(2011)prim
em
inisterof
Italy.
4/7
MethodsRBM for imbalanced data
Train the model on examples from minority class by application ofMLL (scaled):
1N
log(p(XNn=1|θ)
)=
1N
N∑n=1
log(∑h
p(xn,h|θ))
Generate artificial examples X̄Mm=1 using Synthetic OversamplingTEchnique (SMOTE).
For each of the newly created example xm apply Gibbs sampling:
hm ∼ p(h|x̄m, θ)x̃m ∼ p(x|hm, θ)
Label newly created example x̃m and store in training data.
5/7
MethodsRBM for imbalanced data - example
SMOTE procedure:
A
B
Generating artificial examples on MNIST data:
6/7
MethodsRBM for imbalanced data - example
SMOTE procedure:
A
B
Generating artificial examples on MNIST data:
EXAMPLE 1
EXAMPLE 2
SMOTE
SMOTE RBM
6/7
RBM for other raw data issues Problem of missing values.
RBM is trained for each of the classes separately.
Gibbs sampling is applied to uncover unknown values.
RBM models are iteratively updated while new training example iscompleted.
Problem of noisy labels.
RBM is trained for each of the classes separately.
Each of the trained models is used as an oracle to detectuncorrected labelled data.
Reconstruction error is used to determine unlabelled examples.
Problem of unstructured data.
RBM is used as domain-independent feature extractor thattransforms raw data into hidden units.
7/7