A generic approach to topic models and its application to virtual communities

PhD defense slides, translated to English.

TRANSCRIPT

  • A generic approach to topic models and its application to virtual communities

    Gregor Heinrich

    PhD presentation (English translation incl. backup slides, 45 min)

    Faculty of Mathematics and Computer Science, University of Leipzig

    28 November 2012, Version 2.9 EN BU

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • Motivation: Virtual communities

    [Figure: network of documents and persons connected by relations such as authorship, citation, annotation, recommendation, cooperation and similarity]

    Virtual communities = groups of persons who exchange information and knowledge electronically

    Examples: organisations, digital libraries, Web 2.0 applications incl. social networks

    Data are multimodal: text content; authorship, citation, annotations and recommendations; cooperation and other social relations

    Typical case: discrete data with high dynamics and large volumes

  • Motivation: Unsupervised mining of discrete data

    Identification of relationships in large data volumes

    Only data (and possibly a model) required (information retrieval, network analysis, clustering, NLP methods)

    Density problem: Features too sparse for analysis in high-dimensional feature space

    Vocabulary problem: Semantic similarity ≠ lexical similarity (polysemy, synonymy, etc.)

    [Figure: word graph around the ambiguous term "bar", linking senses such as judicial assembly (court), location (bank, restaurant), long object (stick, counter), pressure unit, furniture (table) and people (staff, personnel, employees, teller), plus related terms like atm, network, glue, adhere, prevent, yard]

  • Topic models as approach

    [Figure: build-up linking documents such as "Rhythm & Spice Jamaican Grill bar", "Leipzig's Bars and Restaurants" and "Playing Drums for Beginners" via latent topics to words like bar, restaurant, wine, rhythm, drum]

    Probabilistic representations of grouped discrete data

    Illustrative for text: Words grouped in documents

    Latent topics = probability distributions over vocabulary. Dominant terms of a topic are semantically similar.

    Language = mixture of topics (latent semantic structure)

    Reduce vocabulary problem: Find semantic relations

    Reduce density problem: Dimensionality reduction

  • Language models: Unigram model

    [Figure: a single term distribution p(w | z) generates the words w_{m,n} of all documents]

    One distribution for all data

  • Language models: Unigram mixture model

    [Figure: one term distribution p(w | z_m) per document m, selected by a document-level indicator z_m]

    One distribution per document

  • Language models: Unigram admixture model

    [Figure: word-level indicators z_{m,n}; each word w_{m,n} is drawn from the term distribution p(w | z_{m,n}) selected for that word]

    One distribution per word → basic topic model

  • Bayesian topic models: The Dirichlet distribution

    [Figure: Dirichlet densities on the simplex with corners p_1 = 1, p_2 = 1, p_3 = 1; example p(w | z) ∼ Dir(~α) with ~α = (4, 4, 2)]

    Bayesian methodology:

    Distributions generated from prior distributions

    Language + other discrete data: Dirichlet distribution important as prior:

    Defined on simplex: surface containing all discrete distributions

    Parameter ~α controls behaviour

    Bayesian topic model: Latent Dirichlet Allocation (LDA) (Blei et al. 2003)
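
    A minimal sketch of drawing from a Dirichlet distribution (illustrative; the class name and the Marsaglia-Tsang Gamma sampler are standard choices, not the thesis code): normalising independent Gamma(α_i) draws yields a point on the simplex.

    import java.util.Random;

    public class DirichletSampler {
        final Random rng = new Random(42);

        /** Marsaglia-Tsang sampler for Gamma(shape, 1), shape > 0. */
        double sampleGamma(double shape) {
            if (shape < 1.0)   // boost: Gamma(a) = Gamma(a + 1) * U^(1/a)
                return sampleGamma(shape + 1.0) * Math.pow(rng.nextDouble(), 1.0 / shape);
            double d = shape - 1.0 / 3.0, c = 1.0 / Math.sqrt(9.0 * d);
            while (true) {
                double x = rng.nextGaussian(), v = 1.0 + c * x;
                if (v <= 0) continue;
                v = v * v * v;
                double u = rng.nextDouble();
                if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
            }
        }

        /** One draw ~p ∼ Dir(~alpha), e.g., a topic's term distribution. */
        double[] sampleDirichlet(double[] alpha) {
            double[] p = new double[alpha.length];
            double sum = 0.0;
            for (int i = 0; i < alpha.length; i++) { p[i] = sampleGamma(alpha[i]); sum += p[i]; }
            for (int i = 0; i < alpha.length; i++) p[i] /= sum;
            return p;
        }

        public static void main(String[] args) {
            double[] p = new DirichletSampler().sampleDirichlet(new double[] { 4, 4, 2 });
            System.out.printf("p = (%.3f, %.3f, %.3f)%n", p[0], p[1], p[2]);
        }
    }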

  • Latent Dirichlet Allocation

    Latent Dirichlet Allocation (Blei et al. 2003)

    [Figure: LDA plate diagram with topic-term distributions ~φ_k ∼ Dir(β) for each topic k, document-topic distributions ~ϑ_m ∼ Dir(α) for each document m, topic indicators z_{m,n} and words w_{m,n}]

    Generative process, illustrated for the document "Concert tonight at Rhythm and Spice Restaurant . . . " with topic 1 (restaurant, bar, food, grill) and topic 2 (concert, music, rhythm, bar):

    1. Generate word distributions ~φ_k for all topics

    2. Generate the topic distribution ~ϑ_m for the document

    3. Sample the topic index for each word, e.g., z = 2 for the first word

    4. Sample the word from the term distribution of topic 2: "concert"
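
    A minimal sketch of this generative process (illustrative; it reuses the DirichletSampler sketched above, and all dimensions and hyperparameters are toy values):

    import java.util.Random;

    public class LdaGenerator {
        static final Random rng = new Random(7);

        /** Draw an index from a discrete distribution p. */
        static int sampleDiscrete(double[] p) {
            double u = rng.nextDouble(), cum = 0.0;
            for (int i = 0; i < p.length; i++) { cum += p[i]; if (u < cum) return i; }
            return p.length - 1;
        }

        public static void main(String[] args) {
            int K = 2, V = 8, M = 3, N = 6;          // topics, vocabulary, documents, words per document
            double[] alpha = { 1.0, 1.0 };           // document-topic prior
            double[] beta = new double[V];           // topic-term prior
            java.util.Arrays.fill(beta, 0.1);
            DirichletSampler dir = new DirichletSampler();

            double[][] phi = new double[K][];        // 1. term distribution per topic
            for (int k = 0; k < K; k++) phi[k] = dir.sampleDirichlet(beta);

            for (int m = 0; m < M; m++) {
                double[] theta = dir.sampleDirichlet(alpha);   // 2. topic distribution per document
                for (int n = 0; n < N; n++) {
                    int z = sampleDiscrete(theta);             // 3. topic index for word n
                    int w = sampleDiscrete(phi[z]);            // 4. word from topic z's term distribution
                    System.out.printf("doc %d, word %d: z=%d, w=%d%n", m, n, z, w);
                }
            }
        }
    }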

  • State of the art

    Large number of published models that extend LDA: authors (Rosen-Zvi et al. 2004), citations (Dietz et al. 2007), hierarchy (Li and McCallum 2006; Li et al. 2007), image features and captions (Barnard et al. 2003) etc.

    Results for "topic model" (title + abstract) since 2012 alone: ACM >400, Google Scholar >1300

    Expanding research area with practical relevance

    But: No existing analysis as a generic model class; partly tedious derivation, especially for inference algorithms

    Conjecture:

    Important properties generic across models

    Simplifications for the derivation of concrete model properties, inference algorithms and design methods

    [Figure: plate diagram of the expert-tag-topic model (Heinrich 2011) as an example of an extended topic model]

  • Research questions

    How can topic models be described in a generic way in order to use their properties across particular applications?

    Can generic topic models be implemented generically and, if so, can repeated structures be exploited for optimisations?

    How can generic models be applied to data in virtual communities?

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • How can topic models be described in a generic way in order to use their properties across particular applications?

  • Topic models: Example structures

    [Figure: plate diagrams of four example models]

    (a) Latent Dirichlet allocation (LDA)

    (b) Author-topic model (ATM)

    (c) Pachinko allocation model (PAM4)

    (d) Hierarchical PAM (hPAM)

    (Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

  • Generic topic models → NoMMs

    Generic characteristics of topic models:

    Levels with discrete components ~φ_k, generated from Dirichlet distributions Dir(~φ_k | α)

    Coupling via values of discrete variables x

    Network of mixed membership (NoMM):

    Compact representation for topic models

    Directed acyclic graph

    Node: sample from a mixture component, selection via incoming edges; terminal node: observation

    Edge: propagation of discrete values to child nodes

    [Figure: build-up of the NoMM notation: a level with components ~φ_k ∼ Dir(~φ_k | α), input values x^in selecting the component via k = f(x^in) and output values x^out; variants with several incoming edges (k = f(x^in1, x^in2)) and several outgoing edges; LDA as a NoMM: node ~ϑ_m (input m, output z_{m,n}) followed by node ~φ_k (input z_{m,n}, output w_{m,n}), stepped through for m = 1, z_{1,1} = 3, w_{1,1} = 2]
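
    A minimal structural sketch of the node abstraction described above (illustrative; class and method names are not from the thesis):

    import java.util.Random;
    import java.util.function.ToIntFunction;

    /** A NoMM level: K discrete components over T values; incoming edge values
     *  select a component via k = f(xIn), and a draw from it is propagated on. */
    public class NommLevel {
        final double[][] phi;              // phi[k][t]: component k as a distribution over T values
        final ToIntFunction<int[]> f;      // maps incoming edge values to a component index
        final Random rng = new Random(1);

        NommLevel(double[][] phi, ToIntFunction<int[]> f) { this.phi = phi; this.f = f; }

        /** Select component k = f(xIn) and sample an outgoing value from it. */
        int sample(int... xIn) {
            double[] p = phi[f.applyAsInt(xIn)];
            double u = rng.nextDouble(), cum = 0.0;
            for (int t = 0; t < p.length; t++) { cum += p[t]; if (u < cum) return t; }
            return p.length - 1;
        }

        public static void main(String[] args) {
            double[][] theta = { { 0.8, 0.2 } };                       // document level: 1 document, 2 topics
            double[][] terms = { { 0.6, 0.3, 0.1 }, { 0.1, 0.2, 0.7 } };  // topic level: 2 topics, 3 terms
            NommLevel docLevel = new NommLevel(theta, in -> in[0]);    // k = f(m) = m
            NommLevel termLevel = new NommLevel(terms, in -> in[0]);   // k = f(z) = z
            int z = docLevel.sample(0);    // topic index for a word of document 0
            int w = termLevel.sample(z);   // word drawn from topic z
            System.out.println("z=" + z + " w=" + w);
        }
    }

    Chaining two such levels reproduces the LDA build-up above: the document level emits z_{m,n}, which selects the component of the term level that emits w_{m,n}.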

  • Topic models as NoMMs

    [Figure: NoMM representations of the four example models, with component dimensions in brackets]

    (a) Latent Dirichlet allocation, LDA: m → ~ϑ_m → z_{m,n} = k → ~φ_k → w_{m,n} = t

    (b) Author-topic model, ATM: observed authors ~a_m → x_{m,n} = x → ~ϑ_x → z_{m,n} = k → ~φ_k → w_{m,n} = t

    (c) Pachinko allocation model, PAM: ~ρ_m → z2_{m,n} = x → ~ϑ_{m,x} → z3_{m,n} = y → ~φ_y → w_{m,n} = t

    (d) Hierarchical pachinko allocation model, hPAM: ~ϑ_{m,0} → zT_{m,n} = T, ~ϑ_{m,T} → zt_{m,n} = t, level indicator ℓ_{m,n} → ~φ_{ℓ,T,t} → w_{m,n}

    (Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • Can generic topic models be implemented generically. . . ?


  • Bayesian inference problem and Gibbs sampler

    Bayesian inference: inversion of the generative process:

    Find distributions over parameters Θ and latent variables/topics H, given observations V and Dirichlet parameters A

    ⇒ Determine posterior distribution p(H, Θ | V, A)

    Intractability → approximative approaches

    Gibbs sampling: variant of Markov chain Monte Carlo (MCMC)

    In topic models: marginalise parameters Θ (collapsed Gibbs sampler)

    Sample topics H_i for each data point i in turn: H_i ∼ p(H_i | H_¬i, V, A)

    [Figure: example NoMM chain m → x → y → w with parameter levels Θ1, Θ2, Θ3 (components ~ρ_m, ~ϑ_{m,x}, ~φ_y) and hidden variables H1, H2; the parameters are marginalised out and the hidden variables sampled from p(H1_i, H2_i | H_¬i, V, A)]

  • Sampling distribution for NoMMs

    [Figure: example NoMM chain x → y → w with level factors q(m, x), q((m,x), y), q(y, w)]

    Gibbs sampler can be generically derived (Heinrich 2009)

    Typical case: quotients of factors over levels ℓ:

    p(H_i | H_¬i, V, A) ∝ ∏_ℓ [ (n_{k,t}^¬i + α_t) / ∑_t (n_{k,t}^¬i + α_t) ]^[ℓ] = ∏_ℓ [ q(k, t) ]^[ℓ]

    n_{k,t}^[ℓ] = count of co-occurrences between input and output values of a level (components and samples)

    More complex variants covered by q(k, t) ≜ B({n_{k,t}}_{t=1}^T + ~α) / B({n_{k,t}^¬i}_{t=1}^T + ~α), with B the multivariate beta function
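
    A minimal sketch of a collapsed Gibbs sweep for LDA written with such level factors (illustrative; names are not taken from the thesis's code generator): each token is removed from the counts, scored by the product of the document-level and term-level q factors, and reassigned.

    import java.util.Random;

    public class CollapsedGibbsLda {
        final int K, V;
        final double alpha, beta;
        final int[][] w, z;        // word ids and topic assignments per document/position
        final int[][] nmk, nkt;    // co-occurrence counts: document-topic, topic-term
        final int[] nk;            // per-topic token totals
        final Random rng = new Random(0);

        CollapsedGibbsLda(int[][] w, int K, int V, double alpha, double beta) {
            this.w = w; this.K = K; this.V = V; this.alpha = alpha; this.beta = beta;
            this.z = new int[w.length][];
            this.nmk = new int[w.length][K];
            this.nkt = new int[K][V];
            this.nk = new int[K];
            for (int m = 0; m < w.length; m++) {       // random initialisation
                z[m] = new int[w[m].length];
                for (int n = 0; n < w[m].length; n++) {
                    int k = rng.nextInt(K);
                    z[m][n] = k; nmk[m][k]++; nkt[k][w[m][n]]++; nk[k]++;
                }
            }
        }

        /** Level factor q(k, t) = (n_{k,t} + prior) / (sum_t n_{k,t} + dim * prior). */
        double q(double count, double sum, double prior, int dim) {
            return (count + prior) / (sum + dim * prior);
        }

        void sweep() {
            for (int m = 0; m < w.length; m++)
                for (int n = 0; n < w[m].length; n++) {
                    int t = w[m][n], k = z[m][n];
                    nmk[m][k]--; nkt[k][t]--; nk[k]--;   // exclude token i from the counts
                    double[] p = new double[K];
                    double sum = 0.0;
                    for (int j = 0; j < K; j++) {        // product of document and term level factors
                        p[j] = q(nmk[m][j], w[m].length - 1, alpha, K)
                             * q(nkt[j][t], nk[j], beta, V);
                        sum += p[j];
                    }
                    double u = rng.nextDouble() * sum;   // draw the new topic
                    double cum = 0.0; int kNew = K - 1;
                    for (int j = 0; j < K; j++) { cum += p[j]; if (u < cum) { kNew = j; break; } }
                    z[m][n] = kNew;
                    nmk[m][kNew]++; nkt[kNew][t]++; nk[kNew]++;
                }
        }

        public static void main(String[] args) {
            int[][] docs = { { 0, 1, 2, 1, 0 }, { 3, 4, 3, 4, 2 } };   // toy corpus of word ids
            CollapsedGibbsLda lda = new CollapsedGibbsLda(docs, 2, 5, 0.5, 0.1);
            for (int it = 0; it < 200; it++) lda.sweep();
            System.out.println(java.util.Arrays.deepToString(lda.z));
        }
    }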

  • Typology of NoMM substructures

    [Figure: library of NoMM substructures with their Gibbs factors: N1. Dirichlet-multinomial: q(a, z) q(z, b); N2. Observed parameters: c_{a,z} q(z, b); E2. Autonomous edges: q(a, z) q(z, b) q(z, c); E3. Coupled edges: q(a, x ⊕ y) q(x, b) q(y, c); C2. Combined indices: q(a, x) q(b, y) q(k, c) with k = f(x, y); C3. Interleaved indices: q(a, z1) q(b, z2) q(z1, c) q(z2, c) over an interleaved output sequence c]

    NoMM substructures: nodes, edges/branches, component indices/merging of edges:

    Representation via q-functions and likelihood

    Multiple samples per data point: q(a, x ⊕ y) for the respective level

    Library incl. additional structures: alternative distributions, regression, aggregation etc. → q-functions + other factors

  • Implementation: Gibbs meta-sampler

    [Figure: tool chain of the Gibbs meta-sampler: a NoMM code generator turns a topic model specification (NoMM script) into a C/Java code prototype via code templates; the generated C/Java module is validated against a Java model instance on the data, then compiled, optimised and deployed on the Java VM or a native platform]

    Code generator for topic models in Java and C

    Separation of knowledge domains: topic model applications vs. machine learning vs. computing architecture

  • Example NoMM script and generated kernel: hPAM2

    [Figure: hPAM2 as a NoMM: node ~ϑ_m (input m ∈ [M], output x ∈ [X]), node ~ϑ_{m,x} (output y ∈ [Y]) and node ~φ_k (output w ∈ [V]), with component index k assigned as: x = 0: k = 0; x ≠ 0, y = 0: k = 1 + x; x, y ≠ 0: k = 1 + X + y; the resulting topic hierarchy has a root topic, super-topics and sub-topics, each with its own word distribution]

    NoMM script:

    model = HPAM2
    description:
        Hierarchical PAM model 2 (HPAM2)
    sequences:
        # variables sampled for each (m,n)
        w, x, y : m, n
    network:
        # each line one NoMM node
        m >> theta | alpha >> x
        m,x >> thetax | alphax[x] >> y
        x,y >> phi[k] >> w
        # java code to assign k
        k : {
            if (x==0) { k = 0; }
            else if (y==0) k = 1 + x;
            else k = 1 + X + y;
        }

    Generated sampling kernel:

    // hidden edge x
    for (hx = 0; hx < X; hx++) {
        // hidden edge y
        for (hy = 0; hy < Y; hy++) {
            mxsel = X * m + hx;
            mxjsel = hx;
            if (hx == 0)
                ksel = 0;
            else if (hy == 0)
                ksel = 1 + hx;
            else
                ksel = 1 + X + hy;
            pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
                / (nmxysum[mxsel] + alphaxsum[mxjsel])
                * (nkw[ksel][w[m][n]] + beta)
                / (nkwsum[ksel] + betasum);
            psum += pp[hx][hy];
        } // for hy
    } // for hx
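
    What follows such a kernel in a complete sampler is the draw from the accumulated table. A minimal sketch (illustrative; not part of the generated code above, names follow the kernel):

    /** Draw indices (x, y) proportional to the unnormalised table pp
     *  with total mass psum, as accumulated by the kernel above. */
    static int[] drawPair(double[][] pp, double psum, java.util.Random rng) {
        double u = rng.nextDouble() * psum;
        for (int hx = 0; hx < pp.length; hx++)
            for (int hy = 0; hy < pp[hx].length; hy++) {
                u -= pp[hx][hy];
                if (u <= 0) return new int[] { hx, hy };   // new values for (x, y)
            }
        return new int[] { pp.length - 1, pp[0].length - 1 };  // guard against rounding
    }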

  • Document-topic distribution in Gibbs sampler

    [Figure: document-topic matrix (200 documents, 50 topics) shown at iterations 1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150, 200, 300 and 500, where the sampler has converged to a stationary state]

  • Fast sampling: Hybrid scaling methods

    [Figure: perplexity over 1-5000 iterations for dependent vs. independent samplers (PAM4); speedup table: LDA (dim. 500) 30.2; PAM4 (dim. 40 x 40) with serial, parallel and independent variants reaching speedups between 7.4 and 49.8]

    Serial and parallel scaling methods: generalised results for LDA to generic NoMMs, specifically (Porteous et al. 2008; Newman et al. 2009) + novel approach

    Problem: sampling space for statistically dependent variables: K · L · . . .

    Independence assumption: separate samplers with dimensions K + L + . . . ≪ K · L · . . .

    Empirical result: iterations ↑, but topic quality comparable

    Hybrid approaches with independent samplers highly effective

    Implementation: complexity covered by meta-sampler
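
    A minimal sketch of the dimensionality argument only (illustrative; the thesis's hybrid and independent schemes differ in detail): scoring two hidden variables jointly costs K · L evaluations per token, while updating them separately costs K + L.

    import java.util.Random;

    /** Cost comparison for one token: joint vs. separate sampling of two
     *  hidden variables with K and L values (hypothetical scoring function). */
    public class HybridSamplingSketch {
        static final Random rng = new Random(3);

        interface Score { double at(int x, int y); }   // unnormalised posterior factor

        /** Joint (blocked) sampler: evaluates all K * L combinations. */
        static int[] sampleJoint(Score s, int K, int L) {
            double[] p = new double[K * L];
            double sum = 0;
            for (int x = 0; x < K; x++)
                for (int y = 0; y < L; y++) { p[x * L + y] = s.at(x, y); sum += p[x * L + y]; }
            int i = draw(p, sum);
            return new int[] { i / L, i % L };
        }

        /** Separate samplers: K evaluations for x (y fixed), then L for y (x fixed). */
        static int[] sampleSeparate(Score s, int K, int L, int x0, int y0) {
            double[] px = new double[K]; double sx = 0;
            for (int x = 0; x < K; x++) { px[x] = s.at(x, y0); sx += px[x]; }
            int x1 = draw(px, sx);
            double[] py = new double[L]; double sy = 0;
            for (int y = 0; y < L; y++) { py[y] = s.at(x1, y); sy += py[y]; }
            return new int[] { x1, draw(py, sy) };
        }

        static int draw(double[] p, double sum) {
            double u = rng.nextDouble() * sum, cum = 0;
            for (int i = 0; i < p.length; i++) { cum += p[i]; if (u < cum) return i; }
            return p.length - 1;
        }

        public static void main(String[] args) {
            Score s = (x, y) -> (x + 1) * (y + 1.0);   // toy factor
            System.out.println(java.util.Arrays.toString(sampleJoint(s, 4, 3)));
            System.out.println(java.util.Arrays.toString(sampleSeparate(s, 4, 3, 0, 0)));
        }
    }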

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • How can generic models be applied to data in virtual communities?


  • NoMM design process

    Typology → library of NoMM substructures (see above)

    Idea: Construct models from simple substructures that connect terminal nodes:

    Terminal nodes ↔ multimodal data (virtual communities . . . )

    Substructures ↔ relationships in data; latent semantics

    Process steps: (1) define modelling task and metrics, (2) create evidence model, (3) define terminals, (4) formulate model assumptions, (5) compose model and predict properties, (6) write NoMM script, (7) generate and adapt Gibbs sampler, (8) implement target metric, (9) evaluate based on test corpus, (10) optimise and integrate for target platform

    Process:

    Assumptions on dependencies in data

    Iterative association to structures in model (usage of typology)

    Gibbs distribution known → model behaviour: q(x, y) = "rich get richer"

    Implementation and test with Gibbs meta-sampler; possibly iteration

  • Application: Expert finding with tag annotations

    [Figure: documents connected by authorship and annotation relations; document m with text ~w_m, authors ~a_m (e.g., AB, TH with authorship shares) and tags ~c_m (e.g., 5, 14, 33)]

    Scenario: Expert finding via documents with tag annotations

    Authors of relevant documents ↔ experts

    Frequently documents with additional annotations, here: tags

    Goal: Enable tag queries, improve quality of text queries

    Problem: Tags often incomplete, partly wrong

    Connection of tags and experts via topics:

    (1) Data: For each document m: text ~w_m, authors ~a_m, tags ~c_m

    (2) Goal: Tag query ~c: p(~c | a) = max; word query ~w: p(~w | a) = max

    (3) Terminal nodes: Authors in input, words and tags in output

  • Model assumptions

    [Figure: document m with words ~w_m, authors ~a_m and tags ~c_m]

    (4) Model assumptions:

    (a) Expertise of an author is weighted with the portion of authorship

    (b) Semantics of expertise expressed by topics z. Each author has a single field of expertise (topic distribution).

    (c) Semantics of tags expressed by topics y

  • Model construction

    (5) Model construction:

    (a) Start with terminal nodes (from step 3): authors ~a_m, words w_{m,n}, tags ~c_m; p(. . . | ~a, ~w, ~c) ∝ . . .

    (b) Authorship ~a_m given as observed distribution → node samples the author x of a word: p(x, . . . | ·) ∝ a_{m,x} · q(x, . . . ) · . . .

    (c) Each author has only a single field of expertise (topic distribution) → q(x, z) associates (word-)topics with sampled authors x (cf. ATM): p(x, z, . . . | ·) ∝ a_{m,x} · q(x, z) · . . .

    (d) Topic distribution over terms → connect z and w via q(z, w): p(x, z, . . . | ·) ∝ a_{m,x} · q(x, z) · q(z, w) · . . .

    (e) Introduce tag topics y_{m,j} for the tags c_{m,j} as distributions over tags; q(x, z ⊕ y) overlays the values for z and y: p(x, z, y | ·) ∝ a_{m,x} · q(x, z ⊕ y) · q(z, w) · q(y, c)

    [Figure: resulting NoMM with a document → author node x, a word-topic node z → word w, and a tag-topic node y → tag c]

  • Model construction: ordinary approach

    [Figure: full plate diagram of the expert-tag-topic model (Heinrich 2011) with words w_{m,n} and word topics z_{m,n} (n ∈ [1, N_m]), tags c_{m,j} and tag topics y_{m,j} (j ∈ [1, J_m]), authors x_{m,n}, x_{m,j} drawn from ~a_m, and parameters ~ϑ_x (x ∈ [1, A]), ~φ_k, ~ψ_k (k ∈ [1, K])]
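
    Given trained parameters, the tag queries from step (2) can be scored. A minimal sketch, assuming trained author-topic (theta) and topic-tag (psi) distributions; names and the exact ranking formula are illustrative, not taken from the thesis:

    /** Rank authors a for a tag query ~c by log p(~c | a) = sum_j log sum_k theta[a][k] * psi[k][c_j]. */
    public class ExpertQuery {
        static double score(double[][] theta, double[][] psi, int a, int[] tags) {
            double logp = 0.0;
            for (int c : tags) {
                double p = 0.0;
                for (int k = 0; k < theta[a].length; k++) p += theta[a][k] * psi[k][c];
                logp += Math.log(p);
            }
            return logp;
        }

        public static void main(String[] args) {
            double[][] theta = { { 0.9, 0.1 }, { 0.2, 0.8 } };          // 2 authors, 2 topics (toy values)
            double[][] psi = { { 0.7, 0.2, 0.1 }, { 0.1, 0.3, 0.6 } };  // 2 topics, 3 tags (toy values)
            int[] query = { 0, 1 };                                     // tag query ~c
            for (int a = 0; a < theta.length; a++)
                System.out.printf("author %d: log p = %.3f%n", a, score(theta, psi, a, query));
        }
    }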

  • Expert-tag-topic model: Evaluation

    [Figure: box plots of Average Precision @10 for word queries and tag queries (ATM vs. ETT, scale 0.2-1.0) and of the topic coherence score (ATM vs. ETT, scale roughly -500 to -150)]

    NIPS corpus: 2.3 million words, 2037 authors, 165 tags

    Retrieval: Average Precision @10:

    Term queries: ETT > ATM

    Tag queries: similarly good AP values

    Topic coherence (Mimno et al. 2011): ETT > ATM

    Semi-supervised learning: tag queries retrieve items without tags
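
    For reference, a minimal sketch of the AP@10 metric used above (standard definition; normalisation conventions vary, and this is not the thesis's evaluation code):

    /** Average Precision at cutoff 10: mean of precision values at each relevant rank. */
    public class AveragePrecision {
        static double apAt10(boolean[] rel, int totalRelevant) {
            double sum = 0.0;
            int hits = 0;
            for (int i = 0; i < Math.min(10, rel.length); i++)
                if (rel[i]) { hits++; sum += (double) hits / (i + 1); }
            // one common convention: normalise by the number of relevant items reachable in the top 10
            return totalRelevant == 0 ? 0.0 : sum / Math.min(totalRelevant, 10);
        }

        public static void main(String[] args) {
            boolean[] ranking = { true, false, true, true, false, false, false, false, false, false };
            System.out.printf("AP@10 = %.3f%n", apAt10(ranking, 3));   // relevant items at ranks 1, 3, 4
        }
    }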

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    Design process based on typology of NoMM substructures

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documentsContribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    + Variational Inference for NoMMs (Heinrich and Goesele 2009)

    Design process based on typology of NoMM substructures

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documentsContribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    + Variational Inference for NoMMs (Heinrich and Goesele 2009)

    Design process based on typology of NoMM substructures+ AMQ model: Meta-model for virtual communities as formal basis for

    scenario modelling (Heinrich 2010)

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documentsContribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    + Variational Inference for NoMMs (Heinrich and Goesele 2009)

    Design process based on typology of NoMM substructures+ AMQ model: Meta-model for virtual communities as formal basis for

    scenario modelling (Heinrich 2010)

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documents

    + Models ETT2 and ETT3 incl. novel NoMM structure; retrievalapproaches (Heinrich 2011b)

    Contribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

    [Graph visualization: community-browser view for the query topics [blind source separation] / [independent component analysis], connecting authors (e.g., Bell_A2, Cichocki_A, Hyvarinen_A, Lee_T1, Oja_E1, Parra_L1) to ranked papers (e.g., "A New Learning Algorithm for Blind Signal Separation") via "authors" and "cites" edges]

    Figure: ETT1: Expert search in community browser

    Gregor Heinrich A generic approach to topic models 33 / 35


  • Outlook

    New applications and NoMM structures, e.g., time as a variable

    Alternative inference methods:
    Generic collapsed variational Bayes (Teh et al. 2007): structure similar to the collapsed Gibbs sampler
    Non-parametric methods: learning model dimensions using Dirichlet or Pitman-Yor process priors (Teh et al. 2004; Buntine and Hutter 2010), NoMM polymorphism (Heinrich 2011a)

    Improved support in the design process:
    Data-driven design: search over model structures to obtain the best model for a data set
    Architecture-specific Gibbs meta-sampler, e.g., massively parallel or FPGA, cf. (Heinrich et al. 2011)

    Integration with interactive user interfaces: models can be created on the fly, e.g., for visual analytics

    Gregor Heinrich A generic approach to topic models 34 / 35

  • Thank you!

    Q + A

    Gregor Heinrich A generic approach to topic models 35 / 35

  • References I

    References

    Barnard, K., P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan (2003, August). Matching words and pictures. JMLR Special Issue on Machine Learning Methods for Text and Images 3(6), 1107–1136.

    Bellegarda, J. (2000, August). Exploiting latent semantic information in statistical language modeling. Proc. IEEE 88(8), 1279–1296.

    Blei, D., A. Ng, and M. Jordan (2003, January). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

    Buntine, W. and M. Hutter (2010). A Bayesian review of the Poisson-Dirichlet process. arXiv:1007.0296v1 [math.ST].

    Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei (2009). Reading tea leaves: How humans interpret topic models. In Proc. Neural Information Processing Systems (NIPS).

    Gregor Heinrich A generic approach to topic models 36 / 35

  • References II

    Dietz, L., S. Bickel, and T. Scheffer (2007, June). Unsupervised prediction of citation influences. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA.

    Heinrich, G. (2009). A generic approach to topic models. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD), Part 1, pp. 517–532.

    Heinrich, G. (2010). Actors–media–qualities: a generic model for information retrieval in virtual communities. In Proc. 7th International Workshop on Innovative Internet Community Systems (I2CS 2007), part of I2CS Jubilee proceedings, Lecture Notes in Informatics, GI.

    Heinrich, G. (2011a, March). Infinite LDA – implementing the HDP with minimum code complexity. Technical note TN2011/1, arbylon.net.

    Heinrich, G. (2011b). Typology of mixed-membership models: Towards a design method. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD).

    Gregor Heinrich A generic approach to topic models 37 / 35

  • References III

    Heinrich, G. and M. Goesele (2009). Variational Bayes for generic topic models. In Proc. 32nd Annual German Conference on Artificial Intelligence (KI2009).

    Heinrich, G., J. Kindermann, C. Lauth, G. Paaß, and J. Sanchez-Monzon (2005). Investigating word correlation at different scopes – a latent concept approach. In Workshop Lexical Ontology Learning at Int. Conf. Mach. Learning.

    Heinrich, G., F. Logemann, V. Hahn, C. Jung, G. Figueiredo, and W. Luk (2011). HW/SW co-design for heterogeneous multi-core platforms: The hArtes toolchain, Chapter Audio array processing for telepresence, pp. 173–207. Springer.

    Li, W., D. Blei, and A. McCallum (2007). Mixtures of hierarchical topics with pachinko allocation. In International Conference on Machine Learning.

    Li, W. and A. McCallum (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, pp. 577–584. ACM.

    Gregor Heinrich A generic approach to topic models 38 / 35

  • References IV

    Mimno, D., H. M. Wallach, E. Talley, M. Leenders, and A. McCallum (2011, July). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, pp. 262–272.

    Newman, D., A. Asuncion, P. Smyth, and M. Welling (2009, August). Distributed algorithms for topic models. JMLR 10, 1801–1828.

    Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling (2008). Fast collapsed Gibbs sampling for latent Dirichlet allocation. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 569–577. ACM.

    Rosen-Zvi, M., T. Griffiths, M. Steyvers, and P. Smyth (2004). The author-topic model for authors and documents. In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI).

    Teh, Y., M. Jordan, M. Beal, and D. Blei (2004). Hierarchical Dirichlet processes. Technical Report 653, Department of Statistics, University of California at Berkeley.

    Teh, Y. W., D. Newman, and M. Welling (2007). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, Volume 19.

    Gregor Heinrich A generic approach to topic models 39 / 35

  • Appendix

    Gregor Heinrich A generic approach to topic models 40 / 35

  • Example: Text mining for semantic clusters

    Topic label (translated)   Dominant terms according to \phi_{k,t} = p(term | topic), German corpus

    Bundesliga                 FC SC München Borussia SV VfL Kickers SpVgg Uhr Köln Bochum Freiburg VfB Eintracht Bayern Hamburger Bayern+München

    Police / accidents         Polizei verletzt schwer Auto Unfall Fahrer Angaben schwer+verletzt Menschen Wagen Verletzungen Lawine Mann vier Meter Straße

    Chechnya                   Rebellen russischen Grosny russische Tschetschenien Truppen Kaukasus Moskau Angaben Interfax tschetschenischen Agentur

    Politics / Hesse           FDP Koch Hessen CDU Koalition Gerhardt Wagner Liberalen hessischen Westerwelle Wolfgang Roland+Koch Wolfgang+Gerhardt

    Weather                    Grad Temperaturen Regen Schnee Süden Norden Sonne Wetter Wolken Deutschland zwischen Nacht Wetterdienst Wind

    Politics / Croatia         Parlament Partei Stimmen Mehrheit Wahlen Wahl Opposition Kroatien Präsident Parlamentswahlen Mesic Abstimmung HDZ

    The Greens (Die Grünen)    Grünen Parteitag Atomausstieg Trittin Grüne Partei Trennung Mandat Ausstieg Amt Roestel Jahren Müller Radcke Koalition

    Russian politics           Russland Putin Moskau russischen russische Jelzin Wladimir Tschetschenien Russlands Wladimir+Putin Kreml Boris Präsidenten

    Police / schools           Polizei Schulen Schüler Täter Polizisten Schule Tat Lehrer erschossen Beamten Mann Polizist Beamte verletzt Waffe

    Bigram-LDA: Topics from 18,400 dpa news messages, Jan. 2000 (Heinrich et al. 2005)

    Gregor Heinrich A generic approach to topic models 41 / 35

  • Notation: Bayesian network vs. NoMM levels

    [Diagram: Bayesian network with parameter node \vec\theta_k inside plate K, hidden index k_i and variable x_i, vs. the compact NoMM level notation k_i → (\vec\theta_k | \vec\alpha) → x_i with dimension labels [K], [T]]

    Bayesian network                       NoMM levels
    parameters + hyperparameters      →    nodes (\theta | \alpha)
    variables k_i, x_i                →    edges k_i, x_i
    plates (i.i.d. repetitions) i, k  →    indexes i + dimensions k

    Gregor Heinrich A generic approach to topic models 42 / 35

  • NoMM representation: Variable dependencies

    [Figure: catalogue of NoMM variable dependencies — eight substructures \ell = 1 ... 8, shown as Bayesian networks (top) and as compact NoMM levels \vec\theta_k | \vec\alpha (bottom), combining hidden edges h_i and visible edges v_i; \ell = 7, 8 show branched variants with component indexes k_{1i}, k_{2i} and dimensions A, X]

    Gregor Heinrich A generic approach to topic models 43 / 35

  • Collapsed Gibbs sampler

    [Figure: animation of Gibbs sampling in a two-dimensional state space — starting from \vec x^{(0)}, the chain alternates draws from p(x_1 | x_2) and p(x_2 | x_1) (coordinate i = 1, 2, 1, ...) and simulates the posterior p(\vec x | V); here \vec x continuous with i ∈ [1, 2], for NoMMs H discrete with i ∈ [1, W]]

    Collapsed Gibbs sampler: stochastic EM / MCMC:
    NoMMs: parameters \Theta correlate with H → marginalise
    For each data point i: draw latent variables H_i = (y_i, z_i, ...), given all other data, latent H_{\neg i} and observed V:

        H_i ~ p(H_i | H_{\neg i}, V, \mathcal{A}) .   (1)

    Stationary state: full conditional distribution (1) simulates posterior

    Faster absolute convergence for NoMMs than, e.g., variational inference (Heinrich and Goesele 2009)

    Gregor Heinrich A generic approach to topic models 44 / 35
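
    To make the procedure concrete, the following is a minimal sketch of a collapsed Gibbs sampler for plain LDA, the simplest NoMM with two DirMult levels (document → topic, topic → term). The toy corpus and the values of K, alpha and beta are illustrative assumptions; this is a didactic reduction, not the generic Gibbs meta-sampler itself.

    # Minimal sketch: collapsed Gibbs sampling for LDA (illustrative values).
    import numpy as np

    def lda_gibbs(docs, V, K=4, alpha=0.1, beta=0.01, iters=200, seed=0):
        rng = np.random.default_rng(seed)
        n_dk = np.zeros((len(docs), K))   # co-occurrence counts doc-topic
        n_kt = np.zeros((K, V))           # co-occurrence counts topic-term
        n_k = np.zeros(K)                 # topic totals
        z = [rng.integers(K, size=len(d)) for d in docs]   # random init of H
        for d, doc in enumerate(docs):    # count initial assignments
            for i, t in enumerate(doc):
                n_dk[d, z[d][i]] += 1; n_kt[z[d][i], t] += 1; n_k[z[d][i]] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, t in enumerate(doc):
                    k = z[d][i]           # remove token i from the counts ...
                    n_dk[d, k] -= 1; n_kt[k, t] -= 1; n_k[k] -= 1
                    # ... and draw from the full conditional (1), a product
                    # of q-functions (smoothed count ratios):
                    w = (n_dk[d] + alpha) * (n_kt[:, t] + beta) / (n_k + V * beta)
                    k = rng.choice(K, p=w / w.sum())
                    z[d][i] = k
                    n_dk[d, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
        theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
        phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)
        return theta, phi

    # toy corpus: documents as lists of term ids over a vocabulary of size 6
    docs = [[0, 1, 0, 2], [1, 0, 1, 2], [3, 4, 5, 4], [4, 5, 3, 3]]
    theta, phi = lda_gibbs(docs, V=6)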

  • Generic topic models: Generative process

    [Diagram: generic NoMM level — mixture node with components \vec\theta_k (plate K) and hyperparameter groups \vec\alpha_j (plate J), edge x_i feeding subsequent levels, dimensions A, X]

    Generative process on level \ell:

        x_i ~ Mult(x_i | \vec\theta_k),    \vec\theta_k ~ Dir(\vec\theta_k | \vec\alpha_j)    (2)
        k = f_k(parents(x_i), i),          j = f_j(known parents(x_i), i) .                   (3)

    Gregor Heinrich A generic approach to topic models 45 / 35
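
    As an illustration of eqs. (2)-(3), a minimal sketch of one level's generative process: components are drawn from a Dirichlet, and each token picks its component via an index mapping on its parent edges. The dimensions, the single hyperparameter group (J = 1) and the identity mapping f_k are illustrative assumptions.

    # Minimal sketch: one NoMM level generating tokens, eqs. (2)-(3).
    import numpy as np

    rng = np.random.default_rng(1)
    K, T, N = 3, 5, 8                     # components, output dims, tokens
    alpha = np.full(T, 0.5)               # one hyperparameter group (J = 1)
    theta = rng.dirichlet(alpha, size=K)  # theta_k ~ Dir(alpha_j), eq. (2)

    parents = rng.integers(K, size=N)     # parent edges from the level above
    f_k = lambda parent, i: parent        # component selection, eq. (3)

    x = np.array([rng.choice(T, p=theta[f_k(parents[i], i)])  # x_i ~ Mult
                  for i in range(N)])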

  • Generic topic models: Complete-data likelihood

    Likelihood of all hidden and visible data X = {H, V} and parameters \Theta:

        p(X, \Theta | \mathcal{A}) = \prod_{\ell \in L} \Big[ \prod_i p(x_{i,out} | \vec x_{i,in}) \cdot \prod_k p(\vec\theta_k | \vec\alpha) \Big]^{[\ell]}
                                   (data items: Discrete; components: Dirichlet; outer product: levels)

                                   = \prod_{\ell \in L} \Big[ \prod_k f(\vec\theta_k, \vec n_k, \vec\alpha) \Big]^{[\ell]},    \vec n_k = (n_{k,1}, n_{k,2}, ...)    (4)

    Product depends on co-occurrences n_{k,t} between input and output values, x_{i,in} = k and x_{i,out} = t, on each level \ell

    There are variants to component selection x_{i,in} = k

    There are mixture node variants, e.g., observed components

    Gregor Heinrich A generic approach to topic models 46 / 35

  • Generic topic models: Complete-data likelihood

    The conjugacy between the multinomial and Dirichlet distributions of model levels leads to a simple complete-data likelihood:

        p(X, \Theta | \mathcal{A}) = \prod_\ell \Big[ \prod_i Mult(x_i^\ell | \Theta^\ell, k_i^\ell) \prod_k Dir(\vec\theta_k^\ell | \vec\alpha_j^\ell) \Big]    (5)

                                   = \prod_\ell \Big[ \prod_i \theta_{k_i, x_i} \prod_k \frac{1}{B(\vec\alpha_j)} \prod_t \theta_{k,t}^{\alpha_{j,t} - 1} \Big]^\ell    (6)

                                   = \prod_\ell \Big[ \prod_k \frac{1}{B(\vec\alpha_j)} \prod_t \theta_{k,t}^{\alpha_{j,t} + n_{k,t} - 1} \Big]^\ell    (7)

                                   = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec\alpha_j)} \, Dir(\vec\theta_k | \vec n_k + \vec\alpha_j) \Big]^\ell    (8)

    where brackets [\,\cdot\,]^\ell enclose a particular level \ell, and n_{k,t} is how often k and t co-occur.

    Gregor Heinrich A generic approach to topic models 47 / 35
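
    A minimal numeric sketch of eq. (8) with the parameters integrated out: per level, the collapsed likelihood is a product over components of ratios of multivariate Beta functions of the counts. The counts and hyperparameters below are illustrative assumptions.

    # Minimal sketch: collapsed level likelihood via Beta-function ratios.
    import numpy as np
    from scipy.special import gammaln

    def log_B(x):
        """log of the multivariate Beta function B(x)."""
        return gammaln(x).sum() - gammaln(x.sum())

    def level_log_likelihood(n, alpha):
        """log p(level) = sum_k [log B(n_k + alpha) - log B(alpha)], eq. (8)."""
        return sum(log_B(n_k + alpha) - log_B(alpha) for n_k in n)

    n = np.array([[3.0, 0.0, 2.0], [1.0, 4.0, 0.0]])  # counts n_{k,t}
    alpha = np.full(3, 0.5)                           # symmetric Dirichlet
    print(level_log_likelihood(n, alpha))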

  • Inference: Generic full conditionals

    Gibbs full conditionals are derived for groups of dependent hidden edges, H_i^d ⊂ H^d ⊂ X, with surrounding edges S_i^d ⊂ S^d considered observed. All tokens co-located with a particular observation: X_i^d = {H_i^d, S_i^d}.

    Full conditional via chain rule applied to (8) with \Theta integrated out:

        p(H_i^d | X \setminus H_i^d, \mathcal{A}) = \frac{p(H_i^d, S_i^d | X \setminus \{H_i^d, S_i^d\}, \mathcal{A})}{p(S_i^d | X \setminus \{H_i^d, S_i^d\}, \mathcal{A})}    (9)

        \propto p(X_i^d | X \setminus X_i^d, \mathcal{A}) = \frac{p(X | \mathcal{A})}{p(X \setminus X_i^d | \mathcal{A})}    (10)

        = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_{k \setminus X_i^d} + \vec\alpha_j)} \Big]^\ell    (11)

        \propto \prod_{\ell \in \{H^d, S^d\}} \Big[ \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_{k \setminus X_i^d} + \vec\alpha_j)} \Big]^\ell    (12)

    [Diagram: token-wise grouping of hidden edges H_1^1 H_1^2, H_2^1 H_2^2, ..., into edge groups H^1 ... H^d]

    Gregor Heinrich A generic approach to topic models 48 / 35
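
    A minimal sketch of eq. (12) for one touched level: the unnormalised full-conditional weight of a candidate assignment is the Beta-function ratio between the counts with and without the token group X_i^d. The counts and the candidate re-insertion below are illustrative assumptions.

    # Minimal sketch: one level's factor of the full conditional, eq. (12).
    import numpy as np
    from scipy.special import gammaln

    def log_B(x):
        return gammaln(x).sum() - gammaln(x.sum())

    def level_log_ratio(n_k_without, candidate, alpha):
        """log [B(n_k + alpha) / B(n_k \\ X_i^d + alpha)] for one component."""
        return log_B(n_k_without + candidate + alpha) - log_B(n_k_without + alpha)

    n_without = np.array([2.0, 1.0, 0.0])   # counts with token group removed
    alpha = np.full(3, 0.5)
    candidate = np.array([1.0, 0.0, 0.0])   # token group re-inserted at t = 0
    log_w = level_log_ratio(n_without, candidate, alpha)
    print(np.exp(log_w))                    # = (n_0 + a_0) / (sum n + sum a)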

  • Inference: q-functions

        q(k, t) = \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_{k \setminus x_i^d} + \vec\alpha_j)}

        \overset{|x_i^d| = 1}{=} \frac{n_{k,t}^{\neg i} + \alpha_t}{\sum_t n_{k,t}^{\neg i} + \sum_t \alpha_t}

        \overset{|x_i^d| = 2}{=} \frac{n_{k,t \setminus x_{i,1}^d} + \alpha_t}{\sum_t n_{k,t \setminus x_{i,1}^d} + \sum_t \alpha_t} \cdot \frac{n_{k,t \setminus x_{i,2}^d} + \alpha_t + \delta(x_{i,1}^d = x_{i,2}^d)}{\sum_t n_{k,t \setminus x_{i,2}^d} + \sum_t \alpha_t + 1}    . . .

    Gregor Heinrich A generic approach to topic models 49 / 35
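
    The closed-form q-functions can be sketched directly as smoothed count ratios; the delta term corrects the second draw when both tokens of a pair fall on the same outcome. The counts and hyperparameters below are illustrative assumptions, with the current token(s) already removed from the counts.

    # Minimal sketch: q-functions as smoothed count ratios.
    import numpy as np

    def q_single(n_k, alpha, t):
        """q(k, t) for one removed token: (n_kt + a_t) / (sum n + sum a)."""
        return (n_k[t] + alpha[t]) / (n_k.sum() + alpha.sum())

    def q_pair(n_k, alpha, u, v):
        """q(k, u)q(k, v | u): sequential urn weight for two tokens of k."""
        first = (n_k[u] + alpha[u]) / (n_k.sum() + alpha.sum())
        second = (n_k[v] + alpha[v] + (u == v)) / (n_k.sum() + alpha.sum() + 1)
        return first * second

    n_k = np.array([8.0, 6.0, 4.0])   # counts n_{k,t}, token(s) removed
    alpha = np.full(3, 0.5)
    print(q_single(n_k, alpha, t=2), q_pair(n_k, alpha, u=2, v=2))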

  • q-functions: Polya urn and sampling weights

    [Figure: Polya urn: sampling with over-replacement; expected proportions E{\vec\theta_k}]

        q(k, t) \triangleq \frac{B(\vec n_k + \alpha)}{B(\vec n_k^{\neg i} + \alpha)}

        \overset{|t| = 1}{=} \frac{n_{k,t}^{\neg t_i} + \alpha}{n_k^{\neg t_i} + T\alpha} = smoothed ratio of occurrences

        \overset{t = \{u, v\}}{=} \frac{n_{k,u}^{\neg u_i} + \alpha}{n_k^{\neg u_i} + T\alpha} \cdot \frac{n_{k,v}^{\neg v_i} + \alpha + \delta(u = v)}{n_k^{\neg v_i} + T\alpha + 1} \triangleq q(k, u) \, q(k, v | u)    . . .

    Gregor Heinrich A generic approach to topic models 50 / 35

  • q-functions: Polya urn and sampling weights

    [Figure: Polya urn and discrete parameters — animation frames evaluate the Beta-function ratio with concrete urn counts, e.g., B(\vec n_k + \alpha) / B(\vec n_k^{\neg i} + \alpha) as ratios like (8 · 6 · 4 + ...)/(8 · 6 · 5 + ...) for a removed single token t_i, and for a removed pair t_i = {u_i, v_i} the sequential weights q(k, u) and q(k, v | u) with the \delta(u = v) correction]

    Gregor Heinrich A generic approach to topic models 51 / 35
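
    As an aside, the Polya-urn intuition is easy to simulate: drawing a ball and returning it together with one extra copy of the same colour ("over-replacement") drives the urn proportions towards a sample from Dir(\alpha) — the mechanism behind the smoothed count ratios above. The urn sizes below are illustrative assumptions.

    # Minimal sketch: Polya urn with over-replacement converging to Dir(alpha).
    import numpy as np

    rng = np.random.default_rng(2)
    alpha = np.array([2.0, 1.0, 1.0])     # initial pseudo-counts per colour
    counts = alpha.copy()
    for _ in range(10_000):
        colour = rng.choice(len(counts), p=counts / counts.sum())
        counts[colour] += 1               # put the ball back plus one copy
    print(counts / counts.sum())          # ~ one sample from Dir(alpha)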

  • NoMM substructure library: Gibbs weights and likelihood

    (Sub-structure library, thesis Sec. 9.3; structure diagrams omitted here. Per row: ID/name and index variants; Gibbs sampler weight w and likelihood p for a single token i; modelled aspect and example models.)

    N1, E1, C1. DirMult nodes, unbranched
        Variants: N1A: j ≡ 1; N1B: j = f(a_i, i); C1A: k = i; C1B: k = z_i; E1S(z_i): n_i
        w(z | ·) = q(a, z) q(z, b);   E1S(z_i): q(a, z) q(z, b_1 ⊔ b_2 ⊔ ... ⊔ b_{N_i})
        p(b | a) = Σ_z θ_{a,z} θ_{z,b}
        Mixture/admixture: LDA [Blei et al. 2003b], PAM [Li & McCallum 2006]; LDCC [Shafiei & Milios 2006] (E1S)

    N2. Observed parameters
        w(z | c_a, ·) = c_{a,z} q(z, b)
        p(b | a) = Σ_z c_{a,z} θ_{z,b}
        Label distribution: ATM [Rosen-Zvi et al. 2004]

    N3. Non-Dirichlet prior
        θ_a ~ p(θ_a | Λ)
        w(z | θ_a, ·) = p(z_i | a_i, θ_a) q(z, b);   M-step: estimate θ_a [Blei & Lafferty 2007]
        p(b | a) = Σ_z θ_{a,z} θ_{z,b}
        Alternative distributions on the simplex: CTM [Blei & Lafferty 2007]: θ_a ∝ exp N(μ, Σ); TLM [Wallach 2008]: hierarchy of Dirichlet priors

    N4. Non-discrete output
        ϑ_z ~ p(ϑ_z | Λ)
        w(z | ·, ϑ) = q(a, z) p(v_i | ϑ_z);   M-step: estimate ϑ_z
        p(v | a) = Σ_z θ_{a,z} p(v | ϑ_z)
        Non-multinomial observations: Corr-LDA [Barnard et al. 2003], GMM [McLachlan & Peel 2000]: p(v | ϑ) = N(x | μ, Σ)

    N5+E4. Aggregation
        z̄_m ∝ Σ_{j ∈ m} δ(z − z_j), regression on v_m
        w(z | z̄_m, v_m, ·) = q(a, z) q(z, w) N(v_m | η^T z̄_m, σ²);   M-step: estimate η, σ² | z̄, v (for linear regression, N5B); prediction: v̂_m = η^T z̄_m
        Regression/supervised learning: Supervised LDA [Blei & McAuliffe 2007], Relational topic model [Chang & Blei 2009]

    E2. Autonomous edges
        Variants: E2A: i ≠ j; E2B: i = j
        w(x, y | ·) = q(a, x ⊔ y) q(x, b) q(y, c);   E2A: q(a_{ij}, x_i ⊔ y_j) = q(a_i, x_i ⊔ y_j) q(a_j, x_i ⊔ y_j)
        p(b, c | a) = Σ_x θ_{a,x} θ_{x,b} Σ_y θ_{a,y} θ_{y,c}
        Common mixture of causes: Multimodal LDA [Ramage et al. 2009]

    E3. Coupled edges
        w(z | ·) = q(a, z) q(z, b) q(z, c)
        p(b, c | a) = Σ_z θ_{a,z} θ_{z,b} θ_{z,c}
        Common cause for observations: Hidden relational model (HRM) [Xu et al. 2006], Link-LDA [Erosheva et al. 2004]

    C2. Combined indices
        Variants: C2A: k = (i, x_i); C2B: k = (x_i, y_j); C2C: k = g(i, j, x_i, y_j)
        w(x, y | ·) = q(a, x) q(b, y) q(k, c)
        p(c | a, b) = Σ_x [θ_{a,x} Σ_y θ_{b,y} θ_{k,c}],   k = g(x, y, i, j)
        Different dependent causes, relation: hPAM [Li et al. 2007a], HRM [Xu et al. 2006], Multi-LDA [Porteous et al. 2008a]

    C3. Interleaved indices
        Variants: C3A: i ≠ j; C3B: i = j
        C3A: w(x_i, y_j | ·) = q(a_i, x_i) q(b_j, y_j) q(x_i, c_i ⊔ c_j) q(y_j, c_i ⊔ c_j)
        C3B: w(x, y | ·) = q(a, x) q(b, y) [q(x, c ⊔ c)]^{δ(x=y)} [q(x, c) q(y, c)]^{1−δ(x=y)}
        C3A: p(c | a) = Σ_x θ_{a,x} θ_{x,c}, p(c | b) similarly;   C3B: p(c | a, b) = Σ_x θ_{a,x} θ_{x,c} Σ_y θ_{b,y} θ_{y,c}
        Different causes, same effect: proposed here

    C4. Switch
        w(z, s | ·) = q(a, z) [q(b, 1) q(z, c)]^{δ(s=1)} [q(b, 2) q(z, d)]^{δ(s=2)}
        p(c, d | a, b) = Σ_z θ_{a,z} [θ_{b,0} θ_{z,c} + θ_{b,1} θ_{z,d}]
        Select complex submodels: Multi-grain LDA [Titov & McDonald 2008], Entity-topic model