A generic approach to topic models and its application to virtual communities

PhD defense slides, translated to English.

TRANSCRIPT

  • A generic approach to topic models and its application to virtual communities

    Gregor Heinrich

    PhD presentation (English translation incl. backup slides, 45 min)

    Faculty of Mathematics and Computer Science, University of Leipzig

    28 November 2012, Version 2.9 EN BU

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • Motivation: Virtual communities

    [Figure: network of documents and persons connected by relations such as authorship, citation, annotation, recommendation, cooperation and similarity]

    Virtual communities = groups of persons who exchange information and knowledge electronically

    Examples: organisations, digital libraries, Web 2.0 applications incl. social networks

    Data are multimodal: text content; authorship, citation, annotations and recommendations; cooperation and other social relations

    Typical case: discrete data with high dynamics and large volumes

  • Motivation: Unsupervised mining of discrete data

    Identification of relationships in large data volumes

    Only data (and possibly a model) required (information retrieval, network analysis, clustering, NLP methods)

    Density problem: Features too sparse for analysis in high-dimensional feature space

    Vocabulary problem: Semantic similarity ≠ lexical similarity (polysemy, synonymy, etc.)

    [Figure: word graph around the ambiguous term "bar", linking senses such as judicial assembly (court), location (bank, restaurant), long object (stick, counter), pressure unit, furniture (table) and people (staff, personnel, employees, teller), plus related terms like atm, network, glue, adhere, prevent, yard]

  • Topic models as approach

    [Figure: build-up linking documents such as "Rhythm & Spice Jamaican Grill bar", "Leipzig's Bars and Restaurants" and "Playing Drums for Beginners" via latent topics to words like bar, restaurant, wine, rhythm, drum]

    Probabilistic representations of grouped discrete data

    Illustrative for text: Words grouped in documents

    Latent topics = probability distributions over vocabulary. Dominant terms of a topic are semantically similar.

    Language = mixture of topics (latent semantic structure)

    Reduce vocabulary problem: Find semantic relations

    Reduce density problem: Dimensionality reduction

  • Language models: Unigram model

    [Figure: a single term distribution p(w | z) generates the words w_{m,n} of all documents]

    One distribution for all data

  • Language models: Unigram mixture model

    [Figure: one term distribution p(w | z_m) per document m, selected by a document-level indicator z_m]

    One distribution per document

  • Language models: Unigram admixture model

    [Figure: word-level indicators z_{m,n}; each word w_{m,n} is drawn from the term distribution p(w | z_{m,n}) selected for that word]

    One distribution per word → basic topic model

  • Bayesian topic models: The Dirichlet distribution

    [Figure: Dirichlet densities on the simplex with corners p_1 = 1, p_2 = 1, p_3 = 1; example p(w | z) ∼ Dir(~α) with ~α = (4, 4, 2)]

    Bayesian methodology:

    Distributions generated from prior distributions

    Language + other discrete data: Dirichlet distribution important as prior:

    Defined on simplex: surface containing all discrete distributions

    Parameter ~α controls behaviour

    Bayesian topic model: Latent Dirichlet Allocation (LDA) (Blei et al. 2003)
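
    A minimal sketch of drawing from a Dirichlet distribution (illustrative; the class name and the Marsaglia-Tsang Gamma sampler are standard choices, not the thesis code): normalising independent Gamma(α_i) draws yields a point on the simplex.

    import java.util.Random;

    public class DirichletSampler {
        final Random rng = new Random(42);

        /** Marsaglia-Tsang sampler for Gamma(shape, 1), shape > 0. */
        double sampleGamma(double shape) {
            if (shape < 1.0)   // boost: Gamma(a) = Gamma(a + 1) * U^(1/a)
                return sampleGamma(shape + 1.0) * Math.pow(rng.nextDouble(), 1.0 / shape);
            double d = shape - 1.0 / 3.0, c = 1.0 / Math.sqrt(9.0 * d);
            while (true) {
                double x = rng.nextGaussian(), v = 1.0 + c * x;
                if (v <= 0) continue;
                v = v * v * v;
                double u = rng.nextDouble();
                if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
            }
        }

        /** One draw ~p ∼ Dir(~alpha), e.g., a topic's term distribution. */
        double[] sampleDirichlet(double[] alpha) {
            double[] p = new double[alpha.length];
            double sum = 0.0;
            for (int i = 0; i < alpha.length; i++) { p[i] = sampleGamma(alpha[i]); sum += p[i]; }
            for (int i = 0; i < alpha.length; i++) p[i] /= sum;
            return p;
        }

        public static void main(String[] args) {
            double[] p = new DirichletSampler().sampleDirichlet(new double[] { 4, 4, 2 });
            System.out.printf("p = (%.3f, %.3f, %.3f)%n", p[0], p[1], p[2]);
        }
    }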

  • Latent Dirichlet Allocation

    Latent Dirichlet Allocation (Blei et al. 2003)

    [Figure: LDA plate diagram with topic-term distributions ~φ_k ∼ Dir(β) for each topic k, document-topic distributions ~ϑ_m ∼ Dir(α) for each document m, topic indicators z_{m,n} and words w_{m,n}]

    Generative process, illustrated for the document "Concert tonight at Rhythm and Spice Restaurant . . . " with topic 1 (restaurant, bar, food, grill) and topic 2 (concert, music, rhythm, bar):

    1. Generate word distributions ~φ_k for all topics

    2. Generate the topic distribution ~ϑ_m for the document

    3. Sample the topic index for each word, e.g., z = 2 for the first word

    4. Sample the word from the term distribution of topic 2: "concert"
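
    A minimal sketch of this generative process (illustrative; it reuses the DirichletSampler sketched above, and all dimensions and hyperparameters are toy values):

    import java.util.Random;

    public class LdaGenerator {
        static final Random rng = new Random(7);

        /** Draw an index from a discrete distribution p. */
        static int sampleDiscrete(double[] p) {
            double u = rng.nextDouble(), cum = 0.0;
            for (int i = 0; i < p.length; i++) { cum += p[i]; if (u < cum) return i; }
            return p.length - 1;
        }

        public static void main(String[] args) {
            int K = 2, V = 8, M = 3, N = 6;          // topics, vocabulary, documents, words per document
            double[] alpha = { 1.0, 1.0 };           // document-topic prior
            double[] beta = new double[V];           // topic-term prior
            java.util.Arrays.fill(beta, 0.1);
            DirichletSampler dir = new DirichletSampler();

            double[][] phi = new double[K][];        // 1. term distribution per topic
            for (int k = 0; k < K; k++) phi[k] = dir.sampleDirichlet(beta);

            for (int m = 0; m < M; m++) {
                double[] theta = dir.sampleDirichlet(alpha);   // 2. topic distribution per document
                for (int n = 0; n < N; n++) {
                    int z = sampleDiscrete(theta);             // 3. topic index for word n
                    int w = sampleDiscrete(phi[z]);            // 4. word from topic z's term distribution
                    System.out.printf("doc %d, word %d: z=%d, w=%d%n", m, n, z, w);
                }
            }
        }
    }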

  • State of the art

    Large number of published models that extend LDA: authors (Rosen-Zvi et al. 2004), citations (Dietz et al. 2007), hierarchy (Li and McCallum 2006; Li et al. 2007), image features and captions (Barnard et al. 2003) etc.

    Results for "topic model" (title + abstract) since 2012 alone: ACM >400, Google Scholar >1300

    Expanding research area with practical relevance

    But: No existing analysis as a generic model class; partly tedious derivation, especially for inference algorithms

    Conjecture:

    Important properties generic across models

    Simplifications for the derivation of concrete model properties, inference algorithms and design methods

    [Figure: plate diagram of the expert-tag-topic model (Heinrich 2011) as an example of an extended topic model]

  • Research questions

    How can topic models be described in a generic way in order to use their properties across particular applications?

    Can generic topic models be implemented generically and, if so, can repeated structures be exploited for optimisations?

    How can generic models be applied to data in virtual communities?

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • How can topic models be described in a generic way in order to use their properties across particular applications?

  • Topic models: Example structures

    [Figure: plate diagrams of four example models]

    (a) Latent Dirichlet allocation (LDA)

    (b) Author-topic model (ATM)

    (c) Pachinko allocation model (PAM4)

    (d) Hierarchical PAM (hPAM)

    (Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

  • Generic topic models → NoMMs

    Generic characteristics of topic models:

    Levels with discrete components ~φ_k, generated from Dirichlet distributions Dir(~φ_k | α)

    Coupling via values of discrete variables x

    Network of mixed membership (NoMM):

    Compact representation for topic models

    Directed acyclic graph

    Node: sample from a mixture component, selection via incoming edges; terminal node: observation

    Edge: propagation of discrete values to child nodes

    [Figure: build-up of the NoMM notation: a level with components ~φ_k ∼ Dir(~φ_k | α), input values x^in selecting the component via k = f(x^in) and output values x^out; variants with several incoming edges (k = f(x^in1, x^in2)) and several outgoing edges; LDA as a NoMM: node ~ϑ_m (input m, output z_{m,n}) followed by node ~φ_k (input z_{m,n}, output w_{m,n}), stepped through for m = 1, z_{1,1} = 3, w_{1,1} = 2]
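
    A minimal structural sketch of the node abstraction described above (illustrative; class and method names are not from the thesis):

    import java.util.Random;
    import java.util.function.ToIntFunction;

    /** A NoMM level: K discrete components over T values; incoming edge values
     *  select a component via k = f(xIn), and a draw from it is propagated on. */
    public class NommLevel {
        final double[][] phi;              // phi[k][t]: component k as a distribution over T values
        final ToIntFunction<int[]> f;      // maps incoming edge values to a component index
        final Random rng = new Random(1);

        NommLevel(double[][] phi, ToIntFunction<int[]> f) { this.phi = phi; this.f = f; }

        /** Select component k = f(xIn) and sample an outgoing value from it. */
        int sample(int... xIn) {
            double[] p = phi[f.applyAsInt(xIn)];
            double u = rng.nextDouble(), cum = 0.0;
            for (int t = 0; t < p.length; t++) { cum += p[t]; if (u < cum) return t; }
            return p.length - 1;
        }

        public static void main(String[] args) {
            double[][] theta = { { 0.8, 0.2 } };                       // document level: 1 document, 2 topics
            double[][] terms = { { 0.6, 0.3, 0.1 }, { 0.1, 0.2, 0.7 } };  // topic level: 2 topics, 3 terms
            NommLevel docLevel = new NommLevel(theta, in -> in[0]);    // k = f(m) = m
            NommLevel termLevel = new NommLevel(terms, in -> in[0]);   // k = f(z) = z
            int z = docLevel.sample(0);    // topic index for a word of document 0
            int w = termLevel.sample(z);   // word drawn from topic z
            System.out.println("z=" + z + " w=" + w);
        }
    }

    Chaining two such levels reproduces the LDA build-up above: the document level emits z_{m,n}, which selects the component of the term level that emits w_{m,n}.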

  • Topic models as NoMMs

    [Figure: NoMM representations of the four example models, with component dimensions in brackets]

    (a) Latent Dirichlet allocation, LDA: m → ~ϑ_m → z_{m,n} = k → ~φ_k → w_{m,n} = t

    (b) Author-topic model, ATM: observed authors ~a_m → x_{m,n} = x → ~ϑ_x → z_{m,n} = k → ~φ_k → w_{m,n} = t

    (c) Pachinko allocation model, PAM: ~ρ_m → z2_{m,n} = x → ~ϑ_{m,x} → z3_{m,n} = y → ~φ_y → w_{m,n} = t

    (d) Hierarchical pachinko allocation model, hPAM: ~ϑ_{m,0} → zT_{m,n} = T, ~ϑ_{m,T} → zt_{m,n} = t, level indicator ℓ_{m,n} → ~φ_{ℓ,T,t} → w_{m,n}

    (Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • Can generic topic models be implemented generically. . . ?


  • Bayesian inference problem and Gibbs sampler

    Bayesian inference: inversion of the generative process:

    Find distributions over parameters Θ and latent variables/topics H, given observations V and Dirichlet parameters A

    ⇒ Determine posterior distribution p(H, Θ | V, A)

    Intractability → approximative approaches

    Gibbs sampling: variant of Markov chain Monte Carlo (MCMC)

    In topic models: marginalise parameters Θ (collapsed Gibbs sampler)

    Sample topics H_i for each data point i in turn: H_i ∼ p(H_i | H_¬i, V, A)

    [Figure: example NoMM chain m → x → y → w with parameter levels Θ1, Θ2, Θ3 (components ~ρ_m, ~ϑ_{m,x}, ~φ_y) and hidden variables H1, H2; the parameters are marginalised out and the hidden variables sampled from p(H1_i, H2_i | H_¬i, V, A)]

  • Sampling distribution for NoMMs

    [Figure: example NoMM chain x → y → w with level factors q(m, x), q((m,x), y), q(y, w)]

    Gibbs sampler can be generically derived (Heinrich 2009)

    Typical case: quotients of factors over levels ℓ:

    p(H_i | H_¬i, V, A) ∝ ∏_ℓ [ (n_{k,t}^¬i + α_t) / ∑_t (n_{k,t}^¬i + α_t) ]^[ℓ] = ∏_ℓ [ q(k, t) ]^[ℓ]

    n_{k,t}^[ℓ] = count of co-occurrences between input and output values of a level (components and samples)

    More complex variants covered by q(k, t) ≜ B({n_{k,t}}_{t=1}^T + ~α) / B({n_{k,t}^¬i}_{t=1}^T + ~α), with B the multivariate beta function
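
    A minimal sketch of a collapsed Gibbs sweep for LDA written with such level factors (illustrative; names are not taken from the thesis's code generator): each token is removed from the counts, scored by the product of the document-level and term-level q factors, and reassigned.

    import java.util.Random;

    public class CollapsedGibbsLda {
        final int K, V;
        final double alpha, beta;
        final int[][] w, z;        // word ids and topic assignments per document/position
        final int[][] nmk, nkt;    // co-occurrence counts: document-topic, topic-term
        final int[] nk;            // per-topic token totals
        final Random rng = new Random(0);

        CollapsedGibbsLda(int[][] w, int K, int V, double alpha, double beta) {
            this.w = w; this.K = K; this.V = V; this.alpha = alpha; this.beta = beta;
            this.z = new int[w.length][];
            this.nmk = new int[w.length][K];
            this.nkt = new int[K][V];
            this.nk = new int[K];
            for (int m = 0; m < w.length; m++) {       // random initialisation
                z[m] = new int[w[m].length];
                for (int n = 0; n < w[m].length; n++) {
                    int k = rng.nextInt(K);
                    z[m][n] = k; nmk[m][k]++; nkt[k][w[m][n]]++; nk[k]++;
                }
            }
        }

        /** Level factor q(k, t) = (n_{k,t} + prior) / (sum_t n_{k,t} + dim * prior). */
        double q(double count, double sum, double prior, int dim) {
            return (count + prior) / (sum + dim * prior);
        }

        void sweep() {
            for (int m = 0; m < w.length; m++)
                for (int n = 0; n < w[m].length; n++) {
                    int t = w[m][n], k = z[m][n];
                    nmk[m][k]--; nkt[k][t]--; nk[k]--;   // exclude token i from the counts
                    double[] p = new double[K];
                    double sum = 0.0;
                    for (int j = 0; j < K; j++) {        // product of document and term level factors
                        p[j] = q(nmk[m][j], w[m].length - 1, alpha, K)
                             * q(nkt[j][t], nk[j], beta, V);
                        sum += p[j];
                    }
                    double u = rng.nextDouble() * sum;   // draw the new topic
                    double cum = 0.0; int kNew = K - 1;
                    for (int j = 0; j < K; j++) { cum += p[j]; if (u < cum) { kNew = j; break; } }
                    z[m][n] = kNew;
                    nmk[m][kNew]++; nkt[kNew][t]++; nk[kNew]++;
                }
        }

        public static void main(String[] args) {
            int[][] docs = { { 0, 1, 2, 1, 0 }, { 3, 4, 3, 4, 2 } };   // toy corpus of word ids
            CollapsedGibbsLda lda = new CollapsedGibbsLda(docs, 2, 5, 0.5, 0.1);
            for (int it = 0; it < 200; it++) lda.sweep();
            System.out.println(java.util.Arrays.deepToString(lda.z));
        }
    }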

  • Typology of NoMM substructures

    [Figure: library of NoMM substructures with their Gibbs factors: N1. Dirichlet-multinomial: q(a, z) q(z, b); N2. Observed parameters: c_{a,z} q(z, b); E2. Autonomous edges: q(a, z) q(z, b) q(z, c); E3. Coupled edges: q(a, x ⊕ y) q(x, b) q(y, c); C2. Combined indices: q(a, x) q(b, y) q(k, c) with k = f(x, y); C3. Interleaved indices: q(a, z1) q(b, z2) q(z1, c) q(z2, c) over an interleaved output sequence c]

    NoMM substructures: nodes, edges/branches, component indices/merging of edges:

    Representation via q-functions and likelihood

    Multiple samples per data point: q(a, x ⊕ y) for the respective level

    Library incl. additional structures: alternative distributions, regression, aggregation etc. → q-functions + other factors

  • Implementation: Gibbs meta-sampler

    [Figure: tool chain of the Gibbs meta-sampler: a NoMM code generator turns a topic model specification (NoMM script) into a C/Java code prototype via code templates; the generated C/Java module is validated against a Java model instance on the data, then compiled, optimised and deployed on the Java VM or a native platform]

    Code generator for topic models in Java and C

    Separation of knowledge domains: topic model applications vs. machine learning vs. computing architecture

  • Example NoMM script and generated kernel: hPAM2

    [Figure: hPAM2 as a NoMM: node ~ϑ_m (input m ∈ [M], output x ∈ [X]), node ~ϑ_{m,x} (output y ∈ [Y]) and node ~φ_k (output w ∈ [V]), with component index k assigned as: x = 0: k = 0; x ≠ 0, y = 0: k = 1 + x; x, y ≠ 0: k = 1 + X + y; the resulting topic hierarchy has a root topic, super-topics and sub-topics, each with its own word distribution]

    NoMM script:

    model = HPAM2
    description:
        Hierarchical PAM model 2 (HPAM2)
    sequences:
        # variables sampled for each (m,n)
        w, x, y : m, n
    network:
        # each line one NoMM node
        m >> theta | alpha >> x
        m,x >> thetax | alphax[x] >> y
        x,y >> phi[k] >> w
        # java code to assign k
        k : {
            if (x==0) { k = 0; }
            else if (y==0) k = 1 + x;
            else k = 1 + X + y;
        }

    Generated sampling kernel:

    // hidden edge x
    for (hx = 0; hx < X; hx++) {
        // hidden edge y
        for (hy = 0; hy < Y; hy++) {
            mxsel = X * m + hx;
            mxjsel = hx;
            if (hx == 0)
                ksel = 0;
            else if (hy == 0)
                ksel = 1 + hx;
            else
                ksel = 1 + X + hy;
            pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
                / (nmxysum[mxsel] + alphaxsum[mxjsel])
                * (nkw[ksel][w[m][n]] + beta)
                / (nkwsum[ksel] + betasum);
            psum += pp[hx][hy];
        } // for hy
    } // for hx
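
    What follows such a kernel in a complete sampler is the draw from the accumulated table. A minimal sketch (illustrative; not part of the generated code above, names follow the kernel):

    /** Draw indices (x, y) proportional to the unnormalised table pp
     *  with total mass psum, as accumulated by the kernel above. */
    static int[] drawPair(double[][] pp, double psum, java.util.Random rng) {
        double u = rng.nextDouble() * psum;
        for (int hx = 0; hx < pp.length; hx++)
            for (int hy = 0; hy < pp[hx].length; hy++) {
                u -= pp[hx][hy];
                if (u <= 0) return new int[] { hx, hy };   // new values for (x, y)
            }
        return new int[] { pp.length - 1, pp[0].length - 1 };  // guard against rounding
    }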

  • Document-topic distribution in Gibbs sampler

    [Figure: document-topic matrix (200 documents, 50 topics) shown at iterations 1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150, 200, 300 and 500, where the sampler has converged to a stationary state]

  • Fast sampling: Hybrid scaling methods

    [Figure: perplexity over 1-5000 iterations for dependent vs. independent samplers (PAM4); speedup table: LDA (dim. 500) 30.2; PAM4 (dim. 40 x 40) with serial, parallel and independent variants reaching speedups between 7.4 and 49.8]

    Serial and parallel scaling methods: generalised results for LDA to generic NoMMs, specifically (Porteous et al. 2008; Newman et al. 2009) + novel approach

    Problem: sampling space for statistically dependent variables: K · L · . . .

    Independence assumption: separate samplers with dimensions K + L + . . . ≪ K · L · . . .

    Empirical result: iterations ↑, but topic quality comparable

    Hybrid approaches with independent samplers highly effective

    Implementation: complexity covered by meta-sampler
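
    A minimal sketch of the dimensionality argument only (illustrative; the thesis's hybrid and independent schemes differ in detail): scoring two hidden variables jointly costs K · L evaluations per token, while updating them separately costs K + L.

    import java.util.Random;

    /** Cost comparison for one token: joint vs. separate sampling of two
     *  hidden variables with K and L values (hypothetical scoring function). */
    public class HybridSamplingSketch {
        static final Random rng = new Random(3);

        interface Score { double at(int x, int y); }   // unnormalised posterior factor

        /** Joint (blocked) sampler: evaluates all K * L combinations. */
        static int[] sampleJoint(Score s, int K, int L) {
            double[] p = new double[K * L];
            double sum = 0;
            for (int x = 0; x < K; x++)
                for (int y = 0; y < L; y++) { p[x * L + y] = s.at(x, y); sum += p[x * L + y]; }
            int i = draw(p, sum);
            return new int[] { i / L, i % L };
        }

        /** Separate samplers: K evaluations for x (y fixed), then L for y (x fixed). */
        static int[] sampleSeparate(Score s, int K, int L, int x0, int y0) {
            double[] px = new double[K]; double sx = 0;
            for (int x = 0; x < K; x++) { px[x] = s.at(x, y0); sx += px[x]; }
            int x1 = draw(px, sx);
            double[] py = new double[L]; double sy = 0;
            for (int y = 0; y < L; y++) { py[y] = s.at(x1, y); sy += py[y]; }
            return new int[] { x1, draw(py, sy) };
        }

        static int draw(double[] p, double sum) {
            double u = rng.nextDouble() * sum, cum = 0;
            for (int i = 0; i < p.length; i++) { cum += p[i]; if (u < cum) return i; }
            return p.length - 1;
        }

        public static void main(String[] args) {
            Score s = (x, y) -> (x + 1) * (y + 1.0);   // toy factor
            System.out.println(java.util.Arrays.toString(sampleJoint(s, 4, 3)));
            System.out.println(java.util.Arrays.toString(sampleSeparate(s, 4, 3, 0, 0)));
        }
    }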

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • How can generic models be applied to data in virtual communities?


  • NoMM design process

    Typology → library of NoMM substructures (see above)

    Idea: Construct models from simple substructures that connect terminal nodes:

    Terminal nodes ↔ multimodal data (virtual communities . . . )

    Substructures ↔ relationships in data; latent semantics

    Process steps: (1) define modelling task and metrics, (2) create evidence model, (3) define terminals, (4) formulate model assumptions, (5) compose model and predict properties, (6) write NoMM script, (7) generate and adapt Gibbs sampler, (8) implement target metric, (9) evaluate based on test corpus, (10) optimise and integrate for target platform

    Process:

    Assumptions on dependencies in data

    Iterative association to structures in model (usage of typology)

    Gibbs distribution known → model behaviour: q(x, y) = "rich get richer"

    Implementation and test with Gibbs meta-sampler; possibly iteration

  • Application: Expert finding with tag annotations

    [Figure: documents connected by authorship and annotation relations; document m with text ~w_m, authors ~a_m (e.g., AB, TH with authorship shares) and tags ~c_m (e.g., 5, 14, 33)]

    Scenario: Expert finding via documents with tag annotations

    Authors of relevant documents ↔ experts

    Frequently documents with additional annotations, here: tags

    Goal: Enable tag queries, improve quality of text queries

    Problem: Tags often incomplete, partly wrong

    Connection of tags and experts via topics:

    (1) Data: For each document m: text ~w_m, authors ~a_m, tags ~c_m

    (2) Goal: Tag query ~c: p(~c | a) = max; word query ~w: p(~w | a) = max

    (3) Terminal nodes: Authors in input, words and tags in output

  • Model assumptions

    [Figure: document m with words ~w_m, authors ~a_m and tags ~c_m]

    (4) Model assumptions:

    (a) Expertise of an author is weighted with the portion of authorship

    (b) Semantics of expertise expressed by topics z. Each author has a single field of expertise (topic distribution).

    (c) Semantics of tags expressed by topics y

  • Model construction

    (5) Model construction:

    (a) Start with terminal nodes (from step 3): authors ~a_m, words w_{m,n}, tags ~c_m; p(. . . | ~a, ~w, ~c) ∝ . . .

    (b) Authorship ~a_m given as observed distribution → node samples the author x of a word: p(x, . . . | ·) ∝ a_{m,x} · q(x, . . . ) · . . .

    (c) Each author has only a single field of expertise (topic distribution) → q(x, z) associates (word-)topics with sampled authors x (cf. ATM): p(x, z, . . . | ·) ∝ a_{m,x} · q(x, z) · . . .

    (d) Topic distribution over terms → connect z and w via q(z, w): p(x, z, . . . | ·) ∝ a_{m,x} · q(x, z) · q(z, w) · . . .

    (e) Introduce tag topics y_{m,j} for the tags c_{m,j} as distributions over tags; q(x, z ⊕ y) overlays the values for z and y: p(x, z, y | ·) ∝ a_{m,x} · q(x, z ⊕ y) · q(z, w) · q(y, c)

    [Figure: resulting NoMM with a document → author node x, a word-topic node z → word w, and a tag-topic node y → tag c]

  • Model construction: ordinary approach

    [Figure: full plate diagram of the expert-tag-topic model (Heinrich 2011) with words w_{m,n} and word topics z_{m,n} (n ∈ [1, N_m]), tags c_{m,j} and tag topics y_{m,j} (j ∈ [1, J_m]), authors x_{m,n}, x_{m,j} drawn from ~a_m, and parameters ~ϑ_x (x ∈ [1, A]), ~φ_k, ~ψ_k (k ∈ [1, K])]
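
    Given trained parameters, the tag queries from step (2) can be scored. A minimal sketch, assuming trained author-topic (theta) and topic-tag (psi) distributions; names and the exact ranking formula are illustrative, not taken from the thesis:

    /** Rank authors a for a tag query ~c by log p(~c | a) = sum_j log sum_k theta[a][k] * psi[k][c_j]. */
    public class ExpertQuery {
        static double score(double[][] theta, double[][] psi, int a, int[] tags) {
            double logp = 0.0;
            for (int c : tags) {
                double p = 0.0;
                for (int k = 0; k < theta[a].length; k++) p += theta[a][k] * psi[k][c];
                logp += Math.log(p);
            }
            return logp;
        }

        public static void main(String[] args) {
            double[][] theta = { { 0.9, 0.1 }, { 0.2, 0.8 } };          // 2 authors, 2 topics (toy values)
            double[][] psi = { { 0.7, 0.2, 0.1 }, { 0.1, 0.3, 0.6 } };  // 2 topics, 3 tags (toy values)
            int[] query = { 0, 1 };                                     // tag query ~c
            for (int a = 0; a < theta.length; a++)
                System.out.printf("author %d: log p = %.3f%n", a, score(theta, psi, a, query));
        }
    }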

  • Expert-tag-topic model: Evaluation

    [Figure: box plots of Average Precision @10 for word queries and tag queries (ATM vs. ETT, scale 0.2-1.0) and of the topic coherence score (ATM vs. ETT, scale roughly -500 to -150)]

    NIPS corpus: 2.3 million words, 2037 authors, 165 tags

    Retrieval: Average Precision @10:

    Term queries: ETT > ATM

    Tag queries: similarly good AP values

    Topic coherence (Mimno et al. 2011): ETT > ATM

    Semi-supervised learning: tag queries retrieve items without tags
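
    For reference, a minimal sketch of the AP@10 metric used above (standard definition; normalisation conventions vary, and this is not the thesis's evaluation code):

    /** Average Precision at cutoff 10: mean of precision values at each relevant rank. */
    public class AveragePrecision {
        static double apAt10(boolean[] rel, int totalRelevant) {
            double sum = 0.0;
            int hits = 0;
            for (int i = 0; i < Math.min(10, rel.length); i++)
                if (rel[i]) { hits++; sum += (double) hits / (i + 1); }
            // one common convention: normalise by the number of relevant items reachable in the top 10
            return totalRelevant == 0 ? 0.0 : sum / Math.min(totalRelevant, 10);
        }

        public static void main(String[] args) {
            boolean[] ranking = { true, false, true, true, false, false, false, false, false, false };
            System.out.printf("AP@10 = %.3f%n", apAt10(ranking, 3));   // relevant items at ranks 1, 3, 4
        }
    }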

  • Overview

    Introduction

    Generic topic models

    Inference methods

    Application to virtual communities

    Conclusions and outlook


  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    Design process based on typology of NoMM substructures

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documentsContribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    + Variational Inference for NoMMs (Heinrich and Goesele 2009)

    Design process based on typology of NoMM substructures

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documentsContribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    + Variational Inference for NoMMs (Heinrich and Goesele 2009)

    Design process based on typology of NoMM substructures+ AMQ model: Meta-model for virtual communities as formal basis for

    scenario modelling (Heinrich 2010)

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documentsContribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

  • Conclusions: Resesarch contributions

    Networks of Mixed Membership: Generic model anddomain-specific compact representation of topic modelsInference algorithms: Generic Gibbs sampler

    Fast sampling methods (serial, parallel, independent)Implementation in Gibbs meta-sampler

    + Variational Inference for NoMMs (Heinrich and Goesele 2009)

    Design process based on typology of NoMM substructures+ AMQ model: Meta-model for virtual communities as formal basis for

    scenario modelling (Heinrich 2010)

    Application to virtual communities: Experttagtopic model forexpert finding with annotated documents

    + Models ETT2 and ETT3 incl. novel NoMM structure; retrievalapproaches (Heinrich 2011b)

    Contribution to facilitated model-based construction of topicmodels, specifically for virtual communities and other multimodalscenarios

    Gregor Heinrich A generic approach to topic models 33 / 35

    [Graph visualization: community-browser view for the query topics [blind source separation] / [independent component analysis], connecting authors (e.g., Bell_A2, Cichocki_A, Hyvarinen_A, Lee_T1, Oja_E1, Parra_L1) to ranked papers (e.g., "A New Learning Algorithm for Blind Signal Separation") via "authors" and "cites" edges]

    Figure: ETT1: Expert search in community browser

    Gregor Heinrich A generic approach to topic models 33 / 35


  • Outlook

    New applications and NoMM structures, e.g., time as a variable

    Alternative inference methods:
    Generic collapsed variational Bayes (Teh et al. 2007): structure similar to the collapsed Gibbs sampler
    Non-parametric methods: learning model dimensions using Dirichlet or Pitman-Yor process priors (Teh et al. 2004; Buntine and Hutter 2010), NoMM polymorphism (Heinrich 2011a)

    Improved support in the design process:
    Data-driven design: search over model structures to obtain the best model for a data set
    Architecture-specific Gibbs meta-sampler, e.g., massively parallel or FPGA, cf. (Heinrich et al. 2011)

    Integration with interactive user interfaces: models can be created on the fly, e.g., for visual analytics

    Gregor Heinrich A generic approach to topic models 34 / 35

  • Thank you!

    Q + A

    Gregor Heinrich A generic approach to topic models 35 / 35

  • References I

    References

    Barnard, K., P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan (2003, August). Matching words and pictures. JMLR Special Issue on Machine Learning Methods for Text and Images 3(6), 1107–1136.

    Bellegarda, J. (2000, August). Exploiting latent semantic information in statistical language modeling. Proc. IEEE 88(8), 1279–1296.

    Blei, D., A. Ng, and M. Jordan (2003, January). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

    Buntine, W. and M. Hutter (2010). A Bayesian review of the Poisson-Dirichlet process. arXiv:1007.0296v1 [math.ST].

    Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei (2009). Reading tea leaves: How humans interpret topic models. In Proc. Neural Information Processing Systems (NIPS).

    Gregor Heinrich A generic approach to topic models 36 / 35

  • References II

    Dietz, L., S. Bickel, and T. Scheffer (2007, June). Unsupervised prediction of citation influences. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA.

    Heinrich, G. (2009). A generic approach to topic models. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD), Part 1, pp. 517–532.

    Heinrich, G. (2010). Actors–media–qualities: a generic model for information retrieval in virtual communities. In Proc. 7th International Workshop on Innovative Internet Community Systems (I2CS 2007), part of I2CS Jubilee proceedings, Lecture Notes in Informatics, GI.

    Heinrich, G. (2011a, March). Infinite LDA – implementing the HDP with minimum code complexity. Technical note TN2011/1, arbylon.net.

    Heinrich, G. (2011b). Typology of mixed-membership models: Towards a design method. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD).

    Gregor Heinrich A generic approach to topic models 37 / 35

  • References III

    Heinrich, G. and M. Goesele (2009). Variational Bayes for generic topic models. In Proc. 32nd Annual German Conference on Artificial Intelligence (KI2009).

    Heinrich, G., J. Kindermann, C. Lauth, G. Paaß, and J. Sanchez-Monzon (2005). Investigating word correlation at different scopes – a latent concept approach. In Workshop Lexical Ontology Learning at Int. Conf. Mach. Learning.

    Heinrich, G., F. Logemann, V. Hahn, C. Jung, G. Figueiredo, and W. Luk (2011). HW/SW co-design for heterogeneous multi-core platforms: The hArtes toolchain, Chapter Audio array processing for telepresence, pp. 173–207. Springer.

    Li, W., D. Blei, and A. McCallum (2007). Mixtures of hierarchical topics with pachinko allocation. In International Conference on Machine Learning.

    Li, W. and A. McCallum (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, pp. 577–584. ACM.

    Gregor Heinrich A generic approach to topic models 38 / 35

  • References IV

    Mimno, D., H. M. Wallach, E. Talley, M. Leenders, and A. McCallum (2011, July). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, pp. 262–272.

    Newman, D., A. Asuncion, P. Smyth, and M. Welling (2009, August). Distributed algorithms for topic models. JMLR 10, 1801–1828.

    Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling (2008). Fast collapsed Gibbs sampling for latent Dirichlet allocation. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 569–577. ACM.

    Rosen-Zvi, M., T. Griffiths, M. Steyvers, and P. Smyth (2004). The author-topic model for authors and documents. In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI).

    Teh, Y., M. Jordan, M. Beal, and D. Blei (2004). Hierarchical Dirichlet processes. Technical Report 653, Department of Statistics, University of California at Berkeley.

    Teh, Y. W., D. Newman, and M. Welling (2007). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, Volume 19.

    Gregor Heinrich A generic approach to topic models 39 / 35

  • Appendix

    Gregor Heinrich A generic approach to topic models 40 / 35

  • Example: Text mining for semantic clusters

    Topic label (translated)   Dominant terms according to \phi_{k,t} = p(term | topic), German corpus

    Bundesliga                 FC SC München Borussia SV VfL Kickers SpVgg Uhr Köln Bochum Freiburg VfB Eintracht Bayern Hamburger Bayern+München

    Police / accidents         Polizei verletzt schwer Auto Unfall Fahrer Angaben schwer+verletzt Menschen Wagen Verletzungen Lawine Mann vier Meter Straße

    Chechnya                   Rebellen russischen Grosny russische Tschetschenien Truppen Kaukasus Moskau Angaben Interfax tschetschenischen Agentur

    Politics / Hesse           FDP Koch Hessen CDU Koalition Gerhardt Wagner Liberalen hessischen Westerwelle Wolfgang Roland+Koch Wolfgang+Gerhardt

    Weather                    Grad Temperaturen Regen Schnee Süden Norden Sonne Wetter Wolken Deutschland zwischen Nacht Wetterdienst Wind

    Politics / Croatia         Parlament Partei Stimmen Mehrheit Wahlen Wahl Opposition Kroatien Präsident Parlamentswahlen Mesic Abstimmung HDZ

    The Greens (Die Grünen)    Grünen Parteitag Atomausstieg Trittin Grüne Partei Trennung Mandat Ausstieg Amt Roestel Jahren Müller Radcke Koalition

    Russian politics           Russland Putin Moskau russischen russische Jelzin Wladimir Tschetschenien Russlands Wladimir+Putin Kreml Boris Präsidenten

    Police / schools           Polizei Schulen Schüler Täter Polizisten Schule Tat Lehrer erschossen Beamten Mann Polizist Beamte verletzt Waffe

    Bigram-LDA: Topics from 18,400 dpa news messages, Jan. 2000 (Heinrich et al. 2005)

    Gregor Heinrich A generic approach to topic models 41 / 35

  • Notation: Bayesian network vs. NoMM levels

    [Diagram: Bayesian network with parameter node \vec\theta_k inside plate K, hidden index k_i and variable x_i, vs. the compact NoMM level notation k_i → (\vec\theta_k | \vec\alpha) → x_i with dimension labels [K], [T]]

    Bayesian network                       NoMM levels
    parameters + hyperparameters      →    nodes (\theta | \alpha)
    variables k_i, x_i                →    edges k_i, x_i
    plates (i.i.d. repetitions) i, k  →    indexes i + dimensions k

    Gregor Heinrich A generic approach to topic models 42 / 35

  • NoMM representation: Variable dependencies

    [Figure: catalogue of NoMM variable dependencies — eight substructures \ell = 1 ... 8, shown as Bayesian networks (top) and as compact NoMM levels \vec\theta_k | \vec\alpha (bottom), combining hidden edges h_i and visible edges v_i; \ell = 7, 8 show branched variants with component indexes k_{1i}, k_{2i} and dimensions A, X]

    Gregor Heinrich A generic approach to topic models 43 / 35

  • Collapsed Gibbs sampler

    [Figure: animation of Gibbs sampling in a two-dimensional state space — starting from \vec x^{(0)}, the chain alternates draws from p(x_1 | x_2) and p(x_2 | x_1) (coordinate i = 1, 2, 1, ...) and simulates the posterior p(\vec x | V); here \vec x continuous with i ∈ [1, 2], for NoMMs H discrete with i ∈ [1, W]]

    Collapsed Gibbs sampler: stochastic EM / MCMC:
    NoMMs: parameters \Theta correlate with H → marginalise
    For each data point i: draw latent variables H_i = (y_i, z_i, ...), given all other data, latent H_{\neg i} and observed V:

        H_i ~ p(H_i | H_{\neg i}, V, \mathcal{A}) .   (1)

    Stationary state: full conditional distribution (1) simulates posterior

    Faster absolute convergence for NoMMs than, e.g., variational inference (Heinrich and Goesele 2009)

    Gregor Heinrich A generic approach to topic models 44 / 35
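
    To make the procedure concrete, the following is a minimal sketch of a collapsed Gibbs sampler for plain LDA, the simplest NoMM with two DirMult levels (document → topic, topic → term). The toy corpus and the values of K, alpha and beta are illustrative assumptions; this is a didactic reduction, not the generic Gibbs meta-sampler itself.

    # Minimal sketch: collapsed Gibbs sampling for LDA (illustrative values).
    import numpy as np

    def lda_gibbs(docs, V, K=4, alpha=0.1, beta=0.01, iters=200, seed=0):
        rng = np.random.default_rng(seed)
        n_dk = np.zeros((len(docs), K))   # co-occurrence counts doc-topic
        n_kt = np.zeros((K, V))           # co-occurrence counts topic-term
        n_k = np.zeros(K)                 # topic totals
        z = [rng.integers(K, size=len(d)) for d in docs]   # random init of H
        for d, doc in enumerate(docs):    # count initial assignments
            for i, t in enumerate(doc):
                n_dk[d, z[d][i]] += 1; n_kt[z[d][i], t] += 1; n_k[z[d][i]] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, t in enumerate(doc):
                    k = z[d][i]           # remove token i from the counts ...
                    n_dk[d, k] -= 1; n_kt[k, t] -= 1; n_k[k] -= 1
                    # ... and draw from the full conditional (1), a product
                    # of q-functions (smoothed count ratios):
                    w = (n_dk[d] + alpha) * (n_kt[:, t] + beta) / (n_k + V * beta)
                    k = rng.choice(K, p=w / w.sum())
                    z[d][i] = k
                    n_dk[d, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
        theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
        phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)
        return theta, phi

    # toy corpus: documents as lists of term ids over a vocabulary of size 6
    docs = [[0, 1, 0, 2], [1, 0, 1, 2], [3, 4, 5, 4], [4, 5, 3, 3]]
    theta, phi = lda_gibbs(docs, V=6)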

  • Generic topic models: Generative process

    [Diagram: generic NoMM level — mixture node with components \vec\theta_k (plate K) and hyperparameter groups \vec\alpha_j (plate J), edge x_i feeding subsequent levels, dimensions A, X]

    Generative process on level \ell:

        x_i ~ Mult(x_i | \vec\theta_k),    \vec\theta_k ~ Dir(\vec\theta_k | \vec\alpha_j)    (2)
        k = f_k(parents(x_i), i),          j = f_j(known parents(x_i), i) .                   (3)

    Gregor Heinrich A generic approach to topic models 45 / 35
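
    As an illustration of eqs. (2)-(3), a minimal sketch of one level's generative process: components are drawn from a Dirichlet, and each token picks its component via an index mapping on its parent edges. The dimensions, the single hyperparameter group (J = 1) and the identity mapping f_k are illustrative assumptions.

    # Minimal sketch: one NoMM level generating tokens, eqs. (2)-(3).
    import numpy as np

    rng = np.random.default_rng(1)
    K, T, N = 3, 5, 8                     # components, output dims, tokens
    alpha = np.full(T, 0.5)               # one hyperparameter group (J = 1)
    theta = rng.dirichlet(alpha, size=K)  # theta_k ~ Dir(alpha_j), eq. (2)

    parents = rng.integers(K, size=N)     # parent edges from the level above
    f_k = lambda parent, i: parent        # component selection, eq. (3)

    x = np.array([rng.choice(T, p=theta[f_k(parents[i], i)])  # x_i ~ Mult
                  for i in range(N)])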

  • Generic topic models: Complete-data likelihood

    Likelihood of all hidden and visible data X = {H, V} and parameters \Theta:

        p(X, \Theta | \mathcal{A}) = \prod_{\ell \in L} \Big[ \prod_i p(x_{i,out} | \vec x_{i,in}) \cdot \prod_k p(\vec\theta_k | \vec\alpha) \Big]^{[\ell]}
                                   (data items: Discrete; components: Dirichlet; outer product: levels)

                                   = \prod_{\ell \in L} \Big[ \prod_k f(\vec\theta_k, \vec n_k, \vec\alpha) \Big]^{[\ell]},    \vec n_k = (n_{k,1}, n_{k,2}, ...)    (4)

    Product depends on co-occurrences n_{k,t} between input and output values, x_{i,in} = k and x_{i,out} = t, on each level \ell

    There are variants to component selection x_{i,in} = k

    There are mixture node variants, e.g., observed components

    Gregor Heinrich A generic approach to topic models 46 / 35

  • Generic topic models: Complete-data likelihood

    The conjugacy between the multinomial and Dirichlet distributions of model levels leads to a simple complete-data likelihood:

        p(X, \Theta | \mathcal{A}) = \prod_\ell \Big[ \prod_i Mult(x_i^\ell | \Theta^\ell, k_i^\ell) \prod_k Dir(\vec\theta_k^\ell | \vec\alpha_j^\ell) \Big]    (5)

                                   = \prod_\ell \Big[ \prod_i \theta_{k_i, x_i} \prod_k \frac{1}{B(\vec\alpha_j)} \prod_t \theta_{k,t}^{\alpha_{j,t} - 1} \Big]^\ell    (6)

                                   = \prod_\ell \Big[ \prod_k \frac{1}{B(\vec\alpha_j)} \prod_t \theta_{k,t}^{\alpha_{j,t} + n_{k,t} - 1} \Big]^\ell    (7)

                                   = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec\alpha_j)} \, Dir(\vec\theta_k | \vec n_k + \vec\alpha_j) \Big]^\ell    (8)

    where brackets [\,\cdot\,]^\ell enclose a particular level \ell, and n_{k,t} is how often k and t co-occur.

    Gregor Heinrich A generic approach to topic models 47 / 35
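
    A minimal numeric sketch of eq. (8) with the parameters integrated out: per level, the collapsed likelihood is a product over components of ratios of multivariate Beta functions of the counts. The counts and hyperparameters below are illustrative assumptions.

    # Minimal sketch: collapsed level likelihood via Beta-function ratios.
    import numpy as np
    from scipy.special import gammaln

    def log_B(x):
        """log of the multivariate Beta function B(x)."""
        return gammaln(x).sum() - gammaln(x.sum())

    def level_log_likelihood(n, alpha):
        """log p(level) = sum_k [log B(n_k + alpha) - log B(alpha)], eq. (8)."""
        return sum(log_B(n_k + alpha) - log_B(alpha) for n_k in n)

    n = np.array([[3.0, 0.0, 2.0], [1.0, 4.0, 0.0]])  # counts n_{k,t}
    alpha = np.full(3, 0.5)                           # symmetric Dirichlet
    print(level_log_likelihood(n, alpha))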

  • Inference: Generic full conditionals

    Gibbs full conditionals are derived for groups of dependent hidden edges, H_i^d ⊂ H^d ⊂ X, with surrounding edges S_i^d ⊂ S^d considered observed. All tokens co-located with a particular observation: X_i^d = {H_i^d, S_i^d}.

    Full conditional via chain rule applied to (8) with \Theta integrated out:

        p(H_i^d | X \setminus H_i^d, \mathcal{A}) = \frac{p(H_i^d, S_i^d | X \setminus \{H_i^d, S_i^d\}, \mathcal{A})}{p(S_i^d | X \setminus \{H_i^d, S_i^d\}, \mathcal{A})}    (9)

        \propto p(X_i^d | X \setminus X_i^d, \mathcal{A}) = \frac{p(X | \mathcal{A})}{p(X \setminus X_i^d | \mathcal{A})}    (10)

        = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_{k \setminus X_i^d} + \vec\alpha_j)} \Big]^\ell    (11)

        \propto \prod_{\ell \in \{H^d, S^d\}} \Big[ \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_{k \setminus X_i^d} + \vec\alpha_j)} \Big]^\ell    (12)

    [Diagram: token-wise grouping of hidden edges H_1^1 H_1^2, H_2^1 H_2^2, ..., into edge groups H^1 ... H^d]

    Gregor Heinrich A generic approach to topic models 48 / 35
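
    A minimal sketch of eq. (12) for one touched level: the unnormalised full-conditional weight of a candidate assignment is the Beta-function ratio between the counts with and without the token group X_i^d. The counts and the candidate re-insertion below are illustrative assumptions.

    # Minimal sketch: one level's factor of the full conditional, eq. (12).
    import numpy as np
    from scipy.special import gammaln

    def log_B(x):
        return gammaln(x).sum() - gammaln(x.sum())

    def level_log_ratio(n_k_without, candidate, alpha):
        """log [B(n_k + alpha) / B(n_k \\ X_i^d + alpha)] for one component."""
        return log_B(n_k_without + candidate + alpha) - log_B(n_k_without + alpha)

    n_without = np.array([2.0, 1.0, 0.0])   # counts with token group removed
    alpha = np.full(3, 0.5)
    candidate = np.array([1.0, 0.0, 0.0])   # token group re-inserted at t = 0
    log_w = level_log_ratio(n_without, candidate, alpha)
    print(np.exp(log_w))                    # = (n_0 + a_0) / (sum n + sum a)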

  • Inference: q-functions

        q(k, t) = \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_{k \setminus x_i^d} + \vec\alpha_j)}

        \overset{|x_i^d| = 1}{=} \frac{n_{k,t}^{\neg i} + \alpha_t}{\sum_t n_{k,t}^{\neg i} + \sum_t \alpha_t}

        \overset{|x_i^d| = 2}{=} \frac{n_{k,t \setminus x_{i,1}^d} + \alpha_t}{\sum_t n_{k,t \setminus x_{i,1}^d} + \sum_t \alpha_t} \cdot \frac{n_{k,t \setminus x_{i,2}^d} + \alpha_t + \delta(x_{i,1}^d = x_{i,2}^d)}{\sum_t n_{k,t \setminus x_{i,2}^d} + \sum_t \alpha_t + 1}    . . .

    Gregor Heinrich A generic approach to topic models 49 / 35
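
    The closed-form q-functions can be sketched directly as smoothed count ratios; the delta term corrects the second draw when both tokens of a pair fall on the same outcome. The counts and hyperparameters below are illustrative assumptions, with the current token(s) already removed from the counts.

    # Minimal sketch: q-functions as smoothed count ratios.
    import numpy as np

    def q_single(n_k, alpha, t):
        """q(k, t) for one removed token: (n_kt + a_t) / (sum n + sum a)."""
        return (n_k[t] + alpha[t]) / (n_k.sum() + alpha.sum())

    def q_pair(n_k, alpha, u, v):
        """q(k, u)q(k, v | u): sequential urn weight for two tokens of k."""
        first = (n_k[u] + alpha[u]) / (n_k.sum() + alpha.sum())
        second = (n_k[v] + alpha[v] + (u == v)) / (n_k.sum() + alpha.sum() + 1)
        return first * second

    n_k = np.array([8.0, 6.0, 4.0])   # counts n_{k,t}, token(s) removed
    alpha = np.full(3, 0.5)
    print(q_single(n_k, alpha, t=2), q_pair(n_k, alpha, u=2, v=2))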

  • q-functions: Polya urn and sampling weights

    [Figure: Polya urn: sampling with over-replacement; expected proportions E{\vec\theta_k}]

        q(k, t) \triangleq \frac{B(\vec n_k + \alpha)}{B(\vec n_k^{\neg i} + \alpha)}

        \overset{|t| = 1}{=} \frac{n_{k,t}^{\neg t_i} + \alpha}{n_k^{\neg t_i} + T\alpha} = smoothed ratio of occurrences

        \overset{t = \{u, v\}}{=} \frac{n_{k,u}^{\neg u_i} + \alpha}{n_k^{\neg u_i} + T\alpha} \cdot \frac{n_{k,v}^{\neg v_i} + \alpha + \delta(u = v)}{n_k^{\neg v_i} + T\alpha + 1} \triangleq q(k, u) \, q(k, v | u)    . . .

    Gregor Heinrich A generic approach to topic models 50 / 35

  • q-functions: Polya urn and sampling weights

    [Figure: Polya urn and discrete parameters — animation frames evaluate the Beta-function ratio with concrete urn counts, e.g., B(\vec n_k + \alpha) / B(\vec n_k^{\neg i} + \alpha) as ratios like (8 · 6 · 4 + ...)/(8 · 6 · 5 + ...) for a removed single token t_i, and for a removed pair t_i = {u_i, v_i} the sequential weights q(k, u) and q(k, v | u) with the \delta(u = v) correction]

    Gregor Heinrich A generic approach to topic models 51 / 35
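
    As an aside, the Polya-urn intuition is easy to simulate: drawing a ball and returning it together with one extra copy of the same colour ("over-replacement") drives the urn proportions towards a sample from Dir(\alpha) — the mechanism behind the smoothed count ratios above. The urn sizes below are illustrative assumptions.

    # Minimal sketch: Polya urn with over-replacement converging to Dir(alpha).
    import numpy as np

    rng = np.random.default_rng(2)
    alpha = np.array([2.0, 1.0, 1.0])     # initial pseudo-counts per colour
    counts = alpha.copy()
    for _ in range(10_000):
        colour = rng.choice(len(counts), p=counts / counts.sum())
        counts[colour] += 1               # put the ball back plus one copy
    print(counts / counts.sum())          # ~ one sample from Dir(alpha)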

  • NoMM substructure library: Gibbs weights and likelihood

    (Sub-structure library, thesis Sec. 9.3; structure diagrams omitted here. Per row: ID/name and index variants; Gibbs sampler weight w and likelihood p for a single token i; modelled aspect and example models.)

    N1, E1, C1. DirMult nodes, unbranched
        Variants: N1A: j ≡ 1; N1B: j = f(a_i, i); C1A: k = i; C1B: k = z_i; E1S(z_i): n_i
        w(z | ·) = q(a, z) q(z, b);   E1S(z_i): q(a, z) q(z, b_1 ⊔ b_2 ⊔ ... ⊔ b_{N_i})
        p(b | a) = Σ_z θ_{a,z} θ_{z,b}
        Mixture/admixture: LDA [Blei et al. 2003b], PAM [Li & McCallum 2006]; LDCC [Shafiei & Milios 2006] (E1S)

    N2. Observed parameters
        w(z | c_a, ·) = c_{a,z} q(z, b)
        p(b | a) = Σ_z c_{a,z} θ_{z,b}
        Label distribution: ATM [Rosen-Zvi et al. 2004]

    N3. Non-Dirichlet prior
        θ_a ~ p(θ_a | Λ)
        w(z | θ_a, ·) = p(z_i | a_i, θ_a) q(z, b);   M-step: estimate θ_a [Blei & Lafferty 2007]
        p(b | a) = Σ_z θ_{a,z} θ_{z,b}
        Alternative distributions on the simplex: CTM [Blei & Lafferty 2007]: θ_a ∝ exp N(μ, Σ); TLM [Wallach 2008]: hierarchy of Dirichlet priors

    N4. Non-discrete output
        ϑ_z ~ p(ϑ_z | Λ)
        w(z | ·, ϑ) = q(a, z) p(v_i | ϑ_z);   M-step: estimate ϑ_z
        p(v | a) = Σ_z θ_{a,z} p(v | ϑ_z)
        Non-multinomial observations: Corr-LDA [Barnard et al. 2003], GMM [McLachlan & Peel 2000]: p(v | ϑ) = N(x | μ, Σ)

    N5+E4. Aggregation
        z̄_m ∝ Σ_{j ∈ m} δ(z − z_j), regression on v_m
        w(z | z̄_m, v_m, ·) = q(a, z) q(z, w) N(v_m | η^T z̄_m, σ²);   M-step: estimate η, σ² | z̄, v (for linear regression, N5B); prediction: v̂_m = η^T z̄_m
        Regression/supervised learning: Supervised LDA [Blei & McAuliffe 2007], Relational topic model [Chang & Blei 2009]

    E2. Autonomous edges
        Variants: E2A: i ≠ j; E2B: i = j
        w(x, y | ·) = q(a, x ⊔ y) q(x, b) q(y, c);   E2A: q(a_{ij}, x_i ⊔ y_j) = q(a_i, x_i ⊔ y_j) q(a_j, x_i ⊔ y_j)
        p(b, c | a) = Σ_x θ_{a,x} θ_{x,b} Σ_y θ_{a,y} θ_{y,c}
        Common mixture of causes: Multimodal LDA [Ramage et al. 2009]

    E3. Coupled edges
        w(z | ·) = q(a, z) q(z, b) q(z, c)
        p(b, c | a) = Σ_z θ_{a,z} θ_{z,b} θ_{z,c}
        Common cause for observations: Hidden relational model (HRM) [Xu et al. 2006], Link-LDA [Erosheva et al. 2004]

    C2. Combined indices
        Variants: C2A: k = (i, x_i); C2B: k = (x_i, y_j); C2C: k = g(i, j, x_i, y_j)
        w(x, y | ·) = q(a, x) q(b, y) q(k, c)
        p(c | a, b) = Σ_x [θ_{a,x} Σ_y θ_{b,y} θ_{k,c}],   k = g(x, y, i, j)
        Different dependent causes, relation: hPAM [Li et al. 2007a], HRM [Xu et al. 2006], Multi-LDA [Porteous et al. 2008a]

    C3. Interleaved indices
        Variants: C3A: i ≠ j; C3B: i = j
        C3A: w(x_i, y_j | ·) = q(a_i, x_i) q(b_j, y_j) q(x_i, c_i ⊔ c_j) q(y_j, c_i ⊔ c_j)
        C3B: w(x, y | ·) = q(a, x) q(b, y) [q(x, c ⊔ c)]^{δ(x=y)} [q(x, c) q(y, c)]^{1−δ(x=y)}
        C3A: p(c | a) = Σ_x θ_{a,x} θ_{x,c}, p(c | b) similarly;   C3B: p(c | a, b) = Σ_x θ_{a,x} θ_{x,c} Σ_y θ_{b,y} θ_{y,c}
        Different causes, same effect: proposed here

    C4. Switch
        w(z, s | ·) = q(a, z) [q(b, 1) q(z, c)]^{δ(s=1)} [q(b, 2) q(z, d)]^{δ(s=2)}
        p(c, d | a, b) = Σ_z θ_{a,z} [θ_{b,0} θ_{z,c} + θ_{b,1} θ_{z,d}]
        Select complex submodels: Multi-grain LDA [Titov & McDonald 2008], Entity-topic model