Post on 20-Dec-2015
Corporate Technology
Information & Communications
Intelligent Autonomous Systems
Infinite Hidden Relational Models
Zhao Xu1, Volker Tresp2, Kai Yu2, Shipeng Yu and Hans-Peter Kriegel1
1 University of Munich, Germany
2 Siemens Corporate Technology, Munich, Germany
© Siemens AG, CT IC 4
Motivation
• Relational learning is an object-oriented approach to representation and learning that clearly distinguishes between entities (e.g., objects), relationships, and their respective attributes; it is an area of growing interest in machine learning
• Learned dependencies encode probabilistic constraints in the relational domain
• Many relational learning approaches involve extensive structural learning, which makes relational learning (RL) somewhat tricky to apply in practice
• The goal of this work is an easy-to-apply generic system that relaxes the need for extensive structural learning
• In the infinite hidden relational model (IHRM) we introduce for each entity an infinite-dimensional latent variable, whose state is determined by a Dirichlet process
• The resulting representation is a network of interacting DPs
Work on DPs in Relational Learning
• C. Kemp, T. L. Griffiths, and J. B. Tenenbaum (2004). Discovering latent classes in relational data. Technical Report AI Memo 2004-019.
• C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda (2006). Learning systems of concepts with an infinite relational model. In Proc. AAAI 2006.
• Z. Xu, V. Tresp, K. Yu, S. Yu, and H.-P. Kriegel (2005). Dirichlet enhanced relational learning. In Proc. 22nd ICML, 1004-1011. ACM Press.
• Z. Xu, V. Tresp, K. Yu, and H.-P. Kriegel (2006). Infinite hidden relational models. In Proc. 22nd UAI.
• P. Carbonetto, J. Kisynski, N. de Freitas, and D. Poole (2005). Nonparametric Bayesian logic. In Proc. 21st UAI.
Ground Network With an Image Structure
[Figure: ground network laid out as a grid, with entity attribute nodes (A) connected through relational attribute nodes (R)]
• Ground Network
• A: entity attributes
• R: relational attributes (e.g., exist, not exist)
• Limitations
• Attributes locally predict the probability of a relational attribute
• Given the parent attributes, all relational attributes are independent
• To obtain non-local dependencies, structural learning might be involved
Ground Network With an Image Structure and Latent Variables: The HRM
• Z: latent variable
• Information can now flow through the network of latent variables
• In an IHRM, Z can be thought of as representing unknown attributes (such as a cluster attribute)
• Note that in image processing, Z would correspond to the true pixel value, A to a noisy measurement, and R would encode constraints between neighboring pixel values
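[Figure: the grid-structured ground network with a latent variable Z attached to each entity next to its attributes A; relational attributes R connect the latent variables]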
A Recommendation System
[Figure: users and items with entity attributes A and relational attributes R, without latent variables]
• A relational attribute (like) depends only on the attributes of the user and the item
• If both attributes are weak, we’re stuck
[Figure: the same network with a latent variable Z attached to each user and each item]
• A relational attribute (like) depends only on the states of the latent variables of user and item
• If entity attributes are weak, other known relations are exploited, i.e., we exploit collaborative information
The Hidden Relational Model (HRM)
• A relational attribute (like) depends only on the states of the latent variables of user and item
• If entity attributes are weak, other known relations are exploited, i.e., we exploit collaborative information
• Multinomial distributions with Dirichlet priors; three base distributions: G0^u, G0^m, G0^b
[Figure: HRM plate diagram for users and items with nodes R, Z, and A, base distributions G0^u, G0^m, G0^b, and concentration parameters α0^u, α0^m]
The Infinite Hidden Relational Model (IHRM)
• For a DP model, the number of states becomes infinite; the prior distribution for the mixing weights π is denoted as π ~ Stick(α0)
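The stick-breaking prior named on this slide can be illustrated with a minimal sketch. It assumes the standard construction (each break is a Beta(1, α0) fraction of the remaining stick) and truncates it at a finite number of states, which is only an approximation to the infinite model. The function name and the values of `alpha0` and `num_states` are illustrative assumptions, not taken from the slides.

```python
import random


def truncated_stick_breaking(alpha0, num_states, rng):
    """Draw mixing weights from a stick-breaking prior truncated at num_states."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(num_states - 1):
        fraction = rng.betavariate(1.0, alpha0)  # share broken off the remainder
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last state takes the rest, so weights sum to 1
    return weights


rng = random.Random(0)
pi = truncated_stick_breaking(alpha0=2.0, num_states=10, rng=rng)
```

A small α0 concentrates most of the mass on the first few states, while a large α0 spreads it over many states.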
Inference in the IHRM
1. Gibbs sampler derived from the Chinese restaurant process representation (Kemp et al. 2004, 2006; Xu et al. 2006)
2. Gibbs sampler derived from finite approximations to the stick-breaking representation
   a. Dirichlet multinomial allocation
   b. Truncated Dirichlet process
3. Two mean field approximations based on those procedures
4. A memory-based empirical approximation (EA)
(2, 3, 4 in Xu et al. 2006, submitted)
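As a rough illustration of the Chinese restaurant process behind the first sampler, the sketch below computes only the prior part of the assignment probabilities: an existing state is chosen in proportion to the number of entities already in it, and a new state in proportion to α0. A full Gibbs step would also multiply in the likelihood of the entity's attributes and relations, which is omitted here. The function names are illustrative and not from the cited papers.

```python
import random


def crp_prior_probs(counts, alpha0):
    """CRP prior over states for one entity, given the other entities' counts.

    counts[k] is the number of other entities currently in state k.
    The last entry of the result is the probability of opening a new state.
    """
    total = sum(counts) + alpha0
    return [n / total for n in counts] + [alpha0 / total]


def sample_crp_partition(num_entities, alpha0, rng):
    """Assign entities to states one at a time according to the CRP prior."""
    counts = []
    assignments = []
    for _ in range(num_entities):
        probs = crp_prior_probs(counts, alpha0)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(0)  # a new state is created
        counts[k] += 1
        assignments.append(k)
    return assignments, counts


probs = crp_prior_probs([3, 1], alpha0=1.0)
assignments, counts = sample_crp_partition(50, alpha0=1.0, rng=random.Random(1))
```

With three entities in one state, one in another, and α0 = 1, the prior probabilities are 3/5, 1/5, and 1/5 for the two existing states and a new one.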
Experimental Analysis on Movie Recommendation (1)
• Task description
• To predict whether a user likes a movie given attributes of users and movies, as well as known ratings of users.
• Data set: MovieLens
• Model
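[Figure: IHRM for movie recommendation. Entity classes User (UserAttributes, latent variable Zu) and Movie (MovieAttributes, latent variable Zm) are linked by the relation Like (R); base distributions G0^u, G0^m, G0^b; concentration parameters α0^u, α0^m]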
Experimental Analysis on Movie Recommendation (2)
• Result
Prediction accuracy (%) is reported for 5, 10, 15, and 20 given ratings per user.
Method | given5 | given10 | given15 | given20 | Time (s) | #Comp_u | #Comp_m
GS-TDP | 65.71 | 66.47 | 66.99 | 68.33 | 23497 | 67 | 41
MF-TDP | 65.06 | 65.38 | 66.54 | 67.69 | 1014 | 9 | 6
EA | 63.91 | 64.10 | 64.55 | 64.55 | 386 | --- | ---
Note: for GS-TDP and MF-TDP, α0 = 100
Data: 943 users, 1680 movies
Experimental Analysis on Gene Function Prediction (1)
• Task description
• To predict the functions of genes given gene-level and protein-level information, as well as interactions between genes.
• Data set: KDD Cup 2001
• Model
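Experimental Analysis on Gene Function Prediction (2)
[Figure: IHRM for gene function prediction. The entity class Gene (GeneAttributes, latent variable Zg) is linked to itself by Interact (Rg,g) and to Phenotype (Zp) by Observe (Rg,p), Structural Category (Zcl) by Belong (Rg,cl), Motif (Zm) by Contain (Rg,m), Complex (Zc) by Form (Rg,c), and Function (Zf) by Have (Rg,f)]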
Experimental Analysis on Gene Function Prediction (3)
• An example gene
Attribute | Value
Gene ID | G234070
Essential | Non-Essential
Structural Category | 1. ATPases; 2. Motorproteins
Complex | Cytoskeleton
Phenotype | Mating and sporulation defects
Motif | PS00017
Chromosome | 1
Function | 1. Cell growth, cell division and DNA synthesis; 2. Cellular organization; 3. Cellular transport and transport mechanisms
Experimental Analysis on Gene Function Prediction (4)
• Results
Algorithm | Accuracy (%) | #Comp_gene
GS-TDP | 89.46 | 15
MF-TDP | 91.96 | 742
EA | 93.18 | ---
KDD Cup winner | 93.63 | ---
Experimental Analysis on Gene Function Prediction (5)
• Results
Relationship | Prediction accuracy (%) without the relationship | Importance
Complex | 91.13 | 197
Interaction | 92.14 | 100
Structural Category | 92.61 | 55
Phenotype | 92.71 | 45
Attributes of Gene | 93.08 | 10
Motif | 93.12 | 6
The importance of a variety of relationships in function prediction of genes
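The slides do not say how the importance column is computed. The listed values are, however, consistent with one simple reading: the accuracy drop caused by removing a relationship, scaled so that Interaction equals 100. The sketch below checks that reading. Two things are assumptions here: the baseline of 93.18%, which is the EA accuracy reported in the earlier results, and the choice of Interaction as the reference.

```python
# Accuracy (%) after removing each relationship, as listed on the slide.
accuracy_without = {
    "Complex": 91.13,
    "Interaction": 92.14,
    "Structural Category": 92.61,
    "Phenotype": 92.71,
    "Attributes of Gene": 93.08,
    "Motif": 93.12,
}
FULL_MODEL_ACCURACY = 93.18  # assumption: the EA accuracy from the earlier table
REFERENCE = "Interaction"    # assumption: the relationship scaled to 100


def importance_scores(acc_without, full_acc, reference):
    """Accuracy drop per relationship, relative to the reference drop (= 100)."""
    ref_drop = full_acc - acc_without[reference]
    return {
        name: round(100 * (full_acc - acc) / ref_drop)
        for name, acc in acc_without.items()
    }


scores = importance_scores(accuracy_without, FULL_MODEL_ACCURACY, REFERENCE)
# scores == {"Complex": 197, "Interaction": 100, "Structural Category": 55,
#            "Phenotype": 45, "Attributes of Gene": 10, "Motif": 6}
```

Under these assumptions the computed scores reproduce the importance column exactly, for example (93.18 - 91.13) / (93.18 - 92.14) × 100 ≈ 197 for Complex.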
Experimental Analysis on Clinical Data (1)
• Task description
• To predict future procedures for patients given attributes of patients and procedures, as well as the patients' prescribed procedures and diagnoses.
• Model
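Experimental Analysis on Clinical Data (2)
[Figure: IHRM for clinical data. Entity classes Patient (PatientAttributes, latent variable Zpa), Procedure (ProcedureAttributes, latent variable Zpr, parameters θpr), and Diagnosis (DiagnosisAttributes, latent variable Zdg, parameters θdg); relations Take (Rpa,pr) between patients and procedures and Make (Rpa,dg) between patients and diagnoses; base distributions G0^pa, G0^pr, G0^dg, G0^pa,pr, G0^pa,dg; concentration parameters α0^pa, α0^pr, α0^dg]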
Experimental Analysis on Clinical Data (3)
• Results
[Figure: ROC curves for predicting procedures, averaged over all patients]
[Figure: ROC curves for predicting procedures, considering only patients whose prime complaint is a circulatory problem]
Legend: E1: one-sided CF; E2: two-sided CF; E3: full model; E4: no hidden variables; E5: content-based BN
Conclusion
• The IHRM is a new nonparametric hierarchical Bayes model for relational modeling
• Advantages
• Reducing the need for extensive structural learning
• Expressive ability via coupling between heterogeneous relationships
• The model itself determines the number of states for the latent variables
• Scaling:
• Cost scales with (# of entities) × (# of occupied states) × (# of known relations)
• Note: default relations (for example, by default there is no relation) can often be treated as unknown and then drop out
• Conjugacy can be exploited
A memory-based empirical approximation
• First, we assume that the number of components equals the number of entities in the corresponding entity class
• Then, in the training phase, each entity contributes only to its own class
• Based on this simplification, the parameters of the attributes and relations can be learned very efficiently. Note that this approximation can be interpreted as relational memory-based learning
• To predict a relational attribute, we assume that only the states of the latent variables involved in the relation are unknown
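The memory-based reading of this approximation can be sketched in a deliberately simplified, one-sided form. This is not the authors' procedure. In the sketch every stored user is its own component, only the latent state of the target user is treated as unknown, and entity attributes are ignored. The function name, the `smoothing` constant, and the toy ratings are all illustrative assumptions.

```python
def predict_like(ratings, user, item, smoothing=0.1):
    """Predict P(user likes item) from the other users held in memory.

    ratings maps user -> {item: 0 or 1} and holds known relations only.
    Each stored user k is one component with smoothed Bernoulli parameters
    phi = (r + smoothing) / (1 + 2 * smoothing) for each of its ratings r.
    """
    def phi(r):
        return (r + smoothing) / (1.0 + 2.0 * smoothing)

    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        # Likelihood of the target user's known ratings under component `other`.
        weight = 1.0
        for j, r in ratings[user].items():
            if j == item or j not in other_ratings:
                continue
            p = phi(other_ratings[j])
            weight *= p if r == 1 else 1.0 - p
        weighted_sum += weight * phi(other_ratings[item])
        weight_total += weight
    if weight_total == 0.0:
        return 0.5  # no usable stored user: fall back to an uninformative guess
    return weighted_sum / weight_total


toy_ratings = {
    "u1": {"a": 1, "b": 1, "c": 1},
    "u2": {"a": 1, "b": 1},
    "u3": {"a": 0, "b": 0, "c": 0},
}
p_like = predict_like(toy_ratings, "u2", "c")
```

In the toy data, user u2 agrees with u1 on both shared items and disagrees with u3, so the prediction for item c is dominated by u1 and comes out well above 0.5.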