Searching for optimal patterns in Boolean tensors
Dmitry I. Ignatov∗, Dmitry V. Gnatyshak∗, Sergei O. Kuznetsov∗, and Jaume Baixeries†
∗National Research University Higher School of Economics, Moscow, Russia
†Universitat Politècnica de Catalunya, Barcelona
Tensor-Learn@NIPS 2016, Barcelona
Motivation
A large amount of structured and unstructured data generates n-ary Boolean tensors (n-ary relations).
E.g., a folksonomy is a set of triples (user, object, tag).
Concrete examples:
Bibsonomy.org (user, bookmark, tag)
Social networking sites (user, group, interest)
Delicious (user, link, tag)
Figure: Folksonomy as a graph.
Main research question
General question: How to find interesting patterns (subtensors) in Boolean tensors?
Concrete question: Which triclusters are good approximations of the triconcepts of given triadic data?
Concept lattices as concept hierarchies
[R. Wille, 1982], [B. Ganter and R. Wille, 1999]
Figure: concept lattice of the context below, with nodes listed from top to bottom:
({1,2,3,4}, ∅)
({1,2}, {a}), ({2,3,4}, {c}), ({1,4}, {d})
({1}, {a,d}), ({2}, {a,c}), ({3,4}, {b,c})
({4}, {b,c,d})
(∅, M)
G \ M   a   b   c   d
1       ×           ×
2       ×       ×
3           ×   ×
4           ×   ×   ×
a – has exactly 3 vertices
b – has exactly 4 vertices
c – has a right angle
d – is equilateral
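The relation between the cross table and the lattice nodes on this slide can be checked with a small brute-force sketch. It assumes the cross positions implied by the listed concepts (object 1 has a and d, object 2 has a and c, object 3 has b and c, object 4 has b, c and d); the names `CONTEXT`, `intent`, `extent` and `formal_concepts` are illustrative only and not taken from the talk.

```python
from itertools import combinations

# Assumption: cross positions are inferred from the concept intents shown
# on the slide, not read directly off the original table.
CONTEXT = {
    1: {"a", "d"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"b", "c", "d"},
}
ATTRIBUTES = {"a", "b", "c", "d"}


def intent(objects):
    """Attributes shared by every object in `objects` (all of M for the empty set)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def extent(attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(g for g, attrs in CONTEXT.items() if set(attributes) <= attrs)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset)
            concepts.add((extent(b), b))
    return concepts


concepts = formal_concepts()
# Print from the largest extent (top of the lattice) to the smallest.
for ext, itt in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

Closing every subset of objects is exponential in |G| and is only meant for a four-object example like this one; it is not one of the algorithms whose complexity is compared later in the talk.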
Concept-based biclustering
[D. Ignatov and S. Kuznetsov, 2010]
Let K = (G, M, I ⊆ G × M) be a formal context (Boolean matrix), with g′ = { m ∈ M | (g, m) ∈ I } and m′ = { g ∈ G | (g, m) ∈ I }.
Definition 1. If (g, m) ∈ I, then (m′, g′) is called an object-attribute or OA-bicluster with density
$$\rho(m', g') = \frac{|I \cap (m' \times g')|}{|m'| \cdot |g'|}.$$
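As a minimal sketch of Definition 1, the following computes the OA-bicluster generated by one pair and its density. The toy relation `I` and the function names are assumptions made for illustration; exact fractions are used so the density can be compared with a hand calculation.

```python
from fractions import Fraction

# Hypothetical toy context: the incidence relation I as (object, attribute) pairs.
I = {
    (1, "a"), (1, "d"),
    (2, "a"), (2, "c"),
    (3, "b"), (3, "c"),
    (4, "b"), (4, "c"), (4, "d"),
}


def object_prime(g):
    """g' : all attributes of object g."""
    return {m for (obj, m) in I if obj == g}


def attribute_prime(m):
    """m' : all objects that have attribute m."""
    return {g for (g, attr) in I if attr == m}


def oa_bicluster(g, m):
    """Return the OA-bicluster (m', g') generated by a pair (g, m) in I, with its density."""
    if (g, m) not in I:
        raise ValueError("OA-biclusters are generated only by pairs in I")
    objects, attributes = attribute_prime(m), object_prime(g)
    # Count the cells of m' x g' that actually belong to I.
    filled = sum((x, y) in I for x in objects for y in attributes)
    density = Fraction(filled, len(objects) * len(attributes))
    return objects, attributes, density


objects, attributes, density = oa_bicluster(4, "b")
print(sorted(objects), sorted(attributes), density)
```

For the pair (4, b), m′ = {3, 4} and g′ = {b, c, d}; five of the six cells of m′ × g′ are in I, so the density is 5/6.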
Geometric interpretation of OA-bicluster and complexity
[D. Ignatov and S. Kuznetsov, 2010]
Figure: geometric interpretation of an OA-bicluster; labels: g, g′, g′′, m, m′, m′′.
OA-biclustering versus FCA
O(|I| · |G| · |M|) (O(|I|) in the online case) vs. O(|L| · |G|² · |M|)
OAC triclustering based on prime operators
[Dmitry Gnatyshak et al., 2012]
Let K = (G, M, B, Y) be a triadic context (3D Boolean tensor), where Y ⊆ G × M × B.
Prime operators:
(g, m)′ = { b | (g, m, b) ∈ Y }
(g, b)′ = { m | (g, m, b) ∈ Y }
(m, b)′ = { g | (g, m, b) ∈ Y }
Definition. For (g, m, b) ∈ Y, an OAC-tricluster based on prime operators is a triple T = ((m, b)′, (g, b)′, (g, m)′).
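The definition can be sketched in a few lines: index the three prime operators in one pass over Y, then build one tricluster per triple. The triples below are hypothetical folksonomy-style data, the function names are illustrative, and the density used here is an assumed analogue of the OA-bicluster density (share of cells of the tricluster that belong to Y).

```python
from collections import defaultdict

# Hypothetical (user, bookmark, tag) triples forming the relation Y.
Y = {
    ("u1", "b1", "t1"),
    ("u1", "b1", "t2"),
    ("u1", "b2", "t1"),
    ("u2", "b1", "t1"),
    ("u2", "b2", "t2"),
}


def build_primes(triples):
    """Index the three prime operators as dictionaries keyed by pairs."""
    gm, gb, mb = defaultdict(set), defaultdict(set), defaultdict(set)
    for g, m, b in triples:
        gm[(g, m)].add(b)  # (g, m)' is a subset of B
        gb[(g, b)].add(m)  # (g, b)' is a subset of M
        mb[(m, b)].add(g)  # (m, b)' is a subset of G
    return gm, gb, mb


def oac_prime_triclusters(triples):
    """Map each distinct tricluster ((m,b)', (g,b)', (g,m)') to its density."""
    gm, gb, mb = build_primes(triples)
    result = {}
    for g, m, b in triples:
        tri = (frozenset(mb[(m, b)]), frozenset(gb[(g, b)]), frozenset(gm[(g, m)]))
        if tri not in result:
            X, Z, W = tri
            # Assumed density: fraction of the X x Z x W box that lies in Y.
            hits = sum((x, z, w) in triples for x in X for z in Z for w in W)
            result[tri] = hits / (len(X) * len(Z) * len(W))
    return result


triclusters = oac_prime_triclusters(Y)
for (X, Z, W), rho in triclusters.items():
    print(sorted(X), sorted(Z), sorted(W), round(rho, 2))
```

Storing the primes in dictionaries means each tricluster is assembled by three lookups; the density computation is the expensive part in this sketch, since it walks the whole box.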
Geometric interpretation: a cross-like structure again
[D. Gnatyshak et al., 2011]
Prime-based OAC-triclusters are denser than box-operator-based ones.
Every element corresponding to a “grey” cell is an element of Y.
Figure: Prime operator based tricluster structure
Experiments: Pareto optimality
Machine Learning 101(1-3): 271-302 (2015)
Figure: Pairwise criterion graphs for the IMDB dataset
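For readers unfamiliar with Pareto optimality over a pair of criteria, here is a minimal sketch. The method names and scores are invented purely for illustration and are not results from the experiments; both criteria are assumed to be maximized.

```python
def pareto_front(points):
    """Return names of points not dominated by any other (all criteria maximized)."""

    def dominates(p, q):
        # p dominates q if it is at least as good everywhere and better somewhere.
        return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


# Hypothetical (density, coverage) scores for four unnamed methods;
# these numbers are NOT experimental data.
scores = {
    "method_A": (0.9, 0.4),
    "method_B": (0.6, 0.8),
    "method_C": (0.5, 0.7),
    "method_D": (0.9, 0.3),
}
front = pareto_front(scores)
print(front)
```

In this toy input, method_C is dominated by method_B and method_D by method_A, so only method_A and method_B remain on the front.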
OAC-prime
Machine Learning 101(1-3): 271-302 (2015)
It is one of the fastest algorithms
It yields a moderately large number of dense, well-interpreted triclusters
For ρmin = 0 coverage is equal to 1, and it remains high for other values of ρmin
Diversities are also rather high
Low efficiency of parallelization
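The coverage claim can be illustrated with a small sketch. It assumes that coverage means the share of triples of Y lying in at least one tricluster whose density is at least ρmin, and that density is the filled share of the tricluster's box; both definitions, the toy triples and the function names are assumptions for illustration. With ρmin = 0 no tricluster is discarded, and every triple lies in the tricluster it generates, so coverage is 1.

```python
from collections import defaultdict

# Hypothetical triples forming Y: one "central" triple and three neighbours.
Y = {
    ("u1", "b1", "t1"),
    ("u2", "b1", "t1"),
    ("u1", "b2", "t1"),
    ("u1", "b1", "t2"),
}


def triclusters_with_density(triples):
    """Prime-based triclusters of `triples`, each mapped to its (assumed) density."""
    gm, gb, mb = defaultdict(set), defaultdict(set), defaultdict(set)
    for g, m, b in triples:
        gm[(g, m)].add(b)
        gb[(g, b)].add(m)
        mb[(m, b)].add(g)
    found = {}
    for g, m, b in triples:
        X, Z, W = frozenset(mb[(m, b)]), frozenset(gb[(g, b)]), frozenset(gm[(g, m)])
        hits = sum((x, z, w) in triples for x in X for z in Z for w in W)
        found[(X, Z, W)] = hits / (len(X) * len(Z) * len(W))
    return found


def coverage(triples, rho_min):
    """Return (number of triclusters kept, share of triples they cover)."""
    kept = [t for t, rho in triclusters_with_density(triples).items() if rho >= rho_min]
    covered = sum(
        any(g in X and m in Z and b in W for X, Z, W in kept) for g, m, b in triples
    )
    return len(kept), covered / len(triples)


print(coverage(Y, 0.0))
print(coverage(Y, 0.6))
```

In this toy relation the tricluster generated by the central triple has density 0.5 and is dropped at ρmin = 0.6, yet the central triple is still covered by the denser triclusters of its neighbours, so coverage stays at 1.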
Examples of the triclusters for the IMDB context:
1. 36%, {The Shawshank Redemption (1994), Cool Hand Luke (1967), American History X (1998), A Clockwork Orange (1971), The Green Mile (1999)}, {Prison, Murder, Friend, Shawshank, Banker}, {Crime, Drama}
2. 56.67%, {The Godfather: Part II (1974), The Usual Suspects (1995)}, {Cuba, New York, Business, 1920s, 1950s}, {Crime, Drama, Thriller}
3. 60%, {Toy Story (1995), Toy Story 2 (1999)}, {Jealousy, Toy, Spaceman, Little Boy, Fight}, {Fantasy, Comedy, Animation, Family, Adventure}
Thank you!