Searching for optimal patterns in Boolean tensors

Dmitry I. Ignatov∗, Dmitry V. Gnatyshak∗, Sergei O. Kuznetsov∗, and Jaume Baixeries†

∗National Research University Higher School of Economics, Moscow, Russia
†Universitat Politècnica de Catalunya, Barcelona

Tensor-Learn@NIPS 2016, Barcelona


Motivation

A large amount of structured and unstructured data generates n-ary Boolean tensors (n-ary relations).
E.g., a folksonomy is a set of triples (user, object, tag).

Concrete examples:
Bibsonomy.org (user, bookmark, tag)
Social networking sites (user, group, interest)
Delicious (user, link, tag)

Figure: Folksonomy as a graph.


Main research question

General question: How to find interesting patterns (subtensors) in Boolean tensors?
Concrete question: Which triclusters are good approximations of the triconcepts of given triadic data?


Concept lattices as concept hierarchies
[R. Wille, 1982], [B. Ganter and R. Wille, 1999]

Figure: concept lattice diagram. Its nodes (extent, intent), from top to bottom:

({1,2,3,4}, ∅)
({1,2}, {a})   ({2,3,4}, {c})   ({1,4}, {d})
({1}, {a,d})   ({2}, {a,c})   ({3,4}, {b,c})
({4}, {b,c,d})
(∅, M)

Formal context:

G \ M | a | b | c | d
1     | × |   |   | ×
2     | × |   | × |
3     |   | × | × |
4     |   | × | × | ×

a: has exactly 3 vertices
b: has exactly 4 vertices
c: has a right angle
d: is equilateral
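The lattice above can be reproduced from the context table with the two derivation operators. The following is a minimal sketch, assuming a brute-force closure over all object subsets (fine for four objects, not how lattices are built at scale); the function names are illustrative and do not come from the slides.

```python
from itertools import combinations

# Formal context from the slide: objects 1-4, attributes a-d.
CONTEXT = {
    1: {"a", "d"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"b", "c", "d"},
}
ATTRIBUTES = {"a", "b", "c", "d"}


def common_attributes(objects):
    """Derivation A': attributes shared by every object in A."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """Derivation B': objects that have every attribute in B."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every object subset."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = common_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

Calling `formal_concepts()` on this context yields the nine concepts drawn in the diagram, including the top concept with empty intent and the bottom concept with empty extent.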


Concept-based biclustering
[D. Ignatov and S. Kuznetsov, 2010]

Let K = (G, M, I ⊆ G × M) be a formal context (Boolean matrix). For g ∈ G and m ∈ M, g′ = { m ∈ M | (g, m) ∈ I } and m′ = { g ∈ G | (g, m) ∈ I }.

Definition 1
If (g, m) ∈ I, then (m′, g′) is called an object-attribute or OA-bicluster with density

ρ(m′, g′) = |I ∩ (m′ × g′)| / (|m′| · |g′|).
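As a worked illustration of Definition 1, the sketch below generates the OA-bicluster for a pair (g, m) ∈ I and computes its density. It assumes the context is stored as a dictionary from objects to attribute sets and reuses the four-object geometric example; the helper names are hypothetical.

```python
# Example context (objects -> attribute sets); an assumption for illustration.
CONTEXT = {
    1: {"a", "d"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"b", "c", "d"},
}


def attribute_extent(context, m):
    """m': all objects that have attribute m."""
    return frozenset(g for g, attrs in context.items() if m in attrs)


def oa_bicluster(context, g, m):
    """Return (m', g', density) for a pair (g, m) of the incidence relation."""
    if m not in context[g]:
        raise ValueError("(g, m) must belong to the incidence relation I")
    extent = attribute_extent(context, m)   # m'
    intent = frozenset(context[g])          # g'
    # |I ∩ (m' × g')|: filled cells inside the rectangle m' × g'.
    filled = sum(len(context[h] & intent) for h in extent)
    density = filled / (len(extent) * len(intent))
    return extent, intent, density


def all_oa_biclusters(context):
    """Generate one bicluster per pair in I, merging duplicates."""
    found = {}
    for g, attrs in context.items():
        for m in attrs:
            extent, intent, density = oa_bicluster(context, g, m)
            found[(extent, intent)] = density
    return found
```

For the pair (1, a), the bicluster is ({1, 2}, {a, d}): three of its four cells belong to I, so the density is 3/4.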


Geometric interpretation of OA-bicluster and complexity
[D. Ignatov and S. Kuznetsov, 2010]

Figure: geometric structure of an OA-bicluster (labels: g, m, g′′, m′′, g′, m′).

OA-biclustering versus FCA

O(|I| · |G| · |M|) (O(|I|) in the online case) vs. O(|L| · |G|² · |M|), where L denotes the concept lattice.


OAC triclustering based on prime operators
[Dmitry Gnatyshak et al., 2012]

Let K = (G, M, B, Y) be a triadic context (3D Boolean tensor), where Y ⊆ G × M × B.
Prime operators:

(g, m)′ = { b | (g, m, b) ∈ Y }

(g, b)′ = { m | (g, m, b) ∈ Y }

(m, b)′ = { g | (g, m, b) ∈ Y }

Definition
For (g, m, b) ∈ Y, an OAC-tricluster based on prime operators is a triple T = ((m, b)′, (g, b)′, (g, m)′).
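The definition translates almost directly into code. The sketch below is one possible reading, assuming Y is given as a set of (g, m, b) triples and that density is the fraction of cells of the tricluster that belong to Y; the toy data and function names are made up for illustration.

```python
from collections import defaultdict


def prime_operators(triples):
    """Build the three prime-operator tables in one pass over Y."""
    gm, gb, mb = defaultdict(set), defaultdict(set), defaultdict(set)
    for g, m, b in triples:
        gm[(g, m)].add(b)   # (g, m)' = conditions shared by the pair
        gb[(g, b)].add(m)   # (g, b)' = attributes shared by the pair
        mb[(m, b)].add(g)   # (m, b)' = objects shared by the pair
    return gm, gb, mb


def oac_prime_triclusters(triples):
    """One tricluster ((m,b)', (g,b)', (g,m)') per triple, duplicates merged."""
    gm, gb, mb = prime_operators(triples)
    return {
        (frozenset(mb[(m, b)]), frozenset(gb[(g, b)]), frozenset(gm[(g, m)]))
        for g, m, b in triples
    }


def density(tricluster, triples):
    """Share of cells of the tricluster that belong to Y (assumed definition)."""
    objects, attributes, conditions = tricluster
    hits = sum(
        (g, m, b) in triples
        for g in objects for m in attributes for b in conditions
    )
    return hits / (len(objects) * len(attributes) * len(conditions))


# Hypothetical toy folksonomy: a full 2x2x2 cube with one missing triple.
TOY = {
    (u, i, t)
    for u in ("u1", "u2") for i in ("i1", "i2") for t in ("t1", "t2")
} - {("u2", "i2", "t2")}
```

On `TOY`, the triple (u1, i1, t1) generates the whole cube with density 7/8, while (u2, i2, t1) generates the fully dense slice ({u1, u2}, {i1, i2}, {t1}).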


Geometric interpretation: a cross-like structure again
[D. Gnatyshak et al., 2011]

Prime-based OAC-triclusters are denser than box-operator-based ones.
Every element corresponding to a “grey” cell is an element of Y.

Figure: Prime operator based tricluster structure


Experiments: Pareto optimality
Machine Learning 101(1-3): 271-302 (2015)

Figure: Pairwise criterion graphs for the IMDB dataset


OAC-prime
Machine Learning 101(1-3): 271-302 (2015)

It is one of the fastest algorithms
Moderately large number of dense, well-interpreted triclusters
For ρmin = 0 coverage is equal to 1, and it remains high for other values of ρmin
Diversities are also rather high
Low efficiency of parallelization

Examples of the triclusters for the IMDB context:

1. 36%, {The Shawshank Redemption (1994), Cool Hand Luke (1967), American History X (1998), A Clockwork Orange (1971), The Green Mile (1999)}, {Prison, Murder, Friend, Shawshank, Banker}, {Crime, Drama}

2. 56.67%, {The Godfather: Part II (1974), The Usual Suspects (1995)}, {Cuba, New York, Business, 1920s, 1950s}, {Crime, Drama, Thriller}

3. 60%, {Toy Story (1995), Toy Story 2 (1999)}, {Jealousy, Toy, Spaceman, Little Boy, Fight}, {Fantasy, Comedy, Animation, Family, Adventure}
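The interplay between the density threshold ρmin and coverage can be sketched as follows. The slides do not define coverage, so this example assumes it is the fraction of triples of the context that fall inside at least one retained tricluster; the toy triples and the two triclusters are hypothetical, not taken from the IMDB experiments.

```python
def density(tricluster, triples):
    """Share of cells of the tricluster that belong to the context."""
    objects, attributes, conditions = tricluster
    hits = sum(
        (g, m, b) in triples
        for g in objects for m in attributes for b in conditions
    )
    return hits / (len(objects) * len(attributes) * len(conditions))


def filter_by_density(triclusters, triples, rho_min):
    """Keep only triclusters whose density is at least rho_min."""
    return [t for t in triclusters if density(t, triples) >= rho_min]


def coverage(triclusters, triples):
    """Fraction of triples covered by some tricluster (assumed definition)."""
    covered = sum(
        any(g in X and m in Y and b in Z for X, Y, Z in triclusters)
        for g, m, b in triples
    )
    return covered / len(triples)


# Hypothetical data: a 2x2x2 cube with one missing triple, two triclusters.
TRIPLES = {
    (u, i, t)
    for u in ("u1", "u2") for i in ("i1", "i2") for t in ("t1", "t2")
} - {("u2", "i2", "t2")}
CUBE = ({"u1", "u2"}, {"i1", "i2"}, {"t1", "t2"})   # density 7/8
SLICE = ({"u1", "u2"}, {"i1", "i2"}, {"t1"})        # density 1
```

With `rho_min = 0` both triclusters are kept and every triple is covered. Raising the threshold to 0.9 drops `CUBE`, and the remaining slice covers 4 of the 7 triples.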


Thank you!
