Matrix Factorizations over Non-Conventional Algebras for Data Mining Pauli Miettinen 28 April 2015



Matrix Factorizations over Non-Conventional Algebras for Data Mining

Pauli Miettinen, 28 April 2015


Chapter 1. A Bit of Background


Data

Attributes: long-haired, well-known, male

✔ ✔ ✘
✔ ✔ ✔
✘ ✔ ✔


Data

Attributes: long-haired, well-known, male

\[
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}
\]


Factorization point of view

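\[
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}
\circ
\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\]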


Chapter 2. Boolean Matrix Factorization


Gian-Carlo Rota, foreword to Boolean Matrix Theory and Applications by K. H. Kim, 1982

In the sleepy days when the provinces of France were still quietly provincial, matrices with Boolean entries were a favored occupation of aging professors at the universities of Bordeaux and Clermont-Ferrand. But one day…


Boolean products and factorizations

• The Boolean matrix product of two binary matrices A and B is their matrix product under the Boolean semi-ring

• The Boolean matrix factorization of a binary matrix A expresses it as a Boolean product of two binary factor matrices B and C, that is, A = B◦C

\[
(A \circ B)_{ij} = \bigvee_{l=1}^{k} a_{il} b_{lj}
\]
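The product definition can be tried out directly. The following is a minimal Python sketch, not code from the talk; the function name `boolean_product` is an assumption made for illustration, and the factor matrices are the ones from the "Factorization point of view" slide.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is the OR over l of (B[i][l] AND C[l][j])."""
    k = len(C)  # inner dimension: number of columns of B and rows of C
    return [
        [int(any(B[i][l] and C[l][j] for l in range(k))) for j in range(len(C[0]))]
        for i in range(len(B))
    ]


# Factor matrices from the "Factorization point of view" slide.
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0],
     [0, 1, 1]]

A = boolean_product(B, C)
print(A)  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```

Under the normal matrix product the middle entry would be 2 instead of 1; the Boolean OR caps it at 1.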


Matrix ranks

• The (Schein) rank of a matrix A is the least number of rank-1 matrices whose sum is A

• A = R1 + R2 + … + Rk

• A matrix is rank-1 if it is an outer product of two vectors

• The Boolean rank of a binary matrix A is the least number of binary rank-1 matrices whose element-wise OR is A

• The least k such that A = B◦C with B having k columns


Comparison of ranks

• Boolean rank can be less than normal rank

• rank_B(A) = O(log_2(rank(A))) for certain A

⇒ Boolean factorization can achieve less error than SVD

• Boolean rank is never more than the non-negative rank

\[
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}
\]
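For the 3-by-3 example matrix, the gap between the two ranks can be checked by brute force. The sketch below is illustrative only: `det3` and `boolean_rank` are hypothetical helper names, and the exhaustive search is an assumption that is feasible only for tiny matrices (it is not one of the algorithms discussed in the talk).

```python
from itertools import product


def boolean_product(B, C):
    k = len(C)
    return [[int(any(B[i][l] and C[l][j] for l in range(k)))
             for j in range(len(C[0]))] for i in range(len(B))]


def boolean_rank(A):
    """Smallest k with A = B o C, found by trying every binary B and C."""
    n, m = len(A), len(A[0])
    if not any(any(row) for row in A):
        return 0
    # A factorization with k = min(n, m) always exists, so the loop returns.
    for k in range(1, min(n, m) + 1):
        for b_bits in product([0, 1], repeat=n * k):
            B = [list(b_bits[i * k:(i + 1) * k]) for i in range(n)]
            for c_bits in product([0, 1], repeat=k * m):
                C = [list(c_bits[l * m:(l + 1) * m]) for l in range(k)]
                if boolean_product(B, C) == A:
                    return k


def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

print(det3(A))          # -1
print(boolean_rank(A))  # 2
```

A nonzero determinant means the normal rank is 3, while the search finds an exact Boolean factorization with k = 2.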


The many names of Boolean rank

• Minimum tiling (data mining)

• Rectangle covering number (communication complexity)

• Minimum bi-clique edge covering number (Garey & Johnson GT18)

• Minimum set basis (Garey & Johnson SP7)

• Optimum key generation (cryptography)

• Minimum set of roles (access control)


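Boolean rank and bicliques

[Figure: bipartite graph on nodes 1, 2, 3 and A, B, C, shown next to its biadjacency matrix and the two factor matrices]

\[
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}
\circ
\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\]

Rows are indexed by 1, 2, 3 and columns by A, B, C.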


Boolean rank and sets

• The Boolean rank of a matrix A is the least number of subsets of U(A) needed to cover every set of the induced collection C(A)

• For every C in C(A), the collection S of subsets must have a subcollection S_C such that

\[
\bigcup_{S \in \mathcal{S}_C} S = C
\]

[Figure: diagram with elements 1, 2, 3]


Approximate factorizations

• Noise usually makes real-world matrices (almost) full rank

• We want to find a good low-rank approximation

• The goodness is measured using the Hamming distance

• Given A and k, find B and C such that B has k columns and |A – B◦C| is minimized

• No easier than finding the Boolean rank
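The objective |A – B◦C| on this slide counts the entries where the Boolean product disagrees with the data. A small Python sketch of that count follows; `hamming_error` is a hypothetical name, and the rank-1 factors `B1`, `C1` are chosen only for illustration (no claim is made that they are optimal).

```python
def boolean_product(B, C):
    k = len(C)
    return [[int(any(B[i][l] and C[l][j] for l in range(k)))
             for j in range(len(C[0]))] for i in range(len(B))]


def hamming_error(A, B, C):
    """Number of entries where A and the Boolean product B o C differ."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# Illustrative rank-1 approximation: the all-ones matrix.
B1 = [[1], [1], [1]]
C1 = [[1, 1, 1]]
print(hamming_error(A, B1, C1))  # 2

# The exact rank-2 factorization from the earlier slides has zero error.
B2 = [[1, 0], [1, 1], [0, 1]]
C2 = [[1, 1, 0], [0, 1, 1]]
print(hamming_error(A, B2, C2))  # 0
```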


The many applications of Boolean factorizations

• Data mining

• noisy itemsets, community detection, role mining, …

• Machine learning

• multi-label classification, lifted inference

• Bioinformatics

• Screen technology

• VLSI design

• …


The bad news

• Computing the Boolean rank is NP-hard

• Approximating it is (almost) as hard as Clique [Chalermsook et al. ’14]

• Minimizing the error is hard

• Even to additive factors [M. ’09]

• Given one factor matrix, finding the other is NP-hard

• Even to approximate well [M. ’08]


Some algorithms

• Exact / Boolean rank

• reduction to clique [Ene et al. ’08]

• GreEss [Bělohlávek & Vychodil ’10]

• Approximate

• Asso [M. et al. ’06]

• Panda+ (error & MDL) [Lucchese et al. ’13]

• Nassau (MDL) [Karaev et al. ’15]


Chapter 3. Dioids Are Not Droids


Intuition of matrix multiplication

• Element (AB)ij is the inner product of row i of A and column j of B


Intuition of matrix multiplication

• Matrix AB is a sum of k matrices a_l b_l^T obtained by multiplying the l-th column of A with the l-th row of B
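The two views of the product (inner products versus a sum of outer products) give the same matrix. A short Python check under the normal algebra is sketched here; the helper names and the small integer matrices are assumptions chosen for illustration.

```python
def matmul(A, B):
    """Entry (i, j) is the inner product of row i of A and column j of B."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def outer(a, b):
    """Outer product a b^T of two vectors."""
    return [[ai * bj for bj in b] for ai in a]


def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


A = [[1, 2],
     [3, 4],
     [5, 6]]
B = [[1, 0, 2],
     [0, 1, 3]]

# Sum over l of (l-th column of A) times (l-th row of B).
total = [[0] * len(B[0]) for _ in A]
for l in range(len(B)):
    column_l = [row[l] for row in A]
    total = add(total, outer(column_l, B[l]))

print(total)                   # [[1, 2, 8], [3, 4, 18], [5, 6, 28]]
print(total == matmul(A, B))   # True
```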


Remember at least this slide

• A matrix factorization presents the input matrix as a sum of rank-1 matrices

• A matrix factorization presents the input matrix as an aggregate of simple matrices

• What “aggregate” and “simple” mean depends on the algebra


Dioids are not droids

• A dioid is also not a diode

• A dioid is an idempotent semiring S = (A, ⊕, ⊗, ⓪, ①)

• Addition ⊕ is idempotent

• a ⊕ a = a for all a ∈ A

• Addition is not invertible
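The definition suggests treating the algebra as a parameter of the matrix product. The following Python sketch is one possible way to do that; the `Semiring` class, `semiring_matmul`, and `is_idempotent_on` are hypothetical names introduced here, not an API from the talk.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]  # the addition, written ⊕ on the slide
    mul: Callable[[Any, Any], Any]  # the multiplication, written ⊗ on the slide
    zero: Any                       # neutral element of add
    one: Any                        # neutral element of mul


def semiring_matmul(S, B, C):
    """Matrix product in which sums use S.add and products use S.mul."""
    return [
        [reduce(S.add, (S.mul(B[i][l], C[l][j]) for l in range(len(C))), S.zero)
         for j in range(len(C[0]))]
        for i in range(len(B))
    ]


def is_idempotent_on(S, samples):
    """Check a ⊕ a = a on the given sample elements."""
    return all(S.add(a, a) == a for a in samples)


boolean = Semiring(add=lambda a, b: a | b, mul=lambda a, b: a & b, zero=0, one=1)
normal = Semiring(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0, one=1)

print(is_idempotent_on(boolean, [0, 1]))  # True
print(is_idempotent_on(normal, [0, 1]))   # False, since 1 + 1 = 2

B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 1, 0], [0, 1, 1]]
print(semiring_matmul(boolean, B, C))  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(semiring_matmul(normal, B, C))   # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
```

Swapping the semiring changes only the middle entry here, which is exactly where two rank-1 components overlap.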


Some examples (1)

• The Boolean algebra B = ({0,1}, ∨, ∧, 0, 1)

• The subset lattice L = (2^U, ∪, ∩, ∅, U) is isomorphic to B^n

• The Boolean matrix factorization expresses matrix A as A ≈ B ⊗_B C where all matrices are Boolean


Some examples (2)

• Fuzzy logic F = ([0, 1], max, min, 0, 1)

• Generalizes (relaxes) Boolean algebra

• Exact k-decomposition under fuzzy logic implies exact k-decomposition under Boolean algebra


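Fuzzy example

\[
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\approx
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\otimes_F
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 2/3 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 2/3 & 1 \\ 0 & 1 & 2/3 & 1 \\ 0 & 1 & 2/3 & 1 \end{pmatrix}
\]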
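The fuzzy (max-min) product in this example can be recomputed with exact fractions. This is a minimal sketch with a hypothetical helper name `fuzzy_product`; it assumes that the first row of the right factor is (1, 1, 0, 0), which is what the first row of the displayed product requires.

```python
from fractions import Fraction


def fuzzy_product(B, C):
    """Max-min product: entry (i, j) is the max over l of min(B[i][l], C[l][j])."""
    return [[max(min(B[i][l], C[l][j]) for l in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(B))]


t = Fraction(2, 3)
B = [[1, 0],
     [1, 1],
     [0, 1],
     [0, 1]]
C = [[1, 1, 0, 0],   # assumed first row, see the note above
     [0, 1, t, 1]]

P = fuzzy_product(B, C)
expected = [[1, 1, 0, 0],
            [1, 1, t, 1],
            [0, 1, t, 1],
            [0, 1, t, 1]]
print(P == expected)  # True
```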


Some examples (3)

• The or–Łukasiewicz algebra

• Ł = ([0,1], max, ⊗_Ł, 0, 1)

• a ⊗_Ł b = max(0, a + b – 1)

• Used to decompose matrices with ordinal values [Bělohlávek & Krmelova ’13]
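As a small illustration of the or–Łukasiewicz product, the sketch below applies the formula a ⊗Ł b = max(0, a + b – 1) to matrices over a three-level scale. The names `luk` and `or_luk_product` and the scale {0, 1/2, 1} are assumptions for this example, not taken from the cited work.

```python
from fractions import Fraction


def luk(a, b):
    """Lukasiewicz multiplication: max(0, a + b - 1)."""
    return max(0, a + b - 1)


def or_luk_product(B, C):
    """Entry (i, j) is the max over l of luk(B[i][l], C[l][j])."""
    return [[max(luk(B[i][l], C[l][j]) for l in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(B))]


h = Fraction(1, 2)  # middle grade of a hypothetical ordinal scale {0, 1/2, 1}
B = [[1, h],
     [h, 1]]
C = [[1, h],
     [0, 1]]

P = or_luk_product(B, C)
print(P == [[1, h], [h, 1]])  # True
```

Here luk(1/2, 1/2) = 0 whereas min(1/2, 1/2) = 1/2, so two middling grades do not combine into a middling result as they would under fuzzy logic.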


Some examples (4)

• The max-times (or subtropical) algebra M = (ℝ≥0, max, ×, 0, 1)

• Isomorphic to the tropical algebra T = (ℝ∪{–∞}, max, +, –∞, 0)

• T = log(M) and M = exp(T)
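The isomorphism can be checked numerically: taking element-wise logarithms turns a max-times product into a max-plus product, and exponentiating maps it back. The following Python sketch uses hypothetical helper names and small example matrices chosen here; `log0` extends the logarithm with log(0) = –∞, matching the zero elements of the two algebras.

```python
import math


def log0(a):
    """Natural logarithm extended with log(0) = -inf (the tropical zero)."""
    return -math.inf if a == 0 else math.log(a)


def max_times_product(B, C):
    return [[max(B[i][l] * C[l][j] for l in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(B))]


def max_plus_product(B, C):
    return [[max(B[i][l] + C[l][j] for l in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(B))]


B = [[1.0, 0.0],
     [2.0, 0.5]]
C = [[4.0, 1.0],
     [0.0, 3.0]]

direct = max_times_product(B, C)

log_B = [[log0(b) for b in row] for row in B]
log_C = [[log0(c) for c in row] for row in C]
via_tropical = [[math.exp(t) for t in row]
                for row in max_plus_product(log_B, log_C)]

same = all(math.isclose(d, v)
           for row_d, row_v in zip(direct, via_tropical)
           for d, v in zip(row_d, row_v))
print(direct)  # [[4.0, 1.0], [8.0, 2.0]]
print(same)    # True
```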


Why max-times?

• One interpretation: Only strongest reason matters (a.k.a. the winner takes it all)

• Normal algebra: rating is a linear combination of movie’s features

• Max-times: rating is determined by the most-liked feature


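Max-times example

\[
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\approx
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 2/3 \\ 0 & 1 \end{pmatrix}
\otimes_M
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 2/3 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 2/3 & 1 \\ 0 & 2/3 & 4/9 & 2/3 \\ 0 & 1 & 2/3 & 1 \end{pmatrix}
\]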
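The max-times product in this example can be recomputed in the same way as the fuzzy one. The sketch below is illustrative; `max_times_product` is a hypothetical name, and the first row of the right factor is assumed to be (1, 1, 0, 0), as required by the first row of the displayed product.

```python
from fractions import Fraction


def max_times_product(B, C):
    """Entry (i, j) is the max over l of B[i][l] * C[l][j]."""
    return [[max(B[i][l] * C[l][j] for l in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(B))]


t = Fraction(2, 3)
B = [[1, 0],
     [1, 1],
     [0, t],
     [0, 1]]
C = [[1, 1, 0, 0],   # assumed first row, see the note above
     [0, 1, t, 1]]

P = max_times_product(B, C)
expected = [[1, 1, 0, 0],
            [1, 1, t, 1],
            [0, t, Fraction(4, 9), t],
            [0, 1, t, 1]]
print(P == expected)  # True
```

The third row shows the difference from the fuzzy example: multiplying 2/3 by 2/3 gives 4/9, whereas min(2/3, 2/3) would stay at 2/3.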


On max-times algebra

• Max-times algebra relaxes Boolean algebra (but not fuzzy logic)

• Rank-1 components are “normal”

• Easy to interpret?

• Not much studied


On tropical algebras

• A.k.a. max-plus, extremal, maximal algebra

• Much more studied than max-times

• Can be used to solve max-times problems, but needs care with the errors

• If \( \|X - \tilde{X}\| \le \lambda \) in max-plus, then \( \|X' - \tilde{X}'\| \le M^2 \lambda \) in max-times, where \( M = \exp(\max_{i,j}\{X_{ij}, \tilde{X}_{ij}\}) \)


More max-plus

• Max-plus linear functions: f(x) = f^T ⊗ x = max_i {f_i + x_i}

• f(α⊗x ⊕ β⊗y) = α⊗f(x) ⊕ β⊗f(y)

• Max-plus eigenvectors and values: X⊗v = λ⊗v (max_j {x_ij + v_j} = λ + v_i for all i)

• Max-plus linear systems: A⊗x = b

• Solvable in pseudo-polynomial time for integer A and b
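The linearity identity and the eigenvector equation on this slide can be verified on small numbers. The Python sketch below is only a check of the two equations; the helper names, the vectors, and the 2-by-2 matrix with its eigenpair are examples chosen here, not taken from the talk.

```python
def mp_linear(f, x):
    """Max-plus linear function: max over i of f[i] + x[i]."""
    return max(fi + xi for fi, xi in zip(f, x))


def mp_matvec(X, v):
    """Max-plus matrix-vector product: entry i is max over j of X[i][j] + v[j]."""
    return [max(xij + vj for xij, vj in zip(row, v)) for row in X]


# Linearity: f(alpha ⊗ x ⊕ beta ⊗ y) = alpha ⊗ f(x) ⊕ beta ⊗ f(y),
# where ⊗ is + and ⊕ is max.
f = [1, 0, -2]
x = [0, 3, 5]
y = [4, -1, 2]
alpha, beta = 2, -1

combined = [max(alpha + xi, beta + yi) for xi, yi in zip(x, y)]
lhs = mp_linear(f, combined)
rhs = max(alpha + mp_linear(f, x), beta + mp_linear(f, y))
print(lhs, rhs)  # 5 5

# Eigenpair: X ⊗ v = lam ⊗ v, that is, max_j (x_ij + v_j) = lam + v_i.
X = [[1, 3],
     [2, 0]]
v = [0.0, -0.5]
lam = 2.5
print(mp_matvec(X, v))         # [2.5, 2.0]
print([lam + vi for vi in v])  # [2.5, 2.0]
```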


Computational complexity

• If exact k-factorization over semiring K implies exact k-factorization over B, then finding the K-rank of a matrix is NP-hard (even to approximate)

• Includes fuzzy, max-times, and tropical

• N.B. feasibility results in T often require finite matrices


Anti-negativity and sparsity

• A semiring is anti-negative if no non-zero element has an additive inverse

• Some dioids are anti-negative, others not

• Anti-negative semirings yield sparse factorizations of sparse data


Chapter 4. Even More General


Community detection

• Boolean factorization can be considered as a community detection method

• But not all communities are cliques

• “Beyond the blocks”

• Are matrix factorizations outdated models for graph communities before they even took off?

[Slide image: page 51 of "Beyond Blocks: Hyperbolic Community Detection"]

Fig. 1. Motivation for our work: Real ground-truth community (adjacency-matrix plot)

Fig. 2. Result of our work: Community found by HyCoM-FIT (adjacency-matrix plot)

degree distributions. As such, they are typically best represented as having an hyperbolic structure in the adjacency matrix, rather than rectangular (uniform) structure. We detail HyCoM - the Hyperbolic Community Model - as a better representation of communities and the relationships between their members, and introduce HyCoM-FIT as a scalable algorithm to detect communities with hyperbolic structure. To illustrate our model and algorithm, Figure 1 represents the adjacency matrix of a real (ground-truth) community externally provided when nodes are ordered by degree, and Figure 2 shows the adjacency matrix of an exemplary community found by our algorithm. Clearly, both communities do not show uniform density. In a nutshell, the main contributions of our work are:

– Introduction of the Hyperbolic Community Model: We provide empirical evidence that communities in large, real social graphs are better modeled using an hyperbolic model. We also show that this model is better from a compression perspective than previous models.

– Scalability: We develop HyCoM-FIT, an algorithm for the detection of hyperbolic communities that scales linearly with the number of edges.

– No user-defined parameters: HyCoM-FIT detects communities in a parameter-free fashion, transparent to the end-user.

– Effectiveness: We applied HyCoM-FIT on real data where we discovered communities that agree with intuition.

– Generality: HyCoM includes uniform block communities used by other algorithms as a special case.

2 Background and Related Work

Nodes in real-world networks organize into communities or clusters, which tend to exhibit a higher degree of ‘cohesiveness’ with respect to the underlying relational patterns. Group formation is natural in social networks as people organize in families, clubs and political organizations; see e.g., [19]. Communities also


Generalized outer product

• A generalized outer product is a function o(x, y, θ)

• Returns an n-by-m matrix A

• If x_i = 0 or y_j = 0, then (A)_ij = 0

• Compare to xy^T


Example

• Generalized outer product for biclique core

• Binary vector x to select the subgraph

• Set C to define the nodes in the core

• (o(x, x, C))_ij = 1 if x_i = x_j = 1 and exactly one of i and j is in C

[Figure: the all-ones column vector (1, 1, …, 1)^T next to the all-ones row vector (1 1 ⋯ 1), with a brace labeled C]
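The biclique-core rule on this slide translates almost literally into code. The following Python sketch is one reading of it; the function name `biclique_core_outer`, the 0-based node numbering, and the five-node example are assumptions made for illustration.

```python
def biclique_core_outer(x, y, core):
    """Generalized outer product o(x, y, C) for a biclique core.

    Entry (i, j) is 1 if x[i] = y[j] = 1 and exactly one of i and j is in core.
    """
    return [
        [int(x[i] == 1 and y[j] == 1 and ((i in core) != (j in core)))
         for j in range(len(y))]
        for i in range(len(x))
    ]


x = [1, 1, 1, 1, 0]  # nodes 0..3 are selected, node 4 is not
core = {0, 1}        # core nodes (0-based indices, an assumption of this sketch)

A = biclique_core_outer(x, x, core)
for row in A:
    print(row)
# [0, 0, 1, 1, 0]
# [0, 0, 1, 1, 0]
# [1, 1, 0, 0, 0]
# [1, 1, 0, 0, 0]
# [0, 0, 0, 0, 0]
```

Rows and columns of unselected nodes stay zero, so the function satisfies the requirement that (A)_ij = 0 whenever x_i = 0 or y_j = 0.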


Generalized decomposition

• A generalized matrix decomposition decomposes input matrix A into a sum of generalized outer products

• A = o(x_1, y_1, θ_1) ⊕ o(x_2, y_2, θ_2) ⊕ … ⊕ o(x_k, y_k, θ_k)

• Sum can be over any semi-ring

• The generalized rank is defined as expected
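A generalized decomposition can be sketched as a fold over components with a pluggable addition. In the Python example below, `generalized_sum` and `plain_outer` are hypothetical names; the plain outer product ignores θ, and the two components are the columns of B and rows of C from the Boolean example earlier in the talk.

```python
from functools import reduce


def plain_outer(x, y, theta=None):
    """Ordinary outer product x y^T; theta is unused in this simple case."""
    return [[xi * yj for yj in y] for xi in x]


def generalized_sum(components, outer, add):
    """Combine o(x_1, y_1, θ_1) ⊕ ... ⊕ o(x_k, y_k, θ_k) element-wise with add."""
    matrices = [outer(x, y, theta) for x, y, theta in components]
    return reduce(
        lambda X, Y: [[add(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)],
        matrices,
    )


components = [
    ([1, 1, 0], [1, 1, 0], None),
    ([0, 1, 1], [0, 1, 1], None),
]

print(generalized_sum(components, plain_outer, lambda a, b: a | b))
# [[1, 1, 0], [1, 1, 1], [0, 1, 1]]   (OR)
print(generalized_sum(components, plain_outer, lambda a, b: a ^ b))
# [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   (XOR)
print(generalized_sum(components, plain_outer, lambda a, b: a + b))
# [[1, 1, 0], [1, 2, 1], [0, 1, 1]]   (normal sum)
```

The same two components give three different matrices, differing only in the entry where the components overlap.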


Why generalize?

• Provides a unifying framework

• Some algorithms and many computational hardness results generalize well

• Depend more on the addition ⊕ than on the outer product


Some results

• Finding the largest-circumference rank-1 submatrix is NP-hard if the outer product is hereditary

• Generalizes results for nestedness

• Given a set of binary rank-1 matrices, finding the smallest exact sub-decomposition from them is NP-hard if addition is either OR, AND, or XOR

• But exact hardness depends on the algebra


Chapter 5. The Chapter to Remember


Conclusions

• Matrix factorizations are just a way to express complex data as an aggregate of simple parts

• Normal and Boolean algebras are the best-studied ones

• but by no means the only possible ones

• Generalizing the outer product gives an even more versatile language

Thank You!