A Generalized Maximum Entropy Approach to Bregman Co-clustering

Post on 03-Dec-2014


DESCRIPTION

This presentation mainly uses the Bregman divergence to define the co-clustering loss function; by minimizing this loss function, the best clusters are found.

TRANSCRIPT

Authors: Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha
Source: KDD '04, August 22-25, 2004, ACM, pp. 509-514
Presenter: Allen Wu

112/04/09


Outline:
- Introduction
- Bregman divergences
- Bregman co-clustering
- Algorithm
- Experiments
- Conclusion


The information-theoretic co-clustering (ITCC) model treats the co-clustering problem in terms of the joint probability distribution.

We seek a co-clustering of both dimensions such that the loss in mutual information,

min_{X̂,Ŷ} [ I(X; Y) − I(X̂; Ŷ) ],

is minimized, given a fixed number of row and column clusters.
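As a concrete illustration (my own sketch, not from the paper), the mutual information of a small joint distribution can be computed directly:

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) for a joint distribution given as a 2-D array summing to 1."""
    px = p.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = p.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    mask = p > 0                        # convention: 0 log 0 = 0
    return float((p[mask] * np.log(p[mask] / (px * py)[mask])).sum())

# Hypothetical joint distribution over a 2x2 alphabet
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])
print(mutual_information(p))  # positive, since X and Y are dependent
```

Co-clustering then searches over cluster assignments (X̂, Ŷ) to keep as much of this quantity as possible.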


The loss in mutual information equals

I(X; Y) − I(X̂; Ŷ) = D_KL( p(x, y) || q(x, y) ),

where

q(x, y) = p(x̂, ŷ) p(x|x̂) p(y|ŷ), for x ∈ x̂, y ∈ ŷ.

It can be shown that q(x, y) is a "maximum entropy" approximation to p(x, y).
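This identity can be checked numerically; a sketch with a hypothetical distribution and clustering (not the paper's example), building q from row/column cluster assignments and comparing D_KL(p||q) with the drop in mutual information:

```python
import numpy as np

def mutual_information(p):
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    m = p > 0
    return float((p[m] * np.log(p[m] / (px * py)[m])).sum())

def coclustered_q(p, rows, cols):
    """q(x,y) = p(xhat,yhat) p(x|xhat) p(y|yhat) for cluster maps rows/cols."""
    k, l = max(rows) + 1, max(cols) + 1
    R, C = np.eye(k)[rows], np.eye(l)[cols]      # cluster indicator matrices
    phat = R.T @ p @ C                           # joint over clusters p(xhat,yhat)
    pxhat, pyhat = R.T @ p.sum(1), C.T @ p.sum(0)
    return ((p.sum(1) / pxhat[rows])[:, None]    # p(x|xhat) = p(x)/p(xhat)
            * phat[np.ix_(rows, cols)]
            * (p.sum(0) / pyhat[cols])[None, :]) # p(y|yhat) = p(y)/p(yhat)

# Hypothetical 3x3 joint distribution and a 2x2 co-clustering
p = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.20, 0.00],
              [0.00, 0.00, 0.50]])
rows, cols = [0, 0, 1], [0, 0, 1]
q = coclustered_q(p, rows, cols)
phat = np.eye(2)[rows].T @ p @ np.eye(2)[cols]
m = p > 0
d_kl = float((p[m] * np.log(p[m] / q[m])).sum())
loss = mutual_information(p) - mutual_information(phat)
# d_kl and loss agree, confirming the identity
```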

Since p(x|x̂) = p(x)/p(x̂) and p(y|ŷ) = p(y)/p(ŷ), the approximation can also be written

q(x, y) = p(x̂, ŷ) · [p(x)/p(x̂)] · [p(y)/p(ŷ)].

(Slides 4-8 step through a numerical example on a small joint-distribution matrix, comparing the values of D(p||q) obtained for several candidate row/column clusterings; the candidate with the smallest D(p||q) gives the better co-clustering.)

However, the data matrix may contain negative entries, or a distortion measure other than KL-divergence, such as the squared Euclidean distance, might be more appropriate.

This paper addresses the general situation by extending ITCC along three directions:
- "Nearness" is now measured by any Bregman divergence.
- A larger class of constraints can be specified.
- The maximum entropy approach is generalized.


The objective function is

min_{ {x̂_1, …, x̂_k} } Σ_{h=1}^{k} Σ_{x ∈ x̂_h} ||x − μ_h||²


Let φ be a real-valued strictly convex function defined on the convex set S = dom(φ) ⊆ ℝ, such that φ is differentiable on int(S), the interior of S.

The Bregman divergence d_φ : S × int(S) → [0, ∞) is defined as

d_φ(z1, z2) = φ(z1) − φ(z2) − ⟨z1 − z2, ∇φ(z2)⟩.
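A minimal sketch of this definition in code (function names are my own; φ and its gradient are supplied by the caller):

```python
import numpy as np

def bregman_divergence(phi, grad_phi, z1, z2):
    """d_phi(z1, z2) = phi(z1) - phi(z2) - <z1 - z2, grad_phi(z2)>."""
    return phi(z1) - phi(z2) - np.dot(np.atleast_1d(z1 - z2),
                                      np.atleast_1d(grad_phi(z2)))

# phi(z) = z^2 recovers the squared Euclidean distance on scalars:
# 9 - 1 - (3 - 1) * 2 = 4 = (3 - 1)^2
d = bregman_divergence(lambda z: z * z, lambda z: 2 * z, 3.0, 1.0)
print(d)
```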


I-Divergence: Given z ∈ ℝ+, let φ(z) = z log z. For z1, z2 ∈ ℝ+,

d_φ(z1, z2) = z1 log(z1/z2) − (z1 − z2).

Squared Euclidean Distance: Given z ∈ ℝ, let φ(z) = z². For z1, z2 ∈ ℝ,

d_φ(z1, z2) = (z1 − z2)².
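These two closed forms can be checked against the general definition (a quick sketch; all function names are mine):

```python
import math

def d_i_divergence(z1, z2):
    # d_phi(z1, z2) = z1 log(z1/z2) - (z1 - z2) for phi(z) = z log z
    return z1 * math.log(z1 / z2) - (z1 - z2)

def d_sq_euclidean(z1, z2):
    # d_phi(z1, z2) = (z1 - z2)^2 for phi(z) = z^2
    return (z1 - z2) ** 2

def d_general(phi, grad, z1, z2):
    # d_phi(z1, z2) = phi(z1) - phi(z2) - (z1 - z2) * phi'(z2), scalar case
    return phi(z1) - phi(z2) - (z1 - z2) * grad(z2)

z1, z2 = 2.0, 0.5
assert math.isclose(d_i_divergence(z1, z2),
                    d_general(lambda z: z * math.log(z),
                              lambda z: math.log(z) + 1, z1, z2))
assert math.isclose(d_sq_euclidean(z1, z2),
                    d_general(lambda z: z * z, lambda z: 2 * z, z1, z2))
```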


Bregman information is defined as the expected Bregman divergence to the expectation: I_φ(Z) = E[d_φ(Z, E[Z])].

I-Divergence: Given a real non-negative random variable Z, the Bregman information is I_φ(Z) = E[Z log(Z/E[Z])].

Squared Euclidean Distance: Given any real random variable Z, the Bregman information is I_φ(Z) = E[(Z − E[Z])²].
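For instance (a sketch of my own, for empirical distributions), the squared-Euclidean Bregman information is just the variance:

```python
import numpy as np

def bregman_information_sq(z, w=None):
    """I_phi(Z) = E[(Z - E[Z])^2] under weights w (uniform if omitted)."""
    z = np.asarray(z, dtype=float)
    w = np.full(len(z), 1.0 / len(z)) if w is None else np.asarray(w)
    mu = np.dot(w, z)                     # E[Z]
    return float(np.dot(w, (z - mu) ** 2))

def bregman_information_idiv(z, w=None):
    """I_phi(Z) = E[Z log(Z / E[Z])] for a nonnegative Z."""
    z = np.asarray(z, dtype=float)
    w = np.full(len(z), 1.0 / len(z)) if w is None else np.asarray(w)
    mu = np.dot(w, z)
    return float(np.dot(w, z * np.log(z / mu)))

print(bregman_information_sq([1.0, 2.0, 3.0]))   # 2/3, the variance
```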


Let (X, Y) ~ p(X, Y) be jointly distributed random variables, with X taking values in {x_u}, u ∈ [m] ≡ {1, …, m}, and Y taking values in {y_v}, v ∈ [n].

p(X, Y) can be written in the form of an m×n matrix Z = [z_uv], where z_uv = p(x_u, y_v).

The quality of a co-clustering (ρ, γ), where ρ and γ denote the row and column clusterings, can be defined as

E[d_φ(Z, Ẑ)] = Σ_{u=1}^{m} Σ_{v=1}^{n} w_uv d_φ(z_uv, ẑ_uv),

where Ẑ is uniquely determined by the co-clustering (ρ, γ).
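This objective can be evaluated directly in code; a sketch of my own, using squared Euclidean distance and the simple block-average reconstruction for Ẑ (all names are assumptions, not the paper's API):

```python
import numpy as np

def block_average_approx(Z, rows, cols):
    """Zhat with each entry replaced by its co-cluster mean."""
    k, l = max(rows) + 1, max(cols) + 1
    R = np.eye(k)[rows]                        # m x k row-cluster indicator
    C = np.eye(l)[cols]                        # n x l column-cluster indicator
    counts = R.T @ np.ones_like(Z) @ C         # cells per co-cluster
    means = (R.T @ Z @ C) / counts             # co-cluster means
    return means[np.ix_(rows, cols)]           # expand back to m x n

def expected_distortion(Z, Zhat):
    """E[d_phi(Z, Zhat)] under the uniform measure, phi(z) = z^2."""
    return float(((Z - Zhat) ** 2).mean())

Z = np.array([[1.0, 1.0, 5.0],
              [1.0, 1.0, 5.0],
              [4.0, 4.0, 0.0]])
rows, cols = [0, 0, 1], [0, 0, 1]              # hypothetical co-clustering
Zhat = block_average_approx(Z, rows, cols)
print(expected_distortion(Z, Zhat))  # 0.0: Z is exactly block constant
```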


A co-clustering (ρ, γ) involves four random variables {U, V, Û, V̂} corresponding to the various partitionings of the matrix Z.

We can obtain different matrix approximations based on the statistics of Z corresponding to the non-trivial combinations of {U, V, Û, V̂}: {{U, V̂}, {Û, V}, {Û, V̂}, {U}, {V}, {Û}, {V̂}}.


Γ denotes the class of matrix approximation schemes based on (ρ, γ), for example:

C1 = {{Û}, {V̂}}, C2 = {{Û, V̂}}, C3 = {{Û, V̂}, {U}, {V}}, C4 = {{U, V̂}, {Û, V}}.

The set of approximations M_A(φ, w, C) consists of all Z′ ∈ S^{m×n} that preserve the statistics specified by C.

The "best" approximation Ẑ is

Ẑ = argmin_{Z′ ∈ M_A(φ, w, C)} E[d_φ(Z, Z′)].


We present brief case studies to demonstrate two salient features: dimensionality reduction and missing value prediction.


Clustering interleaved with implicit dimensionality reduction

Superior performance as compared to one-sided clustering


Assign zero measure for missing elements, co-cluster and use reconstructed matrix for prediction

Implicit discovery of correlated sub-matrices
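A minimal sketch of this idea (my own simplification: squared-Euclidean distortion, fixed cluster assignments, one reconstruction pass): missing entries get zero measure, co-cluster averages are computed from observed entries only, and predictions are read off the reconstruction.

```python
import numpy as np

def predict_missing(Z, W, rows, cols):
    """Fill entries where W == 0 with co-cluster means over observed cells."""
    k, l = max(rows) + 1, max(cols) + 1
    R = np.eye(k)[rows]                   # m x k row-cluster indicator
    C = np.eye(l)[cols]                   # n x l column-cluster indicator
    counts = R.T @ W @ C                  # observed cells per co-cluster
    means = (R.T @ (W * Z) @ C) / counts  # means over observed entries only
    Zhat = means[np.ix_(rows, cols)]      # reconstructed matrix
    return np.where(W == 1, Z, Zhat)      # keep observed, predict missing

Z = np.array([[5.0, 4.0, 1.0],
              [5.0, 0.0, 1.0],            # entry (1,1) unknown; 0 is filler
              [1.0, 1.0, 5.0]])
W = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)    # 0 marks the missing entry
rows, cols = [0, 0, 1], [0, 0, 1]        # hypothetical cluster assignments
print(predict_missing(Z, W, rows, cols)[1, 1])  # mean of observed block: (5+4+5)/3
```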


Bregman divergences serve as the co-clustering loss function, with I-divergence and squared Euclidean distance as special cases.

Approximation models of various complexities are possible depending on the statistics.

The minimum Bregman information principle as a generalization of the maximum entropy principle.

