The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Idea of Co-Clustering
• Co-clustering: combine the row and column clustering of a co-occurrence matrix so that the two bootstrap each other.
• Simultaneously cluster the rows X and the columns Y of the co-occurrence matrix.
Hierarchical Co-Clustering Based on Entropy Splitting
• View the (scaled) co-occurrence matrix as a joint probability distribution between the row and column random variables.
• Objective: find a hierarchical co-clustering with a given number of clusters while preserving as much mutual information between row and column clusters as possible.
p(x, y) = #co-occurrence(x, y) / Σ_{x,y} #co-occurrence(x, y)
Example joint distribution p(X, Y):

         c1   c2   c3   c4
    r1   0.1  0    0.2  0
    r2   0    0.1  0.1  0
    r3   0.2  0.1  0.1  0
    r4   0    0    0    0.1
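The scaling in the formula above can be sketched in a few lines of Python (an illustration, not the authors' code; the integer count matrix is chosen so that it reproduces the probability table, with a total count of 10):

```python
def joint_distribution(counts):
    """p(x, y) = #co-occurrence(x, y) / total #co-occurrences."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

# Integer co-occurrence counts consistent with the example table
# (each probability is a count divided by the total of 10).
counts = [
    [1, 0, 2, 0],
    [0, 1, 1, 0],
    [2, 1, 1, 0],
    [0, 0, 0, 1],
]
p = joint_distribution(counts)  # p[0][0] == 0.1, p[0][2] == 0.2, ...
```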
Hierarchical Co-Clustering Based on Entropy Splitting
Co-occurrence matrices:

Original joint probability distribution p(X, Y) over rows X and columns Y:

    0.1  0    0.2  0
    0    0.1  0.1  0
    0.2  0.1  0.1  0
    0    0    0    0.1

Underlying co-occurrence counts:

    1  0  2  0
    0  1  1  0
    2  1  1  0
    0  0  0  1

Joint probability distribution between row & column cluster random variables
(rows merged into {r1}, {r2, r3}, {r4}; columns into {c1}, {c2, c3}, {c4}):

    0.1  0.2  0
    0.2  0.4  0
    0    0    0.1

Mutual information at the three granularities: 0 (everything in one cluster), 0.4691 (the 3×3 cluster-level distribution), 0.7751 (the full matrix).
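The mutual-information objective can be made concrete with a small sketch (illustrative only, not the authors' code): computing I(X; Y) for the full joint distribution and for the cluster-level distribution shows how much information the merging preserves. The computed values come out close to the 0.7751 and 0.4691 shown on the slide.

```python
import math

def mutual_information(p):
    """I(X; Y) in bits for a joint distribution given as a 2-D list."""
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]
    return sum(
        v * math.log2(v / (px[i] * py[j]))
        for i, row in enumerate(p)
        for j, v in enumerate(row)
        if v > 0
    )

# Full joint distribution (4 rows x 4 columns) from the slide.
full = [
    [0.1, 0.0, 0.2, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.2, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
# Cluster-level distribution: rows {r1}, {r2, r3}, {r4};
# columns {c1}, {c2, c3}, {c4}.
clustered = [
    [0.1, 0.2, 0.0],
    [0.2, 0.4, 0.0],
    [0.0, 0.0, 0.1],
]
i_full = mutual_information(full)            # ~0.771 bits
i_clustered = mutual_information(clustered)  # ~0.469 bits
```

Merging clusters can only lose mutual information, which is why the algorithm tries to split so that as little as possible is lost.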
Hierarchical Co-Clustering Based on Entropy Splitting
Pipeline (recursive splitting):

    While the termination condition is not met:
        Find the optimal row/column cluster split that maximizes I(X̂, Ŷ)
        Update cluster indicators

Termination condition: the ratio I(X̂, Ŷ) / I(X, Y) reaches a given threshold, or |R̂| = max_r, or |Ĉ| = max_c.
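As a baseline, this pipeline can be sketched with an exhaustive best-split search (a toy illustration, not the paper's method: trying every binary split is exponential in the cluster size, which is exactly the cost the entropy-based splitting heuristic avoids; ties also make the first split arbitrary, so this greedy search only reaches a local optimum):

```python
import itertools
import math

def mutual_information(p):
    """I(X; Y) in bits for a joint distribution given as a 2-D list."""
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]
    return sum(
        v * math.log2(v / (px[i] * py[j]))
        for i, row in enumerate(p)
        for j, v in enumerate(row)
        if v > 0
    )

def merge(p, row_parts, col_parts):
    """Joint distribution of the cluster variables (X-hat, Y-hat)."""
    return [[sum(p[i][j] for i in rp for j in cp) for cp in col_parts]
            for rp in row_parts]

def binary_splits(part):
    """All 2^(|part|-1) - 1 ways to split one cluster into two non-empty halves."""
    first, rest = part[0], part[1:]
    for r in range(len(rest) + 1):
        for keep in itertools.combinations(rest, r):
            s2 = [e for e in rest if e not in keep]
            if s2:
                yield [first] + list(keep), s2

def co_cluster(p, n_row, n_col):
    """Greedy recursive splitting: repeatedly apply whichever single
    row- or column-cluster split keeps I(X-hat, Y-hat) maximal.
    Assumes n_row/n_col do not exceed the matrix dimensions."""
    row_parts = [list(range(len(p)))]
    col_parts = [list(range(len(p[0])))]
    while len(row_parts) < n_row or len(col_parts) < n_col:
        best = None
        if len(row_parts) < n_row:
            for k, part in enumerate(row_parts):
                for s1, s2 in binary_splits(part):
                    cand = row_parts[:k] + [s1, s2] + row_parts[k + 1:]
                    score = mutual_information(merge(p, cand, col_parts))
                    if best is None or score > best[0]:
                        best = (score, cand, col_parts)
        if len(col_parts) < n_col:
            for k, part in enumerate(col_parts):
                for s1, s2 in binary_splits(part):
                    cand = col_parts[:k] + [s1, s2] + col_parts[k + 1:]
                    score = mutual_information(merge(p, row_parts, cand))
                    if best is None or score > best[0]:
                        best = (score, row_parts, cand)
        _, row_parts, col_parts = best
    return row_parts, col_parts

# Run on the 4x4 example distribution, asking for 3 row and 3 column clusters.
p = [
    [0.1, 0.0, 0.2, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.2, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
row_parts, col_parts = co_cluster(p, 3, 3)
```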
Hierarchical Co-Clustering Based on Entropy Splitting
How to find an optimal split at each step?

An entropy-based splitting algorithm:

    Input: cluster S
    Randomly split cluster S into S1 and S2
    Until convergence:
        For each element x in S, re-assign it to the cluster S_j (j ∈ {1, 2}) that minimizes
            D( p(Ŷ | x) || p(Ŷ | S_j) )
        Update cluster indicators and probability values

The algorithm converges to a local optimum.
Hierarchical Co-Clustering Based on Entropy Splitting
• Example

         Y1   Y2   Y3   Y4
    X1   0.1  0    0    0
    X2   0    0.2  0.2  0
    X3   0    0.2  0.2  0
    X4   0.1  0    0    0

    S = {X1, X2, X3, X4}
    Randomly split: S1 = {X1}, S2 = {X2, X3, X4}
    Re-assign X4 to S1: S1 = {X1, X4}, S2 = {X2, X3}

A naïve method would need to try all 7 possible splits, and the number of splits grows exponentially with the size of S.
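The splitting loop and the example above can be sketched as follows (illustrative Python, not the authors' code; the small smoothing constant is an assumption, since the slides do not say how zeros in the KL divergence are handled):

```python
import math

EPS = 1e-12  # smoothing so D(p || q) stays finite when q has zeros

def kl(p, q):
    """KL divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / (qi + EPS)) for pi, qi in zip(p, q) if pi > 0)

def row_conditional(row):
    """p(Y | x) for a single row of the joint distribution."""
    t = sum(row)
    return [v / t for v in row]

def cluster_conditional(p, members):
    """p(Y | S) for a set of row indices S."""
    sums = [sum(p[i][j] for i in members) for j in range(len(p[0]))]
    t = sum(sums)
    return [v / t for v in sums]

def entropy_split(p, s1, s2):
    """Iteratively re-assign each row x to the half S_j minimising
    D(p(Y | x) || p(Y | S_j)) until the assignment stabilises.
    (Assumes neither half ever becomes empty, as in this example.)"""
    while True:
        q1 = cluster_conditional(p, s1)
        q2 = cluster_conditional(p, s2)
        new1, new2 = [], []
        for x in sorted(s1 + s2):
            px = row_conditional(p[x])
            (new1 if kl(px, q1) <= kl(px, q2) else new2).append(x)
        if (new1, new2) == (sorted(s1), sorted(s2)):
            return new1, new2
        s1, s2 = new1, new2

# The example above: start from the "random" split S1={X1}, S2={X2,X3,X4}.
joint = [
    [0.1, 0.0, 0.0, 0.0],  # X1
    [0.0, 0.2, 0.2, 0.0],  # X2
    [0.0, 0.2, 0.2, 0.0],  # X3
    [0.1, 0.0, 0.0, 0.0],  # X4
]
s1, s2 = entropy_split(joint, [0], [1, 2, 3])
# X4 moves next to X1: s1 == [0, 3] ({X1, X4}), s2 == [1, 2] ({X2, X3})
```

One pass of re-assignment recovers the split shown on the slide, instead of trying all 7 candidate splits.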
Experiments
• Data sets:
  - Synthetic data
  - 20 Newsgroups data: 20 classes, 20,000 documents
Results: Synthetic Data

[Figure: (a) a 1000×1000 matrix; (b) (a) with noise added by flipping values with probability 0.3; (c) (b) with rows and columns randomly permuted; (d) the clustering result, recovering the hierarchical structure]
Results: 20 Newsgroups Data

Compare with baselines:

    Dataset          HICC              NVBD              ICC               HCC
                     m-pre  #clusters  m-pre  #clusters  m-pre  #clusters  m-pre  #clusters
    Multi5subject    0.95   5          0.93   5          0.89   5          0.72   5
    Multi5           0.93   5          N/A               0.87   5          0.71   5
    Multi10subject   0.69   10         0.67   10         0.54   10         0.44   10
    Multi10          0.67   10         N/A               0.56   10         0.61   10
    HICC(merged)      Single-Link       UPGMA             WPGMA             Complete-Link
    m-pre  #clusters  m-pre  #clusters  m-pre  #clusters  m-pre  #clusters  m-pre  #clusters
    0.96   30         0.27   30         0.73   30         0.65   30         0.89   30
    0.96   30         0.29   30         0.59   30         0.71   30         0.85   30
    0.74   60         0.24   60         0.60   60         0.58   60         0.67   60
    0.74   60         0.24   60         0.61   60         0.62   60         0.60   60
Micro-averaged precision: m-pre = M/N, where M is the number of documents correctly clustered and N is the total number of documents.
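A sketch of this metric (the majority-class mapping is an assumption; the slides only give M/N, not how a cluster is matched to a class):

```python
from collections import Counter

def micro_averaged_precision(clusters, labels):
    """m-pre = M / N under a common convention: each (non-empty) cluster
    is credited with its majority class, M = documents whose label matches
    their cluster's majority class, N = total documents."""
    n = sum(len(c) for c in clusters)
    m = sum(Counter(labels[d] for d in c).most_common(1)[0][1]
            for c in clusters)
    return m / n

# Toy usage (hypothetical data): 6 documents, true labels, 2 clusters.
labels = ["sport", "sport", "sport", "politics", "politics", "sport"]
clusters = [[0, 1, 2], [3, 4, 5]]
# Cluster 1: majority "sport", 3 correct; cluster 2: majority "politics",
# 2 correct -> m-pre = 5/6.
mpre = micro_averaged_precision(clusters, labels)
```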
Thank You !
Questions?