Seriation, Clustering and Matrix Reordering: Towards the Encyclopedia of Structures

Innar Liiv
10:15am, Oct 8, 2009, Tartu



Page 2

Visualization and human computation “brain exercise”

Perception of/and experience

• Simple example of 5 entities (persons) and their relationships
• Who would you prefer to be?
• Who wouldn’t you want to be?
• And what if the relationship means “company A sells to company B”?
• What if the relationship means “love”?

Page 3

Outline

• Micro-tutorial on Seriation:
  – What is it? Who cares?
  – Clustering versus seriation;
  – Related work and background
  – Recent advances
• Similarity (“goodness”) measures
• Where to go from here?
  – The Encyclopedia (Gallery, DB) of Structures?

Page 4

Matrix representation of a graph
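
A minimal sketch of the idea, assuming a directed relationship between five hypothetical persons A to E (the names and edges are invented for illustration and are not the ones drawn on the slide). Each row is a source, each column a target, and a 1 marks an existing relationship.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 adjacency matrix (list of rows) from a directed edge list."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


# Hypothetical example data: five persons and who relates to whom.
people = ["A", "B", "C", "D", "E"]
relations = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("E", "C")]

graph_matrix = adjacency_matrix(people, relations)
for name, row in zip(people, graph_matrix):
    print(name, row)
```

Reading down a column shows how many relationships point at one person; reading along a row shows how many that person initiates.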

Page 5

Loua, T. (1873), Atlas statistique de la population de Paris, Paris: J. Dejey.

Page 6

Definition: Seriation

• Seriation is an exploratory combinatorial data analysis technique to reorder objects into a sequence along a one-dimensional continuum so that it best reveals regularity and patterning among the whole series.
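
To make the definition concrete, here is a brute-force sketch for a handful of objects. It assumes binary feature vectors, Hamming distance between neighbours, and "shortest path through all objects" as the criterion. These are illustrative assumptions: this is only one of many possible seriation criteria, and exhaustive search is feasible only for very small n.

```python
from itertools import permutations


def hamming(u, v):
    """Number of positions where two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def path_length(order, rows):
    """Sum of distances between consecutive objects in the given order."""
    return sum(hamming(rows[a], rows[b]) for a, b in zip(order, order[1:]))


def brute_force_seriation(rows):
    """Try every ordering and keep one with the smallest path length."""
    return min(permutations(range(len(rows))),
               key=lambda order: path_length(order, rows))


# Hypothetical data: five objects described by four binary attributes.
rows = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

best_order = brute_force_seriation(rows)
for i in best_order:
    print(i, rows[i])
```

Several orderings can tie for the minimum (a sequence and its reverse always do), so the result is one optimal sequence rather than the only one.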

Page 7

A simple example with 11 objects

“Raw data”

Page 8

Seriation vs clustering

CLUSTERING: k=4 (cluster sizes: 4 / 36%, 2 / 18%, 3 / 27%, 2 / 18%)

SERIATION:

Page 9

Seriation and Matrix Reordering

• Seriation is typically applied for matrix reordering (two-way one/two-mode seriation);
• Every matrix is two-way*; an N x N matrix is one-mode and an N x M matrix is two-mode.

[Figure: two example binary matrices, a one-mode COUNTRIES x COUNTRIES matrix and a two-mode OBJECTS x ATTRIBUTES matrix]

*Using Carroll-Arabie taxonomy of scaling methods and terminology of Tucker
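
The one-mode/two-mode distinction can be sketched in code. The helper name `reorder` and the small matrices are assumptions made for illustration. In the one-mode case rows and columns are the same entities, so a single permutation is applied to both; in the two-mode case rows and columns are permuted independently.

```python
def reorder(matrix, row_order, col_order):
    """Return a copy of matrix with rows and columns permuted."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


# One-mode (N x N): same entities on both axes, one shared permutation.
one_mode = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
order = [1, 0, 2]
one_mode_reordered = reorder(one_mode, order, order)

# Two-mode (N x M): objects by attributes, independent permutations.
two_mode = [
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
]
two_mode_reordered = reorder(two_mode, [0, 2, 1, 3], [0, 1, 2])

for row in two_mode_reordered:
    print(row)
```

Reordering never changes the data, only the position of rows and columns, which is why a symmetric one-mode matrix stays symmetric under a shared permutation.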

Page 10

Seriation and Matrix Reordering

Page 11

Seriation vs Clustering

• Example by Prof. Gilles Caraux (Permutmatrix software):

ORIGINAL DATA MATRIX

AFTER CLUSTER ANALYSIS OF ROWS

AFTER SERIATION

Page 12

Gower & Digby (1981)

Page 13

Can you see the pattern in data?

Page 14

Did you see it?

Page 15

Related work

[Figure: timeline of related work. Names and fields shown: Petrie; Czekanowski (Anthropology); Forsyth & Katz (Sociology); Robinson (Archaeology); Sokal (Biology); Bertin (Cartography); Kendall; Burbidge (Manufacturing); McCormick (Op. research); Hartigan (Statistics); Mullat; Võhandu (Survey DA); Marcotorchino (Unified approach); Siirtola (Visualization); Chen et al. (GAP); Liiv (Unified view, Algorithms). Other labels: algorithms, Data mining, Biclustering, Applications. Years marked: 1909, 1946, 1951, 1963, 1967, 1969, 1971, 1972, 1979, 1987, 1991, 1992, 1999, 2004.]

Page 16

Seriation: who cares (nowadays)?

• Information visualization & HCI community;
  – Siirtola, H. (1999). Interaction with the Reorderable Matrix. (IV'99)
  – Siirtola, H., & Mäkinen, E. (2005). Constructing and reconstructing the reorderable matrix. Information Visualization, 4(1), 32-48.
  – Ghoniem & Fekete & Castagliola: A Comparison of the Readability of Graphs Using Node-Link and Matrix-Based Representations + MatrixExplorer (Henry, Fekete)
• Data mining and statistics community;
• Bioinformatics community;
• Social Network Analysis community;
• Operations research and combinatorial optimization community.

Page 17

Recent advances

• Niermann (2005) presented a genetic algorithm (GA) approach for seriation in The American Statistician
• Brusco, M., & Stahl, S. (2005). Branch-and-Bound Applications in Combinatorial Data Analysis. New York: Springer.
• Chen, C.H., Härdle, W. and Unwin, A. (2008) Handbooks of Computational Statistics: Data Visualization. Springer Verlag, Heidelberg
• The History of the Cluster Heat Map (Wilkinson & Friendly, TAS, May 2009)

Page 18

“Optimal leaf ordering (OLO)”

• Bar-Joseph et al. 2001:

[Figure panels: Input; Hierarchical clustering result; Optimal ordering result]

Unclassed Matrix Shading and Optimal Ordering in Hierarchical Cluster Analysis (Gale et al., Journal of Classification, 1:75-92, 1984)

Page 19

No name consensus

• Czekanowski diagram, Robinson matrix, Reorderable matrix, Matrix reordering, Matrix visualization, Matrix analysis, Matrix permutation, Permutable matrix, Array-based clustering, Block clustering, Biclustering (two(n)-mode clustering), Co-clustering, Product Flow Analysis, Group Technology, Part/Machine group formation, Manufacturing cell formation, Cellular manufacturing, Seriation, cleaned up differential shading of the similarity matrix, Matrix tile analysis, Rearrangement clustering, Generalized Association Plots (GAP), non-destructive data analysis, optimal order of matrices, Optimal leaf ordering, band form, banded structure; Matrice ordonnable; Differentialdiagnose; метод групповой технологии и организации группового производства;

Page 20

Seriation: a unified view

[Figure panels: unidimensional seriation; block diagonal seriation; block checkerboard seriation; Pareto seriation]

Page 21

Similarity (“goodness”) measures

• McCormick et al. (1969, 1972):

• Cumulative Hamming (Verin/Grishin,1986):

• Can be generalized and written compactly:
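
As a sketch of the first measure: the measure of effectiveness (ME) of McCormick et al. is commonly stated as half the sum, over all cells, of each entry multiplied by the sum of its four neighbours (left, right, up, down), with cells outside the matrix counted as 0. The function name and the two test matrices below are assumptions for illustration. The other two formulas on this slide are not reproduced here.

```python
def measure_of_effectiveness(matrix):
    """ME = 1/2 * sum over cells of a[i][j] * (sum of its four neighbours)."""
    n_rows, n_cols = len(matrix), len(matrix[0])

    def cell(i, j):
        # Cells outside the matrix contribute nothing.
        if 0 <= i < n_rows and 0 <= j < n_cols:
            return matrix[i][j]
        return 0

    total = 0
    for i in range(n_rows):
        for j in range(n_cols):
            neighbours = (cell(i, j - 1) + cell(i, j + 1)
                          + cell(i - 1, j) + cell(i + 1, j))
            total += matrix[i][j] * neighbours
    return total / 2


# Same number of ones, arranged in two different row/column orders.
scattered = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
blocked = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

print(measure_of_effectiveness(scattered))
print(measure_of_effectiveness(blocked))
```

The blocked arrangement is a row and column permutation of the scattered one, so the data are identical; only the ordering changes, and the ordering with adjacent ones scores higher.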

Page 22

4 Important questions

• I’m not buying that matrix representation is better than graph layout! (actually, it’s worse!!!)
• How is it different from correlation? (why can’t I just calculate corr coef for everything and sort as a list?)
• How is it different from clustering? (there’s lots of tools for clustering – Why can’t I just pick one of those?)
• What is the added value to InfoVis community from this approach?

Page 23

I’m not buying that matrix representation is better than graph layout!

• Such discussion is older than us (Forsyth-Katz vs Moreno 1940s, recent user studies by Fekete, Henry, Ghoniem)
• “Cliques”, clusters, hubs, chains harder to detect in graphs with different entity types (“bipartite” and n-partite graphs)
• Hard to read if n gets bigger
• With graphs we encode only positive connections (existing relationships)

Page 24

Seriation and Matrix Reordering

Page 25

How is it different from correlation? (why can’t I just calculate corr coef for everything and sort as a list?)

• We don’t know the two attributes!
• We want to find multiple correlations (corr between >> more than two attr.) (2n list)
• We don’t always know what “level” of correlation provides the most information (highest corr != “best” corr)
• We are interested in chained corr
• Not to mention that there are some fundamental issues already with std corr:

Page 26

Page 27

How is it different from clustering? (there’s lots of tools for clustering – Why can’t I just pick one of those?)

• Algorithmic problem: k # of clusters unknown
• Goal of clustering is to assign similar entities to groups, not to identify or describe similarities/affinities between entities!
• Clustering of attributes aka/~ factor analysis
• It is not clustering’s “fault”: if the goal is not to find all similarities between entities and between clusters, computing them would be inefficient extra work for CPU/GPU

Page 28

How is it different from clustering? (there’s lots of tools for clustering – Why can’t I just pick one of those?)

• Co-clustering objects and attributes or different entity types – an interesting behavior/finding may not hold in the whole dataset, but within only a subset!
• In case of clusters with “weird shape” and interplay between different “weird” clusters... only human eye can detect/analyze those
• Most probably we won’t see any “answers”, but the analyst sure knows what to do next time he/she sees a similar cluster shape

Page 29

What is the added value to InfoVis community from this approach?

• Yet another view of the system – classical version had overview/details, now overview & details-on-demand
• Visual distribution of (lack of) relationships
• Useful in similar cases as parallel coordinates, clustering & finding outliers
• Provides a neutral view for the dataset

Page 30

What is the added value to InfoVis community from this approach?

• If we strip out all the semantics from different fields, structural pictures start to repeat!
• Don’t know how to “save” that experience yet, but analyst remembers what procedure was efficient the last time;
• Best place for matrix reordering is at the beginning, either to distribute entities between analysts or just to get an overview of what the structure “looks like”

Page 31

[Figure panels: unidimensional seriation; block diagonal seriation; block checkerboard seriation; Pareto seriation]

Page 32

Page 33

What is the added value to InfoVis community from this approach?

• Important to distinguish that learning to see different structural patterns from the overview is not just moving along the learning curve to get the technique, but accumulating knowledge from all your previous works.
• Not just learning how to read the display, but how to connect and combine with past experience, background information, memories from previous investigations, not on entity level, but at an abstract meta-level

Page 34

VISION

Encyclopedia of Structures

[Figure panels: unidimensional seriation; block diagonal seriation; block checkerboard seriation; Pareto seriation]

Page 35

EXAMPLE: QUERY1

Visualization: inventory transactions

[Figure: INITIAL MATRIX and MATRIX AFTER SERIATION. Columns: ITEMS (SKUs). Rows: TRANSACTIONS. “1” (black dot) if item is used in the transaction.]

“dimensions” of the data table = 1601 x 1735
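
A binary transactions-by-items table of this kind could be built as follows. This is a sketch under assumptions: the SKU labels and the three transactions are invented, and the function name `incidence_matrix` is hypothetical. The real table in the example has 1601 x 1735 cells.

```python
def incidence_matrix(transactions):
    """Return (items, matrix) where matrix[i][j] is 1 if transaction i uses item j."""
    items = sorted({item for transaction in transactions for item in transaction})
    column = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in transactions]
    for i, transaction in enumerate(transactions):
        for item in transaction:
            matrix[i][column[item]] = 1
    return items, matrix


# Hypothetical inventory transactions, each a list of SKUs.
transactions = [
    ["SKU-3", "SKU-1"],
    ["SKU-2"],
    ["SKU-1", "SKU-2"],
]

items, transaction_matrix = incidence_matrix(transactions)
print(items)
for row in transaction_matrix:
    print(row)
```

The resulting two-mode matrix is the input that row and column reordering then operates on.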

Page 36

EXAMPLE: QUERY2

School violence

1) Bottleneck machine (in manufacturing)
2) Excellent position (in supply chain)
3) Miserable love (Psychology)

Page 37

Semantics out, semantics in

• By taking out the semantics, we’ll see the structural similarities;

• Putting back in the semantics, we can “borrow” insights from fields/experiments;

• We would like to store and accumulate those experiences with different datasets (and domains) with “similar structure”;

• Index of structural patterns.

Page 38

Final thoughts

• Seriation research will merge with data mining, social network analysis & information visualization+HCI;
• Non-visual clustering will lose popularity;
• Trend in similarity measures should be from entity-to-entity to entity-to-set (and structural), i.e. not calculating the whole entity-to-entity table is a crucial property!
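
The entity-to-set idea can be illustrated with a simple frequency-based score. This is an assumption chosen for illustration, not necessarily the measure the talk has in mind: each row is scored against the whole dataset by summing the column totals of the attributes it has, so no pairwise row-to-row table is ever computed.

```python
def conformity_order(matrix):
    """Score each row against the whole set and return (order, scores).

    A row's score is the sum of column totals over the columns where the row
    has a 1. Cost is proportional to the number of cells, not to the number
    of row pairs.
    """
    column_sums = [sum(column) for column in zip(*matrix)]
    scores = [sum(total for value, total in zip(row, column_sums) if value)
              for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical binary data: four entities, three attributes.
data = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]

order, scores = conformity_order(data)
print(scores)
print(order)
```

Rows that share many common attributes with the rest of the set rise to the top, which gives an ordering without building an entity-to-entity similarity table.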

Page 39

THANK YOU