Visualizing Cluster Results
Using Package FlexClust and Friends
Friedrich Leisch
University of Munich
useR!, Rennes, 10.7.2009
Acknowledgements & Apology
• Sara Dolnicar (University of Wollongong)
• Theresa Scharl, Ingo Voglhuber (Vienna University of Technology)
• Paul Murrell, Deepayan Sarkar (R Core)
Apology: Microarray data only in Theresa’s talk (try time-shift back to
Wednesday).
Friedrich Leisch: Cluster Visualization, 10.7.2009 1
Partitioning Clustering – KCCA
K-centroid cluster algorithms:
• data set $X_N = \{x_1, \ldots, x_N\}$, set of centroids $C_K = \{c_1, \ldots, c_K\}$
• distance measure $d(x, y)$
• centroid $c$ closest to $x$:
$$c(x) = \operatorname{argmin}_{c \in C_K} d(x, c)$$
• Most KCCA algorithms try to find a set of centroids $C_K$ for fixed $K$
such that the average distance
$$D(X_N, C_K) = \frac{1}{N} \sum_{n=1}^{N} d(x_n, c(x_n)) \to \min_{C_K}$$
of each point to the closest centroid is minimal.
• Optimization algorithm not important for rest of talk
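The two definitions above, c(x) and D(X_N, C_K), can be sketched in a few lines of plain Python. This is only an illustration, not flexclust code: the function names, the choice of Euclidean distance and the toy data are assumptions made for the example.

```python
import math

def euclidean(x, y):
    """Euclidean distance d(x, y); any other distance could be passed instead."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def closest_centroid(x, centroids, d=euclidean):
    """Index k of the centroid c(x) that minimizes d(x, c_k)."""
    return min(range(len(centroids)), key=lambda k: d(x, centroids[k]))

def average_distance(data, centroids, d=euclidean):
    """D(X_N, C_K): mean distance of each point to its closest centroid."""
    total = sum(d(x, centroids[closest_centroid(x, centroids, d)]) for x in data)
    return total / len(data)

# Hypothetical toy data, not taken from the talk
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
objective = average_distance(data, centroids)
```

Because the distance is a parameter, the same objective can be evaluated for other distance measures, which is the sense in which the centroid algorithms here are generic.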
Example: Australian Volunteers
Survey among 1415 Australian adults about which organizations they
would consider volunteering for, main motivation to volunteer, actual
volunteering, image of organizations, . . .
We use a block of 19 binary questions (“applies”, “does not apply”)
about motivations to volunteer: “I want to meet people”, “I have no
one else”, “I want to set an example”, . . .
Organizations investigated: Red Cross, Surf Life Savers, Rural Fire
Service, Parents Association, . . .
Our analyses show that there is both competition between organizations
with similar profiles and complementary effects (individuals
volunteering for more than one organization, in most cases with very
different profiles).
8 Volunteer Clusters
              Cl.1   Cl.2   Cl.3   Cl.4   Cl.5   Cl.6   Cl.7   Cl.8  Total
meet.people   9.24  15.77  28.77  97.18  82.17  80.97  90.81  34.23  49.47
no.one.else   8.21   8.34   6.16  27.72  10.51  12.47  16.75   7.23  11.38
example      12.80  14.68  63.56  93.30  35.84  73.33  80.32  45.30  47.63
socialise    14.12  10.68   3.28  88.74  52.79  54.17  83.39   6.35  35.83
help.others   0.00 100.00  92.93  95.17  56.95  89.18  86.98  86.12  66.78
give.back    21.68  29.18  87.73  96.24  43.41  87.60  96.18  89.77  63.75
career       12.04   5.54  10.46  71.34  35.01  17.49  18.29  11.54  20.57
lonely        4.55   8.71   2.30  56.55  17.16   9.76  18.93   0.75  13.14
active       17.17  15.93  23.38  93.82  81.61  53.97  77.00  23.98  44.88
community    16.17   9.64  66.66  93.75  14.67  87.80  90.34  77.21  52.72
cause        20.49  12.07  79.66  96.91  47.11  83.31  85.12  79.82  58.66
faith        10.12   7.24  22.84  66.89  10.52  42.10  27.83  19.47  24.03
services      7.00   7.10  11.63  78.15  23.99  43.72  44.98  14.64  25.23
children      6.88  11.76  11.88  28.58  14.86  16.20  14.64   8.31  13.00
good.job     19.14  23.81 100.00  94.61  49.18  85.35  75.38   0.00  51.80
benefited    10.49  15.09  14.26  74.37  15.04 100.00   0.00  12.68  26.29
network      10.75   8.47   6.29  85.86  43.59  22.83  38.98  10.28  25.94
recognition  10.30   8.03  11.29  79.59  12.80  19.56  21.40   3.49  18.73
mind.off      8.56  10.95  12.53  87.96  39.55  24.55  47.43   4.31  26.57
8 Volunteer Clusters
             Cl.1  Cl.2  Cl.3  Cl.4  Cl.5  Cl.6  Cl.7  Cl.8 Total
meet.people     9    16    29    97    82    81    91    34    49
no.one.else     8     8     6    28    11    12    17     7    11
example        13    15    64    93    36    73    80    45    48
socialise      14    11     3    89    53    54    83     6    36
help.others     0   100    93    95    57    89    87    86    67
give.back      22    29    88    96    43    88    96    90    64
career         12     6    10    71    35    17    18    12    21
lonely          5     9     2    57    17    10    19     1    13
active         17    16    23    94    82    54    77    24    45
community      16    10    67    94    15    88    90    77    53
cause          20    12    80    97    47    83    85    80    59
faith          10     7    23    67    11    42    28    19    24
services        7     7    12    78    24    44    45    15    25
children        7    12    12    29    15    16    15     8    13
good.job       19    24   100    95    49    85    75     0    52
benefited      10    15    14    74    15   100     0    13    26
network        11     8     6    86    44    23    39    10    26
recognition    10     8    11    80    13    20    21     3    19
mind.off        9    11    13    88    40    25    47     4    27
8 Volunteer Clusters
[Figure: bar charts of the 19 motivation variables (x-axis 0.0 to 1.0), one panel per cluster. Panels: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 166 (12%), Cluster 4: 136 (10%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
8 Volunteer Clusters
We cannot easily test for differences between clusters, because they were
constructed to be different.
Improved presentation of results (following advice that has been around
for only a few decades):
• Add reference lines/points
• Sort variables by content:
1. sort clusters by mean
2. sort variables by hierarchical clustering
• Highlight important points
Add reference points
[Figure: the same bar charts with a reference point (●) added for each of the 19 variables in every panel. Panels: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 166 (12%), Cluster 4: 136 (10%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
Sort Clusters
[Figure: bar charts with reference points after sorting the clusters. Panels: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 136 (10%), Cluster 4: 166 (12%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
Sort Variables
[Figure: variables arranged by hierarchical clustering (rotated labels), in the order: help.others, give.back, community, cause, example, good.job, active, meet.people, socialise, mind.off, network, career, lonely, recognition, faith, no.one.else, children, services, benefited.]
Sort Variables
[Figure: bar charts with reference points, variables in the sorted order help.others, give.back, community, cause, example, good.job, active, meet.people, socialise, mind.off, network, career, lonely, recognition, faith, no.one.else, children, services, benefited. Panels: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 136 (10%), Cluster 4: 166 (12%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
Highlight Important Points
[Figure: the sorted bar charts with important points highlighted. Panels: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 136 (10%), Cluster 4: 166 (12%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
Example: Car Data
A German manufacturer of premium cars asked recent customers about
their main motivation for buying one of its cars. Binary data:
respondents were asked to check properties like sporty, power, interior,
safety, quality, resale value, etc.
Exploration of the data shows no natural groups; a segmentation of the
market imposes a partition on the data (which is absolutely fine for the
purpose).
Here: hierarchical clustering of variables, partition with 4 clusters from
neural gas algorithm for customers.
Example: Car Data
[Figure: variables arranged by hierarchical clustering (rotated labels), in the order: driving_properties, power, sporty, economy, consumption, resale_value, service, clarity, character, space, handling, concept, interior, styling, model_continuity, reputation, safety, quality, reliability, technology, comfort.]
Example: Car Data
[Figure: bar charts with reference points for the 21 car variables (x-axis 0.0 to 1.0), one panel per cluster. Panels: Cluster 1: 237 (30%), Cluster 2: 280 (35%), Cluster 3: 205 (26%), Cluster 4: 71 (9%).]
PCA Projection
[Figure: scatter plot of the car data in the PCA projection, with labels for all 21 variables: clarity, economy, driving_properties, service, interior, quality, technology, model_continuity, comfort, reliability, handling, reputation, concept, character, power, resale_value, styling, safety, sporty, consumption, space.]
PCA Projection
[Figure: the same scatter plot with labels only for quality, technology, comfort, power, resale_value and consumption.]
PCA Projection
[Figure: PCA projection of the car data with the four clusters numbered 1 to 4 and labels for quality, technology, comfort, power, resale_value and consumption. The slide is repeated in several builds; the last one shows only the cluster numbers and variable labels, without the individual data points.]
Convex Cluster Hulls
The main problem for 2-dimensional generalizations of boxplots is that
there is no total ordering of $\mathbb{R}^2$ (or higher).
For data partitioned using a centroid-based cluster algorithm there is a
natural total ordering for each point in a cluster: the distance $d(x, c(x))$
of the point to its respective cluster centroid. Let $A_k$ be the set of
points in cluster $k$, and
$$m_k = \operatorname{median}\{d(x_n, c_k) \mid x_n \in A_k\}$$
inner area: all data points where $d(x_n, c_k) \le m_k$
outer area: all data points where $d(x_n, c_k) \le 2.5\, m_k$
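As a rough illustration of the inner and outer areas, the following Python sketch selects the points of one cluster that fall into each area. It only picks the point sets; drawing the convex hull around each set is left out. The function name, the Euclidean distance and the toy cluster are assumptions for the example, not part of flexclust.

```python
import math
from statistics import median

def euclidean(x, y):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hull_areas(points, centroid, d=euclidean, outer_factor=2.5):
    """Return m_k plus the points of the inner area, the outer area and the rest."""
    dist = [d(x, centroid) for x in points]
    m_k = median(dist)                      # median distance to the centroid
    inner = [x for x, r in zip(points, dist) if r <= m_k]
    outer = [x for x, r in zip(points, dist) if r <= outer_factor * m_k]
    rest = [x for x, r in zip(points, dist) if r > outer_factor * m_k]
    return m_k, inner, outer, rest

# Hypothetical cluster with centroid at the origin
cluster = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -4.0), (10.0, 0.0)]
m_k, inner, outer, rest = hull_areas(cluster, (0.0, 0.0))
```

By construction roughly half of a cluster's points land in the inner area, which is what makes the inner hull comparable to the box of a boxplot; points beyond 2.5 times the median distance play the role of outliers.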
Shadow Values
Second-closest centroid to $x$:
$$\tilde{c}(x) = \operatorname{argmin}_{c \in C_K \setminus \{c(x)\}} d(x, c)$$
Shadow value:
$$s(x) = \frac{2\, d(x, c(x))}{d(x, c(x)) + d(x, \tilde{c}(x))} \in [0, 1]$$
$s(x) = 0$: centroid
$s(x) = 1$: on border of clusters
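A minimal Python sketch of the shadow value follows, assuming Euclidean distance and at least two centroids. The function name and the toy centroids are illustrative and not taken from flexclust.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def shadow_value(x, centroids, d=euclidean):
    """s(x) = 2 d(x, c(x)) / (d(x, c(x)) + d(x, c~(x))), a value in [0, 1]."""
    dists = sorted(d(x, c) for c in centroids)
    closest, second = dists[0], dists[1]    # distances to c(x) and c~(x)
    if closest + second == 0.0:
        return 0.0                          # degenerate case: x sits on two identical centroids
    return 2.0 * closest / (closest + second)

# Hypothetical centroids on a line
centroids = [(0.0, 0.0), (4.0, 0.0)]
at_centroid = shadow_value((0.0, 0.0), centroids)   # point on its centroid
halfway = shadow_value((1.0, 0.0), centroids)       # a quarter of the way across
on_border = shadow_value((2.0, 0.0), centroids)     # equidistant from both centroids
```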
Neighborhood Graph
Let
$$A_{ij} = \{x_n \mid c(x_n) = c_i,\ \tilde{c}(x_n) = c_j\}$$
be the set of all points where $c_i$ is the closest centroid and $c_j$ is
second-closest.
Cluster similarity:
$$s_{ij} = \begin{cases} |A_i|^{-1} \sum_{x \in A_{ij}} s(x), & A_{ij} \neq \emptyset \\ 0, & A_{ij} = \emptyset \end{cases}$$
Use $s_{ij} + s_{ji}$ for thickness of line connecting $c_i$ and $c_j$.
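The similarity and the resulting edge weights can be sketched in plain Python as follows. This is an illustration under assumptions (Euclidean distance, hypothetical function names and toy data), not the flexclust implementation.

```python
import math
from collections import defaultdict

def euclidean(x, y):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def neighborhood_similarity(data, centroids, d=euclidean):
    """Return {(i, j): s_ij} for all pairs with non-empty A_ij."""
    K = len(centroids)
    cluster_size = [0] * K                  # |A_i|
    shadow_sum = defaultdict(float)         # sum of s(x) over A_ij
    for x in data:
        order = sorted(range(K), key=lambda k: d(x, centroids[k]))
        i, j = order[0], order[1]           # closest and second-closest centroid
        d1, d2 = d(x, centroids[i]), d(x, centroids[j])
        s = 2.0 * d1 / (d1 + d2) if d1 + d2 > 0.0 else 0.0
        cluster_size[i] += 1
        shadow_sum[(i, j)] += s
    return {(i, j): total / cluster_size[i] for (i, j), total in shadow_sum.items()}

def edge_weight(sim, i, j):
    """Line thickness s_ij + s_ji; pairs with empty A_ij contribute 0."""
    return sim.get((i, j), 0.0) + sim.get((j, i), 0.0)

# Hypothetical data: two nearby centroids and one far away
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 40.0)]
data = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
sim = neighborhood_similarity(data, centroids)
```

In this toy example only centroids 0 and 1 share points near their common border, so only that pair gets an edge with positive weight.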
Neighborhood Graph
[Figure: neighborhood graph on the projected data for a partition with 8 clusters, nodes numbered 1 to 8.]
[Figure: neighborhood graph on the projected data for a partition with 16 clusters, nodes numbered 1 to 16.]
Volunteer PCA-Projection
[Figure: PCA projection of the volunteer data with the eight clusters numbered 1 to 8 and labels for the variables example, socialise, help.others, give.back, active, community, cause and network, followed by two builds that show only the numbered clusters.]
Nonlinear Arrangement
[Figure: nonlinear arrangement of the eight clusters, nodes labeled k1 to k8.]
Cluster Size
[Figure: cluster sizes shown on the arrangement of the nodes k1 to k8.]
Age Distribution
Gender Distribution
Nice to Look at
Model-based Ordinal Clustering
Survey data for two fast-food chains (McDonald’s, Subway) from 715
respondents on 10 items (yummy, fattening, greasy, fast, . . . ).
We are interested in capturing scale usage heterogeneity under the
assumption that different involvement with a brand may provoke different
scale usage.
Finite mixture model: For each item and group we estimate mean and
standard deviation of a latent Gaussian. Assumption of independence
between items given group membership, estimation by EM.
For a 3-component model we get 30 means and 30 standard deviations.
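To make the latent Gaussian idea concrete, the sketch below turns one mean and standard deviation into probabilities for the categories of an ordinal item. The talk does not say how the latent variable is mapped to the observed scale, so the fixed cut points used here are purely an assumption for illustration, as are the function names and the 7-point scale.

```python
import math

def normal_cdf(z, mean=0.0, sd=1.0):
    """Cumulative distribution function of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))

def category_probs(mean, sd, cutpoints):
    """Probability of each ordinal category as the Gaussian mass between cut points."""
    cdf = [0.0] + [normal_cdf(t, mean, sd) for t in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Assumed cut points for a hypothetical 7-point scale (not from the talk)
cutpoints = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
probs = category_probs(mean=0.0, sd=1.0, cutpoints=cutpoints)

# Parameter count from the slide: one mean and one sd per item and component
n_items, n_components = 10, 3
n_means = n_items * n_components
```

Under this reading, a shifted mean moves probability mass toward one end of the scale and a larger standard deviation pushes it toward both extremes, which is one way different scale usage could show up.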
[Figure: density curves dnorm(x, mean = m[n], sd = s[n]) of the latent Gaussian plotted against x (density axis 0.0 to 0.4), in several panels.]
Subway: 2 Components
[Figure: prob (0.0 to 0.8) over the scale values 0 to 6 for each of the 10 items (yummy, fattening, greasy, fast, cheap, tasty, healthy, disgusting, convenient, spicy), one row of panels for each of the components 1 and 2.]
Subway: 2 Components
[Figure: prob (0.0 to 1.0) for the items spicy, convenient, disgusting, healthy, tasty, cheap, fast, greasy, fattening and yummy, one panel for each of the components 1 and 2.]
Subway: 2 Components
Overall Scale Usage
[Figure: prob (0.0 to 0.4) over the scale values 0 to 6, one panel for each of the components 1 and 2.]
Subway: 2 Components
Scale Usage Corrected
[Figure: prob ratio (0 to 6) over the scale values 0 to 6 for each of the 10 items, one row of panels for each of the components 1 and 2.]
McDonald’s: 3 Components
[Figure: prob (0.0 to 1.0) for the items spicy, convenient, disgusting, healthy, tasty, cheap, fast, greasy, fattening and yummy, one panel for each of the components 1, 2 and 3.]
McDonald’s: 3 Components
Overall Scale Usage
[Figure: prob (0.0 to 0.5) over the scale values 0 to 6, one panel for each of the components 1, 2 and 3.]
McDonald’s: 3 Components
Scale Usage Corrected
[Figure: prob ratio (0 to 5) over the scale values 0 to 6 for each of the 10 items, one row of panels for each of the components 1, 2 and 3.]
Crosstabulation of Clusters
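[Figure: crosstabulation of the two cluster solutions (3 groups by 2 groups), shaded by Pearson residuals on a scale from −2.93 to 5.87; p-value = 7.7721e−11.]
Only a very small group has an extreme response style for both brands;
otherwise the two partitions are almost independent.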
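Pearson residuals of a crosstabulation compare each observed count with the count expected under independence. The sketch below computes them in plain Python; the 3 by 2 table of counts is hypothetical and not the data from the talk, and the function name is made up for the example.

```python
import math

def pearson_residuals(table):
    """(observed - expected) / sqrt(expected) for every cell of a two-way table.

    Expected counts assume independence: row total * column total / n.
    All row and column totals must be positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Hypothetical counts: 3 clusters for one brand (rows) by 2 for the other (columns)
table = [[30, 10], [10, 30], [20, 20]]
res = pearson_residuals(table)
chi_square = sum(r * r for row in res for r in row)   # Pearson chi-square statistic
```

Cells with residuals far from zero are the ones that deviate from independence; the sum of squared residuals is the chi-square statistic on which a test of independence is based.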
Software, Papers, Outlook
Packages are available from CRAN and/or R-Forge:
flexclust: KCCA clustering for arbitrary distances, shaded barcharts, projections, convex hulls, static neighborhood graphs, . . .
gcExplorer: Interactive neighborhood graphs with links to Gene Ontology.
symbols: Grid versions of boxplot(), symbols(), stars(), . . .
flexmix: Finite mixture models. Model based clustering for various distributions and mixtures of generalized linear models.
Public availability of features shown in this talk depends on release version of packages.
Papers available at
http://www.statistik.lmu.de/~leisch
Big 2do: Redo most with iPlots Extreme . . .