
Clustering and Classification Methods: A Brief Introduction

EPSY 905: Multivariate Analysis, Spring 2016

Lecture #14 (and last!) – May 4, 2016

Today’s Lecture

• Classification methods: Useful for when you already know which groups exist but you don’t know which group to assign a new observation to

• Clustering methods: Useful for when you don’t know how many groups exist

• As both methods rely upon distances (more or less), we will start with a definition of distances

Ø We’ll also get a bonus slide or two about multidimensional scaling – another method that uses distances (but doesn’t cluster or classify directly)

Today’s Data

• We will make use of the classic Fisher’s Iris Data to demonstrate clustering and classification methods

• 150 flowers; 50 from each of three species (Setosa, Versicolor, Virginica)

• Four measurements from each flower:
Ø Sepal width
Ø Sepal length
Ø Petal width
Ø Petal length

• These data are built into R already!

DISTANCE METRICS

Measures of Distance

• Care must be taken when choosing the metric by which similarity is quantified

• Important considerations include:
Ø The nature of the variables (e.g., discrete, continuous, binary)
Ø Scales of measurement (nominal, ordinal, interval, or ratio)
Ø The nature of the matter under study

Euclidean Distance

• Euclidean distance is a frequent choice of a distance metric:

$d(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2} = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})}$

• Distances can be between anything you wish to cluster:
Ø Observations (distance across all variables)
Ø Variables (distance across all observations)

• The Euclidean distance is a frequent choice because it represents an understandable metric

Ø It may not always be the best choice

Euclidean Distance?

• Imagine I wanted to know how many miles it was from my old house in Sacramento to Lombard Street in San Francisco…

• Knowing how far it was in a straight line would not do me too much good, particularly with the number of one-way streets that exist in San Francisco

Other Distance Metrics

• Other popular distance metrics include the Minkowski metric:

$d(\mathbf{x}, \mathbf{y}) = \left[\sum_{i=1}^{p} \lvert x_i - y_i \rvert^{m}\right]^{1/m}$

• The key to this metric is the choice of m:
Ø If m = 2, this provides the Euclidean distance
Ø If m = 1, this provides the “city-block” distance
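The two special cases of m are easy to check directly. A minimal Python sketch (my own illustration, not from the slides; the example points are made up):

```python
def minkowski(x, y, m):
    """Minkowski distance: [sum_i |x_i - y_i|^m]^(1/m)."""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1 / m)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 2))  # m = 2: Euclidean distance, 5.0
print(minkowski(x, y, 1))  # m = 1: city-block distance, 7.0
```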

Preferred Distance Properties

• It is often desirable to use a distance metric that meets the following properties:

Ø d(P,Q) = d(Q,P)
Ø d(P,Q) > 0 if P ≠ Q
Ø d(P,Q) = 0 if P = Q
Ø d(P,Q) ≤ d(P,R) + d(R,Q)

• The Euclidean and Minkowski metrics satisfy these properties
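The four properties can be checked numerically for any candidate metric. A small Python sketch (my own, not from the slides) verifies them for the Euclidean distance on three example points:

```python
import math

def euclid(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

P, Q, R = (0, 0), (3, 4), (6, 0)
assert euclid(P, Q) == euclid(Q, P)                  # symmetry
assert euclid(P, Q) > 0                              # positive when P != Q
assert euclid(P, P) == 0                             # zero when P = Q
assert euclid(P, Q) <= euclid(P, R) + euclid(R, Q)   # triangle inequality
print("all four properties hold for these points")
```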

Triangle Inequality

• The fourth property is called the triangle inequality, which often gets violated by non-routine measures of distance

• This inequality can be shown by the following triangle (with lines representing Euclidean distances):

[Figure: triangle with vertices P, Q, and R and sides labeled d(P,Q), d(P,R), and d(R,Q), illustrating d(P,Q) ≤ d(P,R) + d(R,Q)]

Binary Variables

• In the case of binary-valued variables (variables that have a 0/1 coding), many other distance metrics may be defined

• The (squared) Euclidean distance provides a count of the number of mismatched observations

• In the slide’s example, the two binary vectors mismatch in two positions, so d(i,k) = 2

• This is sometimes called the Hamming distance
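The slide’s example table did not survive extraction, but the count-of-mismatches idea is simple to sketch in Python (the vectors below are my own illustration, chosen so that d(i,k) = 2 matches the slide’s value):

```python
def hamming(x, y):
    """Hamming distance: the number of positions where two binary vectors disagree."""
    return sum(a != b for a, b in zip(x, y))

x_i = [0, 1, 1, 0, 1]
x_k = [1, 1, 0, 0, 1]
print(hamming(x_i, x_k))  # 2 (mismatches in the first and third positions)
```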

Other Binary Distance Measures

• There are a number of other ways to define the distance between a set of binary variables

• Most of these measures reflect the varied importance placed on differing cells in a 2×2 table

General Distance Measure Properties

• Use of measures of distance that are monotonic in their ordering of object distances will provide identical results from clustering heuristics

• Many times this will only be an issue if the distance measure is for binary variables or the distance measure does not satisfy the triangle inequality

Distances in R

CLASSICAL CLUSTERING METHODS (USING DISTANCES)

Clustering Definitions

• Clustering objects (or observations) can provide detail regarding the nature and structure of data

• Clustering is distinct from classification in terminology
Ø Classification pertains to a known number of groups, with the objective being to assign new observations to these groups

• Classical methods of cluster analysis are techniques where no assumptions are made concerning the number of groups or the group structure

Ø Mixture models do make assumptions about the data

• Clustering algorithms make use of measures of similarity (or, alternatively, dissimilarity) to define and group variables or observations

• Clustering presents a host of technical problems

• For a reasonably sized data set with n objects (either variables or individuals), the number of ways of grouping n objects into k groups is:

$\frac{1}{k!}\sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^{n}$

• For example, there are over four trillion ways that 25 objects can be clustered into 4 groups – which solution is best?
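The formula above (the Stirling number of the second kind) can be evaluated directly to check the “over four trillion” claim. A Python sketch of my own:

```python
from math import comb, factorial

def n_groupings(n, k):
    """Number of ways to partition n objects into k nonempty groups:
    (1/k!) * sum_{j=0}^{k} (-1)^(k-j) * C(k, j) * j^n."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!

print(n_groupings(25, 4))  # 46771289738810 -- roughly 4.7 * 10^13, well over four trillion
```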

Numerical Problems

• In theory, one way to find the best solution is to try each possible grouping of all of the objects – an optimization process called integer programming

• It is difficult, if not impossible, to use such a method given the state of today’s computers (although computers are catching up with such problems)

• Rather than using such brute-force methods, a set of heuristics has been developed to allow for fast clustering of objects into groups

• Such methods are called heuristics because they do not guarantee that the solution will be optimal (best), only that the solution will be better than most

Clustering Heuristic Inputs

• The inputs into clustering heuristics are in the form of measures of similarities or dissimilarities

• The result of the heuristic depends in large part on the measure of similarity/dissimilarity used by the procedure

Clustering Variables vs. Clustering Observations

• When variables are to be clustered, oft-used measures of similarity include correlation coefficients (or similar measures for non-continuous variables)

• When observations are to be clustered, distance metrics are often used

Hierarchical Clustering Methods

• Because of the large number of possible clustering solutions, searching through all combinations is not feasible

• Hierarchical clustering techniques proceed by taking a set of objects and grouping them one step at a time

• Two types of hierarchical clustering methods exist:
Ø Agglomerative hierarchical methods
Ø Divisive hierarchical methods

Agglomerative Clustering Methods

• Agglomerative clustering methods start first with the individual objects

• Initially, each object is its own cluster

• The most similar objects are then grouped together into a single cluster (with two objects)

We will find that what we mean by similar will change depending on the method

Agglomerative Clustering Methods

• The remaining steps involve merging the clusters according to the similarity or dissimilarity of the objects within the cluster to those outside of the cluster

• The method concludes when all objects are part of a single cluster

Divisive Clustering Methods

• Divisive hierarchical methods work in the opposite direction – beginning with a single, n-object-sized cluster

• The large cluster is then divided into two subgroups where the objects in opposing groups are relatively distant from each other

• The process continues similarly until there are as many clusters as there are objects

To Summarize

• So you can see that we have this idea of steps

Ø At each step two clusters combine to form one (Agglomerative)

OR…

Ø At each step a cluster is divided into two new clusters (Divisive)

Methods for Viewing Clusters

• As you could imagine, when we consider the methods for hierarchical clustering, there are a large number of clusters that are formed sequentially

• One of the most frequently used tools to view the clusters (and the level at which they were formed) is the dendrogram

• A dendrogram is a graph that describes the differing hierarchical clusters, and the distance at which each is formed

Example Data Set #1

• To demonstrate several of the hierarchical clustering methods, an example data set is used

• Data come from a 1991 study by the economic research department of the Union Bank of Switzerland representing economic conditions of 48 cities around the world

• Three variables were collected:
– Average working hours for 12 occupations
– Price of 112 goods and services excluding rent
– Index of net hourly earnings in 12 occupations

This is an example of agglomerative clustering, where cities start off in their own group and are then combined

For example, here Amsterdam and Brussels were combined to form a group

Notice that we do not specify groups, but if we know how many we want… we simply go to the step where there are that many groups

The Cities

Where the lines connect is when those two previous groups were joined

Similarity?

• So, we mentioned that:

Ø The most similar objects are then grouped together into a single cluster (with two objects)

• So the next question is how we measure similarity between clusters
Ø More specifically, how do we redefine it when a cluster contains a combination of old clusters?

• We find that there are several ways to define “similar,” and each way defines a new method of clustering

Agglomerative Methods

• Next we discuss several different ways to complete agglomerative hierarchical clustering:

Ø Single Linkage
Ø Complete Linkage
Ø Average Linkage
Ø Centroid
Ø Median
Ø Ward’s Method

Example Distance Matrix

• The example will be based on the distance matrix below

     1    2    3    4    5
1    0    9    3    6   11
2    9    0    7    5   10
3    3    7    0    9    2
4    6    5    9    0    8
5   11   10    2    8    0

Single Linkage

• The single linkage method of clustering involves combining clusters by finding the “nearest neighbor” – the cluster closest to any given observation within the current cluster

[Figure: Cluster 1 contains objects 1 and 2; Cluster 2 contains objects 3, 4, and 5, with a line marking the shortest distance between the two clusters]

Notice that this is equal to the minimum distance of any observation in Cluster 1 with any observation in Cluster 2

• So the distance between any two clusters is:

$D(A, B) = \min\{d(\mathbf{y}_i, \mathbf{y}_j)\}$ for all $\mathbf{y}_i$ in A and $\mathbf{y}_j$ in B

[Figure: the same two clusters, showing that any other cross-cluster distance is longer]

So how would we do this using our distance matrix?

Single Linkage Example

• The first step in the process is to determine the two elements with the smallest distance, and combine them into a single cluster.

• Here, the two objects that are most similar are objects 3 and 5… we will now combine these into a new cluster, and compute the distance from that cluster to the remaining clusters (objects) via the single linkage rule.

     1    2    3    4    5
1    0    9    3    6   11
2    9    0    7    5   10
3    3    7    0    9    2
4    6    5    9    0    8
5   11   10    2    8    0

Single Linkage Example

• The shaded rows/columns are the portions of the table with the distances from an object to object 3 or object 5.

• The distance from the (35) cluster to the remaining objects is given below:

d(35),1 = min{d31, d51} = min{3, 11} = 3
d(35),2 = min{d32, d52} = min{7, 10} = 7
d(35),4 = min{d34, d54} = min{9, 8} = 8

     1    2    3    4    5
1    0    9    3    6   11
2    9    0    7    5   10
3    3    7    0    9    2
4    6    5    9    0    8
5   11   10    2    8    0

These are the distances of 3 and 5 with 1. Our rule says that the distance of our new cluster with 1 is equal to the minimum of these two values… 3

These are the distances of 3 and 5 with 2. Our rule says that the distance of our new cluster with 2 is equal to the minimum of these two values… 7

In equation form, our new distances are

Single Linkage Example

• Using the distance values, we now consolidate our table so that (35) is now a single row/column

• The distance from the (35) cluster to the remaining objects is given below:

d(35),1 = min{d31, d51} = min{3, 11} = 3
d(35),2 = min{d32, d52} = min{7, 10} = 7
d(35),4 = min{d34, d54} = min{9, 8} = 8

       (35)    1    2    4
(35)     0
1        3    0
2        7    9    0
4        8    6    5    0

Single Linkage Example

• We now repeat the process, by finding the smallest distance within the set of remaining clusters

• The smallest distance is between object 1 and cluster (35)

• Therefore, object 1 joins cluster (35), creating cluster (135)

       (35)    1    2    4
(35)     0    3    7    8
1        3    0    9    6
2        7    9    0    5
4        8    6    5    0

The distance from cluster (135) to the other clusters is then computed:

d(135),2 = min{d(35),2, d12} = min{7, 9} = 7
d(135),4 = min{d(35),4, d14} = min{8, 6} = 6

Single Linkage Example

• Using the distance values, we now consolidate our table so that (135) is now a single row/column

• The distance from the (135) cluster to the remaining objects is given below:

        (135)    2    4
(135)      0
2          7    0
4          6    5    0

d(135),2 = min{d(35),2, d12} = min{7, 9} = 7
d(135),4 = min{d(35),4, d14} = min{8, 6} = 6

Single Linkage Example

• We now repeat the process, by finding the smallest distance within the set of remaining clusters

• The smallest distance is between object 2 and object 4

• These two objects will be joined to form cluster (24)

• The distance from (24) to (135) is then computed:

d(135),(24) = min{d(135),2, d(135),4} = min{7, 6} = 6

• The final cluster (12345) is formed at a distance of 6

        (135)    2    4
(135)      0
2          7    0
4          6    5    0
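The whole merge sequence worked out by hand above can be reproduced with a short Python sketch of my own (the slides do the same steps in R by hand; the distance matrix is the one from the example):

```python
from itertools import combinations

# Distance matrix from the slides (objects 1-5)
D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
     (2, 3): 7, (2, 4): 5, (2, 5): 10,
     (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(i, j):
    return D[(min(i, j), max(i, j))]

def single_linkage(objects):
    """Agglomerative single linkage: repeatedly merge the two clusters whose
    closest cross-cluster pair of objects is smallest."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: min(d(i, j) for i in p[0] for j in p[1]))
        height = min(d(i, j) for i in a for j in b)
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((sorted(a | b), height))
    return merges

print(single_linkage([1, 2, 3, 4, 5]))
# [([3, 5], 2), ([1, 3, 5], 3), ([2, 4], 5), ([1, 2, 3, 4, 5], 6)]
```

The output matches the hand calculation: (35) forms at distance 2, object 1 joins at 3, (24) forms at 5, and the final cluster forms at 6.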

The Dendrogram

For example, here is where 3 and 5 joined

Complete Linkage

• The complete linkage method of clustering involves combining clusters by finding the “farthest neighbor” – the cluster farthest from any given observation within the current cluster

• This ensures that all objects in a cluster are within some maximum distance of each other

[Figure: Cluster 1 contains objects 1 and 2; Cluster 2 contains objects 3, 4, and 5, with a line marking the longest distance between the two clusters]
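Swapping min for max in the cluster-distance rule gives complete linkage. Running it on the same 5-object distance matrix (my own sketch; this variation is not worked out in the slides) yields a different merge sequence:

```python
from itertools import combinations

# Same distance matrix as the single linkage example (objects 1-5)
D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
     (2, 3): 7, (2, 4): 5, (2, 5): 10,
     (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(i, j):
    return D[(min(i, j), max(i, j))]

def complete_linkage(objects):
    """Agglomerative complete linkage: cluster distance is the FARTHEST
    cross-cluster pair; merge the two clusters with the smallest such distance."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: max(d(i, j) for i in p[0] for j in p[1]))
        height = max(d(i, j) for i in a for j in b)
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((sorted(a | b), height))
    return merges

print(complete_linkage([1, 2, 3, 4, 5]))
# [([3, 5], 2), ([2, 4], 5), ([1, 2, 4], 9), ([1, 2, 3, 4, 5], 11)]
```

Note that object 1 now joins (24) rather than (35): the farthest-neighbor rule penalizes the long 1–5 distance of 11.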

Clustering in R

Dendrogram of R Example with Flowers

Other Classical Clustering Methods

• K-means clustering will partition data into more than one group… but you have to specify how many groups

• The method uses the Mahalanobis distance of each observation to the group mean and iteratively switches group membership until no observations switch

• R has a built-in K-means method

FINITE MIXTURE MODELS

Finite Mixture Models: Clustering with Likelihoods

• Finite mixture models are models that attempt to determine the number of clusters (called classes) in a set of data

Ø Finite: countably many classes
Ø Mixture: the distribution of the data comes from a mixture of observations from different classes

• FMMs use likelihoods to determine class membership
Ø A likelihood (pdf) is a similarity (inverse distance)
Ø Higher likelihood → observation is more like the class average → more likely the observation comes from the class
Ø Lower likelihood → observation is less like the class average → less likely the observation comes from the class

• FMMs have very specific names:
Ø Latent class analysis: FMM where data are binary and independent within class
Ø Factor mixture model: FMM where data are continuous and have a factor structure that yields a within-class covariance matrix

FMM Marginal Likelihood Function

• The general FMM marginal likelihood function is:

$f(\mathbf{x}_i) = \sum_{c=1}^{C} \eta_c \, f(\mathbf{x}_i \mid c)$

• $f(\mathbf{x}_i \mid c)$ is the within-class likelihood function (pdf)

Ø Each class has its own distribution

• $\eta_c$ is the probability any observation comes from class c
Ø The “mixing” proportion

• The sum is where the mixture gets its name: the marginal data likelihood is a blend (mixture) of each class’ likelihood
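The marginal likelihood is just a weighted sum of per-class densities. A Python sketch with two hypothetical univariate-normal classes (the means, variances, and mixing proportions below are my own illustration, not values from the lecture):

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, etas, params):
    """FMM marginal likelihood: f(x) = sum_c eta_c * f(x | c)."""
    return sum(eta * normal_pdf(x, mu, sigma)
               for eta, (mu, sigma) in zip(etas, params))

# Two classes with mixing proportions 0.5 each: N(0, 1) and N(4, 1)
etas = [0.5, 0.5]
params = [(0.0, 1.0), (4.0, 1.0)]
print(round(mixture_pdf(0.0, etas, params), 6))  # 0.199538
```

An observation at x = 0 sits at the first class’ mean, so nearly all of its marginal density comes from that class, which is exactly the “higher likelihood → more likely from that class” logic above.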

FMMs in R: Gaussian (MVN) Mixtures with mclust

• R has several packages for mixtures: I will only show you one – mclust

• We will attempt to determine the number of classes in our flower data using only the four measurements

• First, we let mclust run a number of models to determine the “best fitting” (lowest BIC value)

Plot of BICs from mclust

Example Output from One Solution

Problems with FMMs

• Exploratory (finding the number of classes) uses of mixtures are often suspect, as nearly anything can bring about additional (spurious/false) classes

• For instance: if our data were not MVN within class, we would likely see more classes than we need

• Finding the right model is often difficult

WRAPPING UP

Wrapping Up

• This class only scratched the surface of what can be done to cluster or classify data

• Clustering: Determining the number of clusters in data
Ø Agglomerative/Hierarchical clustering
Ø K-Means
Ø Finite mixture models

• Classification: Putting observations into known classes
Ø Classical approach: Discriminant analysis (not shown today; lda() R function)
Ø Modern approach: FMMs

• My class next semester (Diagnostic Testing) describes confirmatory uses of FMMs
