Departamento de Electrónica, Telecomunicações e Informática

EXPLORAÇÃO de DADOS & DATA MINING

Exercises: Clustering - k-means and hierarchical clustering

1. The following table and plot describe 16 objects with two attributes.

ID   attribute #1   attribute #2
 2       0.8            9.8
 3       1.2           11.6
 4       2.8            9.6
 5       3.8            9.9
 6       4.4            6.5
 7       4.8            1.1
 8       6.0           19.9
 9       6.2           18.5
10       7.6           17.4
11       7.8           12.2
12       6.6            7.7
13       8.2            4.5
14       8.4            6.9
15       9.0            3.4
16       9.6           11.1

The data set has to be clustered into k = 3 clusters using the k-means algorithm.

(a) Assuming that the cluster centroids¹ are

            attribute #1   attribute #2
cluster 1       3.8            9.9
cluster 2       7.8           12.2
cluster 3       6.2           18.5

assign each object in the data set to a cluster, using Euclidean distance as the measure of proximity.

(b) After the objects have been assigned, the cluster centroids may have different values. Compute these values.

(c) Do the objects stay in the same clusters with the new centroid values?
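Parts (a) to (c) can be checked with a short script. This is a minimal sketch, not a reference solution. It assumes only the rows that survive in the table above (IDs 2 to 16, so 15 of the 16 objects), and it reads the second centroid row as cluster 2. The names `OBJECTS`, `SEEDS`, `assign` and `update` are illustrative.

```python
from math import dist

# Rows as they survive in the table above (IDs 2 to 16 only; an assumption).
OBJECTS = {
    2: (0.8, 9.8), 3: (1.2, 11.6), 4: (2.8, 9.6), 5: (3.8, 9.9),
    6: (4.4, 6.5), 7: (4.8, 1.1), 8: (6.0, 19.9), 9: (6.2, 18.5),
    10: (7.6, 17.4), 11: (7.8, 12.2), 12: (6.6, 7.7), 13: (8.2, 4.5),
    14: (8.4, 6.9), 15: (9.0, 3.4), 16: (9.6, 11.1),
}
# Centroids given in part (a); the second row is read as cluster 2.
SEEDS = {1: (3.8, 9.9), 2: (7.8, 12.2), 3: (6.2, 18.5)}


def assign(centroids):
    """Part (a): map each object ID to the label of its nearest centroid."""
    return {
        obj_id: min(centroids, key=lambda c: dist(point, centroids[c]))
        for obj_id, point in OBJECTS.items()
    }


def update(assignment):
    """Part (b): recompute each centroid as the mean of its members."""
    centroids = {}
    for cluster in sorted(set(assignment.values())):
        pts = [OBJECTS[i] for i, c in assignment.items() if c == cluster]
        centroids[cluster] = tuple(sum(axis) / len(pts) for axis in zip(*pts))
    return centroids


first_assignment = assign(SEEDS)
new_centroids = update(first_assignment)
second_assignment = assign(new_centroids)  # part (c): reassign and compare

for cluster, centre in new_centroids.items():
    members = [i for i, c in first_assignment.items() if c == cluster]
    print(cluster, members, tuple(round(v, 2) for v in centre))
print("membership unchanged:", second_assignment == first_assignment)
```

On these 15 rows the membership does not change after the first centroid update. The row that is absent from the table could alter that result.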

2. The table on the left is the distance matrix between 6 objects (clusters) used by a hierarchical clustering algorithm. Its entries are the Euclidean distances between pairs of objects in the data set.

¹ Representing a random choice of objects in the data set.


     A    B    C    D    E    F
A    0   12    6    3   25    4
B   12    0   19    8   14   15
C    6   19    0   12    5   18
D    3    8   12    0   11    9
E   25   14    5   11    0    7
F    4   15   18    9    7    0

     AD    B    C    E    F
AD    0    ?    ?    ?    ?
B     ?    0   19   14   15
C     ?   19    0    5   18
E     ?   14    5    0    7
F     ?   15   18    7    0

(a) The table on the right represents the new distance matrix after the first merge.

• What proximity measure was used to merge the two objects (A, D)?

• Compute the missing values (marked as ?), assuming that the same proximity measure is used between clusters.
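The missing row depends on how the distance between a merged cluster and another object is defined. The sketch below computes the AD row under three common choices (single, complete and average linkage). The exercise does not say which one is intended, so no single result is presented as the answer. The names `PAIRS`, `pair_distance` and `merged_row` are illustrative.

```python
# Pairwise distances from the 6 x 6 table (upper triangle; the matrix is symmetric).
PAIRS = {
    ("A", "B"): 12, ("A", "C"): 6, ("A", "D"): 3, ("A", "E"): 25, ("A", "F"): 4,
    ("B", "C"): 19, ("B", "D"): 8, ("B", "E"): 14, ("B", "F"): 15,
    ("C", "D"): 12, ("C", "E"): 5, ("C", "F"): 18,
    ("D", "E"): 11, ("D", "F"): 9,
    ("E", "F"): 7,
}


def pair_distance(a, b):
    """Look up a distance regardless of the order of the two labels."""
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def merged_row(members, others, linkage):
    """Distance from the merged cluster to every other object."""
    return {o: linkage([pair_distance(m, o) for m in members]) for o in others}


def average(values):
    return sum(values) / len(values)


# The first merge joins the closest pair in the table.
closest_pair = min(PAIRS, key=PAIRS.get)

single = merged_row("AD", "BCEF", min)        # single linkage (MIN)
complete = merged_row("AD", "BCEF", max)      # complete linkage (MAX)
group_avg = merged_row("AD", "BCEF", average)  # average linkage

print(closest_pair, PAIRS[closest_pair])
print("single:  ", single)
print("complete:", complete)
print("average: ", group_avg)
```

Because A and D are the closest pair (distance 3), choosing them for the first merge is consistent with any of the three linkages. The linkage only matters for the values that replace the question marks.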

(b) Draw the dendrogram corresponding to the hierarchical clustering process.
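The merge order and merge heights needed for the dendrogram can be obtained by repeating the merge step until one cluster remains. The following is a minimal sketch that assumes single linkage by default (pass `max` for complete linkage). The helper names `cluster_distance` and `agglomerate` are illustrative.

```python
from itertools import combinations

LABELS = "ABCDEF"
ROWS = [
    [0, 12, 6, 3, 25, 4],
    [12, 0, 19, 8, 14, 15],
    [6, 19, 0, 12, 5, 18],
    [3, 8, 12, 0, 11, 9],
    [25, 14, 5, 11, 0, 7],
    [4, 15, 18, 9, 7, 0],
]
# Distance lookup keyed by a pair of object labels.
DIST = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(LABELS)
    for j, b in enumerate(LABELS)
}


def cluster_distance(c1, c2, linkage=min):
    """Linkage applied to all cross-cluster object distances."""
    return linkage(DIST[a, b] for a in c1 for b in c2)


def agglomerate(linkage=min):
    """Merge the two closest clusters repeatedly; return (cluster, height) pairs."""
    clusters = [(label,) for label in LABELS]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], linkage),
        )
        height = cluster_distance(c1, c2, linkage)
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append(("".join(merged), height))
    return merges


merges = agglomerate(min)
for name, height in merges:
    print(f"{name} formed at height {height}")
```

Each printed line corresponds to one horizontal join in the dendrogram, drawn at the given height.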

3. A health insurance provider wants to identify individuals at risk of coronary disease. The data set is stored in coronary.csv and has the attributes weight, cholesterol, and gender. Download the data and explore it using k-means in RapidMiner.
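k-means relies on Euclidean distance, so attributes measured on different scales (such as weight and cholesterol) are usually standardized before clustering. The sketch below shows that preparation step in Python as a rough equivalent of a normalization step in RapidMiner. The sample rows are made up for illustration: the actual contents, header spelling and gender encoding of coronary.csv are not known here.

```python
import csv
import io
from statistics import mean, pstdev

# Made-up rows standing in for coronary.csv (hypothetical values and encoding).
SAMPLE = """weight,cholesterol,gender
70,180,0
85,240,1
60,170,0
95,260,1
"""


def load_numeric(text, columns):
    """Read the chosen numeric columns from CSV text into a list of rows."""
    reader = csv.DictReader(io.StringIO(text))
    return [[float(row[c]) for c in columns] for row in reader]


def standardize(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


rows = load_numeric(SAMPLE, ["weight", "cholesterol"])
scaled = standardize(rows)
for original, z in zip(rows, scaled):
    print(original, [round(v, 2) for v in z])
```

Gender is left out of the distance computation in this sketch. Whether to include a categorical attribute, and how to encode it, is a modelling decision the exercise leaves open.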

[Pages 3-6: scanned handwritten worked solutions (Euclidean distance calculations and merged distance matrices); not legible in this transcript.]