quiz 2 review - northeastern universityquiz 2 review lecture 4 october 26, 2017 ... •of the four...

33
CS6220 – Data Mining Techniques Fall 2017 Derbinsky Quiz 2 Review Lecture 4 October 26, 2017 Quiz 2 Review 1

Upload: others

Post on 27-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Quiz 2 Review

Lecture 4

October 26, 2017

Quiz 2 Review

1

Page 2: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Quiz• Format: Blackboard, so remember…– Bring your laptop charged– Install LockDown Browser

• No notes/programs/web/calculator/etc– Scratch paper will be supplied– Bring a writing utensil

• Content: all of clustering

October 26, 2017

Quiz 2 Review

4

Page 3: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Of the four algorithms we covered (K-

Means, agglomerative, DBSCAN, GMMs), which (if any) could have led to the following clustering (to 3 decimal places of precision)

October 26, 2017

Quiz 2 Review

5

c1 c2 c3p1 0 0 1p2 0 1 0p3 1 0 0p4 0 1 0

Page 4: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• K-Means• DBSCAN• GMMs

October 26, 2017

Quiz 2 Review

6

Page 5: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Of the four algorithms we covered (K-

Means, agglomerative, DBSCAN, GMMs), which (if any) could have led to the following clustering (to 3 decimal places of precision)

October 26, 2017

Quiz 2 Review

7

c1 c2 c3p1 0 1 1p2 0 1 1p3 1 0 1p4 1 0 1

Page 6: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Agglomerative

C3 = {C1, C2}

October 26, 2017

Quiz 2 Review

8

Page 7: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Of the four algorithms we covered (K-

Means, agglomerative, DBSCAN, GMMs), which (if any) could have led to the following clustering (to 3 decimal places of precision)

October 26, 2017

Quiz 2 Review

9

c1 c2 c3p1 0 1 1p2 0 1 1p3 1 0 1p4 0 0 0

Page 8: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• None–What are the properties of this clustering?

October 26, 2017

Quiz 2 Review

10

Page 9: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• We learned about the agglomerative

clustering algorithm in detail.

• In what context have we learned that divisive is also useful?

October 26, 2017

Quiz 2 Review

11

Page 10: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• A method for K-Means initialization!

October 26, 2017

Quiz 2 Review

12

Page 11: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree + WHY!?– It is not an effective strategy to evaluate using

a function over the data/clusters that is different than that optimized by your clustering algorithm

October 26, 2017

Quiz 2 Review

13

Page 12: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree!– K-Means: SSE vs Silhouette (amongst others)

October 26, 2017

Quiz 2 Review

14

Page 13: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree– K-Means is guaranteed to converge and,

when it does, the clustering is globally optimum.

October 26, 2017

Quiz 2 Review

15

Page 14: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree!– Converge, yes, but to a local optimum

October 26, 2017

Quiz 2 Review

16

Page 15: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Describe the meaning and context of the

following expression in context of K-Means, where x refers to the dataset and rthe one-hot membership variables

October 26, 2017

Quiz 2 Review

17

Pn rnkxnPn rnk

Page 16: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• The M-Step!!

October 26, 2017

Quiz 2 Review

18

Page 17: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree– Assuming K, # of dimensions, and N are large

enough, it is reasonable to assume that a small number of random Forgy restarts will result in good K-Means clusterings

October 26, 2017

Quiz 2 Review

19

Page 18: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree!– The more likely there is a bad distribution

w.r.t. centroids/clusters

October 26, 2017

Quiz 2 Review

20

Page 19: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Increasing K will generally lead to…

a) Monotonically higher SSEb) Better clusterings

October 26, 2017

Quiz 2 Review

21

Page 20: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer

!

October 26, 2017

Quiz 2 Review

22

Page 21: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree– Identifying isolated branches is a reasonable

approach to finding good cluster-number cutoffs

October 26, 2017

Quiz 2 Review

23

Page 22: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree!– Think about relative dendogrammatic heights

October 26, 2017

Quiz 2 Review

24

Page 23: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree– A point, q, is density-reachable from another

point, p, if q is in the eps-neighborhood of p, N!(p)

October 26, 2017

Quiz 2 Review

25

Page 24: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree–What condition is missing about p?

October 26, 2017

Quiz 2 Review

26

Page 25: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• What algorithm(s) could have produced

the following clustering, where color (RGB) indicates cluster membership

October 26, 2017

Quiz 2 Review

27

Page 26: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• GMMs!

October 26, 2017

Quiz 2 Review

28

Page 27: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Provide approximate values of

responsibility for each distribution in the following diagram (close to 0/1, middle)

October 26, 2017

Quiz 2 Review

29

Page 28: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Red: close to 1• Blue: close to 0

October 26, 2017

Quiz 2 Review

30

Page 29: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree–When running EM for GMMs, you stop

iterating once all the responsibilities have stopped changing

October 26, 2017

Quiz 2 Review

31

Page 30: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree– Think !

October 26, 2017

Quiz 2 Review

32

Page 31: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Good Clustering?Proximity Matrix

October 26, 2017

Quiz 2 Review

33

Page 32: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Question• Agree/Disagree– High accuracy implies high precision and/or

recall

October 26, 2017

Quiz 2 Review

35

Page 33: Quiz 2 Review - Northeastern UniversityQuiz 2 Review Lecture 4 October 26, 2017 ... •Of the four algorithms we covered (K-Means, agglomerative, DBSCAN, GMMs), ... following clustering

CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky

Answer• Disagree

October 26, 2017

Quiz 2 Review

36

SameClass DiffClassSameCluster 0 25DiffCluster 0 175