quiz 2 review - northeastern universityquiz 2 review lecture 4 october 26, 2017 ... •of the four...
TRANSCRIPT
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Quiz 2 Review
Lecture 4
October 26, 2017
Quiz 2 Review
1
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Quiz• Format: Blackboard, so remember…– Bring your laptop charged– Install LockDown Browser
• No notes/programs/web/calculator/etc– Scratch paper will be supplied– Bring a writing utensil
• Content: all of clustering
October 26, 2017
Quiz 2 Review
4
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Of the four algorithms we covered (K-
Means, agglomerative, DBSCAN, GMMs), which (if any) could have led to the following clustering (to 3 decimal places of precision)
October 26, 2017
Quiz 2 Review
5
c1 c2 c3p1 0 0 1p2 0 1 0p3 1 0 0p4 0 1 0
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• K-Means• DBSCAN• GMMs
October 26, 2017
Quiz 2 Review
6
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Of the four algorithms we covered (K-
Means, agglomerative, DBSCAN, GMMs), which (if any) could have led to the following clustering (to 3 decimal places of precision)
October 26, 2017
Quiz 2 Review
7
c1 c2 c3p1 0 1 1p2 0 1 1p3 1 0 1p4 1 0 1
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Agglomerative
C3 = {C1, C2}
October 26, 2017
Quiz 2 Review
8
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Of the four algorithms we covered (K-
Means, agglomerative, DBSCAN, GMMs), which (if any) could have led to the following clustering (to 3 decimal places of precision)
October 26, 2017
Quiz 2 Review
9
c1 c2 c3p1 0 1 1p2 0 1 1p3 1 0 1p4 0 0 0
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• None–What are the properties of this clustering?
October 26, 2017
Quiz 2 Review
10
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• We learned about the agglomerative
clustering algorithm in detail.
• In what context have we learned that divisive is also useful?
October 26, 2017
Quiz 2 Review
11
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• A method for K-Means initialization!
October 26, 2017
Quiz 2 Review
12
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree + WHY!?– It is not an effective strategy to evaluate using
a function over the data/clusters that is different than that optimized by your clustering algorithm
October 26, 2017
Quiz 2 Review
13
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree!– K-Means: SSE vs Silhouette (amongst others)
October 26, 2017
Quiz 2 Review
14
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree– K-Means is guaranteed to converge and,
when it does, the clustering is globally optimum.
October 26, 2017
Quiz 2 Review
15
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree!– Converge, yes, but to a local optimum
October 26, 2017
Quiz 2 Review
16
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Describe the meaning and context of the
following expression in context of K-Means, where x refers to the dataset and rthe one-hot membership variables
October 26, 2017
Quiz 2 Review
17
Pn rnkxnPn rnk
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• The M-Step!!
October 26, 2017
Quiz 2 Review
18
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree– Assuming K, # of dimensions, and N are large
enough, it is reasonable to assume that a small number of random Forgy restarts will result in good K-Means clusterings
October 26, 2017
Quiz 2 Review
19
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree!– The more likely there is a bad distribution
w.r.t. centroids/clusters
October 26, 2017
Quiz 2 Review
20
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Increasing K will generally lead to…
a) Monotonically higher SSEb) Better clusterings
October 26, 2017
Quiz 2 Review
21
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer
!
October 26, 2017
Quiz 2 Review
22
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree– Identifying isolated branches is a reasonable
approach to finding good cluster-number cutoffs
October 26, 2017
Quiz 2 Review
23
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree!– Think about relative dendogrammatic heights
October 26, 2017
Quiz 2 Review
24
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree– A point, q, is density-reachable from another
point, p, if q is in the eps-neighborhood of p, N!(p)
October 26, 2017
Quiz 2 Review
25
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree–What condition is missing about p?
October 26, 2017
Quiz 2 Review
26
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• What algorithm(s) could have produced
the following clustering, where color (RGB) indicates cluster membership
October 26, 2017
Quiz 2 Review
27
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• GMMs!
October 26, 2017
Quiz 2 Review
28
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Provide approximate values of
responsibility for each distribution in the following diagram (close to 0/1, middle)
October 26, 2017
Quiz 2 Review
29
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Red: close to 1• Blue: close to 0
October 26, 2017
Quiz 2 Review
30
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree–When running EM for GMMs, you stop
iterating once all the responsibilities have stopped changing
October 26, 2017
Quiz 2 Review
31
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree– Think !
October 26, 2017
Quiz 2 Review
32
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Good Clustering?Proximity Matrix
October 26, 2017
Quiz 2 Review
33
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Question• Agree/Disagree– High accuracy implies high precision and/or
recall
October 26, 2017
Quiz 2 Review
35
CS6220 – Data Mining Techniques� �� Fall 2017� ��Derbinsky
Answer• Disagree
October 26, 2017
Quiz 2 Review
36
SameClass DiffClassSameCluster 0 25DiffCluster 0 175