Finding Unsupervised Learning - Cornell University: Unsupervised Learning, Pantelis P. Analytis...
Unsupervised Learning
Pantelis P. Analytis
Introduction
Finding structure in graphs
Clustering analysis
Dimensionality reduction
Unsupervised Learning
Pantelis P. Analytis
March 19, 2018
1 Introduction
2 Finding structure in graphs
3 Clustering analysis
4 Dimensionality reduction
What’s unsupervised learning?
Most of the data available on the internet do not have labels. How can we make sense of them?
Finding structure in graphs
Organizing the web
First attempts to organize the web were based on human-curated directories (Yahoo, LookSmart).
People also used methods from information retrieval to uncover relevant documents.
Yet the web has a deluge of untrusted documents, spam, random webpages, advertisements, etc.
Elements of the PageRank algorithm
Solution: Use social feedback to rank the quality of documents.
You can see links as votes. A page is more important when it has more incoming links.
For instance, www.nytimes.com has numerous incoming links, as opposed to www.inkefalonia.gr.
Links from important pages count more, which makes the definition recursive.
The iterative PageRank algorithm
At t = 0, assume an initial probability distribution:
$$PR(p_i; 0) = \frac{1}{N}.$$
At each time step, the computation yields:
$$PR(p_i; t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)}$$
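The update rule can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. It reads the symbols in the usual way: N is the number of pages, d is a damping factor, M(p_i) is the set of pages linking to p_i, and L(p_j) is the number of outgoing links of p_j. The function name `pagerank`, the default d = 0.85, the fixed iteration count, and the three-page `toy_web` are all assumptions made for the example. The sketch also assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the PageRank update; links maps each page to the pages it links to.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    # t = 0: uniform initial distribution, PR(p_i; 0) = 1/N
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts the step with the constant term (1 - d) / N ...
        new_pr = {p: (1.0 - d) / n for p in pages}
        # ... and receives d * PR(p_j; t) / L(p_j) from each page p_j linking to it.
        for pj, outgoing in links.items():
            share = d * pr[pj] / len(outgoing)
            for pi in outgoing:
                new_pr[pi] += share
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
```

In this toy graph, page C collects links from both A and B, so it ends up ranked above B, which is linked only from A.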
PageRank Equilibrium
PageRank: The spider trap
The Scaled PageRank algorithm
Scaled PageRank Update Rule
Apply the basic PageRank update rule.
Scale all values down by a factor s.
Divide the leftover 1 - s units of PageRank evenly over all nodes.
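A small sketch of why the scaling helps, assuming a hypothetical four-page graph in which pages C and D link only to each other and so form a spider trap. The names `basic_update`, `scaled_update`, `iterate` and `trap_web`, and the choice s = 0.8, are illustrative assumptions rather than part of the slides.

```python
def basic_update(pr, links):
    """Basic rule: each page splits its current rank evenly over its outgoing links."""
    new_pr = {p: 0.0 for p in pr}
    for pj, outgoing in links.items():
        for pi in outgoing:
            new_pr[pi] += pr[pj] / len(outgoing)
    return new_pr


def scaled_update(pr, links, s=0.8):
    """Scaled rule: apply the basic rule, scale by s, spread the leftover 1 - s evenly."""
    n = len(pr)
    basic = basic_update(pr, links)
    return {p: s * value + (1.0 - s) / n for p, value in basic.items()}


def iterate(update, links, steps=100, **kwargs):
    """Start from the uniform distribution and apply the update rule repeatedly."""
    pr = {p: 1.0 / len(links) for p in links}
    for _ in range(steps):
        pr = update(pr, links, **kwargs)
    return pr


# Hypothetical graph: A -> B -> C, and C <-> D form a trap with no way out.
trap_web = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
basic_ranks = iterate(basic_update, trap_web)
scaled_ranks = iterate(scaled_update, trap_web, s=0.8)
```

Under the basic rule all rank drains into the trap, leaving A and B with zero. Under the scaled rule every page keeps at least (1 - s) / N.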
Clustering: the k-means algorithm
Input: K, set of points $x_1, ..., x_n$
Place centroids $c_1, ..., c_K$ randomly
Then repeat until convergence:
For each point $x_i$, find the nearest centroid $c_j$ and assign the point to that cluster. In math notation: $\arg\min_j D(x_i, c_j)$
For each cluster $j = 1, ..., K$, compute the new centroid as the mean of all points $x_i$ assigned to cluster $j$ in the previous step. In math notation: $c_j(a) = \frac{1}{n_j} \sum_{x_i \to c_j} x_i(a)$ for $a = 1, ..., d$
Stop when the algorithm has converged, i.e. none of the points changes cluster.
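The loop above can be sketched as follows. This is a minimal illustration under several assumptions that the slides do not fix: D is taken to be squared Euclidean distance, the random initial centroids are drawn from the data points themselves, an empty cluster keeps its old centroid, and the names `kmeans` and `squared_distance`, the `seed` parameter, and the six sample points are hypothetical.

```python
import random


def squared_distance(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xa - ca) ** 2 for xa, ca in zip(x, c))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random initialisation: here, k distinct data points (one common choice).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        # Converged: no point changed cluster since the previous pass.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid becomes the coordinate-wise mean of its members.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

On data this well separated the two groups are recovered, but in general k-means can converge to different local optima depending on the initial centroids.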
Converging to clusters
How do we select k?
There are diminishing returns to adding more clusters.
An intuitive approach suggests picking the k after which the distance reduction flattens out.
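One way to turn this heuristic into code is sketched below. The cost values are made-up numbers for illustration, not results from any dataset. The stopping rule, which treats a gain below 10% of the k = 1 cost as "flat", is an assumption, as is the function name `pick_elbow`.

```python
def pick_elbow(costs, threshold=0.1):
    """Pick k from a list where costs[i] is the total within-cluster distance for k = i + 1.

    Returns the first k for which adding one more cluster reduces the cost
    by less than threshold * costs[0].
    """
    for i in range(len(costs) - 1):
        gain = costs[i] - costs[i + 1]
        if gain < threshold * costs[0]:
            return i + 1
    return len(costs)


# Hypothetical costs for k = 1..5: large drops up to k = 3, then nearly flat.
costs = [100.0, 40.0, 12.0, 10.0, 9.0]
best_k = pick_elbow(costs)
```

With these illustrative numbers, going from three to four clusters saves only 2 units, so the sketch selects k = 3.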
Hierarchical Clustering
Agglomerative vs. divisive
Agglomerative clustering starts from the bottom, with each point in its own cluster, and merges its way up to larger clusters.
Divisive clustering starts with one cluster, which is gradually split into smaller ones.
How do we determine the nearness of clusters?
Complete linkage: $D(X, Y) = \max_{x \in X, y \in Y} d(x, y)$
Single linkage: $D(X, Y) = \min_{x \in X, y \in Y} d(x, y)$
Average linkage: $D(X, Y) = \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} d(x, y)$
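The three linkage criteria translate almost directly into code. In this sketch, d is assumed to be Euclidean distance, and the two small clusters X and Y are hypothetical.

```python
import math
from itertools import product


def d(x, y):
    """Euclidean distance between two points given as tuples (an assumed choice of d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complete_linkage(X, Y):
    """Largest pairwise distance between the two clusters."""
    return max(d(x, y) for x, y in product(X, Y))


def single_linkage(X, Y):
    """Smallest pairwise distance between the two clusters."""
    return min(d(x, y) for x, y in product(X, Y))


def average_linkage(X, Y):
    """Mean of all |X| * |Y| pairwise distances."""
    return sum(d(x, y) for x, y in product(X, Y)) / (len(X) * len(Y))


# Hypothetical clusters on a line; the pairwise distances are 3, 5, 2 and 4.
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(3.0, 0.0), (5.0, 0.0)]
single = single_linkage(X, Y)
complete = complete_linkage(X, Y)
average = average_linkage(X, Y)
```

For any pair of clusters, the single-linkage distance is at most the average-linkage distance, which is at most the complete-linkage distance.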
Agglomerative Clustering
Pick k up front and stop when we have k clusters.
Stop when a cluster with low cohesion is created (diameter, radius or density-based approaches).
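A minimal sketch of the first stopping rule, merging until k clusters remain. It assumes one-dimensional points and single linkage to keep the example short. The function names and sample points are hypothetical, and the naive search over all cluster pairs is chosen for clarity, not efficiency.

```python
from itertools import combinations


def single_linkage(X, Y):
    """Smallest distance between a point in X and a point in Y (1-D points)."""
    return min(abs(x - y) for x in X for y in Y)


def agglomerative(points, k, linkage=single_linkage):
    # Start from the bottom: every point is its own cluster.
    clusters = [[p] for p in points]
    # Stopping rule: k is picked up front, so merge until k clusters remain.
    while len(clusters) > k:
        # Find the pair of clusters that are nearest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i (i < j, so deleting j leaves i in place).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 1-D data with three visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 25.0]
clusters = agglomerative(points, k=3)
```

A cohesion-based stopping rule would replace the `len(clusters) > k` test with a check on the cluster that the next merge would create, for example its diameter.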
Kohonen’s self-organizing maps
Step 0: Randomly position the grid's neurons in the data space.
Step 1: Select one data point, either randomly or by systematically cycling through the dataset in order.
Step 2: Find the neuron that is closest to the chosen data point. This neuron is called the Best Matching Unit (BMU).
Step 3: Move the BMU closer to that data point. The distance moved by the BMU is determined by a learning rate, which decreases after each iteration.
Step 4: Move the BMU's neighbors closer to that data point as well, with farther-away neighbors moving less. Neighbors are identified using a radius around the BMU, and the value of this radius decreases after each iteration.
Update the learning rate and BMU radius before repeating Steps 1 to 4. Iterate these steps until the positions of the neurons have stabilized.
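Steps 0 to 4 can be sketched compactly for a one-dimensional grid of neurons. Several details here are assumptions, since the slides do not specify them: the neighborhood radius is measured in grid positions, influence falls off as a Gaussian of grid distance, the learning rate and radius decay linearly once per pass over the data, and training runs for a fixed number of passes instead of testing for stabilization. The name `train_som` and all parameter defaults are hypothetical.

```python
import math
import random


def train_som(data, n_neurons=5, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[a] for x in data) for a in range(dim)]
    hi = [max(x[a] for x in data) for a in range(dim)]
    # Step 0: place the neurons at random positions inside the data's bounding box.
    neurons = [[rng.uniform(lo[a], hi[a]) for a in range(dim)]
               for _ in range(n_neurons)]
    for epoch in range(epochs):
        # Learning rate and radius shrink over time (linear decay is an assumption).
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = radius0 * frac
        for x in data:  # Step 1: cycle through the dataset in order
            # Step 2: the best matching unit is the neuron closest to x.
            bmu = min(
                range(n_neurons),
                key=lambda i: sum((neurons[i][a] - x[a]) ** 2 for a in range(dim)),
            )
            # Steps 3 and 4: move the BMU and its grid neighbors toward x;
            # neighbors farther along the grid move less.
            for i in range(n_neurons):
                grid_dist = abs(i - bmu)
                if grid_dist <= radius:
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for a in range(dim):
                        neurons[i][a] += lr * influence * (x[a] - neurons[i][a])
    return neurons


# Hypothetical 2-D data inside the unit square.
data = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 0.8), (0.5, 0.5)]
som = train_som(data)
```

Because each update moves a neuron only part of the way toward a data point, the neurons stay inside the bounding box of the data throughout training.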
Principal component analysis
Often used to accelerate supervised learning.
Visualization
Dimensionality reduction in recommender systems