![Page 1: Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649f155503460f94c2a8dc/html5/thumbnails/1.jpg)
Data Warehousing
Lecture-31: Supervised vs. Unsupervised Learning
Virtual University of Pakistan
Ahsan Abdullah, Assoc. Prof. & Head
Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Data Structures in Data Mining
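• Data matrix
– Table or database
– n records and m attributes
– n >> m

$$
C = \begin{pmatrix}
C_{1,1} & C_{1,2} & C_{1,3} & \cdots & C_{1,m} \\
C_{2,1} & C_{2,2} & C_{2,3} & \cdots & C_{2,m} \\
C_{3,1} & C_{3,2} & C_{3,3} & \cdots & C_{3,m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
C_{n,1} & C_{n,2} & C_{n,3} & \cdots & C_{n,m}
\end{pmatrix}
$$

• Similarity matrix
– Symmetric square matrix
– n × n or m × m

$$
S = \begin{pmatrix}
1 & S_{1,2} & S_{1,3} & \cdots & S_{1,n} \\
S_{2,1} & 1 & S_{2,3} & \cdots & S_{2,n} \\
S_{3,1} & S_{3,2} & 1 & \cdots & S_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_{n,1} & S_{n,2} & S_{n,3} & \cdots & 1
\end{pmatrix}, \qquad S_{i,j} = S_{j,i}
$$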
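The slide does not say how the entries S_{i,j} are computed. As a minimal sketch, assuming similarity = 1 / (1 + Euclidean distance) between two records (one common choice among many, not necessarily the lecture's), an n × n similarity matrix can be built from an n × m data matrix like this. The function name `similarity_matrix` and the sample data are hypothetical.

```python
import math

def similarity_matrix(data):
    """Build an n x n record-by-record similarity matrix from an n x m data matrix.

    Assumption: similarity = 1 / (1 + Euclidean distance), so identical
    records score exactly 1 and distant records approach 0.
    """
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # j == i fills the diagonal with 1.0
            s = 1.0 / (1.0 + math.dist(data[i], data[j]))
            sim[i][j] = sim[j][i] = s  # symmetric: fill both halves at once
    return sim

# Hypothetical data matrix: n = 3 records, m = 2 attributes.
data = [[1.0, 2.0], [1.0, 2.0], [4.0, 6.0]]
S = similarity_matrix(data)
for row in S:
    print(row)
```

The first two records are identical, so their similarity is 1; the third lies at distance 5 from both, giving 1/6. For the m × m (attribute-by-attribute) version, the same function can be applied to the transposed data, e.g. `similarity_matrix(list(zip(*data)))`.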
Main Types of Data Mining
• Supervised: type and number of classes are known in advance
– Bayesian Modeling
– Decision Trees
– Neural Networks
– Etc.
• Unsupervised: type and number of classes are NOT known in advance
– One-way Clustering
– Two-way Clustering
Clustering: Min-Max Distance
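[Figure: records plotted on Age (ticks at 20, 40, 60) and Salary axes, grouped into clusters, with one point marked as an outlier.]
• Intra-cluster distances are minimized
• Inter-cluster distances are maximized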
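The two criteria on this slide can be measured directly. The sketch below is an illustration, not the lecture's method: it assumes "intra-cluster distance" means the average distance of members to their own cluster centroid and "inter-cluster distance" means the smallest distance between two centroids. The (age, salary) points are made up, with salary in hypothetical thousands, and the helper names are hypothetical.

```python
import math
from itertools import combinations

def centroid(points):
    # Mean of each coordinate over the points in one cluster.
    return tuple(sum(c) / len(points) for c in zip(*points))

def mean_intra_distance(cluster):
    # Average distance from each member to its own centroid (want this small).
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def min_inter_distance(clusters):
    # Smallest distance between any two cluster centroids (want this large).
    cents = [centroid(cl) for cl in clusters]
    return min(math.dist(a, b) for a, b in combinations(cents, 2))

# Hypothetical (age, salary-in-thousands) records already split into two clusters.
clusters = [
    [(22, 10.0), (25, 12.0), (24, 11.0)],
    [(55, 40.0), (58, 42.0), (60, 41.0)],
]
intra = [mean_intra_distance(cl) for cl in clusters]
inter = min_inter_distance(clusters)
print(intra, inter)
```

A good clustering keeps the intra-cluster numbers small relative to the inter-cluster number. Because age and salary are on different scales, real data would usually be normalized before distances are computed.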
How Does Clustering Work?
One-way clustering example
[Figure: INPUT and OUTPUT panels.]
Black spots are noise; white spots are missing data.
Data Mining Agriculture data
[Figure: INPUT and clustered OUTPUT panels, with the clusters marked.]
Classification
[Figure: unseen data is given to a classifier (model), which decides: which class?]
How Does Classification Work?
[Figure: inputs are fed to the classifier, which returns an output together with a confidence level.]
Classification Process (1): Model Construction
Training data (observations, measurements, etc.): relationship between shopping time and items bought

NAME    Time  Items  Gender
Moin     10     2      M
Munir    16     3      M
Meher    15     1      F
Javed     5     1      M
Mahin    20     1      F
Akram    20     4      M

Training data → Classification algorithms → Classifier (model):
IF time/items >= 6 THEN gender = ‘F’
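The rule on this slide is consistent with the training table: dividing time by items gives 5, 16/3 ≈ 5.33, 15, 5, 20 and 5, so both F rows (Meher, Mahin) reach at least 6 while every M row stays below it. The sketch below checks this. The helper `separating_interval` is hypothetical and is not one of the lecture's classification algorithms: it assumes time/items is the only feature and that one threshold can split the two classes, and it returns the range of thresholds that do so. Answering 'M' when the rule does not fire is also an assumption, since the slide states only the F branch.

```python
# Training data from the slide: (name, time, items, gender)
TRAINING = [
    ("Moin", 10, 2, "M"),
    ("Munir", 16, 3, "M"),
    ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"),
    ("Mahin", 20, 1, "F"),
    ("Akram", 20, 4, "M"),
]

def ratio(time, items):
    # Shopping time per item bought: the single feature the slide's rule uses.
    return time / items

def separating_interval(rows):
    """Return (low, high) such that any threshold t with low < t <= high
    labels every training row correctly via 'ratio >= t -> F, else M'.
    Assumes the two classes are separable on this one feature."""
    m_ratios = [ratio(t, i) for _, t, i, g in rows if g == "M"]
    f_ratios = [ratio(t, i) for _, t, i, g in rows if g == "F"]
    low, high = max(m_ratios), min(f_ratios)
    if low >= high:
        raise ValueError("classes overlap on time/items; no single threshold works")
    return low, high

def classify(time, items, threshold=6):
    # The slide's model: IF time/items >= 6 THEN 'F' (else 'M', assumed).
    return "F" if ratio(time, items) >= threshold else "M"

low, high = separating_interval(TRAINING)
print(low, high)  # 5.333333333333333 15.0
training_accuracy = sum(classify(t, i) == g for _, t, i, g in TRAINING) / len(TRAINING)
print(training_accuracy)  # 1.0
```

Any threshold above 16/3 and up to 15 fits the training rows equally well; the slide's choice of 6 is one of them, which is why the model still has to be checked on separate testing data.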
Classification Process (2): Use the Model in Prediction
Testing data:

NAME    Time  Items  Gender
Tahir    20     1      M
Younas   11     2      M
Yasin     3     1      M

Unseen data: (Firdous, Time = 15, Items = 1) → Classifier → Gender?
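Running the rule over the testing rows shows how the model is checked before it is used on unseen data. Working it through by hand: Tahir's ratio is 20/1 = 20, so the rule says F although the table records M, while Younas (11/2 = 5.5) and Yasin (3/1 = 3) are correctly labelled M. The sketch below repeats that check; as before, answering 'M' whenever the rule does not fire is an assumption, since the slide states only the F branch.

```python
# Testing data from the slide: (name, time, items, actual gender)
TESTING = [
    ("Tahir", 20, 1, "M"),
    ("Younas", 11, 2, "M"),
    ("Yasin", 3, 1, "M"),
]

def classify(time, items, threshold=6):
    # The slide's rule, with 'M' assumed as the answer whenever it does not fire.
    return "F" if time / items >= threshold else "M"

predictions = {name: classify(t, i) for name, t, i, _ in TESTING}
print(predictions)  # {'Tahir': 'F', 'Younas': 'M', 'Yasin': 'M'}

correct = sum(predictions[name] == g for name, _, _, g in TESTING)
test_accuracy = correct / len(TESTING)
print(test_accuracy)  # 0.6666666666666666

# The unseen record from the slide: Firdous, Time = 15, Items = 1.
print(classify(15, 1))  # F
```

So the rule gets 2 of the 3 testing records right, and for Firdous (15/1 = 15) it predicts F.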
Clustering vs. Cluster Detection
Clustering vs. Cluster Detection Example
[Figure: panels A and B.]
The K-Means Clustering
The K-Means Clustering: Example
[Figure: four panels, labelled A, B, C and D, each plotting points on a grid whose axes run from 0 to 10.]
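The example figure steps through k-means on 2-D points. A minimal sketch of the standard algorithm (Lloyd's iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until no centroid moves) is below. The points are made up to fit the 0 to 10 grid, and seeding with the first k points is an assumption made for a deterministic run; the lecture may choose initial centroids differently.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical points on the 0..10 grid: two visually obvious groups.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

With this seeding, the first pass wrongly groups (2, 1) with the upper-right points; the second pass corrects it, and the run settles on centroids near (1.33, 1.33) and (8.33, 8.33). Because k-means only reaches a local optimum, different initial centroids can end in different clusters.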
The K-Means Clustering: Comment