![Page 1: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/1.jpg)
DataBases & Data Mining Joined Specialization Project: „Data Mining Classification Tool”
By Mateusz Żochowski & Jakub Strzemżalski
![Page 2: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/2.jpg)
Agenda
- General description of the problem
- Functionality
- Data Mining aspects
  - Algorithm and optimization
- Data Base aspects
  - General entities scheme
![Page 3: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/3.jpg)
General Description
- Universal tool
  - Different kinds of objects (e.g. preprocessed photos, hospital patient data)
- Finding similar objects
- Decision problems
![Page 4: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/4.jpg)
Functionality
- Independent system, user operated
  - Using sets of data already provided or uploading new types
  - Influence on the way data is processed
- Possible usage in bigger systems as a processing engine
  - Additional module used as a helping tool in more complex systems
![Page 5: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/5.jpg)
General Use Case
![Page 6: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/6.jpg)
Data Mining General Ideas
- Description of an object
  - Definition of a distance
- K-NN algorithm
  - Brief explanation of the algorithm
- Optimization
  - Problem of comparing a large number of objects
  - Optimized solution: using the grouping idea
![Page 7: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/7.jpg)
Definitions: Objects
![Page 8: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/8.jpg)
K-NN
- K Nearest Neighbors
- Idea behind k-NN
  - Aim: finding the k objects most similar to the one being analyzed and, eventually, assigning it the appropriate decision
  - Method: calculating the distance from the analyzed object to the other objects in the database and finding the closest ones
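The aim and method above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name `knn_classify`, the use of plain Euclidean distance, and majority voting as the way to assign the decision are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(query, objects, decisions, k=3):
    """Return the most common decision among the k objects closest to query."""
    # Rank all stored objects by their distance to the analyzed object.
    # Euclidean distance is an assumption; any distance definition would do.
    ranked = sorted(
        range(len(objects)),
        key=lambda i: math.dist(query, objects[i]),
    )
    nearest = ranked[:k]
    # Assign the decision by majority vote among the k nearest neighbors.
    votes = Counter(decisions[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: five objects with two attributes each.
objects = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (0.9, 1.1)]
decisions = ["A", "A", "B", "B", "A"]
print(knn_classify((1.1, 1.0), objects, decisions, k=3))
```

Note that every call computes the distance to all stored objects, which is the cost the optimization slides address.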
![Page 9: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/9.jpg)
K-NN: Graphical representation
![Page 10: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/10.jpg)
Definitions: Distance
- Calculations in multidimensional space
- Coefficients
  - $\alpha$ (alpha)
  - $w_i$: weights underlining the importance of particular attributes
  - $n$: number of all the attributes

$$D(O_1, O_2) = \sum_{i=1}^{n} w_i \cdot \left| a_i(o_1) - a_i(o_2) \right|^{\alpha}$$
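A weighted distance of this kind might be computed as follows. The sketch assumes the distance is a sum over all n attributes of the weight w_i times the absolute attribute difference raised to the power alpha. The exact role of alpha is an assumption here, and the function name `distance` is hypothetical.

```python
def distance(o1, o2, weights, alpha=1.0):
    """Weighted distance between two objects given as attribute sequences.

    Assumed form: sum over i of w_i * |a_i(o1) - a_i(o2)| ** alpha.
    """
    if not (len(o1) == len(o2) == len(weights)):
        raise ValueError("objects and weights must have the same number of attributes")
    return sum(
        w * abs(a1 - a2) ** alpha
        for w, a1, a2 in zip(weights, o1, o2)
    )


# Hypothetical objects with two attributes; the second attribute matters more.
o1 = (1.0, 4.0)
o2 = (3.0, 1.0)
weights = (0.5, 2.0)
print(distance(o1, o2, weights))             # 0.5*2 + 2.0*3 = 7.0
print(distance(o1, o2, weights, alpha=2.0))  # 0.5*4 + 2.0*9 = 20.0
```

A larger weight makes differences in that attribute count for more, which is how the weights underline the importance of particular attributes.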
![Page 11: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/11.jpg)
Optimization
- The reason: the cost of computing multidimensional distances from one object to all the others
- Solution: improved k-NN
- Result: better efficiency, because narrowing the set of possibly similar objects reduces the number of distance computations
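One way the narrowing could work is sketched below. It assumes each group is represented by a center point, that the new object is first compared with the centers only, and that distances are then computed just for members of the nearest groups. The name `grouped_knn`, the dictionary layout, and Euclidean distance are illustrative assumptions, not the project's actual design.

```python
import math


def grouped_knn(query, groups, k=3):
    """Find k near neighbors while only scanning the closest groups.

    groups maps a group center (tuple) to a list of (object, decision) pairs.
    """
    candidates = []
    # Visit groups from the closest center outwards and stop as soon as
    # enough candidates have been collected.
    for center in sorted(groups, key=lambda c: math.dist(query, c)):
        candidates.extend(groups[center])
        if len(candidates) >= k:
            break
    # Full distance computations happen only for the narrowed candidate set.
    candidates.sort(key=lambda pair: math.dist(query, pair[0]))
    return candidates[:k]


# Hypothetical groups, keyed by their centers.
groups = {
    (1.0, 1.0): [((1.0, 1.1), "A"), ((1.5, 0.5), "A"), ((0.5, 1.0), "A")],
    (5.0, 5.0): [((5.0, 4.9), "B"), ((4.5, 5.5), "B")],
}
print(grouped_knn((1.1, 1.0), groups, k=2))
```

This trades exactness for speed: an object near a group boundary may have a true nearest neighbor in a group that was never scanned.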
![Page 12: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/12.jpg)
Step 1 - Group-oriented plane division
![Page 13: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/13.jpg)
Step 2 - a new object appears
![Page 14: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/14.jpg)
Step 3
![Page 15: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/15.jpg)
Step 4
![Page 16: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/16.jpg)
Step 5
![Page 17: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/17.jpg)
Grouping problem
- The problem: assigning objects to appropriate groups according to the chosen distance definition
- Solution: a clustering algorithm
- Brief example: the k-means algorithm
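For reference, a bare-bones k-means could look like the sketch below. It uses Euclidean distance and takes the first k points as initial centers so that the run is deterministic. Both choices are assumptions for the example, and the function name `k_means` is hypothetical.

```python
import math


def k_means(points, k, iterations=20):
    """Group points around k centers; returns (centers, groups)."""
    # Assumption: the first k points serve as initial centers.
    # Practical runs usually pick the initial centers at random.
    centers = [tuple(p) for p in points[:k]]
    groups = []
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # assignments are stable, nothing more to do
        centers = new_centers
    return centers, groups


# Hypothetical objects with two attributes each.
points = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 6.0)]
centers, groups = k_means(points, k=2)
print(centers)
print(groups)
```

The resulting centers and groups are the kind of structure a group-based neighbor search can be run against.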
![Page 18: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/18.jpg)
DataBase – entities
![Page 19: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/19.jpg)
DataBase
- The general structure of the database results from optimization issues
- Due to the universal purpose of the system, the database may contain many different tables of objects
- Need to use system tables for defining experiments
- Group Member as a temporary table?
![Page 20: DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski](https://reader035.vdocuments.net/reader035/viewer/2022071808/56649f095503460f94c1df34/html5/thumbnails/20.jpg)
Summary
There is still a lot of work to do...