DataBases & Data Mining Joined Specialization Project
„Data Mining Classification Tool”
By Mateusz Żochowski
& Jakub Strzemżalski
2
Agenda
General description of the problem
Functionality
Data Mining aspects
Algorithm and optimisation
Data Base aspects
General entities scheme
3
General Description
Universal tool
Different kinds of objects (e.g. preprocessed photos, hospital patient data)
Finding similar objects
Decision problems
4
Functionality
Independent system – user operated
Using data sets already provided or uploading new types
Influence on the way data is processed
Possible usage in bigger systems as a processing engine
Additional module used as a helper tool in more complex systems
5
General Use Case
6
Data Mining General Ideas
Description of an object
Definition of a distance
K-NN algorithm
Brief explanation of the algorithm
Optimization
Problem of comparing a large number of objects
Optimized solution – using the grouping idea
7
Definitions Objects
8
K-NN
K – Nearest Neighbors
Idea behind k-NN
Aim – finding the k objects most similar to the one being analyzed and then assigning it an appropriate decision
Method – calculating the distance from the analyzed object to the others in the database and picking the closest ones
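
The aim and method above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function names, the majority-vote rule for the decision, the placeholder distance and the sample data are all assumptions.

```python
from collections import Counter


def manhattan(a, b):
    # Placeholder distance (an assumption): sum of absolute attribute differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(query, objects, k=3, distance=manhattan):
    """Return (decision, neighbours) for `query`.

    `objects` is a list of (attributes, decision) pairs.
    """
    # Method: compute the distance from the query to every stored object
    # and keep the k closest ones.
    ranked = sorted(objects, key=lambda pair: distance(query, pair[0]))
    neighbours = ranked[:k]
    # Aim: assign the decision that is most common among the k neighbours.
    votes = Counter(decision for _, decision in neighbours)
    return votes.most_common(1)[0][0], neighbours


# Hypothetical data: (body temperature, pulse) -> decision.
patients = [
    ((36.6, 70), "healthy"),
    ((36.8, 72), "healthy"),
    ((39.1, 95), "sick"),
    ((38.7, 99), "sick"),
    ((36.5, 68), "healthy"),
]
decision, nearest = knn_classify((38.9, 97), patients, k=3)
```

Here two of the three nearest neighbours carry the decision "sick", so that is the decision assigned to the new object. Note that every query compares against all stored objects.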
9
K-NN Graphical representation
10
Definitions Distance
Calculations in multidimensional space
Coefficients
Alpha (α)
wi – weights – emphasising the importance of particular attributes
n – number of all the attributes
$$D(O_1, O_2) = \sum_{i=1}^{n} w_i \cdot |a_i(o_1) - a_i(o_2)|^{\alpha}$$
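
A weighted distance of this kind might be computed as in the sketch below. It assumes that objects are plain tuples of numeric attribute values and that alpha is an exponent applied to each absolute attribute difference; the slide does not state the role of alpha explicitly, so treat that as a guess. The function name and sample values are hypothetical.

```python
def weighted_distance(o1, o2, weights, alpha=1.0):
    """Sum over all n attributes of w_i * |a_i(o1) - a_i(o2)| ** alpha."""
    if not (len(o1) == len(o2) == len(weights)):
        raise ValueError("objects and weights must have the same length n")
    # Assumption: alpha acts as an exponent on each attribute difference.
    return sum(w * abs(a - b) ** alpha for w, a, b in zip(weights, o1, o2))


o1 = (1.0, 5.0, 2.0)
o2 = (4.0, 5.0, 0.0)
weights = (0.5, 2.0, 1.0)  # the second attribute counts most, but it does not differ here

d_linear = weighted_distance(o1, o2, weights, alpha=1.0)  # 0.5*3 + 2*0 + 1*2 = 3.5
d_square = weighted_distance(o1, o2, weights, alpha=2.0)  # 0.5*9 + 2*0 + 1*4 = 8.5
```

Raising a weight makes differences in that attribute dominate the result, which is how the weights express attribute importance.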
11
Optimisation
The reason – the cost of computing multidimensional distances from one object to all the others
Solution – improved k-NN
Result – better efficiency, because narrowing the set of possibly similar objects reduces the number of distance computations
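
One way to read the improved k-NN is sketched below. It assumes that objects have already been divided into groups, each with a representative centre, and that a new object is compared first with the centres and then only with the members of the nearest group(s). The data layout, the names and the Euclidean distance are assumptions made for illustration; the slides do not specify them.

```python
import math


def euclidean(a, b):
    # Stand-in distance (an assumption); any distance definition could be passed in.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grouped_knn(query, groups, k=2, n_groups=1, distance=euclidean):
    """Return (neighbours, number_of_distance_computations).

    `groups` is a list of (centre, members); members are (attributes, decision).
    """
    # Cheap step: one distance computation per group centre.
    nearest_groups = sorted(groups, key=lambda g: distance(query, g[0]))[:n_groups]
    # Expensive step, now restricted to the narrowed candidate set.
    candidates = [m for _, members in nearest_groups for m in members]
    candidates.sort(key=lambda m: distance(query, m[0]))
    return candidates[:k], len(groups) + len(candidates)


groups = [
    ((1.0, 1.0), [((0.0, 0.0), "a"), ((1.0, 2.0), "a"), ((2.0, 1.0), "a")]),
    ((9.0, 9.0), [((8.0, 9.0), "b"), ((9.0, 10.0), "b"), ((10.0, 8.0), "b")]),
]
neighbours, computations = grouped_knn((1.5, 1.5), groups, k=2)
```

With two groups of three objects, the query needs 2 + 3 = 5 distance computations instead of 6, and the saving grows with the number of groups. The result is approximate: a true neighbour lying just across a group border can be missed, which `n_groups` greater than 1 mitigates at extra cost.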
12
Step 1 - Group-oriented plane division
13
Step 2 – a new object appears
14
Step 3
15
Step 4
16
Step 5
17
Grouping problem
The problem – assigning objects to appropriate groups according to the chosen distance definition
Solution – a clustering algorithm
Brief example – the k-means algorithm
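
A minimal k-means sketch follows, to show how such grouping could work. The initialisation (the first k points), the iteration limit and the squared Euclidean distance are assumptions; the mean-based centre update fits that distance, and another distance definition may need a different update rule.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    # Component-wise mean of a non-empty list of equal-length tuples.
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def k_means(points, k, distance=squared_euclidean, iterations=20):
    """Return (centres, groups) after at most `iterations` refinement rounds."""
    centres = list(points[:k])  # naive initialisation (an assumption)
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: distance(p, centres[i]))
            groups[idx].append(p)
        # Update step: move each centre to the mean of its group.
        new_centres = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    return centres, groups


points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centres, groups = k_means(points, k=2)
```

On this toy data the points split into two groups of three, one around the lower-left corner and one around the upper-right corner.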
18
DataBase – entities
19
DataBase
General structure of the database results from optimization issues
Due to the universal purpose of the system, the database may contain many different tables of objects
System tables are needed for defining experiments
Group Member as a temporary table?
20
Summary
There is still a lot of work to do...