DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski

Post on 04-Jan-2016


Agenda

General description of the problem
Functionality
Data Mining aspects
  Algorithm and optimization
Database aspects
  General entities scheme

General Description

Universal tool
  Different kinds of objects (e.g. preprocessed photos, hospital patient data)
Finding similar objects
Decision problems

Functionality

Independent system – user operated
  Using sets of data already provided or uploading new types
  Influence on the way data is processed
Possible usage in bigger systems as a processing engine
  Additional module used as a helper tool in more complex systems

General Use Case

Data Mining General Ideas

Description of an object
Definition of a distance
K-NN algorithm
  Brief explanation of the algorithm
Optimization
  Problem of comparing a large number of objects
  Optimized solution – using the grouping idea

Definitions – Objects

K-NN

K – Nearest Neighbors
Idea behind k-NN
  Aim – finding the k objects most similar to the one we are analyzing and eventually assigning an appropriate decision
  Method – calculating the distance from the analyzed object to the others in our database and finding the closest ones
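The aim and method above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function names (`euclidean`, `knn_classify`), the sample data, and the use of plain Euclidean distance with a majority vote are all assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Plain Euclidean distance; a stand-in for whatever distance the tool defines."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, objects, k, distance=euclidean):
    """Return (decision, k nearest objects) for `query`.

    `objects` is a list of (attribute_vector, decision) pairs.
    """
    # Method: compute the distance from the query to every stored object
    # and keep the k closest ones.
    ranked = sorted(objects, key=lambda item: distance(query, item[0]))
    neighbors = ranked[:k]
    # Aim: assign the decision that is most common among the neighbours
    # (majority vote is an assumption; other voting schemes are possible).
    votes = Counter(decision for _, decision in neighbors)
    return votes.most_common(1)[0][0], neighbors

# Hypothetical data: two numeric attributes per object plus a decision.
patients = [
    ((1.0, 1.0), "healthy"),
    ((1.2, 0.8), "healthy"),
    ((5.0, 5.1), "ill"),
    ((4.8, 5.3), "ill"),
    ((5.2, 4.9), "ill"),
]
decision, nearest = knn_classify((4.9, 5.0), patients, k=3)
print(decision)
```

Note that every query compares the new object against all stored objects, which is the cost the optimization part of the talk addresses.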

K-NN – Graphical representation

Definitions – Distance

Calculations in multidimensional space
Coefficients
  α (alpha)
  w_i – weights, underlining the importance of particular attributes
  n – number of all the attributes

$$D(O_1, O_2) = \sum_{i=1}^{n} w_i \cdot \lvert a_i(O_1) - a_i(O_2) \rvert^{\alpha}$$
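A small Python sketch of such a weighted distance may help. It assumes the distance is a weighted sum, over all n attributes, of the absolute attribute differences raised to the power alpha; the exact role of alpha is an assumption here, and the function name `weighted_distance` is hypothetical.

```python
def weighted_distance(o1, o2, weights, alpha=1.0):
    """Weighted distance between two objects given as attribute vectors.

    Assumed form: sum over attributes of w_i * |a_i(o1) - a_i(o2)| ** alpha.
    """
    if not (len(o1) == len(o2) == len(weights)):
        raise ValueError("objects and weights must have the same number of attributes")
    return sum(w * abs(a - b) ** alpha for w, a, b in zip(weights, o1, o2))

# The second attribute counts half as much as the first; the third is identical.
d = weighted_distance((1.0, 2.0, 3.0), (2.0, 0.0, 3.0),
                      weights=(1.0, 0.5, 2.0), alpha=2.0)
print(d)  # 1.0 * 1**2 + 0.5 * 2**2 + 2.0 * 0**2 = 3.0
```

Setting a weight to zero removes an attribute from the comparison entirely, which is one way a user could influence how the data is processed.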

Optimization

Reason – the cost of computing multidimensional distances from one object to all elements
Solution – improved k-NN
Result – better efficiency, because narrowing the set of possibly similar objects reduces the number of distance computations
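One way to narrow the candidate set is sketched below. It is an assumption about how the grouping idea could work, not the authors' algorithm: objects are pre-assigned to the group of their nearest center, and a query is compared only with the members of the closest group or groups. The names `build_groups` and `grouped_knn`, the Euclidean distance, and the sample data are all hypothetical.

```python
import math

def build_groups(objects, centers):
    """Assign every object to the group of its nearest center."""
    groups = {i: [] for i in range(len(centers))}
    for obj in objects:
        nearest = min(range(len(centers)), key=lambda i: math.dist(obj, centers[i]))
        groups[nearest].append(obj)
    return groups

def grouped_knn(query, groups, centers, k, n_groups=1):
    """Search only the `n_groups` groups whose centers are closest to the query.

    Returns (k nearest candidates, number of object distances computed).
    """
    # A few cheap comparisons against the centers replace comparisons
    # against every stored object.
    order = sorted(range(len(centers)), key=lambda i: math.dist(query, centers[i]))
    candidates = [obj for i in order[:n_groups] for obj in groups[i]]
    candidates.sort(key=lambda obj: math.dist(query, obj))
    return candidates[:k], len(candidates)

objects = [(0, 1), (1, 0), (1, 1), (9, 10), (10, 9), (10, 10), (11, 11)]
centers = [(0, 0), (10, 10)]
groups = build_groups(objects, centers)
neighbors, compared = grouped_knn((9, 9), groups, centers, k=2)
print(neighbors, compared)
```

The trade-off is that the result is approximate: a true nearest neighbour lying just across a group boundary is missed unless `n_groups` is increased.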

Step 1 – Group-oriented plane division

Step 2 – a new object appears

Step 3

Step 4

Step 5

Grouping problem

The problem – assigning objects to appropriate groups according to the chosen distance definition

Solution – a clustering algorithm

Brief example – k-means algorithm
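As a brief illustration, a minimal k-means loop could look like the sketch below. It assumes numeric attribute vectors, Euclidean distance, and initial centers supplied by the caller; the helper names `assign` and `kmeans` and the sample points are hypothetical.

```python
import math

def assign(points, centers):
    """Put every point into the group of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups

def kmeans(points, centers, max_iterations=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        groups = assign(points, centers)
        # New center = per-attribute mean of the group's members;
        # an empty group keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for center, members in zip(centers, groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Re-assign once more so the returned groups match the returned centers.
    return centers, assign(points, centers)

points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centers, groups = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Averaging attributes fits Euclidean distance naturally; with a differently defined distance, another clustering algorithm or center definition may be a better match.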

DataBase – entities

DataBase

The general structure of the database results from optimization issues
Due to the universal purpose of the system, the database may contain many different tables of objects
System tables are needed for defining experiments
Group Member as a temporary table?
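To make the last two points concrete, here is a purely hypothetical sketch using Python's built-in `sqlite3` module. None of the table or column names come from the project's entity scheme: `object_type`, `experiment`, and `group_member` are invented for illustration, with group membership kept in a temporary table that disappears when the connection closes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical system table: one row per uploaded kind of object.
CREATE TABLE object_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    table_name TEXT NOT NULL      -- table holding objects of this type
);
-- Hypothetical system table: parameters of one classification experiment.
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    object_type_id INTEGER NOT NULL REFERENCES object_type(id),
    k INTEGER NOT NULL,
    alpha REAL NOT NULL
);
-- Group membership as a temporary table, rebuilt per session.
CREATE TEMP TABLE group_member (
    experiment_id INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    object_id INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO object_type (name, table_name) VALUES (?, ?)",
             ("patient", "patient_objects"))
conn.execute("INSERT INTO experiment (object_type_id, k, alpha) VALUES (1, 3, 2.0)")
conn.executemany("INSERT INTO group_member VALUES (1, ?, ?)",
                 [(0, 10), (0, 11), (1, 12)])

# Size of each group in experiment 1.
sizes = conn.execute(
    "SELECT group_id, COUNT(*) FROM group_member "
    "WHERE experiment_id = 1 GROUP BY group_id ORDER BY group_id"
).fetchall()
print(sizes)
```

A temporary table keeps regrouping cheap, but the groups have to be recomputed for every session; a permanent table would trade storage for that recomputation.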

Summary

There is still a lot of work to do...