data mining 101

Data Mining 101 George Tziralis, FOSS Conf, June 19 09, Athens, GR

Upload: george-tziralis

Post on 11-May-2015


DESCRIPTION

A presentation I gave earlier today at the 4th FOSS Conference in Greece, introducing data mining, its principles and applications to a wider public, and showcasing the use of Weka software for all core data mining purposes.

TRANSCRIPT

Page 1: Data Mining 101

Data Mining 101

George Tziralis, FOSS Conf, June 19 09, Athens, GR

Page 2: Data Mining 101

the facts

Page 3: Data Mining 101

the facts

The data gap

Page 4: Data Mining 101

the promise

understand and take advantage of the world’s information

Page 5: Data Mining 101

the name

data mining: statistics at speed, scale and simplicity

Page 6: Data Mining 101

what is

•Databases

•Statistics

•Artificial Intelligence

Page 7: Data Mining 101

the difference

•statistics: define a hypothesis, then test

•data mining: test all possible hypotheses

•is it possible? YES!

Page 8: Data Mining 101

the tasks

•classification

•association

•clustering

•prediction

Page 9: Data Mining 101

the process

•data input & exploration

•preprocessing

•data mining algorithms

•evaluation & interpretation

Page 10: Data Mining 101

an example

#   color   size  value  buy
1   blue    5.32  b      no
2   yellow  8.57  a      yes
3   green   1.23  c      no
4   yellow  9.35  c      yes
5   red     5.99  b      yes
6   red     4.43  b      yes
7   green   6.21  b      no
8   white   4.89  a      yes
9   black   5.15  b      no
10  green   5.67  b      no

Page 11: Data Mining 101

an example

•instance: one row of the table on the previous page

•attribute: a descriptive column (color, size, value)

•target: the column to predict (buy)
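To make the three terms concrete, here is a minimal Python sketch of the same ten rows. The names `Instance`, `INSTANCES`, `ATTRIBUTES`, `TARGET` and `target_counts` are made up for this illustration and have nothing to do with Weka's own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One row of the example table."""
    color: str    # attribute (nominal)
    size: float   # attribute (numeric)
    value: str    # attribute (nominal)
    buy: str      # target: the column we want to predict


# The ten rows from the slide, in the same order.
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]

INSTANCES = [Instance(*row) for row in ROWS]
ATTRIBUTES = ["color", "size", "value"]
TARGET = "buy"


def target_counts(instances):
    """Count how many instances carry each target value."""
    counts = {}
    for inst in instances:
        counts[inst.buy] = counts.get(inst.buy, 0) + 1
    return counts


if __name__ == "__main__":
    print(len(INSTANCES), "instances,", len(ATTRIBUTES), "attributes, target:", TARGET)
    print(target_counts(INSTANCES))
```

The table turns out to be balanced: five instances with buy = yes and five with buy = no.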

Page 12: Data Mining 101

so far

[figure: the example data along a size axis running from 0 to 10.0]

Page 13: Data Mining 101

now

•if size = [4.0 - 7.0] & value = {b,c} then buy = no
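A rule like this is easy to check against the ten example rows. The sketch below is only an illustration, not how Weka evaluates rules; it assumes both ends of the size interval are inclusive, and the names `rule_fires` and `evaluate` are invented for the example.

```python
# The ten rows of the example table: (color, size, value, buy).
DATA = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def rule_fires(size, value):
    """Condition of the rule: size in [4.0, 7.0] (assumed inclusive) and value in {b, c}."""
    return 4.0 <= size <= 7.0 and value in {"b", "c"}


def evaluate(data):
    """Return (covered, correct): rows the rule applies to, and rows where buy really is 'no'."""
    covered = [row for row in data if rule_fires(row[1], row[2])]
    correct = [row for row in covered if row[3] == "no"]
    return len(covered), len(correct)


if __name__ == "__main__":
    covered, correct = evaluate(DATA)
    print(f"rule covers {covered} of {len(DATA)} instances, {correct} of them correctly")
```

On this table the rule covers six instances and gets four of them right; the two red rows fall inside the size range with value b but have buy = yes.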

Page 14: Data Mining 101

now

•If color = yellow then buy = yes

•If color = red then buy = yes

•If color = white then buy = yes

•If color = green then buy = no

•If color = blue then buy = no

•If color = black then buy = no
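Rules of this shape can be derived mechanically: for every value of an attribute, predict the target that occurs most often with it, then count the mistakes. The sketch below does that for the nominal attributes of the example table. It is a simplified illustration in the spirit of trying many hypotheses, not Weka's implementation, and `rules_for` and `errors` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# The ten rows of the example table: (color, size, value, buy).
DATA = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]
COLOR, VALUE, BUY = 0, 2, 3  # column positions used below


def rules_for(data, attr_index, target_index=BUY):
    """Map each attribute value to the most frequent target seen with it."""
    votes = defaultdict(Counter)
    for row in data:
        votes[row[attr_index]][row[target_index]] += 1
    return {attr_value: counts.most_common(1)[0][0]
            for attr_value, counts in votes.items()}


def errors(data, rules, attr_index, target_index=BUY):
    """Number of rows that the rule set predicts wrongly."""
    return sum(1 for row in data if rules[row[attr_index]] != row[target_index])


if __name__ == "__main__":
    for name, index in (("color", COLOR), ("value", VALUE)):
        rules = rules_for(DATA, index)
        print(name, rules, "errors:", errors(DATA, rules, index))
```

On the ten rows, the color rules reproduce the six rules above with no errors, while rules built on value alone misclassify three rows.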

Page 15: Data Mining 101

ok, cool! but how?

Page 16: Data Mining 101

the tool

Weka: Waikato Environment for Knowledge Analysis

OSS, written in Java, providing API

Page 17: Data Mining 101

start

start -> explorer

Page 18: Data Mining 101

explore

open file -> data -> contact-lenses.arff

Page 19: Data Mining 101

.arff how-to

% ARFF file of the example’s data
@relation testset
@attribute color {blue, yellow, green, red, white, black}
@attribute size numeric
@attribute value {a, b, c}
@attribute buy {yes, no}
@data
blue, 5.32, b, no
yellow, 8.57, a, yes
green, 1.23, c, no
...
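If the data already lives in a program, the file can be generated instead of typed. This is a minimal sketch that writes the example table in the layout above; it collects the nominal values from the rows so that every value occurring in the data is declared in the header. The function names and the output file name `testset.arff` are assumptions made for the illustration.

```python
# The ten rows of the example table: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def nominal_values(rows, index):
    """Distinct values of one column, in order of first appearance."""
    seen = []
    for row in rows:
        if row[index] not in seen:
            seen.append(row[index])
    return seen


def to_arff(relation, rows):
    """Render the rows as ARFF text: header declarations, then one data line per row."""
    colors = nominal_values(rows, 0)
    values = sorted(nominal_values(rows, 2))
    lines = [
        "% ARFF file of the example's data",
        f"@relation {relation}",
        "@attribute color {" + ", ".join(colors) + "}",
        "@attribute size numeric",
        "@attribute value {" + ", ".join(values) + "}",
        "@attribute buy {yes, no}",
        "@data",
    ]
    for color, size, value, buy in rows:
        lines.append(f"{color}, {size}, {value}, {buy}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical output path; open the resulting file from the Explorer.
    with open("testset.arff", "w", encoding="utf-8") as handle:
        handle.write(to_arff("testset", ROWS))
```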

Page 20: Data Mining 101

preprocess

filter -> ... {tons of filters}

Page 21: Data Mining 101

visualize

tab “visualize” (per target/class)

Page 22: Data Mining 101

visualize

tab “preprocess” -> visualize all (per class)

Page 23: Data Mining 101

select attributes

tab “select attributes” (default settings)

Page 24: Data Mining 101

classify

tab “classify” -> rules -> PART -> start!

Page 25: Data Mining 101

associate

tab “associate” -> start! (default settings)
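Association rules are usually ranked by support (how often the whole rule holds in the data) and confidence (how often the consequent holds when the antecedent does). The sketch below computes both for hand-picked rules on the nominal columns of the talk's example table. It only illustrates the two measures and is not what the tab runs internally; `matches` and `support_and_confidence` are names invented here.

```python
# Nominal columns of the example table, one dict per instance.
ROWS = [
    {"color": "blue", "value": "b", "buy": "no"},
    {"color": "yellow", "value": "a", "buy": "yes"},
    {"color": "green", "value": "c", "buy": "no"},
    {"color": "yellow", "value": "c", "buy": "yes"},
    {"color": "red", "value": "b", "buy": "yes"},
    {"color": "red", "value": "b", "buy": "yes"},
    {"color": "green", "value": "b", "buy": "no"},
    {"color": "white", "value": "a", "buy": "yes"},
    {"color": "black", "value": "b", "buy": "no"},
    {"color": "green", "value": "b", "buy": "no"},
]


def matches(row, conditions):
    """True if the row has every attribute=value pair in conditions."""
    return all(row[name] == wanted for name, wanted in conditions.items())


def support_and_confidence(rows, antecedent, consequent):
    """Support = share of all rows where both sides hold;
    confidence = share of antecedent rows where the consequent also holds."""
    n_antecedent = sum(1 for row in rows if matches(row, antecedent))
    n_both = sum(1 for row in rows
                 if matches(row, antecedent) and matches(row, consequent))
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    for antecedent in ({"color": "green"}, {"value": "b"}):
        support, confidence = support_and_confidence(ROWS, antecedent, {"buy": "no"})
        print(antecedent, "=> buy=no",
              f"support={support:.2f} confidence={confidence:.2f}")
```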

Page 26: Data Mining 101

pls tell me more!

Page 27: Data Mining 101

the book

your data mining & data guide!

Page 28: Data Mining 101
Page 29: Data Mining 101

thank you!