Data Mining (Student Presentation)
Samira Roshan_Asma Akbari
Mehr 87-88



The Data Gap

• There is often information hidden in the data that is not readily evident.
• Human analysts may take weeks to discover useful information.
• Much of the data is never analyzed at all.

[Figure: total new disk (TB) since 1995 compared with the number of analysts]


• Data are collected and stored at enormous speeds (GB/hour).
• Traditional techniques are infeasible for such raw data.
• Data mining may help scientists analyze such data.


[Figure: Database → Target Data → Transformed Data → Patterns and Rules]
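The stages in the figure can be read as a chain of selection, transformation and mining steps. The Python sketch below chains three such steps on a few hypothetical records; the field names, the age filter, the cut-off values, the helper names (`select_target`, `transform`, `mine_patterns`) and the idea of a "pattern" as a frequent attribute value among buyers are all assumptions for illustration, not taken from the slides.

```python
from collections import Counter

# Hypothetical records playing the role of the "Database" box.
DATABASE = [
    {"id": 1, "age": 23, "salary": 3.5, "buys": "yes"},
    {"id": 2, "age": 41, "salary": 7.0, "buys": "yes"},
    {"id": 3, "age": 35, "salary": 6.2, "buys": "no"},
    {"id": 4, "age": 58, "salary": 8.1, "buys": "yes"},
    {"id": 5, "age": 19, "salary": 1.2, "buys": "no"},
]


def select_target(db, min_age=20):
    """Database -> Target Data: keep relevant rows and drop the irrelevant id field."""
    return [{k: r[k] for k in ("age", "salary", "buys")} for r in db if r["age"] >= min_age]


def transform(rows, age_cut=40, salary_cut=5.0):
    """Target Data -> Transformed Data: discretize numeric fields into nominal ones."""
    return [
        {
            "age": "young" if r["age"] < age_cut else "old",
            "salary": "high" if r["salary"] > salary_cut else "low",
            "buys": r["buys"],
        }
        for r in rows
    ]


def mine_patterns(rows, min_count=2):
    """Transformed Data -> Patterns: attribute values seen at least min_count times among buyers."""
    counts = Counter(
        (k, v) for r in rows if r["buys"] == "yes" for k, v in r.items() if k != "buys"
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = mine_patterns(transform(select_target(DATABASE)))
print(patterns)  # {('age', 'old'): 2, ('salary', 'high'): 2}
```

Keeping each stage a separate function mirrors the boxes of the figure, so any one step (for example the discretization) can be swapped without touching the others.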


• Classification
• Regression
• Collaborative Filtering
• Clustering
• Association rules
• Deviation detection
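Deviation detection is listed here but not developed further in these slides. One simple way to flag deviations, chosen purely for illustration, is a z-score test: a value is reported when it lies more than a chosen number of standard deviations from the mean. The threshold of 2, the `deviations` helper and the sales figures below are hypothetical.

```python
from statistics import mean, pstdev


def deviations(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


daily_sales = [100, 102, 98, 101, 99, 103, 97, 250]  # hypothetical data
print(deviations(daily_sales))  # [250]
```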


Classification

[Figure: a classifier with decision rules (Salary > 5 L, Prof. = Exec) is applied to a new applicant's data]

Many approaches: statistics, decision trees, neural networks, ...
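The figure names two decision rules but not how they combine or what the classes are. The Python sketch below applies such rules to a new applicant; the class labels "good"/"bad", the first-matching-rule policy, the default class and the `Applicant`/`classify` names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    salary: float     # in the same unit as the slide's "5 L" threshold
    profession: str


# Hypothetical rule list: (description, condition, class label).
# The labels and the "first matching rule wins" policy are assumptions.
RULES = [
    ("Salary > 5 L", lambda a: a.salary > 5, "good"),
    ("Prof. = Exec", lambda a: a.profession == "Exec", "good"),
]
DEFAULT_CLASS = "bad"


def classify(applicant):
    """Return the label of the first rule whose condition holds, else the default class."""
    for _description, condition, label in RULES:
        if condition(applicant):
            return label
    return DEFAULT_CLASS


print(classify(Applicant(salary=6, profession="Clerk")))  # good
print(classify(Applicant(salary=3, profession="Exec")))   # good
print(classify(Applicant(salary=3, profession="Clerk")))  # bad
```

A learned classifier (a decision tree, for instance) would produce such rules from old data instead of having them written by hand.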


Clustering: unsupervised learning, used when old data with class labels are not available, e.g. when introducing a new product.
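The slide does not name a clustering algorithm. k-means is one widely used choice and is sketched below in plain Python as an illustration; the points (think of them as two hypothetical customer features), the starting centroids and the `assign`/`kmeans` names are made up. Note that no class labels are used anywhere.

```python
import math


def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Plain k-means starting from the given centroids; returns (centroids, labels)."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for i, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(c)
        centroids = new_centroids
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


# Hypothetical unlabeled 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```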


Association rules: given a set T of groups of items. Example: the set of item sets purchased.

...
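The slide is cut off before it defines a rule. In the usual formulation (assumed here, since the text is elided) a rule A → B over the set T is scored by its support, the fraction of groups containing every item of A and B, and its confidence, the fraction of groups containing A that also contain B. A small Python sketch with hypothetical shopping baskets; the `support`/`confidence` helper names are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    base = support(antecedent, transactions)
    return support(both, transactions) / base if base else 0.0


# Hypothetical set T of item sets purchased.
T = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter"},
]
print(support({"milk", "bread"}, T))       # 0.5
print(confidence({"milk"}, {"bread"}, T))  # about 0.667
```

An algorithm such as Apriori would search for all rules whose support and confidence exceed user-chosen thresholds; the sketch only scores a single given rule.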


The use of data, particularly data about people, for data mining has serious ethical implications.

When applied to people, data mining can be used to discriminate.


Data mining (or even simple analysis) on people may come up with a profile that would raise controversial issues of:
– Discrimination
– Privacy
– Security

Examples:
– Should males between 18 and 35 from countries that produced terrorists be singled out for search before a flight?
– Can people be denied a mortgage based on age, sex, or race?
– Women live longer. Should they pay less for life insurance?


Instances: the individual, independent examples of a concept.

Attributes: measuring aspects of an instance. We will focus on nominal and numeric ones.


Attributes:
• number of nuclei (values: 1, 2)
• number of tails (values: 1, 2)
• color (values: light, dark)
• wall (values: thin, thick)

Classes: Lethargia, Burpoma, Healthy
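The attribute list can be written down as a small schema, with each instance stored as a mapping from attribute names to values. In the Python sketch below all four attributes are treated as nominal (a fixed set of allowed values), which is an assumption; the dictionary keys, `ATTRIBUTES`, `CLASSES` and the `validate` helper are hypothetical names.

```python
# Hypothetical schema for the cell example: attribute -> allowed values.
# All attributes are treated as nominal here (an assumption of this sketch).
ATTRIBUTES = {
    "tails": (1, 2),
    "nuclei": (1, 2),
    "color": ("light", "dark"),
    "wall": ("thin", "thick"),
}
CLASSES = ("Lethargia", "Burpoma", "Healthy")


def validate(instance):
    """True if the instance has exactly the schema's attributes, each with an allowed value."""
    if set(instance) != set(ATTRIBUTES):
        return False
    return all(instance[name] in values for name, values in ATTRIBUTES.items())


cell = {"tails": 1, "nuclei": 2, "color": "dark", "wall": "thick"}
print(validate(cell))                      # True
print(validate({**cell, "color": "red"}))  # False
```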


# Color       Light   Dark
Lethargia       3       2
Burpoma         1       2
Healthy         2       2

# Tails         1       2
Lethargia       5       0
Burpoma         0       3
Healthy         2       2

# Nucleus       1       2
Lethargia       4       1
Burpoma         0       3
Healthy         2       2

# Membrane    Thin   Thick
Lethargia       3       2
Burpoma         2       1
Healthy         3       1
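The slides do not state how the root attribute is chosen. Assuming the ID3 information-gain criterion, the counts above are enough to compute it: the gain of an attribute is the entropy of the overall class distribution minus the size-weighted entropy of the class distribution inside each branch. The Python sketch below uses only those counts (the `TABLES` keys and helper names are hypothetical); under this assumption # Tails scores highest, which agrees with the tree built from it.

```python
from math import log2

# Counts from the tables above: attribute -> class -> (count for first value, second value).
TABLES = {
    "color":    {"Lethargia": (3, 2), "Burpoma": (1, 2), "Healthy": (2, 2)},  # light, dark
    "tails":    {"Lethargia": (5, 0), "Burpoma": (0, 3), "Healthy": (2, 2)},  # 1, 2
    "nucleus":  {"Lethargia": (4, 1), "Burpoma": (0, 3), "Healthy": (2, 2)},  # 1, 2
    "membrane": {"Lethargia": (3, 2), "Burpoma": (2, 1), "Healthy": (3, 1)},  # thin, thick
}


def entropy(counts):
    """Entropy in bits of a class distribution given as counts (zero counts contribute 0)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def information_gain(table):
    """Class entropy minus the size-weighted entropy of each attribute-value branch."""
    class_totals = [sum(row) for row in table.values()]
    n = sum(class_totals)
    branches = list(zip(*table.values()))  # class counts inside each value's branch
    remainder = sum(sum(b) / n * entropy(b) for b in branches if sum(b))
    return entropy(class_totals) - remainder


gains = {name: information_gain(t) for name, t in TABLES.items()}
for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {g:.3f}")
print(max(gains, key=gains.get))  # tails
```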


[Figure: decision tree with # Tails at the root]


Counts for the instances with # Tails = 1:

# Color       Light   Dark
Lethargia       3       2
Burpoma         0       0
Healthy         0       2

# Nucleus       1       2
Lethargia       4       1
Burpoma         0       0
Healthy         0       2

# Membrane    Thin   Thick
Lethargia       3       2
Burpoma         0       0
Healthy         0       2
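Which attribute to test next inside the # Tails = 1 branch can be decided from the counts above. The slides do not name a splitting criterion; the Python sketch below assumes Gini impurity (the criterion used by CART) and picks the attribute whose branches have the lowest size-weighted impurity. `BRANCH_TABLES`, `gini` and `weighted_gini` are hypothetical names. Under this assumption # Nucleus wins, consistent with the rules the slides end up with.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_i^2) of a class distribution given as counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(table):
    """table maps class -> counts per attribute value; returns the split's weighted impurity."""
    branches = list(zip(*table.values()))  # class distribution inside each branch
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * gini(b) for b in branches if sum(b))


# Counts for the # Tails = 1 instances, copied from the tables above.
BRANCH_TABLES = {
    "color":    {"Lethargia": (3, 2), "Burpoma": (0, 0), "Healthy": (0, 2)},
    "nucleus":  {"Lethargia": (4, 1), "Burpoma": (0, 0), "Healthy": (0, 2)},
    "membrane": {"Lethargia": (3, 2), "Burpoma": (0, 0), "Healthy": (0, 2)},
}

scores = {name: weighted_gini(t) for name, t in BRANCH_TABLES.items()}
best = min(scores, key=scores.get)
print(best)  # nucleus
```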


[Figure: decision tree with # Tails at the root, # Nucleus below it, and a Lethargia leaf]


[Figure: the complete decision tree, testing # Tails, # Nucleus and Color, with leaves Lethargia, Healthy and Burpoma]


If # Tails = 1 then
    If # Nucleus = 1 then class = Lethargia
    else if Color = light then class = Lethargia
    else class = Healthy
else
    If # Nucleus = 1 then class = Healthy
    else class = Burpoma
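The rules translate directly into a function. In the Python sketch below, the name `classify_cell` and the argument encoding (integers for the counts, the strings "light"/"dark" for color) are assumptions; the wall (membrane) attribute is not a parameter because the tree never tests it.

```python
def classify_cell(tails, nucleus, color):
    """Apply the decision tree from the rules above to one cell."""
    if tails == 1:
        if nucleus == 1:
            return "Lethargia"
        return "Lethargia" if color == "light" else "Healthy"
    # tails == 2
    return "Healthy" if nucleus == 1 else "Burpoma"


print(classify_cell(2, 2, "dark"))  # Burpoma
```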

