data mining (student presentation) samira roshan_asma akbari mehr 87-88
Data Mining (Student Presentation)
Samira Roshan, Asma Akbari
Mehr 87-88
• There is often information hidden in the data that is not readily evident
• Human analysts may take weeks to discover useful information
• Much of the data is never analyzed at all
The Data Gap
[Figure: total new disk (TB) since 1995 vs. number of analysts]
• Data collected and stored at enormous speeds (GB/hour)
• Traditional techniques infeasible for raw data
• Data mining may help scientists
Database
Target Data
Transformed Data
Patterns and Rules
Classification
Regression
Collaborative Filtering
Clustering
Association rules
Deviation detection
[Figure: classifier applying decision rules (Salary > 5 L, Prof. = Exec) to a new applicant's data]
Many approaches: Statistics, Decision Trees, Neural Networks, ...
Unsupervised learning is used when old data with class labels is not available, e.g. when introducing a new product.
Given a set T of groups of items
Example: set of item sets purchased
...
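The slide text is cut off at this point. As a rough sketch, assuming the usual association-rule setting in which one looks for items that frequently occur together, support and confidence over a set T could be computed as below. The transactions and item names are hypothetical and not taken from the presentation.

```python
# Hypothetical transactions: each set is one group of items purchased together.
T = [
    {"milk", "cereal"},
    {"tea", "milk"},
    {"tea", "rice", "bread"},
    {"cereal", "milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs together) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


print(support({"milk", "cereal"}, T))
print(confidence({"cereal"}, {"milk"}, T))
```

A rule such as cereal → milk would typically be reported only if both values exceed user-chosen thresholds.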
The use of data, particularly about people, for data mining has serious ethical implications.
When applied to people, data mining can discriminate.
Data mining (or simple analysis) on people may come with a profile that would raise controversial issues of:
– Discrimination
– Privacy
– Security
Examples:
– Should males between 18 and 35 from countries that produced terrorists be singled out for search before flight?
– Can people be denied a mortgage based on age, sex, race?
– Women live longer. Should they pay less for life insurance?
Instances: the individual, independent examples of a concept
Attributes: measuring aspects of an instance
We will focus on nominal and numeric ones
• number of nuclei (values: 1, 2)
• number of tails (values: 1, 2)
• color (values: light, dark)
• wall (values: thin, thick)
Classes: Lethargia, Burpoma, Healthy
# Color      Light  Dark
Lethargia    3      2
Burpoma      1      2
Healthy      2      2

# Tails      1      2
Lethargia    5      0
Burpoma      0      3
Healthy      2      2

# Nucleus    1      2
Lethargia    4      1
Burpoma      0      3
Healthy      2      2

# Membrane   Thin   Thick
Lethargia    3      2
Burpoma      2      1
Healthy      3      1
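The presentation does not say which criterion is used to pick the attribute to split on. As one possible illustration, the sketch below assumes information gain (entropy reduction), a common choice for decision trees, and applies it to the per-attribute class counts listed above. The names `COUNTS`, `entropy` and `information_gain` are introduced here for illustration only.

```python
from math import log2

# COUNTS[attribute][value] = (Lethargia, Burpoma, Healthy) counts from the tables.
COUNTS = {
    "Color": {"light": (3, 1, 2), "dark": (2, 2, 2)},
    "Tails": {1: (5, 0, 2), 2: (0, 3, 2)},
    "Nucleus": {1: (4, 0, 2), 2: (1, 3, 2)},
    "Membrane": {"thin": (3, 2, 3), "thick": (2, 1, 1)},
}


def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(split):
    """Entropy before the split minus the weighted entropy after it."""
    columns = list(split.values())
    class_totals = [sum(col[i] for col in columns) for i in range(len(columns[0]))]
    n = sum(class_totals)
    remainder = sum(sum(col) / n * entropy(col) for col in columns)
    return entropy(class_totals) - remainder


gains = {name: information_gain(split) for name, split in COUNTS.items()}
best = max(gains, key=gains.get)
print(best, gains)
```

Under this assumed criterion the number of tails gives the largest gain, which agrees with Tails appearing at the root of the tree.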
Tails
Counts for the instances with # Tails = 1:

# Color      Light  Dark
Lethargia    3      2
Burpoma      0      0
Healthy      0      2

# Nucleus    1      2
Lethargia    4      1
Burpoma      0      0
Healthy      0      2

# Membrane   Thin   Thick
Lethargia    3      2
Burpoma      0      0
Healthy      0      2
[Figure: decision tree grown step by step. Root: # Tails. Second level: # Nucleus on each branch. Third level: Color. Leaves: Lethargia, Healthy, Burpoma.]
If # Tails = 1 then
    If # Nucleus = 1 then class = Lethargia
    else If color = light then class = Lethargia
         else class = Healthy
else
    If # Nucleus = 1 then class = Healthy
    else class = Burpoma
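The rule set above translates directly into a small function. This is a minimal sketch; the function name `classify` and the argument encoding (integers for counts, lowercase strings for color) are assumptions made for the example.

```python
def classify(tails, nucleus, color):
    """Apply the decision rules to one cell.

    tails and nucleus take the values 1 or 2; color is "light" or "dark".
    The membrane (wall) attribute is not used by the rules.
    """
    if tails == 1:
        if nucleus == 1:
            return "Lethargia"
        # tails == 1 and nucleus == 2: color decides
        return "Lethargia" if color == "light" else "Healthy"
    # tails == 2
    if nucleus == 1:
        return "Healthy"
    return "Burpoma"


print(classify(tails=1, nucleus=2, color="dark"))
```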