data mining overview

June 7, 2022 1 Data Mining 27/Sep/2008

Upload: golda-margret-sheeba-j

Post on 21-Jan-2015

Category:

Education


DESCRIPTION

Overall View about data mining

TRANSCRIPT

Page 1: Data Mining Overview

Data Mining

27/Sep/2008

Page 2: Data Mining Overview

Evolution of Database Technology

YEAR     PURPOSE
1960s    Network model, batch reports
1970s    Relational data model, executive information systems
1980s    Application-specific DBMS (spatial data, scientific data, image data, …)
1990s    Terabyte data warehouses, object-oriented, middleware and web technology
2000s    Business process
2010s    Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems

Page 3: Data Mining Overview

Motivation: Necessity is the mother of invention

Data explosion problem
◦ Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories

We are drowning in data, but starving for knowledge!

Solution: Data warehousing and data mining
◦ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Page 4: Data Mining Overview

Why Data Mining?

Data, data, data everywhere …

I can’t find the data I need – data is scattered over the network

I can’t get the data I need

I can’t understand the data I need

I can’t use the data I found

Page 5: Data Mining Overview

An abundance of data
◦ Supermarket scanners, POS data
◦ Credit card transactions
◦ Call center records
◦ ATM machines
◦ Demographic data
◦ Sensor networks
◦ Cameras
◦ Web server logs
◦ Customer web site trails
◦ Geographic Information Systems
◦ National medical records
◦ Weather images

This data occupies
◦ Terabytes - 10^12 bytes
◦ Petabytes - 10^15 bytes
◦ Exabytes - 10^18 bytes
◦ Zettabytes - 10^21 bytes
◦ Yottabytes - 10^24 bytes

Walmart - 24 Terabytes

Page 6: Data Mining Overview

Process of sorting through large amounts of data and picking out relevant information

Process of analyzing data from different perspectives and summarizing it into useful information

Discovering hidden value in a database

It is the non-trivial process of identifying valid, novel, useful and understandable patterns in data

Extracting or mining knowledge from large amounts of data

Page 7: Data Mining Overview

History Notes – Many Names of Data Mining

YEAR    NAMES                                 USED BY
1960    Data Fishing, Data Dredging           Statisticians
1990    Data Mining                           DB community, business
1989    Knowledge Discovery in Databases      AI, machine learning community

Other Names

Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction,

Page 8: Data Mining Overview

Data Warehousing provides the Enterprise with a memory

Data Mining provides the Enterprise with intelligence

Page 9: Data Mining Overview

Why Data Mining? (Cont.)

A Data Warehouse is a single, complete and consistent store of data from a variety of different sources, made available to end users

For example, AT&T handles billions of calls per day. Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
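To get a feel for the scale of the VLBI example above, here is a rough back-of-the-envelope calculation. It is only a sketch under assumptions that are not in the slides: all 16 telescopes stream continuously for the whole 25 days, and 1 Gigabit is taken as 10^9 bits. The function name `session_bytes` is made up for illustration.

```python
# Rough data volume for the VLBI example.
# Assumptions (not stated in the slides): every telescope streams
# continuously for the full session, and 1 Gigabit = 10**9 bits.
TELESCOPES = 16
BITS_PER_SECOND = 10**9
SESSION_DAYS = 25
SECONDS_PER_DAY = 24 * 60 * 60


def session_bytes(telescopes=TELESCOPES,
                  bits_per_second=BITS_PER_SECOND,
                  days=SESSION_DAYS):
    """Total bytes produced by all telescopes over one session."""
    total_bits = telescopes * bits_per_second * days * SECONDS_PER_DAY
    return total_bits // 8  # 8 bits per byte


total = session_bytes()
print(f"{total / 10**15:.2f} petabytes")  # prints "4.32 petabytes"
```

Under these assumptions a single session lands in the petabyte range.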

We need data mining to
◦ Transform data into information that is useful to users
◦ Present data in a useful format
◦ Provide data access to business analysts and information technology professionals

Page 10: Data Mining Overview

Data Mining is the technique used to carry out KDD.

Data Mining turns data into information and then into knowledge

Data Mining Process: Data → Information → Knowledge

Page 11: Data Mining Overview

Steps in Data Mining

1. Data cleaning
To remove noise and inconsistent data

2. Data integration
To integrate (compile) multiple data sources

3. Data selection
Data relevant to the analysis is selected

4. Data transformation
Summary, normalization and aggregation operations are performed (converting the data into two-dimensional form) to consolidate the data

Page 12: Data Mining Overview

Steps in Data Mining (Cont.)

5. Data mining
Intelligent methods are applied to the data to discover knowledge or patterns

6. Pattern evaluation
Evaluation of the interesting patterns by thresholding

7. Knowledge Discovery
Visualization and presentation methods are used to present the mined knowledge to the user.
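The seven steps can be sketched as a chain of small functions. This is only a toy illustration, not a real mining system: the records, the function names and the 0.4 threshold are all hypothetical, and the "mining" step is reduced to computing each item's share of purchases.

```python
from collections import Counter

# Hypothetical sales records from two sources (illustrative data only).
store_a = [
    {"customer": "c1", "item": "bread", "amount": 2.0},
    {"customer": "c2", "item": "jam", "amount": None},  # noisy record
    {"customer": "c1", "item": "jam", "amount": 3.5},
]
store_b = [
    {"customer": "c3", "item": "bread", "amount": 2.0},
    {"customer": "c3", "item": "milk", "amount": 1.5},
    {"customer": "c4", "item": "bread", "amount": 2.5},
]


def clean(records):
    """Step 1: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Step 4: aggregate into a purchase count per item."""
    return Counter(r["item"] for r in records)


def mine(counts, total):
    """Step 5: a very simple pattern, each item's share of purchases."""
    return {item: n / total for item, n in counts.items()}


def evaluate(patterns, threshold):
    """Step 6: keep only patterns that meet the threshold."""
    return {item: s for item, s in patterns.items() if s >= threshold}


def present(patterns):
    """Step 7: format the surviving patterns for the user."""
    return [f"{item}: {s:.0%} of purchases" for item, s in sorted(patterns.items())]


records = select(integrate(clean(store_a), clean(store_b)), ["item"])
counts = transform(records)
frequent = evaluate(mine(counts, len(records)), threshold=0.4)
report = present(frequent)
print(report)  # ['bread: 60% of purchases']
```

Each function maps to one numbered step, so a step can be swapped for a more realistic implementation without touching the others.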

Page 13: Data Mining Overview

◦ Data mining: the core of the knowledge discovery process.

[Figure: knowledge discovery process flow: Databases → Data Cleaning, Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Page 14: Data Mining Overview

Data Mining Tasks

1. Classification
• Classification maps data into predefined groups or classes.
• It may be represented by methods such as decision trees, etc.

Decision tree
• Flow-chart-like tree structure
• Each node denotes a test of an attribute value
• Each branch represents an outcome of the test
• Leaves represent classes or class distributions.
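The decision tree structure described above (nodes test an attribute, branches are outcomes, leaves are classes) can be sketched with nested dictionaries. The tree below is hand-built for a made-up loan decision; it is not learned from data, and the attribute names `income` and `has_collateral` are hypothetical.

```python
# A hypothetical, hand-built tree for a loan decision (not learned from data).
tree = {
    "attribute": "income",              # internal node: test of an attribute
    "branches": {                       # one branch per outcome of the test
        "high": {"label": "approve"},   # leaf: a class
        "low": {
            "attribute": "has_collateral",
            "branches": {
                True: {"label": "approve"},
                False: {"label": "reject"},
            },
        },
    },
}


def classify(node, record):
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while "label" not in node:
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node["label"]


print(classify(tree, {"income": "low", "has_collateral": False}))  # reject
```

In practice the tree would be induced from labeled training data; only the classification walk is shown here.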

Page 15: Data Mining Overview

2. Regression
Used to map a data item to a real-valued prediction variable.

Example: A manager wants to reach a certain level of savings before his retirement. Periodically he predicts his retirement savings from the current value and several past values. He uses a simple linear regression formula to predict future values of his savings.
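A minimal sketch of the savings example, assuming an ordinary least-squares line fitted to savings against time. That is one simple reading of "linear regression formula"; the slides do not specify the exact model. The savings figures and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past savings (in thousands) observed at the end of years 1..5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.5, 16.0, 18.5]

slope, intercept = fit_line(years, savings)
predicted_year_8 = slope * 8 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, year 8: {predicted_year_8:.1f}")
```

With these made-up numbers the fitted line grows by about 2.1 thousand per year, which the manager could compare against his retirement target.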

3. Prediction
Many real-world applications can be seen as predicting future data states based on past and current data.

Example: Predicting flooding is a difficult problem.

Page 16: Data Mining Overview

4. Clustering
Clustering is similar to classification except that the groups are not predefined.
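To illustrate groups that are not predefined, here is a small one-dimensional k-means sketch. K-means is just one common clustering method, chosen here for illustration; the slides do not name an algorithm. The spending values, the starting centers and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer; no class labels are given.
spend = [12, 15, 14, 80, 85, 90]
centers, clusters = kmeans_1d(spend, centers=[0.0, 100.0])
print(clusters)  # [[12, 15, 14], [80, 85, 90]]
```

The two groups emerge from the data alone, in contrast to classification, where the classes are fixed in advance.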

5. Association Rule
Association refers to uncovering relationships among data. It is used in the retail sales community to identify the items (products) that are frequently purchased together.

[Figure: cartoon, "Bread and Jam sell together!"]
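The bread-and-jam idea is usually measured with support and confidence, the standard measures for association rules. The sketch below computes both for the rule bread => jam over a handful of made-up baskets; the data and function names are hypothetical.

```python
# Hypothetical market baskets (illustrative data only).
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "milk"},
    {"jam", "bread"},
    {"milk"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_support = support({"bread", "jam"}, baskets)
rule_confidence = confidence({"bread"}, {"jam"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# support=0.60, confidence=0.75
```

A rule is typically reported only when both values exceed user-chosen minimum thresholds.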

Page 17: Data Mining Overview

6. Summarization
Summarization of the general characteristics or features of a target class of data.
◦ Data characterization: presented in various forms such as pie charts, bar charts and curves.
◦ Data discrimination: comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.

7. Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers.

Many data mining methods discard outliers as noise or exceptions. In applications such as fraud detection, however, rare events may be more interesting than regularly occurring events.
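One simple way to flag outliers, sketched here under assumptions, is a z-score test: mark values that lie more than a chosen number of standard deviations from the mean. The slides do not prescribe a method; the transaction amounts, the threshold of 2 and the function name `find_outliers` are hypothetical.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds
    threshold population standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = find_outliers(amounts)
print(suspicious)  # [500]
```

In a fraud detection setting the flagged values would be kept for inspection rather than discarded as noise. Note that with very small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with care.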

Page 18: Data Mining Overview

Data Mining: Types of Data

Relational data and transactional data

Text

Images, video

Mixtures of data

Page 19: Data Mining Overview

Data Mining Products

DataMind -- neurOagent
Information Discovery -- IDIS
SAS Institute -- SAS/Neuronets

Page 20: Data Mining Overview

Data Mining Software

RapidMiner and Weka – Defining data mining process

Top 8 data mining software in 2008

1. Angoss software
2. Infor CRM Epiphany
3. Portrait Software
4. SAS
5. SPSS
6. ThinkAnalytics
7. Unica
8. Viscovery

Page 21: Data Mining Overview

Application Areas

Industry             Application
Finance              Credit card analysis
Insurance            Fraud analysis
Telecommunication    Call record analysis

Page 22: Data Mining Overview

Applications

Financial Industry, Banks, Businesses, E-commerce
◦ Stock and investment analysis
◦ Identify loyal customers and risky customers
◦ Predict customer spending

Database analysis and decision support
◦ Market analysis and management: target marketing, customer relationship management, market basket analysis
◦ Risk analysis and management: forecasting, quality control, competitive analysis
◦ Fraud detection and management

Page 23: Data Mining Overview

Data Mining in Usage

1. Intelligent Miner

It is IBM's data mining product.

Its distinct features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system.

2. DBMiner

Developed by DBMiner Technologies Inc.

A distinct feature of DBMiner is data-cube-based Online Analytical Mining.

Page 24: Data Mining Overview

[Figure: "The Telecomm Slice" of a data cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe)]

Page 25: Data Mining Overview

Conclusion

Data mining: discovering interesting patterns from large amounts of data

A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

Mining can be performed in a variety of information repositories

Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, etc.

Page 26: Data Mining Overview

Thank you!!!