mining frequent patterns without candidate …eogasawara/wp-content/uploads/...6 dmaa -research...

21
1 Centro Federal de Educação Tecnológica Celso Suckow da Fonseca CEFET/RJ Eduardo Ogasawara [email protected]

Upload: others

Post on 22-Jan-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

1

Centro Federal de Educação Tecnológica Celso Suckow da Fonseca

CEFET/RJ

Eduardo [email protected]

Page 2: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

2

Data Mining

• Extraction knowledge (rules, patterns, constraints) from"large" databases

• Data Deluge & Big Data introduce more complexity– We are immersed in data, but hungry for knowledge!

Page 3: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

3

Overview: Confluence of various disciplines

Data Mining

Databases Statistics

Domain Area

Artificial Intelligence

Visualization

Page 4: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

4

Goal: Answer key questions

• Consider an example of a internet sales system

• Do we have the data?– Transactions involving credit cards, loyalty cards, central customer

service, lifestyle customer

• Can we identify groups?– Identified clusters for "models" of customers who share certain

characteristics: interest, income, spending habits

• Can we classify or predict?– Identified patterns of consumption?

– Can establish a pattern of consumption for single and married?

• Can we associate patterns?– Association / correlation between product sales

– Prediction based on association information

Page 5: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

5

Data Mining Process (DMP)

1. Selection

DataSets

2. Pre-processing

3. Mining

Algorithms

4. Pattern

Evaluation

Knowledge

Page 6: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

6

DMAA - Research Project

• Goal– Develop Data Mining Algorithms and Application for practical problems

• Research Process– Basic Research

• Produce data knowledge extraction algorithms or tools without immediateapplication for real problems

– Applied Research

• Usage of know or developed algorithms or tools for solving real problems

Basic Research Applied Research

Page 7: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

7

Basic Research

• Data Management & Data Mining Process– Data Streaming

– Data Sources Location

– Uniform Data Model

– Data Experiment Representation

– Modeled through Workflows

• Algorithms and Methods– Research on specific Data Mining Activities

Page 8: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

8

DMP: Step #1 – Selection and Integration

• Types of data– Relational, Spatial, Temporal, Textual, Graph

• Sources– Centralized

– Distributed

– Streaming

• Integration– Data integration from different sources

• Dataset Production• Projection

• Selection

Page 9: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

9

DMP: Step #2 – Pre-processing

• Very important step– 60% of Data Mining effort!

• Data Reduction– Filtering or Sampling

• Attribution Reduction– Selection of important attributed

– Reduction of Dimensionality

• Transformation– Combination

– Derivation

– Aggregation

Page 10: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

10

DMP: Step #3 – Data Mining Algorithms and Methods

• Focus: Searching for patterns of interest

• Data Mining Methods– Clustering

– Classification and Prediction

– Association

– New Approaches

Page 11: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

11

DMP: Step #4 – Pattern Evaluation

• Error Measurements– Error Metrics

• Evaluation of patterns and extracted knowledge– Visualization

– Transformation

– Redundant pattern removal

• Use of discovered knowledge

• Are all discovered patterns important?

Page 12: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

12

Applied Research

• Focus on solving target application areas (datasets)

• Basic domain of target area– data

– Problem

– Identification of goals

• Examples– Astronomy

– Production engineering

– Social Networks

– Computing in Educational

Page 13: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

13

Evaluation of Enterprise Social Networks Adoption

Page 14: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

14

Time Series Prediction

Page 15: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

15

Forecast Sea Surface Temperature in Atlantic Pacific Ocean

Page 16: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

16

Identification of Stars and Gallaxies

NGC 7793: a spiral galaxy

Page 17: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

17

Production of Access Flow Dataset for

Identification of Deny of Service (DoS) Attacks

Page 18: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

18

Clustering of Access Flow Dataset

(Clustering of Time Series)

Page 19: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

19

Forecasting Delays in Airports

Page 20: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

20

Prediction of electrical power output

of a base load operated combined cycle power plan

Tufekci, P. 2014

Page 21: Mining Frequent Patterns Without Candidate …eogasawara/wp-content/uploads/...6 DMAA -Research Project •Goal –Develop Data Mining Algorithms and Application for practical problems

21

Referências