data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfdata mining for prediction prof....

33
Data mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université Libre de Bruxelles email: [email protected]

Upload: others

Post on 11-Apr-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Data mining for prediction

Prof. Gianluca BontempiDépartement d’Informatique

Faculté de SciencesULB Université Libre de Bruxelles

email: [email protected]

Page 2: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Outline

• Extracting knowledge from observations.

• Introduction to Data Mining (What, Why, and How?)

• Our techniques for data mining.• Prediction from data: applications.• What can we do for you?

Page 3: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Knowledge from observations

Extracting knowledge from observations is a major issue in applied sciences.

•qualitative description of phenomena,

•theoretical understanding of laws underlying nature,

•prediction of future observations.

Knowledge can be extracted for different purposes:

Page 4: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université
Page 5: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

What is data-mining?

• Search for valuable information in large volumes of data.

• Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns & rules.

Page 6: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Why Mine Data? (for businesspersons)

• Lots of data has been and is being collected and warehoused.

• computing has become affordable. • competitive pressure is strong

– Provide better, customized services for an edge. – Information is becoming product in its own right.

• « convert data into business intelligence »• « make management decision making based

on facts not intuition »• « get closer to the customers » • « gain competitive advantage ».

Page 7: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université
Page 8: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Why mine data? (for scientists)

• Data collected and stored at enormous speeds (Gbyte/hour) – remote sensor on a satellite – telescope scanning the skies – microarrays generating gene expression data – scientific simulations generating terabytes of data

• Large dimensionality (thousands of variables). • Data mining for data reduction.

– cataloging, classifying, segmenting data • Helps scientists in hypothesis formation.

Page 9: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

What does data-mining?

• Description models– Find human-interpretable patterns that describe

the data.

• Prediction models– Use some variables to predict unknown or future

values of other variables. – Examples: classification, regression, clustering,

time series prediction.

From [Fayyad, et.al.] Advances in Knowledge Discovery andData Mining, 1996

Page 10: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Example of descriptive measures

• Think to difference between this two types

of information:

1. the whole historical record of your bank

transactions

2. the evaluation if you are a good or bad client (in

your bank’s perspective)

Page 11: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Prediction task

PHENOMENON

MODEL

input output

prediction

error

•Finite amount of noisy observations.

•No a priori knowledge of the phenomenon.

OBSERVATIONS

Page 12: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Some theory..

Page 13: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Prediction modeling issues

• What are the relevant factors (inputs & outputs) in the data?

• Which kind of correlation (or model) exists between input and outputs? (linear, nonlinear,… )

• Is this correlation stationary or time varying?• Which level of accuracy in our prediction can

we guarantee?

Page 14: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Model discovery

MODELGENERATION

MODELVALIDATION

PARAMETRICIDENTIFICATION

MODELDELIVERY

Page 15: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5Target function

Nonlinear relationship

input

output

Page 16: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5Training set

Observations

output

query queryquery

Page 17: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Global modeling

input

output

Page 18: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Prediction with global models

queryquery query

Page 19: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Local modeling

Page 20: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Prediction with local models

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

queryquery query

Page 21: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Our competences in modeling

• Conventional statistic techniques• Neural networks techniques• Regression trees• Local modeling techniques• Neuro-fuzzy techniques

Page 22: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

In practice…

Page 23: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Financial prediction

Task: predict the future trends of the financial series.

daily stock market index

0

5000

10000

15000

20000

25000

30000

10/0

1/94

10/0

5/94

10/0

9/94

10/0

1/95

10/0

5/95

10/0

9/95

10/0

1/96

10/0

5/96

10/0

9/96

10/0

1/97

10/0

5/97

10/0

9/97

10/0

1/98

10/0

5/98

10/0

9/98

10/0

1/99

10/0

5/99

10/0

9/99

MIB

Goal: automatic trading system to anticipate the fluctuations of the market.

Industrial partner: Masterfood, Belgium.

Page 24: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Car matriculations in Belgium

300000320000340000360000380000400000420000440000460000480000500000

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

Economic variables

Task: predict how many cars will be matriculated next year.

Goal: support the marketing campaign of a car dealer.

Industrial partner: D’ieteren, Belgium.

Page 25: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Modeling of industrial plants

Task: predict the flow stress of the steel plate as a function of the chemical and physical properties of the material.

Rolling steel mill

Goal: cope with different types of metals, reduce the production time and improve final quality.

Industrial partner: FaFer Usinor, Belgium .

Page 26: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Control

Task: model the dynamics of the plant on the basis ofaccessible information.

Waste water treatment plant

Goal: control the level of water pollutants.

Industrial partner: Honeywell Technology Center, US.

Page 27: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Environmental problems

Task: predicting the biological state (e.g. density of algae communities)as a function of chemicals.

Goal: make automatic the analysis of the state of the river by monitoring chemical concentrations.

Algae summer blooming

Page 28: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Performance modeling of embedded systems

Task: model the performance of software tasks on different hardware platforms on the basis of profiling information.

Embedded microprocessor

Goal: select the hardware architecture that better meets theproduct requirements.

Industrial partner: Philips, The Netherlands.

Page 29: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Multimedia quality of service

Task: model the time/energy requirements of a multimedia application

Multimedia application

Goal: find the best trade-off quality/performance

Industrial partner: IMEC, Belgium.

Page 30: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Chaotic time series prediction

0 100 200 300 400 500 600 700 800 900 10000

50

100

150

200

250

300

Task: predict the continuation of the series for next 100 steps.

Page 31: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Our prediction

Our methods are able to predict the abrupt change at t =1060 !

Page 32: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

Awards in international competitions

• Data analysis competition: awarded as a runner-up among 21 participants at the 1999 CoILInternational Competition on Protecting rivers and streams by monitoring chemical concentrations and algae communities.

• Time series competition: ranked second among 17 participants to the International Competition on Time Series organized by the International Workshop on Advanced Black-box techniques for nonlinear modeling in Leuven, Belgium

Page 33: Data mining for predictiondi.ulb.ac.be/map/gbonte/ftp/mining.pdfData mining for prediction Prof. Gianluca Bontempi Département d’ Informatique Faculté de Sciences ULB Université

What can we do for you?

• Take and analyze your most valuable data (in electronic form) .

• Understand if some relevant information is hidden within them.

• Choose, using our techniques, the best model for your data.

• Make (or providing you tools for doing) predictions on the basis of your historical information.