Data Mining and Knowledge Management Laboratory

Page 1: Data Mining and Knowledge Management Laboratory. 2 Contents 1. Introduction Personnel Research projects Background and methods 2. Research Examples Financial

Data Mining and Knowledge Management

Laboratory

Contents

1. Introduction
• Personnel
• Research projects
• Background and methods

2. Research Examples
• Financial benchmarking using the self-organizing map
   • Pulp and paper industry
   • Telecom industry
• Qualitative analysis using a text mining approach
• Glass process modeling using a feedforward neural network

Laboratory members

Professor Barbro Back (ÅA/IAMSR)
Professor Kaisa Sere (ÅA/Department of Computer Science)
Professor Eija Karsten (TY/Department of Information Technology)
Dr.Sc. Iulian Nastac (ÅA/IAMSR)
Dr.Sc. Eija Koskivaara (TuKKK/Department of Management, Information Systems Science)
Ph.D. student Adrian Costea (ÅA/IAMSR)
Ph.D. student Tomas Eklund (ÅA/IAMSR)
Ph.D. student Minna Kallio (ÅA/IAMSR)
Ph.D. student Piia Hirkman (ÅA/IAMSR)
Ph.D. student Dorina Marghescu (ÅA/IAMSR)

Affiliated Researchers

Prof. Hannu Vanharanta, Tampere University of Technology/Pori School of Economics and Technology
Prof. Ari Visa, Tampere University of Technology
Ph.D. student Jarmo Toivonen, Tampere University of Technology
Ph.D. student Antti Arppe, University of Helsinki
Ph.D. student Camilla Magnusson, University of Helsinki

Academic Achievements

Dr.Sc. degrees
• Adekunle Okunoye 2003
• Antonina Kloptchenko 2003
• Eija Koskivaara 2004

Lic.Sc. degrees
• Eva Wilppu 1998
• Jonas Karlsson 2002
• Tomas Eklund 2002

Research Goals

Our research focuses on developing, implementing, and evaluating new methods for data mining and knowledge management. We also conduct practical implementation studies in organizations.

Projects

GILTA project – 2003 (Tekes)
Countess – 2003 (Academy of Finland)

Ongoing:
4M-Project – 2003-2006 (Tekes)
Domino-project – 2004-2007 (Academy of Finland)

In addition, a number of industry-funded projects have been carried out in the laboratory.

Gilta

The project focused on managing large data and text masses from databases and the Internet using smart encoding of texts, self-organizing maps, and document histograms. The project was an interdisciplinary, joint project between four universities:

• Åbo Akademi University (IAMSR)
• Tampere University of Technology
• Tampere University of Technology, Pori
• University of Helsinki (Department of Linguistics)

The GILTA project was funded by Tekes 2000-2002, 2.7 mmk (ÅA 0.6 mmk).

Countess

The project aimed at building advanced computational intelligence into computer-based decision support systems and at turning large amounts of information, mainly quantitative data, into knowledge. Application areas are analytical review in auditing, financial benchmarking, financial performance analysis, and bankruptcy prediction.

The Countess project was funded by the Academy of Finland 2000-2002, 0.8 mmk.

Publications

Gilta > 15 publications
• Back, Vanharanta, Visa, Toivonen, Eklund, Karlsson, Kloptchenko, Costea

Countess > 20 publications
• Back, Sere, Laitinen, Koskivaara, Wilppu, Hekanaho, Zhang

Research Examples in the Laboratory

Research Background

Access to immense amounts of data:

The capacity of digital data storage worldwide has doubled every nine months for at least a decade (Porter, J., 1998).

This is twice the rate predicted by Moore’s law for the growth of computing power during the same period.

We can capture and store data, but are we able to process and utilize it effectively and efficiently?

What is data mining?

Fayyad et al. (2002) define data mining as the identification of interesting structure in data.

Structure implies patterns, statistical or predictive models, and relationships in the data.

What are neural networks?

Tools built up in a pattern similar to the human nervous system

Consist of nodes (neurons) connected by weighted connections (synapses). The neurons receive information (stimuli) through these connections

Weights of connections are adjusted in order to change the output

Can be supervised (learning via target outcomes, e.g. the back-propagation algorithm) or unsupervised (exploratory, e.g. self-organizing maps)
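
As a minimal sketch of the ideas on this slide, the snippet below models a single neuron as a weighted sum of its inputs and applies one supervised weight adjustment. The update used here is a simple delta rule chosen for illustration, not the back-propagation algorithm itself; the function names, starting weights, and learning rate are all hypothetical.

```python
def neuron_output(weights, inputs):
    """Weighted sum of the stimuli arriving over the connections."""
    return sum(w * x for w, x in zip(weights, inputs))


def adjust_weights(weights, inputs, target, rate=0.1):
    """One supervised update (delta rule): nudge each weight to shrink the output error."""
    error = target - neuron_output(weights, inputs)
    return [w + rate * error * x for w, x in zip(weights, inputs)]


weights = [0.2, -0.4, 0.1]  # illustrative starting weights (assumed)
inputs = [1.0, 0.5, 2.0]    # illustrative stimuli (assumed)
target = 1.0                # desired output for this input (assumed)

error_before = abs(target - neuron_output(weights, inputs))
weights = adjust_weights(weights, inputs, target)
error_after = abs(target - neuron_output(weights, inputs))
print(error_before, error_after)
```

Adjusting the weights changes the output, which is the whole learning mechanism: the error after the update is smaller than before it.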

What are self-organizing maps?

Self-organizing maps (SOMs) are two-layer neural networks that use unsupervised learning. SOMs are useful tools for exploratory data analysis.

Common problems that self-organizing maps can be applied to include:

• clustering
• categorization
• visualization
• hidden factor analysis

The SOM algorithm

Before the SOM algorithm is started, the map is randomly initialized. Each neuron is assigned a parametric reference vector.

The SOM algorithm consists of two steps. In step 1, the best matching neuron is found using the Euclidean distance function below. In step 2, the neighboring neurons "learn" from the input data.

$$\lVert x - m_c \rVert = \min_i \{ \lVert x - m_i \rVert \}$$

Random Initialization

The map is randomly initialized, and each neuron is assigned a parametric reference vector, denoted m. It can also be linearly initialized. The vectors are illustrated using arrows.

[Figure: reference vectors drawn as arrows]
m2 = (56, 23, 10)
m3 = (7, 34, 21)
m4 = (10, 10, 29)
m5 = (50, 55, 30)

© Copyright Gilta Group 2001

Step 1: Locate the Closest Match

Each unit of input data, represented by vector x, is compared to the reference vectors, m1, m2, ..., mn, of the network. The closest match, vector c, is regarded as the winning neuron.

[Figure labels: x1 = (11, 13, 18); m2 = (56, 23, 10); m3 = (7, 34, 21); m4 = (10, 10, 29); m5 = (50, 55, 30). The winning neuron c is m4 = (10, 10, 29), followed by m4 = (11, 13, 18).]
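
Step 1 can be sketched in a few lines of Python using the vectors shown on the slides. This is an illustrative sketch only: the function names are hypothetical, and m1 is left out because its value does not appear in the transcript.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def best_matching_unit(x, reference_vectors):
    """Return the name of the reference vector closest to the input x."""
    return min(reference_vectors, key=lambda name: euclidean(x, reference_vectors[name]))


# Reference vectors as given on the slides (m1 is not available in the transcript).
reference_vectors = {
    "m2": (56, 23, 10),
    "m3": (7, 34, 21),
    "m4": (10, 10, 29),
    "m5": (50, 55, 30),
}
x1 = (11, 13, 18)

winner = best_matching_unit(x1, reference_vectors)
print(winner, round(euclidean(x1, reference_vectors[winner]), 2))  # m4 11.45
```

The result agrees with the slide, where m4 is marked as the winning neuron c.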

Step 2: The Learning Phase

The neurons within the neighborhood h_ci of neuron c tune to, or learn something from, the input data vector x. How much these neurons learn depends upon the learning rate factor, α.

[Figure labels: winning neuron c; x1 = (11, 11, 18); neighborhood h_ci]
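
A minimal sketch of the learning phase, assuming the commonly used update m_i ← m_i + α · h_ci · (x − m_i) with a Gaussian neighborhood function. The slide does not specify the neighborhood function, so the Gaussian form, the parameter names `alpha` and `sigma`, and the one-dimensional grid positions are assumptions made for illustration.

```python
import math


def update_neighbourhood(vectors, positions, winner, x, alpha=0.5, sigma=1.0):
    """Move every reference vector toward x, scaled by the learning rate alpha
    and by a Gaussian neighborhood weight h that shrinks with grid distance
    from the winning neuron."""
    updated = []
    for m, pos in zip(vectors, positions):
        grid_dist_sq = sum((p - q) ** 2 for p, q in zip(pos, positions[winner]))
        h = math.exp(-grid_dist_sq / (2 * sigma ** 2))
        updated.append([mi + alpha * h * (xi - mi) for mi, xi in zip(m, x)])
    return updated


# Reference vectors m2..m5 from the slides, placed on a hypothetical 1-D grid.
vectors = [[56, 23, 10], [7, 34, 21], [10, 10, 29], [50, 55, 30]]
positions = [(0,), (1,), (2,), (3,)]
x = [11, 13, 18]
winner = 2  # index of m4, the closest match found in step 1

new_vectors = update_neighbourhood(vectors, positions, winner, x)
print(new_vectors[winner])  # [10.5, 11.5, 23.5]
```

With α = 0.5 the winner moves halfway toward the input, while its grid neighbors move by a smaller fraction.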

Training the Network

Steps 1 and 2 are repeated for all input data vectors, for a predefined number of times, or until another stopping criterion is reached. The fully trained network should display a number of groups of vectors. These groups should transition smoothly into each other.
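
The training loop can be sketched as follows. This is a toy implementation under stated assumptions, not the code behind the slides: the linearly decaying learning rate and neighborhood radius, the Gaussian neighborhood, the 3 × 3 map size, and all names and sample data are hypothetical choices.

```python
import math
import random


def train_som(data, rows=3, cols=3, epochs=100, alpha0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM: random initialization, then repeated steps 1 and 2."""
    rng = random.Random(seed)
    dim = len(data[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    vectors = [[rng.random() for _ in range(dim)] for _ in positions]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            alpha = alpha0 * (1.0 - frac)            # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighborhood radius shrinks
            # Step 1: locate the closest match.
            c = min(range(len(vectors)), key=lambda i: math.dist(x, vectors[i]))
            # Step 2: the winner and its grid neighbors learn from x.
            for i, pos in enumerate(positions):
                d2 = (pos[0] - positions[c][0]) ** 2 + (pos[1] - positions[c][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                vectors[i] = [m + alpha * h * (xi - m) for m, xi in zip(vectors[i], x)]
            step += 1
    return positions, vectors


def quantization_error(data, vectors):
    """Average distance from each input vector to its closest reference vector."""
    return sum(min(math.dist(x, m) for m in vectors) for x in data) / len(data)


# Two small, well-separated groups of made-up input vectors.
data = [(0.10, 0.10), (0.15, 0.10), (0.90, 0.90), (0.85, 0.90)]

_, initial_vectors = train_som(data, epochs=0)  # random initialization only
_, trained_vectors = train_som(data, epochs=100)
print(quantization_error(data, initial_vectors), quantization_error(data, trained_vectors))
```

The stopping criterion here is simply a fixed number of passes over the data; another criterion, such as a threshold on the quantization error, could be substituted.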

Visualizing the Maps

The U-matrix method is commonly used to visualize SOMs. It represents small distances with light colors and large distances with dark colors, i.e., creating a landscape with valleys and peaks. The upper map is visualized using Nenet v1.1, and the lower using SOM_PAK v3.1.
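
The idea behind the U-matrix can be sketched with a simplified variant that stores, for each neuron, the average distance to its directly adjacent neurons on the grid. Full U-matrix implementations such as those in the tools named above typically also insert cells between neurons; this sketch does not reproduce their exact output, and the function name and toy map are hypothetical.

```python
import math


def u_matrix(grid):
    """For each neuron, average the distances between its reference vector
    and those of its up/down/left/right neighbors on the grid."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dists = [
                math.dist(grid[r][c], grid[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            row.append(sum(dists) / len(dists) if dists else 0.0)
        result.append(row)
    return result


# A toy 1 x 3 map: the first two neurons are close, the third is far away.
toy_map = [[(0.0,), (1.0,), (5.0,)]]
heights = u_matrix(toy_map)
print(heights)  # [[1.0, 2.5, 4.0]]
```

Low values correspond to the "valleys" (neurons within a cluster) and high values to the "peaks" that separate clusters.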

Application areas for self-organizing maps within business

• Benchmarking
• Monitoring book-keeping data
• Taxation
• Distribution control
• Bankruptcy predictions

Financial Benchmarking Using the Self-Organizing Map

Example Application: Financial benchmarking

• Managers, analysts, stakeholders, and investors face increasing amounts of information for decision making
• Access is nearly infinite because of the Internet
• New tools are needed for financial competitor benchmarking
• Results from a survey (Åbo Akademi) indicate that very few advanced methods are used for financial benchmarking in Finnish companies

Studying annual reports

Operating margin = 16.27%

Comparing Ratios

                       Stora Enso   UPM-Kymmene   International Paper   Georgia-Pacific   Kimberly Clark
Operating margin            11.01         16.27                 -1.27              3.14            16.10
ROE                         10.55         14.73                -10.79             -7.66            28.21
ROTA                         7.68         11.17                 -0.85              2.74            15.98
Equity to capital           42.93         46.10                 25.97             17.14            38.30
Quick ratio                  0.83          0.63                  1.04              0.51             0.58
Interest coverage            4.19          5.21                 -0.59              0.79            12.77
Receivables turnover         6.23          6.03                  8.67              9.89             8.34

- Comparing changes over several years?

Difficulties arise as the dimensionality increases, as there is no natural order in these data!

Ranking using a point system

For example: max 500 p.
- Max 200 p. for profitability, max 100 p. for liquidity, max 100 p. for solvency, etc.

Produces a ranking, for example:
No. 1 Kimberly Clark, 432 p.
No. 2 Buckeye Technologies, 411 p.
Etc.

Problem: Provides no details. How is the company's profitability compared to others?

Introduction

• A financial benchmarking of a number of companies in the international pulp and paper industry was performed
• The information used was taken from the companies' annual reports, collected through the Internet
• Data for 1995-2002
• Seven financial ratios were calculated based on the data collected
• A data-mining tool, the self-organizing map, was used to analyze the data and find hidden patterns and regularities
• The resulting maps were analyzed, and the actual benchmarking of a number of companies was performed

Companies Included

Selected from Pulp and Paper International’s annual Top 150 list

Three different regions: North America (42), Northern Europe (13), and Japan (13). Also included were a number of companies from Europe (7) and Australasia (2).

Total of 77 companies, and 7 national or regional averages

Many companies could not be included because of insufficient data

Companies Included

Finland
1 Average
2 Ahlström 95-01
3 M-Real 95-01
4 Stora Enso OY (Enso Oy 95-96) 97-01
5 UPM-Kymmene OY 95-01

Sweden
6 Average
7 Sveaskog (AssiDomän 95-00) 95-01
8 Korsnäs 95-01
9 MoDo AB 95-01
10 Munksjö AB 95-01
11 Rottneros AB 95-01
12 SCA AB 95-01
13 Södra AB 95-01
14 Trebruk 98-01

Norway
15 Average
16 Peterson Group 95-01
17 Norske Skog A.S. 95-01

USA
18 Average
19 Boise Cascade 95-01
20 Bowater 95-01
21 Buckeye Technologies 95-01
22 Caraustar Industries 95-01
23 Champion International 95-99
24 Consolidated Papers 95-99
25 Crown Vantage 95-99
26 FiberMark 95-01
27 Fort James 95-99
28 Gaylord Container Corp 95-01
29 Georgia-Pacific Corp 95-01
30 Greif Bros 95-01
31 International Paper 95-01
32 Jefferson-Smurfit Corp. 95-01

USA (Continued)
33 Kimberly-Clark 95-01
34 Longview Fiber Corp. 95-01
35 Mead 95-00
36 Packaging Corp 99-01
37 P.H. Glatfelter 95-01
38 Pope & Talbot 95-01
39 Potlatch Corp. 95-01
40 Rayonier 95-01
41 Riverwood Holding 95-01
42 Rock-Tenn Company 95-01
43 Schweitzer-Mauduit Intl. 95-01
44 Sonoco Products 95-01
45 Stone Container 95-97
46 Temple-Inland 95-01
47 Union Camp. 95-98
48 Wausau-Mosinee Paper 95-01
49 MeadWestvaco (Westvaco 95-00) 95-01
50 Weyerhaeuser 95-01
51 Willamette Industries 95-01

Canada
52 Average
53 Abitibi Consolidated 95-01
54 Alliance 95-00
55 Canfor 95-01
56 Cascades Inc. 95-01
57 Crestbrook Forest Ind. Ltd. 95-97
58 Doman Industries 95-01
59 Domtar Inc. 95-01
60 Donohue 95-99
61 MacMillan Bloedel 95-98
62 Millar Western Forest Products 98-01
63 Nexfor 95-01
64 Tembec Inc. 95-01
65 West Fraser Timber 95-01

Japan
66 Average
67 Daio Paper 95-99
68 Daishowa Paper Manuf 95-99
69 Chuetsu Paper 95-99
70 Hokuetsu Paper Mills 95-99
71 Japan Paperboard Industr 95-99
72 Kishu Paper 97-99
73 Mitsubishi Paper 95-99
74 Nippon Kakoh Seishi 95-99
75 Nippon Unipac (Nippon Paper - 00) 95-01
76 Oji Paper 95-01
77 Rengo 95-99
78 Settsu 95-98
79 Tokai Pulp & Paper 95-99

Europe
80 Average
81 Cartiere Burgo (ITA) 95-01
82 David S. Smith Holdings (UK) 98-01
83 ENCE Group (Spain) 96-01
84 Exacompta Clairefontaine (FRA) 97-01
85 Frantschach (AUT) 95-99
86 Gascogne (FRA) 98-01
87 Kappa (NLD) 98-01
88 Industrieholding Cham (SUI) 95-01
89 Inveresk (UK) 95-01
90 Mayr-Melnhof (AUT) 95-01
91 Mercer International (SUI) 95-01
92 Reno de Medici (ITA) 95-01

Others
93 Aracruz Celulose (BRA) 96-01
94 Amcor (AUS) 95-00
95 Bahia Sul Celulose (BRA) 98-01
96 Empresas CMPC (CHL) 98-01
97 Fletcher Challenge Group (NZE) 95-99
98 Sappi (ZAF) 98-01

The Financial Ratios

Operating Margin
$$\text{Operating Margin} = \frac{\text{Operating Profit}}{\text{Sales}} \times 100$$

Return on Equity
$$\text{Return on Equity} = \frac{\text{Net Income}}{(\text{Share Capital} + \text{Retained Earnings})\,(\text{average})} \times 100$$

Return on Total Assets
$$\text{Return on Total Assets} = \frac{\text{Total Income} + \text{Interest Expense}}{\text{Total Assets}\,(\text{average})} \times 100$$

Quick Ratio
$$\text{Quick Ratio} = \frac{\text{Current Assets} - \text{Inventory}}{\text{Current Liabilities}}$$

Equity to Capital
$$\text{Equity to Capital} = \frac{\text{Share Capital} + \text{Retained Earnings}}{\text{Total Assets}} \times 100$$

Interest Coverage
$$\text{Interest Coverage} = \frac{\text{Interest Expense} + \text{Income Tax} + \text{Net Income}}{\text{Interest Expense}}$$

Receivables Turnover
$$\text{Receivables Turnover} = \frac{\text{Net Sales}}{\text{Accounts Receivable}\,(\text{average})}$$
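
A small sketch of how the seven ratios on this slide could be computed from annual report figures. The dictionary keys and all input numbers are hypothetical, and the "average" items are assumed to be supplied already averaged, since the slide does not say how the averages are formed.

```python
def financial_ratios(f):
    """Compute the seven benchmarking ratios from a dict of report figures.
    All key names are illustrative, not taken from the original study."""
    equity = f["share_capital"] + f["retained_earnings"]
    return {
        "operating_margin": 100.0 * f["operating_profit"] / f["sales"],
        "return_on_equity": 100.0 * f["net_income"] / f["equity_average"],
        "return_on_total_assets": 100.0 * (f["total_income"] + f["interest_expense"])
        / f["total_assets_average"],
        "quick_ratio": (f["current_assets"] - f["inventory"]) / f["current_liabilities"],
        "equity_to_capital": 100.0 * equity / f["total_assets"],
        "interest_coverage": (f["interest_expense"] + f["income_tax"] + f["net_income"])
        / f["interest_expense"],
        "receivables_turnover": f["net_sales"] / f["accounts_receivable_average"],
    }


# Made-up figures for one company-year, for illustration only.
example = {
    "operating_profit": 50.0, "sales": 500.0, "net_sales": 500.0,
    "net_income": 30.0, "income_tax": 10.0, "total_income": 40.0,
    "interest_expense": 10.0,
    "share_capital": 120.0, "retained_earnings": 200.0, "equity_average": 300.0,
    "total_assets": 800.0, "total_assets_average": 1000.0,
    "current_assets": 200.0, "inventory": 80.0, "current_liabilities": 100.0,
    "accounts_receivable_average": 50.0,
}

ratios = financial_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Each company-year then becomes one seven-dimensional input vector for the self-organizing map.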

Feature Planes

The feature planes display the distribution across the map according to individual financial ratios.

[Figure: feature planes for Operating Margin, Return on Equity, Return on Total Assets, Equity to Capital, Quick Ratio, Interest Coverage, and Receivables Turnover. Legend: Excellent, Very Good, Good, Average, Poor, Very Poor, Terrible]

The Identified Clusters

The map consists of clusters of companies displaying similar financial performance.

Group A is the best performing group, with high to very high values in all ratios except for Quick Ratio.

Group B: Well above average group, not quite as good as Group A, low liquidity.

Group C: Slightly above average group, average to high profitability and liquidity, average efficiency. Very low values in solvency.

Group D: Average group, with average profitability and solvency, average to high liquidity, and high efficiency.

Group E: Average. Low to average profitability, solvency and liquidity very high.

Group F: A below average group. Low to average profitability, solvency, and liquidity, average to high efficiency.

Group G is a poor group, with low values in all ratios.

Group H is the poorest group, with very low values in profitability and solvency. Liquidity and efficiency, however, can reach high values.

[Figure: map with clusters A-H; feature planes: Operating Margin, ROE, ROTA, Equity to Capital, Quick Ratio, Interest Coverage, Receivables Turnover]

International Paper vs UPM-Kymmene

UPM-Kymmene is generally speaking a much better performing company than the much larger International Paper.

[Figure: yearly positions of International Paper (IP1995-IP2002) and UPM-Kymmene (UPM1995-UPM2002) on the map with clusters A-H; feature planes: Operating Margin, ROE, ROTA, Equity to Capital, Quick Ratio, Interest Coverage, Receivables Turnover]

Stora Enso Acquires Consolidated Papers

Stora Enso acquires Consolidated Papers in 2000. In 2002, Stora Enso is forced to write down the asset value of the acquisition, heavily affecting the financial result for the year.

[Figure: yearly positions of Consolidated (95-99) and Stora Enso (97-02) on the map with clusters A-H; feature planes: Operating Margin, ROE, ROTA, Equity to Capital, Quick Ratio, Interest Coverage, Receivables Turnover]

Mead merges with Westvaco

The merger between Mead and Westvaco has been referred to as "a merger between equals". This is also true from a financial perspective.

[Figure: yearly positions of Mead (95-00), Westvaco (95-00), and MeadWestvaco (01) on the map with clusters A-H; feature planes: Operating Margin, ROE, ROTA, Equity to Capital, Quick Ratio, Interest Coverage, Receivables Turnover]

Conclusions

Most of the largest P&P companies in the world are not especially strong performers, except for Kimberly-Clark

1995 and 2000 were profitable years for the P&P industry

2001-02 have been difficult years for the P&P industry, especially for Finnish companies

The Asian financial crisis heavily affected Japanese companies during 1997-99, although Japanese performance remains poor even after this

Quarterly results for 2003 indicate poor annual financial results for Finnish P&P companies for 2003

Conclusions

Disadvantages:
• Dimensional reduction -> certain loss of accuracy
• Large initial data requirement
• Some annual reports difficult to find:
   – Family companies
   – Small companies

Conclusions

Advantages:
• Quick overview (the Big Picture) -> visual method
• Multidimensional comparison with drill-down capabilities
• Time-series
• Database character -> can be updated continuously
• It is also possible to see changes using quarterly reports
• Shows new investments, problems, turnarounds, and new moves in the industrial sector very well
• Can potentially be combined with simulation capability in the future

Sample publication: Eklund, T., B. Back, H. Vanharanta, A. Visa (2003), Using the Self-Organizing Map as a Visualization Tool in Financial Benchmarking. Information Visualization, Vol. 2, No. 3, pp. 171-181.

Benchmarking Telecom Companies Using the Self-Organizing Map

Example application: Benchmarking telecom companies worldwide

Number  Company  Years

1 Benefon 1995-99
2 Doro 1995-99
3 Ericsson 1995-99
4 HPY 1995-99
5 Netcom 1995-99
6 Nokia 1995-99
7 Sonera 1995-99
8 Tele Denmark 1995-99
9 TeleNor 1995-99
10 Telia 1995-99
11 Nordic Average 1995-99

12 Alcatel 1995-99
13 Ascom 1995-99
14 British Telecom 1995-99
15 Cable & Wireless 1995-99
16 Colt 1995-99
17 Deutsche Telekom 1995-99
18 France Telecom 1995-99
19 Marconi 1995-99
20 MATAV 1996-99
21 Mobilcom 1996-99
22 Olivetti 1995-99
23 Orange 1995-98
24 Philips 1995-99
25 Portugal Telecom 1995-99
26 Rostelecom 1995-98
27 Sagem 1996-99
28 Siemens 1995-99
29 Swisscom 1995-99
30 TeleWest 1995-99
31 Vodafone 1996-99
32 Europe Average 1995-99

33 3Com 1995-99
34 ADC 1995-99
35 Alltel 1995-99
36 Andrew Corp. 1995-99
37 AT&T 1995-99
38 Audiovox 1995-99
39 Bell Atlantic 1995-99
40 BellSouth 1995-99
41 CenturyTel 1995-99
42 Cisco 1995-99
43 Comsat 1995-99
44 Comverse 1995-99
45 Elcotel 1995-99
46 GTE 1995-99
47 IDT 1995-99
48 Intellicall 1995-98
49 Interdigital 1995-99
50 LSI Logic 1995-99
51 Lucent 1995-99
52 MCI 1995-99
53 Molex 1995-99
54 Motorola 1995-99
55 Nextel 1995-99
56 Powertel 1995-99
57 Powerwave 1995-99
58 Qualcomm 1995-99
59 SBC 1995-99
60 Sprint 1995-99
61 Tellabs 1995-99
62 UsWest 1995-99
63 Viatel 1995-99
64 Xircom 1995-99
65 USA Average 1995-99

66 Bell Mobility 1995-99
67 Clearnet 1995-99
68 Mitel 1995-99
69 Nortel 1995-99
70 Sasktel 1995-99
71 Telus 1995-99
72 Canada Average 1995-99

73 DDI 1995-99
74 Indosat 1995-99
75 Iwatsu 1995-99
76 Japan Radio 1995-99
77 Japan Telecom 1995-99
78 Kokusai 1995-99
79 Kyocera 1995-99
80 Matsushita 1995-99
81 Mitsubishi Electric 1995-99
82 NEC 1995-99
83 NTT 1995-99
84 OKI 1995-99
85 Samsung 1995-99
86 Sanyo 1995-99
87 Sharp 1995-99
88 Sony 1995-99
89 Telstra 1995-99
90 Toshiba 1995-99
91 Uniden 1995-99
92 Videsh Niagam 1996-99
93 Asia Average 1995-99

The Identified Clusters

© Copyright Gilta Group 2000

Group A1 consists of companies with the best values in the profitability ratios on the map.

Group A2, the second of the best groups, is characterised by very high values in the liquidity and solidity ratios. The profitability is slightly lower than Group A1, but still the values are very good.

Group B is an "above average" group with somewhat lower profitability than the A groups. The companies located here are distinguished by very high values in the Return on Equity ratio.

Group C1 is the better of the two middle-groups. The companies located here are distinguished by decent profitability and good liquidity.

Group C2 is the slightly worse middle-group. The companies situated here are characterised by decent profitability but the liquidity is generally worse than in Group C1.

Group D consists of companies with the worst financial performance, with poor profitability and solidity. The companies are generally service providers from either Europe or North America.

Annual Movements
Manufacturers, 1995-2001

Nokia: Nokia is performing well, moving from Group B to Group A1 in 1997, but also backtracking slightly in 2001, probably as a result of taking a one-time charge for uncertain receivables. Nokia is constantly showing excellent profitability.

Ericsson: Ericsson is showing good financial performance during 1995-2002, constantly situated in Group B. In 2001, Ericsson's financial problems are visualised by the backtracking into Group D. Almost all key ratios show worsened figures at this time.

Motorola: Motorola also shows difficulties with the rapidly changing market. Starting in Group A2 in 1995, Motorola backtracks constantly and enters Group D in 2001. Its performance is much like Ericsson's.

Sony: Sony is constantly moving around the average group. It also shows slight backtracking during 1999-2001.

© Copyright Gilta Group 2002

Quarterly Movements
Manufacturers, 1998-2002

Nokia: Nokia is performing well, situated in Group A1 constantly during the years 2001-2002, except for the third quarter of 2001, when it backtracked into Group C2 as a result of taking a one-time charge for uncertain receivables.

Ericsson: Ericsson is showing severe difficulties during 2001-2002. Previously it was situated close to or inside the above average groups, but during the last two years it shows much poorer results and is situated in the worst group.

Motorola: Motorola has shown mediocre to poor results for the last two years. During 2001-2002 it shows similarities to Ericsson, and it also enters the worst group.

Sony: Sony is also moving over the whole map, without any particular pattern. It is, however, constantly moving around the average to poor groups.

© Copyright Gilta Group 2002

© Copyright Gilta Group 2002

Qualitative Analysis Using the Prototype Matching Algorithm

Qualitative Analysis

• GILTA text mining approach developed at Tampere University of Technology.
• Based on smart encoding of text:
   • Filtering of text
   • Encoding from letters to ASCII codes, e.g. for the word "design":
     $y = k^5\,\mathrm{ascii}(d) + k^4\,\mathrm{ascii}(e) + k^3\,\mathrm{ascii}(s) + k^2\,\mathrm{ascii}(i) + k\,\mathrm{ascii}(g) + \mathrm{ascii}(n)$; if $k = 2$ then $y = 6472$
   • Word-, sentence-, and paragraph-level histograms
• Clustering of texts based upon degree of similarity
   • Similarity according to the Euclidean distance measure
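
The word-encoding example on this slide can be checked with a short sketch. The function name `encode_word` is hypothetical; the formula is the one given for "design", generalized to words of any length on the assumption that the exponent of k always counts down to zero at the last letter.

```python
def encode_word(word, k=2):
    """Encode a word as a number: each letter's ASCII code is weighted by a
    power of k that decreases from left to right (the last letter gets k**0)."""
    n = len(word)
    return sum(ord(ch) * k ** (n - 1 - i) for i, ch in enumerate(word))


code = encode_word("design", k=2)
print(code)  # 6472
```

The result matches the value stated on the slide (y = 6472 for k = 2). Histograms of such codes at word, sentence, and paragraph level are what the approach then compares using Euclidean distance.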

Qualitative Analysis

Research question: Does the textual part of the annual/quarterly report contain more information about the future than the financial figures do?

Tested using the Telecom SOM database combined with qualitative analysis. Tested on the quarterly reports of three major telecom companies: Nokia, Ericsson, and Motorola, for the period 2000-2001.

Analysis with qualitative data

• Quarterly reports from Nokia
• Quarterly reports from Motorola
• Quarterly reports from Ericsson

The report in bold represents the prototype, and the reports below it are its closest matches. Q1 – Q4 represent the quarter for the report, preceded by the year. The letter following the report indicates the cluster from the quantitative (SOM) analysis.

The method appears to have captured a trend for Ericsson, although less so for the others.

The qualitative analysis precedes the quantitative analysis by one quarter for Ericsson.

Conclusions

• Clusters from qualitative and quantitative analysis did not coincide
• We captured a tendency:
   • The text reports tend to foresee the changes in the financial states of the companies before those changes influence the financial ratios
   • Trend for Ericsson clear, Nokia and Motorola not quite as clear
• Promising results, but:
• Limitations: very small dataset

Sample publication: Kloptchenko, A., Eklund, T., Back, B., Karlsson, J., Vanharanta, H., Visa A. (2004), Combining Data and Text Mining Techniques for Analyzing Financial Reports. International Journal of Intelligent Systems in Accounting, Finance, and Management, Vol. 12, No. 1, pp. 29-41.

Forecasting the Output Variables of a Glass Manufacturing Process

Forecasting the Output Variables of a Glass Manufacturing Process

The objective of the study was to model a glass manufacturing process in order to identify critical process variables. The research was performed using a supervised artificial neural network based upon adaptive retraining.

Forecasting the Output Variables of a Glass Manufacturing Process

Iulian Nastac and Adrian Costea

• 29 input variables and 5 output variables

• The data consists of 9408 rows - one data set every 15 minutes during 14 weeks.

• For the first 12 weeks (8064 rows) input and output data are given.

• For the last 2 weeks (1344 rows) only input data are provided.

Solutions were ranked by accuracy of the output forecasting according to the following formula:

$$ERR = \frac{1}{5} \sum_{i=1}^{5} \frac{100}{N} \sum_{n=1}^{N} \frac{\lvert O_{Rni} - O_{Fni} \rvert}{O_{Rni}} \, f(n)$$
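
The ranking formula on this slide is badly garbled in the transcript. The sketch below assumes it is a weighted mean absolute percentage error over the 5 output variables and N forecast rows, with O_R read as the real output, O_F as the forecast output, and f(n) as a weighting function. None of these readings is confirmed by the transcript, and f(n) is not defined there, so a constant weight of 1 is used as a placeholder. All names are hypothetical.

```python
def forecast_error(real, forecast, weight=lambda n: 1.0):
    """Assumed form of ERR: for each output variable, 100/N times the sum of
    weighted relative absolute errors, averaged over the output variables.
    `real` and `forecast` are lists of rows, one value per output variable."""
    n_rows = len(real)
    n_outputs = len(real[0])
    total = 0.0
    for i in range(n_outputs):
        column_sum = sum(
            abs(real[n][i] - forecast[n][i]) / real[n][i] * weight(n + 1)
            for n in range(n_rows)
        )
        total += 100.0 / n_rows * column_sum
    return total / n_outputs


# Made-up data: two rows, five outputs, every forecast 10% too high.
real_rows = [[100.0] * 5, [200.0] * 5]
forecast_rows = [[110.0] * 5, [220.0] * 5]

err = forecast_error(real_rows, forecast_rows)
print(round(err, 6))
```

With the placeholder weight, a forecast that is uniformly 10% off yields an error of about 10, which is the behavior expected of a percentage error measure.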

Forecasting Neural Network Model

THE RETRAINING PROCEDURE OF AN ANN

1. Train an artificial neural network in the normal way, with validation stop
2. Reduce the weights of the first network with a scaling factor γ (0 < γ < 1)
3. Retrain the network with the new initial weights
4. Compare the validation error in both cases
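
The four steps of the retraining procedure can be sketched as below. To keep the example short and self-contained, a linear model trained by plain gradient descent stands in for the neural network and its training algorithm; this substitution, the symbol `gamma` for the scaling factor, the data, and all names are assumptions made for illustration. For a linear model, retraining brings little benefit, so the sketch shows only the control flow.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(w, train_set, val_set, rate=0.05, max_epochs=500, patience=10):
    """Gradient descent with validation stop: return the weights that gave
    the lowest validation error, stopping once it no longer improves."""
    w = list(w)
    best_w, best_err, waited = list(w), mse(w, val_set), 0
    for _ in range(max_epochs):
        grad = [0.0] * len(w)
        for x, y in train_set:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(train_set)
        w = [wj - rate * gj for wj, gj in zip(w, grad)]
        val_err = mse(w, val_set)
        if val_err < best_err:
            best_w, best_err, waited = list(w), val_err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_w, best_err


def retrain(train_set, val_set, gamma=0.5, seed=0):
    rng = random.Random(seed)
    w0 = [rng.uniform(-1, 1) for _ in train_set[0][0]]
    w1, err1 = train(w0, train_set, val_set)           # 1. normal training
    scaled = [gamma * wj for wj in w1]                 # 2. reduce the weights
    w2, err2 = train(scaled, train_set, val_set)       # 3. retrain
    return (w2, err2, err1) if err2 < err1 else (w1, err1, err1)  # 4. compare


# Made-up data: y = 2x + 1 plus noise; the constant 1.0 input acts as a bias.
rng = random.Random(1)
samples = [((x, 1.0), 2 * x + 1 + rng.gauss(0, 0.05)) for x in [i / 20 for i in range(20)]]
train_set, val_set = samples[::2], samples[1::2]

weights, chosen_err, first_err = retrain(train_set, val_set)
print(chosen_err, first_err)
```

The function keeps whichever of the two networks has the lower validation error, so the chosen error is never worse than that of the first training run.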

Real data (green lines), neural network values (blue lines) after training process, and predicted outputs (red lines)

Conclusions

The retraining procedure reduces the output error and improves on the performance of training with validation stop

Level values provide better results

Delay vectors with more than 9 elements can increase the performance of our tool

In our tool, the SCG algorithm can easily be replaced with another one, because at the basic level the architecture and the retraining procedure are independent of the training algorithm.

γ_opt ∈ [0.4, 0.6]

Sample publication: Iulian Nastac and Adrian Costea. Advanced Data Forecasting Using Retraining Neural Network Technique. Technical Report 542, TUCS, 2003.

Thank you for your interest!

For more information, please see the publications on the laboratory’s homepage:

http://www.tucs.fi/research/labs/datamin.php