why data mining research does not contribute to business? mykola pechenizkiy, seppo puuronen...

19
Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy , Seppo Puuronen Department of Computer Science University of Jyväskylä Finland Alexey Tsymbal Department of Computer Science Trinity College Dublin Ireland DMBiz’05 Porto, Portugal October 3, 2005

Upload: lewis-davis

Post on 17-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

Why Data Mining Research Does Not Contribute to

Business?

Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science

University of Jyväskylä Finland

Alexey Tsymbal

Department of Computer ScienceTrinity College Dublin

Ireland

DMBiz’05 Porto, Portugal October 3, 2005

Page 2: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

2

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

OutlineOutline

• Introduction and What is our message?

• Where we are? – rigor vs. relevance in DM

• Towards the new framework for DM research– DM System as adaptive Information System (IS)

– DM research as IS Development: DM system as artefact

– DM success model: success factors

• Further plans and Discussion

Page 3: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

3

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Our MessageOur Message

• DM is still a technology having great expectations to enable organizations to take more benefit of their huge databases.

• There exist some success stories where organizations have managed to have competitive advantage of DM.

• Still the strong focus of most DM-researchers in technology-oriented topics does not support expanding the scope in less rigorous but practically very relevant sub-areas.

• Research in the IS discipline has strong traditions to take into account human and organizational aspects of systems beside the technical ones.

Page 4: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

4

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Our MessageOur Message

• Currently the maturation of DM-supporting processes which would take into account human and organizational aspects is still living its childhood.

• DM community might benefit, at least from the practical point of view, looking at some other older sub-areas of IT having traditions to consider solution-driven concepts with a focus also on human and organizational aspects.

• The DM community by becoming more amenable to research results of the IS community might be able to increase its collective understanding of

– how DM artifacts are developed – conceived, constructed, and implemented,

– how DM artifacts are used, supported and evolved,

– how DM artifacts impact and are impacted by the contexts in which they are embedded.

Page 5: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

5

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Existing Frameworks for DMExisting Frameworks for DM

• Theory-oriented– Databases;

– Statistics;

– Machine learning;

– Data compression

• Process-oriented– Fayyad’s

– CRISP-DM

– Reinartz’s

– Reductionist approach of viewing DM as statistics has advantages of the strong background, and easy-formulated problems.

– The DM tasks concerning processes like clustering, regression and classification fit easily into these approaches.

– More recent (process-oriented) frameworks address the issues related to a view of DM as a process, and its iterative and interactive nature

Page 6: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

6

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Rigor and Relevance in DM Rigor and Relevance in DM ResearchResearch

• Lin in Wu et al. notices that a new successful industry (as DM) can follow consecutive phases: 1. discovering a new idea,

2. ensuring its applicability,

3. producing small-scale systems to test the market,

4. better understanding of new technology and

5. producing a fully scaled system.

• At the present moment there are several dozens of DM systems, none of which can be compared to the scale of a DBMS system.– This fact indicates that we are still in the 3rd phase in

the DM area!

Page 7: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

7

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Rigor vs Relevance in DM Rigor vs Relevance in DM ResearchResearch

Relevance

Rigor

Relevance Rigor

Page 8: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

8

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Where is the focus?Where is the focus?• Still! … speeding-up, scaling-up, and increasing the accuracies of

DM techniques.

• Piatetsky-Shapiro : “we see many papers proposing incremental refinements in association rules algorithms, but very few papers describing how the discovered association rules are used”

• Lin claims that the R&D goals of DM are quite different:

– since research is knowledge-oriented while development is profit-oriented.

– Thus, DM research is concentrated on the development of new algorithms or their enhancements,

– but the DM developers in domain areas are aware of cost considerations: investment in research, product development, marketing, and product support.

• However, we believe that the study of the DM development and DM use processes is equally important as the technological aspects and therefore such research activities are likely to emerge within the DM field.

Towards the new framework for DM research …

Page 9: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

9

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

DMS in the Kernel of an DMS in the Kernel of an Organization Organization

DM Task(s)

DMS (Artifact)

Organization

Environment

• DM is fundamentally application-oriented area motivated by business and scientific needs to make sense of mountains of data.

• A DMS is generally used to support or do some task(s) by human beings in an organizational environment both having their desires related to DMS.

• Further, the organization has its own environment that has its own interest related to DMS, e.g. that privacy of people is not violated.

Page 10: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

10

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

The ISs-based paradigm for The ISs-based paradigm for DMDM

Ives B., Hamilton S., Davis G. (1980). “A Framework for Research in Computer-based MIS” Management Science, 26(9), 910-934.

“Information systems are powerful instruments for organizational problem solving through formal information processing”

Lyytinen, K., 1987, “Different perspectives on ISs: problems and solutions.” ACM Computing Surveys, 19(1), 5-46.

User Environment

IS Development Environment

IS operations

environment

The Use

Process

The Development

Process

The Operation Process

The Organizational Environment

The External Environment

The Information Subsystem

(ISS)

Page 11: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

11

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

DM Artifact DevelopmentDM Artifact Development

DM ArtifactDevelopment

Experimentation

Theory Building

Observation

Adapted from: Nunamaker, W., Chen, M., and Purdin, T. 1990-91, Systems development in information systems research, Journal of Management Information Systems, 7(3), 89-106.

A multimethodological approach to the construction of an artefact for DM

Page 12: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

12

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

The Action Research and Design The Action Research and Design Science Approach to Artifact Creation Science Approach to Artifact Creation

DesignKnowledge

Awareness of business problem

Action planning

Action taking

Conclusion

BusinessKnowledge

Artifact Development

Artifact Evaluation

ContextualKnowledge

Page 13: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

13

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

DM Artifact Use: Success Model 1 DM Artifact Use: Success Model 1 of 3of 3

SystemQuality

InformationQuality

Use

UserSatisfaction

IndividualImpact

OrganizationalImpact

Service Quality

Adapted from D&M IS Success Models

Page 14: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

14

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

DM Artifact Use: Success Model 2 DM Artifact Use: Success Model 2 of 3of 3

• What are the key factors of successful use and impact of DMS both at the individual and organizational levels.

1. how the system is used, and also supported and evolved, and

2. how the system impacts and is impacted by the contexts in which it is embedded.

Coppock: the failure factors of DM-related projects.• have nothing to do with the skill of the modeler or the

quality of data.• But those do include:

1. persons in charge of the project did not formulate actionable insights,

2. the sponsors of the work did not communicate the insights derived to key constituents,

3. the results don't agree with institutional truths

the leadership, communication skills and understanding of the culture of the organization are not less important than the traditionally emphasized technological job of turning data into insights

Page 15: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

15

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

DM Artifact Use: Success Model 3 DM Artifact Use: Success Model 3 of 3of 3

• Hermiz communicated his beliefs that there are

the four critical success factors for DM projects: • (1) having a clearly articulated business problem that needs

to be solved and for which DM is a proper tool;

• (2) insuring that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity for DM;

• (3) recognizing that DM is a process with many components and dependencies – the entire project cannot be "managed" in the traditional sense of the business word;

• (4) planning to learn from the DM process regardless of the outcome, and clearly understanding, that there is no guarantee that any given DM project will be successful.

Page 16: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

16

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

New Research Framework for DM New Research Framework for DM Research Research

Ap

plic

able

K

no

wle

dg

e

(Un-)Successful Applications in the appropriate environment

People Organizations Technology

Environment Knowledge Base

Foundations Design knowledge

Develop/Build

Justify/Evaluate

Assess Refine

Contribution to Knowledge Base

DM Research

Bu

siness

Need

s

Relevance Rigor

Page 17: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

17

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

New Research Framework for DM New Research Framework for DM Research Research

People Roles Capabilities Characteristics Organizations Strategy Structure&Culture Processes Technology Infrastructure Applications Communications Architecture Development Capabilities

Environment Knowledge Base

Foundations Base-level theories Frameworks Models Instantiation Validation Criteria Design knowledge Methodologies Validation Criteria (not instantiations of models but KDD processes, services, systems)

Develop/Build Theories Artifacts

Justify/ Evaluate Analytical Case Study Experimental Field Study Simulation

Assess Refine

(Un-)Successful Applications in the appropriate environment

Contribution to Knowledge Base

DM Research

Ap

plic

able

Kn

ow

led

ge

Bu

sines

s Ne

eds

Relevance Rigor

Page 18: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

18

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Further WorkFurther Work

• Definition of Relevance concept in DM research

• The revision of the book chapter

• Further work on the new framework for DM research

• Organization of Workshop/Working conf. or ST on – more social directions in DM research likely with one of the

focuses on IS as a sister discipline.

– SIAM DM 2006 Interests include

• Human Factors and Social Issues:

Ethics of Data Mining Intellectual Ownership Privacy Models Privacy Preservation Techniques Risk Analysis User Interfaces Data and Result Visualization

Page 19: Why Data Mining Research Does Not Contribute to Business? Mykola Pechenizkiy, Seppo Puuronen Department of Computer Science University of Jyväskylä Finland

19

DMBiz’05 Porto, Portugal October 3, 2005Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A.

Tsymbal

Thank You!Thank You!

Book chapter draft is available on request from

Mykola Pechenizkiy

Department of Computer Science and Information Systems,

University of Jyväskylä, FINLANDE-mail: [email protected]

Tel.: +358 14 2602472 Fax: +358 14 260 3011

http://www.cs.jyu.fi/~mpechen

Feedback is very welcome:• Questions

• Suggestions

• Collaboration