Semantic-Driven Design and Management of KDD Processes

Emanuele Storti [email protected]
Università Politecnica delle Marche, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione, Ancona, Italy
CTS 2010, Chicago, May 19

DESCRIPTION

Full paper: http://boole.diiga.univpm.it/paper/cts10.pdf

TRANSCRIPT

Page 1: Semantic-Driven Design and Management of KDD Processes

Semantic-Driven Design and Management of KDD Processes

Emanuele Storti [email protected]

Università Politecnica delle Marche, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione, Ancona, Italy

CTS 2010, Chicago, May 19

Page 2: Semantic-Driven Design and Management of KDD Processes

Introduction

Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

Organizations need methods and technologies to analyze huge amounts of data in order to support decision-making processes

Page 6: Semantic-Driven Design and Management of KDD Processes

Introduction

Process: iteration, many steps

Knowledge: user interaction

Team work: virtual organizations

Actors: domain experts, DBA, DM expert, KDD expert

KDD in a Collaborative Distributed Scenario

Examples: KD for enterprises, e-Science workflows

Page 7: Semantic-Driven Design and Management of KDD Processes

Major issues

Many KDD tools are available for each phase/task:

How to set up and execute the tools? How to compose them? How to support novice users?

Keywords: heterogeneity, integration, complexity

Distribution of users and tools:

How to locate the needed tools? How to manage coordination?

Keywords: localization, coordination

Some general questions:

How to provide support for process design? How to manage execution and interactions?

Page 8: Semantic-Driven Design and Management of KDD Processes

Approach (i)

Service-oriented platform for sharing, discovering, accessing, and executing data analysis and knowledge discovery tools

KDD tools produced by different organizations are remotely accessible as basic services through standard protocols

Formalization of experts' knowledge in a conceptual semantic model, to support advanced services (process composition)

KDDONTO: an ontology for describing algorithms, interfaces, data structures, methods, tasks:

sharing of knowledge / agreement on definitions: each actor can refer to the same definition of an algorithm or data

human/machine understandable (conceptual/formal model)

automatic reasoning

support for non-expert users

Page 9: Semantic-Driven Design and Management of KDD Processes

Approach (ii)

[Figure: KDDONTO fragment. ID3 is-a DecisionTreeAlgorithm, DecisionTreeAlgorithm is-a ClassificationAlgorithm, ClassificationAlgorithm is-a Algorithm. The service ID3_v.2.3 (service + descriptor) is linked to the concept ID3.]

Separation of information in different layers (reusability):

Algorithm: described in the ontology

Service: implements a specific algorithm; its descriptor points to the corresponding ontological concept
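The two layers can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the dictionary `IS_A` is a hypothetical in-memory stand-in for the KDDONTO fragment shown on this slide, and `ServiceDescriptor`, its fields, and the `example.org` endpoint are invented names, not part of the actual platform.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the KDDONTO fragment: child concept -> parent (is-a).
IS_A = {
    "ID3": "DecisionTreeAlgorithm",
    "DecisionTreeAlgorithm": "ClassificationAlgorithm",
    "ClassificationAlgorithm": "Algorithm",
}


@dataclass(frozen=True)
class ServiceDescriptor:
    """Service layer: the descriptor points to an ontological concept."""
    service_name: str
    implements: str   # name of the concept in the ontology layer
    endpoint: str     # placeholder URL, assumed for illustration


def ancestors(concept):
    """Return the chain of more general concepts, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, target):
    """True if `concept` equals `target` or specializes it."""
    return concept == target or target in ancestors(concept)


def services_for(concept, registry):
    """Find every service whose algorithm is a kind of `concept`."""
    return [d for d in registry if is_a(d.implements, concept)]


registry = [
    ServiceDescriptor("ID3_v.2.3", "ID3", "http://example.org/services/id3"),
]
found = services_for("ClassificationAlgorithm", registry)
print([d.service_name for d in found])
```

Because the descriptor stores only a pointer to the concept, the same ontological description of ID3 can be reused by any number of services that implement it.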

Page 10: Semantic-Driven Design and Management of KDD Processes

Process composition

[Figure: COMPOSITION. The composer takes goal, dataset and requirements and, with KDDONTO, produces an abstract process.]

Page 11: Semantic-Driven Design and Management of KDD Processes

Process composition

Planner for semiautomatic composition of abstract KDD processes

1. algorithm match: given 2 algorithms, are they compatible? (based on ontology properties - exact vs. approximate match)

Exact match: y is equal to y

Approximate match: X2 is part_of X
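The algorithm match can be sketched in Python as follows. This is a simplified illustration, not the planner's actual procedure: the `Algorithm` class, the `PART_OF` set, and the algorithm names are hypothetical, and the sketch assumes that an approximate match holds when the datum required by the second algorithm is part_of a datum produced by the first.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ontology property: (part, whole) pairs, here "X2 is part_of X".
PART_OF = {("X2", "X")}


@dataclass(frozen=True)
class Algorithm:
    name: str
    inputs: frozenset    # data concepts the algorithm requires
    outputs: frozenset   # data concepts the algorithm produces


def match(producer: Algorithm, consumer: Algorithm) -> Optional[str]:
    """Classify compatibility as 'exact', 'approximate', or None.

    Simplification: every input of `consumer` must be covered by
    the outputs of this single `producer`.
    """
    if not consumer.inputs:
        return None
    kinds = []
    for needed in consumer.inputs:
        if needed in producer.outputs:
            kinds.append("exact")            # same concept on both sides
        elif any((needed, out) in PART_OF for out in producer.outputs):
            kinds.append("approximate")      # needed datum is part of an output
        else:
            return None                      # this input cannot be satisfied
    return "exact" if all(k == "exact" for k in kinds) else "approximate"


a = Algorithm("A", frozenset({"d"}), frozenset({"y"}))
b = Algorithm("B", frozenset({"y"}), frozenset({"X"}))
c = Algorithm("C", frozenset({"X2"}), frozenset({"z"}))

print(match(a, b))  # output y equals input y
print(match(b, c))  # required X2 is part_of produced X
print(match(a, c))  # no relation between y and X2
```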

Page 12: Semantic-Driven Design and Management of KDD Processes

Process composition

[Figure: KDDComposer prototype]

2. goal-oriented composition procedure: iterative execution of algorithm match

Input: goal, dataset, some constraints

Execution: backwards, from goal to dataset

Output: a ranked list of valid abstract processes
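A backward, goal-oriented search of this kind can be sketched in Python. The sketch makes several assumptions that are not in the slides: each algorithm has a single input and a single output, only exact matches are used, the only constraint is a maximum number of steps, and shorter processes are ranked first. The catalog entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    name: str
    task: str     # what the algorithm achieves, e.g. "classification"
    input: str    # single required data concept (simplification)
    output: str   # single produced data concept (simplification)


def compose(goal, dataset, algorithms, max_steps=4):
    """Build abstract processes backwards, from the goal to the dataset."""
    results = []

    def extend(chain):
        first = chain[0]
        if first.input == dataset:
            # The head of the chain can consume the dataset: process is valid.
            results.append([a.name for a in chain])
            return
        if len(chain) >= max_steps:
            return  # constraint: do not exceed the allowed length
        for cand in algorithms:
            # Exact match only: the candidate's output feeds the current head.
            if cand.output == first.input and cand not in chain:
                extend([cand] + chain)

    for alg in algorithms:
        if alg.task == goal:
            extend([alg])
    # Assumed ranking criterion: fewer steps first.
    return sorted(results, key=len)


catalog = [
    Algorithm("clean", "preprocessing", "RawDataset", "CleanDataset"),
    Algorithm("discretize", "preprocessing", "CleanDataset", "DiscreteDataset"),
    Algorithm("id3", "classification", "DiscreteDataset", "DecisionTree"),
    Algorithm("tree_learner", "classification", "CleanDataset", "DecisionTree"),
]

for process in compose("classification", "RawDataset", catalog):
    print(" -> ".join(process))
```

With this catalog the search finds two abstract processes, and the two-step one is ranked before the three-step one.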

Page 13: Semantic-Driven Design and Management of KDD Processes

Translation to concrete process

[Figure: COMPOSITION and TRANSLATION. The composer takes goal, dataset and requirements and, with KDDONTO, produces an abstract process. A broker (UDDI, syntactic verification) translates the abstract process into a concrete process.]
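The translation step can be illustrated with a small Python sketch. It is a rough approximation under explicit assumptions: a plain list stands in for the broker's UDDI registry (no real UDDI API is used), syntactic verification is reduced to checking that the output format of each service equals the input format of the next, and all service names except ID3_v.2.3 and all format labels are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Service:
    name: str
    implements: str      # algorithm concept from the abstract process
    input_format: str    # hypothetical format label
    output_format: str   # hypothetical format label


def syntactically_valid(services):
    """Adjacent services must agree on the exchanged format."""
    return all(a.output_format == b.input_format
               for a, b in zip(services, services[1:]))


def translate(abstract_process, registry):
    """Replace each abstract step with a service; return None if impossible."""
    candidates = [[s for s in registry if s.implements == step]
                  for step in abstract_process]
    # Try every combination of candidate services, in registry order.
    for combo in product(*candidates):
        if syntactically_valid(combo):
            return [s.name for s in combo]
    return None


registry = [
    Service("Discretizer_v1", "Discretize", "format_a", "format_b"),
    Service("ID3_alt", "ID3", "format_a", "model"),
    Service("ID3_v.2.3", "ID3", "format_b", "model"),
]

print(translate(["Discretize", "ID3"], registry))
```

Here ID3_alt implements the right algorithm but expects a different input format, so the check rejects it and ID3_v.2.3 is selected.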

Page 14: Semantic-Driven Design and Management of KDD Processes

Verification and Execution

Collaborative/distributed scenario: complex interactions among actors and time-consuming transactions.

Guarantees about process correctness are needed at design time

Reo, a “glue code” for explicitly modeling interaction among components (tools, GUI, ...)

Three steps: specification of the interaction protocol, interaction design, specs verification

Page 15: Semantic-Driven Design and Management of KDD Processes

Verification and Execution

[Figure: overall architecture. COMPOSITION: the composer takes goal, dataset and requirements and, with KDDONTO, produces an abstract process. TRANSLATION: a broker (UDDI, syntactic verification) produces a concrete process. VERIFICATION and EXECUTION: REO modeler, model checking, exec.]