an agile process for the creation of conceptual models from content descriptions

An agile process for the creation of conceptual models from content descriptions Hans-Werner Sehring Centre for Sustainable Content Logistics TuTech Innovation GmbH / Hamburg University of Technology Joint work with: Sebastian Boßung Henner Carl Joachim W. Schmidt

Page 1: An agile process for the creation of conceptual models from content descriptions

An agile process for the creation of conceptual models from content descriptions

Hans-Werner Sehring

Centre for Sustainable Content Logistics

TuTech Innovation GmbH / Hamburg University of Technology

Joint work with:

Sebastian Boßung Henner Carl Joachim W. Schmidt

Page 2: An agile process for the creation of conceptual models from content descriptions

30 September 2007 An agile modelling process - Hans-Werner Sehring, 2007 2

Outline

1. Conceptual Content Management
2. Asset expressions and schemata
3. The Asset Schema Inference Process
4. Straight-forward schema inference
5. Cluster-based schema inference
6. Process evaluation
7. Summary and outlook

Page 3: An agile process for the creation of conceptual models from content descriptions

1. Conceptual Content Management

Conceptual Content Management (CCM)
– an approach to domain modelling
– inspired by epistemology: entity description by classes and instances, called Assets
– Assets are dual entity descriptions consisting of content visualising the entity and a conceptual model describing it
– model-based system generation

Features:
– modelling is carried out by domain experts
– domain models are open to changes
– existing work is preserved, even if changes are applied
– communication between domain experts with individual models is maintained

Page 4: An agile process for the creation of conceptual models from content descriptions

CCM dynamics

CCM systems (CCMSs) are dynamically generated from domain models:
– immediately realizing model changes
– preserving existing Assets
– maintaining communication

Key contributions to this end:
– modelling language
– model compiler
– architecture for evolvable systems

    model Historiography
    from Time import Timestamp
    from Topology import Place
    class Professor {
      content image
      concept characteristic n :String
              relationship publs :Work*
    }

[Figure: the model source is parsed into an intermediate model (parse tree with nodes such as a:AssetClass, b:AssetClass, m:Model and superClass edges). The generated system example covers the models Political_Iconography (PI), Regents and Artists and is composed of the modules client(PI), client(Regents), client(Artists), mediation(PI, (Regents, Artists)), mediation(Regents, Artists), distribution(PI, Regents), distribution(PI, Artists) and the databases DB(PI), DB(Regents), DB(Artists).]

Page 5: An agile process for the creation of conceptual models from content descriptions

Model-driven development

All software development starts with a conceptual model
– model-driven development approaches in particular call for models with a sufficient degree of formality
– CCM is similar to model-driven development in that software creation is highly automated
– in CCM, software generation is even dynamic

A CCM model is required as a starting point for CCMSs
– usually, a modelling expert (analyst) is consulted
– due to the dynamics requirement, such a modelling expert cannot be employed in CCM
– domain experts are not modelling experts; they usually have problems with, e.g., sufficient formality
– but: experts can “tell their story” by providing examples

Page 6: An agile process for the creation of conceptual models from content descriptions

2. Asset expressions and schemata

In many domains research starts by regarding instances (samples), not concepts

[Figure: example Asset expression drawn as a graph of typed samples. Sample types shown: Dissertation, Book, Professor, FullProfessor, CareerStep, City. Attribute labels shown: concerns: Teacher, reviewer: Professor, publications: Work*, issued: Place, issuedBy: Professor, issuedWhen: Timestamp, where: GeoPoint, name: Name, title: Name. Bound values shown: the names "Ludwig Heydenreich", "Georg Thilenius" and "Erwin Panofsky", the titles "Die Sakralbau-Studien Leonardo da Vinci's" and "Architecture in Italy", and the date 24 Feb 1934.]

Page 7: An agile process for the creation of conceptual models from content descriptions

Asset model from the example

Manually defined classes for the example:

    model Historiography
    from Time import Timestamp
    from Topology import Place
    class Professor {
      content image
      concept characteristic name :String
              relationship publications :Work*
    }
    class Work {
      content scan
      concept characteristic title :String
              relationship concerns :Professor*
              relationship issued :Issuing
              relationship reviewers :Professor*
    }
    class Issuing {
      concept relationship issued :Place
              relationship issuedBy :Professor
              relationship issuedWhen :Timestamp
    }

Models consisting of classes
Classes with
• content handles and
• attributes (and constraints)
  • characteristics
  • relationships


Page 8: An agile process for the creation of conceptual models from content descriptions

Asset model from the example (cont’d)

Example of personalisation: a domain expert introduces the distinction of documents:

    model MyHistoriography
    from Historiography import Work, Professor
    class Work {
      concept relationship reviewer unused
    }
    class Dissertation refines Work {
      concept relationship reviewer :Professor*
    }

Import and redefinition of classes for
• schema evolution (user communities)
• personalisation (single users)
• …


Page 9: An agile process for the creation of conceptual models from content descriptions

3. Asset Schema Inference Process (ASIP)

Bootstrapping: CCM itself requires an initial model as a starting point for the open dynamic modelling process

Required: systematic support for domain experts in finding suitable models

Start with Asset Expressions:
– content abstractions and applications: assigned names and bound values
– semantic types (concepts): no inner structure

Concepts and classes are not distinguished in CCM models, intensional and extensional definitions

Free-form entity descriptions are used as samples; later they become instances of classes

[Figure: Asset expression fragment with the attribute reviewer: Professor bound to a sample of type Professor]

Page 10: An agile process for the creation of conceptual models from content descriptions

Agile CCMS development

Agility:
– based on the possibility to generate CCMSs dynamically
– domain experts review their models based on experiences with an operational CCMS
– if changes to the model are required, another iteration of the process is started
– entity descriptions created within the CCMS can be used as samples for the next iteration of the process

[Figure: iteration cycle: Create Asset expressions → Construct schema → Generate CCMS]

Page 11: An agile process for the creation of conceptual models from content descriptions

ASIP phases

The ASIP has four phases:
– Phase 1: Sample acquisition
– Phase 2: Schema inference
– Phase 3: Feedback questions (the domain expert answers questions)
– Phase 4: Prototype generation and system generation

Feedback loop: if unhappy with the schema, modify the samples (or modify the schema)
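The four phases and the feedback loop can be read as a small driver loop. The following is only a sketch under assumptions: the function `run_asip`, its callback parameters and the iteration limit are hypothetical names introduced for illustration and are not part of the CCM tooling described in the slides.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, str]
Schema = Dict[str, int]


def run_asip(
    acquire: Callable[[List[Sample]], List[Sample]],
    infer: Callable[[List[Sample]], Schema],
    ask_feedback: Callable[[Schema], Schema],
    generate: Callable[[Schema], object],
    is_satisfied: Callable[[object], bool],
    max_iterations: int = 5,
) -> Tuple[Schema, int]:
    """Iterate the four phases until the expert accepts the generated system."""
    samples = acquire([])                  # Phase 1: sample acquisition
    for iteration in range(1, max_iterations + 1):
        schema = infer(samples)            # Phase 2: schema inference
        schema = ask_feedback(schema)      # Phase 3: feedback questions
        system = generate(schema)          # Phase 4: prototype/system generation
        if is_satisfied(system):
            return schema, iteration
        samples = acquire(samples)         # unhappy with schema: modify samples
    raise RuntimeError("no accepted schema within the iteration limit")


# Toy stand-ins for the expert and the generator (assumptions for the demo).
final_schema, rounds = run_asip(
    acquire=lambda samples: samples + [{"name": "String"}],
    infer=lambda samples: {"classes": len(samples)},
    ask_feedback=lambda schema: schema,
    generate=lambda schema: schema,
    is_satisfied=lambda system: system["classes"] >= 3,
)
```

In the toy run the expert is satisfied once three samples have been provided, so the loop ends in the third iteration.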

Page 12: An agile process for the creation of conceptual models from content descriptions

Two schema inference experiments

Experiments with alternatives for phases 2 and 3:
– (traditional) schema inference plus user feedback: straight-forward approach starting from singletons
– clustering, supervised by domain experts: statistical approach, semi-supervised learning

Phase 3 (generation of questions to gather feedback) is determined by the alternative chosen

Result of phases 1-3 is a CCM model:
– prototype generation and system generation (phase 4) are carried out by the CCM model compiler
– the domain expert can modify the inferred schema (openness and dynamics)

Page 13: An agile process for the creation of conceptual models from content descriptions

4. Straight-forward schema inference

Schema construction by traditional schema inference:
1. derive naive classes directly from the set of samples
2. apply simplifications
3. if changes were applied to the schema, repeat step 2

Step 1: for each sample create an Asset class with
– a content handle whose type is determined by the encoding format of the sample’s content
– attributes for all abstractions over the content
  • characteristics for certain known types
  • relationships for other types
  • no further constraints
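Step 1 can be illustrated with a minimal sketch. The data structures `Sample` and `AssetClass`, the set `KNOWN_TYPES` and the mapping `HANDLE_BY_FORMAT` are assumptions made for this example; the slides do not say which types count as "known" or how encoding formats map to content handles.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption: types that yield characteristics rather than relationships.
KNOWN_TYPES = {"String", "Integer"}
# Assumption: mapping from content encoding format to content handle name.
HANDLE_BY_FORMAT = {"image/jpeg": "image", "application/pdf": "scan"}


@dataclass
class Sample:
    content_format: Optional[str]
    attributes: Dict[str, str]  # attribute name -> type name of the bound value


@dataclass
class AssetClass:
    name: str
    content_handle: Optional[str]
    characteristics: Dict[str, str] = field(default_factory=dict)
    relationships: Dict[str, str] = field(default_factory=dict)


def naive_class(index: int, sample: Sample) -> AssetClass:
    """Create one singleton class per sample, without further constraints."""
    cls = AssetClass(
        name=f"Class{index}",  # generated name; the expert may rename it later
        content_handle=HANDLE_BY_FORMAT.get(sample.content_format or ""),
    )
    for attr, type_name in sample.attributes.items():
        target = cls.characteristics if type_name in KNOWN_TYPES else cls.relationships
        target[attr] = type_name
    return cls


professor = naive_class(
    1, Sample("image/jpeg", {"name": "String", "publications": "Work"})
)
```

For the sample above the sketch yields a class with content handle `image`, the characteristic `name` and the relationship `publications`.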

Page 14: An agile process for the creation of conceptual models from content descriptions

Schema simplification

Step 2: simplifications, repeatedly applied in the specified order
– identical class: unify classes with attributes and content handles with identical names and types
– inheritance: introduce a subtype relationship between classes whose sets of attributes are in a subset relationship
– type match: if two classes have attributes and content handles of identical types, prompt the expert for unification
– inheritance orphan: ask the domain expert about removal of classes with only few instances

Note:
– often classes are considered equal if the attributes’ types match
– here the name is considered, or else feedback is collected
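The two rules that need no expert feedback (identical class and inheritance) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a class is reduced to a set of (attribute name, type) pairs, content handles are left out, and the function names are hypothetical. The type match and inheritance orphan rules are omitted because they depend on answers from the domain expert.

```python
from typing import Dict, FrozenSet, List, Tuple

Attr = Tuple[str, str]              # (attribute name, type name)
Schema = Dict[str, FrozenSet[Attr]]  # class name -> attribute set


def unify_identical(schema: Schema) -> Schema:
    """Identical class rule: keep one class per distinct attribute set."""
    seen: Dict[FrozenSet[Attr], str] = {}
    merged: Schema = {}
    for name, attrs in schema.items():
        if attrs not in seen:
            seen[attrs] = name
            merged[name] = attrs
    return merged


def infer_inheritance(schema: Schema) -> List[Tuple[str, str]]:
    """Inheritance rule: (subclass, superclass) pairs where the superclass
    attributes are a proper subset of the subclass attributes."""
    return sorted(
        (sub, sup)
        for sub, sub_attrs in schema.items()
        for sup, sup_attrs in schema.items()
        if sup_attrs < sub_attrs
    )


naive: Schema = {
    "C1": frozenset({("name", "String")}),
    "C2": frozenset({("name", "String")}),
    "C3": frozenset({("name", "String"), ("publications", "Work")}),
}
simplified = unify_identical(naive)
hierarchy = infer_inheritance(simplified)
```

Here `C1` and `C2` are unified because names and types coincide, and `C3` becomes a subclass of `C1`. Note that `infer_inheritance` also reports transitive pairs; a fuller version would reduce them to direct supertypes.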

Page 15: An agile process for the creation of conceptual models from content descriptions

5. Cluster-based schema inference

Schema construction by clustering:
– cluster samples, create classes from clusters
– experiment based on k-means algorithm

Clustering steps:
– classification: assign classes to clusters based on distance measure d:
  $d(s,c) = \alpha \, d_{sem}(s,c) + (1-\alpha) \, d_{struct}(s,c)$, $\alpha \in [0,1]$
– optimisation: recompute the cluster centres
– inheritance hierarchy creation: like in the simple approach
– feedback: visualise the clusters, allow the expert to partition clusters
=> semi-supervised learning

Less user interaction than in the traditional approach
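The classification step with the weighted distance can be sketched as below. Only the assignment step of a k-means-style loop is shown; recomputing the cluster centres is omitted. The item representation and the toy distance functions `toy_sem` and `toy_struct` are placeholders assumed for the demo, not the measures defined in the talk.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Item = Tuple[str, FrozenSet[str]]  # (type name, attribute names); toy representation
Distance = Callable[[Item, Item], float]


def combined_distance(d_sem: Distance, d_struct: Distance, alpha: float) -> Distance:
    """d(s, c) = alpha * d_sem(s, c) + (1 - alpha) * d_struct(s, c)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return lambda s, c: alpha * d_sem(s, c) + (1.0 - alpha) * d_struct(s, c)


def assign(samples: Sequence[Item], centres: Sequence[Item], d: Distance) -> Dict[int, List[Item]]:
    """Classification step: put every sample into the cluster of its nearest centre."""
    clusters: Dict[int, List[Item]] = {i: [] for i in range(len(centres))}
    for s in samples:
        nearest = min(range(len(centres)), key=lambda i: d(s, centres[i]))
        clusters[nearest].append(s)
    return clusters


# Toy placeholder distances (assumptions): same type -> 0, else 1;
# structural distance = number of attributes not shared.
def toy_sem(s: Item, c: Item) -> float:
    return 0.0 if s[0] == c[0] else 1.0


def toy_struct(s: Item, c: Item) -> float:
    return float(len(s[1] ^ c[1]))


d = combined_distance(toy_sem, toy_struct, alpha=0.5)
centres = [("Book", frozenset({"title", "author"})), ("Person", frozenset({"name"}))]
samples = [("Book", frozenset({"title"})), ("Person", frozenset({"name", "born"}))]
clusters = assign(samples, centres, d)
```

With α = 0.5 both parts contribute equally; α = 1 would cluster purely by semantic type and α = 0 purely by structure.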

Page 16: An agile process for the creation of conceptual models from content descriptions

Structural distance measure

$d_{struct}$ is based on the length of the shortest edit script (similar to string matching)

Costs like:

    edit operation                              cost magnitude
    add attribute                               low
    remove attribute                            high
    change attribute name                       low
    broaden attribute type                      medium
    narrow attribute type                       very low
    increase cardinality of attribute value     medium
    decrease cardinality of attribute value     very low
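A simplified sketch of such a cost-weighted comparison is given below. It is not a true shortest-edit-script computation: attributes are matched by name only, renames and cardinality changes are ignored, and unrelated types are treated as a removal plus an addition. The numeric weights in `COST` and the subtype facts in `SUBTYPE_OF` are assumptions; the slides only give cost magnitudes.

```python
from typing import Dict, Set, Tuple

# Assumed numeric weights for the magnitudes very low / low / medium / high.
COST = {"add": 1.0, "remove": 4.0, "broaden": 2.0, "narrow": 0.5}

# Assumed (subtype, supertype) facts, for illustration only.
SUBTYPE_OF: Set[Tuple[str, str]] = {("Professor", "Person"), ("Dissertation", "Work")}


def d_struct(sample: Dict[str, str], cls: Dict[str, str]) -> float:
    """Cost of editing the sample's attribute signature into the class's."""
    cost = 0.0
    cost += COST["add"] * len(cls.keys() - sample.keys())      # missing in sample
    cost += COST["remove"] * len(sample.keys() - cls.keys())   # surplus in sample
    for name in sample.keys() & cls.keys():
        s_type, c_type = sample[name], cls[name]
        if s_type == c_type:
            continue
        if (s_type, c_type) in SUBTYPE_OF:
            cost += COST["broaden"]   # class attribute type is more general
        elif (c_type, s_type) in SUBTYPE_OF:
            cost += COST["narrow"]    # class attribute type is more specific
        else:
            cost += COST["remove"] + COST["add"]  # unrelated types
    return cost


sample = {"title": "String", "reviewer": "Professor"}
work_class = {"title": "String", "reviewer": "Person", "issued": "Issuing"}
cost_to_work = d_struct(sample, work_class)
cost_surplus = d_struct({"title": "String", "scanDate": "Timestamp"}, {"title": "String"})
```

The first comparison needs one type broadening and one added attribute; the second needs one removal, which is the most expensive single operation in the assumed weights.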

Page 17: An agile process for the creation of conceptual models from content descriptions

Semantic distance measure

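$d_{sem}$ is determined by the shortest paths in the class hierarchy:

$$
d_{sem}(s,c) =
\begin{cases}
\dfrac{1}{2^{h(T_1)}} & \text{if } T_1 \text{ is a direct supertype of } T_C \\[1ex]
d_{sem}(T_1,T_m) + d_{sem}(T_m,T_C) & \text{if } T_1 \text{ is a direct supertype of } T_m \text{ and } T_m \text{ is a supertype of } T_C \\[1ex]
d_{sem}(T_S,T_1) + d_{sem}(T_S,T_C) & \text{if } T_S \text{ is the most specific common supertype of } T_1 \text{ and } T_C
\end{cases}
$$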

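[Figure: example class hierarchy with levels h(T): Any (h = 0); WorkOfArt, Person (h = 1); Text, Image (h = 2); Book (h = 3). Edge labels shown: 1, 1/2, 1/2, 1/4. The figure poses the question "distance?" for two classes of the hierarchy.]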
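The recursive definition can be traced on the example hierarchy with a short sketch. It assumes that h(T) is the depth of T below Any and that an edge from a type T to a direct subtype weighs 1/2^h(T), which matches the edge labels 1, 1/2 and 1/4 in the figure. The function names are hypothetical, and the function is made symmetric for convenience, which the slides do not state explicitly.

```python
from typing import Dict, List, Optional

# Hierarchy from the slide's example figure: type -> direct supertype.
PARENT: Dict[str, Optional[str]] = {
    "Any": None,
    "WorkOfArt": "Any",
    "Person": "Any",
    "Text": "WorkOfArt",
    "Image": "WorkOfArt",
    "Book": "Text",
}


def ancestors(t: str) -> List[str]:
    """Return t followed by all of its supertypes up to the root."""
    chain = [t]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def height(t: str) -> int:
    """h(T): number of edges between T and the root (Any has h = 0)."""
    return len(ancestors(t)) - 1


def down_distance(sup: str, sub: str) -> float:
    """Sum of edge weights 1 / 2**h(parent) on the path from sup down to sub."""
    total = 0.0
    node = sub
    while node != sup:
        parent = PARENT[node]
        if parent is None:
            raise ValueError(f"{sup} is not a supertype of {sub}")
        total += 1.0 / 2 ** height(parent)
        node = parent
    return total


def d_sem(t1: str, tc: str) -> float:
    """Distance via the most specific common supertype of t1 and tc."""
    above_t1 = ancestors(t1)
    common = next(a for a in ancestors(tc) if a in above_t1)
    return down_distance(common, t1) + down_distance(common, tc)


book_vs_image = d_sem("Book", "Image")    # via WorkOfArt: (1/2 + 1/4) + 1/2
book_vs_person = d_sem("Book", "Person")  # via Any: (1 + 1/2 + 1/4) + 1
```

Because edge weights halve with every level, two classes that differ only deep in the hierarchy are closer to each other than classes that split near the root.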

Page 18: An agile process for the creation of conceptual models from content descriptions

6. Process evaluation

Schema quality:
– generally difficult to judge
– for domain modelling: not the schema that best describes the samples, but the model that best represents the application domain

Criteria [Cherfi, Akoka, Comyn-Wattiau]:
– specification:
  • graphical legibility
  • simplicity
  • expressiveness
  • syntactical correctness
  • semantic correctness
– usage: completeness, understandability
– implementation: implementability, maintainability

Page 19: An agile process for the creation of conceptual models from content descriptions

Process evaluation (cont’d)

Selected parameters:
– simplicity: in general depends on
  • the given sample set
  • domain expert’s answers in feedback phase
– syntactical correctness: granted by model generation
– semantic correctness: can be negatively impacted by structurally coinciding classes with different meanings
– understandability:
  • generated class names can be an obstacle
  • but: generated system lowers impact of schema
– implementability: by generation
– maintainability: through dynamics

Page 20: An agile process for the creation of conceptual models from content descriptions

7. Summary and outlook

Summary:
– Conceptual Content Management allows domain experts to provide and individually change domain models
– domain experts are usually not modelling experts, and they prefer to start with samples describing observations
– a process helps domain experts define initial models to start the open dynamic CCM activity
– as one novel approach, a cluster-based schema inference process has been investigated

Outlook: future work will include …
– the inclusion of the cluster-based approach into the open modelling for extensional concept definitions
– the employment of reasoning techniques (induction, abduction) to guide the schema construction process