Reuse of a repository of conceptual schemas in a large scale project

Carlo Batini, University of Milano Bicocca, Italy

batini@disco.unimib.it


Goal and contents

• Use an existing repository of schemas representing the relevant information managed in the Central Public Administration:
  – 500 main databases
  – 500 conceptual schemas organized in a repository
  – 5,000 entities and 10,000 attributes

• Produce the corresponding repository for a group of local administrations in the Piedmont region:
  – 450 relational schemas available
  – About 15,000 relational tables

• Human resources available: 2 person-years

• Heuristics and related methodology
• Experiments
• Recent developments


Organization of the Central and Local Public Administration in Italy


Organization of Central and Local Public Administration in Italy

• Central PA
  – 50 ministries and other agencies

• Local PA
  – 21 regions
  – More than 100 provinces
  – More than 8,000 municipalities


Organization of the central PA Repository


An example of a repository in the small

[Figure: small conceptual schemas of a company (Department structure, Sales, Production), with entities such as DEP, EMP, FLOOR, CITY, ITEM, ORD, SELLER, PUR, WARE, CLERK, ENGIN, WARR]


An example of a repository in the small

[Figure: the schemas Department structure, Sales and Production are merged by an integration step into one schema with entities DEPARTM, EMPLOYEE, CITY, FLOOR, ITEM, ORDER, SELLER, PURCH, WARE, CLERK, ENGINEER, WARR]


An example of a repository in the small

[Figure: the integrated schema is simplified by abstraction steps, first into a schema with ITEM, ORDER, DEPART, EMPLOYEE, CITY, SELLER, PURCH, then into a more abstract one with ITEM, DEP, EMP-DATA, ORD-DATA]


An example of a repository in the small

[Figure: the repository of the previous figure, extended with views derived from its schemas (e.g., a Product view over ITEM, DEPART, EMPLOYEE), alongside the integration and abstraction steps]


Views not represented

[Figure: the repository with its integration and abstraction steps, without the views]


Only some abstractions represented

[Figure: the repository with the integration step and only some of the abstraction steps]


Sparse approach

S1 S2 S3 → SI123
S4 S5 S6 → SI456
S7 S8 → SI78
SI123 SI456 SI78 → SI12345678
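The grouping in the sparse approach can be sketched as a two-level integration tree. This is a minimal sketch under a strong simplification: each schema is reduced to a set of concept names and integration is approximated by set union, whereas real integration also resolves naming and structural conflicts. The grouping (S1-S3, S4-S6, S7-S8) follows the figure; the concept names in the schemas and the helper `integrate` are hypothetical.

```python
from __future__ import annotations


def integrate(*schemas: frozenset[str]) -> frozenset[str]:
    """Approximate schema integration as the union of concept names (simplification)."""
    return frozenset().union(*schemas)


# Hypothetical source schemas S1..S8, each reduced to the set of its concept names.
S = {
    "S1": frozenset({"Individual", "Place"}),
    "S2": frozenset({"Individual", "Document"}),
    "S3": frozenset({"Legal person"}),
    "S4": frozenset({"Property", "Place"}),
    "S5": frozenset({"Document"}),
    "S6": frozenset({"Individual", "Property"}),
    "S7": frozenset({"Legal person", "Document"}),
    "S8": frozenset({"Place"}),
}

# First level: integrate the source schemas in the groups shown in the figure.
SI123 = integrate(S["S1"], S["S2"], S["S3"])
SI456 = integrate(S["S4"], S["S5"], S["S6"])
SI78 = integrate(S["S7"], S["S8"])

# Second level: integrate the intermediate schemas into the overall schema.
SI12345678 = integrate(SI123, SI456, SI78)
print(sorted(SI12345678))
```

Integrating in small groups keeps each integration step manageable and leaves the intermediate schemas available as a middle level of the repository.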


Structure of the Central PA Repository

[Figure: repository areas Social security, Justice, Environment, Health]


Structure of the Central PA Repository

[Figure: repository areas Social security, Justice, Environment, Health]

• Abstract schemas: 50
• Basic schemas: 500


The whole repository of schemas

[Figure: integrated diagrams of the 1st, 2nd and 3rd level of the PA database. Areas include Communication and transports, Production, Labour, Education, Habitat, Building, Culture, Social, Health, Security, Justice, Defence, Foreign affairs, Social insurance and Certification; services are grouped into General services, Direct services, and Social and economic services, plus Statistics and support resources (financial, instrumental and real estate, human). Lower-level schemas include, e.g., Land registry, Social security, Legal activities, Urban criminality, Internal security, Health service, Cultural heritage, Labour market, Farm companies, Industrial companies, Tax office, Customs house, Motor vehicles, Employees and Training.]


The top-level schema of the repository

[Figure: top-level entities Subject (generalizing Individual and Legal person), Document, Property, Place]


Input knowledge for the production of the repository of local conceptual schemas

• Central Public Administration: conceptual schemas (abstract schemas, basic schemas)
• Local Public Administration: logical schemas
• Output: repository of local conceptual schemas


Conjecture (1) and strategy (2)

• 1. Knowledge appearing in the abstract schemas of the Central PA repository should also appear, unchanged, in the Local PA repository.

• 2. Knowledge appearing in the basic schemas of the Central PA repository should be changed or updated according to the knowledge appearing in the local logical schemas.


Using a more compact representation

Abstract schemas and basic schemas are represented by generalization hierarchies of:
– Individual
– Legal person
– Document
– Place
– Property


A fragment of the generalization hierarchy for Individual

Individual
  Employment
    Unemployed
    Employed
      Dependent
      Autonomous
    In search of employment
    Retired
      State pension retired
      Private pension retired
      Early retired
      Disability retired
  Education
  …
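A hierarchy fragment like the one above can be stored as a child-to-parent map, which makes it easy to walk upward to more general concepts or downward to the most specific ones. This is a minimal sketch: placing Dependent and Autonomous under Employed, and treating Employment and Education as nodes, is our reading of the figure, and the helper names `ancestors` and `leaves` are hypothetical.

```python
from __future__ import annotations

# Fragment of the Individual hierarchy as a child -> parent map (our reading of the figure).
PARENT = {
    "Employment": "Individual",
    "Education": "Individual",
    "Unemployed": "Employment",
    "Employed": "Employment",
    "Dependent": "Employed",
    "Autonomous": "Employed",
    "In search of employment": "Employment",
    "Retired": "Employment",
    "State pension retired": "Retired",
    "Private pension retired": "Retired",
    "Early retired": "Retired",
    "Disability retired": "Retired",
}


def ancestors(concept: str) -> list[str]:
    """Return the chain of more general concepts, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def leaves(root: str) -> list[str]:
    """Return the most specific concepts below root (root itself if it has no children)."""
    children = [child for child, parent in PARENT.items() if parent == root]
    if not children:
        return [root]
    return [leaf for child in children for leaf in leaves(child)]


print(ancestors("Early retired"))
print(leaves("Retired"))
```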


Input knowledge for the production of the repository of local conceptual schemas

• Central Public Administration: conceptual schemas (abstract schemas, basic schemas), represented as generalization hierarchies of Individual, Legal person, Document, Place, Property
• Local Public Administration: logical schemas
• Output: repository of local conceptual schemas


The two phases of the methodology

Automatic local schema construction → Draft schema → Manual step (domain expert) → Final schema


The methodology at a glance

• Phase 1
  – 1. Extract entities
  – 2. Add generalizations
  – 3. Extract relationships
  – 4. Add relationships related to integrity constraints

• Phase 2: Domain expert step


Step 1: Extract entities

• Inputs
  – Generalization hierarchies of Individual, Legal person, Document, Place, Property
  – Relational local PA schemas

• Output
  – Draft schema


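Step 1: Extract entities

[Figure: concepts E1, E2, E3 of the generalization hierarchies are recognized in the tables and attributes of the local schemas and become entities of the draft schema]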
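The slides do not specify how tables are matched to hierarchy concepts, so the following is only a minimal sketch of step 1 under an assumed name-matching heuristic: a table is mapped to a concept when all the words of the concept name occur in the table name, and the most specific (longest) matching concept wins. The table names and the functions `words` and `extract_entities` are hypothetical.

```python
from __future__ import annotations

import re


def words(name: str) -> frozenset[str]:
    """Split an identifier such as 'EARLY_RETIRED' or 'Early retired' into lowercase words."""
    return frozenset(w for w in re.split(r"[^a-z0-9]+", name.lower()) if w)


def extract_entities(tables: list[str], concepts: list[str]) -> dict[str, list[str]]:
    """Map each table to the most specific hierarchy concept whose words all occur in its name."""
    draft: dict[str, list[str]] = {}
    for table in tables:
        table_words = words(table)
        matches = [c for c in concepts if words(c) <= table_words]
        if matches:
            best = max(matches, key=lambda c: len(words(c)))  # most words = most specific
            draft.setdefault(best, []).append(table)
    return draft  # unmatched tables (e.g. code tables) are simply left out


CONCEPTS = ["Individual", "Retired", "Early retired", "Disability retired", "Legal person", "Document"]
TABLES = ["EARLY_RETIRED", "DISABILITY_RETIRED_PERS", "LEGAL_PERSON_REG", "TAX_CODES"]
draft = extract_entities(TABLES, CONCEPTS)
print(draft)
```

Real local table names are abbreviated and often in Italian, so a practical matcher would likely also need synonym lists and a comparison of attributes, not only table names.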


Step 2: Add generalizations

• Inputs
  – Generalization hierarchies of Individual, Legal person, Document, Place, Property
  – Draft schema (entities E1, E2, E3)

• Output
  – New draft schema


Add generalizations

[Figure: the generalizations among E1, E2, E3 found in the hierarchies are added to the draft schema built from tables and attributes]
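Step 2 can be sketched as follows, assuming (the slides show only the figure) that each draft entity is linked to its nearest ancestor in the hierarchy that is also a draft entity, so intermediate concepts that do not occur in the local schemas are skipped. The hierarchy fragment reuses the Individual example; the function name `add_generalizations` is hypothetical.

```python
from __future__ import annotations

# Fragment of the Individual hierarchy as a child -> parent map (our reading of the figure).
PARENT = {
    "Employment": "Individual",
    "Employed": "Employment",
    "Dependent": "Employed",
    "Retired": "Employment",
    "Early retired": "Retired",
}


def add_generalizations(entities: list[str], parent: dict[str, str]) -> list[tuple[str, str]]:
    """Link each draft entity to its nearest ancestor that is also a draft entity."""
    links = []
    for entity in entities:
        ancestor = parent.get(entity)
        while ancestor is not None and ancestor not in entities:
            ancestor = parent.get(ancestor)  # skip concepts missing from the draft
        if ancestor is not None:
            links.append((entity, ancestor))  # (specific concept, general concept)
    return links


draft = ["Individual", "Retired", "Early retired", "Dependent"]
generalizations = add_generalizations(draft, PARENT)
print(generalizations)
```

Here Dependent is attached directly to Individual because neither Employed nor Employment was extracted in step 1.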


Step 3: Extract relationships

• Inputs
  – Draft schema (entities E1, E2, E3)
  – Basic schemas of the central PA repository (Social security, Justice, Environment, Health)

• Output
  – New draft schema


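Extract relationships

[Figure: relationships among the draft entities E1, E2, E3 are located in the basic schemas of the central PA repository and added to the draft schema]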
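A minimal sketch of step 3, assuming that a relationship of a central basic schema is copied into the draft when all the entities it connects are already in the draft, and that a relationship found in several basic schemas is copied once. The relationship names (`Receives`, `Employs`, `Resides in`), the data layout and the function name are hypothetical illustrations, not content of the central PA repository.

```python
from __future__ import annotations


def extract_relationships(
    draft: set[str],
    basic_schemas: dict[str, list[tuple[str, tuple[str, ...]]]],
) -> list[tuple[str, tuple[str, ...]]]:
    """Copy each basic-schema relationship whose entities all belong to the draft."""
    found: list[tuple[str, tuple[str, ...]]] = []
    for relationships in basic_schemas.values():
        for name, ends in relationships:
            if set(ends) <= draft and (name, ends) not in found:
                found.append((name, ends))  # keep one copy per relationship
    return found


# Hypothetical fragments of two central basic schemas: (relationship, connected entities).
BASIC = {
    "Social security": [("Receives", ("Individual", "Pension")),
                        ("Employs", ("Legal person", "Individual"))],
    "Health": [("Resides in", ("Individual", "Place")),
               ("Employs", ("Legal person", "Individual"))],
}
draft = {"Individual", "Legal person", "Place"}
new_relationships = extract_relationships(draft, BASIC)
print(new_relationships)
```

`Receives` is not copied because Pension is not among the draft entities.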


Step 4: Add relationships related to integrity constraints

• Inputs
  – Draft schema (entities E1, E2, E3)
  – Referential integrity constraints (K2, K3)

• Output
  – Final draft schema


Add relationships related to integrity constraints

[Figure: the referential integrity constraints K2 and K3 between the tables underlying E1, E2, E3 become relationships in the draft schema]
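Step 4 could be sketched like this, assuming each referential integrity constraint is given as (constraint name, referencing table, referenced table) and that step 1 left a table-to-entity mapping. Constraints that involve unmapped tables (such as code tables) are ignored, and entity pairs already related after step 3 are not duplicated; both rules, the table names and the function name are our assumptions.

```python
from __future__ import annotations


def relationships_from_constraints(
    constraints: list[tuple[str, str, str]],
    table_to_entity: dict[str, str],
    existing: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Turn referential integrity constraints between mapped tables into relationships."""
    related = {frozenset(pair) for pair in existing}
    added = []
    for name, referencing, referenced in constraints:
        a = table_to_entity.get(referencing)
        b = table_to_entity.get(referenced)
        if a is None or b is None:
            continue  # e.g. a foreign key to a code table not mapped in step 1
        pair = frozenset((a, b))
        if pair not in related:  # do not duplicate relationships found in step 3
            related.add(pair)
            added.append((name, a, b))
    return added


CONSTRAINTS = [("K2", "EMP_TAB", "DEP_TAB"), ("K3", "ORD_TAB", "EMP_TAB"), ("K4", "EMP_TAB", "CODES")]
TABLE_TO_ENTITY = {"EMP_TAB": "E1", "DEP_TAB": "E2", "ORD_TAB": "E3"}
print(relationships_from_constraints(CONSTRAINTS, TABLE_TO_ENTITY, existing=[]))
```

The more referential constraints are left undocumented, the fewer relationships this step can recover, which matches the drop in completeness reported in the results.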


Experiments


Experiments on 9 databases in 3 areas

Domain / type of administration: Region, Province, Municipality
Territory: x x x
Business: x
Health: x


Relevant qualities of the process: correctness

• Correctness of the conceptual schema with respect to the "true" one, i.e., the schema the domain expert could obtain directly through a traditional analysis or through reverse engineering.

• Correctness is measured with an approximate, indirect metric: the percentage of concepts added or deleted by the expert at the end of step 5, compared with the concepts produced in the semi-automatic steps 1-4.
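The correctness metric can be made concrete as follows. The slide defines it through the percentage of concepts added or deleted by the expert; this sketch assumes that percentage is taken over the concepts of the semi-automatic draft and that correctness is reported as its complement, which is how we read the "more than 80%" figure in the results. The concept names and the function name are placeholders.

```python
from __future__ import annotations


def correctness(draft: set[str], final: set[str]) -> float:
    """1 - (added + deleted) / |draft|, an assumed reading of the slide's metric."""
    added = final - draft    # concepts the expert introduced in step 5
    deleted = draft - final  # draft concepts the expert removed in step 5
    change_rate = (len(added) + len(deleted)) / len(draft)
    return 1 - change_rate  # can drop below 0 if the expert changes more than the draft size


draft = {f"E{i}" for i in range(1, 11)}  # 10 concepts from steps 1-4
final = (draft - {"E10"}) | {"E11"}      # expert deletes one concept and adds one
print(f"{correctness(draft, final):.0%}")
```

With one deletion and one addition over ten draft concepts, the change rate is 20% and the sketch prints 80%.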


Relevant qualities of the process: completeness

• Completeness of the conceptual schema with respect to the corresponding reengineered logical schema. Completeness is measured as the percentage of tables captured in steps 1-5 relative to the total number of tables, after excluding tables that carry no relevant information, such as redundant tables and code tables.
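A sketch of the completeness measure: tables judged irrelevant (redundant tables, code tables, etc.) are removed from the denominator first, and only relevant tables count as captured. The table names and the function name are hypothetical.

```python
from __future__ import annotations


def completeness(tables: set[str], captured: set[str], irrelevant: set[str]) -> float:
    """Share of relevant tables captured by the conceptual schema."""
    relevant = tables - irrelevant
    return len(captured & relevant) / len(relevant)


tables = {f"T{i}" for i in range(1, 11)}
irrelevant = {"T9", "T10"}                  # e.g. code tables, redundant tables
captured = {"T1", "T2", "T3", "T4", "T9"}  # T9 is captured but not relevant
print(f"{completeness(tables, captured, irrelevant):.0%}")  # 4 of 8 relevant tables
```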


Results

• Correctness: more than 80%.
• Completeness: only 50% of the tables are captured.
• Completeness decreases significantly when the referential integrity constraints are undocumented or only partially documented.

• Another cause of reduced completeness is the static nature of the generalization hierarchies used in step 1, together with the uneven semantic richness with which related top-level concepts are represented.

• For instance, in the initial Subject hierarchy, 20 concepts represent individuals, while only 3 represent legal persons.

• An improvement we are applying is the incremental update of the hierarchies with the abstract concepts generated by the domain expert during the process.


Resources

• For a basic/abstract schema of the central PA repository: ½ person-month

• For a basic schema of the local PA repository: 1 person-day
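As a rough cross-check of these rates against the human resources on the goal slide, the arithmetic below converts them to person-years. It assumes one local basic schema per available relational schema (450), applies the ½ person-month rate to all 550 central schemas (500 basic plus 50 abstract), and uses 220 working days per year; none of these conversions is stated in the slides.

```python
WORKING_DAYS_PER_YEAR = 220  # assumption, not stated in the slides
MONTHS_PER_YEAR = 12


def effort_in_person_years() -> tuple[float, float]:
    # Local repository: 450 relational schemas, assumed one basic schema each, 1 person-day each.
    local_days = 450 * 1
    # Central repository: 500 basic + 50 abstract schemas at 1/2 person-month each.
    central_months = (500 + 50) * 0.5
    return local_days / WORKING_DAYS_PER_YEAR, central_months / MONTHS_PER_YEAR


local_py, central_py = effort_in_person_years()
print(f"local: {local_py:.1f} person-years, central: {central_py:.1f} person-years")
```

Under these assumptions the local repository needs about 2 person-years, in line with the resources available for the project, against roughly 23 person-years at the central rate.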


Present developments


Heuristics for abstract schemas

[Figure: an initial schema and the enriched schema obtained from it, organized in levels 1 to 4; heuristics 1 to 7 are illustrated on successive figures]


Abstract schemas obtained from the basic schema

• Basic schema: Subject, Individual, Italian citizen, Legal person, Document, Registry act, Business, Grant, Awarded grant, Paid off grant, Canceled grant, Concession rule, Project budget, Procedure, Source
• First abstract schema: Subject, Individual, Italian citizen, Legal person, Document, Registry act, Business, Rule
• Second abstract schema: Subject, Individual, Italian citizen, Legal person, Document, Business, Rule


Strategies for building abstract local schemas

• Strategy 1: an abstraction step followed by an integration step (actual LPA repository → step 1 → step 2)
• Strategy 2: abstraction and integration performed together, starting from the actual LPA repository


Leftover


The structure of the cooperative architecture

[Figure: each administration (processes, internal applications, internal DBs) exports data and services and relies on common basic services and transport services]


Experiment results

Step                   # of tables extracted   % of tables extracted
Create entities        172                     30
Add constraints        219                     41
Domain expert check    275                     51