Foundational Data Modeling and Schema Transformations for XML Data Engineering

Stephen W. Liddle, Information Systems Department

Reema Al-Kamha & David W. Embley, Computer Science Department

Brigham Young University, Provo, Utah

24 April 2008, UNISCON 2008, Klagenfurt, Austria

XML Data Engineering

Model XML conceptually
Map conceptual models to XML
Reverse-engineer XML to conceptual models
Ensure properties
  Information-preserving transformations
  Constraint-preserving transformations
  Redundancy-free guarantees

C-XML

Modeling XML Conceptually

Scaling the mountain of abstraction
Delicate balance
  Enough modeling constructs
  But not too many
High-level capture of essentials
Avoidance of low-level implementation details
Formal but easily understood
XML needs better abstractions

XML Schema/Model Mismatch

XML features not explicitly supported in traditional conceptual models:
  Ordered lists of concepts
  Choice of concept from among several
  Mixed content
  Use of content from another model
  Nested information hierarchies

C-XML

Missing Modeling Constructs (1)

Sequence structure
  Parent concept
  Ordered child concepts
  Constrained recurrence of children
  Constrained recurrence of sequence itself

<xs:sequence minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>

Missing Modeling Constructs (2)

Choice structure
  Parent concept
  Choose one child concept from several alternatives
  Constrained recurrence of chosen child
  Constrained recurrence of choice itself

<xs:choice maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>
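The sequence and choice structures above both describe which orderings of child elements a parent accepts. As a rough illustration (not the authors' tooling, and not a full XML Schema validator), the sketch below turns a content model built from xs:sequence, xs:choice, and xs:element into a regular expression over child names. The function names `occurs`, `to_regex`, and `accepts` are hypothetical, and constructs such as xs:all, xs:any, and named groups are deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

def occurs(node):
    """Return (min, max); XML Schema defaults both to 1, max is None if unbounded."""
    lo = int(node.get("minOccurs", "1"))
    hi = node.get("maxOccurs", "1")
    return lo, (None if hi == "unbounded" else int(hi))

def to_regex(node):
    """Translate xs:element / xs:sequence / xs:choice into a regex over child names."""
    tag = node.tag.split("}")[1]              # strip the "{namespace}" prefix
    if tag == "element":
        body = re.escape(node.get("name")) + " "   # names are space-terminated
    elif tag == "sequence":
        body = "".join(to_regex(child) for child in node)
    elif tag == "choice":
        body = "|".join(to_regex(child) for child in node)
    else:
        raise ValueError(f"unsupported construct: {tag}")
    lo, hi = occurs(node)
    return "(?:%s){%d,%s}" % (body, lo, "" if hi is None else hi)

def accepts(model_xml, child_names):
    """Check whether an ordered list of child element names fits the content model."""
    model = ET.fromstring(model_xml)
    text = "".join(name + " " for name in child_names)
    return re.fullmatch(to_regex(model), text) is not None

# The two fragments from the slides, with the xs prefix declared so they parse alone.
SEQUENCE = """<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema" minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>"""

CHOICE = """<xs:choice xmlns:xs="http://www.w3.org/2001/XMLSchema" maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>"""
```

For instance, `accepts(SEQUENCE, ["FirstName", "LastName"])` holds because MiddleName may be absent, while `accepts(CHOICE, ["Email", "Fax", "Email"])` fails because the choice itself may recur at most twice.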

Missing Modeling Constructs (3)

Mixed attribute
  Allows character and element data to be intertwined
  <xs:complexType mixed="true">

Any and anyAttribute structures
  Insert structures from other namespaces
  Constrained recurrence
  <xs:any namespace="##other" minOccurs="0"/>
  <xs:anyAttribute namespace="##any"/>
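To make "character and element data intertwined" concrete, here is a small sketch of what mixed content looks like in an instance document. The `Note` element and its contents are invented for illustration; only Python's standard `xml.etree.ElementTree` is used, where interleaved character data shows up as the `text` of the parent and the `tail` of each child.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of an element whose complexType is declared mixed="true".
doc = ET.fromstring(
    "<Note>Call <PhoneNumber>555-0100</PhoneNumber> or write to "
    "<Email>pat@example.org</Email> before Friday.</Note>"
)

def mixed_parts(element):
    """Return the interleaved character data and child elements in document order."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:                      # character data following this child
            parts.append(("text", child.tail))
    return parts

parts = mixed_parts(doc)
```

A conceptual model that only records "Note has PhoneNumber and Email" loses the three text runs and their positions, which is the mismatch the slide points at.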

Missing Modeling Constructs (4)

Nesting of hierarchical structures
  Key organizational characteristic of XML
  Arbitrarily complex nesting possible

C-XML Example

C-XML TO XML SCHEMA

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
  </xs:element>
  <xs:element name="Students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Student" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:choice minOccurs="1" maxOccurs="1">
                <xs:element name="StudentName" type="xs:string"/>
                <xs:sequence>
                  <xs:element name="FirstName" type="xs:string"/>
                  <xs:element name="MiddleNames">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                          <xs:complexType>
                            <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                    <xs:key name="MiddleName-Key">
                      <xs:selector xpath="./MiddleName"/>
                      <xs:field xpath="@MiddleName"/>
                    </xs:key>
                  </xs:element>
                  <xs:element name="LastName" type="xs:string"/>
                </xs:sequence>
              </xs:choice>
              <xs:element name="Semester-Course-Grades">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="Semester" use="required"/>
                        <xs:attribute ref="Course" use="required"/>
                        <!-- C-XML: forall x (Course(x)=>exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3) )) -->
                        <xs:attribute name="Grade" type="xs:string" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
                <xs:key name="Semester-Course-Grade-Key">
                  <xs:selector xpath="./Semester-Course-Grade"/>
                  <xs:field xpath="@Semester"/>
                  <xs:field xpath="@Course"/>
                  <xs:field xpath="@Grade"/>
                </xs:key>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="StudentOID" type="xs:string" use="required"/>
            <xs:attribute name="StudentID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="StudentOID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentOID"/>
    </xs:key>
    <xs:key name="StudentID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentID"/>
    </xs:key>
  </xs:element>
  <xs:element name="Courses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute ref="Course" use="required"/>
            <xs:attribute name="Department" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="Course-Key">
      <xs:selector xpath="./Course"/>
      <xs:field xpath="@Course"/>
    </xs:key>
  </xs:element>
  <xs:element name="GradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="GradStudentOID" type="xs:string" use="required"/>
            <xs:attribute name="Advisor" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="GradStudentOID-Key">
      <xs:selector xpath="./GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:element name="UndergradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="UndergradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="UndergradStudentOID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="UndergradStudentOID-Key">
      <xs:selector xpath="./UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:attribute name="Course" type="xs:string"/>
</xs:schema>

C-XML to XML Schema

Algorithm Overview

Generate a forest of scheme trees
Translate an individual object set
Translate scheme-tree collections of object sets
Create a root node
Add uniqueness constraints
Translate generalization/specialization hierarchies

(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*

Generate Scheme Trees

(Course, Department)*

Generate Scheme Trees

(GradStudent, Advisor)* (UndergradStudent)*

Generate Scheme Trees

(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*

(Course, Department)*

(GradStudent, Advisor)* (UndergradStudent)*

Generate Scheme Trees

Student, StudentID, StudentName, FirstName, LastName

MiddleName Course, Semester, Grade

Course, Department GradStudent, Advisor UndergradStudent

(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*

(Course, Department)*

(GradStudent, Advisor)* (UndergradStudent)*

Generate Scheme Trees
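The parenthesized notation on these slides describes nested, repeating groups of object sets. The sketch below is one assumed way to hold such a forest in memory and print it back in the same notation. The class `SchemeTreeNode` and its fields are hypothetical names, and the trees are entered by hand: this does not reproduce the algorithm that derives them from a C-XML model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SchemeTreeNode:
    """One scheme-tree node: the object sets it holds plus nested child nodes."""
    object_sets: List[str]
    children: List["SchemeTreeNode"] = field(default_factory=list)

    def notation(self) -> str:
        """Render the node as '(A, B, (C)*)*', the notation used on the slides."""
        parts = list(self.object_sets) + [child.notation() for child in self.children]
        return "(" + ", ".join(parts) + ")*"

# The forest shown on the slides, typed in manually.
student = SchemeTreeNode(
    ["Student", "StudentID", "StudentName", "FirstName", "LastName"],
    [SchemeTreeNode(["MiddleName"]), SchemeTreeNode(["Course", "Semester", "Grade"])],
)
forest = [
    student,
    SchemeTreeNode(["Course", "Department"]),
    SchemeTreeNode(["GradStudent", "Advisor"]),
    SchemeTreeNode(["UndergradStudent"]),
]

for tree in forest:
    print(tree.notation())
```

Each root in `forest` corresponds to one top-level container element in the generated schema, and each nested node to a nested container.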

Individual Object Sets

<xs:attribute name="Department" type="xs:string"/>
<xs:attribute name="Course" type="xs:string"/>
<xs:attribute ref="Course"/>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="Student">
  <xs:complexType>
    ...
    <xs:attribute name="StudentOID" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Scheme-Tree Translation

[Figure: each scheme-tree node maps to a container element holding a repeating member element: Students/Student, MiddleNames/MiddleName, Course-Semester-Grades/Course-Semester-Grade, Courses/Course, GradStudents/GradStudent, UndergradStudents/UndergradStudent]

Scheme-Tree Translation

<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="Semester-Course-Grades">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  ...
</xs:element>
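Both fragments above follow the same pattern: a container element named with a plural, wrapping a repeating member element. A minimal sketch of generating that pattern with Python's standard library follows. The helper `container_element` is a hypothetical name, and forming the container name by appending "s" is an assumption read off the slide examples rather than a rule stated by the authors.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)   # serialize with the familiar xs: prefix

def container_element(member_name, min_occurs="1"):
    """Build <xs:element name="{member}s"> wrapping a repeating member element."""
    container = ET.Element(f"{{{XS}}}element", {"name": member_name + "s"})
    outer_type = ET.SubElement(container, f"{{{XS}}}complexType")
    sequence = ET.SubElement(outer_type, f"{{{XS}}}sequence")
    attrs = {"name": member_name}
    if min_occurs != "1":          # 1 is the XML Schema default, so omit it
        attrs["minOccurs"] = min_occurs
    attrs["maxOccurs"] = "unbounded"
    member = ET.SubElement(sequence, f"{{{XS}}}element", attrs)
    ET.SubElement(member, f"{{{XS}}}complexType")   # member content would go here
    return container

students = container_element("Student")
grades = container_element("Semester-Course-Grade", min_occurs="0")
print(ET.tostring(grades, encoding="unicode"))
```

The `min_occurs="0"` call mirrors the Semester-Course-Grade case, where a student may have no grades yet, while Student keeps the default minimum of one.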

Scheme-Tree Translation

<xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:attribute name="Semester" use="required"/>
    <xs:attribute ref="Course" use="required"/>
    <!-- C-XML: forall x (Course(x)=> exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3) )) -->
    <xs:attribute name="Grade" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Root Element

Students, Courses, GradStudents, UndergradStudents

<xs:schema >
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    ...
  </xs:element>
  ...
</xs:schema>

Uniqueness Constraints

<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="StudentOID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentOID"/>
  </xs:key>
  <xs:key name="StudentID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentID"/>
  </xs:key>
</xs:element>
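An xs:key with a single attribute field requires that, within its scope, every selected element carries the attribute and no two share a value. The sketch below checks that rule by hand on a stripped-down, invented `Students` instance. The function `key_violations` is a hypothetical helper, not part of any schema processor, and it handles only single-attribute keys.

```python
import xml.etree.ElementTree as ET

def key_violations(scope, selector, field):
    """Return values of attribute `field` that are missing (None) or repeated
    among the elements that `selector` picks out under `scope`."""
    seen, bad = set(), []
    for element in scope.findall(selector):
        value = element.get(field)
        if value is None or value in seen:
            bad.append(value)
        seen.add(value)
    return bad

# Invented data: the third student reuses StudentID 1001.
students = ET.fromstring(
    '<Students>'
    '<Student StudentOID="s1" StudentID="1001"/>'
    '<Student StudentOID="s2" StudentID="1002"/>'
    '<Student StudentOID="s3" StudentID="1001"/>'
    '</Students>'
)

oid_problems = key_violations(students, "./Student", "StudentOID")
id_problems = key_violations(students, "./Student", "StudentID")
```

Here `oid_problems` is empty while `id_problems` reports the duplicated identifier, so this instance would satisfy StudentOID-Key but violate StudentID-Key.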

Generalization/Specialization

<xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
  <xs:field xpath="@UndergradStudentOID"/>
</xs:keyref>
<xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./GradStudents/GradStudent"/>
  <xs:field xpath="@GradStudentOID"/>
</xs:keyref>
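The keyref declarations above tie each specialization back to its generalization: every GradStudentOID or UndergradStudentOID must match some StudentOID. A hand-rolled sketch of that referential check follows. The function name `dangling_references` and the abbreviated `Root` instance are assumptions for illustration, and the instance omits content the full schema would require.

```python
import xml.etree.ElementTree as ET

def dangling_references(root, key_path, key_attr, ref_path, ref_attr):
    """Return referring values that match no key value (keyref violations)."""
    key_values = {e.get(key_attr) for e in root.findall(key_path)}
    return [
        e.get(ref_attr)
        for e in root.findall(ref_path)
        if e.get(ref_attr) not in key_values
    ]

# Invented, abbreviated instance: undergraduate "s9" has no matching Student.
root = ET.fromstring(
    '<Root>'
    '<Students>'
    '<Student StudentOID="s1" StudentID="1001"/>'
    '<Student StudentOID="s2" StudentID="1002"/>'
    '</Students>'
    '<GradStudents><GradStudent GradStudentOID="s1" Advisor="a7"/></GradStudents>'
    '<UndergradStudents><UndergradStudent UndergradStudentOID="s9"/></UndergradStudents>'
    '</Root>'
)

grad_dangling = dangling_references(
    root, "./Students/Student", "StudentOID",
    "./GradStudents/GradStudent", "GradStudentOID")
undergrad_dangling = dangling_references(
    root, "./Students/Student", "StudentOID",
    "./UndergradStudents/UndergradStudent", "UndergradStudentOID")
```

The graduate entry resolves to student s1, whereas the undergraduate entry s9 would be rejected, which is how the schema enforces that every specialized student is also a Student.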

XML SCHEMA TO C-XML

XML Schema to C-XML

Algorithm Overview

Generate object sets for each element & attribute
Specify built-in and simple types in data frames
Obtain relationship sets from parent-child connections
Obtain participation constraints from minOccurs, maxOccurs, and use constraints
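The last step above can be sketched as a small mapping from occurrence attributes to min:max participation constraints. The output format follows the [0:*] notation that appears in the schema comments earlier in the deck. The function names and the exact mapping are assumptions for illustration; the only facts relied on are the XML Schema defaults (minOccurs and maxOccurs default to 1, use defaults to optional).

```python
import xml.etree.ElementTree as ET

XS_NS = 'xmlns:xs="http://www.w3.org/2001/XMLSchema"'

def element_participation(decl):
    """min:max for an xs:element declaration; XML Schema defaults both to 1."""
    lo = decl.get("minOccurs", "1")
    hi = decl.get("maxOccurs", "1")
    return f"{lo}:{'*' if hi == 'unbounded' else hi}"

def attribute_participation(decl):
    """min:max for an xs:attribute declaration; `use` defaults to optional."""
    use = decl.get("use", "optional")
    return {"required": "1:1", "optional": "0:1", "prohibited": "0:0"}[use]

# Declarations taken from the slides, with the xs prefix declared so they parse alone.
middle = ET.fromstring(
    f'<xs:element {XS_NS} name="MiddleName" minOccurs="0" maxOccurs="2"/>')
student = ET.fromstring(
    f'<xs:element {XS_NS} name="Student" maxOccurs="unbounded"/>')
grade = ET.fromstring(
    f'<xs:attribute {XS_NS} name="Grade" type="xs:string" use="required"/>')
course = ET.fromstring(
    f'<xs:attribute {XS_NS} name="Course" type="xs:string"/>')

for decl in (middle, student):
    print(decl.get("name"), element_participation(decl))
for decl in (grade, course):
    print(decl.get("name"), attribute_participation(decl))
```

Under this reading, MiddleName yields 0:2, Student yields 1:*, the required Grade attribute yields 1:1, and an attribute declared without `use` yields 0:1.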

Attribute Transformation

Element Transformation

Choice Transformation

Sequence Transformation

Key Constraints Transformation

Substitution Group & Extension Transformation

Observation on Transformations

These transformations to and from C-XML are not inverses of one another

However,

[Diagram relating C-XML and XML Schema; the connecting arrows were lost in extraction]

Demo

PROPERTY GUARANTEES

Transformation Properties: C-XML to XML Schema

Theorem 1: … preserves information. Proof: injective

Theorem 2: Allowing for pragma constraints, … preserves constraints. Proof: by construction

Theorem 3: … yields an XML-Schema instance whose complying XML documents are redundancy free. Proof: [TKDE, Aug06]

Transformation Properties: XML Schema to C-XML

Theorem 4: … preserves information. Proof: injective

Theorem 5: … preserves constraints. Proof: by construction

Conclusions

C-XML models XML conceptually
Transformations
  C-XML to XML
  Reverse-engineer XML to C-XML
Properties
  Information preserving
  Constraint preserving
  Redundancy-free guarantee

www.deg.byu.edu