Foundational Data Modeling and Schema Transformations for XML Data Engineering
Stephen W. Liddle, Information Systems Department
Reema Al-Kamha & David W. Embley, Computer Science Department
Brigham Young University, Provo, Utah
24 April 2008 UNISCON 2008, Klagenfurt, Austria
XML Data Engineering
Model XML conceptually
Map conceptual models to XML
Reverse-engineer XML to conceptual models
Ensure properties
  Information preserving transformations
  Constraint preserving transformations
  Redundancy-free guarantees
C-XML
Modeling XML Conceptually
Scaling the mountain of abstraction
Delicate balance
  Enough modeling constructs
  But not too many
High-level capture of essentials
Avoidance of low-level implementation details
Formal but easily understood
XML needs better abstractions
XML Schema/Model Mismatch
XML features not explicitly supported in traditional conceptual models:
  Ordered lists of concepts
  Choice of concept from among several
  Mixed content
  Use of content from another model
  Nested information hierarchies
C-XML
Missing Modeling Constructs (1)
Sequence structure
  Parent concept
  Ordered child concepts
  Constrained recurrence of children
  Constrained recurrence of sequence itself
<xs:sequence minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>
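One way to read the sequence declaration above is as a regular expression over child element names. The sketch below is an illustration only, not part of C-XML or the authors' tooling. It parses the xs:sequence fragment with Python's standard library and compiles it to a pattern. The helper names occurs, to_regex, and is_valid are hypothetical, and only xs:element and xs:sequence particles are handled.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The sequence fragment from the slide, with the namespace declared so it parses alone.
SEQUENCE = """
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema" minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>
"""

def occurs(node):
    """Turn minOccurs/maxOccurs (both default to 1) into a regex quantifier."""
    low = node.get("minOccurs", "1")
    high = node.get("maxOccurs", "1")
    return "{%s,%s}" % (low, "" if high == "unbounded" else high)

def to_regex(node):
    """Translate a particle; each element name becomes one token ending in a space."""
    if node.tag == XS + "element":
        body = re.escape(node.get("name")) + " "
    elif node.tag == XS + "sequence":
        body = "".join(to_regex(child) for child in node)
    else:
        raise ValueError("unsupported particle: " + node.tag)
    return "(?:%s)%s" % (body, occurs(node))

PATTERN = re.compile(to_regex(ET.fromstring(SEQUENCE)))

def is_valid(child_names):
    """True if the ordered child names satisfy the sequence and its recurrence bounds."""
    return PATTERN.fullmatch("".join(name + " " for name in child_names)) is not None

print(is_valid(["FirstName", "LastName"]))  # True
print(is_valid(["FirstName", "MiddleName", "MiddleName", "MiddleName", "LastName"]))  # False
```

Because the sequence itself may occur twice, two complete name groups in a row are also accepted, while a reordered group such as LastName before FirstName is rejected.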
Missing Modeling Constructs (2)
Choice structure
  Parent concept
  Choose one child concept from several alternatives
  Constrained recurrence of chosen child
  Constrained recurrence of choice itself
<xs:choice maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>
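Because both recurrence bounds in the choice above are finite, the set of admissible child lists can simply be enumerated. The following sketch does that by hand for this one declaration. The names ALTERNATIVES and allowed_contents are hypothetical, and the alternatives are transcribed manually from the fragment, not derived from it.

```python
from itertools import product

# Each occurrence of the choice picks exactly one alternative; PhoneNumber may
# appear once or twice within a single occurrence (minOccurs=1, maxOccurs=2).
ALTERNATIVES = [
    ("PhoneNumber",),
    ("PhoneNumber", "PhoneNumber"),
    ("Email",),
    ("Fax",),
]

def allowed_contents(min_occurs=1, max_occurs=2):
    """Enumerate every child-name tuple the choice admits (minOccurs defaults to 1)."""
    contents = set()
    for count in range(min_occurs, max_occurs + 1):
        for picks in product(ALTERNATIVES, repeat=count):
            contents.add(tuple(name for pick in picks for name in pick))
    return contents

CONTENTS = allowed_contents()
print(("Email", "Fax") in CONTENTS)        # True
print(("PhoneNumber",) * 5 in CONTENTS)    # False
```

Up to four PhoneNumber children are admitted (two occurrences of the choice, each selecting PhoneNumber twice), which shows how the recurrence of the chosen child and the recurrence of the choice itself multiply.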
Missing Modeling Constructs (3)
Mixed attribute
  Allows character and element data to be intertwined
  <xs:complexType mixed="true">
Any and anyAttribute structures
  Insert structures from other namespaces
  Constrained recurrence
  <xs:any namespace="##other" minOccurs="0"/>
  <xs:anyAttribute namespace="##any"/>
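To make mixed content concrete, the sketch below walks a small instance fragment and lists its character data and child elements in document order. The note, name, and time elements are invented for illustration and do not come from the slides. The text and tail attributes are how Python's ElementTree exposes character data before and after child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of a complexType declared with mixed="true".
NOTE = ET.fromstring("<note>Call <name>Ann</name> before <time>noon</time> today.</note>")

def mixed_pieces(element):
    """List character data and child elements in the order they appear."""
    pieces = []
    if element.text:
        pieces.append(("text", element.text))
    for child in element:
        pieces.append(("element", child.tag))
        if child.tail:  # character data that follows the child's end tag
            pieces.append(("text", child.tail))
    return pieces

for kind, value in mixed_pieces(NOTE):
    print(kind, repr(value))
```

A conceptual model that only records parent-child relationships would capture name and time here but lose the three interleaved text runs, which is the gap the mixed construct addresses.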
Missing Modeling Constructs (4)
Nesting of hierarchical structures
  Key organizational characteristic of XML
  Arbitrarily complex nesting possible
C-XML Example
C-XML TO XML SCHEMA
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
  </xs:element>
  <xs:element name="Students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Student" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:choice minOccurs="1" maxOccurs="1">
                <xs:element name="StudentName" type="xs:string"/>
                <xs:sequence>
                  <xs:element name="FirstName" type="xs:string"/>
                  <xs:element name="MiddleNames">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                          <xs:complexType>
                            <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                    <xs:key name="MiddleName-Key">
                      <xs:selector xpath="./MiddleName"/>
                      <xs:field xpath="@MiddleName"/>
                    </xs:key>
                  </xs:element>
                  <xs:element name="LastName" type="xs:string"/>
                </xs:sequence>
              </xs:choice>
              <xs:element name="Semester-Course-Grades">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="Semester" use="required"/>
                        <xs:attribute ref="Course" use="required"/>
                        <!-- C-XML: forall x (Course(x)=>exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3) )) -->
                        <xs:attribute name="Grade" type="xs:string" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
                <xs:key name="Semester-Course-Grade-Key">
                  <xs:selector xpath="./Semester-Course-Grade"/>
                  <xs:field xpath="@Semester"/>
                  <xs:field xpath="@Course"/>
                  <xs:field xpath="@Grade"/>
                </xs:key>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="StudentOID" type="xs:string" use="required"/>
            <xs:attribute name="StudentID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="StudentOID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentOID"/>
    </xs:key>
    <xs:key name="StudentID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentID"/>
    </xs:key>
  </xs:element>
  <xs:element name="Courses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute ref="Course" use="required"/>
            <xs:attribute name="Department" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="Course-Key">
      <xs:selector xpath="./Course"/>
      <xs:field xpath="@Course"/>
    </xs:key>
  </xs:element>
  <xs:element name="GradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="GradStudentOID" type="xs:string" use="required"/>
            <xs:attribute name="Advisor" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="GradStudentOID-Key">
      <xs:selector xpath="./GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:element name="UndergradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="UndergradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="UndergradStudentOID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="UndergradStudentOID-Key">
      <xs:selector xpath="./UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:attribute name="Course" type="xs:string"/>
</xs:schema>
C-XML → XML Schema
Algorithm Overview
Generate a forest of scheme trees
Translate an individual object set
Translate scheme-tree collections of object sets
Create a root node
Add uniqueness constraints
Translate generalization/specialization hierarchies
Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*
(Course, Department)*
(GradStudent, Advisor)*
(UndergradStudent)*
Scheme-tree nodes:
  Student, StudentID, StudentName, FirstName, LastName
    MiddleName
    Course, Semester, Grade
  Course, Department
  GradStudent, Advisor
  UndergradStudent
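The nested-parenthesis notation for scheme trees can be reproduced from a small tree structure. The sketch below is an assumed representation, not the authors' data structure: SchemeNode and render are hypothetical names, and each node holds the object sets at that level plus its repeating child nodes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SchemeNode:
    """One scheme-tree node: object sets at this level and nested repeating nodes."""
    object_sets: List[str]
    children: List["SchemeNode"] = field(default_factory=list)

def render(node):
    """Write a node as (A, B, (child)*)*, with every node marked as repeating."""
    parts = list(node.object_sets) + [render(child) for child in node.children]
    return "(" + ", ".join(parts) + ")*"

student = SchemeNode(
    ["Student", "StudentID", "StudentName", "FirstName", "LastName"],
    [SchemeNode(["MiddleName"]), SchemeNode(["Course", "Semester", "Grade"])],
)
forest = [
    student,
    SchemeNode(["Course", "Department"]),
    SchemeNode(["GradStudent", "Advisor"]),
    SchemeNode(["UndergradStudent"]),
]

for tree in forest:
    print(render(tree))
```

Each nested node is the part of the tree that later becomes a wrapped, repeating child element in the generated schema.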
Individual Object Sets
<xs:attribute name="Department" type="xs:string"/>
<xs:attribute name="Course" type="xs:string"/>
<xs:attribute ref="Course"/>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="Student">
  <xs:complexType>
    ...
    <xs:attribute name="StudentOID" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
Scheme-Tree Translation
[Figure: scheme trees labeled with generated element names: Students, Student, MiddleNames, MiddleName, Course-Semester-Grades, Course-Semester-Grade, Courses, Course, GradStudents, GradStudent, UndergradStudents, UndergradStudent]
Scheme-Tree Translation
<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="Semester-Course-Grades">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  ...
</xs:element>
Scheme-Tree Translation
<xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:attribute name="Semester" use="required"/>
    <xs:attribute ref="Course" use="required"/>
    <!-- C-XML: forall x (Course(x)=> exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3) )) -->
    <xs:attribute name="Grade" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
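The wrapper pattern in these fragments, a plural container element holding a repeating singular element, can be generated mechanically. The sketch below shows only that pattern, under the assumption that the container and child names are already known. The helper names xs and container are hypothetical and this is not the authors' implementation.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS_NS)  # serialize with the xs: prefix

def xs(tag):
    """Qualify a tag name with the XML Schema namespace."""
    return "{%s}%s" % (XS_NS, tag)

def container(plural, singular, min_occurs="1"):
    """Build <plural> wrapping a repeating <singular>, as in Students/Student."""
    outer = ET.Element(xs("element"), {"name": plural})
    sequence = ET.SubElement(ET.SubElement(outer, xs("complexType")), xs("sequence"))
    attributes = {"name": singular}
    if min_occurs != "1":  # minOccurs defaults to 1, so emit it only when it differs
        attributes["minOccurs"] = min_occurs
    attributes["maxOccurs"] = "unbounded"
    inner = ET.SubElement(sequence, xs("element"), attributes)
    ET.SubElement(inner, xs("complexType"))  # attributes and nested content would go here
    return outer

grades = container("Semester-Course-Grades", "Semester-Course-Grade", min_occurs="0")
print(ET.tostring(grades, encoding="unicode"))
```

Calling container("Students", "Student") yields the first fragment's shape, where the missing minOccurs means at least one Student is required.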
Root Element
[Figure: scheme-tree roots Students, Courses, GradStudents, UndergradStudents]
<xs:schema>
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    ...
  </xs:element>
  ...
</xs:schema>
Uniqueness Constraints
<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="StudentOID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentOID"/>
  </xs:key>
  <xs:key name="StudentID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentID"/>
  </xs:key>
</xs:element>
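What the two xs:key declarations demand of an instance document can be checked by hand. The sketch below is a simplified stand-in for a validating parser: check_key is a hypothetical helper, and the sample data is invented and omits the child elements the full schema requires. It reports missing or duplicate key values among the selected elements.

```python
import xml.etree.ElementTree as ET

# Invented sample data; only the key attributes are shown.
STUDENTS = ET.fromstring("""
<Students>
  <Student StudentOID="s1" StudentID="100"/>
  <Student StudentOID="s2" StudentID="100"/>
</Students>
""")

def check_key(scope, selector, attribute):
    """Report selected elements whose key attribute is missing or repeated."""
    seen, problems = set(), []
    for element in scope.findall(selector):
        value = element.get(attribute)
        if value is None:
            problems.append("missing @%s" % attribute)
        elif value in seen:
            problems.append("duplicate @%s=%s" % (attribute, value))
        else:
            seen.add(value)
    return problems

print(check_key(STUDENTS, "./Student", "StudentOID"))  # []
print(check_key(STUDENTS, "./Student", "StudentID"))   # ['duplicate @StudentID=100']
```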
Generalization/Specialization
<xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
  <xs:field xpath="@UndergradStudentOID"/>
</xs:keyref>
<xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./GradStudents/GradStudent"/>
  <xs:field xpath="@GradStudentOID"/>
</xs:keyref>
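The keyref declarations express that every specialized student must also be a Student. The following sketch checks that property directly on an invented instance document. The helper name dangling and the sample OID values are hypothetical, and the sample omits content the full schema would require.

```python
import xml.etree.ElementTree as ET

# Invented sample data: s3 is an undergraduate with no matching Student entry.
DOC = ET.fromstring("""
<Root>
  <Students>
    <Student StudentOID="s1"/>
    <Student StudentOID="s2"/>
  </Students>
  <GradStudents>
    <GradStudent GradStudentOID="s1"/>
  </GradStudents>
  <UndergradStudents>
    <UndergradStudent UndergradStudentOID="s3"/>
  </UndergradStudents>
</Root>
""")

def dangling(root, selector, attribute, keys):
    """Return referencing values that do not appear among the key values."""
    return [e.get(attribute) for e in root.findall(selector) if e.get(attribute) not in keys]

student_oids = {s.get("StudentOID") for s in DOC.findall("./Students/Student")}
print(dangling(DOC, "./GradStudents/GradStudent", "GradStudentOID", student_oids))
print(dangling(DOC, "./UndergradStudents/UndergradStudent", "UndergradStudentOID", student_oids))
```

The first call finds nothing, and the second reports s3, the kind of document a schema validator would reject under UndergradStudentOID-Keyref.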
XML SCHEMA TO C-XML
XML Schema → C-XML
Algorithm Overview
Generate object sets for each element & attribute
Specify built-in and simple types in data frames
Obtain relationship sets from parent-child connections
Obtain participation constraints from minOccurs, maxOccurs, and use constraints
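The first and last steps above can be sketched as a walk over a schema fragment that emits one candidate object set per element or attribute, together with a min:max participation constraint. This is a rough illustration under stated assumptions. The names participation and object_sets are hypothetical, the Note attribute is invented to show an optional attribute, use="prohibited" is not handled, and the slides do not say which side of a relationship set each constraint attaches to.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

FRAGMENT = ET.fromstring("""
<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema"
            name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:attribute name="Semester" use="required"/>
    <xs:attribute name="Grade" type="xs:string" use="required"/>
    <xs:attribute name="Note" type="xs:string"/>
  </xs:complexType>
</xs:element>
""")

def participation(node):
    """Map occurrence information to min:max, writing unbounded as *."""
    if node.tag == XS + "attribute":
        # An attribute occurs at most once; use="required" makes it mandatory.
        low, high = ("1" if node.get("use") == "required" else "0"), "1"
    else:
        low, high = node.get("minOccurs", "1"), node.get("maxOccurs", "1")
    return "%s:%s" % (low, "*" if high == "unbounded" else high)

def object_sets(root):
    """Pair each named element or attribute with its participation constraint."""
    wanted = (XS + "element", XS + "attribute")
    return [(n.get("name"), participation(n))
            for n in root.iter() if n.tag in wanted and n.get("name")]

for name, constraint in object_sets(FRAGMENT):
    print(name, constraint)
```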
Attribute Transformation
Element Transformation
Choice Transformation
Sequence Transformation
Key Constraints Transformation
Substitution Group & Extension Transformation
Observation on Transformations
These transformations to and from C-XML are not inverses of one another
However,
[Figure: relationship between C-XML and XML Schema]
Demo
PROPERTY GUARANTEES
Transformation Properties: C-XML to XML Schema
Theorem 1: … preserves information. Proof: injective
Theorem 2: Allowing for pragma constraints, … preserves constraints. Proof: by construction
Theorem 3: … yields an XML-Schema instance whose complying XML documents are redundancy free. Proof: [TKDE, Aug06]
Transformation Properties: XML Schema to C-XML
Theorem 4: … preserves information. Proof: injective
Theorem 5: … preserves constraints. Proof: by construction
Conclusions
C-XML models XML conceptually
Transformations
  C-XML to XML
  Reverse-engineer XML to C-XML
Properties
  Information preserving
  Constraint preserving
  Redundancy-free guarantee
www.deg.byu.edu