Management of XML Documents in Object-Relational Databases
Thomas Kudrass Matthias Conrad
HTWK Leipzig
EDBT Workshop XML-Based Data Management, Prague, 24 March 2002
© T. Kudrass, HTWK Leipzig
Overview
• Motivation
• Object-Relational Database Concepts
• Parsing XML Documents
• XML-to-ORDB Mapping
• Meta-Data
• Special Issues
• Conclusions
Motivation
• Storing XML documents in a DBMS
• Using existing database technology
• Dealing with complex objects:
– XML documents = complex objects
– avoid any decomposition
– object-relational database technology is a good choice to represent complex objects
User-Defined Types in ORDB
• Complex Data Types
– Object Type
– Collection Type
• Object References
• Object Views
Example: Object Types
CREATE TYPE Type_Professor AS OBJECT (
  PName VARCHAR(80),
  Subject VARCHAR(120) );

Object-valued attribute:
CREATE TYPE Type_Course AS OBJECT (
  Name VARCHAR(100),
  Professor Type_Professor );

Object table:
CREATE TABLE TabProfessor OF Type_Professor;
Example: Collection Types
CREATE TYPE Type_Professor AS OBJECT (
  PName VARCHAR(80),
  Subject VARCHAR(120) );

Array:
CREATE TYPE TypeVa_Professor AS VARRAY(5) OF Type_Professor;

Nested Table:
CREATE TYPE Type_TabProfessor AS TABLE OF Type_Professor;

CREATE TABLE TabDept (
  DName VARCHAR(80),
  Professor Type_TabProfessor )
  NESTED TABLE Professor STORE AS TabProfessor_List;
Example: Object References
CREATE TYPE Type_Professor AS OBJECT (
  PName VARCHAR(80),
  Dept VARCHAR(120) );

CREATE TABLE TabProfessor OF Type_Professor;

CREATE TYPE Type_Course AS OBJECT (
  Name VARCHAR(200),
  Prof_Ref REF Type_Professor );   -- reference to objects of object table TabProfessor

CREATE TABLE TabCourse OF Type_Course;
Parsing DTD and XML
[Figure: XML2Oracle architecture. Labels: XML Document → XML V2 Parser (well-formedness / validity check) → XML DOM Tree; DTD → DTD Parser (syntax check) → DTD DOM Tree; XML2Oracle; Schema Definition; JDBC / ODBC; DBMS Oracle.]
<!ELEMENT University (StudyCourse,Student*)>
<!ELEMENT Student (LName,FName,Course*)>
<!ATTLIST Student StudNr CDATA #REQUIRED>
<!ELEMENT Course (Name,Professor*,CreditPts?)>
<!ELEMENT Professor (PName,Subject+,Dept)>
<!ENTITY cs "Computer Science">
<!ELEMENT LName (#PCDATA)>
<!ELEMENT FName (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT CreditPts (#PCDATA)>
<!ELEMENT PName (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ELEMENT StudyCourse (#PCDATA)>
Object-Based Mapping

DTD:
<!ELEMENT A (B,C)>
<!ELEMENT C (D)>
<!ELEMENT D (#PCDATA)>
<!ELEMENT B (#PCDATA)>

Classes:
CLASS A {
  STRING b;
  C c; }
CLASS C {
  STRING d; }

Tables:
CREATE TABLE A (
  a_pk INTEGER NOT NULL,
  b VARCHAR(30) NOT NULL);
CREATE TABLE C (
  c_pk INTEGER NOT NULL,
  a_fk INTEGER NOT NULL,
  d VARCHAR(10) NOT NULL);

• Modification of the Mapping Algorithm [Bourret]
– No class definitions
– Use objects of the DTD tree
Step 1
• Each Complex Element → Table
• Each Set-Valued Element → Table
• Primary Key in each Table

<!ELEMENT University (StudyCourse,Student*)>
  CREATE TABLE TabUniversity ( IDUniversity ...
<!ELEMENT Student (LName,FName,Course*)>
  CREATE TABLE TabStudent ( IDStudent ...
<!ELEMENT Course (Name,Professor*,CreditPts?)>
  CREATE TABLE TabCourse ( IDCourse ...
<!ELEMENT Professor (PName,Subject+,Dept)>
  CREATE TABLE TabProfessor ( IDProfessor ...
  CREATE TABLE TabSubject ( IDSubject ...
Step 2
Other Elements & Attributes → Table Columns

CREATE TABLE TabUniversity ( IDUniversity, attrStudyCourse, ...
CREATE TABLE TabStudent ( IDStudent, attrStudNr, attrLName, attrFName, ...
CREATE TABLE TblMatrikelNr ( IDMatrikelNr, attrMNummer, ...
CREATE TABLE TabCourse ( IDCourse, attrName, attrCreditPts, ...
CREATE TABLE TabProfessor ( IDProfessor, attrPName, attrDept, ...
CREATE TABLE TabSubject ( IDSubject, attrSubject, ...
Step 3
Relationships between Elements → Foreign Keys

CREATE TABLE TabUniversity (
  IDUniversity INTEGER NOT NULL,
  attrStudyCourse VARCHAR(4000) NOT NULL,
  PRIMARY KEY (IDUniversity));

CREATE TABLE TabStudent (
  IDStudent INTEGER NOT NULL,
  IDUniversity INTEGER NOT NULL,
  attrStudNr VARCHAR(4000) NOT NULL,
  attrLName VARCHAR(4000) NOT NULL,
  attrFName VARCHAR(4000) NOT NULL,
  PRIMARY KEY (IDStudent),
  CONSTRAINT conMatrikel FOREIGN KEY (IDUniversity)
    REFERENCES TabUniversity (IDUniversity));
...
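The three mapping steps can be sketched as a small generator. This is only an illustration in Python, not the XML2Oracle implementation; the function names (`parse_dtd`, `map_to_tables`, `to_ddl`) are hypothetical, the parser handles only flat sequence content models, and the nullability rule (NOT NULL unless the child is marked `?`) is an assumption.

```python
import re

# Example DTD from the slides (entity declaration omitted).
DTD = """
<!ELEMENT University (StudyCourse,Student*)>
<!ELEMENT Student (LName,FName,Course*)>
<!ATTLIST Student StudNr CDATA #REQUIRED>
<!ELEMENT Course (Name,Professor*,CreditPts?)>
<!ELEMENT Professor (PName,Subject+,Dept)>
<!ELEMENT LName (#PCDATA)>
<!ELEMENT FName (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT CreditPts (#PCDATA)>
<!ELEMENT PName (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ELEMENT StudyCourse (#PCDATA)>
"""


def parse_dtd(dtd):
    """Parse flat sequence content models only (no choices, no nested groups)."""
    elements, attributes = {}, {}
    for name, model in re.findall(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>", dtd):
        children = []
        if model.strip() != "#PCDATA":
            for part in model.split(","):
                child, occurrence = re.fullmatch(r"\s*(\w+)([*+?]?)\s*", part).groups()
                children.append((child, occurrence))
        elements[name] = children
    for name, attr in re.findall(r"<!ATTLIST\s+(\w+)\s+(\w+)", dtd):
        attributes.setdefault(name, []).append(attr)
    return elements, attributes


def map_to_tables(elements, attributes):
    complex_elements = {e for e, kids in elements.items() if kids}
    set_valued = {c for kids in elements.values() for c, occ in kids if occ in ("*", "+")}
    # Step 1: one table (with a primary key) per complex or set-valued element.
    tables = {e: {"pk": f"ID{e}", "columns": [], "fks": []}
              for e in elements if e in complex_elements or e in set_valued}
    # Step 2: XML attributes and remaining simple elements become columns.
    for element, table in tables.items():
        table["columns"] += [(f"attr{a}", True) for a in attributes.get(element, [])]
        if element not in complex_elements:
            table["columns"].append((f"attr{element}", True))  # own text value
        for child, occurrence in elements[element]:
            if child not in tables:
                table["columns"].append((f"attr{child}", occurrence != "?"))
    # Step 3: parent-child relationships become foreign keys in the child table.
    for element, children in elements.items():
        for child, _ in children:
            if child in tables:
                tables[child]["fks"].append(f"ID{element}")
    return tables


def to_ddl(tables):
    statements = []
    for element, t in tables.items():
        lines = [f"  {t['pk']} INTEGER NOT NULL"]
        lines += [f"  {fk} INTEGER NOT NULL" for fk in t["fks"]]
        lines += [f"  {col} VARCHAR(4000)" + (" NOT NULL" if required else "")
                  for col, required in t["columns"]]
        lines.append(f"  PRIMARY KEY ({t['pk']})")
        lines += [f"  FOREIGN KEY ({fk}) REFERENCES Tab{fk[2:]} ({fk})" for fk in t["fks"]]
        statements.append(f"CREATE TABLE Tab{element} (\n" + ",\n".join(lines) + ");")
    return statements


elements, attributes = parse_dtd(DTD)
tables = map_to_tables(elements, attributes)
statements = to_ddl(tables)
print("\n\n".join(statements))
```

For the example DTD this yields the five tables named on the slides (TabUniversity, TabStudent, TabCourse, TabProfessor, TabSubject); foreign-key constraint names such as conMatrikel are not generated.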
ORDBS Oracle and XML
• Basic Idea:
– Generate an object-relational schema from the DTD
– Natural representation of an XML document by combining user-defined types
• Different Mapping Rules:
– Simple elements
– Complex elements
– Set-valued elements
– Complex set-valued elements
XML Attributes & Simple Elements
• Elements of #PCDATA type and XML attributes → attributes of the object type
• Domain of Simple Elements:
– No type information in the DTD: numeric vs. alphanumeric? length?
– Restrictions of the DBMS (e.g. VARCHAR in Oracle: 4000 characters)
• Mapping of an XML attribute of a simple element → definition of an object type for both attribute and element
XML Attributes & Simple Elements

<!ELEMENT Professor (PName,Subject,Dept)>
<!ATTLIST Professor PAddress CDATA #REQUIRED>
<!ELEMENT PName (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ATTLIST Dept DAddress CDATA #REQUIRED>

CREATE TYPE Type_Dept AS OBJECT (
  attrDept VARCHAR(4000),
  attrDAddress VARCHAR(4000));

CREATE TYPE Type_Professor AS OBJECT (
  attrPAddress VARCHAR(4000),
  attrPName VARCHAR(4000),
  attrSubject VARCHAR(4000),
  attrDept Type_Dept);

CREATE TABLE TabProfessor OF Type_Professor;
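The rule for simple elements and XML attributes can be illustrated with a short Python sketch. It is a hedged illustration rather than the authors' code: `object_type_ddl` and its dictionary-based input format are hypothetical, and every text value is mapped to VARCHAR(4000) because the DTD carries no type information.

```python
VARCHAR = "VARCHAR(4000)"

# Hypothetical in-memory form of the Professor DTD fragment:
# element -> child elements ([] means #PCDATA), element -> XML attributes.
elements = {"Professor": ["PName", "Subject", "Dept"],
            "PName": [], "Subject": [], "Dept": []}
attributes = {"Professor": ["PAddress"], "Dept": ["DAddress"]}


def object_type_ddl(root, elements, attributes):
    statements, done = [], set()

    def column_type(name):
        # A simple element without XML attributes is a plain VARCHAR column;
        # anything else needs its own object type.
        simple = not elements[name]
        return VARCHAR if simple and name not in attributes else f"Type_{name}"

    def emit(name):
        if name in done:
            return
        done.add(name)
        xml_attrs = [(a, VARCHAR) for a in attributes.get(name, [])]
        if elements[name]:
            for child in elements[name]:
                if column_type(child) != VARCHAR:
                    emit(child)  # referenced types must be created first
            fields = xml_attrs + [(c, column_type(c)) for c in elements[name]]
        else:
            # Simple element with XML attributes: text value plus attributes.
            fields = [(name, VARCHAR)] + xml_attrs
        body = ",\n".join(f"  attr{field} {sql_type}" for field, sql_type in fields)
        statements.append(f"CREATE TYPE Type_{name} AS OBJECT (\n{body});")

    emit(root)
    statements.append(f"CREATE TABLE Tab{root} OF Type_{root};")
    return statements


statements = object_type_ddl("Professor", elements, attributes)
print("\n\n".join(statements))
```

The sketch emits Type_Dept before Type_Professor, since Oracle requires a type to exist before another type can reference it.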
Complex Elements
Nesting of elements by composite DB object types

CREATE TYPE Type_Professor AS OBJECT (
  attrPName VARCHAR(4000),
  attrSubject VARCHAR(4000),
  attrDept VARCHAR(4000));

CREATE TYPE Type_Course AS OBJECT (
  attrName VARCHAR(4000),
  attrProfessor Type_Professor,
  attrCreditPts VARCHAR(4000));

CREATE TYPE Type_Student AS OBJECT (
  attrStudNr VARCHAR(4000),
  attrLName VARCHAR(4000),
  attrFName VARCHAR(4000),
  attrCourse Type_Course );

CREATE TABLE TabUniversity (
  attrStudyCourse VARCHAR(4000),
  attrStudent Type_Student );

INSERT INTO TabUniversity VALUES (
  'Computer Science',
  Type_Student('23374','Conrad','Matthias',
    Type_Course('Databases II',
      Type_Professor('Kudrass','Database Systems','Computer Science'),
      '4')));

SELECT u.attrStudent.attrLName
FROM TabUniversity u
WHERE u.attrStudent.attrCourse.attrProfessor.attrPName = 'Kudrass';
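How an XML document turns into such a nested constructor call can be sketched with Python's standard `xml.etree.ElementTree` module. The helper names (`sql_literal`, `constructor`, `insert_statement`) are hypothetical, and the sketch assumes that every declared child element occurs exactly once and in DTD order; optional or repeated elements are not handled.

```python
import xml.etree.ElementTree as ET

# Sample document matching the single-valued example (an assumption for illustration).
DOCUMENT = """
<University>
  <StudyCourse>Computer Science</StudyCourse>
  <Student StudNr="23374">
    <LName>Conrad</LName>
    <FName>Matthias</FName>
    <Course>
      <Name>Databases II</Name>
      <Professor>
        <PName>Kudrass</PName>
        <Subject>Database Systems</Subject>
        <Dept>Computer Science</Dept>
      </Professor>
      <CreditPts>4</CreditPts>
    </Course>
  </Student>
</University>
"""


def sql_literal(text):
    """Quote a text value as an SQL string literal (single quotes doubled)."""
    return "'" + text.strip().replace("'", "''") + "'"


def constructor(elem):
    """Return the SQL expression for one element of the DOM tree."""
    attrs = [sql_literal(value) for value in elem.attrib.values()]
    if len(elem) == 0:  # simple element
        text = sql_literal(elem.text or "")
        if not attrs:
            return text
        return f"Type_{elem.tag}({', '.join([text] + attrs)})"
    # Complex element: XML attributes first, then the nested child values.
    children = [constructor(child) for child in elem]
    return f"Type_{elem.tag}({', '.join(attrs + children)})"


def insert_statement(root):
    """The root element maps to a table row, its children to column values."""
    values = [constructor(child) for child in root]
    return f"INSERT INTO Tab{root.tag} VALUES ({', '.join(values)});"


root = ET.fromstring(DOCUMENT.strip())
statement = insert_statement(root)
print(statement)
```

The recursion mirrors the nesting of the object types: each complex element becomes one constructor call whose arguments are the values of its attributes and children.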
Set-Valued Elements
• Multiple Occurrence (in DTD): marked by + or *
• DBMS Restrictions
– collection type applicable to set-valued elements with text-valued subelements, e.g. ARRAY OF VARCHAR
– collection type not applicable to set-valued elements with complex subelements: subelements may be set-valued again
• Solutions
– use newer DBMS releases (e.g. Oracle 9i)
– model relationships with object references
Set-Valued Elements
<!ELEMENT University (StudyCourse,Student*)>

• Set-valued element Student
• Modeling in object type Type_Student with a reference to objects of the table TabUniversity

CREATE TYPE Type_University AS OBJECT (
  attrStudyCourse VARCHAR(4000));
CREATE TABLE TabUniversity OF Type_University;

CREATE TYPE Type_Student AS OBJECT (
  attrJahrgang VARCHAR(4000),
  attrUniversity REF Type_University );   -- reference to University objects
CREATE TABLE TabStudent OF Type_Student;
Set-Valued Elements

CREATE TYPE TypeVA_Subject AS VARRAY(100) OF VARCHAR(4000);

CREATE TYPE Type_Professor AS OBJECT (
  attrPName VARCHAR(4000),
  attrSubject TypeVA_Subject,
  attrDept VARCHAR(4000));
CREATE TYPE TypeVA_Professor AS VARRAY(100) OF Type_Professor;

CREATE TYPE Type_Course AS OBJECT (
  attrName VARCHAR(4000),
  attrProfessor TypeVA_Professor,
  attrCreditPts VARCHAR(4000));
CREATE TYPE TypeVA_Course AS VARRAY(100) OF Type_Course;

CREATE TYPE Type_Student AS OBJECT (
  attrStudNr VARCHAR(4000),
  attrLName VARCHAR(4000),
  attrFName VARCHAR(4000),
  attrCourse TypeVA_Course );
-- TypeVA_Student is used by the INSERT example but not defined on the slide;
-- its definition is assumed here by analogy with the other array types.
CREATE TYPE TypeVA_Student AS VARRAY(100) OF Type_Student;

CREATE TABLE TabUniversity (
  attrStudyCourse VARCHAR(4000),
  attrStudent TypeVA_Student );
Set-Valued Elements: Example

INSERT INTO TabUniversity VALUES (
  'Computer Science',
  TypeVA_Student (
    Type_Student('23374','Conrad','Matthias',
      TypeVA_Course (
        Type_Course('Databases II',
          TypeVA_Professor (
            Type_Professor('Kudrass',
              TypeVA_Subject ('Database Systems','Operating Systems'),
              'Computer Science')),
          '4'),
        Type_Course('CAD Intro',
          TypeVA_Professor (
            Type_Professor('Jaeger',
              TypeVA_Subject ('CAD','CAE'),
              'Computer Science')),
          '4'),
        ...)),
    Type_Student('00011','Meier','Ralf', … ) … )...);
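Deriving array types from the occurrence markers `*` and `+` could look roughly like the following Python sketch. It is an assumption-laden illustration: `collection_ddl` and its input format are hypothetical, the bound of 100 simply copies the VARRAY(100) used on the slides, and nested collections require a DBMS release that supports them (the slides name Oracle 9i).

```python
VARCHAR = "VARCHAR(4000)"
MAX_ITEMS = 100  # VARRAY bound as used on the slides

# Hypothetical in-memory form of the Course part of the example DTD:
# element -> [(child, occurrence marker)], [] for #PCDATA elements.
elements = {
    "Course": [("Name", ""), ("Professor", "*"), ("CreditPts", "?")],
    "Professor": [("PName", ""), ("Subject", "+"), ("Dept", "")],
    "Name": [], "CreditPts": [], "PName": [], "Subject": [], "Dept": [],
}


def collection_ddl(root, elements):
    statements, done = [], set()

    def base_type(name):
        return f"Type_{name}" if elements[name] else VARCHAR

    def emit(name):
        if name in done or not elements[name]:
            return  # already created, or a simple element without its own type
        done.add(name)
        fields = []
        for child, occurrence in elements[name]:
            emit(child)  # child types first (bottom-up creation order)
            if occurrence in ("*", "+"):
                array = f"TypeVA_{child}"
                if array not in done:
                    done.add(array)
                    statements.append(
                        f"CREATE TYPE {array} AS VARRAY({MAX_ITEMS}) OF {base_type(child)};")
                fields.append(f"attr{child} {array}")
            else:
                fields.append(f"attr{child} {base_type(child)}")
        statements.append(f"CREATE TYPE Type_{name} AS OBJECT ({', '.join(fields)});")

    emit(root)
    return statements


statements = collection_ddl("Course", elements)
print("\n".join(statements))
```

The optional marker `?` does not change the column type here; it only matters for null handling.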
Dealing with Null Values
• Restrictions with NOT NULL constraints in object-relational DB schema
– NOT NULL constraints in table - not in object type!
– NOT NULL constraints not applicable to collection types
• Object-valued attributes:
– use CHECK constraints for NOT NULL
• Loss of DTD semantics → DTD in the database
Dealing with CHECK Constraints

<!ELEMENT Course (CName, Address?)>
<!ELEMENT Address (Street, City?)>

CREATE TYPE Type_Address AS OBJECT (
  attrStreet VARCHAR(4000),
  attrCity VARCHAR(4000));

CREATE TYPE Type_Course AS OBJECT (
  attrName VARCHAR(4000),
  attrAddress Type_Address);

CREATE TABLE TabCourse OF Type_Course (
  attrName NOT NULL,
  CHECK (attrAddress.attrStreet IS NOT NULL));

1. INSERT INTO TabCourse VALUES ('CAD Intro', Type_Address(NULL,'Leipzig'));
   -- ORA-02290: desired error message
2. INSERT INTO TabCourse VALUES ('RN', NULL);
   -- ORA-02290: undesired error message
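Why the second INSERT is rejected as well can be modeled in a few lines of Python. This is a simplified model of the constraint's truth value, not Oracle itself: an attribute of a NULL object is treated as NULL, so `IS NOT NULL` evaluates to false. The guarded variant is a hypothetical workaround that does not appear on the slides and has not been verified against a database here.

```python
def street_check(address):
    """Model of CHECK (attrAddress.attrStreet IS NOT NULL).

    address is None (NULL object) or a dict with the keys 'street' and 'city'.
    """
    street = None if address is None else address["street"]
    return street is not None


def guarded_check(address):
    """Hypothetical variant:
    CHECK (attrAddress IS NULL OR attrAddress.attrStreet IS NOT NULL)."""
    return address is None or address["street"] is not None


row1 = {"street": None, "city": "Leipzig"}  # statement 1: required Street missing
row2 = None                                 # statement 2: optional Address omitted

for label, row in (("1", row1), ("2", row2)):
    print(label, "original check passes:", street_check(row),
          "| guarded check passes:", guarded_check(row))
```

In this model the original check rejects both rows, whereas the guarded variant rejects only the row with a missing Street, which is what the DTD (`Address?`, `Street` required) asks for.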
Meta-Data about XML Documents
• Unique DocumentID for each Document
• Prolog Information
• Document Location (URL)
• Name Space
• Element vs. Attribute
Naming Conventions for DB Objects
• Rules:
– TabElementname → Table Name
– Type_Elementname → Object Type Name
– TypeVa_Elementname → Array Name
• No Conflicts with Keywords
• Introduction of a Schema ID
• Naming Rule: SchemaID + Naming Convention + Name

CREATE TYPE DTD01_Type_University AS OBJECT (
  attrStudyCourse VARCHAR(4000) );

CREATE TYPE DTD02_Type_University AS OBJECT (
  attrRegister VARCHAR(4000) );
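The naming rule is simple enough to state as a small helper. The Python sketch below is illustrative: `db_name` and the `PREFIXES` table are hypothetical names, and the underscore between schema ID and name is inferred from the DTD01_Type_University example.

```python
# Prefixes from the naming rules on the slide.
PREFIXES = {"table": "Tab", "type": "Type_", "array": "TypeVa_"}


def db_name(kind, element, schema_id=None):
    """Build a DB object name: [SchemaID_] + naming convention + element name."""
    name = PREFIXES[kind] + element
    return f"{schema_id}_{name}" if schema_id else name


print(db_name("type", "University", "DTD01"))
print(db_name("type", "University", "DTD02"))
print(db_name("table", "Order"))
```

Because every generated name carries a prefix, an element called Order or Table cannot collide with an SQL keyword, and the schema ID keeps equally named elements from different DTDs apart.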
Conclusions: Advantages
• Non-atomic domains possible
– Natural representation of XML Documents
– Nesting of any complexity possible
• Simple queries by using dot notation
• Using object references to represent relationships (OIDs)
Conclusions: Drawbacks
• Mapping Deficiencies
– Possible restrictions of element types in collections
– No adequate mapping of NOT NULL constraints
• Loss of Information
– Prolog, Comments, Processing Instructions
– Entity References
– Attribute vs. Element?
• Schema Evolution
– Modification of DTD → Modification of DB
• Type Information
– Target type: VARCHAR - not sufficient!
Outlook
• Graph-based creation of a schema
• Source: XML Schema
• Use CLOB datatype
• Enhance Meta-Schema
– Comments, Processing Instructions and their position in document
– Entity references and their substitution text