1
On View Support for a Native XML DBMS
Ting Chen, Tok Wang Ling
School of Computing, National University of Singapore
Daofeng Luo, Xiaofeng Meng
Information School, Renmin University of China
2
Outline
• View for XML Documents
  - Two Main Approaches
  - Problems
• ORA-SS: Object-Relationship-Attribute Model for Semi-structured Data
  - ORA-SS class diagram and instance diagram
  - ORA-SS for view schema definition
• Element-Based Clustering (EBC)
  - Basic Approach
  - ORA-SS and EBC
• XML View Transformation
  - Problem Definition
  - Algorithm
• Conclusion
3
View for XML Documents
Two main approaches to define views:
• Define views in script languages such as XQuery or XSLT
  - General, but demanding for users because XQuery and XSLT scripts are complex
  - Difficult to optimize for performance
• Define views by schema mapping
  - E.g. Clio[7] and eXcelon[3]
  - A declarative approach that relieves users from writing complex scripts to perform view transformation
  - Schema mappings can then be translated into XQuery (or XSLT) scripts
We focus on the problem of view transformation through schema mapping.
4
View for XML Documents
View for XML Documents via Schema Mapping
• Problem: current XML schema formats are not able to express views with semantic constraints, which results in ambiguity.
• E.g. (next slide): the source XML file contains information about researchers working under different projects and the publication list of each researcher.
5
View for XML Documents
The view schema in Fig. (c) of the above diagram is such an example. It has at least two possible meanings, which can lead to different view results:
1. For each project, list all the papers published by project members; for each paper of the project, list all the authors of the paper.
2. For each project, list all the papers published by project members; for each paper of the project, list all the authors of the paper who work for the project.
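The difference between the two meanings can be made concrete with a small Python sketch. The data in `SOURCE` and the function names are hypothetical, chosen only so that one paper (p2) has an author (r3) who works for a different project; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical source data: project -> researcher -> list of papers.
SOURCE = {
    "j1": {"r1": ["p1"], "r2": ["p1", "p2"]},
    "j2": {"r2": ["p1", "p2"], "r3": ["p2"]},
}


def authors_everywhere(source):
    """Meaning 1: for each paper of a project, list ALL authors of the paper."""
    all_authors = defaultdict(set)
    for researchers in source.values():
        for researcher, papers in researchers.items():
            for paper in papers:
                all_authors[paper].add(researcher)
    view = {}
    for project, researchers in source.items():
        papers = {p for plist in researchers.values() for p in plist}
        view[project] = {p: sorted(all_authors[p]) for p in sorted(papers)}
    return view


def authors_in_project(source):
    """Meaning 2: list only the authors who work for the project."""
    view = {}
    for project, researchers in source.items():
        per_paper = defaultdict(set)
        for researcher, papers in researchers.items():
            for paper in papers:
                per_paper[paper].add(researcher)
        view[project] = {p: sorted(a) for p, a in sorted(per_paper.items())}
    return view


view1 = authors_everywhere(SOURCE)
view2 = authors_in_project(SOURCE)
print(view1["j1"]["p2"])  # includes r3, who works only for j2
print(view2["j1"]["p2"])  # restricted to members of j1
```

Both functions produce the same nesting (project, paper, researcher), which is why a schema that only describes the nesting cannot tell them apart.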
6
ORA-SS Data Model
ORA-SS[2] concepts:
• Object Class
• Relationship Type
• Attribute (object attribute or relationship attribute)
E.g. an ORA-SS instance diagram
* Compared with the XML document in Slide 5, two extra fields (attribute Date and sub-element Position) are added in the above ORA-SS instance diagram.
7
ORA-SS Data Model
ORA-SS Schema Diagram
• There are two binary relationship types in the schema: Project-Researcher (JR) and Researcher-Paper (RP). The set of papers under a researcher does not depend on the project he/she works in.
• Position is an attribute of relationship type JR instead of Researcher. This means that a researcher may hold different positions across the projects he/she works in.
• Date is a single-valued attribute of object class Paper. Different occurrences of the same paper will always have the same Date value.
• J_Name, R_Name and P_ID are the identifiers of object classes Project, Researcher and Paper respectively, as indicated by solid circles. Identifier values are used to tell whether two object occurrences are identical.
8
ORA-SS Data Model ORA-SS for View Schema Definition
ORA-SS is able to define views with different semantics:
• View (a) has two binary relationship types. The intention of the view schema is to find all the papers published by researchers in a project and, for each paper, to find all of its authors.
• View (b) has only one ternary relationship type. The view is defined to find all the papers published by researchers in a project; however, for each paper, View (b) finds only those authors working for the project.
[Figure: ORA-SS schema diagrams of View (a) and View (b)]
9
ORA-SS Data Model ORA-SS for View Schema Definition
The two view schemas in Slide 8 correspond to two different XSLT scripts. The following is the XSLT script for schema (a):
<root>
  <xsl:for-each-group select="root/Project" group-by="@J_Name">
    <Project>
      <J_Name><xsl:value-of select="@J_Name"/></J_Name>
      <xsl:for-each-group select="current-group()/Researcher/Paper" group-by="@P_Name">
        <Paper>
          <xsl:variable name="vPName" select="@P_Name"/>
          <P_Name><xsl:value-of select="@P_Name"/></P_Name>
          <xsl:for-each-group select="/root/Project/Researcher[Paper/@P_Name = $vPName]" group-by="@R_Name">
            <Researcher>
              <R_Name><xsl:value-of select="@R_Name"/></R_Name>
            </Researcher>
          </xsl:for-each-group>
        </Paper>
      </xsl:for-each-group>
    </Project>
  </xsl:for-each-group>
</root>
10
ORA-SS Data Model
The following is the XSLT script for schema (b):
<root>
  <xsl:for-each-group select="root/Project" group-by="@J_Name">
    <Project>
      <J_Name><xsl:value-of select="@J_Name"/></J_Name>
      <xsl:for-each-group select="current-group()/Researcher/Paper" group-by="@P_Name">
        <Paper>
          <xsl:variable name="vPName" select="@P_Name"/>
          <P_Name><xsl:value-of select="@P_Name"/></P_Name>
          <xsl:for-each-group select="current-group()/.." group-by="@R_Name">
            <Researcher>
              <R_Name><xsl:value-of select="@R_Name"/></R_Name>
            </Researcher>
          </xsl:for-each-group>
        </Paper>
      </xsl:for-each-group>
    </Project>
  </xsl:for-each-group>
</root>
The main difference between the two scripts lies in the third xsl:for-each-group directive, the one for Researcher. The script for schema (a) needs to search the whole document to find the complete author list of a paper, because the authors may not all work for the same project. The script for schema (b) avoids the global search because it only needs to find the authors of the paper who work for the same project.
11
Element-Based-Clustering (EBC)
Element-Based Clustering[5]
• An extension of the Element-Based (EB[6]) strategy
• Element nodes (records) with the same tag name are clustered and organized as a list
[Figure: example XML tree with element nodes A, B and C]
12
Element-Based-Clustering (EBC)
EBC: Node labeling
EBC assigns a label to every node in an XML document. Labels for the nodes of an XML data tree are calculated as follows:
1. The root element has label nil.
2. Perform a pre-order traversal (i.e. document order) of the XML document. For each node x ("+" here means string concatenation):
   label(x) = label(x.parent) + "." + position of x in x.parent's childList
   (for a child of the root, the label is just its position, e.g. 1 or 2)
Node A is an ancestor of node B if and only if Label(A) is a proper prefix of Label(B).
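The labeling rule can be sketched in Python as follows. This is an illustrative sketch, not the EBC implementation: labels are kept as tuples of integers rather than dotted strings, only element children are counted as positions, and the sample document `DOC` and the helper names are assumptions. Comparing components rather than characters avoids treating "1.1" as a prefix of "1.10".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element structure matters here.
DOC = (
    "<root>"
    "<Project><Researcher><Paper/></Researcher>"
    "<Researcher><Paper/><Paper/></Researcher></Project>"
    "<Project/>"
    "</root>"
)


def label_nodes(root):
    """Return (element, label) pairs in document (pre-order) order."""
    labelled = []

    def visit(node, label):
        labelled.append((node, label))
        # Positions are 1-based indexes into the parent's child list.
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, ())  # the root carries the empty (nil) label
    return labelled


def is_ancestor(a, b):
    """True if label a is a proper prefix of label b, component by component."""
    return len(a) < len(b) and b[:len(a)] == a


def show(label):
    """Render a tuple label in the dotted notation used on the slides."""
    return ".".join(map(str, label)) if label else "nil"


labels = [(node.tag, show(label)) for node, label in label_nodes(ET.fromstring(DOC))]
for tag, text in labels:
    print(tag, text)
```

With tuple labels, `is_ancestor((1, 1), (1, 10))` is false, whereas a plain string-prefix test on "1.1" and "1.10" would wrongly report an ancestor relationship.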
13
ORA-SS and EBC
• How does the ORA-SS schema help tune XML document storage?
Clusters (each entry is a value followed by its node label in parentheses):
Project: j1(1) j2(2)
Researcher: r1(1.1) r2(1.2) r2(2.1) r3(2.2)
Paper: p1,05/2002(1.1.1) p1,05/2002(1.2.1) p2,03/2000(1.2.2) p1,05/2002(2.1.1) p1,03/2000(2.1.2) p2,05/2002(2.2.1)
Position: Leader(1.2.3) Staff(2.1.3) Leader(2.2.2)
14
ORA-SS and EBC
1. The object identifier of an object is stored together with the object. Objects of the same class form a cluster. Each cluster is a sequential file.
2. Relationship attribute values are stored in a separate cluster.
3. For object attribute values, we need some heuristics. It is tempting to store object attributes together with the object, since they are likely to be accessed at the same time. However, if an object class has too many object attributes, storing all attribute values together with the object leads to a situation similar to the Subtree-Based storage strategy. One solution is to store only the essential object attributes (as determined by users) with the object and leave the others to separate clusters.
4. Node labels are stored together with the nodes (a node can be an object, a relationship attribute value or an object attribute value).
15
View Transformation: Problem Definition
Given an XML document D1, a source schema V1 and a valid view schema V2 of V1, transform D1 into a document D2 such that D2 is a valid document under V2.
[Figure: Source Document, Source Schema, View Schema and View Document, connected by a User Defined Schema Mapping and the View Transformation]
16
View Transformation
• A view schema defined in DTD can have different interpretations, which result in different views.
[Figure: the ambiguous view schema over Proj, Paper and Researcher, shown next to two ORA-SS view schemas: Slide 8(a) with relationship types JP;2 and PR;2, and Slide 8(b) with relationship type JPR;3]
17
View Transformation
A view schema expressed in ORA-SS is unambiguous: the relationship set of an ORA-SS view schema clearly defines how the view document should be constructed.
Two basic techniques are used to construct a single relationship in the view schema:
• Structural join: SJ (based on object labels). For each relationship R in the view schema, we first use structural join to find the set of paths of type R such that the object occurrences in each path lie on the same path in the source document.
• Value join: Merge (based on logical object identifiers). In an XML document an object can have many occurrences. Two occurrences are considered the same if they have identical object identifiers. The set of paths resulting from the structural join is then value joined (merged) using logical object keys.
18
View Transformation
What makes things complicated?
• A path in the view schema can have more than one relationship, and we then need to join the relationships together. E.g. the view schema in Slide 8(a) contains two relationships.
• Two relationships are "joined" on their overlapping object classes, so the results constructed for one relationship may be used in the construction of another.
• Value join (merge) based on logical keys destroys the "sorted-ness" of the output path list of the structural join. As a consequence, the output of a value join cannot be used efficiently in a subsequent structural join with paths from other relationships, because structural join requires sorted input lists.
Solution: Duplicate-Preserving Merge (D-Merge). It keeps the structural join output path list intact. For two occurrences of the same object in the list, their child contents are merged and each occurrence then gets its own copy of the merged content. D-Merge is used only when a relationship needs to be joined with another relationship.
19
View Transformation: Algorithm
The relationship set of a view schema determines the view transformation process. Take the two view schemas in Slide 8 as an example:
View (a):
1. L := SJ(list(R), list(P), {P,R})
2. L := D-Merge(L, {P})
3. L := SJ(L, list(J), {J,P})
4. L := Merge(L, {J})
View (b):
1. L := SJ(list(J), list(P), {J,P})
2. L := SJ(L, list(R), {J,P,R})
3. L := Merge(L, {J})
20
View Transformation: Algorithm
Structural Join (based on object labels)
Binary Structural Join[1]
• Input: two node lists sorted on node numbers, AList of potential ancestor nodes and DList of potential descendant nodes
• Output: OutputList = [(ai, dj)] of join results, in which ai is the parent/ancestor of dj, ai is from AList and dj is from DList
The EBC scheme stores elements with the same tag name in pre-order (i.e. sorted on element number), so structural join can be applied naturally.
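A binary structural join over sorted label lists can be sketched in Python as below. This is a simplified, stack-based sketch in the spirit of [1], not the algorithm from that paper: it assumes labels are tuples of integers compared by prefix, whereas [1] works with region-encoded node numbers. The function names are hypothetical.

```python
def is_ancestor(a, d):
    """True if label a is a proper prefix of label d."""
    return len(a) < len(d) and d[:len(a)] == a


def structural_join(a_list, d_list):
    """Join two label lists that are both sorted in document order.

    Returns (ancestor, descendant) pairs. Each input list is scanned once;
    the stack holds a chain of nested ancestor candidates.
    """
    output, stack, i = [], [], 0
    for d in d_list:
        # Take in every ancestor candidate that precedes d in document order.
        while i < len(a_list) and a_list[i] < d:
            a = a_list[i]
            while stack and not is_ancestor(stack[-1], a):
                stack.pop()  # that subtree ended before a
            stack.append(a)
            i += 1
        # Discard candidates whose subtree ended before d.
        while stack and not is_ancestor(stack[-1], d):
            stack.pop()
        # Everything left on the stack is an ancestor of d.
        output.extend((a, d) for a in stack)
    return output


# Labels of the Project and Paper clusters from the "ORA-SS and EBC" slide.
projects = [(1,), (2,)]
papers = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1)]
pairs = structural_join(projects, papers)
for ancestor, descendant in pairs:
    print(ancestor, descendant)
```

Tuple comparison in Python orders these labels in document order, which is why the sorted clusters can be fed to the join without extra preparation.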
21
View Transformation: Algorithm
Structural Join (based on object labels)
Complex Structural Join (example: structural join of three sorted input lists)
• Input: sorted node lists A, B, C
• Output: OutputList = [(ai, bj, ck)] of join results, in which ai, bj, ck (from lists A, B, C respectively) are located on the same path in the source document
Two binary joins:
• Step 1: Join A and B. OutputList AB: [(ai, bj)] sorted on ai
• Step 2: Join AB (using ai as node label) and C. OutputList = [(ai, bj, ck)] sorted on ai
Important: ck should be on the same path as both ai AND bj.
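The "same path as both ai AND bj" condition can be illustrated with a deliberately naive Python sketch. Nested loops are used instead of a sorted merge to keep the check visible; the tuple labels and helper names are assumptions made for the example.

```python
def same_path(x, y):
    """True if one label is a prefix of the other (both lie on one root path)."""
    shorter, longer = sorted((x, y), key=len)
    return longer[:len(shorter)] == shorter


def join_three(a_list, b_list, c_list, check_b=True):
    """Two binary joins: A with B, then AB with C.

    With check_b=False the second join only compares ck with ai,
    which is the mistake the slide warns about.
    """
    ab = [(a, b) for a in a_list for b in b_list if a != b and same_path(a, b)]
    return [
        (a, b, c)
        for a, b in ab
        for c in c_list
        if same_path(a, c) and (not check_b or same_path(b, c))
    ]


A = [(1,)]                   # e.g. one project
B = [(1, 1), (1, 2)]         # two researchers under it
C = [(1, 1, 1), (1, 2, 1)]   # one paper under each researcher

correct = join_three(A, B, C)
too_many = join_three(A, B, C, check_b=False)
print(len(correct), len(too_many))
```

Dropping the check against bj pairs each researcher with papers of the other researcher, because every paper still lies below the shared project.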
22
View Transformation: Algorithm
A motivating example:
Source Schema: Proj, Researcher, Paper with relationship types JR;2 and RP;2
View Schema: Proj, Paper, Researcher with relationship types JP;2 and PR;2
[Figure: source document]
23
View Transformation: Algorithm
Data Structures
• Record
  - Object label
  - Object key (object identifier)
  - ChildList: an array of child record references
• Tuple
  - A tuple consists of an array of records
  - A tuple has a root record
• Array of Tuples
Example of a tuple:
Tuple array index | Record    | ChildList
0                 | r1 (ROOT) | <1,2>
1                 | r2        | <3>
2                 | r3        | nil
3                 | r4        | nil
Corresponding tree: r1 has children r2 and r3; r2 has child r4.
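The record and tuple structures can be sketched in Python as below. The class name, field names and the concrete label values are assumptions for illustration; only the ChildList indexes follow the example on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Record:
    key: str                 # object key (object identifier)
    label: Tuple[int, ...]   # object label; the values below are made up
    child_list: List[int] = field(default_factory=list)  # indexes into the tuple array


# One tuple: an array of records whose root record sits at index 0.
example_tuple = [
    Record("r1", (1,), [1, 2]),   # ROOT, children at indexes 1 and 2
    Record("r2", (1, 1), [3]),
    Record("r3", (1, 2)),
    Record("r4", (1, 1, 1)),
]


def render(tuple_array, index=0, depth=0):
    """Return the tree encoded by the ChildLists as indented lines."""
    record = tuple_array[index]
    lines = ["  " * depth + record.key]
    for child in record.child_list:
        lines.extend(render(tuple_array, child, depth + 1))
    return lines


tree_lines = render(example_tuple)
print("\n".join(tree_lines))
```

Storing child references as array indexes keeps a tuple self-contained, so it can be copied or concatenated with another tuple by shifting indexes.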
24
View Transformation: Algorithm
Operations
• Structural Join (SJ)
  Input:
    L1: array of tuples of type <A1,A2,A3,…, An>
    L2: array of tuples of type <B1,B2,B3,…, Bn>
    Mask M: bit array that specifies the object classes participating in the structural join
  Output:
    L3: array of tuples of type <A1,A2,A3,…, An,B1,B2,B3,…, Bn>
    For each tuple in L3, the objects whose types are specified in M MUST lie on the same path in the source document.
• Merge (Merge)
  Input:
    L1: array of tuples of type <A1,A2,A3,…, An>
    Mask M: bit array that specifies a list of object classes
  Output:
    L: array of tuples
    Any two tuples t1 and t2 in L1 that have the same set of objects for the types specified in M are merged into one tuple.
• Duplicate-Preserving Merge (D-Merge)
  Input:
    L1: array of tuples of type <A1,A2,A3,…, An>
    Mask M: bit array that specifies a list of object classes
  Output:
    L: array of tuples
    For any two tuples t1 and t2 in L1 that have the same set of objects for the types specified in M, the contents of the two tuples are merged; t1 and t2 are both kept and both receive the merged content.
Note: the Structural Join operation uses object numbers (labels), while Merge and D-Merge use object identifiers.
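The difference between Merge and D-Merge can be sketched in Python as below. This sketch simplifies the operations to a single masked object class: each path is a hypothetical triple (object key, object label, list of child keys), and the function names are illustrative only.

```python
def merge(paths):
    """Merge: one output entry per object key, with the children unioned.

    The labels are dropped, so the result is no longer a list of
    document-ordered occurrences.
    """
    merged = {}
    for key, _label, children in paths:
        bucket = merged.setdefault(key, [])
        for child in children:
            if child not in bucket:
                bucket.append(child)
    return list(merged.items())


def d_merge(paths):
    """D-Merge: keep every occurrence, in its original order and with its
    own label, and give each one a private copy of the merged children."""
    merged = dict(merge(paths))
    return [(key, label, list(merged[key])) for key, label, _ in paths]


# Paper occurrences with their authors, as a structural join might return them.
paths = [
    ("p1", (1, 1, 1), ["r1"]),
    ("p1", (1, 2, 1), ["r2"]),
    ("p2", (1, 2, 2), ["r2"]),
]

merged_paths = merge(paths)
d_merged_paths = d_merge(paths)
print(merged_paths)
print(d_merged_paths)
```

Because `d_merge` leaves the labels and their order untouched, its output can still serve as a sorted input list for a later structural join, which is the property the D-Merge definition is designed to preserve.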
25
View Transformation: Algorithm
Definition 1: Relationship R1 is lower than R2, or R1 < R2, if the top-most participating object class of R1 is a descendant of the top-most participating object class of R2.
Definition 2: A relationship is independent if its set of participating object classes is not included in any other relationship. Otherwise the relationship is nested.
Algorithm (main steps):
Step 0. Initialize
    L1 := null; L2 := null;
    M := { };   // M contains the set of object classes that have been structurally joined
    L[O] := null for each object class O in the view schema
Step 1. Put the independent relationships of the ORA-SS view schema into a partially ordered set (S, <)
Step 2. While (S is not empty) {
    Extract the first relationship R from S;
    If (M ∩ R is not empty)
        L1 := L[O] for the highest O in M ∩ R;   // O has the smallest level in the view schema
    Else
        L1 := null;
    For each object class O in (R - M), sorted decreasingly by depth in the view schema {
        M := M ∪ {O};
        L2 := Cluster[O];   // Cluster[O] is an array storing the objects of class O
        If (L1 != null)
            L1 := StructuralJoin(L1, L2, M ∩ R);
        Else
            L1 := L2;
        L[O] := L1;
    }
    If R is the highest-level relationship
        L1 := Merge(L1, R.top_most_object_class);
    Else
        L1 := D-Merge(L1, R - {object classes of R which do not appear in any other independent relationship});
}
Step 3. Return L[Root], where Root is the root of the view schema.
26
View Transformation: Algorithm
Example: View Schema Slide 8(a)
Step 1: Construct relationship Paper-Researcher
S = {PR, JP}
Step 1.1. L := SJ(list(P), list(R), {P,R})
Step 1.2. L := D-Merge(L, {P})
[Figure: view schema (Proj, Paper, Researcher with relationship types JP;2 and PR;2) and source document]
27
View Transformation: Algorithm
Example: View Schema Slide 8(a)
Step 2: Construct relationship Project-Paper
S = {JP}
Step 2.1. L := SJ(L, list(J), {J,P})
Step 2.2. L := Merge(L, {J})
[Figure: view schema (Proj, Paper, Researcher with relationship types JP;2 and PR;2) and source document]
28
Conclusion
• We demonstrate how to combine an efficient XML storage scheme (EBC) and an expressive XML data model (ORA-SS) to provide XML view support for a native XML DBMS.
• ORA-SS, used for view schema definitions, can express a great variety of constraints graphically. More importantly, it avoids ambiguity, which is a typical problem if DTD or XML Schema is used as the view schema format.
• In our transformation method, a relationship type is the basic unit of transformation. Both object-label-based structural join and logical-key-based value join are employed to construct view results. ORA-SS view schema information guides correct view transformation.
29
References
1. Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava, Yuqing Wu. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In Proceedings of ICDE, 2002.
2. Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong Li Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000.
3. eXcelon. An General XML Data Manager. http://www.exceloncorp.com/
4. H. V. Jagadish, Shurug Al-Khalifa, et al. TIMBER: A Native XML Database. Technical Report, University of Michigan, April 2002.
5. Xiaofeng Meng, Daofeng Luo, Mong Li Lee, Jing An. OrientStore: A Schema Based Native XML Storage System. In Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003.
6. J. McHugh, S. Abiteboul, R. Goldman, D. Quass, J. Widom. Lore: A Database Management System for Semistructured Data. SIGMOD Record, Vol. 26(3):54-66, September 1997.
7. Lucian Popa, Mauricio A. Hernández, Yannis Velegrakis, Renée J. Miller, Felix Naumann, Howard Ho. Mapping XML and Relational Schemas with Clio. In ICDE 2002 Demo, 2002.