1 querying xml documents. 2 objectives how xml generalizes relational databases the xquery language...
TRANSCRIPT
1
Querying XML Documents
2
Objectives
How XML generalizes relational databases The XQuery language How XML may be supported in databases
3
Only Some Trees are Relations
They have height two The root has an unbounded number of
children All nodes in the second layer (records)
have a fixed number of child nodes (fields)
Trees are ordered, while both rows and columns of tables may be permuted without changing the meaning of the data
4
An XML tree
books
book book
author author author title year
2000Data on the WebS. Abiteboul P. Buneman D. Suciu
author …
A. Moller
5
Relational Tables
bid title year
b1 Data on the Web 2000
b2An Introduction to XML and Web Technologies
2006
bid author
b1 S. Abiteboul
b1 P. Buneman
b1 D. Suciu
b2 A. Moller
b2 M. Schwartzbach
6
Query Usage Scenarios Data-oriented
to query XML data, like using SQL to query relational databases
Document-oriented to retrieve parts of documents to provide dynamic indexes to perform context-sensitive searching to generate new documents as combinations of existing docu
ments Programming
to automatically generate documentation, similar to the Javedoc tool
Hybrid to retrieve information from hybrid data, such as patient recor
ds
7
XQuery Design Requirements XML syntax and human-readable Declarative Namespace aware Coordinate with XML Schema Support simple and complex data types Can combine multiple documents Can transform and create XML trees
8
What is XQuery? XQuery is the language for querying XML data XQuery 1.0 is a strict superset of XPath 2.0
XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators
XQuery has the extra expressive power to join information from different sources and generate new XML fragments
XQuery and XSLT are both domain-specific languages for combining and transforming XML data from multiple sources XQuery is designed from scratch, XSLT is an intellectual desce
ndant of CSS Technically, they may emulate each other
XQuery is defined by the W3C XQuery is supported by all the major database engines (IBM, O
racle, Microsoft, etc.)
9
XQuery Prolog XQuery expressions are evaluated relatively to a c
ontext, which is explicitly provided by a prolog Declarations specify various parameters for the X
Query processor, such as:xquery version "1.0";
declare xmlspace preserve;
declare xmlspace strip;
declare default element namespace URI;
declare default function namespace URI;
import schema at URI;
declare namespace NCName = URI;
10
Implicit Declarations Declarations implicitly defined in any XQuery impl
ementation: declare namespace xml =
"http://www.w3.org/XML/1998/namespace"; declare namespace xs =
"http://www.w3.org/2001/XMLSchema"; declare namespace xsi =
"http://www.w3.org/2001/XMLSchema-instance"; declare namespace fn =
"http://www.w3.org/2005/11/xpath-functions"; declare namespace xdt =
"http://www.w3.org/2005/11/xpath-datatypes"; declare namespace local =
"http://www.w3.org/2005/11/xquery-local-functions";
11
XPath vs. XQuery XPath expressions are also XQuery expressions XPath expressions are required to be evaluated in
a static context, which must be provided by the invoking application
The XQuery prolog gives the required static context
The initial context node, position, and size are undefined because an XQuery expression may work on multiple XML documents
Only axes child, descendant, parent, attribute, self, and descendant-of-self are required to be implemented in XQuery.
12
Datatype Expressions Same atomic values as XPath 2.0 Also lots of primitive simple values:
xs:string("XML is fun")xs:boolean("true")xs:decimal("3.1415")xs:float("6.02214199E23")xs:dateTime("1999-05-31T13:20:00-05:00")xs:time("13:20:00-05:00")xs:date("1999-05-31")xs:gYearMonth("1999-05")xs:gYear("1999")xs:hexBinary("48656c6c6f0a")xs:base64Binary("SGVsbG8K")xs:anyURI("http://www.brics.dk/ixwt/")xs:QName("rcp:recipe")
13
XML Expressions
XQuery expressions may compute new XML nodes
Expressions may denote element, character data, comment, and processing instruction nodes
Each node is created with a unique node identity
Constructors may be either direct or computed
14
Direct Constructors
Uses the standard XML syntax The expression
<foo><bar/>baz</foo>evaluates to the given XML fragment
Note that nodes are created with unique identity and therefore <foo/> is <foo/> evaluates to false (operators is, <<, >> are used to compare nodes on identity and document order)
15
Same Namespace Declarations (1/3)
declare default element namespace "http://businesscard.org";
<card>
<name>John Doe</name>
<title>CEO, Widget Inc.</title>
<email>[email protected]</email>
<phone>(202) 555-1414</phone>
<logo uri="widget.gif"/>
</card>
16
Same Namespace Declarations (2/3)
declare namespace b = "http://businesscard.org";
<b:card>
<b:name>John Doe</b:name>
<b:title>CEO, Widget Inc.</b:title>
<b:email>[email protected]</b:email>
<b:phone>(202) 555-1414</b:phone>
<b:logo uri="widget.gif"/>
</b:card>
17
Same Namespace Declarations (3/3)
<card xmlns="http://businesscard.org">
<name>John Doe</name>
<title>CEO, Widget Inc.</title>
<email>[email protected]</email>
<phone>(202) 555-1414</phone>
<logo uri="widget.gif"/>
</card>
18
Enclosed Expressions Expressions create an element named numbers w
hich has a single character data node with value 1 2 3 4 5 (if boundary-space is set to strip)< numbers >1 2 3 4 5</ numbers >< numbers >{1, 2, 3, 4, 5}</ numbers >< numbers >{1, "2", 3, 4, 5}</ numbers >< numbers >{1 to 5}</ numbers >< numbers >1 {1+1} {" "} {"3"} {" "} {4 to 5}</ numbers >
Enclosed expressions are allowed inside attribute values<foo bar="1 2 3 4 5"/><foo bar="{1, 2, 3, 4, 5}"/><foo bar="1 {2 to 4} 5"/>
19
Explicit Constructors The constant expression
<card xmlns="http://businesscard.org"><name>John Doe</name><title>CEO, Widget Inc.</title><email>[email protected]</email><phone>(202) 555-1414</phone><logo uri="widget.gif"/>
</card> Can be written as:
element card {namespace { "http://businesscard.org" },element name { text { "John Doe" } },element title { text { "CEO, Widget Inc." } } ,element email { text { "[email protected]" } },element phone { text { "(202) 555-1414" } },element logo { attribute uri { "widget.gif" } }
}
20
Computed QNames Qualified names can be replaced by enclosed expressions eva
luating to equivalent strings:element { "card" } {
namespace { "http://businesscard.org" },element { "name" } { text { "John Doe" } },element { "title" } { text { "CEO, Widget Inc." } },element { "email" } { text { "[email protected]" } },element { "phone" } { text { "(202) 555-1414" } },element { "logo" } {
attribute { "uri" } { "widget.gif" }}
}
21
Biliingual Business Cards Controlled by a global variable $lang:
element { if ($lang="Danish") then "kort" else "card" } {namespace { "http://businesscard.org" },element { if ($lang="Danish") then "navn" else "name" }
{ text { "John Doe" } },element { if ($lang="Danish") then "titel" else "title" }
{ text { "CEO, Widget Inc." } },element { "email" }
{ text { "[email protected]" } },element { if ($lang="Danish") then "telefon" else "phone"}
{ text { "(202) 456-1414" } },element { "logo" } {
attribute { "uri" } { "widget.gif" }}
}
22
FLWOR Expressions Used for general queries:
<doubles>{ for $s in fn:doc("students.xml")//student let $m := $s/major where fn:count($m) ge 2 order by $s/@id return <double>
{ $s/name/text() } </double>
}</doubles>
23
The Difference Between For and Let (1/4)
A FLWOR expressionfor $x in (1, 2, 3, 4)
let $y := ("a", "b", "c")
return ($x, $y) Output
1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c
24
The Difference Between For and Let (2/4)
A FLWOR expressionlet $x in (1, 2, 3, 4)
for $y := ("a", "b", "c")
return ($x, $y) Output
1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c
25
The Difference Between For and Let (3/4)
A FLWOR expressionfor $x in (1, 2, 3, 4)
for $y in ("a", "b", "c")
return ($x, $y) Output
1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c, 4, a, 4, b, 4, c
26
The Difference Between For and Let (4/4)
A FLWOR expressionlet $x := (1, 2, 3, 4)
let $y := ("a", "b", "c")
return ($x, $y) Output
1, 2, 3, 4, a, b, c
27
An XML Example Document: books.xml<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book category="COOKING">
<title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price>
</book><book category="CHILDREN">
<title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price>
</book>
28
An XML Example Document: books.xml (cont.)<book category="WEB">
<title lang="en">XQuery Kick Start</title> <author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price>
</book><book category="WEB">
<title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price>
</book></bookstore>
29
Path Expressions The following path expression is used to select all
the title elements in the "books.xml" file:doc("books.xml")/bookstore/book/title
The XQuery above will extract the following: <title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
30
Predicates The following predicate is used to select all the bo
ok elements under the bookstore element that have a price element with a value that is less than 30: doc("books.xml")/bookstore/book[price<30]
The XQuery above will extract the following: <book category="CHILDREN">
<title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price>
</book>
31
FLOWR Expressions
The following FLWOR expression will select all the title elements under the book elements that are under the bookstore element that have a price element with a value that is higher than 30 for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title The result will be:
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
32
Conditional Expressions Notes on the "if-then-else" syntax: parentheses around the
if expression are required. else is required, but it can be just else ().for $x in doc("books.xml")/bookstore/book return if ($x/@category="CHILDREN")
then <child>{data($x/title)}</child> else <adult>{data($x/title)}</adult>
The result of the example above will be:<adult>Everyday Italian</adult> <child>Harry Potter</child> <adult>Learning XML</adult> <adult>XQuery Kick Start</adult>
33
Comparisons General comparisons: =, !=, <, <=, >, >= Value comparisons: eq, ne, lt, le, gt, ge
$bookstore//book/@q > 10 The expression above returns true if any q attribu
tes have values greater than 10.$bookstore//book/@q gt 10
The expression above returns true if there is only one q attribute returned by the expression, and its value is greater than 10. If more than one q is returned, an error occurs.
34
Selecting and Filtering Elements The at keyword can be used to count the iteration:
for $x at $i in doc("books.xml")/bookstore/book/title return <book>{$i}. {data($x)}</book>
Result: <book>1. Everyday Italian</book> <book>2. Harry Potter</book> <book>3. XQuery Kick Start</book> <book>4. Learning XML</book>
It is also allowed with more than one in expression in the for clause. Use comma to separate each in expression: for $x in (10,20), $y in (100,200) return <test>x={$x} and y={$y}</test>
Result:<test>x=10 and y=100</test> <test>x=10 and y=200</test> <test>x=20 and y=100</test> <test>x=20 and y=200</test>
35
Computing Joins fridge.xml:
<fridge><stuff>eggs</stuff><stuff>olive oil</stuff><stuff>ketchup</stuff><stuff>unrecognizable moldy thing</stuff>
</fridge> Find recipes that use some ingredients in the refrigerator
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";for $r in fn:doc("recipes.xml")//rcp:recipefor $i in $r//rcp:ingredient/@namefor $s in fn:doc("fridge.xml")//stuff[text()=$i]return fn:distinct-values($r/rcp:title/text() )
36
Inverting a Relationdeclare namespace rcp = "http://www.brics.dk/ixwt/recipes";<ingredients>
{ for $i in distinct-values( fn:doc("recipes.xml")//rcp:ingredient/@name
) return <ingredient name="{$i}">
{ for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] return <title>$r/rcp:title/text()</title>}</ingredient>
}</ingredients>
37
Sorting the Resultsdeclare namespace rcp = "http://www.brics.dk/ixwt/recipes";<ingredients>
{ for $i in distinct-values(fn:doc("recipes.xml")//rcp:ingredient/@name) order by $i return <ingredient name="{$i}">
{ for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] order by $r/rcp:title/text() return <title>$r/rcp:title/text()</title>}</ingredient>
}</ingredients>
38
A More Complicated Sorting
for $s in document("students.xml")//student
order by
fn:count(
$s/results/result[fn:contains(@grade,"A")]
) descending,
fn:count($s/major) descending,
xs:integer($s/age/text()) ascending
return $s/name/text()
39
Using Functionsdeclare function local:grade($g) {
if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0
};
declare function local:gpa($s) {fn:avg(for $g in $s/results/result/@grade return local:grade($g))
};
<gpas>{ for $s in fn:doc("students.xml")//student return <gpa id="{$s/@id}" gpa="{local:gpa($s)}"/> }
</gpas>
40
Extend the Expressive Power Using recursive function to generate an XML tree of a g
iven height:declare function gen($n) {
if ($n eq 0) <bar/>else <foo>{ gen($n - 1), gen($n - 1) } </foo>
}; Compute the height of an XML tree:
declare function local:height($x) {if (fn:empty($x/*)) then 1else fn:max(for $y in $x/* return local:height($y))+1
};
41
A Textual OutlineCailles en Sarcophages
pastrychilled unsalted butterfloursaltice water
fillingbaked chicken
marinated chickensmall chickens, cut upHerbes de Provencedry white wineorange juiceminced garlictruffle oil
...
42
Computing Textual Outlinesdeclare namespace rcp = "http://www.brics.dk/ixwt/recipes";declare function local:ingredients($i,$p) {
fn:string-join(for $j in $i/rcp:ingredientreturn fn:string-join(($p,$j/@name,"
",
local:ingredients($j,fn:concat($p," "))),""),"")};
declare function local:recipes($r) {fn:concat($r/rcp:title/text(),"
",local:ingredients($r," "))
};
fn:string-join(for $r in fn:doc("recipes.xml")//rcp:recipe[5]return local:recipes($r),""
)
43
XML Databases
How can XML and databases be merged? Several different approaches:
extract XML views of relations use SQL to generate XML shred XML into relational databases
44
Automatic XML Views (1/2)
<Students><record id="100026" name="Joe Average" age="21"/><record id="100078" name="Jack Doe" age="18"/>
</Students>
xmlelement(name, "Students",select xmlelement(name,
"record",xmlattributes(s.id, s.name, s.age))
from Students)
45
Automatic XML Views (2/2)<Students>
<record><id>100026</id><name>Joe
Average</name><age>21</age>
</record><record>
<id>100078</id><name>Jack Doe</name><age>18</age>
</record></Students>
xmlelement(name, "Students",
select xmlelement(name,
"record",
xmlforest(s.id, s.name, s.age))
from Students
)
46
Storing XML Documents
Designing a specialized system for storing native XML data
Using a DBMS to store the whole XML documents as text fields
Using a DBMS to store the document contents as data elements It must support the XML’s ordered data
model
47
Using a DBMS: Relational DTD
Schema-aware An element that can occur at most once in its
parent is stored as a column of the table representing its parent
ParentID ID TEXT
ParentID ID title year 3 5 “S. Abiteboul”
2 3 “Data on The Web” “2000” 3 6 “P. Buneman”
2 4 … … 3 7 “D. Suciu”
The book table The author table
references
book book
author author author title year
2000Data on the Web
S. Abiteboul
P. Buneman
D. Suciu
author…
…
48
Using a DBMS: Edge Schema-less A single table is used to store the entire document Each node is assigned an ID in depth first order
references
book book
author author author title year
2000Data on the Web
S. Abiteboul
P. Buneman
D. Suciu
author …
…
1
2
3 4
root
49
Using a DBMS: Edge (cont.)
SourceID tag ordinal TargetID Data
1 reference 1 2 NULL
2 book 1 3 NULL
2 book 2 4 NULL
3 author 1 0 “S. Abiteboul”
3 author 2 0 “P. Buneman”
3 author 3 0 “D. Suciu”
3 title 4 0 “Data on The Web”
3 year 5 0 “2000”
The edge table