1 querying xml documents. 2 objectives how xml generalizes relational databases the xquery language...

49
1 Querying XML Documents

Upload: lawrence-harmon

Post on 27-Dec-2015

242 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

1

Querying XML Documents

Page 2: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

2

Objectives

How XML generalizes relational databases The XQuery language How XML may be supported in databases

Page 3: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

3

Only Some Trees are Relations

They have height two The root has an unbounded number of

children All nodes in the second layer (records)

have a fixed number of child nodes (fields)

Trees are ordered, while both rows and columns of tables may be permuted without changing the meaning of the data

Page 4: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

4

An XML tree

books

book book

author author author title year

2000Data on the WebS. Abiteboul P. Buneman D. Suciu

author …

A. Moller

Page 5: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

5

Relational Tables

bid title year

b1 Data on the Web 2000

b2An Introduction to XML and Web Technologies

2006

bid author

b1 S. Abiteboul

b1 P. Buneman

b1 D. Suciu

b2 A. Moller

b2 M. Schwartzbach

Page 6: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

6

Query Usage Scenarios Data-oriented

to query XML data, like using SQL to query relational databases

Document-oriented to retrieve parts of documents to provide dynamic indexes to perform context-sensitive searching to generate new documents as combinations of existing docu

ments Programming

to automatically generate documentation, similar to the Javedoc tool

Hybrid to retrieve information from hybrid data, such as patient recor

ds

Page 7: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

7

XQuery Design Requirements XML syntax and human-readable Declarative Namespace aware Coordinate with XML Schema Support simple and complex data types Can combine multiple documents Can transform and create XML trees

Page 8: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

8

What is XQuery? XQuery is the language for querying XML data XQuery 1.0 is a strict superset of XPath 2.0

XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators

XQuery has the extra expressive power to join information from different sources and generate new XML fragments

XQuery and XSLT are both domain-specific languages for combining and transforming XML data from multiple sources XQuery is designed from scratch, XSLT is an intellectual desce

ndant of CSS Technically, they may emulate each other

XQuery is defined by the W3C XQuery is supported by all the major database engines (IBM, O

racle, Microsoft, etc.)

Page 9: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

9

XQuery Prolog XQuery expressions are evaluated relatively to a c

ontext, which is explicitly provided by a prolog Declarations specify various parameters for the X

Query processor, such as:xquery version "1.0";

declare xmlspace preserve;

declare xmlspace strip;

declare default element namespace URI;

declare default function namespace URI;

import schema at URI;

declare namespace NCName = URI;

Page 10: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

10

Implicit Declarations Declarations implicitly defined in any XQuery impl

ementation: declare namespace xml =

"http://www.w3.org/XML/1998/namespace"; declare namespace xs =

"http://www.w3.org/2001/XMLSchema"; declare namespace xsi =

"http://www.w3.org/2001/XMLSchema-instance"; declare namespace fn =

"http://www.w3.org/2005/11/xpath-functions"; declare namespace xdt =

"http://www.w3.org/2005/11/xpath-datatypes"; declare namespace local =

"http://www.w3.org/2005/11/xquery-local-functions";

Page 11: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

11

XPath vs. XQuery XPath expressions are also XQuery expressions XPath expressions are required to be evaluated in

a static context, which must be provided by the invoking application

The XQuery prolog gives the required static context

The initial context node, position, and size are undefined because an XQuery expression may work on multiple XML documents

Only axes child, descendant, parent, attribute, self, and descendant-of-self are required to be implemented in XQuery.

Page 12: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

12

Datatype Expressions Same atomic values as XPath 2.0 Also lots of primitive simple values:

xs:string("XML is fun")xs:boolean("true")xs:decimal("3.1415")xs:float("6.02214199E23")xs:dateTime("1999-05-31T13:20:00-05:00")xs:time("13:20:00-05:00")xs:date("1999-05-31")xs:gYearMonth("1999-05")xs:gYear("1999")xs:hexBinary("48656c6c6f0a")xs:base64Binary("SGVsbG8K")xs:anyURI("http://www.brics.dk/ixwt/")xs:QName("rcp:recipe")

Page 13: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

13

XML Expressions

XQuery expressions may compute new XML nodes

Expressions may denote element, character data, comment, and processing instruction nodes

Each node is created with a unique node identity

Constructors may be either direct or computed

Page 14: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

14

Direct Constructors

Uses the standard XML syntax The expression

<foo><bar/>baz</foo>evaluates to the given XML fragment

Note that nodes are created with unique identity and therefore <foo/> is <foo/> evaluates to false (operators is, <<, >> are used to compare nodes on identity and document order)

Page 15: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

15

Same Namespace Declarations (1/3)

declare default element namespace "http://businesscard.org";

<card>

<name>John Doe</name>

<title>CEO, Widget Inc.</title>

<email>[email protected]</email>

<phone>(202) 555-1414</phone>

<logo uri="widget.gif"/>

</card>

Page 16: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

16

Same Namespace Declarations (2/3)

declare namespace b = "http://businesscard.org";

<b:card>

<b:name>John Doe</b:name>

<b:title>CEO, Widget Inc.</b:title>

<b:email>[email protected]</b:email>

<b:phone>(202) 555-1414</b:phone>

<b:logo uri="widget.gif"/>

</b:card>

Page 17: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

17

Same Namespace Declarations (3/3)

<card xmlns="http://businesscard.org">

<name>John Doe</name>

<title>CEO, Widget Inc.</title>

<email>[email protected]</email>

<phone>(202) 555-1414</phone>

<logo uri="widget.gif"/>

</card>

Page 18: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

18

Enclosed Expressions Expressions create an element named numbers w

hich has a single character data node with value 1 2 3 4 5 (if boundary-space is set to strip)< numbers >1 2 3 4 5</ numbers >< numbers >{1, 2, 3, 4, 5}</ numbers >< numbers >{1, "2", 3, 4, 5}</ numbers >< numbers >{1 to 5}</ numbers >< numbers >1 {1+1} {" "} {"3"} {" "} {4 to 5}</ numbers >

Enclosed expressions are allowed inside attribute values<foo bar="1 2 3 4 5"/><foo bar="{1, 2, 3, 4, 5}"/><foo bar="1 {2 to 4} 5"/>

Page 19: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

19

Explicit Constructors The constant expression

<card xmlns="http://businesscard.org"><name>John Doe</name><title>CEO, Widget Inc.</title><email>[email protected]</email><phone>(202) 555-1414</phone><logo uri="widget.gif"/>

</card> Can be written as:

element card {namespace { "http://businesscard.org" },element name { text { "John Doe" } },element title { text { "CEO, Widget Inc." } } ,element email { text { "[email protected]" } },element phone { text { "(202) 555-1414" } },element logo { attribute uri { "widget.gif" } }

}

Page 20: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

20

Computed QNames Qualified names can be replaced by enclosed expressions eva

luating to equivalent strings:element { "card" } {

namespace { "http://businesscard.org" },element { "name" } { text { "John Doe" } },element { "title" } { text { "CEO, Widget Inc." } },element { "email" } { text { "[email protected]" } },element { "phone" } { text { "(202) 555-1414" } },element { "logo" } {

attribute { "uri" } { "widget.gif" }}

}

Page 21: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

21

Biliingual Business Cards Controlled by a global variable $lang:

element { if ($lang="Danish") then "kort" else "card" } {namespace { "http://businesscard.org" },element { if ($lang="Danish") then "navn" else "name" }

{ text { "John Doe" } },element { if ($lang="Danish") then "titel" else "title" }

{ text { "CEO, Widget Inc." } },element { "email" }

{ text { "[email protected]" } },element { if ($lang="Danish") then "telefon" else "phone"}

{ text { "(202) 456-1414" } },element { "logo" } {

attribute { "uri" } { "widget.gif" }}

}

Page 22: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

22

FLWOR Expressions Used for general queries:

<doubles>{ for $s in fn:doc("students.xml")//student let $m := $s/major where fn:count($m) ge 2 order by $s/@id return <double>

{ $s/name/text() } </double>

}</doubles>

Page 23: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

23

The Difference Between For and Let (1/4)

A FLWOR expressionfor $x in (1, 2, 3, 4)

let $y := ("a", "b", "c")

return ($x, $y) Output

1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c

Page 24: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

24

The Difference Between For and Let (2/4)

A FLWOR expressionlet $x in (1, 2, 3, 4)

for $y := ("a", "b", "c")

return ($x, $y) Output

1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c

Page 25: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

25

The Difference Between For and Let (3/4)

A FLWOR expressionfor $x in (1, 2, 3, 4)

for $y in ("a", "b", "c")

return ($x, $y) Output

1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c, 4, a, 4, b, 4, c

Page 26: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

26

The Difference Between For and Let (4/4)

A FLWOR expressionlet $x := (1, 2, 3, 4)

let $y := ("a", "b", "c")

return ($x, $y) Output

1, 2, 3, 4, a, b, c

Page 27: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

27

An XML Example Document: books.xml<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book category="COOKING">

<title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price>

</book><book category="CHILDREN">

<title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price>

</book>

Page 28: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

28

An XML Example Document: books.xml (cont.)<book category="WEB">

<title lang="en">XQuery Kick Start</title> <author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price>

</book><book category="WEB">

<title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price>

</book></bookstore>

Page 29: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

29

Path Expressions The following path expression is used to select all

the title elements in the "books.xml" file:doc("books.xml")/bookstore/book/title

The XQuery above will extract the following: <title lang="en">Everyday Italian</title>

<title lang="en">Harry Potter</title>

<title lang="en">XQuery Kick Start</title>

<title lang="en">Learning XML</title>

Page 30: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

30

Predicates The following predicate is used to select all the bo

ok elements under the bookstore element that have a price element with a value that is less than 30: doc("books.xml")/bookstore/book[price<30]

The XQuery above will extract the following: <book category="CHILDREN">

<title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price>

</book>

Page 31: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

31

FLOWR Expressions

The following FLWOR expression will select all the title elements under the book elements that are under the bookstore element that have a price element with a value that is higher than 30 for $x in doc("books.xml")/bookstore/book

where $x/price>30

return $x/title The result will be:

<title lang="en">XQuery Kick Start</title>

<title lang="en">Learning XML</title>

Page 32: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

32

Conditional Expressions Notes on the "if-then-else" syntax: parentheses around the

if expression are required. else is required, but it can be just else ().for $x in doc("books.xml")/bookstore/book return if ($x/@category="CHILDREN")

then <child>{data($x/title)}</child> else <adult>{data($x/title)}</adult>

The result of the example above will be:<adult>Everyday Italian</adult> <child>Harry Potter</child> <adult>Learning XML</adult> <adult>XQuery Kick Start</adult>

Page 33: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

33

Comparisons General comparisons: =, !=, <, <=, >, >= Value comparisons: eq, ne, lt, le, gt, ge

$bookstore//book/@q > 10 The expression above returns true if any q attribu

tes have values greater than 10.$bookstore//book/@q gt 10

The expression above returns true if there is only one q attribute returned by the expression, and its value is greater than 10. If more than one q is returned, an error occurs.

Page 34: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

34

Selecting and Filtering Elements The at keyword can be used to count the iteration:

for $x at $i in doc("books.xml")/bookstore/book/title return <book>{$i}. {data($x)}</book>

Result: <book>1. Everyday Italian</book> <book>2. Harry Potter</book> <book>3. XQuery Kick Start</book> <book>4. Learning XML</book>

It is also allowed with more than one in expression in the for clause. Use comma to separate each in expression: for $x in (10,20), $y in (100,200) return <test>x={$x} and y={$y}</test>

Result:<test>x=10 and y=100</test> <test>x=10 and y=200</test> <test>x=20 and y=100</test> <test>x=20 and y=200</test>

Page 35: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

35

Computing Joins fridge.xml:

<fridge><stuff>eggs</stuff><stuff>olive oil</stuff><stuff>ketchup</stuff><stuff>unrecognizable moldy thing</stuff>

</fridge> Find recipes that use some ingredients in the refrigerator

declare namespace rcp = "http://www.brics.dk/ixwt/recipes";for $r in fn:doc("recipes.xml")//rcp:recipefor $i in $r//rcp:ingredient/@namefor $s in fn:doc("fridge.xml")//stuff[text()=$i]return fn:distinct-values($r/rcp:title/text() )

Page 36: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

36

Inverting a Relationdeclare namespace rcp = "http://www.brics.dk/ixwt/recipes";<ingredients>

{ for $i in distinct-values( fn:doc("recipes.xml")//rcp:ingredient/@name

) return <ingredient name="{$i}">

{ for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] return <title>$r/rcp:title/text()</title>}</ingredient>

}</ingredients>

Page 37: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

37

Sorting the Resultsdeclare namespace rcp = "http://www.brics.dk/ixwt/recipes";<ingredients>

{ for $i in distinct-values(fn:doc("recipes.xml")//rcp:ingredient/@name) order by $i return <ingredient name="{$i}">

{ for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] order by $r/rcp:title/text() return <title>$r/rcp:title/text()</title>}</ingredient>

}</ingredients>

Page 38: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

38

A More Complicated Sorting

for $s in document("students.xml")//student

order by

fn:count(

$s/results/result[fn:contains(@grade,"A")]

) descending,

fn:count($s/major) descending,

xs:integer($s/age/text()) ascending

return $s/name/text()

Page 39: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

39

Using Functionsdeclare function local:grade($g) {

if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0

};

declare function local:gpa($s) {fn:avg(for $g in $s/results/result/@grade return local:grade($g))

};

<gpas>{ for $s in fn:doc("students.xml")//student return <gpa id="{$s/@id}" gpa="{local:gpa($s)}"/> }

</gpas>

Page 40: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

40

Extend the Expressive Power Using recursive function to generate an XML tree of a g

iven height:declare function gen($n) {

if ($n eq 0) <bar/>else <foo>{ gen($n - 1), gen($n - 1) } </foo>

}; Compute the height of an XML tree:

declare function local:height($x) {if (fn:empty($x/*)) then 1else fn:max(for $y in $x/* return local:height($y))+1

};

Page 41: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

41

A Textual OutlineCailles en Sarcophages

pastrychilled unsalted butterfloursaltice water

fillingbaked chicken

marinated chickensmall chickens, cut upHerbes de Provencedry white wineorange juiceminced garlictruffle oil

...

Page 42: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

42

Computing Textual Outlinesdeclare namespace rcp = "http://www.brics.dk/ixwt/recipes";declare function local:ingredients($i,$p) {

fn:string-join(for $j in $i/rcp:ingredientreturn fn:string-join(($p,$j/@name,"&#x0A",

local:ingredients($j,fn:concat($p," "))),""),"")};

declare function local:recipes($r) {fn:concat($r/rcp:title/text(),"&#x0A",local:ingredients($r," "))

};

fn:string-join(for $r in fn:doc("recipes.xml")//rcp:recipe[5]return local:recipes($r),""

)

Page 43: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

43

XML Databases

How can XML and databases be merged? Several different approaches:

extract XML views of relations use SQL to generate XML shred XML into relational databases

Page 44: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

44

Automatic XML Views (1/2)

<Students><record id="100026" name="Joe Average" age="21"/><record id="100078" name="Jack Doe" age="18"/>

</Students>

xmlelement(name, "Students",select xmlelement(name,

"record",xmlattributes(s.id, s.name, s.age))

from Students)

Page 45: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

45

Automatic XML Views (2/2)<Students>

<record><id>100026</id><name>Joe

Average</name><age>21</age>

</record><record>

<id>100078</id><name>Jack Doe</name><age>18</age>

</record></Students>

xmlelement(name, "Students",

select xmlelement(name,

"record",

xmlforest(s.id, s.name, s.age))

from Students

)

Page 46: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

46

Storing XML Documents

Designing a specialized system for storing native XML data

Using a DBMS to store the whole XML documents as text fields

Using a DBMS to store the document contents as data elements It must support the XML’s ordered data

model

Page 47: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

47

Using a DBMS: Relational DTD

Schema-aware An element that can occur at most once in its

parent is stored as a column of the table representing its parent

ParentID ID TEXT

ParentID ID title year 3 5 “S. Abiteboul”

2 3 “Data on The Web” “2000” 3 6 “P. Buneman”

2 4 … … 3 7 “D. Suciu”

The book table The author table

references

book book

author author author title year

2000Data on the Web

S. Abiteboul

P. Buneman

D. Suciu

author…

Page 48: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

48

Using a DBMS: Edge Schema-less A single table is used to store the entire document Each node is assigned an ID in depth first order

references

book book

author author author title year

2000Data on the Web

S. Abiteboul

P. Buneman

D. Suciu

author …

1

2

3 4

root

Page 49: 1 Querying XML Documents. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

49

Using a DBMS: Edge (cont.)

SourceID tag ordinal TargetID Data

1 reference 1 2 NULL

2 book 1 3 NULL

2 book 2 4 NULL

3 author 1 0 “S. Abiteboul”

3 author 2 0 “P. Buneman”

3 author 3 0 “D. Suciu”

3 title 4 0 “Data on The Web”

3 year 5 0 “2000”

The edge table