XML Query
Introduction
• An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever they have stored in XML.
• XQuery is based on the structure of XML and leverages this structure to provide query capabilities for the same range of data that XML stores.
• XQuery is defined in terms of the XQuery 1.0 and XPath 2.0 Data Model [XQ-DM], which represents the parsed structure of an XML document as an ordered, labeled tree in which nodes have identity and may be associated with simple or complex types.
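• XQuery is a functional language: every query is an expression that evaluates to a value, and expressions can be nested inside one another.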
XML vs. Relational Data
{ row: { name: “John”, phone: 3634 }, row: { name: “Sue”, phone: 6343 }, row: { name: “Dick”, phone: 6363 }}
name phone
John 3634
Sue 6343
Dick 6363
[Figure: Relation … in XML. The same relation drawn as a tree: three row nodes, each with a name child ("John", "Sue", "Dick") and a phone child (3634, 6343, 6363).]
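As a sketch of the tree view in the figure, the following Python snippet uses only the standard `xml.etree.ElementTree` module to encode the same three rows as a two-level XML tree. The root tag `table` is a hypothetical name; the slide does not give one.

```python
import xml.etree.ElementTree as ET

# The relation from the slide as (name, phone) tuples.
ROWS = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def relation_to_xml(rows, root_tag="table"):
    """Encode a relation as a tree: one <row> per tuple (unbounded fanout)
    and one child element per field (fixed fanout)."""
    root = ET.Element(root_tag)  # "table" is a hypothetical root name
    for name, phone in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "phone").text = str(phone)
    return root


doc = relation_to_xml(ROWS)
print(ET.tostring(doc, encoding="unicode"))
```

Every row gets exactly the same two children, which is what makes this tree "relational".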
Relational to XML Data
• A relation instance is basically a tree with:
– Unbounded fanout at level 1 (i.e., any number of rows)
– Fixed fanout at level 2 (i.e., a fixed number of fields)
• XML data is essentially an arbitrary tree:
– Unbounded fanout at all nodes/levels
– Any number of levels
– Variable number of children at different nodes, variable path lengths
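To make the fanout contrast concrete, here is a small Python sketch that records, for each depth, the distinct child counts of the elements at that depth. The `bib` sample is hypothetical data loosely modeled on the later Bib example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def fanout_by_depth(root):
    """Map each depth to the sorted child counts of the elements at that
    depth that have element children (a breadth-first walk)."""
    counts = defaultdict(set)
    level, depth = [root], 0
    while level:
        next_level = []
        for elem in level:
            children = list(elem)
            if children:
                counts[depth].add(len(children))
            next_level.extend(children)
        level, depth = next_level, depth + 1
    return {d: sorted(c) for d, c in counts.items()}


relation = ET.fromstring(
    "<table>"
    "<row><name>John</name><phone>3634</phone></row>"
    "<row><name>Sue</name><phone>6343</phone></row>"
    "<row><name>Dick</name><phone>6363</phone></row>"
    "</table>")

# Hypothetical semistructured data: papers and books differ in shape.
bib = ET.fromstring(
    "<bib>"
    "<paper><author><first>Serge</first><last>Abiteboul</last></author>"
    "<author><first>Victor</first><last>Vianu</last></author><year>1997</year></paper>"
    "<book><title>abc</title><publisher>Morgan Kaufmann</publisher></book>"
    "</bib>")

print(fanout_by_depth(relation))  # {0: [3], 1: [2]}
print(fanout_by_depth(bib))       # {0: [2], 1: [2, 3], 2: [2]}
```

The relation has a single child count below the root, while the XML tree already mixes counts at depth 1 and has an extra level.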
Query Language for XML
• Must be high-level: "SQL for XML"
• Must conform to XML Schema
– But also work in the absence of schema information
• Support simple and complex/nested datatypes
• Support universal and existential quantifiers, aggregation
• Operations on sequences and hierarchies of document structures
• Capability to transform and create XML structures
XQuery
• Influenced by XML-QL, Lorel, Quilt, YATL
– Also by XPath and XML Schema
• Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values
– Inputs/outputs are instances of the XQuery data model, rather than strings in XML syntax
Overview of XQuery
• Path expressions
• Element constructors
• FLWOR ("flower") expressions
– Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.
• Expressions are evaluated with respect to a context:
– Context item (current node)
– Context position (in the sequence being processed)
– Context size (of the sequence being processed)
– Context also includes namespaces, variables, functions, date, etc.
Path Expressions
Examples:
• Bib/paper
• Bib/book/publisher
• Bib/paper/author/lastname
Given an XML document, the value of a path expression p is a sequence of nodes, returned in document order.
Path Expression Examples
[Figure: the document Doc drawn as a graph of object IDs. The root &o1 is a Bib node with children paper (&o12), book (&o24), and paper (&o29); deeper nodes carry labels such as author, title, year, publisher, references, page, firstname, and lastname, and leaf values such as "Serge", "Abiteboul", "Victor", "Vianu", 1997, 122, and 133.]
Bib/paper = <&o12, &o29>
Bib/book/publisher = <&o51>
Bib/paper/author/lastname = <&o71, &206>
Note that order of elements matters!
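A rough Python analogue of these path expressions, on a much smaller hypothetical stand-in for Doc (the `id` attributes only echo the object IDs in the figure). `ElementTree.findall` evaluates child-only paths and returns its matches as a list in document order, which is why the results above are written as ordered sequences.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the Bib document.
BIB = """<Bib>
  <paper id="o12"><author><lastname>Abiteboul</lastname></author></paper>
  <book id="o24"><publisher>P</publisher></book>
  <paper id="o29"><author><lastname>Vianu</lastname></author></paper>
</Bib>"""


def evaluate(root, path):
    """Evaluate a child-only path such as 'Bib/paper' starting at the
    root element; the matches come back in document order."""
    first, *rest = path.split("/")
    if root.tag != first:
        return []
    return root.findall("/".join(rest)) if rest else [root]


root = ET.fromstring(BIB)
print([p.get("id") for p in evaluate(root, "Bib/paper")])          # ['o12', 'o29']
print([n.text for n in evaluate(root, "Bib/paper/author/lastname")])  # ['Abiteboul', 'Vianu']
```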
Element Construction
• An XQuery expression can construct new values or structures
• Example: consider the path expressions from the previous slide.
– Each of them returns a newly constructed sequence of (existing) elements
– Key point: we don't just return existing structures or atomic values; we can rearrange them as we wish into new structures
Data Model
• In the XQuery data model, every document is represented as a tree of nodes. The kinds of nodes that may occur are: document, element, attribute, text, namespace, processing instruction, and comment.
• An item is a single node or atomic value. An ordered series of items is known as a sequence. In XQuery, every value is a sequence; a single item is equivalent to a sequence containing just that item.
Literals and comments
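• Comments are delimited by (: and :), for example: (: Hello World :)
• String literals are written in single or double quotes, e.g. "books.xml"; numeric literals are written as usual, e.g. 1995 or 14.95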
doc() function
• doc("books.xml") returns the entire document (its document node)
Locating nodes
• A path expression consists of a series of one or more steps, separated by a slash, /, or double slash, //.
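• doc("books.xml")/bib/book selects the book elements that are children of the top-level bib element
• doc("books.xml")//book selects book elements at any depth in the document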
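The difference between / and // can be sketched with Python's `ElementTree`, which spells the descendant step as `.//`. The `books.xml` content below is hypothetical, with one book nested under an invented `shelf` element so that the two steps give different answers.

```python
import xml.etree.ElementTree as ET

# Hypothetical books.xml content: one <book> directly under <bib>
# and one nested a level deeper, under a <shelf> element.
BOOKS = """<bib>
  <book year="1994"><title>abc</title></book>
  <shelf><book year="2000"><title>def</title></book></shelf>
</bib>"""

bib = ET.fromstring(BOOKS)  # plays the role of doc("books.xml")/bib

# doc("books.xml")/bib/book: only <book> children of <bib>
children = bib.findall("book")

# doc("books.xml")//book: <book> elements at any depth
anywhere = bib.findall(".//book")

print([b.get("year") for b in children])  # ['1994']
print([b.get("year") for b in anywhere])  # ['1994', '2000']
```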
Predicates
• Predicates are Boolean conditions that select a subset of the nodes computed by a step expression.
• XQuery uses square brackets around predicates.
• For instance, the following query returns only the authors for which last="Stevens" is true:
doc("books.xml")/bib/book/author[last="Stevens"]
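• If a predicate contains a single numeric value, it is treated like a subscript. For instance, the following expression returns the first author of each book:
doc("books.xml")/bib/book/author[1]
• With parentheses, the subscript applies to the whole sequence, so the following expression returns only the first author in the document:
(doc("books.xml")/bib/book/author)[1]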
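Python's `ElementTree` supports a small XPath subset that happens to cover both kinds of predicate, so the three expressions above can be mirrored directly. The data is hypothetical except for the name Stevens.

```python
import xml.etree.ElementTree as ET

# Hypothetical books.xml content; only "Stevens" comes from the slide.
BOOKS = """<bib>
  <book><title>abc</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book><title>def</title>
    <author><last>Jones</last><first>A.</first></author>
    <author><last>Smith</last><first>B.</first></author></book>
</bib>"""

bib = ET.fromstring(BOOKS)

# /bib/book/author[last="Stevens"]: authors whose <last> child equals "Stevens"
stevens = bib.findall("book/author[last='Stevens']")

# /bib/book/author[1]: position counted among each book's own authors
first_per_book = bib.findall("book/author[1]")

# (/bib/book/author)[1]: index the whole sequence (Python lists are 0-based)
first_overall = bib.findall("book/author")[0]

print([a.findtext("last") for a in stevens])         # ['Stevens']
print([a.findtext("last") for a in first_per_book])  # ['Stevens', 'Jones']
print(first_overall.findtext("last"))                # Stevens
```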
Creating Nodes
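• A computed document constructor wraps newly constructed elements in a document node:
document {
  <book year="1977">
    <title>Harold and the Purple Crayon</title>
    <author><last>Johnson</last><first>Crockett</first></author>
    <publisher>HarperCollins Juvenile Books</publisher>
    <price>14.95</price>
  </book>
}
• Expressions in curly braces are evaluated; here they compute an attribute value and the element content:
<titles count="{ count(doc('books.xml')//title) }"> { doc("books.xml")//title } </titles>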
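A Python sketch of what the `<titles>` constructor produces, on hypothetical `books.xml` content. In XQuery an element constructor copies the nodes it encloses, leaving the originals in place; `copy.deepcopy` is used here to mimic that.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical books.xml content.
BOOKS = """<bib>
  <book><title>abc</title></book>
  <book><title>def</title></book>
  <book><title>ghi</title></book>
</bib>"""

bib = ET.fromstring(BOOKS)
found = bib.findall(".//title")  # doc("books.xml")//title

# count="{ count(...) }": the attribute value is computed
titles = ET.Element("titles", count=str(len(found)))
# { doc("books.xml")//title }: copies of the selected nodes become the content
titles.extend([copy.deepcopy(t) for t in found])

print(ET.tostring(titles, encoding="unicode"))
# <titles count="3"><title>abc</title><title>def</title><title>ghi</title></titles>
```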
FLWOR Expressions
• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
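[Figure: FLWOR evaluation pipeline. The FOR/LET clauses produce a list of tuples; the WHERE clause filters it into another list of tuples; the ORDER BY/RETURN clauses turn that list into an instance of the XQuery data model.]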
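The tuple pipeline can be imitated in Python with a list of dictionaries, one per tuple of variable bindings. The query traced in the comments (for $b in //book let $t := $b/title where $b/price > 50 order by $t return $t) and the data are hypothetical, chosen only to exercise every clause.

```python
import xml.etree.ElementTree as ET

# Hypothetical books.xml content.
BOOKS = """<bib>
  <book><title>ghi</title><price>65.95</price></book>
  <book><title>abc</title><price>39.95</price></book>
  <book><title>def</title><price>129.95</price></book>
</bib>"""

bib = ET.fromstring(BOOKS)

# for $b in //book: one tuple per binding of $b
tuples = [{"b": b} for b in bib.iter("book")]
# let $t := $b/title: each tuple gains a second binding
tuples = [{**t, "t": t["b"].find("title")} for t in tuples]
# where $b/price > 50: keep a sub-list of the tuples
tuples = [t for t in tuples if float(t["b"].findtext("price")) > 50]
# order by $t: reorder the tuples by title text
tuples.sort(key=lambda t: t["t"].text)
# return $t: one output item per remaining tuple, in tuple order
result = [t["t"].text for t in tuples]

print(result)  # ['def', 'ghi']
```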
FOR vs. LET
• for $x in list-expr
– Binds $x in turn to each item in the list expression
• let $x := list-expr
– Binds $x to the entire list expression
– Useful for common subexpressions and for aggregations
FOR vs. LET: Example
for $x in doc("bib.xml")/bib/book
return <result> { $x } </result>
Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...
Notice that there are several result elements.
let $x := doc("bib.xml")/bib/book
return <result> { $x } </result>
Returns: <result> <book>...</book> <book>...</book> <book>...</book> ...</result>
Notice that there is exactly one result element.
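The same contrast in a short Python sketch: a helper that plays the role of the `<result>` constructor is called once per book for `for`, but only once, on the whole sequence, for `let`. The three-book document is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content.
BIB = "<bib><book>b1</book><book>b2</book><book>b3</book></bib>"
books = ET.fromstring(BIB).findall("book")


def result(items):
    """Construct <result>{ items }</result>, copying the enclosed nodes."""
    elem = ET.Element("result")
    elem.extend([copy.deepcopy(i) for i in items])
    return elem


# for $x in ... return <result>{ $x }</result>: one constructor call per book
for_output = [result([x]) for x in books]

# let $x := ... return <result>{ $x }</result>: $x is the whole sequence
let_output = [result(books)]

print(len(for_output), len(let_output))  # 3 1
print(ET.tostring(let_output[0], encoding="unicode"))
# <result><book>b1</book><book>b2</book><book>b3</book></result>
```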
XQuery Example 1
Find all book titles published after 1995:
for $x in doc("bib.xml")/bib/book
where $x/year > 1995
return $x/title
Result: <title> abc </title> <title> def </title> <title> ghi </title>
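A Python sketch of Example 1 on hypothetical `bib.xml` content. It follows two XQuery details: untyped element text compared with a number is compared numerically, and a book with no `<year>` yields an empty sequence, which makes the comparison false.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content; the last book has no <year>.
BIB = """<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>xyz</title><year>1992</year></book>
  <book><title>def</title><year>2001</year></book>
  <book><title>ghi</title><year>1996</year></book>
  <book><title>jkl</title></book>
</bib>"""


def titles_after(bib, year):
    """for $x in /bib/book where $x/year > year return $x/title"""
    out = []
    for book in bib.findall("book"):
        y = book.findtext("year")
        # No <year>: empty sequence, so the comparison is false.
        if y is not None and float(y) > year:
            out.extend(book.findall("title"))
    return out


print([t.text for t in titles_after(ET.fromstring(BIB), 1995)])  # ['abc', 'def', 'ghi']
```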
XQuery Example 2
For each author of a book by Morgan Kaufmann, list all books she published:
for $a in distinct-values(doc("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
return <result>
  <author>{ $a }</author>
  {
    for $t in doc("bib.xml")/bib/book[author = $a]/title
    return $t
  }
</result>
distinct-values = a function that eliminates duplicates (after converting its inputs to atomic values)
Results for Example 2
<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>
Observe how nested structure of result elements is determined by the nested structure of the query.
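A Python sketch of Example 2 on hypothetical data chosen to reproduce the results above; the second publisher name is sample data only. The nested loop mirrors the nested FLWOR. XQuery leaves the order of `distinct-values` implementation-dependent; this sketch simply keeps first-occurrence order.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content.
BIB = """<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Addison-Wesley</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>"""


def books_by_authors_of(bib, publisher):
    # distinct-values(/bib/book[publisher = ...]/author), first-occurrence order
    # (names containing a single quote would break these simple predicates)
    authors = []
    for a in bib.findall(f"book[publisher='{publisher}']/author"):
        if a.text not in authors:
            authors.append(a.text)
    results = []
    for name in authors:
        res = ET.Element("result")
        ET.SubElement(res, "author").text = name
        # inner FLWOR: for $t in /bib/book[author = $a]/title return $t
        for t in bib.findall(f"book[author='{name}']/title"):
            res.append(copy.deepcopy(t))
        results.append(res)
    return results


for r in books_by_authors_of(ET.fromstring(BIB), "Morgan Kaufmann"):
    print(ET.tostring(r, encoding="unicode"))
```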
XQuery Example 3
count = (aggregate) function that returns the number of items in a sequence
<big_publishers>
{
  for $p in distinct-values(doc("bib.xml")//publisher)
  let $b := doc("bib.xml")/bib/book[publisher = $p]
  where count($b) > 100
  return <publisher>{ $p }</publisher>
}
</big_publishers>
For each publisher p:
– Let b be the list of books published by p
– Count the books in b, and return p if count(b) > 100
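A Python sketch of Example 3. The threshold is a parameter (100 as in the query) so that the tiny hypothetical dataset can use 1; each qualifying name is wrapped in a `<publisher>` element for readability.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content.
BIB = """<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>ghi</title><publisher>Addison-Wesley</publisher></book>
</bib>"""


def big_publishers(bib, threshold=100):
    """Return <big_publishers> listing publishers with more than
    `threshold` books."""
    names = []
    for p in bib.iter("publisher"):  # distinct-values(//publisher)
        if p.text not in names:
            names.append(p.text)
    out = ET.Element("big_publishers")
    for name in names:
        # let $b := /bib/book[publisher = $p]
        b = [book for book in bib.findall("book")
             if any(p.text == name for p in book.findall("publisher"))]
        if len(b) > threshold:  # where count($b) > threshold
            ET.SubElement(out, "publisher").text = name
    return out


print(ET.tostring(big_publishers(ET.fromstring(BIB), threshold=1), encoding="unicode"))
# <big_publishers><publisher>Morgan Kaufmann</publisher></big_publishers>
```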
XQuery Example 4
Find books whose price is larger than average:
let $a := avg(doc("bib.xml")/bib/book/price)
for $b in doc("bib.xml")/bib/book
where $b/price > $a
return $b
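A Python sketch of Example 4 on hypothetical prices. The `let` is evaluated once, before the `for`; the general comparison `$b/price > $a` is true when any price of the book exceeds the average; and `avg` of an empty sequence is empty, so no book qualifies in that case.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content.
BIB = """<bib>
  <book><title>abc</title><price>10.00</price></book>
  <book><title>def</title><price>20.00</price></book>
  <book><title>ghi</title><price>60.00</price></book>
</bib>"""


def above_average(bib):
    prices = [float(p.text) for p in bib.findall("book/price")]
    if not prices:  # avg(()) is empty, so the where clause is never true
        return []
    a = sum(prices) / len(prices)  # let $a := avg(/bib/book/price)
    # where $b/price > $a: true if any price of $b exceeds $a
    return [b for b in bib.findall("book")
            if any(float(p.text) > a for p in b.findall("price"))]


print([b.findtext("title") for b in above_average(ET.fromstring(BIB))])  # ['ghi']
```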
FLWOR Expressions
for $b in doc("books.xml")//book
where $b/@year = "2000"
return $b/title
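In Python's `ElementTree`, the attribute step `$b/@year` corresponds to `Element.get("year")`, which returns `None` when the attribute is absent. A sketch on hypothetical `books.xml` content:

```python
import xml.etree.ElementTree as ET

# Hypothetical books.xml content.
BOOKS = """<bib>
  <book year="1994"><title>abc</title></book>
  <book year="2000"><title>def</title></book>
  <book year="2000"><title>ghi</title></book>
</bib>"""

bib = ET.fromstring(BOOKS)

# for $b in //book where $b/@year = "2000" return $b/title
titles = [t.text
          for b in bib.iter("book") if b.get("year") == "2000"
          for t in b.findall("title")]

print(titles)  # ['def', 'ghi']
```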