lecture 13: xquery xml publishing, xml storage

38
1 Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002

Upload: natala

Post on 06-Jan-2016

64 views

Category:

Documents


4 download

DESCRIPTION

Lecture 13: XQuery XML Publishing, XML Storage. Monday, October 28, 2002. Organization. Project: Next phase, need to form companies Please form group of 3, email  Tessa by Thursday Problems, little extra-credit: One group will have only two people - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Lecture 13: XQuery XML Publishing, XML Storage

1

Lecture 13: XQueryXML Publishing, XML Storage

Monday, October 28, 2002

Page 2: Lecture 13: XQuery XML Publishing, XML Storage

2

OrganizationProject:• Next phase, need to form companies• Please form group of 3, email Tessa by Thursday• Problems, little extra-credit:

– One group will have only two people– One billing, one shipping volunteers to do inventory

Homework: • Good practice for the midterm• Try to finish before Monday

Page 3: Lecture 13: XQuery XML Publishing, XML Storage

3

Organization

Midterm

• Next Monday, 11/3

• Missed it ? You will get this score:– MidtermScore = 100 – 1.2(100 – FinalScore)– In other words, you will loose 20% more points

than on the final

Page 4: Lecture 13: XQuery XML Publishing, XML Storage

4

Overview

• DTDs: elements and attributes

• XQuery

Page 5: Lecture 13: XQuery XML Publishing, XML Storage

5

Very Simple DTD

<!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)>]>

<!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)>]>

Page 6: Lecture 13: XQuery XML Publishing, XML Storage

6

Very Simple DTD

<company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> </person> <product> ... </product> ...</company>

<company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> </person> <product> ... </product> ...</company>

Example of valid XML document:

Page 7: Lecture 13: XQuery XML Publishing, XML Storage

7

Content Model

• Element content: what we can put in an element (aka content model)

• Content model:– Complex = a regular expression over other elements

– Text-only = #PCDATA

– Empty = EMPTY

– Any = ANY

– Mixed content = (#PCDATA | A | B | C)*• (i.e. very restrictied)

Page 8: Lecture 13: XQuery XML Publishing, XML Storage

8

Attributes in DTDs

<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED>

<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED>

<person age=“25”> <name> ....</name> ...</person>

<person age=“25”> <name> ....</name> ...</person>

Page 9: Lecture 13: XQuery XML Publishing, XML Storage

9

Attributes in DTDs

<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED

id ID #REQUIRED

manager IDREF #REQUIRED

manages IDREFS #REQUIRED>

<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED

id ID #REQUIRED

manager IDREF #REQUIRED

manages IDREFS #REQUIRED>

<person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ...</person>

<person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ...</person>

Page 10: Lecture 13: XQuery XML Publishing, XML Storage

10

Attributes in DTDs

Types:

• CDATA = string

• ID = key

• IDREF = foreign key

• IDREFS = foreign keys separated by space

• (Monday | Wednesday | Friday) = enumeration

• NMTOKEN = must be a valid XML name

• NMTOKENS = multiple valid XML names

• ENTITY = you don’t want to know this

Page 11: Lecture 13: XQuery XML Publishing, XML Storage

11

Attributes in DTDs

Kind:• #REQUIRED• #IMPLIED = optional• value = default value• value #FIXED = the only value allowed

Page 12: Lecture 13: XQuery XML Publishing, XML Storage

12

Using DTDs

• Must include in the XML document• Either include the entire DTD:

– <!DOCTYPE rootElement [ ....... ]>

• Or include a reference to it:– <!DOCTYPE rootElement SYSTEM

“http://www.mydtd.org”>

• Or mix the two... (e.g. to override the external definition)

Page 13: Lecture 13: XQuery XML Publishing, XML Storage

13

FLWR (“Flower”) Expressions

FOR ...

LET...

WHERE...

RETURN...

FOR ...

LET...

WHERE...

RETURN...

Page 14: Lecture 13: XQuery XML Publishing, XML Storage

14

<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>

</bib>

<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>

</bib>

Sample Data for Queries (more or less)

Page 15: Lecture 13: XQuery XML Publishing, XML Storage

15

FOR-WHERE-RETURN

Find all book titles published after 1995:

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year/text() > 1995

RETURN $x/title

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year/text() > 1995

RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>

Page 16: Lecture 13: XQuery XML Publishing, XML Storage

16

FOR-WHERE-RETURN

Equivalently (perhaps more geekish)

FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title

RETURN $x

FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title

RETURN $x

And even shorter:

document("bib.xml")/bib/book[year/text() > 1995] /title document("bib.xml")/bib/book[year/text() > 1995] /title

Page 17: Lecture 13: XQuery XML Publishing, XML Storage

17

FOR-WHERE-RETURN

• Find all book titles and the year when they were published:

FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>

FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>

Page 18: Lecture 13: XQuery XML Publishing, XML Storage

18

FOR-WHERE-RETURN

• Notice the use of “{“ and “}”

• What is the result without them ?

FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

Page 19: Lecture 13: XQuery XML Publishing, XML Storage

19

XQuery: NestingFor each author of a book by Morgan

Kaufmann, list all books she published:

FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/authorRETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result>

FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/authorRETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result>

In the RETURN clause comma concatenates XML fragments

Page 20: Lecture 13: XQuery XML Publishing, XML Storage

20

XQuery

<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

Result:

Page 21: Lecture 13: XQuery XML Publishing, XML Storage

21

Aggregates

Find all books with more than 3 authors:

count = a function that countsavg = computes the averagesum = computes the sumdistinct-values = eliminates duplicates

FOR $x IN document("bib.xml")/bib/bookWHERE count($x/author)>3 RETURN $x

FOR $x IN document("bib.xml")/bib/bookWHERE count($x/author)>3 RETURN $x

Page 22: Lecture 13: XQuery XML Publishing, XML Storage

22

Aggregates

Same thing:

FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x

FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x

Page 23: Lecture 13: XQuery XML Publishing, XML Storage

23

Aggregates

Print all authors who published more than 3 books – be aware of duplicates !

FOR $b IN document("bib.xml")/bib, $a IN distinct-values($b/book/author/text())WHERE count($b/book[author/text()=$a)>3 RETURN <author> { $a } </author>

FOR $b IN document("bib.xml")/bib, $a IN distinct-values($b/book/author/text())WHERE count($b/book[author/text()=$a)>3 RETURN <author> { $a } </author>

Page 24: Lecture 13: XQuery XML Publishing, XML Storage

24

XQuery

Find books whose price is larger than average:

FOR $b in document(“bib.xml”)/bibLET $a:=avg($b/book/price/text())FOR $x in $b/bookWHERE $x/price/text() > $aRETURN $x

FOR $b in document(“bib.xml”)/bibLET $a:=avg($b/book/price/text())FOR $x in $b/bookWHERE $x/price/text() > $aRETURN $x

Page 25: Lecture 13: XQuery XML Publishing, XML Storage

25

FOR-WHERE-RETURN

• “Flatten” the authors, i.e. return a list of (author, title) pairs

FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer>

FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer>

Result:<answer> <title> abc </title> <author> efg </author></answer><answer> <title> abc </title> <author> hkj </author></answer>

Page 26: Lecture 13: XQuery XML Publishing, XML Storage

26

FOR-WHERE-RETURN

• For each author, return all titles of her/his books

FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

What aboutduplicateauthors ?

Result:<answer> <author> efg </author> <title> abc </title> <title> klm </title> . . . .</answer>

Page 27: Lecture 13: XQuery XML Publishing, XML Storage

27

FOR-WHERE-RETURN

• Same, but eliminate duplicate authors:

FOR $b IN document("bib.xml")/bibLET $a := distinct-values($b/book/author/text())FOR $x IN $aRETURN <answer> <author> $x </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

FOR $b IN document("bib.xml")/bibLET $a := distinct-values($b/book/author/text())FOR $x IN $aRETURN <answer> <author> $x </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

Page 28: Lecture 13: XQuery XML Publishing, XML Storage

28

FOR-WHERE-RETURN

• Same thing:

FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text())RETURN <answer> <author> $x </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text())RETURN <answer> <author> $x </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

Page 29: Lecture 13: XQuery XML Publishing, XML Storage

29

FOR-WHERE-RETURN

Find book titles by the coauthors of “Database Theory”:

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”], $y IN $b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer>

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”], $y IN $b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer>

Result: <answer> abc </ answer > < answer > def </ answer > < answer > abc </ answer > < answer > ghk </ answer >

Question:Why do we get duplicates ?

Page 30: Lecture 13: XQuery XML Publishing, XML Storage

30

Distinct-values

Same as before, but eliminate duplicates:

Result: <answer> abc </ answer > < answer > def </ answer > < answer > ghk </ answer >

distinct-values = a function that eliminates duplicates

Need to apply to a collectionof text values, not of elements – note how query has changed

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”]/author/text(), $y IN distinct-values($b/book[author/text() = $x] /title/text())

RETURN <answer> { $y } </answer>

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”]/author/text(), $y IN distinct-values($b/book[author/text() = $x] /title/text())

RETURN <answer> { $y } </answer>

Page 31: Lecture 13: XQuery XML Publishing, XML Storage

31

SQL and XQuery Side-by-sideProduct(pid, name, maker)Company(cid, name, city)

Find all products made in Seattle

SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle”

SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle”

FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/rowWHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name }

FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/rowWHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name }

SQL XQuery

FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name }

FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name }

CoolXQuery

Page 32: Lecture 13: XQuery XML Publishing, XML Storage

32

<db> <product> <row> <pid> ??? </pid> <name> ??? </name> <maker> ??? </maker> </row> <row> …. </row> … </product> . . . .</db>

Page 33: Lecture 13: XQuery XML Publishing, XML Storage

33

XQuery

• FOR $x in expr -- binds $x to each value in the list expr

• LET $x := expr -- binds $x to the entire list expr– Useful for common subexpressions and for

aggregations

Page 34: Lecture 13: XQuery XML Publishing, XML Storage

34

XQuery

$b is a collection of elements, not a single elementcount = a (aggregate) function that returns the number of elms

<big_publishers> { FOR $p IN distinct-values(//publisher/text()) LET $b := /db/book[publisher/text() = $p] WHERE count($b) > 100 RETURN <publisher> { $p } </publisher>}</big_publishers>

<big_publishers> { FOR $p IN distinct-values(//publisher/text()) LET $b := /db/book[publisher/text() = $p] WHERE count($b) > 100 RETURN <publisher> { $p } </publisher>}</big_publishers>

Find all publishers that published more than 100 books:

Page 35: Lecture 13: XQuery XML Publishing, XML Storage

35

XQuery

Summary:

• FOR-LET-WHERE-RETURN = FLWR

FOR/LET Clauses

WHERE Clause

RETURN Clause

List of tuples

List of tuples

Instance of Xquery data model

Page 36: Lecture 13: XQuery XML Publishing, XML Storage

36

FOR v.s. LET

FOR

• Binds node variables iteration

LET

• Binds collection variables one value

Page 37: Lecture 13: XQuery XML Publishing, XML Storage

37

FOR v.s. LET

FOR $x IN /bib/bookRETURN <result> { $x } </result>

FOR $x IN /bib/bookRETURN <result> { $x } </result>

Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...

LET $x := /bib/bookRETURN <result> { $x } </result>

LET $x := /bib/bookRETURN <result> { $x } </result>

Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result>

Page 38: Lecture 13: XQuery XML Publishing, XML Storage

38

Collections in XQuery

• Ordered and unordered collections– /bib/book/author/text() = an ordered collection: result is

in document order

– distinct-values(/bib/book/author/text()) = an unordered collection: the output order is implementation dependent

• LET $a := /bib/book $a is a collection• $b/author a collection (several authors...)

RETURN <result> { $b/author } </result>RETURN <result> { $b/author } </result>Returns: <result> <author>...</author> <author>...</author> <author>...</author> ...</result>