
Fundamentals, Design, and Implementation, 9/e

Text and XML databases

Instructor: Dragomir R. Radev

Winter 2005


Chapter 9/2 Copyright © 2004

Database Processing: Fundamentals, Design, and Implementation, 9/e by David M. Kroenke

Types of databases

Textual databases
Semi-structured databases


Indexing textual data

Inverted files
Boolean queries
Signature files
Signature S1 matches signature S2 if S2 & S1 = S2
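The matching rule above can be sketched in a few lines of Python. This is a minimal illustration, not the slide author's implementation: the signature width (SIG_BITS), the number of bits set per word (BITS_PER_WORD), and the use of SHA-256 as the hash are all assumptions made for the example. Because different words can set the same bits, a match only marks a candidate document, which still has to be checked against the actual text.

```python
import hashlib

SIG_BITS = 64       # assumed signature width
BITS_PER_WORD = 3   # assumed number of bits set per word


def word_signature(word: str) -> int:
    """Hash a word to a bit pattern with at most BITS_PER_WORD bits set."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig


def text_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def matches(s1: int, s2: int) -> bool:
    """S1 matches S2 if every bit set in S2 is also set in S1."""
    return (s2 & s1) == s2


doc_sig = text_signature("inverted files and signature files index text")
query_sig = text_signature("signature files")
print(matches(doc_sig, query_sig))
```

Here S1 plays the role of the document signature and S2 the query signature; every query word also occurs in the document, so the query bits are necessarily a subset of the document bits.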


XML-QL

WHERE <BOOK> <NAME><LAST>$1</LAST></NAME> </BOOK> IN "www.booklist.com/books.xml"
CONSTRUCT <RESULT> $1 </RESULT>

Two slides from Johannes Gehrke, Cornell University

<IMG SRC="xysq.gif" ALT="(x+y)^2">

<apply> <power/> <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply> <cn>2</cn> </apply>


XML-QL (continued)

WHERE <BOOK> $b </BOOK> IN "www.booklist.com/books.xml",
      <AUTHOR> $n </AUTHOR>
      <PUBLISHED> $p </PUBLISHED> IN $b
CONSTRUCT <RESULT>
            <PUBLISHED> $p </PUBLISHED>
            WHERE <LAST> $l </LAST> IN $n
            CONSTRUCT <LAST> $l </LAST>
          </RESULT>


<!ELEMENT book (author+, title, publisher)>

<!ATTLIST book year CDATA>

<!ELEMENT article (author+, title, year?, (shortversion|longversion))>

<!ATTLIST article type CDATA>

<!ELEMENT publisher (name, address)>

<!ELEMENT author (firstname?, lastname)>

XML-QL (continued)


WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t</title>
        <author> $a</author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a

XML-QL (continued)


WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t</>
        <author> $a</>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT $a

XML-QL (continued)


WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t</>
        <author> $a</>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a</>
            <title> $t</>
          </>

XML-QL (continued)


<bib>

<book year="1995">

<!-- A good introductory text -->

<title> An Introduction to Database Systems </title>

<author> <lastname> Date </lastname> </author>

<publisher> <name> Addison-Wesley </name > </publisher>

</book>

<book year="1998">

<title> Foundation for Object/Relational Databases: The Third Manifesto </title>

<author> <lastname> Date </lastname> </author>

<author> <lastname> Darwen </lastname> </author>

<publisher> <name> Addison-Wesley </name > </publisher>

</book>

</bib>

XML-QL (continued)


<result> <author> <lastname> Date </lastname> </author> <title> An Introduction to Database Systems </title> </result>

<result> <author> <lastname> Date </lastname> </author> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> </result>

<result> <author> <lastname> Darwen </lastname> </author> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> </result>
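To make the shape of this result concrete, the pattern match can be emulated outside XML-QL. The following is a rough Python sketch using the standard xml.etree.ElementTree module; the function name author_title_pairs and the inlined BIB_XML string are assumptions for illustration, and this is not how an XML-QL engine evaluates the query. It yields one (author, title) pair per binding of $a and $t, which is why the two-author book appears twice.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the bibliography slide (comment omitted).
BIB_XML = """
<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>
"""


def author_title_pairs(xml_text, publisher):
    """Return one (lastname, title) pair per author of each matching book."""
    root = ET.fromstring(xml_text)
    pairs = []
    for book in root.iter("book"):
        name = book.findtext("publisher/name", default="").strip()
        if name != publisher:
            continue  # pattern does not match this book
        title = book.findtext("title", default="").strip()
        for author in book.findall("author"):
            lastname = author.findtext("lastname", default="").strip()
            pairs.append((lastname, title))
    return pairs


for lastname, title in author_title_pairs(BIB_XML, "Addison-Wesley"):
    print(lastname, "|", title)
```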

XML-QL (continued)


WHERE <book> $p</> IN "www.a.b.c/bib.xml",
      <title> $t</>,
      <publisher><name>Addison-Wesley</></> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a</>
          </>

XML-QL (continued)


<result>

<title> An Introduction to Database Systems </title>

<author> <lastname> Date </lastname> </author>

</result>

<result>

<title> Foundation for Object/Relational Databases: The Third Manifesto </title>

<author> <lastname> Date </lastname> </author>

<author> <lastname> Darwen </lastname> </author>

</result>
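The grouping effect of the nested query, one result per book with all of its authors inside, can be approximated as follows. This is only a sketch under assumptions: grouped_results and BIB_XML are names introduced here, and the nested WHERE/CONSTRUCT block is modeled as an inner loop over the content bound to $p.

```python
import copy
import xml.etree.ElementTree as ET

# Sample data copied from the bibliography slide (comment omitted).
BIB_XML = """
<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>
"""


def grouped_results(xml_text, publisher):
    """Build one <result> element per matching book, holding all its authors."""
    root = ET.fromstring(xml_text)
    results = []
    for book in root.iter("book"):
        name = book.findtext("publisher/name", default="").strip()
        if name != publisher:
            continue
        result = ET.Element("result")
        title = ET.SubElement(result, "title")
        title.text = book.findtext("title", default="").strip()
        # Inner block: runs once per <author> inside this book only.
        for author in book.findall("author"):
            result.append(copy.deepcopy(author))
        results.append(result)
    return results


for result in grouped_results(BIB_XML, "Addison-Wesley"):
    print(ET.tostring(result, encoding="unicode"))
```

The outer loop corresponds to the outer WHERE clause; because the inner loop is scoped to a single book, authors are never paired with titles of other books.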

XML-QL (continued)


WHERE <article>
        <author>
          <firstname> $f </>  // firstname $f
          <lastname> $l </>   // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </>  // join on same firstname $f
          <lastname> $l </>   // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>

XML-QL (continued)



<!ATTLIST person ID ID #REQUIRED>
<!ATTLIST article author IDREFS #IMPLIED>

XML-QL (continued)


<person ID="o123">

<firstname>John</firstname>

<lastname>Smith</lastname>

</person>

<person ID="o234">

. . .

</person>

<article author="o123 o234">

<title> ... </title>

<year> 1995 </year>

</article>
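An IDREFS attribute is a whitespace-separated list of IDs, so following the references amounts to a lookup by ID. The sketch below illustrates this in Python. The wrapper element <db>, the contents of the second person (elided on the slide), the article title, and the function name article_authors are all hypothetical and exist only to make the example runnable.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <db>, the second person's names and the title
# are invented for illustration; the slide elides them.
DATA_XML = """
<db>
  <person ID="o123">
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </person>
  <person ID="o234">
    <firstname>Jane</firstname>
    <lastname>Doe</lastname>
  </person>
  <article author="o123 o234">
    <title>Example</title>
    <year>1995</year>
  </article>
</db>
"""


def article_authors(xml_text):
    """Resolve each article's IDREFS attribute to the referenced lastnames."""
    root = ET.fromstring(xml_text)
    # Index every person by its ID attribute.
    people = {p.get("ID"): p for p in root.iter("person")}
    resolved = []
    for article in root.iter("article"):
        ids = article.get("author", "").split()  # IDREFS: space-separated IDs
        lastnames = [people[i].findtext("lastname") for i in ids if i in people]
        resolved.append((article.findtext("title"), lastnames))
    return resolved


print(article_authors(DATA_XML))
```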

XML-QL (continued)



WHERE <article><author><lastname> $n</></></> IN "abc.xml"

XML-QL (continued)

WHERE <article author=$i>
        <title> </> ELEMENT_AS $t
      </>,
      <person ID=$i>
        <lastname> </> ELEMENT_AS $l
      </>
CONSTRUCT <result> $t $l</>


Scalar values

<title>A Trip to <titlepart> the Moon </titlepart></title> NOT!

<title><CDATA> A Trip to </CDATA><titlepart><CDATA> the Moon</CDATA></titlepart></title> YES


Tag variables

WHERE <$p>
        <title> $t </title>
        <year>1995</>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>


Transforming data

<!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT article (author+, title, year?, (shortversion|longversion))>
<!ATTLIST article type CDATA>
<!ELEMENT publisher (name, address)>
<!ELEMENT author (firstname?, lastname)>

<!ELEMENT person (lastname, firstname, address?, phone?, publicationtitle*)>


Transforming data (cont’d)

WHERE <$>
        <author>
          <firstname> $fn </>
          <lastname> $ln </>
        </>
        <title> $t </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <person ID=PersonID($fn, $ln)>
            <firstname> $fn </>
            <lastname> $ln </>
            <publicationtitle> $t </>
          </>


Integrating data from different sources

WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn</>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn</>
        <income></> ELEMENT_AS $i
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $i </>
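Because $ssn occurs in both patterns, the query is in effect a join of two documents on the social security number. A minimal Python sketch of that join is shown below; the two inline documents, their root elements (<people>, <taxpayers>), the sample values, and the function name join_on_ssn are all made up for illustration and stand in for the two remote sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sources named in the query.
PEOPLE_XML = """
<people>
  <person><name>Ann</name><ssn>111</ssn></person>
  <person><name>Bob</name><ssn>222</ssn></person>
</people>
"""

TAXPAYERS_XML = """
<taxpayers>
  <taxpayer><ssn>111</ssn><income>50000</income></taxpayer>
  <taxpayer><ssn>333</ssn><income>70000</income></taxpayer>
</taxpayers>
"""


def join_on_ssn(people_xml, taxpayers_xml):
    """Return (name, income) for every person with a matching taxpayer."""
    taxpayers = ET.fromstring(taxpayers_xml)
    # Build a lookup table on the join key from the second source.
    income_by_ssn = {
        t.findtext("ssn", default="").strip(): t.findtext("income", default="").strip()
        for t in taxpayers.iter("taxpayer")
    }
    results = []
    for person in ET.fromstring(people_xml).iter("person"):
        ssn = person.findtext("ssn", default="").strip()
        if ssn in income_by_ssn:  # shared variable $ssn must agree
            name = person.findtext("name", default="").strip()
            results.append((name, income_by_ssn[ssn]))
    return results


print(join_on_ssn(PEOPLE_XML, TAXPAYERS_XML))
```

Only persons whose SSN appears in both sources contribute a result, which mirrors the way an unmatched binding produces no output in the query.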


Query blocks

WHERE <$e> <title> $t </> <year> 1995 </> </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result ID=ResultID($p)> <title> $t </> </>
{ WHERE $e = "journal-paper", <month> $m </> IN $p
  CONSTRUCT <result ID=ResultID($p)> <month> $m </> </> }
{ WHERE $e = "book", <publisher>$q </> IN $p
  CONSTRUCT <result ID=ResultID($p)> <publisher>$q </> </> }


XQuery

Successor to XML-QL, YATL, Lorel, Quilt

Supported by the W3C

Draft only


DTD

<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>


Sample database

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
</bib>


Sample query

<bib>
 {
  for $b in document("http://www.bn.com/bib.xml")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
    <book year="{ $b/@year }">
     { $b/title }
    </book>
 }
</bib>


Expected result

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
</bib>
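As a cross-check on the expected result, the same filter can be written against the sample database with Python's standard xml.etree.ElementTree module. This is a sketch that mimics the for/where/return structure of the query, not an XQuery processor; the function name select_books and the inlined BIB_XML string (authors and prices omitted for brevity) are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Sample database from the slides, reduced to the fields the query uses.
BIB_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher>
  </book>
</bib>
"""


def select_books(xml_text, publisher, after_year):
    """Return a new <bib> with year and title of each matching book."""
    root = ET.fromstring(xml_text)
    out = ET.Element("bib")
    for b in root.findall("book"):                      # for $b in .../bib/book
        if (b.findtext("publisher") == publisher        # where $b/publisher = ...
                and int(b.get("year")) > after_year):   # and $b/@year > ...
            book = ET.SubElement(out, "book", year=b.get("year"))
            ET.SubElement(book, "title").text = b.findtext("title")
    return out


result = select_books(BIB_XML, "Addison-Wesley", 1991)
print(ET.tostring(result, encoding="unicode"))
```

The third book is excluded by the publisher test, so only the 1994 and 1992 titles remain, in document order.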


Pointers and Demos

http://www.w3.org/TR/xquery/
http://www.w3.org/TR/xmlquery-use-cases/
http://xml.org/
http://131.107.228.20/Default.aspx?case=XMP&example=Q1#Query
http://www.db.ucsd.edu/people/yannis/XQueryTutorial.htm
http://www.ex.ac.uk/~pellison/xml/multiple.htm
http://seacow.eecs.umich.edu:8080/timberweb/