g22 3033 002 c91.prn...3 7 towards xml application services n processing n dom extensions n binding...
TRANSCRIPT
1
1
XML for Java Developers G22.3033-002
Session 9 - Main ThemeXML Information Retrieval (Part I)
Dr. Jean-Claude Franchitti
New York UniversityComputer Science Department
Courant Institute of Mathematical Sciences
2
Agenda
n Summary of Previous Session
n Applications of XML to Database Technology
n XML Query Languages
n XPath
n XML Queries
n XQuery: A Query Language for XML
n XML Query Engines
n Assignment 5a+5b+5c (due in two weeks)
3
Summary of Previous Session
n XML/XSL and JSP/JavaBeans Rendering Technology
n Internationalization Issues
n Web Content Accessibility Guidelines (WCAG)
n Assignment 4a+4b (due next week)
2
4
XML-Based Retrieval Developmentn XML Software Development Methodology
n Language + Stepwise Process + Tools
n Rational Unified Process (RUP) v.s. “XML Unified Process”
n XML Application Development Infrastructure
n Metadata Management (e.g., XMI)n XML Query Engine (3rd party software)
n XML Tools (e.g., XML Editors)
n XML Applications Involved in the Rendering Phase:
n Application(s) of XMLn XML-based applications/services (markup language mediators)
n MOM, POP, Persistence service
n Application Infrastructure Frameworks
5
XML Data Retrieval Patterns
n XML Data Retrieval Operationsn Access
n Queryn Manipulate
n Multiple XML Data Sources Integrationn XML Message Filteringn DBMS Data Viewsn Database System Interfacing
6
Part I
Applications of XML toDatabase Technology
3
7
Towards XML Application Servicesn Processing
n DOM Extensions
n Binding Extensionsn Component Frameworks (reusable component models)
n Model-Based Automation (MDA)
n Renderingn DOM 2.1.0, SAX 2.0, JAXP 1.1 & TraX, XSL -FO 1.0n Component Frameworks
n Queryingn XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0
n Security (signatures encryption/decryption, etc.)n Etc.
8
Retrieval Software Developmentn Languages (XML-QL, YaTL, XQL, etc.)
n Data Model + Operations + Syntax
n Process (“XUP”)n Frameworks
n Custom Enginen e.g., XQEngine
n Translation to SQLn e.g., DB2XML, Oracle’s XML I/F
n Translation to OQL
n XML Query Infrastructuren XPath Processors: Saxon 6.1, Xalan-J 2.1.0n XQuery Processors
n SQLX
9
XML Query History
n SGML Query Facilitiesn Ad-hoc Approach to Query Languagesn 02/98: XQL Proposaln 08/98: XML-QL Submissionn 12/98: W3C QL’98 Workshop
n Candidate Requirements for XML Queryn Database Desiderata for an XML Query Language
n 11/99: XPath Recommendation
4
10
W3C XML Query WG
n 07/99: WG Proposaln 09/99: WG Official Inceptionn Today:
n 30 W3C Member Companies
n 11 Meetings, 60+ Telconsn Heartbeat every Three Months
n Proposed Recommendation(s)
n Goals:n XML Query Data Model for XML Dcouments
n Query Operators for XML Query Data Modeln Query Language Based on XML Query Operators
11
W3C’s Related Standards
n XML Query Specifications:n XML Query Requirements (02/16/01 - orig. 01/00)
n XML Query Use Cases (06/08/01 - orig. 08/00)n XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00)
n XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00) n XQuery 1.0: An XML Query Language (06/07/01)
n XML Syntax for XQuery 1.0 (XQueryX) (06/07/01)
n XPath 2.0 Specificationsn XPath Requirements Version 2.0 (02/15/01)
n XQuery 1.0 and XPath 2.0 Data Model (06/07/01)
12
Related XML Technologiesn XPathn XSLn XPointern XML Scheman XML Infosetn WAIn Internationalizationn IETF DASL
n Distributed Authoring Searching and Locatingn http://www.ics.uci.edu/pub/ietf/dasl/
5
13
Properties of RDBMS Queries
n Pattern + Filter + Construction clausen Construction clause may have ordering
subclausesn Queries may perform joins across multiple
input setsn Queries may generate intermediate
variables or path expressions
14
Mapping XML to a RDBMS
n SQL-like queries that return XML documentsn e.g., Microsoft IIS + SQL Servern e.g., Oracle Database Server
n Broad spectrum of possible mappingsn Hierarchical v.s. limited RDBMS tree structure
15
JDBC Refresher
n See section 6.2 of XML and Java textbookn Importing JDBC Packagen Loading a JDBC Drivern Connecting to a Databasen Submitting a Query
6
16
XML Embedded in SQL (SQLX)
n SQL Embedded in XMLn See Section 6.3 of XML and Java textbookn Front-end to RDBMS that provides XML-based
Input/Outputn Translates XML query into sequence of JDBC
calls, and converts the result to a DOM structure which is returned
17
Part II
XML Query
18
XML Query Requirements (Part I)n General:
n Declarative Languagen Readable XML Syntaxn Protocol Independencen Standard Error Conditionsn Support for Future Updates
n Data Model n Based on XML Infosetsn Namespace Awaren Support for XML Schema Data Typesn Support Inter/Intra Document References
7
19
XML Query Requirements (Part II)n Query Functionality:
n Operators on All Data Typesn Text Operators Across Element Boundariesn Hierarchies and Sequencesn Combination of Data from Various Locationsn Aggregation and Sortingn Combination of Operators (Queries as Operands)n Support NULL valuesn Preservation of Structure/Identityn Operations on Names/Schemasn Extensibility & Closure
20
XML Query Use Casesn Approach
n Description, DTD/Schema, Input, Queries, Results
n Existing Use Casesn XMP (examples)n TREE (queries that preserve hierarchy)n SEQ (queries based on sequence)n R (relational data access)n TEXT (text search)n NS (namespace-based queries)n PARTS (recursive parts queries)n REF (queries based on references)
21
XML Query Data Model
n Information Presented to a Query Processorn Augmented Infoset:
n XML Schema Data Types (PSVI)n Document Collectionsn References
n Node-Labeled Tree Constructor Model with Node Identity
n Infoset Mapping to Query Data Model is Defined as Part of the Specification
8
22
XML Query Data Model(continued)
n Nodesn Node = DocNode | ElemNode | AttrNode | ValueNode
| NSNode | PINode | CommentNode | InfoItemNode
n XML Schema Primitive Typesn string, boolean, ID, IDREF, decimal, etc.
n Collectionsn list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2),
tuple (T1, …, Tn)
n Referencesn ref(T)
23
XML Query Algebra
n Defines Static and Dynamic Semanticsn Static Semantics are Type Inference Rules
n Relate Algebra Expressions to Types
n Dynamic Semantics are Value Inference Rulesn Relate Algebra Expressions to Values
n Issues:n Algebra Type System Alignment with XML Scheman Operators on Schema Simple Types not Definedn Lexical Representation of Schema Simple Types not
Defined
24
Constructorsn Construct Values in XML Query Data Model
attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNodeValueNode = QNameValue | StringValue | DecimalValue | ...qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValuedecimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue
<part price=10.50/><xsd:attribute name=“price” type=xsd:decimal/>
attrNode(ref(qnameValue(null, “price”, Ref(Def_QName)), ref(decimalValue(10.50, Ref(Def_decimal))))
9
25
Assessors
n Deconstruct Values in XML Query Data Modelname : AttrNode -> Ref(QNameValue)value : AttrNode -> Ref(ValueNode)type : AttrNode -> Ref(ElemNode)
<xsd:attribue name=“price” type=xsd:decimal/><part price=10.50/>
name(A1) = ref(qnameValue(null, “price”))value(A1) = ref(decimalValue(10.50, Ref(Def_decimal)))type(A1) = <!-- data model representation of simple type decimal -->
26
Part III
XML Query Languages
27
XQuery
n Functional Languagen Query Represented as an Expression
n Expressions can be Nested without Restrictionn Input/Output of an XQuery are Instances of
the XML Query Data Modeln Based on OQL, SQL, XML-QL, XPathn Readable XML Syntax
10
28
XQuery Expressions
n Path Expressionsn Element Constructorsn FLWR Expressionsn Expressions with Operators/Functionsn Conditional Expressionsn Quantified Expressionsn List Constructorsn Expressions to Test/Modify Datatypes
29
XQuery Path Expressionsn Abbreviated XPath 1.0 Syntax
n Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.xml”
n document(“zoo.xml)/chapter[2]//figure[caption = “Tree Frogs”]
n Extensionsn Dereference Operatorn Range PredicateDocument Formatsn Find captions of figures referenced by <figref”
elements in “Frogs” chapter of “zoo.xml”n document(“zoo.xml”)/chapter[title =
“Frogs”]//figref/@refid->fig/caption
30
XQuery Element Constructor
n Start/End Tag + Enclosed List of Expressionsn Generate an element with a computed name that
contains nested elements:<$tagname><description> $d </description><price> $p </price></$tagname>
11
31
XQuery For Let Where Return (FLWR)
n FOR and LET Clausen Generate a List of Tuples that Preserves Doc Order
n WHERE Clausen Applies a Predicate to Eliminate Some Tuples
n RETURN Clausen Executed on Resulting Tuples -> Ordered Output List
n Syntax:FOR var IN expr WHERE expr RETURN exprLET var := expr
32
FLWR Sample Expressions
n List titles of books published by MK in 98FOR $b IN document (“bib.xml”)//bookWHERE $b/publisher = “Morgan Kaufmann”
AND $b/year = “1998”RETURN $b/title
n List each publisher and its books average priceFOR $p IN distinct(document(“bib.xml”)//publisher)LET $a := avg(document(“bib.xml”)
/book[publisher = $p]/price)
33
XQuery Operators and Functions
n Infix/Prefix Operatorsn e.g., Infix Operators BEFORE and AFTER
n Parenthesized Expressionsn Arithmetic/Logical Operatorsn Collection Operators
n e.g., UNION, INTERSECT, EXCEPT
n Functions Can Be Defined in XQuery
12
34
Sample Operators and Functions
n Find max depth of “partlist.xml”NAMESPACE xsd=“http”//www.w3.org/2001/03/XMLSchema-datatypes”
FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
{ IF empty ($e/*) THEN 1
ELSE max (depth($e/*))+1}
depth(document(“partlist.xml”))
35
XQuery Conditional Expressions
FOR $h IN //holding
RETURN<holding>
$h/titleIF $h/@type=“Journal” THEN $h/editor
ELSE $h/author<holding> SORTBY (title)
36
XQuery Quantified Expressions
n Example 1:FOR $b IN //bookWHERE SOME $p IN $b//para SATISFIES
contains($p, “sailing”)AND contains($p, “windsurfing”)
RETURN $b/title
n Example 2:FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIEScontains($p, “sailing”)
RETURN $b/title
13
37
XQuery List Constructors
n List encloses zero or more expressions in square brackets, separated by commas
n List of member variables: [$x, $y, $z]n Empty list: [ ]
38
XQuery Operators on Data Types
n INSTANCEOF (instance, type)n CAST
n Convert value from one datatype to another
n TREATn Causes the query processor to treat an expression
as if its datatype were a subtype of its static type
39
XQuery Outstanding Issues
n Integration with XPath 2.0n Alignment of XQuery and XML Query
Algebra Syntaxn Internationalization
n e.g., Collation Sequences for Sorting, Strings ops
n XML Query Syntaxn Operators and Functions TBD
14
40
Part IV
XML Query Engines
41
Various Approachesn XQEngine
n Full-text search engine for XMLn Java APIs availablen W3C XQuery Specification Support
n DB2XMLn Standalone tool (with GUI or command line)n Servlet to dynamically generate XML-documentsn DB2XML API
n Oracle XML Developer Kit (XDK ) n Microsoft SQL Serversupport for XML
42
Part V
Conclusions
15
43
Summary
n Applying XML to Database Technology allows the viewing of database data as an XML document.
n XML Query is based on a well defined Data Model and Algebra
n Various syntaxes are possible for an XML Query Language
n XML Query Engines are infrastructure components that support XML Query
44
Readingsn Readings
n XML Development with Java 2: Chapter 10
n Professional Java XML: Chapters 16, and 17n XML and Java: Chapter 6
n Handouts posted on the course web siten Review textbook chapters on XML/Java persistence bindings
n Project Frameworks Setup (ongoing)n Howard Katz’s XQEnginen Apache’s Web Server, TomCat/JRun, and Cocoonn Apache’s Xerces, Xalan, Saxon
n Antenna House XML Formatter, Apache’s FOP, X-smilesn Publishing Systems at http://www.xmlsoftware.com
n Visibroker 4.5, WebLogic 6.1n POSE & KVM (See Session 3 handout)
45
Assignment
n Assignment #5: (#5c will be discussed next week)n #5a: This part of the project focuses on the application data
model design/development using XML information retrieval technology. The design/development process should focus
initially on identifying the data to be retrieved for resulting subsets of data
n #5b: Your XML application service should demonstrate the following additional steps: (a) Defining the optimal retrieval approach for each dataset, and (b) Considering query
constraints when designing an overall application data modeln More specific project related information, and extra credit
assignments will be provided during the session
16
46
Next Session:XML Information Retrieval (Part II)
XML-Based Frameworks (Part I)
n XML Object Persistencen Advanced XML-QL/XQL Conceptsn Using the SAX and DOM APIs with a databasen XML Server Pages (XSP)n Presentation Oriented Publishing (POP) Frameworksn Client-Side XML POP Frameworksn Server-Side XML POP Frameworks