g22 3033 002 c91.prn...3 7 towards xml application services n processing n dom extensions n binding...

16
1 1 XML for Java Developers G22.3033-002 Session 9 - Main Theme XML Information Retrieval (Part I) Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences 2 Agenda n Summary of Previous Session n Applications of XML to Database Technology n XML Query Languages n XPath n XML Queries n XQuery: A Query Language for XML n XML Query Engines n Assignment 5a+5b+5c (due in two weeks) 3 Summary of Previous Session n XML/XSL and JSP/JavaBeans Rendering Technology n Internationalization Issues n Web Content Accessibility Guidelines (WCAG) n Assignment 4a+4b (due next week)

Upload: others

Post on 12-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

1

1

XML for Java Developers G22.3033-002

Session 9 - Main ThemeXML Information Retrieval (Part I)

Dr. Jean-Claude Franchitti

New York UniversityComputer Science Department

Courant Institute of Mathematical Sciences

2

Agenda

n Summary of Previous Session

n Applications of XML to Database Technology

n XML Query Languages

n XPath

n XML Queries

n XQuery: A Query Language for XML

n XML Query Engines

n Assignment 5a+5b+5c (due in two weeks)

3

Summary of Previous Session

n XML/XSL and JSP/JavaBeans Rendering Technology

n Internationalization Issues

n Web Content Accessibility Guidelines (WCAG)

n Assignment 4a+4b (due next week)

Page 2: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

2

4

XML-Based Retrieval Developmentn XML Software Development Methodology

n Language + Stepwise Process + Tools

n Rational Unified Process (RUP) v.s. “XML Unified Process”

n XML Application Development Infrastructure

n Metadata Management (e.g., XMI)n XML Query Engine (3rd party software)

n XML Tools (e.g., XML Editors)

n XML Applications Involved in the Rendering Phase:

n Application(s) of XMLn XML-based applications/services (markup language mediators)

n MOM, POP, Persistence service

n Application Infrastructure Frameworks

5

XML Data Retrieval Patterns

n XML Data Retrieval Operationsn Access

n Queryn Manipulate

n Multiple XML Data Sources Integrationn XML Message Filteringn DBMS Data Viewsn Database System Interfacing

6

Part I

Applications of XML toDatabase Technology

Page 3: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

3

7

Towards XML Application Servicesn Processing

n DOM Extensions

n Binding Extensionsn Component Frameworks (reusable component models)

n Model-Based Automation (MDA)

n Renderingn DOM 2.1.0, SAX 2.0, JAXP 1.1 & TraX, XSL -FO 1.0n Component Frameworks

n Queryingn XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0

n Security (signatures encryption/decryption, etc.)n Etc.

8

Retrieval Software Developmentn Languages (XML-QL, YaTL, XQL, etc.)

n Data Model + Operations + Syntax

n Process (“XUP”)n Frameworks

n Custom Enginen e.g., XQEngine

n Translation to SQLn e.g., DB2XML, Oracle’s XML I/F

n Translation to OQL

n XML Query Infrastructuren XPath Processors: Saxon 6.1, Xalan-J 2.1.0n XQuery Processors

n SQLX

9

XML Query History

n SGML Query Facilitiesn Ad-hoc Approach to Query Languagesn 02/98: XQL Proposaln 08/98: XML-QL Submissionn 12/98: W3C QL’98 Workshop

n Candidate Requirements for XML Queryn Database Desiderata for an XML Query Language

n 11/99: XPath Recommendation

Page 4: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

4

10

W3C XML Query WG

n 07/99: WG Proposaln 09/99: WG Official Inceptionn Today:

n 30 W3C Member Companies

n 11 Meetings, 60+ Telconsn Heartbeat every Three Months

n Proposed Recommendation(s)

n Goals:n XML Query Data Model for XML Dcouments

n Query Operators for XML Query Data Modeln Query Language Based on XML Query Operators

11

W3C’s Related Standards

n XML Query Specifications:n XML Query Requirements (02/16/01 - orig. 01/00)

n XML Query Use Cases (06/08/01 - orig. 08/00)n XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00)

n XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00) n XQuery 1.0: An XML Query Language (06/07/01)

n XML Syntax for XQuery 1.0 (XQueryX) (06/07/01)

n XPath 2.0 Specificationsn XPath Requirements Version 2.0 (02/15/01)

n XQuery 1.0 and XPath 2.0 Data Model (06/07/01)

12

Related XML Technologiesn XPathn XSLn XPointern XML Scheman XML Infosetn WAIn Internationalizationn IETF DASL

n Distributed Authoring Searching and Locatingn http://www.ics.uci.edu/pub/ietf/dasl/

Page 5: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

5

13

Properties of RDBMS Queries

n Pattern + Filter + Construction clausen Construction clause may have ordering

subclausesn Queries may perform joins across multiple

input setsn Queries may generate intermediate

variables or path expressions

14

Mapping XML to a RDBMS

n SQL-like queries that return XML documentsn e.g., Microsoft IIS + SQL Servern e.g., Oracle Database Server

n Broad spectrum of possible mappingsn Hierarchical v.s. limited RDBMS tree structure

15

JDBC Refresher

n See section 6.2 of XML and Java textbookn Importing JDBC Packagen Loading a JDBC Drivern Connecting to a Databasen Submitting a Query

Page 6: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

6

16

XML Embedded in SQL (SQLX)

n SQL Embedded in XMLn See Section 6.3 of XML and Java textbookn Front-end to RDBMS that provides XML-based

Input/Outputn Translates XML query into sequence of JDBC

calls, and converts the result to a DOM structure which is returned

17

Part II

XML Query

18

XML Query Requirements (Part I)n General:

n Declarative Languagen Readable XML Syntaxn Protocol Independencen Standard Error Conditionsn Support for Future Updates

n Data Model n Based on XML Infosetsn Namespace Awaren Support for XML Schema Data Typesn Support Inter/Intra Document References

Page 7: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

7

19

XML Query Requirements (Part II)n Query Functionality:

n Operators on All Data Typesn Text Operators Across Element Boundariesn Hierarchies and Sequencesn Combination of Data from Various Locationsn Aggregation and Sortingn Combination of Operators (Queries as Operands)n Support NULL valuesn Preservation of Structure/Identityn Operations on Names/Schemasn Extensibility & Closure

20

XML Query Use Casesn Approach

n Description, DTD/Schema, Input, Queries, Results

n Existing Use Casesn XMP (examples)n TREE (queries that preserve hierarchy)n SEQ (queries based on sequence)n R (relational data access)n TEXT (text search)n NS (namespace-based queries)n PARTS (recursive parts queries)n REF (queries based on references)

21

XML Query Data Model

n Information Presented to a Query Processorn Augmented Infoset:

n XML Schema Data Types (PSVI)n Document Collectionsn References

n Node-Labeled Tree Constructor Model with Node Identity

n Infoset Mapping to Query Data Model is Defined as Part of the Specification

Page 8: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

8

22

XML Query Data Model(continued)

n Nodesn Node = DocNode | ElemNode | AttrNode | ValueNode

| NSNode | PINode | CommentNode | InfoItemNode

n XML Schema Primitive Typesn string, boolean, ID, IDREF, decimal, etc.

n Collectionsn list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2),

tuple (T1, …, Tn)

n Referencesn ref(T)

23

XML Query Algebra

n Defines Static and Dynamic Semanticsn Static Semantics are Type Inference Rules

n Relate Algebra Expressions to Types

n Dynamic Semantics are Value Inference Rulesn Relate Algebra Expressions to Values

n Issues:n Algebra Type System Alignment with XML Scheman Operators on Schema Simple Types not Definedn Lexical Representation of Schema Simple Types not

Defined

24

Constructorsn Construct Values in XML Query Data Model

attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNodeValueNode = QNameValue | StringValue | DecimalValue | ...qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValuedecimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue

<part price=10.50/><xsd:attribute name=“price” type=xsd:decimal/>

attrNode(ref(qnameValue(null, “price”, Ref(Def_QName)), ref(decimalValue(10.50, Ref(Def_decimal))))

Page 9: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

9

25

Assessors

n Deconstruct Values in XML Query Data Modelname : AttrNode -> Ref(QNameValue)value : AttrNode -> Ref(ValueNode)type : AttrNode -> Ref(ElemNode)

<xsd:attribue name=“price” type=xsd:decimal/><part price=10.50/>

name(A1) = ref(qnameValue(null, “price”))value(A1) = ref(decimalValue(10.50, Ref(Def_decimal)))type(A1) = <!-- data model representation of simple type decimal -->

26

Part III

XML Query Languages

27

XQuery

n Functional Languagen Query Represented as an Expression

n Expressions can be Nested without Restrictionn Input/Output of an XQuery are Instances of

the XML Query Data Modeln Based on OQL, SQL, XML-QL, XPathn Readable XML Syntax

Page 10: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

10

28

XQuery Expressions

n Path Expressionsn Element Constructorsn FLWR Expressionsn Expressions with Operators/Functionsn Conditional Expressionsn Quantified Expressionsn List Constructorsn Expressions to Test/Modify Datatypes

29

XQuery Path Expressionsn Abbreviated XPath 1.0 Syntax

n Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.xml”

n document(“zoo.xml)/chapter[2]//figure[caption = “Tree Frogs”]

n Extensionsn Dereference Operatorn Range PredicateDocument Formatsn Find captions of figures referenced by <figref”

elements in “Frogs” chapter of “zoo.xml”n document(“zoo.xml”)/chapter[title =

“Frogs”]//figref/@refid->fig/caption

30

XQuery Element Constructor

n Start/End Tag + Enclosed List of Expressionsn Generate an element with a computed name that

contains nested elements:<$tagname><description> $d </description><price> $p </price></$tagname>

Page 11: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

11

31

XQuery For Let Where Return (FLWR)

n FOR and LET Clausen Generate a List of Tuples that Preserves Doc Order

n WHERE Clausen Applies a Predicate to Eliminate Some Tuples

n RETURN Clausen Executed on Resulting Tuples -> Ordered Output List

n Syntax:FOR var IN expr WHERE expr RETURN exprLET var := expr

32

FLWR Sample Expressions

n List titles of books published by MK in 98FOR $b IN document (“bib.xml”)//bookWHERE $b/publisher = “Morgan Kaufmann”

AND $b/year = “1998”RETURN $b/title

n List each publisher and its books average priceFOR $p IN distinct(document(“bib.xml”)//publisher)LET $a := avg(document(“bib.xml”)

/book[publisher = $p]/price)

33

XQuery Operators and Functions

n Infix/Prefix Operatorsn e.g., Infix Operators BEFORE and AFTER

n Parenthesized Expressionsn Arithmetic/Logical Operatorsn Collection Operators

n e.g., UNION, INTERSECT, EXCEPT

n Functions Can Be Defined in XQuery

Page 12: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

12

34

Sample Operators and Functions

n Find max depth of “partlist.xml”NAMESPACE xsd=“http”//www.w3.org/2001/03/XMLSchema-datatypes”

FUNCTION depth(ELEMENT $e) RETURNS xsd:integer

{ IF empty ($e/*) THEN 1

ELSE max (depth($e/*))+1}

depth(document(“partlist.xml”))

35

XQuery Conditional Expressions

FOR $h IN //holding

RETURN<holding>

$h/titleIF $h/@type=“Journal” THEN $h/editor

ELSE $h/author<holding> SORTBY (title)

36

XQuery Quantified Expressions

n Example 1:FOR $b IN //bookWHERE SOME $p IN $b//para SATISFIES

contains($p, “sailing”)AND contains($p, “windsurfing”)

RETURN $b/title

n Example 2:FOR $b IN //book

WHERE EVERY $p IN $b//para SATISFIEScontains($p, “sailing”)

RETURN $b/title

Page 13: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

13

37

XQuery List Constructors

n List encloses zero or more expressions in square brackets, separated by commas

n List of member variables: [$x, $y, $z]n Empty list: [ ]

38

XQuery Operators on Data Types

n INSTANCEOF (instance, type)n CAST

n Convert value from one datatype to another

n TREATn Causes the query processor to treat an expression

as if its datatype were a subtype of its static type

39

XQuery Outstanding Issues

n Integration with XPath 2.0n Alignment of XQuery and XML Query

Algebra Syntaxn Internationalization

n e.g., Collation Sequences for Sorting, Strings ops

n XML Query Syntaxn Operators and Functions TBD

Page 14: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

14

40

Part IV

XML Query Engines

41

Various Approachesn XQEngine

n Full-text search engine for XMLn Java APIs availablen W3C XQuery Specification Support

n DB2XMLn Standalone tool (with GUI or command line)n Servlet to dynamically generate XML-documentsn DB2XML API

n Oracle XML Developer Kit (XDK ) n Microsoft SQL Serversupport for XML

42

Part V

Conclusions

Page 15: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

15

43

Summary

n Applying XML to Database Technology allows the viewing of database data as an XML document.

n XML Query is based on a well defined Data Model and Algebra

n Various syntaxes are possible for an XML Query Language

n XML Query Engines are infrastructure components that support XML Query

44

Readingsn Readings

n XML Development with Java 2: Chapter 10

n Professional Java XML: Chapters 16, and 17n XML and Java: Chapter 6

n Handouts posted on the course web siten Review textbook chapters on XML/Java persistence bindings

n Project Frameworks Setup (ongoing)n Howard Katz’s XQEnginen Apache’s Web Server, TomCat/JRun, and Cocoonn Apache’s Xerces, Xalan, Saxon

n Antenna House XML Formatter, Apache’s FOP, X-smilesn Publishing Systems at http://www.xmlsoftware.com

n Visibroker 4.5, WebLogic 6.1n POSE & KVM (See Session 3 handout)

45

Assignment

n Assignment #5: (#5c will be discussed next week)n #5a: This part of the project focuses on the application data

model design/development using XML information retrieval technology. The design/development process should focus

initially on identifying the data to be retrieved for resulting subsets of data

n #5b: Your XML application service should demonstrate the following additional steps: (a) Defining the optimal retrieval approach for each dataset, and (b) Considering query

constraints when designing an overall application data modeln More specific project related information, and extra credit

assignments will be provided during the session

Page 16: g22 3033 002 c91.prn...3 7 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation

16

46

Next Session:XML Information Retrieval (Part II)

XML-Based Frameworks (Part I)

n XML Object Persistencen Advanced XML-QL/XQL Conceptsn Using the SAX and DOM APIs with a databasen XML Server Pages (XSP)n Presentation Oriented Publishing (POP) Frameworksn Client-Side XML POP Frameworksn Server-Side XML POP Frameworks