Next Generation Query and Transformation Standards
Priscilla Walmsley
Managing Director, Datypic
http://www.datypic.com
© 2005 Datypic http://www.datypic.com Slide 2
Agenda
• The query and transformation landscape
• Querying XML with XQuery
• Transforming XML with XSLT
• Shared components
• Decision points
Querying and Transforming XML
• Querying: Extracting data of interest
• Transformation: Changing the structure of data
• Sometimes it's hard to tell them apart
– Transformation: "Change all the b elements to x elements, and ignore the rest"
– Query: "Give me just the b elements, but call them x in the results"
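The overlap between the two readings can be sketched in Java. This is only an illustration under stated assumptions: the input document, the class name QueryOrTransform, and the stylesheet are invented for this example, and it uses the XSLT 1.0 and XPath 1.0 engines bundled with the JDK rather than XSLT 2.0 or XQuery.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class QueryOrTransform {
    // Hypothetical input, invented for this sketch.
    static final String INPUT = "<a><b>1</b><c>2</c><b>3</b></a>";

    // "Transformation" reading: change every b to x and ignore the rest.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='b'><x><xsl:value-of select='.'/></x></xsl:template>"
        + "<xsl:template match='text()'/>"
        + "</xsl:stylesheet>";

    static String asTransformation(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // "Query" reading: select just the b elements, but call them x in the results.
    static String asQuery(String xml) throws Exception {
        NodeList bs = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//b", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bs.getLength(); i++) {
            // Plain concatenation suffices because the sample text needs no escaping.
            out.append("<x>").append(bs.item(i).getTextContent()).append("</x>");
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(asTransformation(INPUT));
        System.out.println(asQuery(INPUT));
    }
}
```

Both methods should yield the same x elements for this input, which is the sense in which the two activities are hard to tell apart.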
W3C Standards for Querying/Transformation
• XPath 1.0: path expressions, comparison expressions, some built-in functions
• XPath 2.0 (builds on XPath 1.0; shared by XQuery 1.0 and XSLT 2.0): conditional expressions, arithmetic expressions, quantified expressions, built-in functions & operators, data model
• XQuery 1.0 (builds on XPath 2.0): FLWOR expressions, XML constructors, query prolog, user-defined functions
• XSLT 2.0 (builds on XPath 2.0): stylesheets, templates, XML constructors, user-defined functions
Querying XML with XQuery
XQuery 1.0
for $i1 in doc("input1.xml")//item/@dept
for $i2 in doc("input2.xml")//product
where $i1 = $i2/@dept
order by $i1
return
<dep name="{$i1}" quant="{sum($i2/@quant)}"/>
[Diagram: XML Input 1 and XML Input 2 are parsed by the XQuery processor, which analyzes and evaluates the query and then serializes (or passes on) the XML output.]
XML Input
• Could be data that is:
– a textual XML document on a file system
– retrieved from a Web service
– stored in an XML database
– stored in a relational database
– created in memory by program code
• Can take the form of:
– a single XML document
– a collection of several documents
– a fragment of a document (e.g. sequence of elements)
XQuery 1.0 Capabilities
for $i1 in doc("order.xml")//item
for $i2 in doc("catalog.xml")//product
where $i1/@num = $i2/prodNum
order by $i1/@num, $i1/quantity
return
<item number="{$i1/@num}"
name="{$i2/prodName}"
salePrice="{min($i2/price) -
$i2/discount}"/>
• selecting elements/attributes from XML input documents (the path expressions in the for clauses)
• joining data from multiple sources (the where clause)
• adding new elements/attributes to results (the item constructor)
• performing calculations (min($i2/price) - $i2/discount)
• sorting results (the order by clause)
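For comparison, here is a rough sketch of the same select, join, calculate, sort and construct steps written by hand in Java with the JDK's DOM and XPath 1.0 APIs. The two input documents, the class name JoinSketch, and the quantity helper are assumptions made up for this example. Each sample product has a single price, so min() reduces to that price.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JoinSketch {
    // Hypothetical stand-ins for order.xml and catalog.xml.
    static final String ORDER =
        "<order><item num='563'><quantity>1</quantity></item>"
        + "<item num='557'><quantity>2</quantity></item></order>";
    static final String CATALOG =
        "<catalog>"
        + "<product><prodNum>557</prodNum><prodName>Shirt</prodName>"
        + "<price>30</price><discount>5</discount></product>"
        + "<product><prodNum>563</prodNum><prodName>Hat</prodName>"
        + "<price>70</price><discount>10</discount></product>"
        + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Text of the first quantity child, or "" if the item has none.
    static String quantity(Element item) {
        NodeList q = item.getElementsByTagName("quantity");
        return q.getLength() == 0 ? "" : q.item(0).getTextContent();
    }

    static List<String> join(String orderXml, String catalogXml) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        Document order = parse(orderXml);
        Document catalog = parse(catalogXml);

        // for $i1 in doc("order.xml")//item
        NodeList itemNodes = (NodeList) xp.evaluate("//item", order, XPathConstants.NODESET);
        List<Element> items = new ArrayList<>();
        for (int i = 0; i < itemNodes.getLength(); i++) {
            items.add((Element) itemNodes.item(i));
        }
        // order by $i1/@num, $i1/quantity (compared as strings in this sketch)
        items.sort(Comparator
            .comparing((Element e) -> e.getAttribute("num"))
            .thenComparing((Element e) -> quantity(e)));

        // for $i2 in doc("catalog.xml")//product
        NodeList products = (NodeList) xp.evaluate("//product", catalog, XPathConstants.NODESET);
        List<String> rows = new ArrayList<>();
        for (Element item : items) {
            String num = item.getAttribute("num");
            for (int j = 0; j < products.getLength(); j++) {
                Node product = products.item(j);
                // where $i1/@num = $i2/prodNum
                if (!num.equals(xp.evaluate("prodNum", product))) {
                    continue;
                }
                String name = xp.evaluate("prodName", product);
                double salePrice = (Double) xp.evaluate(
                    "price - discount", product, XPathConstants.NUMBER);
                // return <item .../> (built as text to keep the sketch short)
                rows.add("<item number=\"" + num + "\" name=\"" + name
                    + "\" salePrice=\"" + salePrice + "\"/>");
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        join(ORDER, CATALOG).forEach(System.out::println);
    }
}
```

The hand-written version needs explicit loops, casts and a comparator for what the FLWOR expression states in a few clauses.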
XQuery Use Cases
Search and Browse
• "Native" XML DBMS (e.g. MarkLogic, Berkeley DB, eXist) with a built-in XQuery processor
• Semi-structured XML content (poetry manuscripts, medical journals, hotel reviews)
• Built-in user interface or custom user interface
• Happy user asks: "What hotels in New York allow pets and have Internet access?"
"XML-izing" Data for Web Services
• Relational DBMS (e.g. SQL Server, Oracle, DB2) with a built-in XQuery front-end
• Structured data (orders, product prices, customer information)
• Order inquiry Web service
• Happy user asks: "What is the status of my order?"
Integrating Disparate Data Sources
[Diagram © DataDirect Technologies]
Anything, really...
• Anywhere in application code you would currently use XPath, XSLT, or DOM, e.g.:
– to narrow down results returned from a Web service
– in a pipeline process to split or subset an XML document
– to manipulate or create a configuration file stored as XML
XQuery Features
Features of XQuery
• Compact syntax
• Typing and schema support
• Reusable function libraries
• Designed with today's XML in mind
Compact, Intuitive Syntax
• Easy to learn and use
• Less verbose than XSLT
– but much more powerful than straight XPath
• Does not require hard-core programming background
• Ideal for embedding into programming languages
Embedding in Java
• XQJ: XQuery API for Java
– proposed Java standard for invoking queries and processing the results
– the "JDBC of XML"
// conn is an open XQConnection
XQExpression expr = conn.createExpression();
String qy = "for $p in doc('cat.xml')//product " +
            "return ($p/name)";
XQResultSequence result = expr.executeQuery(qy);
// step through the result sequence one item at a time
while (result.next()) {
  String str = result.getString();  // accessor as shown on the slide; XQJ was still a proposal
  System.out.println("Product name: " + str);
}
result.close(); expr.close(); conn.close();
Typing and Schema Support
• Typing allows for identification of query errors
• Optional schema support
– can associate a schema with a query or input document
– the schema defines the rules for the input or output XML
• names of elements/attributes
• hierarchical structure
• number of occurrences
• data types
Benefits of Using Schemas
• Better identification of static errors
– allows discovery of errors in the query that were not otherwise apparent
– especially important when new versions of the input XML vocabulary come along
• Query optimization
• Validity of query inputs and results
– makes them more predictable
• Special processing based on type
Using Schemas to Catch Static Errors
import schema default element namespace "http://datypic.com/prod"
  at "http://datypic.com/prod.xsd";
for $prod in doc("cat.xml")/produt   (: misspelling :)
order by $prod/name/number           (: invalid path; name will never have number child :)
return $prod/name + 1                (: type error: name is declared to be of type xs:string, so cannot be used in an add operation :)
Reusable Function Libraries
• Portable, reusable, shareable
• Can provide a set of standard queries on a standard XML vocabulary
• As vocabulary changes, function libraries can be recompiled and/or versioned
module namespace dty = "http://datypic.com/order";
declare function dty:orderStatus($num as xs:string?)
as element(order)* { ... };
declare function dty:cancelOrder($num as xs:string?)
as xs:boolean { ... };
Designed with Today's XML in Mind
• Intuitive, designed-in support for:
– namespaces
– construction of new elements/attributes
– data types
– whitespace handling
– etc.
• Much less awkward than, e.g., DOM manipulation
Transforming XML with XSLT
Typical XSLT Use Cases
• Transform content into presentation
– XML to HTML, XML to XSL-FO
• General purpose XML to XML transforms (data manipulation)
– B2B
– EAI
• Transform XML to other formats (text, CSV, etc.)
XSLT (Look familiar?)
<xsl:template match="order">
<xsl:for-each select="item">
<li>Item number <xsl:value-of select="@num"/></li>
</xsl:for-each>
</xsl:template>
[Diagram: XML Input 1 and XML Input 2 are parsed by the XSLT processor, which analyzes and evaluates the stylesheet and then serializes (or passes on) the XML output.]
XSLT 2.0 - What's New?
• Grouping
• Multiple result documents
• Temporary result trees
• XPath 2.0 enhancements
– more powerful syntax
– more built-in functions
• Schema support and type system
Schema Support and Type System
• Same typing/schema features as XQuery
• Special processing based on type:
<xsl:template match="element(*,USAddressType)">
.... <xsl:value-of select="city"/>
<xsl:value-of select="zipCode"/>
</xsl:template>
<xsl:template match="element(*,UKAddressType)">
.... <xsl:value-of select="postCode"/>
<xsl:value-of select="city"/>
</xsl:template>
XSLT Conveniences (not present in XQuery)
• Highly flexible recursive processing
– allows "Push" approach
• Grouping syntax is more explicit, and therefore easier to use
• Formatting of dates and numbers
– format-date, format-number
• Advanced string manipulation
– analyze-string
• Ability to customize/override stylesheets
Pull vs. Push Approaches
• Pull
– go get element X and do this with it
– next, go get element Y and do this with it
• Push
– get the root element
• if it happens to be X, do this with it.
• if it happens to be Y, do this with it.
• if it's anything else, skip it.
– next, go get its children and repeat
Pull Approach
• Pulling the information from the input document using hardcoded paths to specific locations
• Requires a predictable document structure
<xsl:template match="order">
<xsl:for-each select="item">
<li>Item #<xsl:value-of select="@num"/></li>
</xsl:for-each>
</xsl:template>
Push Approach
• Traversing a document, taking each element as it comes, then deciding what to do with it
• Useful when the structure of the input file is not known, or is highly flexible
• Flexible but not optimized
• Very difficult to do in XQuery
Sample Stylesheet in "Push" Style
<xsl:template match="order">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="item">
<xsl:apply-templates select="@*"/>
</xsl:template>
<xsl:template match="@num">
<li>Item # <xsl:value-of select="."/></li>
</xsl:template>
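To see the pull and push styles side by side, the two stylesheets can be run from Java with the JDK's built-in XSLT 1.0 processor. This is a minimal sketch under stated assumptions: the input order document, the class name PushPullDemo, and the xsl:stylesheet and xsl:output wrappers are added for this example, and the "Item #" text is written identically in both stylesheets so the outputs can be compared.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PushPullDemo {
    // Hypothetical input document, invented for this sketch.
    static final String ORDER =
        "<order><item num='557'/><item num='563'/></order>";

    // Wrapper shared by both stylesheets (not shown on the slides).
    static final String HEAD =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // Pull style: one template reaches into the input along fixed paths.
    static final String PULL = HEAD
        + "<xsl:template match='order'>"
        + "<xsl:for-each select='item'>"
        + "<li>Item #<xsl:value-of select='@num'/></li>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Push style: each node is handed to whichever template matches it.
    static final String PUSH = HEAD
        + "<xsl:template match='order'>"
        + "<xsl:apply-templates select='*'/>"
        + "</xsl:template>"
        + "<xsl:template match='item'>"
        + "<xsl:apply-templates select='@*'/>"
        + "</xsl:template>"
        + "<xsl:template match='@num'>"
        + "<li>Item #<xsl:value-of select='.'/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String stylesheet, String input) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(PULL, ORDER));
        System.out.println(transform(PUSH, ORDER));
    }
}
```

For a predictable input like this one, both styles should produce the same list items. The push version would keep working if items were nested differently, as long as some template matched them.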
XQuery and XSLT: Shared Components
Shared Components
• XPath 2.0
• Built-in functions
• Data model
XPath 2.0
• Full compatibility across XQuery and XSLT
– same syntax
– same expression will always return the same value
• Much more than just path expressions
for $a in fn:distinct-values(/bib/book/author)
return ($a, /bib/book[author = $a]/title)
some $emp in /emps/employee satisfies
($emp/bonus > 0.25 * $emp/salary)
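The quantified expression asks whether at least one employee has a bonus above a quarter of their salary. As a hedged illustration, the sketch below evaluates the same test from Java. The JDK's built-in engine supports only XPath 1.0, so it uses the equivalent predicate form instead of some/satisfies, and assumes each employee has exactly one bonus and one salary. The sample data and the class name QuantifiedSketch are invented for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class QuantifiedSketch {
    // Hypothetical data: the second employee's bonus exceeds 25% of salary.
    static final String EMPS =
        "<emps>"
        + "<employee><salary>1000</salary><bonus>100</bonus></employee>"
        + "<employee><salary>1000</salary><bonus>300</bonus></employee>"
        + "</emps>";

    // XPath 1.0 stand-in for:
    //   some $emp in /emps/employee satisfies ($emp/bonus > 0.25 * $emp/salary)
    static boolean someBonusOverQuarter(String xml) throws Exception {
        String expr = "boolean(/emps/employee[bonus > 0.25 * salary])";
        return (Boolean) XPathFactory.newInstance().newXPath()
            .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(someBonusOverQuarter(EMPS));
    }
}
```

The XPath 2.0 form names the range variable explicitly, which becomes more useful when the condition spans several steps or variables.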
Over 100 Built-In Functions: A Sample
• String-related
– substring, contains, matches, tokenize
• Date-related
– current-date, month-from-date
• Number-related
– round, avg, sum, ceiling
• Sequence-related
– index-of, insert-before, reverse
• Document- and URI-related
– collection, doc, root, base-uri
XQuery/XPath Data Model
XQuery vs. XSLT: Decision Factors
• Use case
• Availability of relevant implementations
• Performance
• Programming style
Use Case
• Use XSLT if:
– your documents are highly variable
– your transformation is presentation-oriented
– your processing is heavily recursive
• Use XQuery if:
– you are selecting a small subset of a collection of XML data
– you are joining data from multiple sources
– your documents are predictable in structure, or variations are not relevant to your searches
Availability of Relevant Implementations
• XQuery
– XML DBMSs: MarkLogic, Sleepycat Berkeley DB, X-Hive, eXist
– Relational DBMSs: Oracle, SQL Server, DB2
– Standalone: Saxon
– XML Editors: Stylus Studio, XMLSpy, Oxygen
• XSLT 2.0
– Standalone: Saxon
– XML Editors: Stylus Studio, XMLSpy
Performance
• XQuery implementations tend to be optimized for:
– XML stored in a database
– predictable document structures that can be indexed
• XSLT implementations tend to be optimized for:
– transforming an entire document that can be loaded into memory
• Differences are driven more by use cases than by limitations of the languages
Programming Style
• XSLT
– recursive template language is difficult for some developers to grasp
– verbosity can be irritating
– however, many users love it
• XQuery
– appealing to SQL users
– probably easier for newcomers
Conclusions
• XQuery and XSLT 2.0 are coming of age
• They overlap in capabilities...
– but differ in use cases and sweet spots
• Both take XML manipulation to a new level in terms of:
– power
– flexibility
– production-readiness
Resources
• Detailed technical comparison of XQuery and XSLT 2.0
– Michael Kay's paper from XTech 05:
– http://www.idealliance.org/proceedings/xtech05/papers/02-03-01/
• XQuery implementations
– http://www.w3.org/XML/Query
Learning XQuery
• My tutorial on XQuery:
– http://datypic.com/services/xquery
• Definitive XQuery
– By Priscilla Walmsley
– Coming in 2006
Thank you for your interest.
For more information please contact me at:
Email: [email protected]
Web: http://www.datypic.com