Module 5 Introduction to XQuery

Upload: idona-hopkins

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Module 5 Introduction to XQuery

Module 5

Introduction to XQuery

Page 2: Module 5 Introduction to XQuery

04/19/23 2

XML is now everywhere

• Google search (warning: unreliable numbers)
  • 285.000.000 for XML
  • 1.000.000 for XQuery
  • 11.000.000 for XSLT
  • 12.000.000 for XML Schema
  • 60.000.000 for .NET
  • 200.000.000 for Java
  • 64.000.000 for SQL
• The highest Google number among all the technology buzzwords that I searched (except RSS)

Page 3: Module 5 Introduction to XQuery

Sources of XML data

1. Inter-application communication data (WS, REST, etc.)
2. Mobile devices communication data
3. Logs
4. Blogs (RSS)
5. Metadata (e.g. Schema, WSDL, XMP)
6. Presentation data (e.g. XHTML)
7. Documents (e.g. Word)
8. Views of other sources of data
   • Relational, LDAP, CSV, Excel, etc.
9. Sensor data

Page 4: Module 5 Introduction to XQuery

Some vertical application domains for XML

• Health Level Seven (HL7) http://www.hl7.org/
• Geography Markup Language (GML)
• Systems Biology Markup Language (SBML) http://sbml.org/
• XBRL, the XML based Business Reporting standard http://www.xbrl.org/
• Global Justice XML Data Model (GJXDM) http://it.ojp.gov/jxdm
• ebXML http://www.ebxml.org/
• Encoded Archival Description Application http://lcweb.loc.gov/ead/
• Digital photography metadata XMP
• An XML grammar for sensor data (SensorML)
• Really Simple Syndication (RSS 2.0)

Basically everywhere.

Page 5: Module 5 Introduction to XQuery

Processing the XML data

• Huge amount of XML information, and growing
• We need to "manage" it, and then "process" it
  • Store it efficiently
  • Verify the correctness
  • Filter, search, select, join, aggregate
  • Create new pieces of information
  • Clean, normalize the data
  • Update it
  • Take actions based on the existing data
  • Write complex execution flows
• No conceptual organization like for relational databases (applications are too heterogeneous)

Page 6: Module 5 Introduction to XQuery

Frequent solutions to XML data management

1. Map it to generic programming APIs (e.g. DOM, SAX, StAX)
2. Manually map it to non-generic APIs
3. Automatically map it to non-generic structures
4. Use XML extensions of existing languages
5. Shredding for relational stores
6. Native XML processing through XSLT and XQuery

Page 7: Module 5 Introduction to XQuery

1. Mapping to generic structures

• Represent the data:
  • Original Unicode form, or
  • Some binary representation (e.g. FastInfoset)
• Store it:
  • Directly on a file system, or
  • On a "transacted" file system (e.g. SleepyCat, or a relational database)
• Map the XML data to generic XML programmatic APIs
  • E.g. DOM, SAX, StAX (JSR 173), XMLReader
• Use the native programming languages (e.g. Java, C#) to manipulate the data
• Re-serialize it at the end
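The parse / manipulate / re-serialize cycle above can be sketched with a generic DOM API. This is only an illustration in Python's standard xml.dom.minidom module; the line item contents ("pen", "ink") and the function name are assumptions, not part of the slides.

```python
from xml.dom import minidom

# Hypothetical input; the slides elide the line item contents.
SOURCE = "<purchaseOrder><lineItem>pen</lineItem><lineItem>ink</lineItem></purchaseOrder>"


def uppercase_line_items(xml_text: str) -> str:
    """Parse, manipulate through the generic DOM API, and re-serialize."""
    doc = minidom.parseString(xml_text)
    for item in doc.getElementsByTagName("lineItem"):
        # Only generic nodes are available: there is no PurchaseOrder type.
        text_node = item.firstChild
        text_node.nodeValue = text_node.nodeValue.upper()
    # Re-serialize the root element (without an XML declaration).
    return doc.documentElement.toxml()


result = uppercase_line_items(SOURCE)
print(result)
```

Note that the application logic has to know the element names as strings, which is the "hard coded mapping" drawback of generic APIs.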

Page 8: Module 5 Introduction to XQuery

1. Manual mapping to generic structures (example)

<purchaseOrder>
  <lineItem>…..</lineItem>
  <lineItem>…..</lineItem>
</purchaseOrder>

<book>
  <author>…</author>
  <title>….</title>
  …..
</book>

Hard coded mappings

interface DomNode {
    String getNodeName();
    String getNodeValue();
    void setNodeValue(String nodeValue);
    short getNodeType();
}

Page 9: Module 5 Introduction to XQuery

2. Manual mapping to non-generic structures

<purchaseOrder>
  <lineItem>…..</lineItem>
  <lineItem>…..</lineItem>
</purchaseOrder>

<book>
  <author>…</author>
  <title>….</title>
  …..
</book>

Hard coded mappings

interface PurchaseOrder {
    List getLineItems();
    ……..
}

interface Book {
    List getAuthor();
    String getTitle();
    ……
}
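A hand-written mapping to a non-generic structure might look like the sketch below. It is written in Python rather than Java for brevity; the Book dataclass and the book_from_xml helper are hypothetical names, and the sample values are borrowed from the book example used later in the deck.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Book:
    """Application-specific (non-generic) view of a <book> element."""
    authors: List[str]
    title: str


def book_from_xml(xml_text: str) -> Book:
    """Hard-coded mapping: element names are baked into the code."""
    root = ET.fromstring(xml_text)
    return Book(
        authors=[author.text or "" for author in root.findall("author")],
        title=root.findtext("title", default=""),
    )


book = book_from_xml(
    "<book><author>R.D. Laing</author>"
    "<title>The politics of experience</title></book>"
)
print(book)
```

The mapping has to be rewritten by hand whenever the XML vocabulary changes, which is what approach 3 tries to automate.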

Page 10: Module 5 Introduction to XQuery

3. Automatic mapping to non-generic structures

<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minoccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>

Automatic mapping (e.g. XMLBeans)

interface BookType {
    int getYear();
    String getTitle();
    List getAuthors();
    ……..
}

Page 11: Module 5 Introduction to XQuery

4. XML extensions of existing procedural languages

• Examples:
  • C-omega, ECMAScript, PHP extensions, Python extensions, etc.
• Most of them define:
  • A way of importing XML data into their native type system
  • A rich API for XML data manipulation
  • A way of navigating/searching/querying the XML data via their extensions (XPath based or XPath inspired)
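As one illustration of an "XPath inspired" navigation facility inside a host language, Python's standard xml.etree.ElementTree module accepts a limited XPath subset. The catalog document below, including the second book, is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: only the 1967 book appears in the slides.
catalog = ET.fromstring(
    "<catalog>"
    "<book year='1967'><title>The politics of experience</title></book>"
    "<book year='1999'><title>Another title</title></book>"
    "</catalog>"
)

# XPath-like path with an attribute predicate.
titles_1967 = [t.text for t in catalog.findall("./book[@year='1967']/title")]

# Once selected, the nodes are ordinary host-language objects.
all_years = [int(b.get("year")) for b in catalog.iter("book")]

print(titles_1967, all_years)
```

The query part is a string embedded in the host language, while the rest of the processing uses the host language's own type system.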


Page 12: Module 5 Introduction to XQuery

6. Native XML processing: XSLT and XQuery

• Most promising alternative for the future.
• The only alternative such that:
  • the data is modeled only once
  • it is well integrated with the XML Schema type system
  • it preserves the logical/physical data independence
  • the code deals with non-generic structures
  • code can be optimized automatically
• Data is stored:
  • in plain file systems, or
  • in sophisticated data stores (e.g. XML extensions of relational stores)
• Missing pieces, under development
  • E.g. no procedural logic

Page 13: Module 5 Introduction to XQuery

Why XQuery?

• Why a "query" language for XML?
  • Need to process XML data
  • Preserve logical/physical data independence
    • The semantics is described in terms of an abstract data model, independent of the physical data storage
  • Declarative programming
    • Such programs should describe the "what", not the "how"
• Why a native query language? Why not SQL?
  • We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
• Why another XML processing language? Why not XSLT?
  • The template nature of XSLT was not appealing to the database people. Not declarative enough.


Page 14: Module 5 Introduction to XQuery

What is XQuery?

• A programming language that can express arbitrary XML to XML data transformations
  • Logical/physical data independence
  • "Declarative"
  • "High level"
  • "Side-effect free"
  • "Strongly typed" language
• "An expression language for XML."
• Commonalities with functional programming, imperative programming and query languages
• The "query" part might be a misnomer (***)

Page 15: Module 5 Introduction to XQuery

XQuery family of standards

• XQuery 1.0: An XML Query Language: an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
• XSL Transformations (XSLT) Version 2.0: transforms data model instances (XML and non-XML) into other documents, including into XSL-FO for printing
• XML Path Language (XPath) 2.0: expression syntax for referring to parts of XML documents
• XQuery 1.0 and XPath 2.0 Functions and Operators: the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
• XQuery 1.0 and XPath 2.0 Data Model (XDM): representation and access for both XML and non-XML sources
• XSLT 2.0 and XQuery 1.0 Serialization: how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
• XML Syntax for XQuery 1.0 (XQueryX): an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
• XQuery 1.0 and XPath 2.0 Formal Semantics: the type system used in XQuery and XSLT 2 via XPath defined precisely for implementers

Page 16: Module 5 Introduction to XQuery

XQuery, XPath, XSLT

[Diagram, timeline 1999 to 2007: XSLT 1.0 uses XPath 1.0; XPath 2.0 extends XPath 1.0 (almost backwards compatible); XSLT 2.0 uses XPath 2.0; XQuery 1.0 extends XPath 2.0 with FLWOR expressions, node constructors and validation]

Page 17: Module 5 Introduction to XQuery

Roadmap for today

• XQuery Data Model (XDM)
• XQuery type system
• XQuery environment
• XQuery basic constructs
  • variables
  • constants
  • function calls, function library
  • arithmetic operations
  • boolean operations
  • path expressions
  • conditionals

Page 18: Module 5 Introduction to XQuery

The need for an abstract XML data model

• XML 1.0 specification only talks about characters
• We cannot have a programming language processing "characters" (one by one)
• An XML abstract/logical data model!?
• Unfortunately too many of those
  • Infoset, PSVI, DOM, XDM, etc.

Page 19: Module 5 Introduction to XQuery

XML Data Model (XDM)

• Abstract (i.e. logical) data model for XML data
• Same role for XQuery as the relational data model for SQL
• Purely logical: no standard storage or access model (on purpose)
• XQuery is closed with respect to the Data Model

[Diagram: Infoset and PSVI feed into the XML Data Model, which is shared by XQuery, XPath 2.0 and XSLT 2.0]

Page 20: Module 5 Introduction to XQuery

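XML Data model life cycle

[Diagram: an .xml document is parsed, and optionally validated against an .xsd schema, into an XQuery Data Model instance; XPath 2.0, XQuery or XSLT 2.0 process it into another XQuery Data Model instance, which is either serialized back to .xml or handed over in an application-dependent way]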

Page 21: Module 5 Introduction to XQuery

XML Data Model

• Instance of the data model: a sequence composed of zero or more items
• The empty sequence is often considered as the "null value"
• Items: nodes or atomic values
• Nodes: document | element | attribute | text | namespaces | PI | comment
• Atomic values
  • Instances of all XML Schema atomic types: string, boolean, ID, IDREF, decimal, QName, URI, ...
  • untyped atomic values
• Typed (i.e. schema validated) and untyped (i.e. non schema validated) nodes and values

Remember Lisp?

Page 22: Module 5 Introduction to XQuery

Sequences

• Can be heterogeneous (nodes and atomic values)
    (<a/>, 3)
• Can contain duplicates (by value and by identity)
    (1,1,1)
• Are not necessarily ordered in document order
• Nested sequences are automatically flattened
    ( 1, 2, (3, 4) ) = (1, 2, 3, 4)
• Single items and singleton sequences are the same
    1 = (1)
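The flattening rule can be modelled with a small helper. This is a sketch only: Python tuples stand in for XDM sequences, the string "<a/>" stands in for a node, and seq is a hypothetical name.

```python
def seq(*items):
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            # A sequence inside a sequence contributes its items, not itself.
            flat.extend(seq(*item))
        else:
            flat.append(item)
    return tuple(flat)


nested = seq(1, 2, seq(3, 4))   # models ( 1, 2, (3, 4) )
singleton = seq(1)              # models (1)
mixed = seq("<a/>", 3)          # heterogeneous: a node stand-in and a number
duplicates = seq(1, 1, 1)       # duplicates are kept

print(nested, singleton, mixed, duplicates)
```

In this model wrapping a sequence again changes nothing, which mirrors the rule that an item and the singleton sequence containing it are indistinguishable.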

Page 23: Module 5 Introduction to XQuery

Atomic values

• The values of the 19 atomic types available in XML Schema
  • E.g. xs:integer, xs:boolean, xs:date
• All the user defined derived atomic types
  • E.g. myNS:ShoeSize
• xs:untypedAtomic
• Atomic values carry their type together with the value
  • (8, myNS:ShoeSize) is not the same as (8, xs:integer)

Page 24: Module 5 Introduction to XQuery

XML nodes

• 7 types of nodes: document | element | attribute | text | namespaces | PI | comment
• Every node has a unique node identifier
  • Scope of node identifier uniqueness is implementation dependent
• Nodes have children and an optional parent
  • conceptual "tree"
• Nodes are ordered based on the topological order in the tree ("document order")

Page 25: Module 5 Introduction to XQuery

Node accessors

  node-kind : xs:string
  node-name : xs:QName ?
  parent : node() ?
  string-value : xs:string
  typed-value : xs:anyAtomicType *
  type-name : xs:QName ?
  children : node() *
  attributes : attribute() *
  namespaces : node() *

Page 26: Module 5 Introduction to XQuery

Example of well formed XML data

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>

• 3 element nodes, 1 attribute node, 5 text nodes
• name(book element) = {-}:book
• In the absence of schema validation
  • type(book element) = xs:untyped
  • type(author element) = xs:untyped
  • type(year attribute) = xs:untypedAtomic
  • typed-value(author element) = ("R.D. Laing", xs:untypedAtomic)
  • typed-value(year attribute) = ("1967", xs:untypedAtomic)
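The node counts above include the whitespace-only text nodes between the elements. A quick way to check this is to walk the tree with a generic DOM parser; the sketch below uses Python's xml.dom.minidom, and count_nodes is a hypothetical helper.

```python
from xml.dom import Node, minidom

BOOK = (
    '<book year="1967">\n'
    '  <title>The politics of experience</title>\n'
    '  <author>R.D. Laing</author>\n'
    '</book>'
)


def count_nodes(node, counts):
    """Recursively count element, attribute and text nodes below `node`."""
    if node.nodeType == Node.ELEMENT_NODE:
        counts["element"] += 1
        counts["attribute"] += node.attributes.length
    elif node.nodeType == Node.TEXT_NODE:
        counts["text"] += 1  # includes whitespace-only text nodes
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts


counts = count_nodes(
    minidom.parseString(BOOK).documentElement,
    {"element": 0, "attribute": 0, "text": 0},
)
print(counts)
```

Two of the five text nodes carry the title and the author name; the other three are the line breaks and indentation inside the book element.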

Page 27: Module 5 Introduction to XQuery

XML schema example

<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minoccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>

Page 28: Module 5 Introduction to XQuery

Schema validated XML data

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>

• After schema validation
  • type(book element) = {uri}:book-type
  • type(author element) = xs:string
  • type(year attribute) = xs:integer
  • typed-value(author element) = ("R.D. Laing", xs:string)
  • typed-value(year attribute) = (1967, xs:integer)
• Schema validation impacts the data model representation and therefore the XQuery semantics!!

Page 29: Module 5 Introduction to XQuery

Lexical and binary aspect of the data

• Every node holds (logically) redundant information:
    <a xsi:type="xs:integer">001</a>
  • dm:string-value(): "001" as xs:string
  • dm:typed-value():
    • "001" as an xs:untypedAtomic before validation
    • 1 as an xs:integer after validation
• Implementations can store:
  • The string value
    • Retrieve the typed value dynamically based on the type, every time it is needed
  • The typed value
    • Retrieve an acceptable lexical value for that type every time this is required
  • Both
• In case of unvalidated data the two are the same
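The first two storage strategies can be contrasted in a few lines. This is a sketch for the xs:integer case only; both class names are hypothetical, and Python's int stands in for xs:integer.

```python
class StringBackedInteger:
    """Stores only the string value; derives the typed value on demand."""

    def __init__(self, lexical: str):
        self.lexical = lexical

    def string_value(self) -> str:
        return self.lexical

    def typed_value(self) -> int:
        # Recomputed from the lexical form every time it is needed.
        return int(self.lexical)


class ValueBackedInteger:
    """Stores only the typed value; regenerates a lexical form on demand."""

    def __init__(self, lexical: str):
        self.value = int(lexical)

    def string_value(self) -> str:
        # An acceptable lexical form, not necessarily the original one.
        return str(self.value)

    def typed_value(self) -> int:
        return self.value


a = StringBackedInteger("001")
b = ValueBackedInteger("001")
print(a.string_value(), a.typed_value(), b.string_value(), b.typed_value())
```

Both nodes agree on the typed value, but only the string-backed one can return the original lexical form "001"; the value-backed one returns "1".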

Page 30: Module 5 Introduction to XQuery

Typed vs. untyped XML Data

• Untyped data (non XML Schema validated)
    <a>3</a> eq 3
    <a>3</a> eq "3"
• Typed data (after XML Schema validation)
    <a xsi:type="xs:integer">3</a> eq 3
    <a xsi:type="xs:string">3</a> eq 3
    <a xsi:type="xs:integer">3</a> eq "3"
    <a xsi:type="xs:string">3</a> eq "3"

Page 31: Module 5 Introduction to XQuery

XML data equivalence

• XQuery has multiple notions of data "equality"
  • "=", "eq", "is", "fn:deep-equal()"
• Expected properties:
  • Transitivity, reflexivity and symmetry
  • Necessary for grouping, indexing and hashing
• Additional property:
  • if ( data1 equal data2 ) then ( f(data1) equal f(data2) )
  • Necessary for memoization, caching
• None of the equality relationships above (except "is") satisfies those properties
• The "is" relationship only applies to nodes
• Careful implementations for indexes, hashing, caches
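To see why "=" fails these properties, recall that the general comparison is existential: it is true as soon as one pair of items from the two sequences compares equal. The sketch below models only that aspect, with Python tuples of integers standing in for sequences; general_eq is a hypothetical name.

```python
def general_eq(left, right):
    """Existential comparison: true if any pair of items is equal."""
    return any(x == y for x in left for y in right)


# Not transitive: a = b and b = c hold, but a = c does not.
a, b, c = (1, 2), (2, 3), (3, 4)
transitive_chain = (general_eq(a, b), general_eq(b, c), general_eq(a, c))

# Not reflexive: the empty sequence is not "=" to itself.
empty_equals_itself = general_eq((), ())

print(transitive_chain, empty_equals_itself)
```

An index or hash table keyed on such a relation cannot simply bucket values, which is why implementations need care here.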

Page 32: Module 5 Introduction to XQuery

Document order

<book year="1967" price="45.32">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>

• How many nodes here?
• What is the order between nodes?

Page 33: Module 5 Introduction to XQuery

Document order

<book(n1) year(n2)="1967" price(n3)="45.32">(n4)
  <title(n5)>(n6)The politics of experience</title>(n7)
  <author(n8)>(n9)R.D. Laing</author>
</book>

• How many nodes here? 9
• What is the order between nodes?
  • n1 before all the others
  • order of n2 and n3 non-deterministic
  • n2 and n3 are before n4, n5, n6, n7, n8, n9
  • n4 < n5 < n6 < n7 < n8 < n9 (top-down, left to right among the children)
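Document order corresponds to a pre-order traversal in which an element's attributes come right after the element and before its children. The sketch below uses Python's xml.dom.minidom; document_order is a hypothetical helper, and the input is written without whitespace, so the whitespace text nodes n4 and n7 of the slide do not appear.

```python
from xml.dom import Node, minidom


def document_order(node):
    """Return node labels in document order (pre-order, attributes first)."""
    labels = []
    if node.nodeType == Node.ELEMENT_NODE:
        labels.append(node.tagName)
        attrs = node.attributes
        # Attributes follow their element; their relative order is not fixed.
        labels.extend("@" + attrs.item(i).name for i in range(attrs.length))
    elif node.nodeType == Node.TEXT_NODE:
        labels.append("#text")
    for child in node.childNodes:
        labels.extend(document_order(child))
    return labels


doc = minidom.parseString(
    '<book year="1967" price="45.32">'
    '<title>The politics of experience</title>'
    '<author>R.D. Laing</author>'
    '</book>'
)
order = document_order(doc.documentElement)
print(order)
```

The element comes first, the two attributes occupy the next two positions in some order, and the children follow top-down, left to right.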

Page 34: Module 5 Introduction to XQuery

XQuery type system

• XQuery has a powerful (and complex!) type system
• XQuery types are imported from XML Schemas
• Every XML data model instance has a dynamic type
• Every XQuery expression has a static type
• Pessimistic static type inference
• The goal of the type system is:
  1. detect statically errors in the queries
  2. infer the type of the result of valid queries
  3. ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type

Page 35: Module 5 Introduction to XQuery

XQuery type system components

• Atomic types
  • xs:untypedAtomic
  • All 19 primitive XML Schema types
  • All user defined atomic types
• Empty, None
• Type constructors (simplification!)
  • Elements: element name {type}
  • Attributes: attribute name {type}
  • Alternation: type1 | type2
  • Sequence: type1, type2
  • Repetition: type*
  • Interleaved product: type1 & type2
• type1 intersect type2 ?
• type1 subtype of type2 ?
• type1 equals type2 ?

Page 36: Module 5 Introduction to XQuery

XML queries

• An XQuery basic structure:
  • a prolog + an expression
• Role of the prolog:
  • Populate the context where the expression is compiled and evaluated
• Prolog contains:
  • namespace definitions
  • schema imports
  • default element and function namespace
  • function definitions
  • collation declarations
  • function library imports
  • global and external variable definitions
  • etc.

Page 37: Module 5 Introduction to XQuery


XQuery processingXQuery processing

Page 38: Module 5 Introduction to XQuery

XQuery expressions

XQuery Expr :=
    Constants | Variable | FunctionCalls | PathExpr |
    ComparisonExpr | ArithmeticExpr | LogicExpr |
    FLWRExpr | ConditionalExpr | QuantifiedExpr |
    TypeSwitchExpr | InstanceofExpr | CastExpr |
    UnionExpr | IntersectExceptExpr |
    ConstructorExpr | ValidateExpr

• Expressions can be nested with full generality!
• Functional programming heritage (ML, Haskell, Lisp)

Page 39: Module 5 Introduction to XQuery

Constants

• XQuery grammar has built-in support for:
  • Strings: "125.0" or '125.0'
  • Integers: 150
  • Decimal: 125.0
  • Double: 125.e2
• 19 other atomic types available via XML Schema
• Values can be constructed
  • with constructors in F&O doc: fn:true(), fn:date("2002-5-20")
  • by casting
  • by schema validation

Page 40: Module 5 Introduction to XQuery

Variables

• $ + QName (e.g. $x, $ns:foo)
• bound, not assigned
• XQuery does not allow variable assignment
• created by let, for, some/every, typeswitch expressions, function parameters
• example:
    let $x := ( 1, 2, 3 )
    return count($x)
• above scoping ends at conclusion of return expression

Page 41: Module 5 Introduction to XQuery

A built-in function sampler

  fn:document(xs:anyURI) => document?
  fn:empty(item*) => boolean
  fn:index-of(item*, item) => xs:unsignedInt?
  fn:distinct-values(item*) => item*
  fn:distinct-nodes(node*) => node*
  fn:union(node*, node*) => node*
  fn:except(node*, node*) => node*
  fn:string-length(xs:string?) => xs:integer?
  fn:contains(xs:string, xs:string) => xs:boolean
  fn:true() => xs:boolean
  fn:date(xs:string) => xs:date
  fn:add-date(xs:date, xs:duration) => xs:date

See Functions and Operators W3C specification

Page 42: Module 5 Introduction to XQuery

04/19/23 42

Atomization

fn:data(item*) -> xs:anyAtomicType*

Extracts the "value" of a node, or returns the atomic value itself.

Implicitly applied in:
• Arithmetic expressions
• Comparison expressions
• Function calls and returns
• Cast expressions
• Constructor expressions for various kinds of nodes
• order by clauses in FLWOR expressions
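The difference between explicit and implicit atomization can be sketched as follows. The element name price is made up for the example, and the content is untyped because no schema validation is involved.

```xquery
let $price := <price>42</price>
return (
  fn:data($price),          (: explicit atomization: the untyped atomic value 42 :)
  $price + 1,               (: implicit atomization in arithmetic, then a cast to xs:double: 43 :)
  fn:string-length($price)  (: implicit atomization in a function call: 2 :)
)
```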


Constructing sequences

(1, 2, 2, 3, 3, <a/>, <b/>)

• "," is the sequence concatenation operator
• Nested sequences are flattened: (1, 2, 2, (3, 3)) => (1, 2, 2, 3, 3)
• Range expressions: (1 to 3) => (1, 2, 3)
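A short illustration of flattening, as a sketch: nested parentheses, empty sequences, and ranges all disappear into one flat sequence, so positions are counted over the flattened items.

```xquery
let $nested := (1, (2, 3), (), (4 to 6))
(: the flattened sequence is (1, 2, 3, 4, 5, 6), so the result is (6, 4) :)
return (count($nested), $nested[4])
```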


Combining sequences

• union, intersect, except
• Work only for sequences of nodes, not atomic values
• Eliminate duplicates and reorder to document order

Given $x := <a/>, $y := <b/>, $z := <c/>:

($x, $y) union ($y, $z) => (<a/>, <b/>, <c/>)

• The F&O specification provides other functions and operators; e.g. fn:distinct-values() and fn:distinct-nodes() are particularly useful (fn:distinct-nodes() appears in early drafts only)
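The three operators can be tried side by side in a sketch like the one below. It assumes the nodes are taken from a single constructed tree (the wrapper element r is invented for the example), so that document order between them is well defined.

```xquery
let $doc := <r><a/><b/><c/></r>
let $x := $doc/a, $y := $doc/b, $z := $doc/c
return (
  ($z, $y) union ($y, $x),      (: <a/>, <b/>, <c/>: duplicates removed, document order restored :)
  ($x, $y) intersect ($y, $z),  (: <b/> :)
  ($x, $y) except ($y, $z)      (: <a/> :)
)
```

Applying any of these operators to atomic values, as in (1, 2) union (2, 3), is a type error.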


Arithmetic expressions

1 + 4
$a div 5
5 div 6
$b mod 10
1 - (4 * 8.5)
-55.5

<a>42</a> + 1
<a>baz</a> + 1
validate { <a xsi:type="xs:integer">42</a> } + 1
validate { <a xsi:type="xs:string">42</a> } + 1

Apply the following rules:
• Atomize all operands; if either operand is (), the result is ()
• If an operand is untyped, cast it to xs:double (if the cast fails, => error)
• If the operand types differ but can be promoted to a common type, do so (e.g. xs:integer can be promoted to xs:double)
• If the operator is consistent with the types, apply it; the result is either an atomic value or an error
• If the types are not consistent, raise a type error
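The rules above can be traced on a few operands. This is a sketch that uses only untyped and literal values, so it does not cover the validate cases.

```xquery
(
  <a>42</a> + 1,   (: atomized, untyped content cast to xs:double: 43 :)
  () + 1,          (: one operand is empty: the result is the empty sequence :)
  1 + 2.5e0,       (: xs:integer promoted to xs:double: 3.5 :)
  5 div 6          (: div on two integers yields an xs:decimal, not an integer :)
)
(: <a>baz</a> + 1 would raise an error, because "baz" cannot be cast to xs:double :)
```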


Logical expressions

• expr1 and expr2
• expr1 or expr2
• fn:not() is a function
• They return true or false

Different from SQL:
• Two-valued logic, not three-valued logic

Different from imperative languages:
• and, or are commutative in XQuery, but not in Java, so the second operand of the following guard may be evaluated even when the first is false:
  if (($x castable as xs:integer) and (($x cast as xs:integer) eq 2)) ...
• Non-deterministic: false and error => false or error (non-deterministically)

Rules:
• First compute the Boolean Effective Value (BEV) of each operand (the specification calls it the effective boolean value, EBV):
  - if it is (), "", NaN, or 0, return false
  - if the operand is of type xs:boolean, return it
  - if the operand is a single non-empty string or non-zero number, return true
  - if the operand is a sequence whose first item is a node, return true
  - otherwise raise an error
• Then apply standard two-valued Boolean logic to the two BEVs
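The effective boolean value rules can be checked directly with fn:boolean, which applies the same computation. The sketch below also shows one way to write the castable guard so that it does not rely on evaluation order; the value bound to $x is an arbitrary example.

```xquery
let $x := "abc"
return (
  fn:boolean(()),       (: false: empty sequence :)
  fn:boolean(""),       (: false: zero-length string :)
  fn:boolean(0),        (: false: numeric zero :)
  fn:boolean(<a/>),     (: true: the first item is a node :)
  fn:boolean("false"),  (: true: any non-empty string :)
  (: a conditional evaluates only the selected branch, so the cast is safe here :)
  if ($x castable as xs:integer)
  then ($x cast as xs:integer) eq 2
  else fn:false()
)
(: fn:boolean((1, 2)) would raise an error: several atomic values have no effective boolean value :)
```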


Comparisons

• Value comparisons, for comparing single values: eq, ne, lt, le, gt, ge
• General comparisons, with existential quantification + automatic type coercion: =, !=, <=, <, >, >=
• Node comparisons, for testing identity of single nodes: is, isnot (isnot was dropped from the final XQuery 1.0 Recommendation)
• Order comparisons, for testing the relative position of one node vs. another (in document order): <<, >>


Value and general comparisons

<a>42</a> eq "42" => true
<a>42</a> eq 42 => error
<a>42</a> eq "42.0" => false
<a>42</a> eq 42.0 => error
<a>42</a> = 42 => true
<a>42</a> = 42.0 => true
<a>42</a> eq <b>42</b> => true
<a>42</a> eq <b> 42</b> => false
<a>baz</a> eq 42 => error
() eq 42 => ()
() = 42 => false
(<a>42</a>, <b>43</b>) = 42.0 => true
(<a>42</a>, <b>43</b>) = "42" => true
ns:shoesize(5) eq ns:hatsize(5) => true
(1, 2) = (2, 3) => true


Algebraic properties of comparisons

• General comparisons are neither reflexive nor transitive
  - (1, 3) = (1, 2) is true (but so are !=, <, >, <=, >= on the same operands!)
  - Reasons: implicit existential quantification, dynamic casts
• The negation rule does not hold
  - fn:not($x = $y) is not equivalent to $x != $y
• Value comparisons are almost transitive
  - Exception: xs:decimal, due to the loss of precision
• Impact on grouping, hashing, indexing, caching!
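These properties are easy to verify on small integer sequences. The following sketch evaluates each claim; every comparison is existential, so it is true as soon as one pair of items satisfies it.

```xquery
let $x := (1, 3), $y := (1, 2)
return (
  $x = $y,          (: true: 1 = 1 :)
  $x != $y,         (: true: 1 != 2 :)
  fn:not($x = $y),  (: false, so it is not the same as $x != $y :)
  () = (),          (: false: not reflexive :)
  (: true: the first two comparisons hold but the third does not, so = is not transitive :)
  (1, 2) = (2, 3) and (2, 3) = (3, 4) and fn:not((1, 2) = (3, 4))
)
```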


XPath expressions

• An expression that defines the set of nodes where the navigation starts, followed by a series of selection steps that describe how to navigate into the XML tree
• A step: axis '::' nodeTest
• The axis controls the navigation direction in the tree
  - attribute, child, descendant, descendant-or-self, parent, self
  - The other XPath 1.0 axes (following, following-sibling, preceding, preceding-sibling, ancestor, ancestor-or-self) are optional in XQuery
• Node test by:
  - Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *:*)
  - Kind of item (e.g. node(), comment(), text())
  - Type test (e.g. element(ns:PO, ns:PoType), attribute(*, xs:integer))


Examples of path expressions

document("bibliography.xml")/child::bib
$x/child::bib/child::book/attribute::year
$x/parent::*
$x/child::*/descendant::comment()
$x/child::element(*, ns:PoType)
$x/attribute::attribute(*, xs:integer)
$x/ancestor::document-node(schema-element(ns:PO))
$x/(child::element(*, xs:date) | attribute::attribute(*, xs:date))
$x/f(.)


XPath abbreviated syntax

• The axis can be omitted
  - The default is the child axis
  - $x/child::person -> $x/person
• Shorthands for common axes
  - Descendant-or-self: $x/descendant-or-self::node()/child::comment() -> $x//comment()
  - Parent: $x/parent::node() -> $x/..
  - Attribute: $x/attribute::year -> $x/@year
  - Self: $x/self::node() -> $x/.
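The equivalence between the abbreviated and the full forms can be checked on a small constructed tree. The element and attribute names below are invented for the sketch.

```xquery
let $x := <r year="2004"><p><!--note--></p></r>
return (
  (: both paths find the same single comment: true :)
  count($x//comment()) eq count($x/descendant-or-self::node()/child::comment()),
  (: both paths select the very same attribute node: true :)
  $x/@year is $x/attribute::year
)
```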


XPath filter predicates

• Syntax: expression1 [ expression2 ]
• [ ] is an overloaded operator
• Filtering by position (if numeric value):
  /book[3]
  /book[3]/author[1]
  /book[3]/author[position() = (1 to 2)]
• Filtering by predicate:
  //book[author/firstname = "ronald"]
  //book[@price < 25]
  //book[count(author[@gender = "female"]) > 0]
• Classical XPath mistake: $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1]
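The classical mistake is easiest to see by counting results. In this sketch the sample tree is made up: two a elements, the first with two b children and the second with one.

```xquery
let $x := <r><a><b>1</b><b>2</b></a><a><b>3</b></a></r>
return (
  count($x/a/b[1]),    (: 2: the first b child of each a :)
  count(($x/a/b)[1])   (: 1: the first b in the whole result :)
)
```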


Conditional expressions

if ($book/@year < 1980)
then ns:WS(<old>{ $book/title }</old>)
else ns:WS(<new>{ $book/title }</new>)

• Only the branch that is selected is allowed to raise execution errors
• This impacts scheduling and parallelization
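The error rule can be illustrated with a branch that would fail if it were evaluated. This is a sketch with an invented book element; the division by zero in the else branch stands in for any expression that raises an error.

```xquery
let $book := <book year="1975"><title>Old Title</title></book>
return
  if ($book/@year < 1980)
  then <old>{ $book/title }</old>
  (: this branch is not selected, so its division by zero must not surface as an error :)
  else <new>{ 1 div 0 }</new>
```

The query returns the old element wrapping the title, even though the other branch contains a failing expression.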