Processing of structured documents
Part 3
2
XML Schema (continues…)
Building content models…
A simplified view of the allowed structure of a complex type:
  complexType -> annotation?, (simpleContent | complexContent | ((all | choice | sequence | group)?, attrDecls))
3
Nested choice and sequence groups
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:choice>
      <xsd:group ref="shipAndBill" />
      <xsd:element name="singleUSAddress" type="USAddress" />
    </xsd:choice>
    <xsd:element name="items" type="Items" />
  </xsd:sequence>
</xsd:complexType>
4
Nested choice and sequence groups
<xsd:group name="shipAndBill">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
  </xsd:sequence>
</xsd:group>
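The nested groups can be read as a regular expression over child element names: ((shipTo, billTo) | singleUSAddress), items. The following is a minimal Python sketch of that reading, not a schema validator. The helper name matches_purchase_order and the sample instances are assumptions made for illustration, and the element types (USAddress, Items) are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content model of PurchaseOrderType as a regex over space-terminated child names:
# ((shipTo, billTo) | singleUSAddress), items
CONTENT_MODEL = re.compile(r"(?:shipTo billTo |singleUSAddress )items ")

def matches_purchase_order(xml_text):
    """Return True if the root's children follow the content model (names only)."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(names) is not None

print(matches_purchase_order("<po><shipTo/><billTo/><items/></po>"))   # True
print(matches_purchase_order("<po><singleUSAddress/><items/></po>"))   # True
print(matches_purchase_order("<po><shipTo/><items/></po>"))            # False
```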
5
An 'all' group
An all group:
  - all the elements in the group may appear once or not at all, and they may appear in any order
  - minOccurs and maxOccurs can be 0 or 1
  - it is limited to the top level of any content model
  - it has to be the only child at the top
  - the group's children must all be individual elements (no groups), and no element in the content model may appear more than once
6
An 'all' group
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
    <xsd:element ref="comment" minOccurs="0" />
    <xsd:element name="items" type="Items" />
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>
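What this 'all' group accepts can be sketched in Python as a count check that ignores order: every child may occur at most once, and only comment may be absent. This is an illustrative sketch under those assumptions, not a schema validator. The name matches_all_group is hypothetical, and attributes and element types are not checked.

```python
import xml.etree.ElementTree as ET

# Children of the 'all' group mapped to their minOccurs (each may occur at most once).
ALL_GROUP = {"shipTo": 1, "billTo": 1, "comment": 0, "items": 1}

def matches_all_group(xml_text):
    """Return True if the root's children satisfy the 'all' group, in any order."""
    root = ET.fromstring(xml_text)
    counts = {}
    for child in root:
        if child.tag not in ALL_GROUP:
            return False                      # element not declared in the group
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return all(min_occurs <= counts.get(name, 0) <= 1
               for name, min_occurs in ALL_GROUP.items())

print(matches_all_group("<po><items/><billTo/><shipTo/></po>"))           # True
print(matches_all_group("<po><shipTo/><shipTo/><billTo/><items/></po>"))  # False
```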
7
Occurrence constraints
Groups represented by 'group', 'choice', 'sequence' and 'all' may carry minOccurs and maxOccurs attributes
  - by combining and nesting the various groups, and by setting the values of minOccurs and maxOccurs, it is possible to represent any content model expressible with an XML 1.0 DTD
  - the 'all' group provides additional expressive power
8
Attribute groups
Attribute definitions can also be grouped and named
<xsd:element name="item">
  <xsd:complexType>
    <xsd:sequence> … </xsd:sequence>
    <xsd:attributeGroup ref="ItemDelivery" />
  </xsd:complexType>
</xsd:element>

<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum" type="SKU" />
  …
</xsd:attributeGroup>
9
XML Path Language (XPath)
The ability to navigate through XML documents is needed in many applications of XML:
  - querying of XML documents
  - creation of hypertext links to objects that do not have unique identifiers
  - formatting of document components for presentation
10
XML Path Language (XPath)
XPath provides
  - common syntax and semantics to address parts of an XML document
  - basic facilities for manipulation of strings, numbers and booleans
XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values
11
XML Path Language (XPath)
Use e.g. as a pattern in XSLT:
<xsl:template match="chapter/title">
…
</xsl:template>
12
XML Path Language (XPath)
XPath operates on an XML document as a tree
  - every element in an XML document has a specific and unique contextual location
  - any element in the document can be identified by the steps it would take to reach it, either from the root element or from some other fixed starting location
13
Data model of XPath
A conceptual model: no particular implementation is assumed
A tree contains nodes (7 types):
  - root nodes
  - element nodes
  - text nodes
  - attribute nodes
  - namespace nodes
  - processing instruction nodes
  - comment nodes
14
Data model
Every node has a string-value
Document order is defined on all the nodes in the document:
  - the root node is the first node
  - element nodes occur in the order of their start tags
  - attribute nodes and namespace nodes occur before the children of the element
  - namespace nodes occur before attribute nodes
Relationships: parent - child, ancestor - descendant
15
Root node
The root of the tree
The element node for the document element is a child of the root node
Other children:
  - processing instruction nodes
  - comment nodes
String-value: concatenation of the string-values of all text node descendants of the root node in document order
16
Element nodes
There is an element node for every element in the document
Children:
  - element nodes (subelements)
  - comment nodes
  - processing instruction nodes
  - text nodes (content)
Entity references are expanded
String-value: concatenation of the string-values of all text node descendants of the element node in document order
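The string-value of an element node can be approximated with Python's standard xml.etree.ElementTree, whose itertext() walks the text content in document order. This is a sketch only: the function name string_value and the sample document are assumptions, and ElementTree's tree model is not the XPath data model.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate all descendant text of an element in document order."""
    return "".join(element.itertext())

para = ET.fromstring("<para>Hello <b>big <i>wide</i></b> world</para>")
print(string_value(para))            # Hello big wide world
print(string_value(para.find("b")))  # big wide
```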
17
Attribute nodes
Each element node has an associated set of attribute nodes
  - the element node is the parent of each of these attribute nodes
  - but: an attribute node is not a child of its parent element
A defaulted attribute is treated the same as a specified attribute
18
Attribute nodes
if an attribute was declared for the element with the default #IMPLIED, but the attribute was not specified on the element, there is no attribute node for this attribute
String-value: the normalized value as specified by the XML specification
19
Namespace nodes
Each element has an associated set of namespace nodes
  - one for each distinct namespace prefix that is in scope for the element
  - one for the default namespace if one is in scope for the element
The element is the parent of each of these namespace nodes, but a namespace node is not a child of its parent element
String-value: the namespace URI
20
PI nodes, comment nodes
There is a processing instruction node for every processing instruction
There is a comment node for every comment
  - string-value: the content of the comment, not including <!-- and -->
Exception: PIs and comments inside the document type declaration do not produce nodes
21
Text nodes
Character data is grouped into text nodes
  - as much character data as possible is grouped into each text node
  - string-value: the character data
Characters inside comments, processing instructions and attribute values do not produce text nodes
22
Expressions
The primary syntactic construct in XPath is the expression
An expression is evaluated to yield an object, which has one of the following types:
  - node-set (unordered)
  - boolean (true or false)
  - number
  - string
23
Location paths
Relative location paths
  - a path that starts from an existing location
  - a sequence of one or more location steps separated by /
  - steps are composed from left to right
  - the initial step selects a set of nodes relative to the context node
  - each node in this set is used as a context node for the following step
  - the sets of nodes identified by that step are unioned together
  - e.g. child::div/child::para
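The left-to-right composition of steps can be sketched in Python as follows: each step is applied to every node selected so far, and the results are unioned. This is a simplified illustration over ElementTree elements, restricted to the child axis. The names child_step and evaluate_path and the sample document are assumptions, not part of XPath.

```python
import xml.etree.ElementTree as ET

def child_step(name):
    """Location step child::name as a function from a context node to a node list."""
    return lambda node: [c for c in node if c.tag == name]

def evaluate_path(context_node, steps):
    """Apply steps left to right; each selected node becomes a context node."""
    nodes = [context_node]
    for step in steps:
        selected = []
        for node in nodes:
            for hit in step(node):
                # Union: keep each node once, compared by identity.
                if not any(hit is s for s in selected):
                    selected.append(hit)
        nodes = selected
    return nodes

doc = ET.fromstring(
    "<body><div><para>a</para><para>b</para></div>"
    "<div><para>c</para></div><para>d</para></body>"
)
result = evaluate_path(doc, [child_step("div"), child_step("para")])
print([p.text for p in result])  # ['a', 'b', 'c']
```

The para with text d is not selected because it is a child of body, not of a div.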
24
Location paths
An absolute location path consists of / optionally followed by a relative location path
A / by itself selects the root node of the document
if / is followed by a relative path, then the location path selects the set of nodes that would be selected by the relative location path relative to the root node
25
Location steps
A location step has three parts:
  - an axis: the tree relationship between the nodes selected by the location step and the context node
  - a node test: the node type and name of the nodes selected by the location step
  - zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step
Syntax: axis::node-test[expr][expr]…
  e.g. child::para[position()=1]
26
Location steps
The node-set selected by the location step is the node-set that results from generating an initial node-set from the axis and node-test, and then filtering that node-set by each of the predicates in turn
The initial node-set consists of the nodes
  - having the relationship to the context node specified by the axis, and
  - having the node type and name specified by the node test
27
Axes
  - child
  - descendant
  - parent
  - ancestor
  - following-sibling: empty if the context node is an attribute node or namespace node
  - preceding-sibling: empty if the context node is an attribute node or namespace node
28
Axes
  - following: all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes
  - preceding: all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors and excluding attribute nodes and namespace nodes
29
Axes
  - attribute: attribute nodes of the context node; empty unless the context node is an element
  - namespace: namespace nodes of the context node; empty unless the context node is an element
  - self: the context node itself
  - descendant-or-self, ancestor-or-self
30
Axes
The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document
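The partition property can be checked with a small Python sketch that computes the five axes by hand. It is restricted to element nodes, since ElementTree does not model text, comment or PI nodes as separate objects here. The function name axes and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

def axes(root, context):
    """Compute five axes for an element, over element nodes only."""
    order = list(root.iter())                  # elements in document order
    parent = {c: p for p in order for c in p}  # child -> parent map
    ancestors = []
    node = context
    while node in parent:
        node = parent[node]
        ancestors.append(node)
    descendants = list(context.iter())[1:]     # iter() yields the context first
    i = order.index(context)
    preceding = [n for n in order[:i] if n not in ancestors]
    following = [n for n in order[i + 1:] if n not in descendants]
    return {"self": [context], "ancestor": ancestors, "descendant": descendants,
            "preceding": preceding, "following": following}

doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
parts = axes(doc, doc.find("e"))
for name, nodes in parts.items():
    print(name, [n.tag for n in nodes])
# self ['e'], ancestor ['a'], descendant ['f'], preceding ['b', 'c', 'd'], following ['g']
```

Every one of the seven elements lands in exactly one of the five lists.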
31
Node tests
Every axis has a principal node type
  - for the attribute axis: attribute
  - for the namespace axis: namespace
  - for other axes: element
A node test: both name and type have to match
  - child::para selects the para element children of the context node
  - if the context node has no para children, it will select an empty set of nodes
32
Node tests
The node test node() is true for any node
The node tests text(), comment() and processing-instruction() are true for any node of these specific types
33
Node tests
A node test * is true for any node of the principal node type
  - child::* selects all element children of the context node
  - attribute::* selects all attributes of the context node
text() is true for any text node
comment()
processing-instruction() may have an argument = name of the PI
34
Abbreviated syntax
child:: -> can be omitted from a location step; child is the default axis
  - child::div/child::para -> div/para
attribute:: -> @
  - child::para[attribute::type="warning"] -> para[@type="warning"]
/descendant-or-self::node()/ -> //
  - //para selects any para element in the document
  - div//para selects all para descendants of div children (of the context node)
35
Abbreviated syntax
self::node() -> . (full stop)
  - .//para selects all para descendant elements of the context node
parent::node() -> ..
  - ../title selects the title children of the parent of the context node
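Several of the abbreviated forms can be tried with Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. In this sketch the sample document and the helper texts are assumptions. Paths are written relative to the element on which findall is called, so .//para is used where a full XPath processor would accept //para.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<div><para type='warning'>a</para><section><para>b</para></section></div>"
    "<para>c</para>"
    "</doc>"
)

def texts(nodes):
    """Hypothetical helper: text content of each selected element."""
    return [n.text for n in nodes]

print(texts(doc.findall("div/para")))                   # ['a']
print(texts(doc.findall("div//para")))                  # ['a', 'b']
print(texts(doc.findall(".//para")))                    # ['a', 'b', 'c']
print(texts(doc.findall("div/para[@type='warning']")))  # ['a']
print([n.tag for n in doc.findall(".//section/..")])    # ['div']
```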
36
Predicates
An axis is either a forward axis or a reverse axis
  - forward axis: an axis that only ever contains the context node or nodes that are after the context node in document order
  - reverse axis: an axis that only ever contains the context node or nodes that are before the context node in document order
37
Predicates
The proximity position of a member of a node-set with respect to an axis: the position of the node in the node-set ordered in
  - document order if the axis is a forward axis
  - reverse document order if the axis is a reverse axis
  - the first position is 1
A predicate filters a node-set to produce a new node-set
  - for each node in the node-set, the predicate expression is evaluated with that node as the context node and with the proximity position of the node in the node-set as the context position
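Proximity position on a forward axis and on a reverse axis can be compared in a short Python sketch. The sibling axes are built by hand as lists, and a numeric predicate [n] keeps the node at position n. The helper by_position and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<list><item>1</item><item>2</item><item>3</item><item>4</item></list>"
)
items = list(doc)
context = items[2]                      # the third item is the context node
idx = items.index(context)

following_sibling = items[idx + 1:]     # forward axis: document order
preceding_sibling = items[:idx][::-1]   # reverse axis: reverse document order

def by_position(axis_nodes, n):
    """Keep the nodes whose proximity position equals n (positions start at 1)."""
    return [node for pos, node in enumerate(axis_nodes, start=1) if pos == n]

print([n.text for n in by_position(preceding_sibling, 1)])  # ['2']
print([n.text for n in by_position(following_sibling, 1)])  # ['4']
```

On the reverse axis, position 1 is the nearest preceding sibling, not the first item in the document.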
38
Predicates
If the predicate expression evaluates to true for that node, the node is included in the new node-set
The result of the evaluation is converted to a boolean
  - if the result is a number, the result is true if the number is equal to the context position
  - otherwise, the result will be converted as if by a call to the function boolean (see below)
  - e.g. para[3] is equivalent to para[position()=3]
39
Predicates
Contained element tests
The name of an element can appear in a predicate filter -> it represents an element that must be present as a child
  - note[title]: a note element is only selected if it directly contains a title element
  - note[title="first note"]: true if the content of the title element is 'first note'
  - note[id("123")]
40
Predicates
Attribute tests
  - para[@type='secret']: every 'para' element with a 'type' attribute value of 'secret'
41
Expressions
Boolean operators: or, and
Comparisons: =, !=, <=, <, >=, >
  - in XML documents: < has to be written as &lt;
Numeric operators: +, -, *, div, mod
42
Core functions
Node set functions
  - number last()
  - number position()
  - number count(node-set)
  - node-set id(object)
    e.g. id("foo") selects the element with unique ID foo
43
Core functions
String functions
  - string string(object?)
    converts an object to a string
    e.g. negative infinity -> -Infinity
  - string concat(string,string,string*)
    returns the concatenation of its arguments
  - boolean starts-with(string,string)
    returns true if the first argument string starts with the second argument string
44
Core functions
String functions
  - boolean contains(string,string)
    returns true if the first string contains the second string
  - string substring-before(string,string)
    e.g. substring-before("1999/04/01","/") returns 1999
  - string substring-after(string,string)
  - string substring(string,number,number?)
    e.g. substring("12345",2,3) returns "234"
    e.g. substring("12345",2) returns "2345"
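The substring functions can be mimicked in Python to check the examples. This is a sketch for integer arguments only; it does not cover XPath's rounding or NaN handling. The Python function names are assumptions, and character positions start at 1 as in XPath.

```python
def substring_before(s, sep):
    """Part of s before the first occurrence of sep, or "" if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def substring_after(s, sep):
    """Part of s after the first occurrence of sep, or "" if sep is absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def substring(s, start, length=None):
    """Characters at 1-based positions p with start <= p < start + length."""
    end = float("inf") if length is None else start + length
    return "".join(ch for pos, ch in enumerate(s, start=1) if start <= pos < end)

print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(substring("12345", 2, 3))             # 234
print(substring("12345", 2))                # 2345
```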
45
Core functions
String functions
  - number string-length(string?)
    default: the string-value of the context node
  - string normalize-space(string?)
    returns the string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space
  - string translate(string,string,string)
    e.g. translate("bar","abc","ABC") returns BAr
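The behaviour of normalize-space and translate can be sketched in Python as below. The function names are assumptions. Whitespace is taken to mean the XML whitespace characters (space, tab, carriage return, line feed), and a character of the second translate argument with no counterpart in the third is removed.

```python
import re

XML_WS = " \t\r\n"

def normalize_space(s):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XML_WS))

def translate(s, src, dst):
    """Replace each character found in src by the character at the same index in dst."""
    mapping = {}
    for i, ch in enumerate(src):
        if ch not in mapping:                      # the first occurrence wins
            mapping[ch] = dst[i] if i < len(dst) else ""
    return "".join(mapping.get(ch, ch) for ch in s)

print(normalize_space("  a \n  b  "))   # a b
print(translate("bar", "abc", "ABC"))   # BAr
```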
46
Core functions
Boolean functions
  - boolean boolean(object)
    converts the argument to a boolean
    e.g. a number is true if and only if it is neither positive or negative zero nor NaN (not-a-number)
    e.g. a node-set is true if and only if it is non-empty
  - boolean not(boolean)
  - boolean true(), boolean false()
  - boolean lang(string)
    based on the attribute xml:lang
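The conversion rules of boolean() can be written out as a small Python sketch. The name xpath_boolean is an assumption, and a node-set is modeled as a plain Python list. The rule for strings (true if and only if the length is non-zero) is taken from XPath 1.0 and is not stated on the slide.

```python
import math

def xpath_boolean(obj):
    """Convert a boolean, number, string or node-set (list) as boolean() would."""
    if isinstance(obj, bool):           # check bool first: bool is a subclass of int
        return obj
    if isinstance(obj, (int, float)):
        return obj != 0 and not math.isnan(obj)
    if isinstance(obj, str):
        return len(obj) > 0             # non-empty string
    return len(obj) > 0                 # non-empty node-set

print(xpath_boolean(-0.0))          # False
print(xpath_boolean(float("nan")))  # False
print(xpath_boolean("false"))       # True
print(xpath_boolean([]))            # False
```

The string "false" converts to true because only its length matters.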
47
Core functions
Number functions
  - number number(object?)
    converts its argument to a number
    e.g. boolean true -> 1; boolean false -> 0
    e.g. a string -> mathematical value or NaN
  - number sum(node-set)
  - number floor(number), number ceiling(number), number round(number)
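The conversions performed by number() can be sketched in Python. The name xpath_number is an assumption. The accepted string syntax (optional surrounding whitespace, an optional minus sign, digits with an optional decimal point, and no exponent) follows XPath 1.0 as understood here. Node-set arguments are not handled.

```python
import math
import re

# Optional whitespace, optional minus sign, then digits with an optional fraction.
_NUMBER = re.compile(r"[ \t\r\n]*(-?(?:\d+(?:\.\d*)?|\.\d+))[ \t\r\n]*")

def xpath_number(obj):
    """Convert a boolean, number or string as number() would (no node-sets)."""
    if isinstance(obj, bool):
        return 1.0 if obj else 0.0
    if isinstance(obj, (int, float)):
        return float(obj)
    match = _NUMBER.fullmatch(obj)
    return float(match.group(1)) if match else math.nan

print(xpath_number(True))      # 1.0
print(xpath_number(" 12.5 "))  # 12.5
print(xpath_number("abc"))     # nan
```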
48
Examples
  - para selects the para element children of the context node
  - * selects all element children
  - text() selects all text node children
  - @name selects the name attribute
  - @* selects all the attributes
  - para[1] selects the first para child
  - para[last()] selects the last para child
  - */para selects all para grandchildren
  - /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc (root)
49
Examples
  - chapter//para selects the para element descendants of the chapter element children
  - //para selects all the para descendants of the document root and thus selects all the para elements in the same document as the context node
  - //olist/item selects all the item elements in the same document as the context node that have an olist parent
  - . selects the context node
  - .//para selects the para element descendants
  - .. selects the parent
  - ../@lang selects the lang attribute of the parent
50
Examples
  - para[@type="warning"] selects all para children of the context node that have a type attribute with value warning
  - para[@type="warning"][5] selects the fifth para child of the context node that has a type attribute with value warning
  - para[5][@type="warning"] selects the fifth para child of the context node if that child has a type attribute with value warning
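The order of predicates matters because each predicate filters the node-set left by the previous one. The Python sketch below applies the two filters by hand in both orders. The helper names nth and is_warning and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<sec>"
    "<para type='warning'>w1</para><para>p2</para><para type='warning'>w3</para>"
    "<para type='warning'>w4</para><para type='warning'>w5</para>"
    "<para>p6</para><para type='warning'>w7</para>"
    "</sec>"
)

def nth(nodes, n):
    """Predicate [n]: keep the node at 1-based position n, if there is one."""
    return nodes[n - 1:n]

def is_warning(nodes):
    """Predicate [@type="warning"]."""
    return [p for p in nodes if p.get("type") == "warning"]

paras = [c for c in doc if c.tag == "para"]   # child::para
fifth_warning = nth(is_warning(paras), 5)     # para[@type="warning"][5]
fifth_if_warning = is_warning(nth(paras, 5))  # para[5][@type="warning"]

print([p.text for p in fifth_warning])     # ['w7']
print([p.text for p in fifth_if_warning])  # ['w5']
```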
51
Examples
  - chapter[title="Introduction"] selects the chapter children of the context node that have one or more title children with string-value equal to Introduction
  - chapter[title] selects the chapter children of the context node that have one or more title children
  - employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and an assistant attribute
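These predicate examples can be tried with Python's standard xml.etree.ElementTree, within its limited XPath subset. The sample documents are assumptions. The sketch chains two attribute predicates in place of the and operator; the two forms select the same nodes here because neither predicate depends on position.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<chapter id='c1'><title>Introduction</title></chapter>"
    "<chapter id='c2'><title>Methods</title></chapter>"
    "<chapter id='c3'/>"
    "</doc>"
)
with_title = doc.findall("chapter[title]")
intro = doc.findall("chapter[title='Introduction']")
print([c.get("id") for c in with_title])  # ['c1', 'c2']
print([c.get("id") for c in intro])       # ['c1']

staff = ET.fromstring(
    "<staff>"
    "<employee secretary='s1' assistant='a1'>x</employee>"
    "<employee secretary='s2'>y</employee>"
    "</staff>"
)
both = staff.findall("employee[@secretary][@assistant]")
print([e.text for e in both])             # ['x']
```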