DB2 pureXML experiences at ArcelorMittal Gent Davy Goethals GSE DB2 working group 04/12/2014 Euroclear Brussel


DB2 pureXML experiences at ArcelorMittal Gent Davy Goethals

GSE DB2 working group 04/12/2014 Euroclear Brussel


Overview

• How to start : collecting doc and choosing a case
• Producing XML documents using QMF
• Creating and loading a table with XML columns
• Querying the XML data with SQL and XPath
• Validate XML data using an XML Schema
• Create and use XML indexes

1


Starting with pureXML

2


Collecting doc

The Bible DB2 XML Guide

3

Presenter
Presentation Notes
In DB2 pureXML Cookbook, two of IBM's leading experts (Matthias Nicola & Pav Kumar-Chatterjee) provide the single most comprehensive coverage of DB2's pureXML capabilities. This book explains DB2 pureXML in more than 700 practical examples, including 250+ XQuery and SQL/XML queries, taking the reader from simple introductions all the way to advanced scenarios. The authors have distilled their hands-on experience with many pureXML applications so that you can benefit from best practices, tips & tricks, performance guidelines, and other gems that are not documented elsewhere. This book is invaluable for database administrators and application developers, beginners and DB2 experts. The topics are organized by typical user tasks throughout the life cycle of XML database projects, from planning, designing, and implementing databases all the way to tuning, problem determination, and application development. It includes code samples for Java, .NET, COBOL, PL/1, C, PHP, and Perl programmers. The DB2 pureXML Cookbook provides proven recipes rather than a mere reference of ingredients.

Collecting doc

XML Redbook IDUG presentations

4


Collecting XML sample data

• DB2 installation samples
– DB2.NEW.SDSNSAMP(DSNTEJ1)
– DB2.NEW.SDSNSAMP(DSNTEJ2H)

5

CREATOR   NAME            COLUMN          Rows   DB2 Express C
DSN8910   PRODUCT         DESCRIPTION     4      4
DSN8910   CUSTOMER        INFO, HISTORY   0      6
DSN8910   PURCHASEORDER   PORDER          0      6
DSN8910   CATALOG         CATLOG          0      0
DSN8910   SUPPLIER        ADDR            0      0

Presenter
Presentation Notes
In DB2 for z/OS, the installation job DSNTEJ1 creates five tables with XML columns. These tables are in the relational schema DSN8910 and are named PRODUCT, CUSTOMER, PURCHASEORDER, CATALOG, and SUPPLIERS. Only table DSN8910.PRODUCT is populated by the installation jobs. There are several ways to populate some of these tables. For example, if you have a DB2 for Linux, UNIX, and Windows installation, such as the free DB2 Express-C, you can create the sample database and select or export the data from there. The data can then be imported or inserted into the z/OS tables using SPUFI or an import job. The PDF document "DB2 Version 9.1 for z/OS XML Guide" (SC18-9858) provides the DDL and three INSERT statements with XML data for a table called MYCUSTOMER. You can copy and paste these statements into SPUFI to build a sample table to work with.

Collecting XML sample data

• Table DSN8910.PRODUCT column DESCRIPTION

6

<product pid="100-100-01">
  <description>
    <name>Snow Shovel, Basic 22 inch</name>
    <details>Basic Snow Shovel, 22 inches wide, straight handle with D-grip</details>
    <price>9.99</price>
    <weight>1 kg</weight>
  </description>
</product>

<product pid="100-101-01">
  <description>
    <name>Snow Shovel, Deluxe 24 inch</name>
    <details>A Deluxe Snow Shovel, 24 inches wide, ergonomic curved handle with D-Grip</details>
    <price>19.99</price>
    <weight>2 kg</weight>
  </description>
</product>

Presenter
Presentation Notes
An XML document basically consists of elements with zero, one or more attributes. Each element consists of a <start tag> and an </end tag>, both enclosed in angle brackets. Elements can have a value or contain other elements. Empty elements can have attributes and can be represented by a single <empty tag/>. Elements can occur multiple times. Attributes always have a value. A well-formed document has a single root element. The order of elements is significant; the order of attributes is not. XML is case sensitive. This sample XML document is very simple in nature (no encoding scheme, no XML version, no namespaces, limited number of elements and attributes, …).
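The rules in these notes can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as a stand-in (it is not part of DB2 or of the original deck); the helper names describe and is_well_formed are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# First sample document from DSN8910.PRODUCT, copied from the slide.
PRODUCT_XML = """<product pid="100-100-01">
  <description>
    <name>Snow Shovel, Basic 22 inch</name>
    <details>Basic Snow Shovel, 22 inches wide, straight handle with D-grip</details>
    <price>9.99</price>
    <weight>1 kg</weight>
  </description>
</product>"""


def describe(xml_text):
    """Return (root tag, root attributes, child tags of <description> in order)."""
    root = ET.fromstring(xml_text)
    description = root.find("description")
    return root.tag, dict(root.attrib), [child.tag for child in description]


def is_well_formed(xml_text):
    """True if the text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(describe(PRODUCT_XML))
    print(is_well_formed("<a><b/></a>"))     # single root, empty element tag
    print(is_well_formed("<a/><b/>"))        # two root elements
    print(is_well_formed("<Name>x</name>"))  # start and end tag differ in case
```

The last two calls fail to parse, which matches the "single root element" and "case sensitive" rules above.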

Collecting XML sample data

• Load IBM GSDB database in z/OS
– “Great-outdoors-sample” : 153 tables, 1 GB of data
– 1 XML column PTNR_ORDER in table GOSALESCT.PTNR_ACTIVITY with 212 rows

7

<cust_order>
  <cust_code> 100022 </cust_code>
  <cust_order_number> 181649 </cust_order_number>
  <cust_order_date> 4/9/2007 8:43 </cust_order_date>
  <cust_unique_items> 2 </cust_unique_items>
  <cust_order_details1 product_number="61110" cust_quantity="1" cust_unit_sale_price="74.91"/>
  <cust_order_details2 product_number="99110" cust_quantity="1" cust_unit_sale_price="11.64"/>
</cust_order>

Presenter
Presentation Notes
The IBM GSDB sample database is available to use in your own projects and for learning about IBM products (like IBM Data Studio). The sample database contains a rich set of sample data that follows the fictional Sample Outdoor company and its sales and operations. It can be downloaded from the web. To set up the sample database on DB2 for z/OS, you run the setup scripts from a workstation and install the database on a cataloged remote DB2 for z/OS subsystem. It contains one table with an XML column and 212 rows of data. Beware that cust_order_details1 and cust_order_details2 are empty elements with attributes, each represented by a single empty tag.

Building our own PureXML case with QMF

8


Collecting XML sample data

• Only two samples found so far for DB2 on z/OS
• Wanted a “richer case” to play with :
– More complicated XML documents
– More rows , more volume
– With an XML schema to test validation
– But comprehensible for DBA and developers
• Decided to build our own case based on QMF exported data in XML format to use as proof-of-concept.

9


Exporting data in QMF to XML format

• EXPORT DATA TO … (DATAFORMAT=XML
– Export result of QMF query to TSO SEQ file or HFS Unix file
– Result may already contain XML columns before exporting
• Uses z/OS XML parse services and z/OS Unicode services
– Data exported as Unicode UTF-8 XML file (CCSID 1208)
   • Header records
   • Metadata records for each column in the result set
   • Data records for each column in the result set
– Style sheet and XML schema provided
   • qmf_dataset.xslt and qmf_data.xsd

10

Presenter
Presentation Notes
In QMF you can export the result of a QMF query or a table in XML format by using the DATAFORMAT=XML clause on the EXPORT DATA or EXPORT TABLE command. This format must be used when the data contains XML columns but can also be used when the data or table to be exported does not contain XML columns. When you export data or tables in XML format, the data is exported to the HFS Unix file, the TSO data set, or the CICS data queue that is specified in the command. QMF uses the XML 1.0 specification (fourth edition) when exporting data. QMF uses z/OS XML parse services as well as z/OS Unicode conversion services when processing XML data for export, so these services must be configured and active. The result of exported XML data in QMF is always in Unicode UTF-8 format. The Unicode character set can include characters from almost all of the living languages of the world. In UTF-8, ASCII and control characters are represented by their usual ASCII single-byte codes, and other characters become two to four bytes long. The IBM UTF-8 implementation is defined by CCSID 1208. UTF-8 stands for "UCS Transformation Format 8".
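The UTF-8 byte lengths mentioned in these notes are easy to verify. The following is a small Python sketch, not related to QMF itself; the sample characters are chosen only for illustration.

```python
def utf8_length(ch):
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative characters: ASCII, Latin-1 accent, euro sign, an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(repr(ch), utf8_length(ch))
    # Plain ASCII text has the same bytes in ASCII and in UTF-8.
    print("STAFF".encode("utf-8") == "STAFF".encode("ascii"))
```

ASCII characters stay at one byte, while the other samples take two, three and four bytes respectively.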

Exporting data in QMF to XML format

• Query :
– SELECT NAME,SALARY FROM Q.STAFF FETCH FIRST 3 ROWS ONLY
• Command :
– EXPORT DATA TO 'SIDDAGO.STAFF.XML'(DATAFORMAT=XML
– EXPORT DATA TO '/data/local/clipbrd/Users/siddago/staff.xml'(DATAFORMAT=XML

11

NAME       SALARY
SANDERS    18357.50
PERNAL     18171.25
MARENGHI   17506.70

Presenter
Presentation Notes
To illustrate, we can use a simple result set as shown above, with 2 columns and 3 rows.

Exporting data in QMF to XML format

• Header records

12

<?xml version="1.0" encoding="utf-8"?>
<!-- A style sheet has been provided by the QMF product.
     You can find it in the QMF SAMPLE library.
     Copy it to the directory where you exported the file.
     The next comment is an example of a stylesheet statement
     you can remove the comments and use as is. -->
<!-- ?xml-stylesheet type ="text/xsl" href="qmf_dataset.xslt" ? -->
<DataSet xmlns="http://www.ibm.com/qmf"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- A schema file has been provided by the QMF product.
     You can find it in the QMF SAMPLE library.
     Copy it to the directory where you exported the file.
     If you would like to use it from a different directory you must enter
     an xsi:schemaLocation statement as the next statement in this document -->
…….

Presenter
Presentation Notes
The header records in the exported XML file contain the version of XML used, the encoding scheme, and a line that references which style sheet to use to format the exported XML document. QMF provides a default style sheet "qmf_dataset.xslt" as member DSQ1STSH of the QMF samples data set SDSQSAPn. You can copy this default style sheet to the location where the exported file resides. When you uncomment the above style sheet statement, the XML document is formatted to these specifications when it is opened.

There is also a namespace definition and an optional xsi:schemaLocation definition that can be added (as attributes of the root element). XML namespaces make element and attribute names unique within an XML document, usually by means of a prefix. QMF uses a default namespace "http://www.ibm.com/qmf" and a prefixed namespace "http://www.w3.org/2001/XMLSchema-instance" with prefix "xsi". Elements without a prefix belong to the first namespace; elements and attributes with the prefix xsi belong to the second. Attributes without a prefix have no namespace. Beware that the namespaces have the form of a URL, but these URLs are not real web addresses, just unique identifiers.

In many XML documents the namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" has a special meaning. It is used to reference the XML schema of the document with an xsi:schemaLocation attribute (for documentation reasons but sometimes also for processing reasons). The basic idea is that it works like a magic cookie: either the processing software has been programmed to recognize it, and thus acts on the basis of what it means, or it has not. Other attributes of this namespace, with fixed predefined meanings, are xsi:type, xsi:nil and xsi:noNamespaceSchemaLocation. In DB2 V10 the schema location attribute is used to match the registered schema version in the XSR directory during automatic validation. It consists of a namespace name and a schema location hint, which together uniquely identify a registered schema in XSR. Ex : xsi:schemaLocation="http://www.example.com/P02 http://www.example.com/P04.xsd".
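To see how the default and the xsi namespaces attach to elements and attributes, here is a minimal Python sketch using the standard xml.etree.ElementTree parser as a stand-in for DB2. The HEADER fragment is a shortened, hypothetical version of a QMF export; the schemaLocation value is the example from the notes.

```python
import xml.etree.ElementTree as ET

QMF_NS = "http://www.ibm.com/qmf"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical, shortened export with both namespace declarations.
HEADER = (
    '<DataSet xmlns="http://www.ibm.com/qmf" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.example.com/P02 http://www.example.com/P04.xsd">'
    '<ResultSet><Data><Row id="0"/></Data></ResultSet>'
    '</DataSet>'
)

root = ET.fromstring(HEADER)

# Unprefixed elements inherit the default namespace.
root_tag = root.tag
row = root.find(f"{{{QMF_NS}}}ResultSet/{{{QMF_NS}}}Data/{{{QMF_NS}}}Row")

# A lookup that ignores the namespace finds nothing.
no_namespace_lookup = root.find("ResultSet")

# The prefixed attribute lives in the xsi namespace; it holds a
# namespace name followed by a schema location hint.
namespace_name, location_hint = root.get(f"{{{XSI_NS}}}schemaLocation").split()

if __name__ == "__main__":
    print(root_tag)
    print(row.attrib)            # unprefixed attribute: no namespace
    print(no_namespace_lookup)
    print(namespace_name, location_hint)
```

The parser expands every element name to "{namespace}local-name", which is why the queries later in the deck have to declare the default namespace.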

Exporting data in QMF to XML format

• Metadata and data records

13

<ResultSet>
  <MetaData>
    <SourceDescription/>
    <ColumnsAmount>2</ColumnsAmount>
    <ColumnDescription id="1">
      <Name>NAME</Name>
      <Label>NAME</Label>
      <Type>varchar</Type>
      <Width>9</Width>
      <Nullable>true</Nullable>
      <Format>plain</Format>
    </ColumnDescription>
    <ColumnDescription id="2">
      <Name>SALARY</Name>
      <Label>SALARY</Label>
      <Type>decimal</Type>
      <Scale>2</Scale>
      <Precision>7</Precision>
      <Nullable>true</Nullable>
      <Format>plain</Format>
    </ColumnDescription>
  </MetaData>
  ……………

  <Data>
    <Row id="0">
      <Cell id="1">SANDERS</Cell>
      <Cell id="2">18357.50</Cell>
    </Row>
    <Row id="1">
      <Cell id="1">PERNAL</Cell>
      <Cell id="2">18171.25</Cell>
    </Row>
    <Row id="2">
      <Cell id="1">MARENGHI</Cell>
      <Cell id="2">17506.70</Cell>
    </Row>
  </Data>
</ResultSet>
<Extensions/>
</DataSet>

Presenter
Presentation Notes
The column metadata consists of the number of columns, column names, column labels (if applicable), data types, data lengths, whether the column is nullable, and the format. The exported XML file contains one column-description block for each column. Only the non-empty elements are present (<Width> for varchar type and <Precision>, <Scale> for decimal type). This is different from the relational model. The exported file contains one row-definition block for each row of exported data. Data records are in variable block spanned (VBS) format. A <Cell> tag identifies each column in the row by number. The attribute "id" of <Cell> corresponds to the attribute "id" of <ColumnDescription>. Mind the empty elements <SourceDescription/> and <Extensions/>, each represented by an empty tag.
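The link between the "id" of a <Cell> and the "id" of a <ColumnDescription> can be made concrete with a short Python sketch. It uses the standard xml.etree.ElementTree parser on a trimmed copy of the export shown on the slide; the function name rows_as_dicts is made up for the illustration and is not a QMF or DB2 facility.

```python
import xml.etree.ElementTree as ET

NS = {"q": "http://www.ibm.com/qmf"}

# Trimmed copy of the exported Q.STAFF result set from the slide.
EXPORT = """<DataSet xmlns="http://www.ibm.com/qmf">
<ResultSet>
  <MetaData>
    <ColumnDescription id="1"><Name>NAME</Name><Type>varchar</Type><Width>9</Width></ColumnDescription>
    <ColumnDescription id="2"><Name>SALARY</Name><Type>decimal</Type><Scale>2</Scale><Precision>7</Precision></ColumnDescription>
  </MetaData>
  <Data>
    <Row id="0"><Cell id="1">SANDERS</Cell><Cell id="2">18357.50</Cell></Row>
    <Row id="1"><Cell id="1">PERNAL</Cell><Cell id="2">18171.25</Cell></Row>
    <Row id="2"><Cell id="1">MARENGHI</Cell><Cell id="2">17506.70</Cell></Row>
  </Data>
</ResultSet>
</DataSet>"""


def rows_as_dicts(xml_text):
    """Pair every Cell with its column name through the shared id attribute."""
    root = ET.fromstring(xml_text)
    names = {
        col.get("id"): col.findtext("q:Name", namespaces=NS)
        for col in root.iterfind("q:ResultSet/q:MetaData/q:ColumnDescription", NS)
    }
    return [
        {names[cell.get("id")]: cell.text for cell in row.iterfind("q:Cell", NS)}
        for row in root.iterfind("q:ResultSet/q:Data/q:Row", NS)
    ]


if __name__ == "__main__":
    for record in rows_as_dicts(EXPORT):
        print(record)
```

All values come back as strings here; typing them is a separate step, just as XMLCAST is a separate step in the SQL shown later.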

Exporting data in QMF to XML format

14

Presenter
Presentation Notes
Because of the UTF-8 encoding scheme, you have to use the “display UTF8” command in ISPF to browse an exported XML file .

Exporting data in QMF to XML format

15


Exporting data in QMF to XML format

• Sharing XML files between mainframe and Windows platform
– Use FTP (sequential file or HFS file)
– Use DFS shared directory (HFS only)
• DFS : IBM Distributed File Service for z/OS
• Export data to DFS share: /data/local/clipbrd/Users/siddago/staff.xml
– 3270 screen : Use TSO ISHELL command + ISPF browse/edit
– Windows : use Notepad or Internet Explorer or other XML product
– Edit .xml file : easiest is from Windows because of UTF-8 format

16

Presenter
Presentation Notes
In our shop we use IBM's Distributed File Service to allow users to access and share data in a distributed environment across IBM and the Windows platform. DFS support includes DFS client and file server support for DCE. DCE support is provided by the IBM z/OS DCE Base Services element. The DFS implementation is based on source code developed by the Open Software Foundation (OSF). To use the DFS support, the DCE Base Services element of z/OS must be installed, configured, and run on the system. The easiest way to manipulate the content of an XML file is from the Windows platform. Many tools exist there. We used Windows Notepad and Windows Internet Explorer.

Exporting data in QMF to XML format

• Windows Internet Explorer

17

Presenter
Presentation Notes
Here we show what happens if we open an exported XML file with Windows Internet Explorer. The XML document is shown in a Windows Internet Explorer window (read only). If the document is not well-formed XML, error messages are shown.

Exporting data in QMF to XML format

• Use of style sheet <?xml-stylesheet type ="text/xsl" href="qmf_dataset.xslt" ?>

18

Presenter
Presentation Notes
Here we show what happens if we copy the default QMF style sheet to the location where the exported file resides and uncomment the above style sheet statement. When opening the XML file with Windows Internet Explorer, the result is formatted according to the specifications in the style sheet.

Creating and loading a table with XML columns

• Building a case :
– DB2 meta data table with a description of 2571 existing DB2 tables from our environment in QMF exported XML format

19

CREATE TABLE SIDDAGO.TABLE_XML
  (TB_CREATOR     VARCHAR(128) NOT NULL,
   TB_NAME        VARCHAR(128) NOT NULL,
   TB_DESCRIPTION XML          NOT NULL,
   CO_DESCRIPTION XML          NOT NULL,
   TB_ID          VARCHAR(20)  NOT NULL)
  IN SIDDAGO.SIDDAGOX ;

SELECT DBNAME, TSNAME, COLCOUNT, CARDF, AUDITING, CREATEDTS, TRANSLATE(REMARKS,'A','&'), TRANSLATE(LABEL,'A','&') FROM SYSIBM.SYSTABLES for this table

SELECT COLNO, NAME AS COLNAME, COLTYPE, LENGTH, SCALE, NULLS, TRANSLATE(REMARKS,'A','&') FROM SYSIBM.SYSCOLUMNS for all columns of this table

T0000001.xml T0000002.xml ………

C0000001.xml C0000002.xml ………

Presenter
Presentation Notes
We decided to build a test case with a DB2 table containing meta data of other existing DB2 tables in our environment. The new table SIDDAGO.TABLE_XML has 3 relational columns and 2 XML columns. The first and second relational columns are the table creator and table name of each table . The first XML column contains an XML document with the meta data of the corresponding table . The second XML column contains an XML document with the meta data of all columns of this table. Each XML document is produced by running a QMF query and exporting the result to an XML file on a shared DFS directory . The XML files are of the form T0000001.xml,T0000002.xml,T0000003.xml,…..T0002571.xml and C0000001.xml,C0000002.xml,C0000003.xml,…..C0002571.xml . The suffix 0000001,0000002,… of these files is stored in the relational column TB_ID for reference reasons. The QMF REXX script that creates all these xml files also builds an input file for the DB2 load utility to populate the table. We used the TRANSLATE function to substitute ‘&’ to ‘A’ for tables and columns having the XML special character ‘&‘ in their REMARK or LABEL column in the DB2 catalog.
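Why the '&' character needs special treatment can be shown with a small Python sketch. It contrasts the substitution described above with standard XML escaping, which is a different technique and not the one used in the presentation. The REMARKS value is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text):
    """True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


remark = "R&D PLANNING TABLE"  # hypothetical REMARKS value from the catalog

raw = f"<Cell>{remark}</Cell>"                            # '&' left as is
translated = f"<Cell>{remark.replace('&', 'A')}</Cell>"   # '&' replaced by 'A'
escaped = f"<Cell>{escape(remark)}</Cell>"                # '&' becomes '&amp;'

if __name__ == "__main__":
    print(parses(raw), parses(translated), parses(escaped))
    print(ET.fromstring(translated).text)
    print(ET.fromstring(escaped).text)
```

Substitution makes the document parse but changes the stored text, whereas escaping keeps the original value. escape() also covers '<' and '>', which are among the other XML special characters.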

Creating and loading a table with XML columns

• Building a case :
– QMF REXX SCRIPT
   • Create the Txxxxxxx.xml and Cxxxxxxx.xml files
      – Run query on sysibm.systables and sysibm.syscolumns for each table
      – Export data in XML format
   • Create input SYSREC file for LOAD utility with file-reference-variables :

20

Q , APPLICANT , /data/local/clipbrd/Users/siddago/T0000001.xml , /data/local/clipbrd/Users/siddago/C0000001.xml , 0000001
Q , COMMAND_SYNONYMS , /data/local/clipbrd/Users/siddago/T0000002.xml , /data/local/clipbrd/Users/siddago/C0000002.xml , 0000002
Q , DSQ_RESERVED , /data/local/clipbrd/Users/siddago/T0000003.xml , /data/local/clipbrd/Users/siddago/C0000003.xml , 0000003
Q , ERROR_LOG , /data/local/clipbrd/Users/siddago/T0000004.xml , /data/local/clipbrd/Users/siddago/C0000004.xml , 0000004
Q , INTERVIEW , /data/local/clipbrd/Users/siddago/T0000005.xml , /data/local/clipbrd/Users/siddago/C0000005.xml , 0000005
Q , OBJECT_DATA , /data/local/clipbrd/Users/siddago/T0000006.xml , /data/local/clipbrd/Users/siddago/C0000006.xml , 0000006
Q , OBJECT_DIRECTORY , /data/local/clipbrd/Users/siddago/T0000007.xml , /data/local/clipbrd/Users/siddago/C0000007.xml , 0000007
Q , OBJECT_REMARKS , /data/local/clipbrd/Users/siddago/T0000008.xml , /data/local/clipbrd/Users/siddago/C0000008.xml , 0000008
Q , ORG , /data/local/clipbrd/Users/siddago/T0000009.xml , /data/local/clipbrd/Users/siddago/C0000009.xml , 0000009
Q , PARTS , /data/local/clipbrd/Users/siddago/T0000010.xml , /data/local/clipbrd/Users/siddago/C0000010.xml , 0000010

Presenter
Presentation Notes
The input SYSREC file for the LOAD utility is a VB file with delimited input and containing 2571 records, one for each table .
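The record layout above can be generated mechanically. The actual script was written in QMF REXX and is not shown in the deck, so the following Python sketch only mirrors the visible layout; the function names and the newline record separator are assumptions.

```python
# Directory as shown on the slide; everything else here is illustrative.
DFS_DIR = "/data/local/clipbrd/Users/siddago"


def sysrec_line(creator, table, seq, directory=DFS_DIR):
    """Build one delimited record: creator, table, T file, C file, 7-digit id."""
    suffix = f"{seq:07d}"
    fields = [
        creator,
        table,
        f"{directory}/T{suffix}.xml",
        f"{directory}/C{suffix}.xml",
        suffix,
    ]
    return " , ".join(fields)


def sysrec_bytes(tables):
    """Encode all records as UTF-8, one record per line (separator assumed)."""
    lines = [sysrec_line(c, t, i) for i, (c, t) in enumerate(tables, start=1)]
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    sample = [("Q", "APPLICANT"), ("Q", "COMMAND_SYNONYMS"), ("Q", "DSQ_RESERVED")]
    print(sysrec_bytes(sample).decode("utf-8"), end="")
```

Encoding the whole file as UTF-8 reflects the requirement, stated with the LOAD statement, that SYSREC uses the same CCSID as the .xml files it refers to.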

Creating and loading a table with XML columns

• LOADing the table
– XML files are always parsed and must contain valid XML
– Add PRESERVE WHITESPACE if desired
– HFS ,PDS or PDSE
– SYSREC must be in UTF-8 (CCSID 1208 same as .xml files)
– Provide DISCARD file to discard records with XML parsing errors SQLCODE = -20398

21

LOAD DATA RESUME YES LOG NO
  UNICODE CCSID(01208,00000,00000)
  INTO TABLE SIDDAGO.TABLE_XML
  (TB_CREATOR     CHAR,
   TB_NAME        CHAR,
   TB_DESCRIPTION CHAR CLOBF,
   CO_DESCRIPTION CHAR CLOBF,
   TB_ID          CHAR)
  FORMAT DELIMITED
  SORTDEVT 3390
  WORKDDN(TSYSUT1,TSORTOUT)
  DISCARDDN(TSYSDISC)
  ERRDDN(TSYSERR)
  MAPDDN(TSYSMAP)

Presenter
Presentation Notes
We used the above LOAD syntax to load the generated XML data into the meta data table. The TEMPLATE definitions for the LOAD utility are not shown in the above example but are standard. The input XML fields are always converted to UTF-8 during parsing and stored in a parsed format in the XML table space. If a parsing error occurs, SQLCODE -20398 is returned to the LOAD utility. You can discard input records with bad XML files by specifying a discard file. In our case 6 input records were discarded because of special XML characters other than '&' in their input XML data. The input XML files can be loaded from a PDS, PDSE or HFS with V, VB or U format. The default behavior of LOAD is not to preserve whitespace (CR, LF, tab) during parsing. We saw no difference between specifying CLOBF or BLOBF. You cannot use a SYSREC file coded in EBCDIC referring to .xml files with UTF-8 content. If the content of the .xml files is UTF-8 (as in our case), the SYSREC file must also be coded in UTF-8 and UNICODE CCSID(01208,00000,00000) must be specified to indicate that the input file is in UTF-8. We used the following technique to convert EBCDIC strings to UTF-8 in our QMF REXX coding :
• convert the string to UTF-8 in HEX format : SELECT HEX(UNICODE_STR('string',UTF8)) FROM SYSIBM.SYSDUMMY1
• convert back to string using the REXX x2c function : string = x2c(string)
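The two-step conversion (EBCDIC string to UTF-8 hex, then hex back to bytes) can be mimicked outside DB2 and REXX. The Python sketch below assumes EBCDIC code page 037, which is a guess because the site's EBCDIC CCSID is not stated; the function names are illustrative.

```python
def ebcdic_to_utf8_hex(ebcdic_bytes, codepage="cp037"):
    """Step 1: decode EBCDIC and return the UTF-8 bytes as a hex string."""
    text = ebcdic_bytes.decode(codepage)  # code page 037 is an assumption
    return text.encode("utf-8").hex().upper()


def hex_to_bytes(hex_string):
    """Step 2: turn the hex string back into raw bytes (like REXX x2c)."""
    return bytes.fromhex(hex_string)


if __name__ == "__main__":
    ebcdic = "STAFF".encode("cp037")
    print(ebcdic.hex().upper())            # EBCDIC code points
    utf8_hex = ebcdic_to_utf8_hex(ebcdic)
    print(utf8_hex)                        # UTF-8 code points
    print(hex_to_bytes(utf8_hex))
```

The same five letters have different byte values in the two encodings, which is why an EBCDIC SYSREC file cannot simply be declared as UTF-8.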

Creating and loading a table with XML columns

• Compressing the XML table spaces
– ALTER TABLESPACE SIDDAGO.XTAB0001 PART 1 COMPRESS YES
– ALTER TABLESPACE SIDDAGO.XTAB0002 PART 1 COMPRESS YES

22

REORG TABLESPACE SIDDAGO.XTAB0001
  NOSYSREC
  COPYDDN(TSYSCOPY)
  SHRLEVEL REFERENCE
  DRAIN_WAIT 20 RETRY 120 RETRY_DELAY 60
  TIMEOUT TERM
  SORTDEVT 3390
  WORKDDN(TSYSUT1,TSORTOUT)
  STATISTICS TABLE(ALL) INDEX(ALL) REPORT YES

           .xml files   XML TS   XML TS compressed   Compression ratio
T files    190 MB       50 MB    4,5 MB              89 %
C files    1200 MB      90 MB    21 MB               76 %

Presenter
Presentation Notes
Unlike LOB table spaces, XML table spaces can be compressed. We used the above SQL statements and REORG statements to enable the compression. Because of the storage in parsed format, the XML table spaces are much smaller than the native .xml files, but even then compression is still significant.

Querying the XML data with SQL and XPath

23


Querying the XML data with SQL and XPath

• Using XMLEXISTS function : get all tables of database DSQ1STBB

24

<ResultSet>
  <MetaData>
    <SourceDescription/>
    <ColumnsAmount>8</ColumnsAmount>
    <ColumnDescription id="1">
      <Name>DBNAME</Name>
      <Label>DBNAME</Label>
      <Type>varchar</Type>
      <Width>24</Width>
      <Nullable>true</Nullable>
      <Format>plain</Format>
    </ColumnDescription>
    <ColumnDescription id="2">
      <Name>TSNAME</Name>
      <Label>TSNAME</Label>
      ……………
  </MetaData>
  <Data>
    <Row id="0">
      <Cell id="1">DSQ1STBB</Cell>
      <Cell id="2">DSQ1STBT</Cell>
      <Cell id="3">5</Cell>
      <Cell id="4">1.000E+01</Cell>
      <Cell id="5">C</Cell>
      <Cell id="6">1990-03-12-09.51.52.580000</Cell>
      <Cell id="7">QMF SAMPLE TABLE</Cell>
      <Cell id="8" />
    </Row>
  </Data>
</ResultSet>

SELECT TB_CREATOR,TB_NAME
FROM TABLE_XML
WHERE XMLEXISTS('declare default element namespace "http://www.ibm.com/qmf";
      /DataSet/ResultSet/Data/Row[@id="0"]/Cell[@id="1" and text()="DSQ1STBB"]'
      PASSING TB_DESCRIPTION )

Table Q.APPLICANT T0000001.xml

SELECT TB_CREATOR,TB_NAME
FROM TABLE_XML
WHERE XMLEXISTS('declare default element namespace "http://www.ibm.com/qmf";
      /DataSet/ResultSet/Data/Row/Cell[../@id="0" and ./@id="1" and ./text()="DSQ1STBB"]'
      PASSING TB_DESCRIPTION )

SELECT TB_CREATOR,TB_NAME
FROM TABLE_XML
WHERE XMLEXISTS('declare default element namespace "http://www.ibm.com/qmf";
      /DataSet/ResultSet/Data/Row[@id="0"]/Cell[@id=/DataSet/ResultSet/MetaData/ColumnDescription[Name="DBNAME"]/@id and text()="DSQ1STBB"]'
      PASSING TB_DESCRIPTION)

Presenter
Presentation Notes
The TB_DESCRIPTION XML column contains meta data about a table (in this example Q.APPLICANT) : DBNAME='DSQ1STBB' ; TSNAME='DSQ1STBT' ; COLCOUNT = 5 ; CARDF = 1.000E+01 ; AUDITING = 'C' ; CREATEDTS = '1990-03-12-09.51.52.580000' ; REMARKS = 'QMF SAMPLE TABLE' ; LABEL = ' '
Many different XPath expressions lead to the above result.
Some XPath characteristics :
• Everything is case sensitive
• Character strings are delimited by " " ; numeric values have no delimiter : "0" is a character string ; 0 is numeric
• Attention with comparisons and sorting : 2 < 13 but "2" > "13"
• The namespaces used must always be declared, including the default namespace
• PASSING defines the "initial context"
• Predicates are delimited by [ ] ; and, or etc. are possible in predicates
• ../ refers to the parent context
• ./ refers to the current context
• * is a wildcard for tags
• // is a wildcard for paths
• For performance reasons it is best to avoid wildcards as much as possible, especially if indexes are present
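Several of these characteristics can be tried out without DB2. The Python sketch below uses the limited XPath subset of the standard xml.etree.ElementTree module as a rough analogue of the XMLEXISTS predicate; it is not DB2 XPath, and the function name in_database is made up. The document is a trimmed copy of the T0000001.xml content shown on the slide.

```python
import xml.etree.ElementTree as ET

NS = {"q": "http://www.ibm.com/qmf"}

# Trimmed copy of the table description of Q.APPLICANT.
DOC = """<DataSet xmlns="http://www.ibm.com/qmf"><ResultSet><Data>
<Row id="0">
  <Cell id="1">DSQ1STBB</Cell>
  <Cell id="2">DSQ1STBT</Cell>
</Row>
</Data></ResultSet></DataSet>"""


def in_database(xml_text, dbname):
    """Rough analogue of the XMLEXISTS predicate: Row 0, Cell 1 equals dbname."""
    root = ET.fromstring(xml_text)  # the root element is the initial context
    cells = root.findall("q:ResultSet/q:Data/q:Row[@id='0']/q:Cell[@id='1']", NS)
    return any(cell.text == dbname for cell in cells)


if __name__ == "__main__":
    print(in_database(DOC, "DSQ1STBB"))
    print(in_database(DOC, "dsq1stbb"))   # values are case sensitive
    root = ET.fromstring(DOC)
    print(root.findall("q:ResultSet/q:Data/q:row", NS))  # tags too: no match
    print(root.findall("ResultSet/Data/Row"))            # namespace omitted
    print(2 < 13, "2" > "13")             # numeric versus string comparison
```

Leaving out the namespace or changing the case of a tag silently returns no match rather than an error, which is the behaviour to watch for when a query unexpectedly returns no rows.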

Querying the XML data with SQL and XPath

• Using XMLEXISTS function : get all tables of database DSQ1STBB

25

TB_CREATOR   TB_NAME
Q            APPLICANT
Q            INTERVIEW
Q            ORG
Q            PARTS
Q            PRODUCTS
Q            PROJECT
Q            SALES
Q            STAFF
Q            SUPPLIER

Presenter
Presentation Notes
The result came back unexpectedly quickly (because the elements are stored in parsed format).

Querying the XML data with SQL and XPath

• Using XMLQUERY function : get number of columns and cardinality of table Q.APPLICANT

26

<ResultSet>
  <MetaData>
    <SourceDescription/>
    <ColumnsAmount>8</ColumnsAmount>
    <ColumnDescription id="1">
      <Name>DBNAME</Name>
      <Label>DBNAME</Label>
      <Type>varchar</Type>
      <Width>24</Width>
      <Nullable>true</Nullable>
      <Format>plain</Format>
    </ColumnDescription>
    <ColumnDescription id="2">
      <Name>TSNAME</Name>
      <Label>TSNAME</Label>
      ……………
  </MetaData>
  <Data>
    <Row id="0">
      <Cell id="1">DSQ1STBB</Cell>
      <Cell id="2">DSQ1STBT</Cell>
      <Cell id="3">5</Cell>
      <Cell id="4">1.000E+01</Cell>
      <Cell id="5">C</Cell>
      <Cell id="6">1990-03-12-09.51.52.580000</Cell>
      <Cell id="7">QMF SAMPLE TABLE</Cell>
      <Cell id="8" />
    </Row>
  </Data>
</ResultSet>

SELECT XMLCAST(
         XMLQUERY('declare default element namespace "http://www.ibm.com/qmf";
           $doc/DataSet/ResultSet/Data/Row[@id="0"]/Cell[@id="3"]'
           PASSING TB_DESCRIPTION AS "doc")
         AS INTEGER) AS COLCOUNT
      ,XMLCAST(
         XMLQUERY('declare default element namespace "http://www.ibm.com/qmf";
           $doc/DataSet/ResultSet/Data/Row[@id="0"]/Cell[@id="4"]'
           PASSING TB_DESCRIPTION AS "doc")
         AS FLOAT) AS CARDF
FROM TABLE_XML
WHERE TB_CREATOR = 'Q' AND TB_NAME = 'APPLICANT'

Table Q.APPLICANT T0000001.xml

COLCOUNT CARDF

5 1.000E+01

Presenter
Presentation Notes
XMLQUERY always returns the XML data type. XMLCAST returns the value of an XML element as an SQL type (without tags and attributes). Use XMLCAST to cast from XML to another data type. XMLCAST can only be used when XMLQUERY returns one element; otherwise return the result in native XML format with multiple elements or use the XMLTABLE function.
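The "select one element, then cast it" pattern can be imitated in Python as a rough analogue. The helper name cell_value is made up, the standard xml.etree.ElementTree module stands in for XMLQUERY, and returning None for no match is an assumption about how an empty result should be treated. AMBIGUOUS is a hypothetical document with a duplicated cell.

```python
import xml.etree.ElementTree as ET

NS = {"q": "http://www.ibm.com/qmf"}

# Trimmed table description of Q.APPLICANT (cells 3 and 4 only).
DOC = """<DataSet xmlns="http://www.ibm.com/qmf"><ResultSet><Data>
<Row id="0"><Cell id="3">5</Cell><Cell id="4">1.000E+01</Cell></Row>
</Data></ResultSet></DataSet>"""

# Hypothetical document in which the path selects two elements.
AMBIGUOUS = """<DataSet xmlns="http://www.ibm.com/qmf"><ResultSet><Data>
<Row id="0"><Cell id="3">5</Cell><Cell id="3">6</Cell></Row>
</Data></ResultSet></DataSet>"""


def cell_value(xml_text, cell_id, cast):
    """Select Cell[@id=cell_id] of Row 0 and cast its text to a Python type."""
    root = ET.fromstring(xml_text)
    path = f"q:ResultSet/q:Data/q:Row[@id='0']/q:Cell[@id='{cell_id}']"
    cells = root.findall(path, NS)
    if not cells:
        return None  # assumption: no match is treated like a NULL
    if len(cells) > 1:
        raise ValueError("more than one element selected; cannot cast")
    return cast(cells[0].text)


if __name__ == "__main__":
    print(cell_value(DOC, "3", int))     # COLCOUNT
    print(cell_value(DOC, "4", float))   # CARDF
    print(cell_value(DOC, "9", int))     # no such cell
```

When a path can match several elements, a row-producing approach such as XMLTABLE is the better fit, as the notes say.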

Querying the XML data with SQL and XPath

• Using XMLTABLE function : get all columns of table Q.STAFF

27

<ResultSet>
  <MetaData> </MetaData>
  <Data>
    <Row id="0">
      <Cell id="1">1</Cell>
      <Cell id="2">ID</Cell>
      <Cell id="3">SMALLINT</Cell>
      <Cell id="4">2</Cell>
      <Cell id="5">0</Cell>
      <Cell id="6">N</Cell>
      <Cell id="7" />
    </Row>
    <Row id="1">
      <Cell id="1">2</Cell>
      <Cell id="2">NAME</Cell>
      <Cell id="3">VARCHAR</Cell>
      <Cell id="4">9</Cell>
      <Cell id="5">0</Cell>
      <Cell id="6">Y</Cell>
      <Cell id="7" />
    </Row>
    <Row id="2">
      <Cell id="1">3</Cell>
      <Cell id="2">DEPT</Cell>
      <Cell id="3">SMALLINT</Cell>
      <Cell id="4">2</Cell>
      ……………
  </Data>
</ResultSet>

SELECT X.*
FROM TABLE_XML,
     XMLTABLE('declare default element namespace "http://www.ibm.com/qmf";
               /DataSet/ResultSet/Data/Row'
              PASSING CO_DESCRIPTION
              COLUMNS "COLNAME" VARCHAR(30)
                PATH 'declare default element namespace "http://www.ibm.com/qmf";
                      Cell[@id = /DataSet/ResultSet/MetaData/ColumnDescription[Name="COLNAME"]/@id]'
             ) AS X
WHERE TB_CREATOR = 'Q'
  AND TB_NAME = 'STAFF'

Table Q.STAFF C0002571.xml

COLNAME
ID
NAME
DEPT
JOB
YEARS
SALARY
COMM

Presenter
Presentation Notes
Use the XMLTABLE function to return XML data in relational (table) format. Missing elements are returned as NULL. SQL scalar and aggregate functions can be applied to the result table afterwards. The namespace has to be declared in every XPath expression. In this example we use the <MetaData> section to find the id attribute of COLNAME: Cell[@id = /DataSet/ResultSet/MetaData/ColumnDescription[Name="COLNAME"]/@id], which is equivalent to Cell[@id = "2"].

Querying the XML data with SQL and XPath

• Using XMLTABLE function : get all column meta data of table Q.STAFF


<ResultSet>
  <MetaData> </MetaData>
  <Data>
    <Row id="0">
      <Cell id="1">1</Cell> <Cell id="2">ID</Cell> <Cell id="3">SMALLINT</Cell> <Cell id="4">2</Cell> <Cell id="5">0</Cell> <Cell id="6">N</Cell> <Cell id="7" />
    </Row>
    <Row id="1">
      <Cell id="1">2</Cell> <Cell id="2">NAME</Cell> <Cell id="3">VARCHAR</Cell> <Cell id="4">9</Cell> <Cell id="5">0</Cell> <Cell id="6">Y</Cell> <Cell id="7" />
    </Row>
    <Row id="2">
      <Cell id="1">3</Cell> <Cell id="2">DEPT</Cell> <Cell id="3">SMALLINT</Cell> <Cell id="4">2</Cell>
      ……………
  </Data>
</ResultSet>

SELECT X.*
FROM TABLE_XML,
     XMLTABLE(XMLNAMESPACES(DEFAULT 'http://www.ibm.com/qmf'),
              '$doc/DataSet/ResultSet/Data/Row'
              PASSING CO_DESCRIPTION AS "doc"
              COLUMNS "COLNO"   INTEGER     PATH 'Cell[@id = "1"]'
                     ,"COLNAME" VARCHAR(30) PATH 'Cell[@id = "2"]'
                     ,"COLTYPE" VARCHAR(10) PATH 'Cell[@id = "3"]'
                     ,"LENGTH"  INTEGER     PATH 'Cell[@id = "4"]'
                     ,"SCALE"   INTEGER     PATH 'Cell[@id = "5"]'
             ) AS X
WHERE TB_CREATOR = 'Q'
  AND TB_NAME = 'STAFF'

Table Q.STAFF C0002571.xml

Presenter
Presentation Notes
The XMLTABLE function contains one row-generating XQuery expression and, in the COLUMNS clause, multiple column-generating expressions. The row-generating expression is evaluated first: it is applied to each XML document in the XML column and produces one or more elements per document (in this example, one <Row> element per column of Q.STAFF). The number of elements it produces determines the number of rows returned by the XMLTABLE function.

The COLUMNS clause transforms XML data into relational format. Each entry in this clause defines a column with a column name and an SQL data type. The row-generating expression provides the context for the column-generating expressions, so these are not absolute paths but paths relative to the row-generating expression. You can typically append a column-generating expression to the row-generating expression to get an intuitive idea of what the XMLTABLE function returns in that column.

The result set of the XMLTABLE query can be treated like any SQL table: you can query and manipulate it much like regular row sets or views. The column definitions in the COLUMNS clause can use any SQL data type, such as INTEGER, DECIMAL, CHAR or DATE. If an extracted XML value cannot be cast to the assigned SQL type, the query fails with an error message.

Namespaces have to be present in the row-generating expression and in all column-generating expressions. Use the XMLNAMESPACES function to define the default namespace for all generating expressions.
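The relationship between the row-generating expression and the relative column-generating expressions can be tried outside DB2. The following is only a rough Python sketch using the standard xml.etree.ElementTree module, not DB2 code: the sample document is a hypothetical, shortened imitation of the CO_DESCRIPTION document, and the names DOC, COLUMNS and xmltable are made up for the illustration.

```python
import xml.etree.ElementTree as ET

QMF_NS = "http://www.ibm.com/qmf"
NS = {"q": QMF_NS}  # plays the role of XMLNAMESPACES(DEFAULT ...)

# Hypothetical, shortened document; the third row lacks Cell 3 on purpose.
DOC = f"""
<DataSet xmlns="{QMF_NS}">
  <ResultSet>
    <MetaData/>
    <Data>
      <Row id="0"><Cell id="1">1</Cell><Cell id="2">ID</Cell><Cell id="3">SMALLINT</Cell></Row>
      <Row id="1"><Cell id="1">2</Cell><Cell id="2">NAME</Cell><Cell id="3">VARCHAR</Cell></Row>
      <Row id="2"><Cell id="1">3</Cell><Cell id="2">DEPT</Cell></Row>
    </Data>
  </ResultSet>
</DataSet>
""".strip()

# Row-generating path, relative to the root element <DataSet>.
ROW_PATH = "q:ResultSet/q:Data/q:Row"

# Column-generating paths, relative to each <Row>: (name, cast, path).
COLUMNS = [
    ("COLNO", int, "q:Cell[@id='1']"),
    ("COLNAME", str, "q:Cell[@id='2']"),
    ("COLTYPE", str, "q:Cell[@id='3']"),
]


def xmltable(doc_text):
    """Return one dict per <Row>, one key per entry in COLUMNS."""
    root = ET.fromstring(doc_text)
    rows = []
    for row in root.findall(ROW_PATH, NS):       # one output row per <Row>
        record = {}
        for name, cast, path in COLUMNS:
            cell = row.find(path, NS)            # evaluated relative to the row
            has_value = cell is not None and cell.text is not None
            record[name] = cast(cell.text) if has_value else None
        rows.append(record)
    return rows


for record in xmltable(DOC):
    print(record)
```

The third row has no Cell with id 3, so its COLTYPE comes back as None, which mirrors the NULL that XMLTABLE returns for a missing element.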

Querying the XML data with SQL and XPath

• Using XMLTABLE function : get all column meta data of table Q.STAFF


COLNO  COLNAME  COLTYPE   LENGTH  SCALE
1      ID       SMALLINT  2       0
2      NAME     VARCHAR   9       0
3      DEPT     SMALLINT  2       0
4      JOB      CHAR      5       0
5      YEARS    SMALLINT  2       0
6      SALARY   DECIMAL   7       5
7      COMM     DECIMAL   7       5

Presenter
Presentation Notes
Additional SQL scalar functions and aggregate functions can be used on the result table afterwards.

Validating XML data using an XML Schema

• Validating the XML documents in our TABLE_XML table
  – QMF provides default schema “qmf_data.xsd” (EBCDIC)
  – Must first be registered in XSR
  – DB2 V9 : only user controlled validation
    • through DSN_XMLVALIDATE function
  – SYSIBM.DSN_XMLVALIDATE : SQL built-in function


SELECT SYSIBM.DSN_XMLVALIDATE(
         XMLSERIALIZE(TB_DESCRIPTION AS CLOB), 'SYSXSR.QMF_01')
      ,SYSIBM.DSN_XMLVALIDATE(
         XMLSERIALIZE(CO_DESCRIPTION AS CLOB), 'SYSXSR.QMF_01')
FROM TABLE_XML
WHERE TB_ID = '0000001'

INSERT INTO TABLE_XML
  SELECT TB_CREATOR, TB_NAME,
         SYSIBM.DSN_XMLVALIDATE(XMLSERIALIZE(TB_DESCRIPTION AS CLOB), 'SYSXSR.QMF_01'),
         SYSIBM.DSN_XMLVALIDATE(XMLSERIALIZE(CO_DESCRIPTION AS CLOB), 'SYSXSR.QMF_01'),
         TB_ID
  FROM TABLE_XML_TEMP

Presenter
Presentation Notes
QMF provides a default schema “qmf_data.xsd” as member DSQ1SCEM of the QMF samples data set SDSQSAPn (in EBCDIC).

The first implementation of XML validation was the user-defined function SYSFUN.DSN_XMLVALIDATE, called from within the XMLPARSE function. This UDF is now deprecated (see APAR PK90040); always use the newer and better SYSIBM.DSN_XMLVALIDATE built-in function (2010).

DSN_XMLVALIDATE is available in forms with one, two or three parameters. The example above uses the two-parameter form: the first parameter is a CLOB expression containing the document to validate, and the second is the name of a schema registered in the XSR. The result is always of type XML. The one-parameter form uses the registered schema determined by the namespace of the root element, or by the namespace and schema location hint specified in the document's xsi:schemaLocation attribute. The three-parameter form uses the registered schema that corresponds to the specified namespace and schema location hint.

Because DB2 V9 only supports user-controlled validation through the DSN_XMLVALIDATE function, the LOAD utility cannot validate documents in V9. A workaround is to LOAD the XML data into a temporary table first and then INSERT it into the final target table through DSN_XMLVALIDATE. Alternatively, issue a SELECT statement with DSN_XMLVALIDATE against the loaded table to check whether one or all XML documents are valid.

Validating XML data using an XML Schema

• QMF default schema “qmf_data.xsd”
  XSD = XML Schema Definition


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.ibm.com/qmf"
           xmlns:QMF="http://www.ibm.com/qmf"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="DataSet">
    <xs:annotation>
      <xs:documentation>
        Root element of Data Set (data and extensions descriptions as well)
      </xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="QMF:ResultSet"/>
        <xs:element name="Extensions">
          <xs:annotation>
            <xs:documentation>
              Additional information about formatting of this Data Set
            </xs:documentation>
          </xs:annotation>
          ………………….

……………………………….
<xs:sequence>
  <xs:element name="Name" type="xs:string"/>
  <xs:element name="Label" type="xs:string" minOccurs="0"/>
  <xs:element name="Type" type="QMF:DataType"/>
  <xs:choice>
    <xs:sequence>
      <xs:element name="Scale" type="xs:integer"/>
      <xs:element name="Precision" type="xs:integer"/>
    </xs:sequence>
    <xs:element name="Width" type="xs:integer"/>
  </xs:choice>
  <xs:element name="Nullable" type="QMF:Nullable" minOccurs="0"/>
  <xs:element name="Format" ……………………….

Presenter
Presentation Notes
A schema is itself an XML document. It defines which elements an XML document can contain, how they are organized, and which attributes and attribute types can be assigned to elements. It is written in the W3C XML Schema Definition (XSD) language, which is much richer than the DTD (Document Type Definition) language and allows more complex semantic rules. A schema contains a targetNamespace, which corresponds to the default namespace of the XML document it describes. It also has a default namespace and an xs: prefixed namespace. We found that one xs: prefix was missing in the QMF default schema, in the “type” attribute of element “Scale” (“integer” belongs to the xs: namespace and not to the default namespace of the schema). This was also fixed by APAR PM94433.

Validating XML data using an XML Schema

• Registering the QMF schema in the XSR
  – Using the XSR stored procedures
    • SYSPROC.XSR_REGISTER (C)
    • SYSPROC.XSR_ADDSCHEMADOC (C)
    • SYSPROC.XSR_COMPLETE (Java)
    • SYSPROC.XSR_REMOVE (C)
  – Using DB2 CLP commands on Windows platform
    • register xmlschema (register primary schema document)
    • add xmlschema (only if additional schema documents)
    • complete xmlschema (complete registration)
  – XSR must be installed (DB2 DSNTIJRT installation job)


Presenter
Presentation Notes
The XML Schema Repository (XSR) consists of a DB2 database DSNXSR with 8 tables and 4 DB2 stored procedures (optional installation job DSNTIJRT). A schema can be registered by calling the stored procedures from a z/OS script or application, from the Windows DB2 CLP, or from a Windows tool that supports the XSR, such as IBM Data Studio. I used the CLP from DB2 Connect V9.5 and did not find an equivalent CLP command to remove a schema from the XSR. Because SYSPROC.XSR_COMPLETE is a Java stored procedure, your z/OS environment must be set up to run Java stored procedures and the IBM Data Server Driver for JDBC and SQLJ.

Validating XML data using an XML Schema

• Registering the QMF schema in the XSR
  – Copy QMF910.SDSQSAPE(DSQ1SCEM) to qmf_data.xsd on Windows with FTP


db2 => connect to DB2GD

   Database Connection Information

 Database server        = DB2 z/OS 9.1.5
 SQL authorization ID   = SIDDAGO
 Local database alias   = DB2GD

db2 => register xmlschema 'qmf_data.xsd' from file://Q:\qmf_data.xsd as SYSXSR.QMF_01
DB20000I The REGISTER XMLSCHEMA command completed successfully.
db2 => complete xmlschema SYSXSR.QMF_01
DB20000I The COMPLETE XMLSCHEMA command completed successfully.
db2 => quit
DB20000I The QUIT command completed successfully.

db2 => ? register xmlschema
REGISTER XMLSCHEMA schema-URI FROM content-URI [WITH properties-URI]
[AS relational-identifier][Sub-document-clause]
[COMPLETE [WITH schema-properties-URI][ENABLE DECOMPOSITION]]
Sub-document-clause:
ADD document-URI FROM content-URI [WITH properties-URI] ...

db2 => ? complete xmlschema
COMPLETE XMLSCHEMA relational-identifier
[WITH schema-properties-URI][ENABLE DECOMPOSITION]

Presenter
Presentation Notes
Here we show the commands used to register the QMF schema in the z/OS XSR from a DB2 Connect V9.5 CLP. Help is available through the ? command.

Validating XML data using an XML Schema

• Enabling Java stored procedures on z/OS for XSR_COMPLETE
  – Download “DB2 9 for z/OS XSR Setup and Troubleshooting” from web
  – One additional problem : exception java.lang.UnsatisfiedLinkError: DSNNVBAT

• SYSIBM.DSN_XMLVALIDATE fails with SQLCODE -904 rc 00D50005


//V91AWLJA PROC DB2SSN=V91A,NUMTCB=2,APPLENV=WLMJAVA
//TCBNUM1  EXEC PGM=DSNX9WLM,TIME=NOLIMIT,
//         PARM='&DB2SSN,&NUMTCB,&APPLENV',
//         REGION=0M
//STEPLIB  DD DSN=DSN910.SDSNLOD2,DISP=SHR
//         DD DSN=DSN910.SDSNLOAD,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=DSN.RUNLIB.LOAD,DISP=SHR      add unauthorized lib
//JAVAENV  DD DSN=WLMJAVA.JSPENV,DISP=SHR
//JSPDEBUG DD SYSOUT=A
//CEEDUMP  DD SYSOUT=A
//SYSPRINT DD SYSOUT=A

SETPROG LPA,ADD,MODNAME=(GXLIMODV),DSNAME=SYS1.SIEALNKE      (after each IPL)

Presenter
Presentation Notes
Because SYSPROC.XSR_COMPLETE is a Java stored procedure, your z/OS environment must be set up to run Java stored procedures and the IBM Data Server Driver for JDBC and SQLJ. This was not the case at our site, and we had some difficulty getting it to work. The article mentioned on the slide gives a very good list of required actions and common error messages. We encountered two additional problems. To solve the first, we had to add an arbitrary non-APF-authorized library to the STEPLIB of the application environment for Java stored procedures. To solve the second, we had to issue the SETPROG LPA command shown above (to be repeated after each IPL).

Create and use XML indexes


SELECT TB_CREATOR, TB_NAME
FROM TABLE_XML
WHERE XMLEXISTS('declare default element namespace "http://www.ibm.com/qmf";
                 /DataSet/ResultSet/Data/Row[@id="0"]/Cell[@id="1" and text()="DSQ1STBB"]'
                PASSING TB_DESCRIPTION)

CREATE INDEX SIDDAGO.I_TABLE_XML_03
  ON SIDDAGO.TABLE_XML (TB_DESCRIPTION)
  GENERATE KEY USING XMLPATTERN
    'declare default element namespace "http://www.ibm.com/qmf";
     /DataSet/ResultSet/Data/Row/Cell/text()'
  AS SQL VARCHAR(254)
  NOT PADDED
  USING STOGROUP SYSDEFLT
  BUFFERPOOL BP2
  CLOSE YES ;

CREATE INDEX SIDDAGO.I_TABLE_XML_01
  ON SIDDAGO.TABLE_XML (TB_DESCRIPTION)
  GENERATE KEY USING XMLPATTERN
    'declare default element namespace "http://www.ibm.com/qmf";
     /DataSet/ResultSet/Data/Row/Cell'
  AS SQL VARCHAR(254)
  NOT PADDED
  USING STOGROUP SYSDEFLT
  BUFFERPOOL BP2
  CLOSE YES ;

CREATE INDEX SIDDAGO.I_TABLE_XML_02
  ON SIDDAGO.TABLE_XML (TB_DESCRIPTION)
  GENERATE KEY USING XMLPATTERN
    'declare default element namespace "http://www.ibm.com/qmf";
     /DataSet/ResultSet/Data/Row/Cell/@id'
  AS SQL VARCHAR(2)
  NOT PADDED
  USING STOGROUP SYSDEFLT
  BUFFERPOOL BP2
  CLOSE YES ;

Presenter
Presentation Notes
An XML index can be used to improve the efficiency of queries on XML documents that are stored in an XML column. Instead of providing access to the beginning of a document, index entries in an XML index provide access to nodes within the document by creating index keys based on XML pattern expressions. Because multiple parts of an XML document can satisfy an XML pattern, DB2 might generate multiple index keys when it inserts values for a single document into the index. Such an index can then be used when this pattern is present in a predicate.

In the above case indexes I_TABLE_XML_01 and I_TABLE_XML_02 will contain 8 keys per document and I_TABLE_XML_03 only 7, because the <Cell> elements with id="8" never contain a value (table LABELs are never present). As with any XPath expression, the namespace must always be specified. An XMLPATTERN cannot contain a predicate [ … ].

For testing purposes we defined 3 different XML indexes: one on the element node <Cell>, one on the attribute node id of <Cell>, and one on the text node of <Cell>. These are so-called “lean” indexes because they contain only fully qualified paths (no wildcards * or //), which is recommended for optimal performance. The above query has 2 predicates: one on path /DataSet/ResultSet/Data/Row/Cell/@id and one on path /DataSet/ResultSet/Data/Row/Cell/text().

Because we have a mix of character values and numeric values we use the VARCHAR type and not the DECFLOAT type. Use DECFLOAT with care because it can only be used to index numeric values. Nodes whose values cannot be cast to DECFLOAT are not indexed, and such indexes are only used to evaluate numeric predicates (which is why incomplete DECFLOAT indexes are safe after all: they can never lead to incomplete results). Make sure that the VARCHAR length is big enough to hold the element value or attribute value.

We cannot use UNIQUE indexes because all rows of table TABLE_XML will contain documents with the same nodes.
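The key counts mentioned above (8 keys for the element and attribute indexes, 7 for the text() index) can be reproduced with a small simulation. This is only a Python sketch of the idea, not how DB2 builds its indexes: the function index_keys and its kind parameter are hypothetical, the document is a cut-down copy of the TB_DESCRIPTION sample, and giving the empty element an empty-string key is an assumption taken from the 8-key count in the notes.

```python
import xml.etree.ElementTree as ET

QMF_NS = "http://www.ibm.com/qmf"
NS = {"q": QMF_NS}

# Fully qualified ("lean") path, relative to the root element <DataSet>.
CELL_PATH = "q:ResultSet/q:Data/q:Row/q:Cell"

# Cut-down TB_DESCRIPTION document: 8 cells, the last one is empty.
DOC = (
    f'<DataSet xmlns="{QMF_NS}"><ResultSet><Data><Row id="0">'
    '<Cell id="1">DSQ1STBB</Cell><Cell id="2">DSQ1STBT</Cell>'
    '<Cell id="3">5</Cell><Cell id="4">1.000E+01</Cell>'
    '<Cell id="5">C</Cell><Cell id="6">1990-03-12-09.51.52.580000</Cell>'
    '<Cell id="7">QMF SAMPLE TABLE</Cell><Cell id="8" />'
    '</Row></Data></ResultSet></DataSet>'
)


def index_keys(doc_text, docid, kind):
    """Return (key value, docid) pairs for one document.

    kind is "element" (.../Cell), "attribute" (.../Cell/@id)
    or "text" (.../Cell/text()).
    """
    root = ET.fromstring(doc_text)
    keys = []
    for cell in root.findall(CELL_PATH, NS):
        if kind == "element":
            value = "".join(cell.itertext())   # string value, "" when empty
        elif kind == "attribute":
            value = cell.get("id")
        else:
            value = cell.text                  # None when there is no text node
        if value is not None:                  # no node, no key
            keys.append((value, docid))
    return keys


for kind in ("element", "attribute", "text"):
    print(kind, len(index_keys(DOC, 1, kind)))
```

The empty <Cell id="8" /> has no text node at all, so the text() pattern finds nothing to index for it, while the element itself and its id attribute still exist and each produce a key.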

Create and use XML indexes

• Eligible XML indexes : 3 conditions
  – The data type of the index and the predicate must be compatible
    • VARCHAR versus DECFLOAT
  – Text nodes must be treated the same way in the index and predicate
    • [element=value] versus [element/text()=value]
    • Ex : [Cell="DSQ1STBB"] versus [Cell/text()="DSQ1STBB"]
  – The index must “contain” the query predicate
    • Index pattern is equally or less restrictive than the predicate pattern
• Use “LEAN” indexes for best performance
• Consider the use of UNIQUE indexes
  – To force uniqueness across and within all XML documents of an XML column


Presenter
Presentation Notes
An XML index is eligible if it can be used to answer a query predicate. An XML index stores only the values of nodes that match both the XPath pattern and the data type in the index definition.
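Why [element=value] and [element/text()=value] are not interchangeable becomes visible once an element has more than a single text node. The Python sketch below only illustrates the general XPath distinction between an element's string value and its text-node children; it says nothing about DB2 internals, and the <b> child is a made-up example (QMF cells do not contain child elements).

```python
import xml.etree.ElementTree as ET


def element_string_value(elem):
    """What [Cell = value] compares: all descendant text, concatenated."""
    return "".join(elem.itertext())


def text_node_values(elem):
    """What [Cell/text() = value] compares: each direct text node separately."""
    values = []
    if elem.text:
        values.append(elem.text)
    for child in elem:
        if child.tail:          # text that follows a child element
            values.append(child.tail)
    return values


simple = ET.fromstring("<Cell>DSQ1STBB</Cell>")
mixed = ET.fromstring("<Cell>DSQ1<b>ST</b>BB</Cell>")   # hypothetical mixed content

print(element_string_value(simple), text_node_values(simple))
print(element_string_value(mixed), text_node_values(mixed))
```

For the simple cell both forms see the same value. For the mixed cell the element's string value is still DSQ1STBB, but no single text node equals it, so an index built on one form cannot safely answer a predicate written in the other.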

Create and use XML indexes

• Check explain PLAN_TABLE column ACCESSTYPE
• Check explain DSN_STATEMNT_TABLE column PROCSU


ACCESSTYPE  ACCESSNAME      INDEXONLY
M                           N          Multiple index scan
DX          I_TABLE_XML_03  Y          XML index scan
DX          I_TABLE_XML_02  Y          XML index scan
DI                          N          Intersection

PROCMS  PROCSU  TOTAL_COST
2       118     6.021E+00

659 MS 338764 SU without indexes

Presenter
Presentation Notes
Run EXPLAIN and check the PLAN_TABLE to see whether the DB2 optimizer considers using the XML indexes. In this case DB2 uses indexes I_TABLE_XML_02 and I_TABLE_XML_03, as expected, to resolve the double predicate on Cell[@id] and Cell[text()]. DB2 starts with the index on text(), which is very selective in this case.

The relevant ACCESSTYPE values for XML queries in the PLAN_TABLE are:
M : multiple index scans followed by an intersection or union of the returned lists
DX : an XML index scan on the index that is named in ACCESSNAME. It returns all documents that match the XMLPATTERN of the index and the predicate, in the form of a DOCID list
DI : an intersection of multiple DOCID lists
DU : a union of multiple DOCID lists
R : table space scan on the XML table space; no XML indexes used

To compare different access paths, look at the PROCSU column for an estimate of the query cost in Service Units. After dropping index I_TABLE_XML_03, DB2 uses index I_TABLE_XML_02 followed by I_TABLE_XML_01 in a similar way, at a much higher cost of 5408 SU. After also dropping index I_TABLE_XML_02, index I_TABLE_XML_01 is not used and we get a scan of the complete XML table space (38967 SU).
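The M / DX / DI sequence described above can be pictured as two lookups that each return a DOCID list, followed by a set intersection. The following Python sketch is a simplified model under that reading, not DB2's implementation: the two dictionaries stand in for I_TABLE_XML_03 and I_TABLE_XML_02, and their contents and DOCIDs are invented for the example.

```python
def xml_index_scan(index, value):
    """DX: return the DOCIDs of all documents that have a key equal to value."""
    return set(index.get(value, []))


def multiple_index_access(text_index, id_index, text_value, id_value):
    """M: two DX scans, then DI, an intersection of the two DOCID lists."""
    text_docids = xml_index_scan(text_index, text_value)   # selective scan first
    id_docids = xml_index_scan(id_index, id_value)
    return text_docids & id_docids                         # DI


# Hypothetical index contents: key value -> DOCIDs containing that key.
TEXT_INDEX = {"DSQ1STBB": [1], "DSQ1STBT": [1, 2], "5": [1, 3]}
ID_INDEX = {"1": [1, 2, 3], "2": [1, 2, 3]}

candidates = multiple_index_access(TEXT_INDEX, ID_INDEX, "DSQ1STBB", "1")
print(sorted(candidates))
```

In this made-up data every document has a Cell with id "1", so the id index alone filters nothing; the text() index is what narrows the result, which matches the observation that DB2 starts with the more selective index.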

Conclusion

PureXML is fun!
Try it out
Questions?
