IBM InfoSphere Streams Version 2.0.0.4 Database Toolkit

Upload: satish-gopalani

Post on 24-Oct-2014


Note: Before using this information and the product it supports, read the general information under “Notices” on page 43.

Edition Notice

This document contains proprietary information of IBM. It is provided under a license agreement and is protected by copyright law. The information contained in this publication does not include any product warranties, and any statements provided in this manual should not be interpreted as such.

You can order IBM publications online or through your local IBM representative.
• To order publications online, go to the IBM Publications Center at www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss
• To find your local IBM representative, go to the IBM Directory of Worldwide Contacts at www.ibm.com/planetwide

When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright IBM Corporation 2009, 2012.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


Summary of changes

This topic describes updates to this documentation for IBM® InfoSphere® Streams Version 2.0 (all releases).

Note: The following revision characters are used in the InfoSphere Streams documentation to indicate updates for Version 2.0.0.4:
• In PDF files, updates are indicated by a vertical bar (|) to the left of each new or changed line of text.
• In HTML files, updates are surrounded by double angle brackets (>> and <<).

Updates for Version 2.0.0.4 (Version 2.0, Fix Pack 4)

• The following restrictions apply to Red Hat Enterprise Linux Version 6 (RHEL 6) only:
  – On x86 systems running RHEL 6, Oracle databases are not supported.
  – On IBM POWER7® systems running RHEL 6, IBM solidDB®, Netezza®, and Oracle databases are not supported.

  These restrictions are added to Chapter 2, “How to use the Database Toolkit,” on page 3 and Chapter 3, “Known issues and restrictions,” on page 33.

• Streams applications can use the ODBCRun operator to run generic user-defined SQL statements to manage data, work with tables, and call stored procedures. For more information about this operator, see “ODBCRun” on page 10.

• The ODBCAppend, ODBCRun, and DB2PartitionedAppend operators have the optional commitOnPunctuation parameter, which allows you to specify whether transactions are committed when the operator receives a punctuation. For more information about this parameter, see “ODBCAppend” on page 7, “ODBCRun” on page 10, and “DB2PartitionedAppend” on page 18.

• The Connection Specifications Document now supports a statement element as part of an access_specification element. The statement element specifies information used by the ODBCRun operator to run an SQL statement. For more information, see “Statement element” on page 26.

Updates for Version 2.0.0.3 (Version 2.0, Fix Pack 3)

• The Database Toolkit includes operators that write data to a partitioned DB2® database. Streams applications can use these operators to write data to DB2 partitioned tables using parallel write operations for each of the partitions. For more information about these operators, see “DB2SplitDB” on page 17 and “DB2PartitionedAppend” on page 18.

• The Database Toolkit includes the source files to build a db2helper program for the DB2 libraries installed on your system. You can use this program to determine the number of partitions in the database.
  – For more information about building the db2helper program, see “Building db2helper” on page 37.
  – For more information about the db2helper options, see “Using db2helper” on page 37.

• An optional sleepTime parameter is added to the ODBCSource operator, which specifies the minimal time the operator has to wait before it can execute a query again. For more information, see “ODBCSource” on page 13.

• An optional key attribute is added to the attribute element to identify the key field in a table. For more information, see “Attribute element” on page 31.

• The droppedTuples metric is added to the ODBCAppend, ODBCEnrich, DB2SplitDB, and DB2PartitionedAppend operators to track the number of input tuples that are associated with ODBC or DB2 failures. For more information about how each of these operators uses this metric, see the metrics section of the related operator.

• Due to internal performance enhancements made to the ODBCAppend operator, the operator might use more memory at runtime than it did in previous releases.

Updates for Version 2.0.0.2 (Version 2.0, Fix Pack 2)

The Database Toolkit includes the source files to build an odbchelper program for the UnixODBC package installed on your system. You can now use the odbchelper program options to run SQL commands in the sample applications and to test the connection to an external data source.
• For more information about the odbchelper options, see “Using odbchelper” on page 35.
• For more information about building the odbchelper program, see “Building odbchelper” on page 35.

Updates for Version 2.0.0.1 (Version 2.0, Fix Pack 1)

This guide was not updated for Version 2.0.0.1.

Updates for Version 2.0

The Database Toolkit is made up of a subset of the operators of what was formerly named the Adapters Toolkit, which was originally written for the SPADE language in earlier versions of the Streams product. This version of the toolkit is written for the IBM Streams Processing Language (SPL). The function of the earlier SPADE version and the new SPL version of the toolkit is equivalent. All operator and parameter names are the same. The output from the operators in SPL is the same as the output of the operators in SPADE if the same input data, parameters, and external data configurations are used.


Contents

Summary of changes
Chapter 1. Overview
Chapter 2. How to use the Database Toolkit
  Operator common parameters
  Operator error output port
  Operators
    ODBCAppend
    ODBCEnrich
    ODBCRun
    ODBCSource
    SolidDBEnrich
    DB2SplitDB
    DB2PartitionedAppend
  Operator runtime error conditions
    ODBC and DB2 operators runtime error conditions
    SolidDBEnrich operator runtime error conditions
  Connection Specifications Document
    Connection_specification Element
    Access_specification element
Chapter 3. Known issues and restrictions
Chapter 4. Connection setup and debug
  Building odbchelper
  Using odbchelper
Chapter 5. DB2 partition layout and debug
  Building db2helper
  Using db2helper
Chapter 6. Sample applications
  Working with the samples in the command-line environment
    Updating database configuration information
  Working with the samples in Streams Studio
    Updating database configuration information
Notices


Chapter 1. Overview

IBM InfoSphere Streams applications process streams of data flowing from external sources and convert result streams to external formats to be used by components that are not part of InfoSphere Streams. Additionally, Streams applications can merge data from external repositories with internal streams, enriching their contents. The IBM Streams Processing Language (SPL) Standard Toolkit includes source and sink operators, which provide generic adapters for files and network sockets. However, much of the world's data is stored in and made available by data systems and products with higher-level interfaces than files and sockets. The Database Toolkit provides a set of SPL operators that allow easy integration with such external data systems.

The Database Toolkit includes these operators:
• The ODBCAppend operator stores a stream in a DBMS table. A row is appended to the table for each input stream tuple, using an SQL INSERT statement.
• The ODBCEnrich operator generates a stream from an input tuple and the result set of an SQL SELECT statement.
• The ODBCRun operator generates a stream from a generic user-defined SQL statement.
• The ODBCSource operator generates a stream from the rows of the result set of an SQL SELECT statement.
• The SolidDBEnrich operator generates a stream from an input tuple and the result set of a solidDB table query.

The Database Toolkit provides a set of operators that write data to a partitioned DB2 database using parallel write operations for each partition. Streams applications that process huge volumes of data can use these operators to provide improved performance when writing data to partitioned tables.

The Database Toolkit includes the following operators to write to a partitioned DB2 database:
• The DB2SplitDB operator determines the partition to use to write the input tuples.
• The DB2PartitionedAppend operator appends input tuples to a table in the specified partition. A row is appended to the table for each input tuple, using an SQL INSERT statement.

Important: The DB2SplitDB and DB2PartitionedAppend operators require the 9.7 version of the DB2 database to be installed on your system.

Note that the Database Toolkit is installed as part of the IBM InfoSphere Streams product installation. For information about installing this product, see the IBM InfoSphere Streams: Installation and Administration Guide.


Chapter 2. How to use the Database Toolkit

The Database Toolkit operators must be configured to connect to an external data service and to access specific data from that service. This configuration information is specified in an XML document that is separate from the SPL application. There are two main reasons for this. First, this configuration information is often complex, detailed, and specific to a particular vendor or vendor product. The same configuration information is often shared by many operator declarations either in a single application or across multiple applications. Repeating the same information in several operator declarations multiplies the opportunity for errors and is difficult to keep consistent. We choose to consolidate the configuration information to make it easier to maintain both the information itself and the SPL programs that access it. Second, the people who understand how to configure the external data services are often not the same people who are developing the SPL applications. Separating the configuration information from the SPL application allows the people in the two roles to work more independently of each other with less need for low-level coordination.

We describe how the operators in the Database Toolkit specify the external data service configuration file in “Operators” on page 7. The format of this XML file is described in “Connection Specifications Document” on page 22. Although the operators of the Database Toolkit access data from external data services, they do not define entities in those services or otherwise manage the data or the service. External data services are managed by tools and processes supplied by their vendors independently from the operators of the Database Toolkit and the SPL applications that use them. For example, the ODBCAppend operator inserts rows into a table in a DBMS (see “ODBCAppend” on page 7). The ODBCAppend operator does not attempt to create the table; if it does not already exist, the ODBCAppend operator will issue an error.

Applications that contain Database Toolkit operators are compiled with the SPL compiler command, sc. To compile an application containing Database Toolkit operators, you must specify the toolkit install directory either in the STREAMS_SPLPATH environment variable or with the -t option of the sc compiler command.

The following is an example using the STREAMS_SPLPATH environment variable:

    export STREAMS_SPLPATH=$STREAMS_INSTALL/toolkits/com.ibm.streams.db

The following is an example using the -t option of the sc compiler command:

    sc -t $STREAMS_INSTALL/toolkits/com.ibm.streams.db -M MyMain

Each operator requires a set of environment variables to be set at application compile time. These environment variables provide information needed to compile the application, as described below.

• ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource

  These operators support access to databases that implement the ODBC specification. The following table lists the specific databases that are supported by these operators, and the environment variable that needs to be defined in order to use them. Exactly one of the environment variables from the table must be defined, according to which database you are choosing to be targeted by the operator's code generation at compile time. The value that the environment variable is assigned is not important, only that the variable itself be defined to some value.

  Restrictions: The following restrictions apply to Red Hat Enterprise Linux Version 6 (RHEL 6) only:
  – On x86 systems running RHEL 6, Oracle databases are not supported.
  – On IBM POWER7 systems running RHEL 6, IBM solidDB, Netezza, and Oracle databases are not supported.

Table 1. Supported databases and the corresponding environment variables

Database Product                                           Version         Environment Variable
DB2 Runtime Client and DB2 Client                          9.7             STREAMS_ADAPTERS_ODBC_DB2
IBM Data Server Client and IBM Data Server Runtime Client  9.5             STREAMS_ADAPTERS_ODBC_DB2
IBM Informix® Dynamic Server                               11.50           STREAMS_ADAPTERS_ODBC_IDS
Oracle Database                                            11g Release 2   STREAMS_ADAPTERS_ODBC_ORACLE
IBM solidDB                                                6.5             STREAMS_ADAPTERS_ODBC_SOLID
MySQL                                                      5.1             STREAMS_ADAPTERS_ODBC_MYSQL
Microsoft SQL Server                                       2008            STREAMS_ADAPTERS_ODBC_SQLSERVER
Netezza                                                    6.0             STREAMS_ADAPTERS_ODBC_NETEZZA

  In addition to assigning some value to exactly one of the environment variables listed above, you must set the environment variables STREAMS_ADAPTERS_ODBC_INCPATH and STREAMS_ADAPTERS_ODBC_LIBPATH to the installed directory locations of the header files and the libraries for the database product that you are using.

  The operators also allow for additional databases that support ODBC via the UnixODBC driver to be configured. To use this capability, you should define the environment variable STREAMS_ADAPTERS_ODBC_UNIX_OTHER.

  The following table lists the drivers that are supported for the databases listed in Table 1.

Table 2. Supported drivers

Database Product                                           Version         Supported Driver
DB2 Runtime Client and DB2 Client                          9.7             DB2 ODBC
IBM Data Server Client and IBM Data Server Runtime Client  9.5             IBM Data Server ODBC
IBM Informix Dynamic Server                                11.50           IBM Informix ODBC
Oracle Database                                            11g Release 2   UnixODBC
IBM solidDB                                                6.5             UnixODBC
MySQL                                                      5.1             UnixODBC
Microsoft SQL Server                                       2008            UnixODBC
Netezza                                                    6.0             UnixODBC

• SolidDBEnrich

  This operator supports access to solidDB databases using IBM solidDB 6.5. The environment variables STREAMS_ADAPTERS_SOLIDDB_LIBPATH and STREAMS_ADAPTERS_SOLIDDB_INCPATH must be defined to be the path names of the directories where the external libraries and header files for solidDB 6.5 are installed.

• DB2SplitDB, DB2PartitionedAppend

  These operators support access to DB2 databases using a partitioned database configuration. The environment variables STREAMS_ADAPTERS_DB2_LIBPATH and STREAMS_ADAPTERS_DB2_INCPATH must be defined to be the path names of the directories where the external libraries and header files for DB2 are installed.

Note: If you are using both ODBC and DB2 operators in your application to access a DB2 database, you must set both sets of _LIBPATH and _INCPATH variables described above.
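The compile-time requirements for the ODBC operators can be checked before running sc. The following is a minimal sketch in Python, not part of the toolkit: the function name check_odbc_environment is hypothetical, and only the variable names are taken from Table 1 and the text above. It assumes that "defined to some value" means the variable is present in the environment, and it deliberately leaves out STREAMS_ADAPTERS_ODBC_UNIX_OTHER, because the text does not say how that variable combines with the ones in Table 1.

```python
import os

# Variable names come from Table 1 and the surrounding text.
# The helper itself is a hypothetical illustration, not a toolkit utility.
ODBC_DB_VARS = [
    "STREAMS_ADAPTERS_ODBC_DB2",
    "STREAMS_ADAPTERS_ODBC_IDS",
    "STREAMS_ADAPTERS_ODBC_ORACLE",
    "STREAMS_ADAPTERS_ODBC_SOLID",
    "STREAMS_ADAPTERS_ODBC_MYSQL",
    "STREAMS_ADAPTERS_ODBC_SQLSERVER",
    "STREAMS_ADAPTERS_ODBC_NETEZZA",
]
ODBC_PATH_VARS = [
    "STREAMS_ADAPTERS_ODBC_INCPATH",
    "STREAMS_ADAPTERS_ODBC_LIBPATH",
]


def check_odbc_environment(env):
    """Return a list of problems found in env (a mapping such as os.environ)."""
    problems = []

    # Exactly one database selector must be defined; its value is irrelevant.
    defined = [name for name in ODBC_DB_VARS if name in env]
    if len(defined) != 1:
        problems.append(
            "expected exactly one database variable, found %d: %s"
            % (len(defined), ", ".join(defined) or "none")
        )

    # The header and library locations must both be set to a directory path.
    for name in ODBC_PATH_VARS:
        if not env.get(name):
            problems.append("%s is not set" % name)

    return problems


if __name__ == "__main__":
    for problem in check_odbc_environment(os.environ):
        print(problem)
```

An empty list means the ODBC settings look complete. The same pattern could be repeated for the SOLIDDB and DB2 pairs of _LIBPATH and _INCPATH variables.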

Operator common parameters

The Database Toolkit operators are similar to the adapter operators in the SPL Standard Toolkit. For more information about the adapter operators, see the IBM Streams Processing Language Standard Toolkit Reference. The Database Toolkit operators are defined as SPL primitive operators.

All of the operators in the Database Toolkit use three common parameters. These parameters describe the connection specifications document that the operator uses (for details about the contents of a connection specifications document, see “Connection Specifications Document” on page 22) as well as the particular connection specification and access specification within that document. We describe these parameters here.

• connectionDocument

  The connectionDocument parameter specifies the path name of a file containing the connection and access specifications identified by the connection and access parameters (see “Connection Specifications Document” on page 22). The connection and access specifications defined by the operator's invocation parameters are set at SPL compile time; any change in the reference document or parameter settings requires a re-compile to take effect. Once compiled, the connections.xml document is not required for job submission. The connectionDocument parameter is optional. If present, it must have exactly one value of type rstring. If the parameter is absent, the operator will look for a file called connections.xml in the etc subdirectory of the SPL application directory (the current working directory where the sc command is invoked). For example, if you invoke the sc command from /home/myapp, the compiler would look for a connection document at /home/myapp/etc/connections.xml.

• connection

  The connection parameter specifies the name of a connection_specification element in the connection specifications document that identifies the external service to which this operator will connect (see “Connection_specification Element” on page 23). This parameter is required and must have exactly one value of type rstring.

• access

  The access parameter specifies the name of an access_specification element in the connection specifications document (see “Access_specification element” on page 24). This access specification specifies how this operator will access specific data in the external service identified by the connection parameter. This parameter is required and must have exactly one value of type rstring. The connection specification named in the connection parameter of this operator declaration must be the value of the connection attribute of a uses_connection element of the named access specification.
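The default lookup of the connection document and the rule that links the access and connection parameters can be sketched as follows. This is an illustration in Python, not toolkit code. Both function names are hypothetical, and the XML fragment is a simplified assumption: it uses only the element names mentioned above, assumes that specifications are identified by a name attribute, and omits any namespaces or wrapper elements that a real connection specifications document may require.

```python
import posixpath
import xml.etree.ElementTree as ET


def resolve_connection_document(app_dir, connection_document=None):
    """Return the path the compiler would use for the connection document.

    If the connectionDocument parameter is absent, fall back to
    etc/connections.xml under the SPL application directory.
    """
    if connection_document is not None:
        return connection_document
    return posixpath.join(app_dir, "etc", "connections.xml")


def access_uses_connection(xml_text, access, connection):
    """Check that the named access specification references the connection."""
    root = ET.fromstring(xml_text)
    for spec in root.iter("access_specification"):
        # Assumption: the specification name is carried in a "name" attribute.
        if spec.get("name") == access:
            return any(
                use.get("connection") == connection
                for use in spec.iter("uses_connection")
            )
    return False


# Simplified, hypothetical fragment; a real document contains more detail.
SAMPLE_DOCUMENT = """
<connections>
  <connection_specification name="PersonDB"/>
  <access_specification name="PersonSink">
    <uses_connection connection="PersonDB"/>
  </access_specification>
</connections>
"""

if __name__ == "__main__":
    print(resolve_connection_document("/home/myapp"))
    print(access_uses_connection(SAMPLE_DOCUMENT, "PersonSink", "PersonDB"))
```

A check of this kind only confirms that the two names are consistent with each other. It says nothing about whether the external data service is reachable.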

Important: Although the SPL compiler in conjunction with the Database Toolkit does check that the connection specification and the access specification associated with a Database Toolkit operator are semantically valid XML, it cannot check at compile time that the operator can connect to the external data service and access data as configured. These operators have internal checks for correct configuration of the external data service that might result in a runtime failure, captured in the processing element logs. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference. The Database Toolkit also provides utilities to help find setup and configuration issues. For more information about these utilities, see “Using odbchelper” on page 35 and “Using db2helper” on page 37.

Operator error output port

The ODBC and DB2 operators in the Database Toolkit provide the capability to specify an optional output port containing information about SQL errors that occur at runtime. This port gives an application writer the capability to handle errors as they occur within the application if desired.

The output port can have up to four attributes, all of which are optional. The first attribute is an embedded tuple containing all of the attributes of the input tuple pertaining to the SQL error. This embedded tuple is only valid for operators that have input ports (for example, ODBCSource has no input port and thus cannot include the embedded tuple in its error output port).

The remaining three attributes correspond to the SQL return code, SQL message, and SQL state returned on an SQL error. The data types of these attributes are int32, rstring, and rstring respectively. The attributes can have any name. The first rstring attribute, if specified, contains the SQL message data, and the second rstring attribute, if specified, contains the SQL state.

The error output port is non-mutating and its punctuation mode is Free.

Tuples are generated on this port when certain runtime SQL error conditions occur. For a complete list of these runtime error conditions, see “Operator runtime error conditions” on page 20.

The following is an example of the ODBCAppend operator invocation, using a configured error output port:

    stream <tuple<PersonSchema> inTuple, int32 sqlcode, rstring sqlmessage,
            rstring sqlstate> errors = ODBCAppend ( MyInputDataStream )
    {
        param
            connectionDocument : "connections.xml";
            connection : "PersonDB";
            access : "PersonSink";
    }

The following is an example of the ODBCSource operator invocation, using a configured error output port. Note that the operator has two output ports, the first one being the port containing the operator's generated source data tuples. Also note that ODBCSource has no input port, so the error output port does not contain the embedded input tuple.

    (stream <int32 id, rstring fname, rstring lname>;
     stream <int32 sqlcode, rstring sqlmessage, rstring sqlstate> errors) =
        ODBCSource ()
    {
        param
            connectionDocument : "connections.xml";
            connection : "PersonDB";
            access : "InfPeronIFL";
            initDelay : 3.0;
    }
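Because the error port attributes can have any name, the mapping of SQL error data to attributes depends on type and position rather than on naming. The Python sketch below illustrates one reading of that rule; it is not the operator's implementation. The function fill_error_tuple is hypothetical, the schema is modeled as an ordered list of (name, SPL type) pairs, and the error values used in the demonstration are made up.

```python
def fill_error_tuple(schema, sqlcode, sqlmessage, sqlstate, in_tuple=None):
    """Build an error tuple (as a dict) for an error output port schema.

    schema is an ordered list of (attribute_name, spl_type) pairs.
    Names are free; values are assigned by type and order:
      tuple<...>      -> the input tuple that caused the error
      int32           -> the SQL return code
      first rstring   -> the SQL message
      second rstring  -> the SQL state
    """
    rstring_count = sum(1 for _, spl_type in schema if spl_type == "rstring")
    if len(schema) > 4 or rstring_count > 2:
        raise ValueError("at most four attributes, at most two of type rstring")

    rstring_values = iter([sqlmessage, sqlstate])
    out = {}
    for name, spl_type in schema:
        if spl_type == "int32":
            out[name] = sqlcode
        elif spl_type == "rstring":
            out[name] = next(rstring_values)
        elif spl_type.startswith("tuple<"):
            out[name] = in_tuple
        else:
            raise ValueError("unexpected attribute type: %s" % spl_type)
    return out


if __name__ == "__main__":
    # Schema of the ODBCAppend example above; the error values are invented.
    schema = [
        ("inTuple", "tuple<PersonSchema>"),
        ("sqlcode", "int32"),
        ("sqlmessage", "rstring"),
        ("sqlstate", "rstring"),
    ]
    print(fill_error_tuple(schema, -1, "insert failed", "HY000", {"id": 7}))
```

Under this reading, a port that declares a single rstring attribute receives only the SQL message, whatever the attribute is called.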

Operators

The root namespace for all toolkit operators is com.ibm.streams.db. The following table summarizes the namespace declarations needed (in an SPL application) for each operator.

Table 3. Namespace for Database Toolkit operators

Operator                                                         Namespace declaration
ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, SolidDBEnrich       use com.ibm.streams.db::*;
DB2SplitDB, DB2PartitionedAppend                                 use com.ibm.streams.db.db2::*;

ODBCAppend

Namespace
  com.ibm.streams.db

Description
  The ODBCAppend operator stores an input stream into a DBMS table. A row is appended to the table for each input stream tuple, using an SQL INSERT statement based on the information specified in the table element of the access specification named by the access parameter.

Input Ports
  The ODBCAppend operator has one required input port. The input port is non-mutating and its punctuation mode is Oblivious (there is no reasonable mapping to a table row).

  The names and types of input stream tuple attributes must correspond to the names and types of the attribute elements of the external_schema element (see “External_schema element” on page 30) of the access specification named by the access parameter. The external_schema attribute names must be the same as the names of the columns of the table being appended to. Also, the attribute types must be the SPL type that corresponds to the ODBC type of the table column (see Table 6 on page 32).

  Note: It is not required that all columns of a table be represented in the input stream schema and access specification external_schema. Columns which have an automatically assigned or default value at insert time can be excluded.

Output Ports
  The ODBCAppend operator has one optional output port. This output port submits a tuple when an error occurs while inserting a tuple record into the table. For detailed information about the error output port, see “Operator error output port” on page 6.

Parameters
  The ODBCAppend operator has the following parameter besides the set of common Database Toolkit operator parameters (see “Operator common parameters” on page 5).

  commitOnPunctuation
    This optional boolean parameter allows you to specify whether or not transactions are committed when the operator receives a punctuation. The default is false. If the parameter is set to true, the operator will perform the following actions when a window punctuation is received:
    • If the current rowset is not empty, the rowset will be inserted.
    • If the number of rows inserted since the last transaction commit is greater than 0, the transaction will be committed and the uncommitted row counter will reset to 0.

    If no window punctuation is received, a commit will continue to occur if the number of rows inserted reaches the number specified in the transaction_batchsize attribute of the table element. For more information about this attribute, see “Table element” on page 25.

Windowing
  The ODBCAppend operator does not accept any windowing configurations.

Assignments
  The ODBCAppend operator does not allow assignments to output attributes.

Metrics
  The ODBCAppend operator provides the following metric:
  • droppedTuples: The number of input tuples that are dropped (not inserted into the table) because of an insert failure.

Exceptions
  The ODBCAppend operator does not throw any exceptions. For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

    () as mySink = ODBCAppend(persondata)
    {
        param
            connection : "PersonDB";
            access : "PersonSink";
            connectionDocument : "connections.xml";
    }
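The interaction between commitOnPunctuation and transaction_batchsize, as described under Parameters, can be simulated without a database. The following Python sketch is a model of that description only, not the operator's code. The class name AppendSketch is hypothetical, and rowset_size is an assumed setting that stands in for however the operator decides that a rowset is full; the text above mentions rowsets but does not say how their size is configured.

```python
class AppendSketch:
    """Toy model of the commit rules described for ODBCAppend."""

    def __init__(self, batch_size, rowset_size=1, commit_on_punctuation=False):
        self.batch_size = batch_size              # transaction_batchsize
        self.rowset_size = rowset_size            # assumed rowset capacity
        self.commit_on_punctuation = commit_on_punctuation
        self.rowset = []       # rows buffered but not yet inserted
        self.uncommitted = 0   # rows inserted since the last commit
        self.commits = []      # rows covered by each commit, for inspection

    def _insert_rowset(self):
        # Stand-in for the SQL INSERT of the buffered rows.
        self.uncommitted += len(self.rowset)
        self.rowset = []

    def _commit(self):
        # Stand-in for the transaction commit; resets the row counter.
        self.commits.append(self.uncommitted)
        self.uncommitted = 0

    def on_tuple(self, row):
        self.rowset.append(row)
        if len(self.rowset) >= self.rowset_size:
            self._insert_rowset()
            # Batch-size commits happen regardless of punctuation handling.
            if self.uncommitted >= self.batch_size:
                self._commit()

    def on_window_punctuation(self):
        if not self.commit_on_punctuation:
            return
        if self.rowset:
            self._insert_rowset()
        if self.uncommitted > 0:
            self._commit()


if __name__ == "__main__":
    op = AppendSketch(batch_size=4, rowset_size=2, commit_on_punctuation=True)
    for row in range(3):
        op.on_tuple(row)
    op.on_window_punctuation()
    print(op.commits)
```

In this model, three tuples followed by a window punctuation produce one commit that covers all three rows, even though the batch size of four was never reached. With commit_on_punctuation left at its default of False, the same input produces no commit and one row stays buffered.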

ODBCEnrich

Namespace
  com.ibm.streams.db

Description
  The ODBCEnrich operator executes an SQL SELECT statement for each input tuple, and submits an output tuple for each record in the result set of the SELECT statement. If an invocation of the SELECT statement does not result in any records, no output tuples are submitted for that invocation.

  The SELECT statement is specified as the value of the query attribute of the query element of the access specification named by the access parameter (see “Query element” on page 25). Any valid SELECT statement for the database specified in the connection specification named by the connection parameter can be used. For each row in the result set, the operator produces a tuple. The operator automatically assigns the values of the columns of the result set to the output stream attributes with the same name and data type. Null data in a result set's column produces 0 for numeric attributes and an empty string for the rstring attributes. These automatic assignments will not override an explicit assignment to an output attribute or an auto-assignment of an input stream attribute with the same name.

  The names and data types of the columns of the result set of the SELECT statement are specified by the attribute elements of the external_schema element of the access specification (see “External_schema element” on page 30). However, no external schema attribute can have the same name as an input stream attribute, since there is no syntactic means to indicate whether such a name refers to the input stream attribute or the external schema attribute. Additionally, the operator checks that every output stream attribute either has an explicit assignment, has the same name as an input stream attribute, or has the same name as an external schema attribute.

Input Ports
  The ODBCEnrich operator has one required input port. The input stream tuples must contain attribute(s) corresponding to the columns of the table being queried. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports
  The ODBCEnrich operator has one required output port and one optional output port.

  The required output port submits a tuple for each row in the result set of the SELECT statement executed for an input tuple. The resulting tuple contains the input tuple attributes, plus additional desired attribute(s) corresponding to columns in the result set. The output port is mutating and its punctuation mode is Preserving.

  The optional output port submits a tuple when an error occurs on the SQL SELECT executed as the result of an input tuple. For detailed information about the error output port, see “Operator error output port” on page 6.

Parameters
  In addition to the set of common Database Toolkit operator parameters (see “Operator common parameters” on page 5), the ODBCEnrich operator supports query-specific parameters that permit the parameterization of the SQL SELECT statement that is executed. In the access specification named by the access parameter, a parameter element is associated with each ODBC parameter marker in the SELECT statement (see “Parameter element” on page 30). The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value(s) of a query-specific parameter must have the data type specified by its parameter element as well as the specified number of values. In the following example, the operator invocation is followed by the query and parameter elements of its access specification, PersonIAGST:

    stream <int32 id, rstring fname, rstring lname,
            int16 age, rstring gender, float32 score, float64 total>
        MyCompletePersonStream = ODBCEnrich ( MyPersonNamesStream )
    {
        param
            connection : "PersonDB";
            access : "PersonIAGST";
            connectionDocument : "connections.xml";
            pid : id;
    }

    <query query="SELECT id, age, gender, score, total FROM personsrc WHERE ID = ?" />
    <parameters>
        <parameter name="pid" type="int32" />
    </parameters>

  Note that the SELECT statement has an ODBC parameter marker (the “?”) in its WHERE clause. The parameter element named pid is associated with that parameter marker. In the operator declaration above, the corresponding operator parameter named pid has a single value, id, which denotes the input stream attribute with that name. The data type of the value of the pid parameter in the operator declaration, int32, matches that of the pid parameter element. For each incoming tuple, the operator executes this SELECT statement, providing the current value of the input stream attribute id as the value of that ODBC parameter marker.

  A SELECT statement can have multiple ODBC parameter markers ("?"s). The number of parameter markers must match the number of parameter elements. ODBC parameter markers are processed in the order in which they appear in the SELECT statement, and correspond to the order of the parameter elements in the configuration file.

Windowing
The ODBCEnrich operator does not accept any windowing configurations.

Assignments
The ODBCEnrich operator does not allow assignments to output attributes.

Metrics
The ODBCEnrich operator provides the following metric:
• droppedTuples: The number of input tuples that are dropped (not processed) because of an SQL failure.

Exceptions
The ODBCEnrich operator does not throw any exceptions.

For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

For an example of the ODBCEnrich operator, see the Parameters section.

ODBCRun

Namespace
com.ibm.streams.db

Description
The ODBCRun operator runs a generic user-defined SQL statement as part of an application. This operator is commonly used to update, merge, and delete data. The ODBCRun operator is also used to create tables, drop tables, and call stored procedures.

Input Ports
The ODBCRun operator is configurable with one required input port. The input port is non-mutating and its punctuation mode is Oblivious. The user-defined SQL statement is run each time a tuple is received on the input port. If the statement has ODBC parameter markers, the operator can be configured to use input tuple attribute values for these parameter markers at statement run time. For additional details about statement parameters, see the Parameters section of this operator.


Output Ports
The ODBCRun operator has one required output port and one optional output port.

The required output port is mutating and its punctuation mode is Preserving. The output port submits a tuple for each row in the result set of the user-defined statement, if the statement produces a result set. The output tuple can contain any of the following results assigned in this order:
• Any of the columns returned in the result set
• Any of the attributes from the input tuple
• Any operator-configured output assignments

The optional output port submits one or more tuples when an error occurs while running the user-defined SQL statement. For detailed information about the error output port, see “Operator error output port” on page 6.

Note: If the statement element of the access_specification parameter is configured with a transaction_batchsize > 1, the error output port will submit a tuple for each record in the rowset processed by the statement.

Parameters
In addition to the set of “Operator common parameters” on page 5 in the Database Toolkit, the ODBCRun operator supports the following parameter:

commitOnPunctuation
This optional Boolean parameter specifies whether transactions are committed when the operator receives a window punctuation. The default value is false. If the parameter is set to true, the operator performs the following actions when a window punctuation is received:
• If the current rowset is not empty, the rowset is inserted.
• If the number of rows inserted since the last transaction commit is greater than 0, the transaction is committed and the uncommitted row counter is reset to 0.

If no window punctuation is received, a commit still occurs when the number of rows inserted reaches the number specified in the transaction_batchsize attribute of the table element. For more information, see “Table element” on page 25 in the Connection Specifications Document section.
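The commit rules above can be read as a small piece of bookkeeping. The following Python class is a toy model under stated assumptions, not toolkit code: the class name BatchCommitter and the rowset_size argument are hypothetical, and "inserting" a rowset is reduced to counting rows.

```python
class BatchCommitter:
    """Toy model of rowset and commit bookkeeping (illustration only)."""

    def __init__(self, rowset_size, transaction_batchsize, commit_on_punctuation=False):
        self.rowset_size = rowset_size                  # hypothetical: rows buffered per insert
        self.transaction_batchsize = transaction_batchsize
        self.commit_on_punctuation = commit_on_punctuation
        self.rowset = []       # rows buffered but not yet inserted
        self.uncommitted = 0   # rows inserted since the last commit
        self.commits = 0       # number of commits performed

    def _insert_rowset(self):
        self.uncommitted += len(self.rowset)
        self.rowset = []

    def _commit(self):
        self.commits += 1
        self.uncommitted = 0

    def on_tuple(self, row):
        self.rowset.append(row)
        if len(self.rowset) >= self.rowset_size:
            self._insert_rowset()
        # Batch-size commit applies whether or not punctuations arrive.
        if self.uncommitted >= self.transaction_batchsize:
            self._commit()

    def on_window_punctuation(self):
        if not self.commit_on_punctuation:
            return
        if self.rowset:            # current rowset is not empty: insert it
            self._insert_rowset()
        if self.uncommitted > 0:   # something to commit: commit and reset the counter
            self._commit()
```

With commit_on_punctuation set to False, a window punctuation leaves both the buffered rowset and the uncommitted counter untouched in this model.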

Additionally, the ODBCRun operator supports statement-specific parameters that establish the framework of the user-defined statement being run. In the access specification named by the access parameter, a parameter element is associated with each ODBC parameter marker in the statement (see “Parameter element” on page 30). The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value of a statement-specific parameter must have the data type specified by its parameter element as well as the specified number of values. For a usage example, see the Parameters section of the ODBCEnrich operator.

Assignments
The ODBCRun operator can assign the following output tuple attributes:
• If the statement produces a result set, the external_schema section of the access_specification element specified in the access parameter of the operator must correspond to the output stream schema. Output tuple attributes are assigned the values of the columns returned in the result set.
• After result set values are assigned (or if there is no result set), unassigned output tuple attributes with a matching input tuple attribute are automatically assigned with the input tuple attribute value.
• All remaining unassigned output tuple attributes must have an explicit assignment specified in the output section of the operator configuration.
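The three assignment rules above amount to a fixed lookup order for each output attribute. The following Python function is a minimal sketch of that order; the function name build_output_tuple and the representation of explicit assignments as callables are assumptions made for illustration.

```python
def build_output_tuple(output_attributes, result_row, input_tuple, explicit):
    """Fill each output attribute using the documented order of sources.

    result_row: dict of result set columns, or None if there is no result set.
    explicit: attribute name -> function of the input tuple (hypothetical
              stand-in for the output section of the operator configuration).
    """
    out = {}
    for attr in output_attributes:
        if result_row is not None and attr in result_row:
            out[attr] = result_row[attr]            # 1. result set column
        elif attr in input_tuple:
            out[attr] = input_tuple[attr]           # 2. matching input attribute
        elif attr in explicit:
            out[attr] = explicit[attr](input_tuple)  # 3. explicit assignment
        else:
            raise ValueError("output attribute %r has no assignment" % attr)
    return out
```

For example, with a result row {"salary": 10.0}, an input tuple {"id": 1, "salary": 5.0}, and an explicit assignment for "source", the output takes salary from the result row, id from the input tuple, and source from the explicit assignment.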

Windowing
The ODBCRun operator does not accept any window configurations.

Metrics
The ODBCRun operator has the following metric:
• droppedTuples: The number of input tuples that result in a statement failure.

Exceptions
The ODBCRun operator does not throw any exceptions. For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

Example
The following sample SPL application creates a table at startup, inserts records read from a file, and updates them with new values.

use com.ibm.streams.db::*;

composite Main {

  type
    TableSchema = rstring tablename;
    PersonSchema = int32 id, rstring name, float32 salary;

  graph

    ////////////////////////////////////////////////
    // Create a new table
    ////////////////////////////////////////////////
    stream <TableSchema> tablebeacon = Beacon()
    {
      param
        iterations : 1u;
    }

    stream <TableSchema> createtable = ODBCRun(tablebeacon)
    {
      param
        connection : "DBPerson";
        access : "PersonCreate";
        connectionDocument : "./etc/connections.xml";
        // statement parameter declared in the PersonCreate access specification
        tablename : tablename;
    }

    ////////////////////////////////////////////////
    // Read from file #1
    ////////////////////////////////////////////////
    stream<PersonSchema> persondata = FileSource()
    {
      param
        file : "testdata.csv";
        format : csv;
        initDelay : 5.0;
    }

    ////////////////////////////////////////////////
    // Insert records into the table created earlier
    ////////////////////////////////////////////////
    () as tablePopulate = ODBCAppend(persondata)
    {
      param
        connection : "DBPerson";
        access : "PersonSink";
        connectionDocument : "./etc/connections.xml";
    }

    ////////////////////////////////////////////////
    // Read updated records from file #2
    ////////////////////////////////////////////////
    stream<PersonSchema> persondata_updated = FileSource()
    {
      param
        file : "testdata_updated.csv";
        format : csv;
        initDelay : 5.0;
    }

    ////////////////////////////////////////////////
    // Update table with new values
    ////////////////////////////////////////////////
    () as tableUpdate = ODBCRun(persondata_updated)
    {
      param
        connection : "DBPerson";
        access : "PersonUpdate";
        connectionDocument : "./etc/connections.xml";
        salary : salary;
        id : id;
    }
}

---connections.xml snippet---
<access_specification name="PersonCreate">
  <statement statement="CREATE TABLE ? (ID INTEGER NOT NULL, NAME CHAR(20), SALARY FLOAT)" />
  <parameters>
    <parameter name="tablename" type="rstring" />
  </parameters>
</access_specification>

<access_specification name="PersonUpdate">
  <statement statement="UPDATE person SET salary = ? WHERE id = ?" />
  <parameters>
    <parameter name="salary" type="float32" />
    <parameter name="id" type="int32" />
  </parameters>
</access_specification>
---end connections.xml snippet---

ODBCSource

Namespace
com.ibm.streams.db


Description
The ODBCSource operator generates a stream from the result set of an SQL SELECT statement. The SELECT statement is specified as the value of the query attribute of the query element of the access specification named by the access parameter (see “Query element” on page 25). Any valid SELECT statement for the database specified in the connection specification named by the connection parameter can be used.

For each row in the result set, the operator produces a tuple by automatically assigning the values of the columns of the result set to the output stream attributes with the same name and data type. When a column of the result set contains null data, the corresponding output stream attribute is set to 0 for numeric data types and to the empty string for the rstring data type. The columns of the result set of the SELECT statement, as specified by the attribute elements of the external_schema element of the access specification (see “External_schema element” on page 30), must be a superset of the attributes of the output stream. The values of the columns of the SELECT statement result set are assigned to output stream attributes by name.
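The by-name assignment and the null handling described above can be sketched as follows. This Python fragment is an illustration only: the function name row_to_tuple is hypothetical, SPL types are represented as plain strings, and only the numeric types that appear in this document's examples are listed.

```python
FLOAT_TYPES = {"float32", "float64"}
INT_TYPES = {"int16", "int32"}  # only the integer types used in this document's examples

def row_to_tuple(row, output_schema):
    """Build an output tuple from a result set row.

    row: column name -> value (None represents SQL NULL).
    output_schema: output attribute name -> SPL type name.
    """
    tup = {}
    for name, spl_type in output_schema.items():
        if name not in row:
            # The result set columns must be a superset of the output attributes.
            raise KeyError("result set has no column named %r" % name)
        value = row[name]
        if value is None:
            if spl_type == "rstring":
                value = ""      # null string data becomes the empty string
            elif spl_type in FLOAT_TYPES:
                value = 0.0     # null numeric data becomes 0
            elif spl_type in INT_TYPES:
                value = 0
            else:
                raise ValueError("type %r is not covered by this sketch" % spl_type)
        tup[name] = value
    return tup
```

Columns that have no matching output attribute are simply ignored, which is why the result set may be a superset of the output stream schema.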

If you want to stream the result set of your query into your application more than once, specify the number of times as the value of the replays attribute of the query element in the access specification. The operator will execute the query that number of times. If the value of the replays attribute is 0, the ODBCSource operator will execute the query repeatedly until the application is canceled. If the value of the replays attribute is not 0, a final punctuation will be generated when all queries for the operator invocation have completed.

Input Ports
The ODBCSource operator has no input ports.

Output Ports
The ODBCSource operator has one required output port and one optional output port.

The required output port submits a tuple for each row in the result set of the SELECT statement. The output port is mutating and its punctuation mode is Generating. A punctuation is generated after all of the tuples have been submitted as the result of a query. This allows downstream operators to distinguish between tuples from different query executions.
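The interaction between the replays attribute and the generated punctuations can be modeled with a generator. This Python sketch is not toolkit code: the function names and the event labels "tuple", "punctuation", and "final" are hypothetical, and run_query stands in for executing the SELECT statement.

```python
import itertools

def odbc_source_events(run_query, replays):
    """Yield ("tuple", row), ("punctuation", None) and ("final", None) events.

    replays == 0 means the query is repeated until the consumer stops reading.
    """
    runs = itertools.count() if replays == 0 else range(replays)
    for _ in runs:
        for row in run_query():
            yield ("tuple", row)
        # One punctuation after the tuples of each query execution.
        yield ("punctuation", None)
    # Reached only when replays is not 0.
    yield ("final", None)

def first_events(events, n):
    """Take the first n events, which is needed when the source never ends."""
    return list(itertools.islice(events, n))
```

With replays set to 2 and a query that returns two rows, the sequence is two tuples, a punctuation, two tuples, a punctuation, and then the final punctuation. With replays set to 0 the final punctuation is never produced.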

The optional output port submits a tuple when an error occurs on the SQL SELECT statement being executed. For detailed information about the error output port, see “Operator error output port” on page 6.

Parameters
In addition to the set of common Database Toolkit operator parameters (see “Operator common parameters” on page 5), the ODBCSource operator has the following additional parameters:

initDelay
The initDelay parameter specifies an initial processing delay, in seconds, before the operator begins emitting tuples. It is equivalent to the initDelay parameter of the FileSource or TCPSource operator in the SPL Standard Toolkit. This parameter is optional. If present, it must have exactly one value of type float64.

sleepTime
This parameter of type float64 specifies the minimal time in seconds that the operator has to wait before it can execute a query again. It is equivalent to the sleepTime parameter of the DirectoryScan operator in the SPL Standard Toolkit. If this parameter is not specified, the operator does not wait for any amount of time between query executions. If the time difference between the last executed query and the current time is less than sleepTime seconds, the operator sleeps until the time since the last executed query is sleepTime seconds. If more than sleepTime seconds have already passed, the query is executed immediately.
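The sleepTime rule above reduces to a short calculation. The following Python sketch assumes that the interval is measured from the moment the previous query was started; the function names and the injectable clock and sleep arguments are hypothetical and exist only to make the logic easy to try out.

```python
import time

def remaining_sleep(last_query_time, now, sleep_time):
    """Seconds still to wait before the next query may run (never negative)."""
    elapsed = now - last_query_time
    return max(0.0, sleep_time - elapsed)

def run_queries(execute, sleep_time, count, clock=time.monotonic, sleep=time.sleep):
    """Run `execute` `count` times, keeping at least `sleep_time` seconds between starts."""
    last = None
    for _ in range(count):
        if last is not None:
            delay = remaining_sleep(last, clock(), sleep_time)
            if delay > 0:
                sleep(delay)   # less than sleep_time has passed: wait for the rest
        last = clock()         # more than sleep_time has passed: run immediately
        execute()
```

For example, if a query takes 2 seconds and sleepTime is 6, the operator in this model sleeps for 4 seconds before starting the next query.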

Additionally, the operator supports query-specific parameters that permit the parameterization of the SQL SELECT statement that is executed. In the access specification named by the access parameter, a parameter element is associated with each ODBC parameter marker in the SELECT statement (see “Parameter element” on page 30). The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value(s) of a query-specific parameter must have the data type specified by its parameter element as well as the specified number of values. For a usage example, see the Parameters section of the ODBCEnrich operator.

Windowing
The ODBCSource operator does not accept any windowing configurations.

Assignments
The ODBCSource operator does not allow assignments to output attributes.

Metrics
The ODBCSource operator does not provide any metrics.

Exceptions
The ODBCSource operator does not throw any exceptions.

For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

Example
stream <int32 id, rstring fname, rstring lname>
MyPersonNamesStream = ODBCSource()
{
  param
    connection : "PersonDB";
    access : "InfPersonIFL";
    connectionDocument : "connections.xml";
    initDelay : 3.0;
    sleepTime : 6.0;
}

SolidDBEnrich

Namespace
com.ibm.streams.db

Description
The SolidDBEnrich operator is similar to the ODBCEnrich operator, but its implementation is based on the proprietary solidDB SA API that achieves very high performance by bypassing the full SQL query engine. In particular, the query is executed against a solidDB table, and the search conditions on the query are much more restricted than those offered by the WHERE clause of an SQL SELECT statement.

For each incoming tuple, the operator executes a query against the table specified in the tablename attribute of the tablequery element of the access specification named by the access parameter (see “Tablequery element” on page 27). The search conditions for the query are specified by the parameter_condition and static_condition elements of the tablequery element. An output tuple is produced for each row in the result set by automatically assigning the values of the columns of the result set to the output stream attributes with the same name and data type. These automatic assignments will not override an explicit assignment to an output attribute or an auto-assignment of an input stream attribute with the same name. The names and data types of the columns of the solidDB table are specified by the attribute elements of the external_schema element of the access specification (see “External_schema element” on page 30). However, no external schema attribute can have the same name as an input stream attribute, since there is no syntactic means to indicate whether such a name refers to the input stream attribute or the external schema attribute. Additionally, the operator checks that every output stream attribute either has an explicit assignment, has the same name as an input stream attribute, or has the same name as an external schema attribute.

Input Ports
The SolidDBEnrich operator has one required input port. The input stream tuples must contain attribute(s) corresponding to the columns of the table being queried. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports
The SolidDBEnrich operator has one required output port. The output port submits a tuple for each row in the result set of the query executed for an input tuple. The resulting tuple contains the input tuple attributes, plus any additional desired attribute(s) corresponding to columns in the result set. The output port is mutating and its punctuation mode is Preserving.

Parameters
In addition to the set of common Database Toolkit operator parameters (see “Operator common parameters” on page 5), the SolidDBEnrich operator supports tablequery-specific parameters that permit the values of search constraints on the solidDB query to be specified in the operator declaration rather than in the access specification.

For example, the values of search constraints can be taken from input stream attributes. In the access specification named by the access parameter, a parameter element is associated with each parameter_condition element of the tablequery element. The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value(s) of a tablequery-specific parameter must have the data type specified by its parameter element as well as the specified number of values.

In the example below, the access specification bargainIndexRatings has these parameter_condition and parameter elements:

stream <rstring ticker, rstring ttype, float64 price, float64 volume,
        float64 ratingInternal, float64 ratingExternal, float64 epsExternal>
EnrichedTrades = SolidDBEnrich ( TradeQuote )
{
  param
    connection : "FinancialDB";
    access : "bargainIndexRatings";
    ticker : ticker;
    ttype : ttype;
}

<tablequery tablename="ratings">
  <parameter_condition column="symbol" condition="equal" />
  <parameter_condition column="type" condition="equal" />
</tablequery>
<parameters>
  <parameter name="ticker" type="rstring" />
  <parameter name="ttype" type="rstring" />
</parameters>

In the example operator declaration above, there are two tablequery-specific parameters, ticker and ttype, whose values are the input stream attributes of the same names. The data type of the values of these parameters in the operator declaration, rstring, matches those of the parameter elements. For each incoming tuple, the operator executes the query, providing the current value of the input stream attributes ticker and ttype to the query search condition.
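The effect of the two "equal" conditions above can be illustrated with an in-memory table. This Python sketch is not the solidDB SA API: the function name solid_enrich is hypothetical, the table is a list of dictionaries, and the pairing of parameter_condition elements with parameter values by position is an assumption of the sketch.

```python
def solid_enrich(table_rows, condition_columns, parameter_values, input_tuple):
    """Return one output tuple per table row that satisfies every "equal" condition.

    condition_columns: columns of the parameter_condition elements, in order.
    parameter_values: values of the operator parameters, in the same order.
    """
    if len(condition_columns) != len(parameter_values):
        raise ValueError("each parameter_condition needs exactly one parameter value")
    outputs = []
    for row in table_rows:
        if all(row[col] == val for col, val in zip(condition_columns, parameter_values)):
            out = dict(row)          # columns named in the external schema
            out.update(input_tuple)  # input attributes are carried through
            outputs.append(out)
    return outputs
```

A trade tuple with ticker "IBM" and ttype "Trade" would be matched against rows whose symbol column equals "IBM" and whose type column equals "Trade".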

Windowing
The SolidDBEnrich operator does not accept any windowing configurations.

Assignments
The SolidDBEnrich operator does not allow assignments to output attributes.

Metrics
The SolidDBEnrich operator does not provide any metrics.

Exceptions
The SolidDBEnrich operator does not throw any exceptions.

For other errors, messages are written to the log. For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

For an example of the SolidDBEnrich operator, see the Parameters section.

DB2SplitDB

Namespace
com.ibm.streams.db.db2

Description
The DB2SplitDB operator uses DB2 database table key information from an input tuple to determine its corresponding partition number in the DB2 database table. The input tuples must contain the attributes that correspond to the key fields in the database table. The operator has n output streams, where n is equal to the number of partitions in the database. The operator submits the input tuple to the output stream that corresponds to the partition number.

The access specification named in the operator declaration must contain the tablename attribute within the table element. The access specification must also contain an external_schema element with attribute elements for each column in the table specified by the tablename attribute within the table element.
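The routing performed by the operator can be pictured as follows. This Python sketch is an illustration only: the real partition number comes from DB2, whereas partition_of here is a hypothetical stand-in that hashes the key values, and both function names are invented for the example.

```python
import zlib

def partition_of(key_values, partition_count):
    """Hypothetical stand-in for the DB2 partition lookup: a stable hash of the key."""
    data = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(data) % partition_count

def split_by_partition(tuples, key_attributes, partition_count):
    """Distribute tuples over one output list per partition."""
    outputs = [[] for _ in range(partition_count)]
    for tup in tuples:
        key = [tup[a] for a in key_attributes]
        outputs[partition_of(key, partition_count)].append(tup)
    return outputs
```

Each output list plays the role of one output stream, so a database with four partitions needs four of them.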

Input Ports
The DB2SplitDB operator has one required input port. The input port tuples must contain the attribute(s) corresponding to the key field(s) for the database table. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports
The DB2SplitDB operator has one output port open set. The required number of output streams is the same as the number of partitions in the database. The output port is mutating and its punctuation mode is Preserving.

Parameters
The DB2SplitDB operator does not have any additional parameters besides the set of common Database Toolkit operator parameters (see “Operator common parameters” on page 5).

Windowing
The DB2SplitDB operator does not accept any window configurations.

Assignments
The DB2SplitDB operator does not allow assignments to output attributes.

Metrics
The DB2SplitDB operator provides the following metric:
• droppedTuples: The number of input tuples that are dropped (not processed) due to a failure in retrieving the partition number.

Exceptions
The DB2SplitDB operator will throw an exception and terminate in the following cases:
• The number of partitions cannot be determined.
• The number of output streams does not match the number of partitions.
• The number of attributes in the external_schema element of the access_specification element that have the key=true attribute does not match the number of hash keys configured for the database table.

For other errors, messages are written to the log. For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

Example
(stream <PersonSchema> out0;
 stream <PersonSchema> out1;
 stream <PersonSchema> out2;
 stream <PersonSchema> out3) = DB2SplitDB(persondata)
{
  param
    connection : "DBPerson";
    access : "PersonSinkDefault";
    connectionDocument : "./etc/connections.xml";
}

DB2PartitionedAppend

Namespace
com.ibm.streams.db.db2

Description
The DB2PartitionedAppend operator performs the same function as the ODBCAppend operator, but allows the user to specify a DB2 partition number, which is used on the SQL INSERT operation. This allows an application to take advantage of DB2 high performance partitioned databases by writing directly to partitions.

The operator is intended to be used in conjunction with the DB2SplitDB operator, which determines the partition number using the key fields of an input tuple.

Input Ports
The DB2PartitionedAppend operator has one required input port. The input stream tuples must contain the attribute(s) corresponding to the columns of the table being appended to. The input port is non-mutating and its punctuation mode is Oblivious (there is no reasonable mapping to a table row).

Output Ports
The DB2PartitionedAppend operator has one optional output port. This output port submits a tuple when an error occurs while inserting a tuple record into the table. For detailed information about the error output port, see “Operator error output port” on page 6.

Parameters
In addition to the set of common Database Toolkit operator parameters (see “Operator common parameters” on page 5), the DB2PartitionedAppend operator has the following additional parameters:

partitionNumber
This required int32 parameter is used to specify the DB2 database partition number to be used on the SQL INSERT operation.

commitOnPunctuation
This optional boolean parameter allows you to specify whether or not transactions are committed when the operator receives a punctuation. The default is false. If the parameter is set to true, the operator will perform the following actions when a window punctuation is received:
• If the current rowset is not empty, the rowset will be inserted.
• If the number of rows inserted since the last transaction commit is greater than 0, the transaction will be committed and the uncommitted row counter will reset to 0.

If no window punctuation is received, a commit will continue to occur if the number of rows inserted reaches the number specified in the transaction_batchsize attribute of the table element. For more information on Table Elements see the Connection Specification Document.

Windowing
The DB2PartitionedAppend operator does not accept any windowing configurations.

Assignments
The DB2PartitionedAppend operator does not allow assignments to output attributes.

Metrics
The DB2PartitionedAppend operator provides the following metric:
• droppedTuples: The number of input tuples that are dropped (not inserted into the table) because of an insert failure.

Exceptions
The DB2PartitionedAppend operator does not throw any exceptions. For a list of error conditions that are logged, see “Operator runtime error conditions” on page 20.

Example
The following example assumes a database with four partitions, with partition information determined by the example listed in the DB2SplitDB operator.

() as appendTable0 = DB2PartitionedAppend(out0)
{
  param
    connection : "DBPerson";
    access : "PersonSinkDefault";
    connectionDocument : "./etc/connection.xml";
    partitionNumber : 0u;
}

() as appendTable1 = DB2PartitionedAppend(out1)
{
  param
    connection : "DBPerson";
    access : "PersonSinkDefault";
    connectionDocument : "./etc/connection.xml";
    partitionNumber : 1u;
}

() as appendTable2 = DB2PartitionedAppend(out2)
{
  param
    connection : "DBPerson";
    access : "PersonSinkDefault";
    connectionDocument : "./etc/connection.xml";
    partitionNumber : 2u;
}

() as appendTable3 = DB2PartitionedAppend(out3)
{
  param
    connection : "DBPerson";
    access : "PersonSinkDefault";
    connectionDocument : "./etc/connection.xml";
    partitionNumber : 3u;
}

Operator runtime error conditions
The following conditions will result in an error message being generated in the processing element log during the operator runtime. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference.

ODBC and DB2 operators runtime error conditions
The ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, and DB2PartitionedAppend operators automatically respond to error conditions.

Table 4. Runtime error conditions for ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, and DB2PartitionedAppend operators

Error Condition: Unable to allocate ODBC environment handle
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to allocate ODBC connection handle
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to connect to database
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to allocate SQL statement
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to set isolation level
Operators: ODBCEnrich, ODBCRun, ODBCSource
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to prepare SQL statement
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to bind SQL parameters
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to set AutoCommit
Operators: ODBCAppend, ODBCRun, DB2PartitionedAppend
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to run SQL INSERT statement
Operators: ODBCAppend, DB2PartitionedAppend
System Action: If the optional output error stream is specified, a tuple will be generated on this stream.

Error Condition: Unable to run SQL SELECT statement
Operators: ODBCEnrich, ODBCSource
System Action: Operator will retry the operation after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt. If the optional output error stream is specified, a tuple containing the SQL error information will be generated on this stream.

Error Condition: Unable to run SQL statement
Operators: ODBCRun
System Action: If the optional error stream is specified, a tuple will be generated on this stream.

Error Condition: Unable to determine number of statement result columns
Operators: ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2PartitionedAppend
System Action: If the optional error stream is specified, a tuple will be generated on this stream.

Error Condition: Unable to fetch SQL SELECT results
Operators: ODBCEnrich, ODBCRun, ODBCSource
System Action: Not applicable.

Error Condition: Unable to close SQL SELECT results cursor
Operators: ODBCEnrich, ODBCRun, ODBCSource
System Action: Not applicable.

Error Condition: Unable to determine partition information for a tuple
Operators: DB2SplitDB
System Action: Not applicable.

SolidDBEnrich operator runtime error conditions
The SolidDBEnrich operator automatically responds to error conditions.

Table 5. Runtime error conditions for SolidDBEnrich operator

Error Condition: Unable to connect to solidDB database
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to bind solidDB table column
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to open solidDB table cursor
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt.

Error Condition: Unable to set solidDB search constraints
System Action: Not applicable.

Error Condition: Unable to execute solidDB query
System Action: Operator will retry the operation attempt after a wait period (starting with 1 second). The wait period increases by a power of two after every attempt. If the optional output error stream is specified, a tuple containing the solidDB error information will be generated on this stream.

Error Condition: Unable to advance solidDB table cursor
System Action: Operator will not attempt to process anymore results for the current input tuple.

Error Condition: Unable to reset solidDB table cursor
System Action: Not applicable.
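The retry behavior listed in the tables above can be sketched as a simple loop. This Python fragment is an illustration under stated assumptions: it reads "increases by a power of two" as doubling the wait period (1, 2, 4, 8 seconds, and so on), and the function name and the max_attempts limit are hypothetical, since the tables do not state an upper bound on retries.

```python
import time

def retry_with_backoff(operation, max_attempts, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure.

    max_attempts is a hypothetical limit added so that the sketch terminates.
    """
    wait = 1.0  # the first wait period is 1 second
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(wait)
            wait *= 2  # assumed reading of "increases by a power of two"
```

An operation that fails three times and then succeeds is retried after waits of 1, 2, and 4 seconds in this model.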

Connection Specifications Document
A connection specifications document is an XML document that describes how operators in the Database Toolkit connect to and access specific external data services. Each document contains a collection of connection specification elements and access specification elements. These connection and access specifications can be for a single operator, all the operators in an application, or organized by adapter type or any other criterion you choose. The only restriction is that the connection specification and the access specification for a given operator declaration must be in the same connection specifications document. The relationship between connection specifications and access specifications is many-to-many. Operators can connect to the same external data service (one connection specification) and access several different data resources from that service (many access specifications). On the other hand, operators can access equivalent data (one access specification) from several different external data services (many connection specifications), for example, accessing data from both a test system and a production system.

When the SPL compiler encounters a Database Toolkit operator declaration, it mustread the connection specifications document named by that operator'sconnectionDocument parameter. It checks that the document conforms to thesemantic rules of the XML schema defined for these documents. The SPL compileruses the information given in the connection and access specifications to configurethe operator. The compiler does not attempt to connect to the external data serviceor access its data to verify correct configuration at compile time. The operatorshave runtime checks to validate configuration; if the configuration is incorrectthese checks might result in runtime failures which are captured in the processingelement logs. For information about tracing and logging, see the IBM StreamsProcessing Language Streams Debugger Reference.

A valid connection specifications document consists of a connections root element which contains one connection_specifications element and one access_specifications element. These elements serve as containers for the connection specifications and access specifications, which we explain in detail in the following sections. Here is an abridged example of a complete connection specifications document, with all connection_specification and access_specification elements omitted.

<?xml version="1.0" encoding="UTF-8"?>
<st:connections xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <connection_specifications>
    ...
  </connection_specifications>
  <access_specifications>
    ...
  </access_specifications>
</st:connections>
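As a rough illustration of the container structure described above, the following Python sketch parses a connection specifications document and lists the connection and access specification names it declares. It is not part of the toolkit: the function name list_specifications and the sample document are assumptions made for this example, and the sketch does not perform the XML schema validation that the SPL compiler does.

```python
import xml.etree.ElementTree as ET

# Namespace of the root element, as shown in the abridged example above.
ST_NS = "http://www.ibm.com/xmlns/prod/streams/adapters"

# Hypothetical sample document assembled from element names used in this chapter.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<st:connections xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <connection_specifications>
    <connection_specification name="PersonDB">
      <ODBC database="person" user="user1" password="somepw" />
    </connection_specification>
  </connection_specifications>
  <access_specifications>
    <access_specification name="PersonSink">
      <table tablename="personSink" transaction_batchsize="10" rowset_size="4" />
      <uses_connection connection="PersonDB" />
      <external_schema>
        <attribute name="id" type="int32" />
      </external_schema>
    </access_specification>
  </access_specifications>
</st:connections>
"""


def list_specifications(xml_text):
    """Return (connection names, access names) declared in the document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # The root is namespace-qualified; the container elements are not.
    if root.tag != "{%s}connections" % ST_NS:
        raise ValueError("root element must be connections in the adapters namespace")
    connections = [spec.get("name") for spec in
                   root.findall("connection_specifications/connection_specification")]
    accesses = [spec.get("name") for spec in
                root.findall("access_specifications/access_specification")]
    return connections, accesses
```

Calling list_specifications(SAMPLE) returns one list of connection specification names and one list of access specification names, which can be compared against the connection and access parameters used in operator declarations.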

Connection_specification element

A connection_specifications element is a sequence of zero or more connection_specification elements. Each connection_specification element has a single attribute, name, whose value can be specified in the connection parameter of a toolkit operator declaration to identify the named connection specification. A connection_specification element must contain exactly one child element. For the Database Toolkit, this can be ODBC or solidDB.

ODBC element

The ODBC element specifies the information needed to establish a connection to a database using the ODBC SQLConnect() function. Here is an example connection specification containing an ODBC element.

<connection_specification name="PersonDB">
  <ODBC database="person" user="user1" password="somepw" />
</connection_specification>

The ODBC element has three attributes.

• database
  The value of the database attribute is the data source name (DSN) of the target database. This attribute is required.
• user
  The value of the user attribute is the user identification under which the connection to the database will be attempted. This attribute is optional; if omitted, the corresponding parameter to the ODBC SQLConnect() function call is NULL.
• password
  The value of the password attribute is the authentication credentials (password) for the user ID. This attribute is optional; if omitted, the corresponding parameter to the ODBC SQLConnect() function call is NULL.

solidDB element

The solidDB element specifies the information needed to establish a connection to a database using the solidDB SA API SaConnect() function.

Here is an example connection specification containing a solidDB element.

<connection_specification name="ratingsDB">
  <solidDB protocol="tcp" hostname="server1" port="333333"
           user="user2" password="anotherpw" />
</connection_specification>

The solidDB element has five attributes.

• protocol
  The value of the protocol attribute is a communications protocol supported by solidDB for establishing a connection to a database. It is part of the data source connect string for that database. This attribute is required.
• hostname
  The value of the hostname attribute is the host computer name of the database to which a connection will be established. It is part of the data source connect string for that database. This attribute is optional.
• port
  The value of the port attribute is the port number on which the host computer of the database is listening for connections. It is part of the data source connect string for that database. This attribute is optional.
• user
  The value of the user attribute is the user identification under which the connection to the database will be attempted. This attribute is required.
• password
  The value of the password attribute is the authentication credentials (password) for the user ID. This attribute is required.

Access_specification element

An access_specifications element is a sequence of zero or more access_specification elements. Each access_specification element has a single attribute, name, whose value can be specified in the access parameter of a Database Toolkit operator declaration to identify the named access specification. An access_specification element has a choice of a query, table, statement, or tablequery element; an optional parameters element; one or more uses_connection elements; and exactly one external_schema element.

The SPL compiler checks that the type of access specification used in a Database Toolkit operator declaration is valid for that operator. Specifically, the access specification named in an ODBCAppend, DB2SplitDB, or DB2PartitionedAppend declaration must contain a table element; in an ODBCEnrich or ODBCSource declaration, a query element; in an ODBCRun declaration, a statement element; and in a SolidDBEnrich declaration, a tablequery element.
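The operator-to-element rule in the preceding paragraph can be restated as a small lookup table. The Python sketch below is only an illustration of that rule, not the compiler's implementation; the names REQUIRED_ACCESS_ELEMENT and access_element_is_valid are hypothetical.

```python
# Access element that each operator's access specification must contain,
# transcribed from the paragraph above.
REQUIRED_ACCESS_ELEMENT = {
    "ODBCAppend": "table",
    "DB2SplitDB": "table",
    "DB2PartitionedAppend": "table",
    "ODBCEnrich": "query",
    "ODBCSource": "query",
    "ODBCRun": "statement",
    "SolidDBEnrich": "tablequery",
}


def access_element_is_valid(operator, element):
    """Return True if `element` is the access element required by `operator`."""
    if operator not in REQUIRED_ACCESS_ELEMENT:
        raise ValueError("not a Database Toolkit operator: %s" % operator)
    return REQUIRED_ACCESS_ELEMENT[operator] == element
```

For example, access_element_is_valid("ODBCRun", "statement") is true, while pairing ODBCSource with a table element is rejected.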

Query element

The query element specifies information used by an ODBCEnrich or ODBCSource operator using the access specification to query a database and produce a result set.

Here is an example of an abridged access specification containing a query element.

<access_specification name="PersonRemainder">
  <query query="SELECT id, fname, lname FROM personsrc"
         isolation_level="READ_UNCOMMITTED" replays="0" />
</access_specification>

The query element has three attributes.

• query
  The value of the query attribute is any valid SQL SELECT statement for the database specified in the connection specification of the associated operator. The columns of the result set of the SELECT statement must correspond to the attribute elements of the external_schema element of the access specification. The SELECT statement can contain ODBC parameter markers; if so, the access specification must have a parameter element corresponding to each ODBC parameter marker (for an example, see “ODBCEnrich” on page 8). This attribute is required.
• isolation_level
  The value of the isolation_level attribute specifies the isolation level at which the query in the database will be executed. The values for this attribute are the ODBC isolation levels. This attribute is optional; if omitted, the query is executed at level SQL_TXN_READ_UNCOMMITTED.
• replays
  The value of the replays attribute specifies the number of times an ODBCSource operator will execute the query. This attribute is ignored by an ODBCEnrich operator. If the value is 0, the ODBCSource operator will execute the query repeatedly until the application is canceled. This attribute is optional; if omitted, the query will be executed once.

Table element

The table element specifies information used by the ODBCAppend, DB2SplitDB, or DB2PartitionedAppend operator associated with the access specification to efficiently insert rows in a database table.

One attribute, tablename, directly contributes to the SQL INSERT statement used by the operator. The other attributes control how many rows at a time are sent to the database server and how many rows are committed per transaction. Here is an example of an abridged access specification containing a table element.

<access_specification name="PersonSink">
  <table tablename="personSink"
         transaction_batchsize="10" rowset_size="4" />
</access_specification>

The table element has three attributes.

• tablename
  The value of the tablename attribute identifies the target of the SQL INSERT statement, usually a table. This attribute is required.

  Note: When using the DB2 database with the Database Toolkit operators, the tablename attribute value must be qualified with the schema name (for example, schemaname.tablename) for the DB2SplitDB operator, whereas for other operators this value need not be schema-qualified. However, other operators allow the tablename attribute value to be schema-qualified. So, if you use the DB2SplitDB operator with other operators in your application, you can qualify the tablename attribute value with the schema name to use the same connection configuration for all the operators.
• transaction_batchsize
  The value of the transaction_batchsize attribute specifies the number of rows to commit per transaction. It must be greater than or equal to the value of the rowset_size attribute. This attribute is optional; if omitted, one thousand row inserts are committed per transaction.
• rowset_size
  The value of the rowset_size attribute specifies the number of rows that are sent to the database server per network flow. It must be less than or equal to the value of the transaction_batchsize attribute. This attribute is optional; if omitted, one hundred rows are sent per network flow.
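The interaction between the two batching attributes and their defaults can be sketched in Python as follows. This is an illustration only: resolve_table_batching is a hypothetical helper, the requirement that both values be positive is an assumption, and the toolkit's own handling of an invalid combination is not documented here.

```python
# Defaults stated in the attribute descriptions above.
DEFAULT_TRANSACTION_BATCHSIZE = 1000
DEFAULT_ROWSET_SIZE = 100


def resolve_table_batching(attrs):
    """Return (transaction_batchsize, rowset_size) for a table element.

    `attrs` is a dict of the table element's XML attributes (string values).
    """
    batch = int(attrs.get("transaction_batchsize", DEFAULT_TRANSACTION_BATCHSIZE))
    rowset = int(attrs.get("rowset_size", DEFAULT_ROWSET_SIZE))
    # Assumption: both values are expected to be positive integers.
    if batch < 1 or rowset < 1:
        raise ValueError("batch sizes must be positive")
    # Documented rule: rowset_size <= transaction_batchsize.
    if rowset > batch:
        raise ValueError("rowset_size (%d) exceeds transaction_batchsize (%d)"
                         % (rowset, batch))
    return batch, rowset
```

Note that under this reading, specifying only transaction_batchsize="10" leaves rowset_size at its default of 100 and violates the rule, so it is safest to set both attributes together.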

Statement element

The statement element specifies information used by the ODBCRun operator using the access specification to run an SQL statement.

The statement element supports four attributes:

• statement
  The value of the statement attribute is any valid SQL statement. The statement can contain ODBC parameter markers; if so, the access specification must have a parameter element corresponding to each ODBC parameter marker. This attribute is required.
• isolation_level
  The value of the isolation_level attribute specifies the isolation level at which the statement will be run. This attribute is optional; if omitted, the statement is run at level SQL_TXN_READ_UNCOMMITTED. This attribute applies if the statement produces a result set.
• rowset_size
  The value of the rowset_size attribute specifies the number of rows that are sent to the database server per network flow. The value must be less than or equal to the value of the transaction_batchsize attribute. This attribute is optional; if not specified, the default rowset size is 1. This attribute applies if the statement is transactional in nature (for example, UPDATE).
• transaction_batchsize
  The value of the transaction_batchsize attribute specifies the number of rows to commit per transaction. It must be greater than or equal to the value of the rowset_size attribute. This attribute is optional; if not specified, the default transaction size is 1 and transactions will be auto-committed. This attribute applies if the statement is transactional in nature (for example, UPDATE).

Consider these three examples of access specifications containing a statement element:

<access_specification name="PersonCreate">
  <uses_connection connection="DBPerson" />
  <statement statement="CREATE TABLE PERSON (ID INTEGER NOT NULL, FNAME CHAR(15), LNAME CHAR(20), AGE SMALLINT, GENDER CHAR(1), SCORE FLOAT, TOTAL DOUBLE PRECISION)" />
  <parameters>
    <parameter name="table" type="rstring" />
  </parameters>
</access_specification>

<access_specification name="PersonQuery">
  <uses_connection connection="DBPerson" />
  <statement statement="SELECT * FROM PERSON" isolation_level="READ_COMMITTED" />
  <parameters>
  </parameters>
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
    <attribute name="age" type="int32" />
    <attribute name="gender" type="rstring" length="1" />
    <attribute name="score" type="float32" />
    <attribute name="total" type="float64" />
  </external_schema>
</access_specification>

<access_specification name="PersonUpdate">
  <uses_connection connection="DBPerson" />
  <statement statement="UPDATE PERSON SET FNAME=?, LNAME=?, AGE=?, GENDER=?, SCORE=?, TOTAL=? WHERE ID=?" />
  <parameters>
    <parameter name="fname" type="rstring" length="15" />
    <parameter name="lname" type="rstring" length="20" />
    <parameter name="age" type="int32" />
    <parameter name="gender" type="rstring" length="1" />
    <parameter name="score" type="float32" />
    <parameter name="total" type="float64" />
    <parameter name="id" type="int32" />
  </parameters>
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
    <attribute name="age" type="int32" />
    <attribute name="gender" type="rstring" length="1" />
    <attribute name="score" type="float32" />
    <attribute name="total" type="float64" />
  </external_schema>
</access_specification>

Tablequery element

The tablequery element specifies information used by the SolidDBEnrich operator associated with the access specification to query a solidDB table and produce a result set.

Here is an example of an abridged access specification containing a tablequery element.

<access_specification name="bargainRatings">
  <tablequery tablename="ratings">
    <parameter_condition column="symbol" condition="equal" />
    <parameter_condition column="type" condition="equal" />
    <static_condition column="ratingInternal" condition="atLeast"
                      value="1" type="float64" />
    <static_condition column="epsExternal" condition="atMost"
                      value="6.3" type="float64" />
  </tablequery>
</access_specification>

Each tablequery element has a single attribute, tablename, whose value is the name of the database table against which the query is executed. The search constraints for the query are specified using the parameter_condition and static_condition elements, which are described in the next sections.

Static_condition element:

The static_condition element specifies a search constraint on the query declared by its parent tablequery element.

A tablequery element can have zero or more static_condition elements. The constraint consists of the column of the database table to which it applies, the condition that must be met, and the value to which to compare the column's value.

The static_condition element has four attributes.

• column
  The value of the column attribute is the name of a column of the table which is being queried. This column name must be specified as one of the attribute elements of the external_schema element of the same access specification. This attribute is required.
• condition
  The value of the condition attribute specifies the type of constraint the column value must meet. This attribute is required. Valid values are equal, atLeast, atMost, and like.
• value
  The value of the value attribute is compared to the specified column's value to determine if the search constraint is met. This attribute is required. Its value is a literal of the SPL data type specified by the type attribute.
• type
  The value of the type attribute is the SPL data type of the value attribute. This attribute is required. Valid values are int32, int64, float32, float64, and rstring.

Parameter_condition element:

The parameter_condition element specifies a search constraint on the query declared by its parent tablequery element.

A tablequery element can have zero or more parameter_condition elements. The constraint consists of the column of the database table to which it applies, the condition that must be met, and the value to which to compare the column's value. This value is specified as a parameter in the SolidDBEnrich operator declarations that use this access specification. For each parameter_condition element, the access specification must have a corresponding parameter element to establish the association between the parameter_condition element and a parameter in the SolidDBEnrich operator declaration.

The parameter_condition element has two attributes.

• column
  The value of the column attribute is the name of a column in the table that is being queried. This column name must be specified as one of the attribute elements of the external_schema element of the same access specification. This attribute is required.
• condition
  The value of the condition attribute specifies the type of constraint the column value must meet. This attribute is required. Valid values are equal, atLeast, atMost, and like.

Parameters element

The parameters element provides the linkage between a parameterized query, statement, or tablequery element in the same access specification and parameters in ODBCEnrich, ODBCRun, ODBCSource, or SolidDBEnrich operator declarations that use the access specification. This element declares the names for the parameters in the operator declarations and provides information about the values those parameters can have, for example, their data types. Here is an example of an abridged access specification containing a parameters element; it extends the example in “Tablequery element” on page 27.

<access_specification name="bargainIndexRatings">
  <tablequery tablename="ratings">
    <parameter_condition column="symbol" condition="equal" />
    <parameter_condition column="type" condition="equal" />
  </tablequery>
  <parameters>
    <parameter name="ticker" type="rstring" />
    <parameter name="ttype" type="rstring" />
  </parameters>
</access_specification>

The parameters element has no attributes, only parameter elements. The number and order of the parameter elements depend on the contents of the query, statement, or tablequery element in the same access specification.

• If the SQL SELECT statement in a query element, or the user-defined SQL statement in a statement element, contains ODBC parameter markers ("?"), the access specification must contain a parameter element for each ODBC parameter marker. The parameter elements and ODBC parameter markers are associated in order of appearance; that is, the first ODBC parameter marker in the statement is associated with the first parameter element, and so on.
• If an access specification contains a tablequery element that has parameter_condition elements, it must also contain a parameter element for each parameter_condition element. A parameter element is associated with the parameter_condition element at the same ordinal position within its parent tablequery element as the parameter element is within its parent parameters element.
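The positional association between ODBC parameter markers and parameter elements can be sketched in Python as follows. Both function names are hypothetical, and the marker counter is a deliberate simplification: it skips question marks inside single-quoted SQL string literals but does not understand comments or other quoting styles.

```python
def count_parameter_markers(sql):
    """Count '?' markers that are outside single-quoted string literals."""
    count = 0
    in_literal = False
    for ch in sql:
        if ch == "'":
            # A doubled quote ('') inside a literal toggles twice, so the
            # state is unchanged after the pair.
            in_literal = not in_literal
        elif ch == "?" and not in_literal:
            count += 1
    return count


def bind_parameters(sql, parameter_names):
    """Pair each marker position (1-based) with the parameter element
    at the same position in the parameters element."""
    markers = count_parameter_markers(sql)
    if markers != len(parameter_names):
        raise ValueError("%d parameter markers but %d parameter elements"
                         % (markers, len(parameter_names)))
    return list(enumerate(parameter_names, start=1))


# Statement and parameter names from the PersonUpdate example in this chapter.
UPDATE_SQL = ("UPDATE PERSON SET FNAME=?, LNAME=?, AGE=?, "
              "GENDER=?, SCORE=?, TOTAL=? WHERE ID=?")
UPDATE_PARAMS = ["fname", "lname", "age", "gender", "score", "total", "id"]
```

With the PersonUpdate statement, bind_parameters(UPDATE_SQL, UPDATE_PARAMS) pairs marker 1 with fname and marker 7 with id, which is why the id parameter is declared last in that example.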

Parameter element:

The parameter element declares the name for an access-specification-dependent parameter of a Database Toolkit operator declaration. It also provides information about the values that the parameter can have.

The parameter element has five attributes.

• name
  The value of the name attribute specifies the name by which this parameter will be identified in a Database Toolkit operator declaration. This attribute is required.
• type
  The value of the type attribute specifies what SPL data type the value(s) of the parameter in a Database Toolkit operator declaration must have. This attribute is required.
• default
  The value of the default attribute specifies the value that this parameter will have if none is specified in an operator declaration. The attribute's value is a literal of the SPL data type specified by the type attribute. This attribute is optional.
• length
  The value of the length attribute is the maximum length of a parameter value whose SPL data type is rstring. This attribute is ignored for other data types. This attribute is optional.
• cardinality
  The value of the cardinality attribute is the number of values for this parameter that must be specified in an operator declaration. Valid values for this attribute are -1, 0, or any positive integer. The special value -1 indicates that this parameter must have one or more values specified in an operator declaration. This attribute is optional; if omitted, its value is 1.

Uses_connection element

A uses_connection element identifies a connection specification that can be used with the access specification.

Here is an example of an abridged access specification containing uses_connection elements.

<access_specification name="PersonSink">
  ...
  <uses_connection connection="testSystem" />
  <uses_connection connection="productionSystem" />
  ...
</access_specification>

The uses_connection element has a single attribute, connection, whose value is the name of a connection_specification element in the same connection specifications document. An access_specification element must have at least one uses_connection element.
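Because a uses_connection element must name a connection_specification in the same document, a simple cross-reference check can catch misspelled names before compilation. The Python sketch below is one possible way to do this; the function name and the sample document are hypothetical, and the productionSystem connection is deliberately left undeclared in the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "productionSystem" is referenced but never declared.
SAMPLE = """
<st:connections xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters">
  <connection_specifications>
    <connection_specification name="testSystem">
      <ODBC database="persontest" />
    </connection_specification>
  </connection_specifications>
  <access_specifications>
    <access_specification name="PersonSink">
      <table tablename="personSink" />
      <uses_connection connection="testSystem" />
      <uses_connection connection="productionSystem" />
      <external_schema>
        <attribute name="id" type="int32" />
      </external_schema>
    </access_specification>
  </access_specifications>
</st:connections>
"""


def dangling_connection_references(xml_text):
    """Return (access name, connection name) pairs that do not resolve.

    An access specification with no uses_connection element at all is
    reported as (access name, None).
    """
    root = ET.fromstring(xml_text)
    declared = {c.get("name") for c in root.iter("connection_specification")}
    problems = []
    for access in root.iter("access_specification"):
        used = [u.get("connection") for u in access.findall("uses_connection")]
        if not used:
            problems.append((access.get("name"), None))
        for name in used:
            if name not in declared:
                problems.append((access.get("name"), name))
    return problems
```

For the sample document the function reports that PersonSink refers to the undeclared connection productionSystem.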

External_schema element

The external_schema element specifies the schema of the data received from or sent to an external data service.

Here is an example of an abridged access specification containing an external_schema element; it extends the example in “Query element” on page 25.

<access_specification name="PersonRemainder">
  <query query="SELECT id, fname, lname FROM personsrc"
         isolation_level="READ_UNCOMMITTED" replays="0" />
  ...
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
  </external_schema>
</access_specification>

The external_schema element has no attributes, only attribute elements. The number and order of the attribute elements depend on the contents of the query, table, or tablequery element in the same access specification.

• If an access specification contains a query element, its external_schema element must have an attribute element for each column in the result set of the SQL SELECT statement in the query element. The attribute elements must be in the same order as the columns of the result set.
• If an access specification contains a table element, its external_schema element must have an attribute element for each column in the database table named by the tablename attribute of the table element. The attribute elements must be in the same order as the columns of that table.
• If an access specification contains a tablequery element, its external_schema element must have an attribute element for each column named in a parameter_condition or static_condition element of the tablequery element, as well as each column that will be assigned to an output stream attribute in a SolidDBEnrich operator that uses this access specification. Since the solidDB SA API accesses columns based on their names, the order of the attribute elements is not significant.

Important: It is a common mistake for the information in external_schema elements to be inconsistent with that of the external data service. The SPL compiler cannot flag these inconsistencies at compile time, but the Database Toolkit operators have internal checks that might result in a runtime failure, captured in the processing element logs. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference.
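Some inconsistencies can be checked within the document itself, without contacting the external data service. As one example, the Python sketch below verifies the tablequery rule that every column named in a parameter_condition or static_condition element has an attribute element in the external_schema. The function name is hypothetical, the external_schema in the sample (including the rstring lengths) is invented for illustration, and columns assigned to output stream attributes are not checked because they are declared in the SPL source rather than in this document.

```python
import xml.etree.ElementTree as ET

# Based on the bargainRatings example; the external_schema is hypothetical
# and deliberately omits the epsExternal column.
SAMPLE = """
<access_specification name="bargainRatings">
  <tablequery tablename="ratings">
    <parameter_condition column="symbol" condition="equal" />
    <parameter_condition column="type" condition="equal" />
    <static_condition column="ratingInternal" condition="atLeast"
                      value="1" type="float64" />
    <static_condition column="epsExternal" condition="atMost"
                      value="6.3" type="float64" />
  </tablequery>
  <external_schema>
    <attribute name="symbol" type="rstring" length="10" />
    <attribute name="type" type="rstring" length="10" />
    <attribute name="ratingInternal" type="float64" />
  </external_schema>
</access_specification>
"""


def missing_schema_columns(access_spec_xml):
    """Return condition columns that have no attribute element."""
    access = ET.fromstring(access_spec_xml)
    tablequery = access.find("tablequery")
    if tablequery is None:
        raise ValueError("access specification has no tablequery element")
    schema = access.find("external_schema")
    declared = set()
    if schema is not None:
        declared = {a.get("name") for a in schema.findall("attribute")}
    # Names are compared exactly as written; the operators do not change case.
    needed = [c.get("column") for c in tablequery
              if c.tag in ("parameter_condition", "static_condition")]
    return [col for col in needed if col not in declared]
```

For the sample, the function returns the single column epsExternal, which would need an attribute element before the access specification is consistent.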

Attribute element:

The attribute element specifies information about a database table column from an external data service.

The attribute element has four attributes.

• name
  The value of the name attribute specifies the identifier by which a table column is known in an external data service. The Database Toolkit operators use these identifiers exactly as specified to access data in the external service; for example, the operators do not change the case of the identifiers. This attribute is required.
• type
  The value of the type attribute specifies the SPL data type that a table column will map to as an SPL attribute. Note that the value is not a type from the native type system of the external data service. This attribute is required.

  Note: The SPL data types supported by a specific database are limited to the data types that are actually supported by that database.

  The following table lists the valid type values (and their corresponding ODBC types) for the access specifications used by ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, and DB2PartitionedAppend operators.

  Table 6. SPL to ODBC type mapping

  SPL type value    ODBC type
  rstring           SQLCHAR
  int8              SQLCHAR
  int16             SQLSMALLINT
  int32             SQLINTEGER
  int64             SQLBIGINT
  float32           SQLREAL
  float64           SQLDOUBLE
  boolean           SQLCHAR

  The following table lists the valid type values (and their corresponding solidDB types) for the access specifications used by the SolidDBEnrich operator.

  Table 7. SPL to solidDB type mapping

  SPL type value    solidDB SA type
  float32           Float
  int32             Int
  int64             Long
  float64           Double
  rstring           Str

• length
  The value of the length attribute specifies the maximum length of a table column whose SPL type is rstring. The length value is representative of the size of the column in the database. If the length value specified is smaller than the size of the database column, the data might be truncated. The length attribute is required for type rstring and is ignored for all other data types.
• key
  The key attribute specifies whether the attribute is a key field in the table. This attribute is optional. If specified, the only valid value is true. This attribute is only used for the DB2SplitDB operator and is ignored for all other operators.
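The rules for attribute elements, together with the two type mapping tables, can be captured in a short Python sketch. The dictionaries are transcribed from Tables 6 and 7; the function check_attribute and its messages are hypothetical and only approximate the checks the toolkit performs.

```python
# Table 6: valid SPL type values for the ODBC and DB2 operators.
SPL_TO_ODBC = {
    "rstring": "SQLCHAR",
    "int8": "SQLCHAR",
    "int16": "SQLSMALLINT",
    "int32": "SQLINTEGER",
    "int64": "SQLBIGINT",
    "float32": "SQLREAL",
    "float64": "SQLDOUBLE",
    "boolean": "SQLCHAR",
}

# Table 7: valid SPL type values for the SolidDBEnrich operator.
SPL_TO_SOLIDDB = {
    "float32": "Float",
    "int32": "Int",
    "int64": "Long",
    "float64": "Double",
    "rstring": "Str",
}


def check_attribute(attrs, mapping=SPL_TO_ODBC):
    """Return a list of problems for one attribute element.

    `attrs` is a dict of the element's XML attributes (string values).
    """
    problems = []
    if "name" not in attrs:
        problems.append("name is required")
    spl_type = attrs.get("type")
    if spl_type is None:
        problems.append("type is required")
    elif spl_type not in mapping:
        problems.append("unsupported type: %s" % spl_type)
    elif spl_type == "rstring" and "length" not in attrs:
        problems.append("length is required for rstring")
    if "key" in attrs and attrs["key"] != "true":
        problems.append("key, if specified, must be true")
    return problems
```

Passing SPL_TO_SOLIDDB as the second argument applies the narrower solidDB type list; for instance, a boolean attribute is accepted for the ODBC operators but reported as unsupported for SolidDBEnrich.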

Chapter 3. Known issues and restrictions

The following table lists the problems that were encountered with the versions of databases and ODBC that are supported by the toolkit. Future versions of these databases and ODBC might resolve these problems.

Table 8. Database Toolkit known issues and restrictions

Issue: Data is truncated or lost using the ODBCAppend operator configured to do batch inserts.
DB/OS/Driver: DB - SQLServer; OS - All 64-bit versions of Streams-supported operating systems; Driver - UnixODBC driver, using the Free TDS client driver.
Workaround/Resolution: The problem appears to be with the Free TDS client driver. Tests were successful using an alternate driver (EasySoft).

Issue: ODBC libraries for Oracle do not support the SQL_C_SBIGINT ODBC type, which corresponds to the int64 SPL type.
DB/OS/Driver: DB - Oracle; OS - All operating systems; Driver - All drivers.
Workaround/Resolution: There is no workaround. Users using an Oracle database should not use the SPL int64 type in their applications.

Issue: Issue with int32/SQLINTEGER type during batch inserts - SQLINTEGER values are defined as 8 bytes instead of 4 bytes.
DB/OS/Driver: DB - Oracle; OS - All operating systems; Driver - All drivers.
Workaround/Resolution: The default code generation for an ODBCAppend operator configured to use Oracle and transaction_batchsize > 1 is to define an int32 schema type as the C type SQLLEN. If this causes your application to insert an incorrect number of bytes for your int32 field, define the environment variable STREAMS_ADAPTERS_ODBC_ORACLE_SQLINTEGER=1 and recompile the application. This will generate code for int32 schema types as a C type of SQLINTEGER instead.

Issue: Locking issues can occur when running multiple ODBCAppend operators writing to the same table when transaction_batchsize > 1.
DB/OS/Driver: DB - Informix; OS - All operating systems; Driver - All drivers.
Workaround/Resolution: When using Informix, if an application requires multiple ODBCAppend operator instances writing to the same table in parallel, set transaction_batchsize=1 for the table element being used by the ODBCAppend operator in the connections.xml file.

Issue: Several issues with using UnixODBC version 2.2.
DB/OS/Driver: DB - All DBs; OS - All operating systems; Driver - UnixODBC 2.2.
Workaround/Resolution: It is recommended that users use UnixODBC 2.3 or greater with Database Toolkit applications.

Issue: Rowsets greater than one result in only the first row of the rowset being inserted when using MySQL ODBC connector version 5.1.7 or earlier.
DB/OS/Driver: DB - MySQL; OS - All operating systems; Driver - ODBC Connector version 5.1.7 or earlier.
Workaround/Resolution: Use the MySQL ODBC Connector driver version 5.1.8 or later when using the MySQL database in an application.

Issue: Oracle is not currently supported on Red Hat Enterprise Linux Version 6. The Oracle client is needed to interface with the Oracle database.
DB/OS/Driver: DB - Oracle; OS - RHEL6; Driver - All drivers.
Workaround/Resolution: This is a current restriction for users using an Oracle database.

Issue: On IBM® POWER7® systems running Red Hat Enterprise Linux Version 6, IBM solidDB®, Netezza®, and Oracle databases are not supported.
DB/OS/Driver: DB - Oracle; DB - IBM solidDB®; DB - Netezza®; OS - RHEL6; Driver - All drivers.
Workaround/Resolution: This is a current restriction for users using IBM® POWER7® systems.

Chapter 4. Connection setup and debug

While setup of your external data source and ODBC configuration is outside the scope of the Database Toolkit documentation, the Database Toolkit provides a program that can be used to help find setup and configuration issues. The source for this program, called odbchelper, is provided, along with the Makefile, which allows you to build the program using your own version of UnixODBC.

Note: The odbchelper program is provided to help debug connection and configuration issues. It is not intended to be used in an application.

Building odbchelper

About this task

Before using the odbchelper program, you must create the program from source using the following procedure.

Procedure

1. Create a new directory. For example, you can create a directory in your home directory.
   mkdir $HOME/odbchelper
2. Copy the odbchelper source and Makefile to the directory.
   cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.db/etc/odbchelper/* $HOME/odbchelper
3. Go to the odbchelper directory.
   cd $HOME/odbchelper
4. Set the environment variables STREAMS_ADAPTERS_ODBC_INCPATH and STREAMS_ADAPTERS_ODBC_LIBPATH to the locations where the UnixODBC include and library files are located, respectively.
   export STREAMS_ADAPTERS_ODBC_INCPATH=$HOME/unixodbc/include
   export STREAMS_ADAPTERS_ODBC_LIBPATH=$HOME/unixodbc/lib
   Note: If you use the operating-system-supplied UnixODBC package, you do not need to set these environment variables.
5. Run make to build the odbchelper program.
   make

Using odbchelper

The odbchelper program has several action flags that can be used.

• help: Displays the options and parameters available.
• testconnection: Tests the connection to an external data source instance with a user ID and password.
• runsqlstmt: Runs an SQL statement, either passed in on the command invocation, or in a specified file.
• runsqlquery: Runs an SQL query, either passed in on the command invocation, or in a specified file. The results of the query are returned to STDOUT.
• loaddelimitedfile: Allows you to pass in a comma-delimited file, used to create and populate a database table.

To use one of the action options, invoke odbchelper followed by the action option (and any additional parameters required). For example,

$HOME/odbchelper/odbchelper help
$HOME/odbchelper/odbchelper runsqlstmt -i myinstance -u myuserid -p mypassword -stmt 'DROP TABLE MYTABLE'

A common use of odbchelper is to test that the external data source information in the connections.xml file is correct for your external data source setup. The testconnection action flag allows you to do this. For example, if the connection_specification portion of the connections.xml file is:

<connection_specification name="mydatabaseconnection">
  <ODBC database="mydatabase" user="myuserid" password="mypassword" />
</connection_specification>

You run the following odbchelper invocation to test the connection:

$HOME/odbchelper/odbchelper testconnection -i mydatabase -u myuserid -p mypassword

Chapter 5. DB2 partition layout and debug

The Database Toolkit provides a program that can be used to determine the number of partitions in the database. The source for this program, called db2helper, is provided, along with the Makefile, which allows you to build the program using your own version of DB2 installation libraries. This program gives you the partition information you require to set up your application, for example, the number of DB2PartitionedAppend operators to include in the application, whether partitions are logical or configured on different physical nodes, and so on.

Building db2helper

About this task

Before using the db2helper program, you must create the program from source using the following procedure.

Procedure
1. Create a new directory. For example, you can create a directory in your home directory.
       mkdir $HOME/db2helper
2. Copy the db2helper source and Makefile to the directory.
       cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.db/etc/db2helper/* $HOME/db2helper
3. Go to the db2helper directory.
       cd $HOME/db2helper
4. Set the environment variables STREAMS_ADAPTERS_DB2_INCPATH and STREAMS_ADAPTERS_DB2_LIBPATH to the locations for the DB2 client include and library files, respectively.
       export STREAMS_ADAPTERS_DB2_INCPATH=/opt/ibm/db2/V9.7/include
       export STREAMS_ADAPTERS_DB2_LIBPATH=/opt/ibm/db2/V9.7/lib64
5. Run make to build the db2helper program.
       make
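If the build fails, a wrong include or library path is a likely cause. The following Python sketch checks the two variables from step 4 before you run make. The function name missing_db2_build_dirs is hypothetical, and the check only confirms that each variable names an existing directory; it does not verify that the DB2 files are actually there.

```python
import os

# The two variables named in step 4 of the procedure.
REQUIRED_VARS = ("STREAMS_ADAPTERS_DB2_INCPATH", "STREAMS_ADAPTERS_DB2_LIBPATH")


def missing_db2_build_dirs(env=None):
    """Return the required variables that are unset or not directories."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        path = env.get(name, "")
        if not path or not os.path.isdir(path):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Report each variable that needs attention in the current environment.
    for name in missing_db2_build_dirs():
        print(f"{name} is unset or does not point to a directory")
```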

Using db2helper

The db2helper program has several action flags that can be used.
- help: Displays the options and parameters available.
- testconnection: Tests the connection to an external data source instance with a user ID and password.
- partitionconfig: Lists the partition configuration for the external DB2 data source instance.

To use one of the action options, invoke db2helper followed by the action option (and any additional parameters required). For example:

    $HOME/db2helper/db2helper help
    $HOME/db2helper/db2helper partitionconfig -i myinstance -u myuserid -p mypassword -table tablename [-key keyvalue]
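The square brackets in the partitionconfig example mark the -key parameter as optional. A small Python sketch of assembling that argument list is shown here. The function name build_partitionconfig_args is hypothetical, and only the flags shown in the example above are used.

```python
def build_partitionconfig_args(instance, user, password, table, key=None,
                               helper="$HOME/db2helper/db2helper"):
    """Return the db2helper partitionconfig argument list.

    The -key parameter is appended only when a key value is supplied,
    mirroring the optional [-key keyvalue] part of the example.
    """
    args = [
        helper, "partitionconfig",
        "-i", instance,
        "-u", user,
        "-p", password,
        "-table", table,
    ]
    if key is not None:
        args += ["-key", key]
    return args


if __name__ == "__main__":
    print(" ".join(build_partitionconfig_args(
        "myinstance", "myuserid", "mypassword", "tablename")))
```

The default helper path contains the literal text $HOME, which only a shell expands. If you pass the list to subprocess.run without a shell, expand the path first, for example with os.path.expandvars.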


Chapter 6. Sample applications

The Database Toolkit contains a set of simple sample applications illustrating how to use the operators. In the $STREAMS_INSTALL/toolkits/com.ibm.streams.db/samples directory, there are four subdirectories, one named for each of the four operators. Each of these directories contains an SPL source file for the sample application, a Makefile, an info.xml file, and three subdirectories named data, etc, and .settings. The etc subdirectory contains a connections.xml document that is used by the application.

Before using the samples, make sure that Streams is installed and that the STREAMS_INSTALL environment variable is set to the Streams install directory. You should also set the appropriate toolkit environment variables for the database that you will be using to run the sample. See Chapter 2, “How to use the Database Toolkit,” on page 3 for instructions on setting the necessary compile-time environment variables for the database you are using.

Each sample contains two files, Setup.sql and Cleanup.sql. The Setup.sql file contains SQL statements for creating and populating the table used by the sample. The Cleanup.sql file contains SQL statements for removing the table. Refer to the documentation for your database for details on how to run SQL statements.

Alternatively, you can use the odbchelper program to run the SQL commands in these files, using the odbchelper runsqlstmt action flag. For more information, see “Using odbchelper” on page 35.
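One way to drive runsqlstmt from a script is to pass each statement with the -stmt parameter shown in “Using odbchelper”. The Python sketch below builds one argument list per statement. It is an illustration under stated assumptions: both function names are hypothetical, the SQL text in SAMPLE_SCRIPT is invented rather than copied from a shipped Setup.sql, and the splitter is deliberately naive (it breaks on every semicolon, so it does not handle semicolons inside string literals).

```python
# Invented example script; the real Setup.sql files may differ.
SAMPLE_SCRIPT = """-- create the sample table
CREATE TABLE PERSON (ID INT);
INSERT INTO PERSON
  VALUES (1);
"""


def split_sql_statements(script):
    """Naively split a script on semicolons, skipping '--' comment lines."""
    lines = [ln for ln in script.splitlines()
             if not ln.strip().startswith("--")]
    parts = "\n".join(lines).split(";")
    # Collapse internal whitespace so each statement fits on one line.
    return [" ".join(part.split()) for part in parts if part.strip()]


def build_runsqlstmt_commands(script, instance, user, password,
                              helper="$HOME/odbchelper/odbchelper"):
    """Return one odbchelper runsqlstmt argument list per statement."""
    return [
        [helper, "runsqlstmt", "-i", instance, "-u", user,
         "-p", password, "-stmt", stmt]
        for stmt in split_sql_statements(script)
    ]


if __name__ == "__main__":
    for cmd in build_runsqlstmt_commands(
            SAMPLE_SCRIPT, "myinstance", "myuserid", "mypassword"):
        print(cmd)
```

Because each statement is a separate list element, it reaches the program as a single argument with no shell quoting needed when the list is passed to subprocess.run.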

Working with the samples in the command-line environment

About this task

Create your own copy of the samples prior to compiling and running them, as you will need to make modifications to some of the configuration files.

Procedure
1. Create a new directory. For example, you can create a directory in your home directory.
       mkdir $HOME/dbsamples
2. Copy the samples to this directory.
       cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.db/samples/* $HOME/dbsamples/

What to do next

Update the database configuration information needed to access the external data source. For more information about updating the database configuration, see “Updating database configuration information.”

Updating database configuration information

Before you begin

Create a copy of the sample applications.


About this task

After you create a copy of the sample applications, you need to update the connections.xml file with the database configuration information needed to access the external data source.

Procedure
1. Open the connections.xml file in your favorite editor, for example, emacs. For example, if you want to compile and run the ODBCSource sample application, open the $HOME/dbsamples/ODBCSource/etc/connections.xml file. Then find the following section:
       <connection_specification name="DBPerson">
         <ODBC database="replace-with-database-name" user="replace-with-userid"
               password="replace-with-password" />
       </connection_specification>
2. Update the database configuration information:
   a. Replace replace-with-database-name with the name of the database you are connecting to.
   b. Replace replace-with-userid and replace-with-password with the user ID and password you are using to connect to the database.
   c. For the SolidDBEnrich sample, replace replace-with-host, replace-with-port, replace-with-userid, and replace-with-password. For more information about these values, see “Connection_specification Element” on page 23.
   d. For the DB2ParallelWriter sample, replace replace-with-schema with the DB2 database schema name for your database.
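If you update several samples, the placeholder replacement in step 2 can be scripted. The following Python sketch replaces replace-with-* tokens in a piece of text and reports any token left untouched. It is not part of the toolkit: the function name fill_placeholders is hypothetical, and the sketch assumes that every placeholder follows the replace-with-* naming pattern shown in the steps above.

```python
import re
from xml.sax.saxutils import escape

# Matches tokens such as replace-with-userid or replace-with-database-name.
PLACEHOLDER = re.compile(r"replace-with-[a-z]+(?:-[a-z]+)*")

# One line in the style of the sample connections.xml fragment.
SAMPLE = ('<ODBC database="replace-with-database-name" '
          'user="replace-with-userid" password="replace-with-password" />')


def fill_placeholders(text, values):
    """Replace known placeholders; return (new_text, unresolved_tokens)."""
    unresolved = []

    def substitute(match):
        token = match.group(0)
        if token in values:
            # Escape characters that are not allowed inside an XML attribute.
            return escape(values[token], {'"': "&quot;"})
        unresolved.append(token)
        return token

    return PLACEHOLDER.sub(substitute, text), unresolved


if __name__ == "__main__":
    new_text, left = fill_placeholders(SAMPLE, {
        "replace-with-database-name": "mydatabase",
        "replace-with-userid": "myuserid",
    })
    print(new_text)
    print("still to replace:", left)
```

Returning the unresolved tokens makes it easy to catch a sample, such as SolidDBEnrich or DB2ParallelWriter, that needs values beyond the database name, user ID, and password.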

What to do next

After you have updated the connections.xml file, you can perform any of the following tasks:
- Run make in the application directory. By default, the sample is compiled as a distributed application.
- To compile the application as a stand-alone application, run make standalone.
- To remove all the generated files and return the sample to its original state, run make clean.

Working with the samples in Streams Studio

About this task

To import the sample application into Streams Studio, you must first add the Database Toolkit to the Toolkit Locations section. You need to add the toolkit location only once.

Procedure
1. Add the toolkit location:
   a. From the Streams Explorer, right-click Toolkit Locations.
   b. Select Add Toolkit Location.
   c. Enter the directory or click Directory to select the install location of the Database Toolkit, and click OK. The Database Toolkit is located in the $STREAMS_INSTALL/toolkits/com.ibm.streams.db directory.
2. Import the sample application:
   a. Click File > Import.
   b. Expand the InfoSphere Streams folder, and select SPL Project.
   c. Enter the directory or click Browse to select the directory of the sample you wish to import, and click Finish.

What to do next

Update the database configuration information needed to access the external data source. For more information about updating the database configuration, see “Updating database configuration information.”

Updating database configuration information

Before you begin

Import the sample application into Streams Studio.

About this task

After you import the sample application project, you need to update the connections.xml file with the database configuration information needed to access the external data source.

Procedure
1. In the Project Explorer, under the project for the sample you imported, expand Resources and etc.
2. Open the connections.xml file in the Eclipse editor, and find the following section:
       <connection_specification name="DBPerson">
         <ODBC database="replace-with-database-name" user="replace-with-userid"
               password="replace-with-password" />
       </connection_specification>
3. Update the database configuration information:
   a. Replace replace-with-database-name with the name of the database you are connecting to.
   b. Replace replace-with-userid and replace-with-password with the user ID and password you are using to connect to the database.
   c. For the SolidDBEnrich sample, replace replace-with-host, replace-with-port, replace-with-userid, and replace-with-password. For more information about these values, see “Connection_specification Element” on page 23.
   d. For the DB2ParallelWriter sample, replace replace-with-schema with the DB2 database schema name for your database.

What to do next

After you have updated the connections.xml file, you can use Streams Studio to build and run the application. For more information about using Streams Studio, see the IBM InfoSphere Streams: Studio Installation and User's Guide.


Notices

This information was developed for products and services offered in the U.S.A. Information about non-IBM products is based on information available at the time of first publication of this document and is subject to change.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
1623-14, Shimotsuruma, Yamato-shi
Kanagawa 242-8502 Japan

The following paragraph does not apply to the United Kingdom or any other country/region where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.


Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information that has been exchanged, should contact:

IBM Canada Limited
Office of the Lab Director
8200 Warden Avenue
Markham, Ontario
L6G 1C7
CANADA

Such information may be available, subject to appropriate terms and conditions, including, in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement, or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems, and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements, or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information may contain examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious, and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs, in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Each copy or any portion of these sample programs or any derivative work must include a copyright notice as follows:

© (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. _enter the year or years_. All rights reserved.

Trademarks

IBM, the IBM logo, ibm.com and InfoSphere are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

The following terms are trademarks or registered trademarks of other companies:
- Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
- Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
- UNIX is a registered trademark of The Open Group in the United States and other countries.
- Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other product and service names might be trademarks of IBM or other companies.


Printed in USA