Introduction

Batch processing is a mission-critical workload for the enterprise. End-of-day, end-of-month, and end-of-year reporting, bulk account processing for credit scores and interest assessment, and reconciliation of banking activities are just a few examples of tasks carried out with batch processing. Whether or not they are aware of its role and importance, enterprises depend on batch processing. Yet in this area, enterprises are struggling. Over the past decade, their online transaction processing (OLTP) systems have evolved, with application servers such as WebSphere® Application Server serving as the foundation for that evolution.

As the standards for Web services and other OLTP technologies emerged, programming models such as JEE were standardized, and service-oriented architecture (SOA) was pursued. Throughout this evolution, however, the batch programming model received comparatively little attention and saw no significant changes.

WebSphere® Compute Grid provides a batch framework for rapidly generating batch programs, along with a Java-based programming model for developing and building dependable batch programs that run reliably in WebSphere Application Server.

WebSphere Compute Grid is a complete enterprise Java batch programming solution that has the following features:

• Concise and powerful programming model based on POJO (plain old Java object).
• Simple packaging.
• Simple deployment model.
• Full-featured job control language (JCL).
• Sophisticated job scheduler.
• Robust execution environment.
• Comprehensive workload management and administrative tools.

Rational® Application Developer provides a full-function development environment for WebSphere Compute Grid, including the following tools:

• Batch project creation wizard
• Batch job creation wizard
• Batch project deployment descriptor editor

With the WebSphere Compute Grid tools included in Rational Application Developer and WebSphere Compute Grid in WebSphere Application Server, you can quickly develop transactional applications and applications for grid work.

In this tutorial, you see how to use the WebSphere Compute Grid tools provided by Rational Application Developer to create batch programs easily and execute these programs on WebSphere Application Server.

1. Anatomy of a batch job

Batch jobs commonly process large volumes of input/output data that are frequently record-oriented, usually representing critical business data, such as customer accounts, sales, and so on. Business processing tasks performed by batch jobs can be wide-ranging, for example invoice generation, account optimization, and opportunity analysis. Batch jobs have been used in the System z (mainframe) environment for decades and continue as a backbone of many large and medium-sized businesses to this day.

A batch job is a declarative construct that directs the execution of a series of one or more batch applications, and specifies their inputs and outputs. A batch job performs this set of tasks in sequence to accomplish a particular business function. Batch applications are programs designed to run in the background with no human interaction. Input and output are generally accessed as logical constructs by the batch application and are mapped to concrete data resources by the batch job definition. The basic anatomy of a batch job is shown in Figure 1.

Figure 1. Batch job anatomy

The job definition describes the batch steps to be executed and the sequence of execution. Each step is defined with a particular batch application to call, and its input and output data. Common sources and destinations for data can be files, databases, transaction systems, and message queues.
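To make this structure concrete, the following minimal sketch models a job definition as an ordered list of steps, each naming a batch application and its logical inputs and outputs. The types JobDefinition and StepDefinition and all sample names are hypothetical illustrations; they are not part of WebSphere Compute Grid, which expresses job definitions in xJCL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a job definition; not a WebSphere Compute Grid API.
class JobDefinitionSketch {

    // One step: the batch application to call plus its logical inputs and outputs.
    static final class StepDefinition {
        final String name;
        final String application;
        final List<String> inputs;
        final List<String> outputs;

        StepDefinition(String name, String application,
                       List<String> inputs, List<String> outputs) {
            this.name = name;
            this.application = application;
            this.inputs = Collections.unmodifiableList(new ArrayList<>(inputs));
            this.outputs = Collections.unmodifiableList(new ArrayList<>(outputs));
        }
    }

    // A job: an ordered list of steps that run in sequence.
    static final class JobDefinition {
        final String name;
        final List<StepDefinition> steps;

        JobDefinition(String name, List<StepDefinition> steps) {
            this.name = name;
            this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
        }

        // Returns the step names in execution order.
        List<String> executionOrder() {
            List<String> order = new ArrayList<>();
            for (StepDefinition step : steps) {
                order.add(step.name);
            }
            return order;
        }
    }

    // Builds an illustrative two-step job (all names are made up).
    static JobDefinition sampleJob() {
        StepDefinition load = new StepDefinition("loadAccounts", "AccountLoader",
                Arrays.asList("accountsFile"), Arrays.asList("accountsTable"));
        StepDefinition bill = new StepDefinition("generateInvoices", "InvoiceGenerator",
                Arrays.asList("accountsTable"), Arrays.asList("invoiceQueue"));
        return new JobDefinition("endOfMonth", Arrays.asList(load, bill));
    }

    public static void main(String[] args) {
        JobDefinition job = sampleJob();
        for (StepDefinition step : job.steps) {
            System.out.println(step.name + ": " + step.application
                    + " reads " + step.inputs + ", writes " + step.outputs);
        }
    }
}
```

Note that the second step reads the logical resource that the first step writes, which is why the order of steps in the definition matters.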

2. Batch programming model

A WebSphere Compute Grid batch application consists of a set of POJOs and runs under the control of the WebSphere Compute Grid batch container, which itself runs as an extension to a standard WebSphere Application Server. Figure 2 depicts the application components and their relationship to the batch container.

Figure 2. Batch application anatomy

The batch container runs a batch job under the control of an asynchronous bean, which is similar to a container-managed thread. The batch container processes a job definition and carries out its lifecycle, using an async bean as the unit of execution.

A batch application is made up of these user-provided components:

• Batch job step: This POJO provides the business logic to execute as a step in a batch job. The batch container runs the batch job step while processing a job definition.

• Batch data stream: This POJO provides the batch job step with access to data. A batch application can be written to access one batch data stream or several. A batch data stream can be written to provide access to any sort of data, including data from relational databases, file systems, and message queues, through J2C connectors.

The batch container makes open, close, and checkpoint-related callbacks to a batch data stream during the lifecycle of a job step. The batch job step itself calls methods on the batch data stream to get and put data.

A batch application can optionally include these user-provided components:

• Checkpoint algorithm: The batch container provides a checkpoint/restart mechanism to support job restart from a known point of consistency. A job might need to be interrupted and then subsequently restarted after a planned or unplanned outage. The batch container calls the checkpoint algorithm periodically to determine if it is time to take a checkpoint.

WebSphere Compute Grid provides two pre-built checkpoint algorithms, one that supports a time-based checkpoint interval, and another that supports a checkpoint interval based on record-count.

• Results algorithm: Each batch job step supplies a return code when it is completed. The results algorithm has visibility to the return codes from all steps in a batch job and returns a final, overall return code for the job as a whole.

WebSphere Compute Grid provides a pre-built results algorithm that returns the numerically highest step return code as the overall job return code.
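The two ideas described above can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not the product implementation: the class RecordCountCheckpointPolicy and the method overallReturnCode are hypothetical stand-ins and do not implement the WebSphere Compute Grid interfaces.

```java
// Hypothetical stand-ins that illustrate the two ideas; these are not the
// WebSphere Compute Grid classes or interfaces.
class AlgorithmSketch {

    // Record-count policy: signals a checkpoint after every 'interval' records.
    static final class RecordCountCheckpointPolicy {
        private final int interval;
        private int recordsSinceCheckpoint = 0;

        RecordCountCheckpointPolicy(int interval) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.interval = interval;
        }

        // Called once per processed record; true means "take a checkpoint now".
        boolean recordProcessed() {
            recordsSinceCheckpoint++;
            if (recordsSinceCheckpoint >= interval) {
                recordsSinceCheckpoint = 0;
                return true;
            }
            return false;
        }
    }

    // Results rule: the overall job return code is the numerically highest
    // step return code.
    static int overallReturnCode(int... stepReturnCodes) {
        if (stepReturnCodes.length == 0) {
            throw new IllegalArgumentException("at least one step return code is required");
        }
        int highest = stepReturnCodes[0];
        for (int rc : stepReturnCodes) {
            highest = Math.max(highest, rc);
        }
        return highest;
    }

    // Counts how many checkpoints the policy requests for a run of records.
    static int countCheckpoints(int totalRecords, int interval) {
        RecordCountCheckpointPolicy policy = new RecordCountCheckpointPolicy(interval);
        int checkpoints = 0;
        for (int i = 0; i < totalRecords; i++) {
            if (policy.recordProcessed()) {
                checkpoints++;
            }
        }
        return checkpoints;
    }

    public static void main(String[] args) {
        System.out.println("Checkpoints for 10 records, interval 3: " + countCheckpoints(10, 3));
        System.out.println("Overall return code for steps 0, 8, 4: " + overallReturnCode(0, 8, 4));
    }
}
```

A smaller interval means less work is repeated after a restart, at the cost of more frequent checkpoints.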

With this brief introduction to the batch container, which is part of the WebSphere Compute Grid runtime, here are some of its functions:

• It orchestrates the lifecycle of a batch job, according to a job definition.
• It generates a job log to capture the history of a job’s execution, including standard stream output from the batch job steps.
• It collects performance and usage metrics, to facilitate workload management and accounting functions.
• It provides a powerful job failover model, based on checkpoint/restart semantics.

Additional details about the batch container are beyond the scope of this article. For further information, see References.

3. Batch programming interfaces

The WebSphere Compute Grid batch programming model consists of four main interfaces. Two of these interfaces are essential to build a batch application, and two are optional and intended for advanced scenarios.

• Essential interfaces
   ◦ BatchJobStepInterface defines the interaction between the batch container and the batch application.
   ◦ BatchDataStream abstracts a particular input source or output destination for a batch application and defines the interaction between WebSphere Compute Grid and a concrete BatchDataStream implementation.
• Optional interfaces
   ◦ CheckpointPolicyAlgorithm defines the interaction between WebSphere Compute Grid and a custom checkpoint policy implementation. A checkpoint policy determines when WebSphere Compute Grid creates a checkpoint for a running batch job so that the job can be restarted after a planned or unplanned interruption. WebSphere Compute Grid includes two ready-to-use checkpoint policies, shown in Table 3.
   ◦ ResultsAlgorithm defines the interaction between WebSphere Compute Grid and a custom results algorithm. The purpose of the results algorithm is to provide the overall return code for a job. The algorithm has visibility to the return codes from each of the job steps. WebSphere Compute Grid includes one ready-to-use results algorithm, shown in Table 3.

Developing a simple batch application

In this tutorial, a bank system is simulated. The system must run a process that updates specific client accounts: every account with a balance lower than 4500 (the threshold used by the input query later in this tutorial) is updated by multiplying that balance by ten. This process must be executed at specific intervals, which could be daily, weekly, or monthly.

This tutorial explains how to create a batch project and a batch job, with all its artifacts, to retrieve information from a database, process the records, and update the existing database with the results. The batch job is then started by a scheduler, which is also explained in this tutorial. Briefly, these are the steps:

1. Set up the environment
2. Start the database
   1. Create a data source connection
   2. Create tables
   3. Insert records
3. Open the WebSphere Application Server administrative console
4. Create a JDBC provider in WebSphere Application Server
5. Create a data source in WebSphere Application Server
6. Create a Batch project in Rational Application Developer
7. Create a Batch job, a job step, an input stream, and an output stream
8. Edit the created classes
   1. Create a data bean
   2. Create the fields, and the getter and setter methods
   3. Edit batch processor, input class, output class
9. Run the batch job
10. Create a job scheduler in WebSphere Application Server to run the batch job automatically
11. Verify the results of the batch job execution

Set up the environment

1. Install WebSphere Application Server 8.5 or later.
2. Install Rational Application Developer 9.1.1 or later and be sure to select the WebSphere Batch feature during the installation.

Start the database

1. Locate the startNetworkServer script in your WebSphere Application Server installation folder. The default location is the path {WebSphere Application Server installationFolder}/derby/bin/networkServer. Start the script by double-clicking it.

2. Create a data source connection in Rational Application Developer:

   1. In the Data Source Explorer view, right-click Database Connections and click New.
   2. Select Derby as the database manager.
   3. Select Derby 10.8 – Derby Client JDBC Driver Default as the JDBC driver in the New Connection dialog. Supply the required details (see the following screenshot). Click Test Connection and ensure that the connection is established successfully.

3. After the connection is successfully established, right-click the database and select New SQL Script, then create any required queries. In the next two steps, you will create three tables and populate them with data.

4. Run the following commands on the database to create the tables.

create table SAMPLESCHEMA.CUSTOMER (name varchar(100), customerID varchar(50), email varchar(100), accountNumber Integer);
create table SAMPLESCHEMA.CUSTOMER_ACCOUNT (customerID varchar(50), accountNumber Integer);
create table SAMPLESCHEMA.BALANCE (accountNumber Integer, accountBalance DOUBLE PRECISION);

5. Run the following scripts on the database to insert data into these tables.

INSERT INTO SAMPLESCHEMA.CUSTOMER (name, customerID, email, accountNumber) VALUES ('John Smith', 'client0001', '[email protected]', 1);
INSERT INTO SAMPLESCHEMA.CUSTOMER (name, customerID, email, accountNumber) VALUES ('Jane Johnson', 'client0002', '[email protected]', 2);
INSERT INTO SAMPLESCHEMA.CUSTOMER (name, customerID, email, accountNumber) VALUES ('Deep Blue', 'client0003', '[email protected]', 3);

INSERT INTO SAMPLESCHEMA.CUSTOMER_ACCOUNT (customerID, accountNumber) VALUES ('client0001', 1);
INSERT INTO SAMPLESCHEMA.CUSTOMER_ACCOUNT (customerID, accountNumber) VALUES ('client0001', 2);
INSERT INTO SAMPLESCHEMA.CUSTOMER_ACCOUNT (customerID, accountNumber) VALUES ('client0001', 3);
INSERT INTO SAMPLESCHEMA.CUSTOMER_ACCOUNT (customerID, accountNumber) VALUES ('client0002', 4);
INSERT INTO SAMPLESCHEMA.CUSTOMER_ACCOUNT (customerID, accountNumber) VALUES ('client0003', 5);

INSERT INTO SAMPLESCHEMA.BALANCE (accountNumber, accountBalance) VALUES (2, 4200);
INSERT INTO SAMPLESCHEMA.BALANCE (accountNumber, accountBalance) VALUES (3, 1200);
INSERT INTO SAMPLESCHEMA.BALANCE (accountNumber, accountBalance) VALUES (4, 6200);
INSERT INTO SAMPLESCHEMA.BALANCE (accountNumber, accountBalance) VALUES (5, 1200);
INSERT INTO SAMPLESCHEMA.BALANCE (accountNumber, accountBalance) VALUES (1, 2200);

Open the WebSphere Application Server Administrative Console

In Rational Application Developer, in the Servers view, right-click WebSphere Application Server, and click Administration > Run Administrative Console.

Create a JDBC provider in WebSphere Application Server

1. On the Administrative console, click Resources > JDBC > JDBC providers.
2. In the JDBC Providers page, select a scope from the dropdown menu and click New. On the Create new JDBC provider page, provide the required values. For this tutorial, the following values were used.
3. Click Next twice, until the Summary page is shown, then click Finish.

Create a data source in WebSphere Application Server

1. Click the JDBC provider that you just created. Under Additional Properties, click Data sources.
2. Click Resources > Data sources. In the Data sources page click New.
3. On the Create a data source page, provide the required information.
4. Provide the JNDI name by using the following format: jdbc/<name of the data source>.
5. Click Next.
6. Provide the Database name and click Next twice.
7. On the Summary page click Finish. Click Save to save the changes to the master configuration.
8. Click the data source you just created. Click the following link.
9. On the page, provide the required information to authenticate in the database.

Create a Batch Project in Rational Application Developer

1. Click File > New > Batch Project.
2. Provide a Project name.
3. Select the Target runtime and click Finish.

Create a batch job, a job step, an input stream, and an output stream

1. Right-click the project you just created and click New > Batch Job.
2. Provide a name for the batch job and click Next.

Create a job step by supplying the following details:

1. Select the Generic option for Select Pattern.
2. Provide a name for the step.
3. Specify the pattern to be generic.
4. On the BATCHRECORDPROCESSOR property, click Create and set the following values:
   1. Package: com.ibm.batch
   2. Name: BatchProcessor
   3. Click Finish.

In the Algorithms section:

1. Click the Add button next to the Checkpoint Algorithm field.
2. Ensure that the pattern on the Checkpoint Algorithm page is specified as Record Based.
3. Provide the following values:
   1. Name: Checkpoint Algorithm
   2. Transaction TimeOut: 100
   3. recordcount: 1
4. Click Finish.

Add a Result Algorithm:

1. Click the Add button next to Result Algorithm.
2. On the Result Algorithm page, provide the following details:
   1. Name: Result Algorithm
   2. Select Pattern: Job Sum
   3. Click Finish.
3. Click Next.

Add an input stream by providing the following data on the Step Stream page:

1. Specify a name for the stream.
2. Specify the pattern to be JDBC Reader.
3. In the Required Properties section:
   1. PATTERN_IMPL_CLASS: Click Create and set the following values:
      1. Package: com.ibm.batch
      2. Name: InputStream
   2. ds_jndi_name: {the JNDI name of your data source}
4. Click Next.

Create an output stream by providing the following data on the Step Stream page:

1. Specify a name for the stream.
2. Specify the pattern to be JDBC Writer.
3. In the Required Properties section:
   1. PATTERN_IMPL_CLASS: Click Create and set the following values:
      1. Package: com.ibm.batch
      2. Name: OutputStream

In the Optional Properties section, add a property named ds_jndi_name and set the JNDI name of your data source as the value.

Click Finish.

Edit the created classes

Before editing the classes that you created, you must create a data bean to store the information that is retrieved from the database.

1. Create a Java class and name it CustomerBean.

2. Create the following fields, and the getter and setter methods to access those fields:

private String customerName;
private String customerID;
private String email;
private int accountNumber;
private double accountBalance;

public void setCustomerName(String customerName)
public String getCustomerName()
public void setCustomerID(String customerID)
public String getCustomerID()
public void setEmail(String email)
public String getEmail()
public void setAccountNumber(int accountNumber)
public int getAccountNumber()
public void setAccountBalance(double accountBalance)
public double getAccountBalance()

3. Edit the created classes. To apply the business logic to the data that is retrieved from the database, open BatchProcessor.java and replace the completeProcessing() method with the following code:

public int completeProcessing() {
    System.out.println("Complete processing");
    return 0;
}

4. Replace the processRecord() method with the following code:

public Object processRecord(Object arg0) throws Exception {
    // Each record arrives as the CustomerBean built by the input stream.
    CustomerBean customer = (CustomerBean) arg0;
    System.out.println("Customer:" + customer.getCustomerID() + " information retrieved");
    return customer;
}

5. Save the changes and close this editor.

6. Open the InputStream.java class. This class retrieves data from the database, so it needs the SQL query that selects the records to be processed. Insert the following code in this class:

// Column names used to read values from the result set.
private final String customerID = "customerID";
private final String accountNumber = "accountNumber";
private final String accountBalance = "accountBalance";
private final String email = "email";
private final String customerName = "name";
// Selects every customer whose account balance is below the threshold.
private String sqlQuery = "SELECT * FROM SAMPLESCHEMA.CUSTOMER customer, SAMPLESCHEMA.BALANCE balance "
        + "WHERE balance.accountNumber = customer.accountNumber AND balance.accountBalance < 4500 ";

7. Replace the fetchRecord() method with the following lines:

public Object fetchRecord(ResultSet arg0) {
    // Copy the current row of the result set into a new data bean.
    CustomerBean customer = new CustomerBean();
    try {
        customer.setCustomerID(arg0.getString(customerID));
        customer.setAccountNumber(arg0.getInt(accountNumber));
        customer.setCustomerName(arg0.getString(customerName));
        customer.setAccountBalance(arg0.getDouble(accountBalance));
        customer.setEmail(arg0.getString(email));
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return customer;
}

8. Make both the getInitialLookupQuery() and getRestartQuery() methods return the existing query:

return sqlQuery;

9. Replace the getRestartTokens() method with the following lines:

public String getRestartTokens() {
    String token = "0";
    return token;
}

10. Save changes and close this file.

Next, turn to the OutputStream.java class. This class updates the records in the database after the data retrieved by the InputStream.java class has been processed.

11. Open the OutputStream.java class and add the following lines to create an SQL query for updating the records.

protected String tableName = "SAMPLESCHEMA.BALANCE";
protected String sqlQueryPreTablename = "UPDATE ";
protected String sqlQueryPostTablename = " SET ACCOUNTBALANCE = ? WHERE ACCOUNTNUMBER = ? ";

12. Build the query by replacing the existing code for getSQLQuery() with the following lines:

public String getSQLQuery() {
    String query = this.sqlQueryPreTablename + this.tableName + this.sqlQueryPostTablename;
    return query;
}

13. In the writeRecord() method, replace the existing code with the following lines:

public PreparedStatement writeRecord(PreparedStatement pstmt, Object record) {
    if (record instanceof CustomerBean) {
        try {
            System.out.println("Updating the customer record into the balance table");
            CustomerBean customerRecord = (CustomerBean) record;
            // Parameter 2 is the WHERE clause (account number).
            pstmt.setInt(2, customerRecord.getAccountNumber());
            System.out.println("Customer Account Number:" + customerRecord.getAccountNumber());
            // Parameter 1 is the new balance: ten times the current one.
            pstmt.setDouble(1, customerRecord.getAccountBalance() * 10);
            pstmt.executeUpdate();
        } catch (SQLException sqle) {
            System.out.println("Exception while making the prepared Statement");
            sqle.printStackTrace();
        }
    } else {
        System.out.println("Record is not an instance of the Customer");
    }
    return pstmt;
}

14. Save changes and close the file.
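Step 2 above lists only the fields and accessor signatures of CustomerBean. For reference, a complete version of the class might look like the following sketch. The field and method names come from step 2; the main method is only a quick self-check added for illustration, and the class is shown without a package declaration or public modifier, which you would add to match your project.

```java
// Data bean that carries one joined CUSTOMER/BALANCE row through the batch step.
// When copying into the project, add the package declaration and make the class public.
class CustomerBean {
    private String customerName;
    private String customerID;
    private String email;
    private int accountNumber;
    private double accountBalance;

    public void setCustomerName(String customerName) {
        this.customerName = customerName;
    }

    public String getCustomerName() {
        return customerName;
    }

    public void setCustomerID(String customerID) {
        this.customerID = customerID;
    }

    public String getCustomerID() {
        return customerID;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    public String getEmail() {
        return email;
    }

    public void setAccountNumber(int accountNumber) {
        this.accountNumber = accountNumber;
    }

    public int getAccountNumber() {
        return accountNumber;
    }

    public void setAccountBalance(double accountBalance) {
        this.accountBalance = accountBalance;
    }

    public double getAccountBalance() {
        return accountBalance;
    }

    // Illustrative self-check only; not needed by the batch job.
    public static void main(String[] args) {
        CustomerBean bean = new CustomerBean();
        bean.setCustomerID("client0001");
        bean.setAccountNumber(1);
        bean.setAccountBalance(2200);
        System.out.println(bean.getCustomerID() + " / account " + bean.getAccountNumber()
                + " / balance " + bean.getAccountBalance());
    }
}
```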

Running the sample

To run this sample, you must add the EAR project to the server and be sure the server is running. Then, right-click the MyJobBatch.xml file in the xJCL folder of MyBatchProject and click Run As > Modern Batch Job.

When the batch job starts, the Modern Batch Job Management Console for WebSphere is opened automatically, showing the job log with the configuration of the xJCL file.

To verify that all of the jobs were started on the current server, scroll to the bottom of the page that just opened and click View jobs.

A list is displayed with all the executed jobs and their status. When a job is executed successfully, the status is shown as ended.

Now you have a running batch job. This procedure works every time you launch it, but what if you want to run it automatically? For that you need a scheduler, which launches a specific batch job at specific intervals (daily, weekly, or monthly). To create and configure a job scheduler, follow the instructions in the remaining part of this document. These steps are needed only if you want the batch job to be launched by the server scheduler.

Create and Configure a Job Scheduler in WebSphere Application Server

The job scheduler accepts job submissions and determines where to run them. As part of managing jobs, the job scheduler stores job information in an external job database. To configure a job scheduler, you must specify how to select the deployment target, and supply the data source JNDI name, database schema name, and endpoint job log location.

1. Click Run Administrative Console > Security > Global security. In the Application security section, ensure that application security is enabled.

2. Open the Modern Batch Job Management Console and click Schedule Management, click Create a scheduler, and provide the data for the schedule.

3. Click Next and select the Batch Job you just created, by specifying the file location.

4. Click Next. The schedule is created.

5. Verify that the Performance Monitoring Infrastructure is enabled by selecting Monitoring and Tuning > Performance Monitoring Infrastructure > Enable Performance Monitoring Infrastructure in the administrative console of the server.

6. Now you only need to wait for the time you specified in the scheduler, and the batch will be executed automatically.

7. To verify the results of the batch job execution, you can create an SQL query in the data source explorer view and compare the result with the initial values of the SAMPLESCHEMA.BALANCE table. So you will need to execute the following SQL query:

SELECT c.name, c.customerID, c.email, c.accountNumber, b.accountBalance
FROM SAMPLESCHEMA.CUSTOMER c, SAMPLESCHEMA.BALANCE b
WHERE c.accountNumber = b.accountNumber;

When the above query is run before the batch job process is launched, the values in the database would be something similar to the following table:

After the batch job process is executed, if you run the above query, the result should be something like the following table:
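The expected balances can also be derived by hand from the sample rows inserted earlier. The sketch below does this in plain Java, without a database, as a cross-check for a single run of the job. It assumes the threshold of 4500 from the InputStream query and the factor of 10 from writeRecord(); the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Derives the expected BALANCE values after one run of the job from the
// tutorial's sample rows. Pure Java; it does not touch the database.
class ExpectedBalances {

    // Values taken from the InputStream query and from writeRecord().
    static final double THRESHOLD = 4500;
    static final double FACTOR = 10;

    // Account numbers that appear in SAMPLESCHEMA.CUSTOMER (the join partner).
    static final int[] CUSTOMER_ACCOUNTS = {1, 2, 3};

    // accountNumber -> accountBalance, as inserted into SAMPLESCHEMA.BALANCE.
    static Map<Integer, Double> initialBalances() {
        Map<Integer, Double> balances = new LinkedHashMap<>();
        balances.put(1, 2200.0);
        balances.put(2, 4200.0);
        balances.put(3, 1200.0);
        balances.put(4, 6200.0);
        balances.put(5, 1200.0);
        return balances;
    }

    // Applies the job's rule once: only accounts joined to a CUSTOMER row
    // and below the threshold are multiplied by the factor.
    static Map<Integer, Double> afterOneRun() {
        Map<Integer, Double> balances = initialBalances();
        for (int account : CUSTOMER_ACCOUNTS) {
            Double balance = balances.get(account);
            if (balance != null && balance < THRESHOLD) {
                balances.put(account, balance * FACTOR);
            }
        }
        return balances;
    }

    public static void main(String[] args) {
        Map<Integer, Double> before = initialBalances();
        Map<Integer, Double> after = afterOneRun();
        for (Integer account : before.keySet()) {
            System.out.println("account " + account + ": "
                    + before.get(account) + " -> " + after.get(account));
        }
    }
}
```

Under these assumptions, accounts 1, 2, and 3 are multiplied by ten, while accounts 4 and 5 stay unchanged because no CUSTOMER row joins to them. The verification query uses the same join, so it returns only accounts 1 to 3.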

Batch programming interfaces details

Table 1. BatchJobStepInterface Methods

| Return type | Method | Summary |
|---|---|---|
| void | createJobStep() | createJobStep is called by the Batch Container before calling processJobStep. |
| int | destroyJobStep() | destroyJobStep is called when Batch Container has finished processing the job step. Any clean up code can be added here. |
| java.util.Properties | getProperties() | Returns the properties specified in xJCL for the batch job step. |
| int | processJobStep() | processJobStep should contain all the business logic for the batch job step. |
| void | setProperties(java.util.Properties properties) | Called by the Batch Container to make the properties specified in the xJCL available to the batch job step. |
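The callbacks in Table 1 can be illustrated with a small, self-contained sketch. The JobStep interface below is a local stand-in that mirrors the signatures in the table; the real interface is supplied by WebSphere Compute Grid. The driver is a simplification: Table 1 states only that createJobStep precedes processJobStep and that destroyJobStep follows processing, so calling setProperties first and processJobStep exactly once are assumptions of this sketch, as is the greeting property name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class StepLifecycleSketch {

    // Local stand-in that mirrors the method signatures listed in Table 1.
    interface JobStep {
        void setProperties(Properties properties);
        Properties getProperties();
        void createJobStep();
        int processJobStep();
        int destroyJobStep();
    }

    // A trivial step that records which callbacks it received.
    static final class LoggingStep implements JobStep {
        final List<String> calls = new ArrayList<>();
        private Properties properties = new Properties();

        public void setProperties(Properties properties) {
            calls.add("setProperties");
            this.properties = properties;
        }

        public Properties getProperties() {
            return properties;
        }

        public void createJobStep() {
            calls.add("createJobStep");
        }

        public int processJobStep() {
            calls.add("processJobStep");
            // Business logic would go here; the property name is made up.
            String greeting = properties.getProperty("greeting", "hello");
            System.out.println(greeting + " from the job step");
            return 0;
        }

        public int destroyJobStep() {
            calls.add("destroyJobStep");
            return 0; // step return code
        }
    }

    // Simplified driver standing in for the batch container.
    static int run(JobStep step, Properties xjclProperties) {
        step.setProperties(xjclProperties);
        step.createJobStep();
        step.processJobStep();
        return step.destroyJobStep();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("greeting", "hi");
        LoggingStep step = new LoggingStep();
        int returnCode = run(step, props);
        System.out.println("calls: " + step.calls + ", return code: " + returnCode);
    }
}
```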

Table 2. BatchDataStream Methods

| Return type | Method | Summary |
|---|---|---|
| void | close() | The close method is called by the Batch Container to indicate to the BDS that the user of the BDS is done working with the BDS. |
| java.lang.String | externalizeCheckpointInformation() | The externalizeCheckpointInformation method is called by the Batch Container during the checkpoint completion phase of processing. |
| java.lang.String | getName() | Returns the logical name of this batch data stream. |
| java.util.Properties | getProperties() | Returns the properties specified in xJCL for this BDS. |
| void | initialize(java.lang.String logicalname, java.lang.String jobstepid) | The initialize method is called by the Batch Container during the initialization of the job step. This allows the BDS to initialize the stream for use by the batch step. |
| void | intermediateCheckpoint() | The intermediateCheckpoint method is called by the Batch Container to indicate to the BDS that a checkpoint has just completed. |
| void | internalizeCheckpointInformation(java.lang.String chkptinfo) | The internalizeCheckpointInformation method is called by the Batch Container during the restart of a batch step. This allows the BDS to restore its internal state to the point it was at when the last successful checkpoint was processed. |
| void | open() | The open method is called by the Batch Container to indicate that the BDS is about to be used and to prepare the BDS for operation. |
| void | positionAtCurrentCheckpoint() | The positionAtCurrentCheckpoint method is called by the Batch Container to signal to the BDS that it should start processing the stream at the point that was set by the internalizeCheckpointInformation method. |
| void | positionAtInitialCheckpoint() | The positionAtInitialCheckpoint method is called by the Batch Container to signal to the BDS that it should start processing the stream at the initial point as defined by the xJCL inputs. |
| void | setProperties(java.util.Properties properties) | The setProperties method is called by the Batch Container to pass BDS properties specified in xJCL to the BDS as a java.util.Properties object. |
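The checkpoint-related methods in Table 2 work together to support restart. The following sketch shows one way this could look for an in-memory list of records. ListDataStream is a hypothetical stand-in that borrows four method names from the table and does not implement the real BatchDataStream interface; hasNext() and next() are invented helpers for the job step, and storing the position as a decimal string is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

class CheckpointRestartSketch {

    // In-memory stand-in for a batch data stream.
    static final class ListDataStream {
        private final List<String> records;
        private int position;        // index of the next record to read
        private int restartPosition; // position restored from checkpoint data

        ListDataStream(List<String> records) {
            this.records = records;
        }

        // Fresh run: start at the beginning of the stream.
        void positionAtInitialCheckpoint() {
            position = 0;
        }

        // Checkpoint: hand the current position to the container as a string.
        String externalizeCheckpointInformation() {
            return Integer.toString(position);
        }

        // Restart: receive the string saved at the last successful checkpoint.
        void internalizeCheckpointInformation(String chkptinfo) {
            restartPosition = Integer.parseInt(chkptinfo);
        }

        // Restart: resume reading at the restored position.
        void positionAtCurrentCheckpoint() {
            position = restartPosition;
        }

        boolean hasNext() {
            return position < records.size();
        }

        String next() {
            return records.get(position++);
        }
    }

    // Simulates a run that fails after a checkpoint, followed by a restart.
    static String demo() {
        List<String> records = Arrays.asList("r0", "r1", "r2", "r3", "r4");

        ListDataStream first = new ListDataStream(records);
        first.positionAtInitialCheckpoint();
        first.next();
        first.next();
        first.next();
        String checkpoint = first.externalizeCheckpointInformation();
        first.next(); // read after the checkpoint; this work is lost in the failure

        ListDataStream restarted = new ListDataStream(records);
        restarted.internalizeCheckpointInformation(checkpoint);
        restarted.positionAtCurrentCheckpoint();
        StringBuilder resumed = new StringBuilder();
        while (restarted.hasNext()) {
            resumed.append(restarted.next()).append(' ');
        }
        return resumed.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("Records processed after restart: " + demo());
    }
}
```

The record read after the checkpoint is processed again after the restart, which is why work between checkpoints should be transactional or safe to repeat.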

Table 3. Ready-to-use algorithms

| Algorithm | Class name | Description |
|---|---|---|
| Checkpoint policy | com.ibm.wsspi.batch.checkpointalgorithms.timebased | Checkpoints the batch job step based on a time interval. |
| Checkpoint policy | com.ibm.wsspi.batch.checkpointalgorithms.recordbased | Checkpoints the batch job step based on number of input records processed. |
| Results | com.ibm.wsspi.batch.resultsalgorithms.jobsum | Returns the highest step return code. |

References

• https://www.ibm.com/developerworks/community/blogs/xtp/entry/batch_processing_with_websphere_compute_grid_red_paper_published9?lang=en
• http://pic.dhe.ibm.com/infocenter/wxdinfo/v6r1m1/index.jsp?topic=%2Fcom.ibm.websphere.gridmgr.doc%2Fscheduler%2Fccgtws.html
• http://www.ibm.com/developerworks/websphere/techjournal/0801_vignola/0801_vignola.html



© Copyright IBM Corporation 2014. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in these materials may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. IBM, the IBM logo, Rational, and other IBM products and services are trademarks of the International Business Machines Corporation, in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.
