Batch processing with WebSphere
TRANSCRIPT
© 2010 IBM Corporation
Sridhar Sudarsan, Chief Architect, Batch processing strategy
4th March 2011
Outline and objectives
WebSphere batch solutions overview
– Architecture
– Components
– Topology
WebSphere Batch offerings
– WebSphere Feature Pack for Modern Batch
– WebSphere Compute Grid
Summary
WebSphere Extended Deployment (XD) is now 3 separate products
Software to virtualize, control, and turbo-charge your application infrastructure:
– Infrastructure optimization: intelligent workload management, virtualization, and automatic sense-and-respond management (WebSphere Virtual Enterprise)
– Data fabrics and caching (WebSphere eXtreme Scale)
– Innovative application patterns beyond OLTP, such as Java batch (WebSphere Compute Grid)
What is Compute Grid (CG)? A J(2)EE view
– A set of binaries deployed to WebSphere Application Server Network Deployment (WAS ND) nodes within a cell
– Those nodes are then CG-enabled and become potential “Grid Execution Environments (GEE)” (also called “Long Running Execution Environments (LREE)”)
– Java developers use the CG framework to code the batch application and deploy it as a typical .ear file
– ND administrators manage the .ear like any other WAS ND application (same console, same skills)
– There are some additional components, including a Job Management Console, described later.
Batch Applications Need Batch Middleware

Application support:
– Batch application: business and custom data access logic
– Job Control Language: declarative job definition (XML-based)

Batch middleware:
– Job Scheduler: job dispatcher and operational control point; manager for job history and output
– Batch Container: runtime engine for batch applications
– Batch Framework: library of common data access functions and utilities
– PJM, WLM, HA: parallelization, workload management, and availability
– Security: security for jobs and job operations
– Resource management: rule-based CPU and file limits
– Logging/archival
Fundamental Concept: WAS Provides the Foundation
... and Java EE solutions can leverage that foundation to provide additional functionality such as batch processing.

The stack, bottom to top:
– OS platform
– WAS: multiple-JVM model; web and EJB containers; JDBC, JCA, JMS, MQ; security management; transaction management; thread management; integration with WLM
– Batch execution platform solution: Batch Feature Pack, WebSphere Compute Grid
– Your batch business function

Several things are going on here:
– You are relieved of having to code custom middleware functionality, and you stay focused on core business imperatives.
– The IBM batch developers are relieved of having to re-invent WAS platform functionality. This is important: it helps maintain currency of specification support and keeps product service aligned. It allows for focused attention on batch functionality development rather than on lower-level foundational issues.
– This is what allows the mixing of OLTP and batch while maintaining SLAs for both, especially when deployed to z/OS or co-deployed with WVE on distributed platforms. WAS z/OS has very rich workload classification and system resource management functionality; the IBM Java batch solutions ride on top of that existing capability.
Let's look at the basic WebSphere Batch runtime

[Diagram] xJCL enters through the job dispatcher running in one WAS server; the dispatcher hands Job #1 to the batch container hosting the batch app in another WAS server, consulting the job repository and the application data store.

Job Scheduler/Dispatcher (JS)
– The job entry point to Compute Grid
– Job life-cycle management (submit, stop, cancel, etc.) and monitoring
– Dispatches workload to either the PJM or the GEE
– Hosts the Job Management Console (JMC)

Grid Endpoints (GEE)
– Execute the actual business logic of the batch job
– Host the programming model

xJCL
– XML descriptor for the job
– Allows variable substitution

Dispatcher interfaces
– Command window
– EJB call
– JMC
WebSphere Compute Grid: Job step

WebSphere Compute Grid enables Java as a language for batch workloads, on the mainframe or in distributed environments, creating an infrastructure in which batch and OLTP processing can share business logic to lower costs, eliminate the batch window, and deliver high availability.

A simplified batch job step reads input, processes it, and writes output:
– Map data to object
– Transform object
– Map object to data

Input stream types: fixed-block dataset, variable-block dataset, JDBC, file, iBATIS, more to come
Output stream types: fixed-block dataset, variable-block dataset, JDBC, JDBC with batching, file, iBATIS, more to come
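The three phases of a simplified job step can be sketched as plain Java. This is an illustrative sketch only; the class and method names here are made up for the example and are not the Compute Grid API.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleJobStep {

    // "Map data to object": parse a raw input record into a typed value.
    static int toObject(String rawRecord) {
        return Integer.parseInt(rawRecord.trim());
    }

    // "Transform object": the business logic applied to each record.
    static int transform(int value) {
        return value * 2;
    }

    // "Map object to data": serialize the result back to an output record.
    static String toData(int value) {
        return Integer.toString(value);
    }

    // Drive the step over an input stream, producing an output stream.
    public static List<String> processStep(List<String> input) {
        List<String> output = new ArrayList<>();
        for (String record : input) {
            output.add(toData(transform(toObject(record))));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(processStep(List.of("1", "2", "3")));
    }
}
```

In the real product, the input and output sides would be batch data streams (dataset, JDBC, file, and so on) rather than in-memory lists.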
The Batch Programming Model
Functions and class libraries supplied with the Feature Pack and Compute Grid:

– Job control: xJCL, very much like traditional JCL except that it is coded in XML, with equivalents to JOB cards, DD statements, STEPs, etc.
– Batch controller bean: part of the batch container code supplied by IBM
– Batch data streams: provide data input and output services for the job steps
– Checkpoint algorithms: services to programmatically determine and handle checkpointing
– Results and return codes: services to determine, manipulate, and act upon return codes, at both the application and the system level
– Job step control: invoking and coordinating processing between steps

The batch application itself is a set of POJO steps (Step 1 ... Step n) running in the batch container. Development libraries are available for RAD or Eclipse, and the WAS runtime supplies interfaces for JDBC, JCA, security, transactions, logging, deployment, and so on.
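The "results and return codes" service above aggregates per-step outcomes into a job-level outcome. A minimal sketch of one such results algorithm, assuming the JCL-like "highest return code wins" convention (the names here are hypothetical, not the Compute Grid results SPI):

```java
public class ResultsAlgorithmSketch {

    // The job-level RC starts at 0 and is raised by each step's RC.
    public static int fireResultsAlgorithm(int jobRC, int stepRC) {
        return Math.max(jobRC, stepRC);
    }

    // Fold every step return code into the final job return code.
    public static int jobReturnCode(int[] stepRCs) {
        int rc = 0;
        for (int stepRC : stepRCs) {
            rc = fireResultsAlgorithm(rc, stepRC);
        }
        return rc;
    }

    public static void main(String[] args) {
        // Steps returned 0, 4, 0: the job as a whole reports 4.
        System.out.println(jobReturnCode(new int[] {0, 4, 0}));
    }
}
```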
Batch Programming Model: the anatomy of a transactional batch application (batch job step)

The batch container drives each job step through a fixed lifecycle:
1. setProperties(Properties p)
2. createJobStep()
3. processJobStep()
4. destroyJobStep()

Compute Grid makes it easy for developers to create transactional batch applications by allowing them to use a streamlined POJO model and to focus on business logic, not on the batch infrastructure.
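The container-driven lifecycle can be mirrored in miniature. This sketch assumes a simplified interface of my own invention (the real Compute Grid step interface has the same call order but different signatures and constants):

```java
import java.util.Properties;

public class LifecycleSketch {
    static final int CONTINUE = 0;
    static final int COMPLETE = 1;

    // Hypothetical mirror of the four container callbacks.
    interface JobStep {
        void setProperties(Properties p);
        void createJobStep();
        int processJobStep();     // one checkpoint-able unit of work
        int destroyJobStep();     // returns the step's return code
    }

    // A trivial POJO step that processes a fixed number of records.
    static class CountingStep implements JobStep {
        int remaining;
        public void setProperties(Properties p) {
            remaining = Integer.parseInt(p.getProperty("recordCount"));
        }
        public void createJobStep() { /* open resources here */ }
        public int processJobStep() {
            return (--remaining > 0) ? CONTINUE : COMPLETE;
        }
        public int destroyJobStep() { return 0; /* close resources */ }
    }

    // What the batch container does with a step, in miniature.
    public static int runStep(JobStep step, Properties props) {
        step.setProperties(props);                              // 1
        step.createJobStep();                                   // 2
        while (step.processJobStep() == CONTINUE) {             // 3
            /* container would take a checkpoint here */
        }
        return step.destroyJobStep();                           // 4
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("recordCount", "3");
        System.out.println(runStep(new CountingStep(), p));
    }
}
```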
Checkpoint & Restart with Batch Data Streams

WebSphere Batch makes it easy for developers to encapsulate input/output data streams using POJOs that optionally support checkpoint/restart semantics.

Job start: the batch container calls
1. open()
2. positionAtInitialCheckpoint()
3. externalizeCheckpoint()
4. close()

Job restart: the batch container calls
1. open()
2. internalizeCheckpoint()
3. positionAtCurrentCheckpoint()
4. externalizeCheckpoint()
5. close()
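The checkpoint/restart sequence can be sketched with a toy stream over an in-memory list. The method names mirror the slide, but the class itself and its record handling are invented for illustration and are not the actual batch data stream SPI:

```java
import java.util.List;

public class CheckpointStreamSketch {
    private final List<String> records;
    private int cursor;

    public CheckpointStreamSketch(List<String> records) {
        this.records = records;
    }

    public void open()  { /* acquire file/JDBC resources here */ }
    public void close() { /* release resources here */ }

    // Cold start: begin at the first record.
    public void positionAtInitialCheckpoint() { cursor = 0; }

    // The container persists this token at each checkpoint.
    public String externalizeCheckpoint() { return Integer.toString(cursor); }

    // On restart the container replays the saved token...
    public void internalizeCheckpoint(String token) { cursor = Integer.parseInt(token); }
    // ...then asks the stream to reposition itself.
    public void positionAtCurrentCheckpoint() { /* cursor already restored */ }

    public boolean hasNext() { return cursor < records.size(); }
    public String next()    { return records.get(cursor++); }

    public static void main(String[] args) {
        CheckpointStreamSketch s = new CheckpointStreamSketch(List.of("a", "b", "c"));
        s.open();
        s.positionAtInitialCheckpoint();
        s.next();                                 // process "a"
        String token = s.externalizeCheckpoint(); // checkpoint after one record
        s.close();

        // Simulated restart: resume from the saved checkpoint.
        CheckpointStreamSketch r = new CheckpointStreamSketch(List.of("a", "b", "c"));
        r.open();
        r.internalizeCheckpoint(token);
        r.positionAtCurrentCheckpoint();
        System.out.println(r.next());             // resumes at "b"
        r.close();
    }
}
```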
xJCL: The Job Control Definition
Not JCL (no // and no column issues) ... but amazingly similar in concept:

<?xml version="1.0" encoding="UTF-8" ?>
<job name="name" ... >                             <!-- roughly analogous to the JOB card -->
  <jndi-name>batch_controller_bean_jndi</jndi-name>
  <substitution-props>
    <prop name="property_name" value="value" />
  </substitution-props>
  <job-step name="name">                           <!-- a job step -->
    <classname>package.class</classname>           <!-- like the EXEC PGM= statement in JCL -->
    <checkpoint-algorithm-ref name="chkpt"/>
    <results-ref name="jobsum"/>
    <batch-data-streams>                           <!-- similar to DD statements -->
      <bds>
        <logical-name>input_stream</logical-name>
        <props>
          <prop name="name" value="value"/>
        </props>
      </bds>
    </batch-data-streams>
  </job-step>
  <job-step ... >                                  <!-- further job steps -->
    ...
  </job-step>
</job>

A brief sampling; many things are not shown.
Do you see the similarity to traditional JCL?
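xJCL's substitution properties work by replacing ${name} placeholders in the job definition with values from the <substitution-props> defaults or from submit-time overrides. A toy resolver, assuming that ${...} syntax (this is not the Compute Grid implementation, and the property names are made up):

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XjclSubstitutionSketch {

    // Replace every ${name} with its property value; leave unknown
    // variables intact so missing substitutions are easy to spot.
    public static String resolve(String xjcl, Properties props) {
        Pattern var = Pattern.compile("\\$\\{([^}]+)\\}");
        Matcher m = var.matcher(xjcl);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("inputFile", "records.txt");   // hypothetical property
        System.out.println(resolve("<prop name=\"FILENAME\" value=\"${inputFile}\"/>", p));
    }
}
```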
Parallel job manager (PJM)

The PJM breaks large batch jobs into smaller partitions for parallel execution:
– Installed as a system application
– Can be installed to a single server or a cluster
– Provides out-of-the-box implementations plus SPIs for custom ones

The PJM is the target application of a parallel job:
– The PJM does not process batch data streams itself
– It submits or restarts sub-jobs under the control of step properties, which identify the sub-job in the job repository and the count of sub-jobs to process
– A parallel job is submitted using the xJCL for its "top-level" job, which specifies these details

The PJM is a sub-job manager, where a sub-job is:
– An instance of a regular batch job that can be bounded by substitution properties specified in its xJCL
– Submitted to the job scheduler by the PJM
– Aggregated by the PJM into one logical top-level job for status, result code, and life-cycle management
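The partitioning step can be sketched as follows: split one logical job over a key range into N sub-job property sets, each bounding its own slice via substitution properties. The class, method, and property names here are hypothetical; they only illustrate the kind of work a Parameterizer-style SPI performs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ParameterizerSketch {

    // Split [firstKey, lastKey] into up to subJobs contiguous slices,
    // one Properties set (substitution props) per sub-job.
    public static List<Properties> partition(int firstKey, int lastKey, int subJobs) {
        int total = lastKey - firstKey + 1;
        int chunk = (total + subJobs - 1) / subJobs;   // ceiling division
        List<Properties> parts = new ArrayList<>();
        for (int i = 0; i < subJobs; i++) {
            int lo = firstKey + i * chunk;
            if (lo > lastKey) break;                   // fewer parts than requested
            int hi = Math.min(lo + chunk - 1, lastKey);
            Properties p = new Properties();
            p.setProperty("startKey", Integer.toString(lo));
            p.setProperty("endKey", Integer.toString(hi));
            parts.add(p);
        }
        return parts;
    }

    public static void main(String[] args) {
        for (Properties p : partition(1, 100, 4)) {
            System.out.println(p.getProperty("startKey") + "-" + p.getProperty("endKey"));
        }
    }
}
```

Each Properties set would become the substitution properties of one sub-job's xJCL, so every sub-job processes only its own key range.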
Compute Grid runtime components with PJM

[Diagram] The xJCL for the top-level job goes to the Job Scheduler, which dispatches it to the Parallel Job Manager (PJM) running in a batch container. The PJM exposes SPIs (Parameterizer, Logical TX Synchronization for the logical transaction scope, SubJobAnalyzer, SubJobCollector) and fans sub-jobs #1 through #N out to batch containers on WAS servers 1 through N, each hosting the batch app. Sub-job names and state are tracked in the job repository.
Logical Deployment

[Diagram] A Job Scheduler fronts multiple batch containers and the Parallel Job Manager, organized per line of business. Jobs reach the scheduler from an enterprise workload scheduler (e.g., TWS) through a workload connector, from the jobs console, and from online applications through submission APIs, for example:

    public void submit(Job j) {
        _sched.submit(j);
    }
Physical Deployment: Distributed

[Diagram] Within a WAS ND cell, the Job Scheduler and the PJM dispatch jobs to clustered batch containers, organized per line of business. Work arrives from an enterprise scheduler such as TWS and from online applications.
Admin & Configuration with WAS admin console
Integrated operational control
Provides an operational infrastructure for job life cycle
Integrates with existing enterprise schedulers such as Tivoli Workload Scheduler
Provides log management and integrates with archiving and auditing
Provides resource usage monitoring
Integrates with existing security and disaster recovery procedures
Configures as a highly available component
[Diagram] The batch platform architecture comprises:
– Bulk application container: the WCG Batch Container, with information storage and data access management services ("file" data access, queue-based data access, in-memory data access, custom data access)
– Infrastructure services: the WCG Batch Framework
– Bulk application development: an environment for creating and migrating bulk applications (WCG Eclipse plugin)
– System management and operations: manage, monitor, and secure bulk processes; analytics for scheduling, checkpointing, and resource management
– Scheduler services: the WCG Scheduler Gateway; invocation and scheduling optimization (resource brokering, split & parallelize, pace, throttle); invocation services (ad hoc and planned)
– Bulk partner services: business process and event services
Job Management Console – View jobs
Job management console: Job schedules
• Save a job definition
– xJCL
– Schedule
• Date and time
• Repeating
• Manage schedules
– View details
– Cancel
Benefits of running WebSphere Compute Grid on z/OS
Essential Story: Exploitation of Lower-Level Benefits

[Diagram] The stack, top to bottom:
– WebSphere Compute Grid: function common across all platforms, plus function specific to z/OS that exploits WAS z/OS
– WebSphere Application Server z/OS: function common across all platforms, plus awareness and exploitation of the platform beneath it
– z/OS and Parallel Sysplex: WLM, RRS, SMF, SAF, shared data, co-location
– System z: inherent reliability, zAAPs
zAAPs: Providing a Java Cost Advantage on z/OS
– Java workload is offloaded to zAAP processors
– Completely transparent to Java applications, including batch
– Benefits:
• MIPS related to Java on zAAPs are not counted towards other software monthly license charges
• Frees GPs to do traditional z/OS work, such as CICS, DB2, and IMS
RRS: Sysplex-Wide Global Transaction Syncpoint Coordinator
– Very fast and reliable
– Excels at transaction rollback when needed
WLM Classification: Prioritize Work
– Prioritize Compute Grid relative to other tasks within the z/OS system
– Prioritize batch jobs relative to other batch jobs within Compute Grid

[Diagram] Within WebSphere Compute Grid z/OS, a higher-priority job receives relatively more system resources and a lower-priority job relatively fewer. This is an example of WAS z/OS exploiting WLM.
SMF: Accounting Information ... Very Efficient, Very Fast

[Diagram] WebSphere Compute Grid writes through the z/OS SMF interface, alongside WAS z/OS, RMF, DB2, CICS, MQ, and other z/OS subsystems and facilities, via memory buffers to the SMF data sets, from which data analysis tools pick the records up.

Compute Grid writes SMF Type 120, Subtype 20 records containing:
– Job identifier
– Job submitter
– Final job state
– Server
– Node
– Accounting information
– Job start time
– Last update time
– CPU consumed

Uses: chargeback; performance and tuning; capacity planning.
Parallel Sysplex: Availability and Scalability

[Diagram] Multiple z/OS instances, each running WAS z/OS + Compute Grid alongside DB2, CICS, IMS, and MQ, share centralized data structures (with integrated data locking and update) backed by local data caches.

Direct value to Compute Grid and your batch processes:
– Proven scalability: near linear up to 32 nodes in a Sysplex
– Availability: this provides the foundation for a highly available architecture
– Parallel jobs: an excellent platform on which to use Compute Grid's Parallel Job Manager
SAF: Centralized Security

Centralized SAF security repository:
• Userids and groups
• EJBROLE role enforcement
• Digital certificates and keyrings
• Much more related to WAS z/OS security
• Extensive auditing

Proven secure; centralization enables tighter control.
WebSphere Batch control by an external workload scheduler (e.g., Control-M, Tivoli Workload Scheduler)

External Scheduler Integration on z/OS

[Diagram] Tivoli Workload Scheduler submits a JCL job through JES and monitors it. The WSGRID step exchanges MQ messages with the WebSphere Batch scheduler, which in turn submits and monitors the corresponding xJCL job (<job name="JOB1" ... with <job-step name="STEP2"> ...) in the WAS batch application.

//JOB1   JOB '...'
//STEP1  EXEC PGM=IDCAMS
//STEP2  EXEC PGM=WSGRID
//WGJOB  DD *
<job ... >
...
</job>
/*

• JCL/xJCL jobs have a synchronized lifecycle
• The xJCL job is restartable from the JCL job
• The xJCL job log is piped to the JCL job and written to a SYSOUT dataset
• The xJCL job RC becomes the step RC in the JCL job
WSGrid JCL Example
WSGrid JCL Job Output (SYSPRINT DD – Top of File)
WSGrid JCL Job Output (SYSPRINT DD – Bottom of File)
Revisit the Picture from a Higher Perspective
Just to reinforce the key concepts ...

[Diagram] Your batch applications, built with the framework classes and batch data stream development support in Eclipse or RAD, are deployed into the batch-enabled server or cluster: the batch container running in an app server JVM on the WebSphere Application Server runtime, which in turn exploits the system platform below the open-standard specification line. An xJCL job definition file drives the Job Scheduler, reachable from a browser (job console), web services, or EJB/IIOP; the scheduler dispatches to endpoints, including other batch container endpoints, based on knowledge of the environment.

Key points:
– This is a batch platform, not just a programming framework
– Built on WAS
– Avoid custom middleware: IBM supplies the middleware, and you focus on your business batch requirements
– If WAS z/OS, then the whole "Why WAS z/OS" story applies: if it applies to OLTP, it applies to batch as well
Feature-set Options

Start with the Feature Pack; grow into Compute Grid!

WebSphere Batch Feature Pack (on WebSphere App Server):
– Job Scheduler
– Batch Container
– Batch Toolkit

WebSphere Compute Grid product:
– Job Scheduler
– Batch Container
– Batch Toolkit
– Parallel Job Manager
– Enterprise Connectors
– Advanced Operations Pack
Deployment Options
Features and QoS guidance to choose the optimal deployment option for batch workloads:

Feature | WebSphere Compute Grid | WAS on z/OS, WAS ND with FeP for Modern Batch | WAS Base with FeP for Modern Batch
Common batch container, development tools to develop batch applications, operational commands to manage batch job life cycle | √ | √ | √
Container-managed checkpoint/restart capabilities | √ | √ | √
Job management console | √ | √ | √
Application execution platform | √ | √ | √
Basic scheduler/job dispatcher | √ | √ | √
System-managed job logs | √ | √ | √
High availability and clustering of batch job scheduler/job dispatcher | √ | √ |
Multi-site disaster recovery for batch platform | √ | √ |
Integration with WLM on z/OS | √ | √ |
Interoperability between Java and COBOL on z/OS | √ | |
Non-disruptive batch application update/endpoint quiesce | √ | |
Job usage accounting, including SMF integration on z/OS | √ | |
Job classes and workload classification | √ | |
Integrated "Parallel Job Manager" for job parallelization across multiple JVMs | √ | |
Enterprise scheduler connectors | √ | |
Enterprise monitoring capabilities | √ | |
Disaster recovery with operational state transfer | √ | |
Integration with VE for goal-oriented job placement | √ | |
Summary
WebSphere Batch solutions create a separation of concerns between the business and application logic and the batch infrastructure.
WebSphere Batch solutions provide an environment and infrastructure for running mixed Java workloads efficiently.
WebSphere Batch solutions are strategically important and a fundamental component of IBM's batch infrastructure leadership:
– WebSphere Compute Grid provides market-leading development capabilities that accelerate time-to-value for clients
– WebSphere Compute Grid is production-ready, with many customers running mission-critical batch workloads
Back Office Operation Center – New Assets Overview and Insights
Comparison of JZOS and WebSphere Compute Grid (WCG)

Where they are the same ...
– Java batch execution: both JZOS and WCG provide an environment to execute Java batch programs
– JES/JCL jobs: both JZOS and WCG workload can be described, submitted, and run through JES/JCL
– Control-M/TWS scheduling: both JZOS and WCG workload can be directly scheduled and controlled by TWS
– Managed job restart: both JZOS and WCG workload can be restarted through TWS
– SMF usage recording: both JZOS and WCG workload can be measured with SMF records
Comparison of JZOS and WCG: Where they differ ...

Feature | JZOS | WCG
Transactionality | Local transaction mode only | Local transaction mode (1PC); RRS transaction mode (2PC); XA transaction mode (2PC)
Service integration | Remote calls only | Remote calls; local, optimized calls (co-location)
Inter-language | Java/COBOL interoperability, but no connection sharing | Java/COBOL interoperability with DB2 connection sharing
Java services | J2SE, JZOS | J2SE, JZOS, J2EE, WS-*
Environment | JES-managed batch initiator | JES-managed batch initiator + WebSphere Application Server
JVM lifecycle | Disposable JVM | Reusable JVM (operational efficiency)
Checkpoints | Application-managed | System-managed (operational optimization)
Parallelization | Ad hoc or roll-your-own | System-managed (operational control)