Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Upload: timfanelli

Post on 22-Nov-2014

DESCRIPTION

In this presentation, Tim Fanelli introduces JSR-352 programming and builds a simple application using the JSR-352 chunk processing model. The sample program presented may be downloaded here: https://www.dropbox.com/s/55fsjt4ylny95hc/MySampleBatch.jar Alternatively, email Tim Fanelli; his contact information is on slide 3.

TRANSCRIPT

Page 1: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

© 2013 IBM Corporation

Timothy C. Fanelli - Senior IT Specialist

23 September 2013

Three Key Concepts for Understanding JSR-352: Batch Applications for the Java Platform

Page 2: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Important Disclaimers

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.

ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.

IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.

IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS

Page 3: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

About me

§ Timothy C. Fanelli

§ Senior IT Specialist, IBM - Mainframe Workload Modernization

§ Instructor of Software Engineering - Clarkson University, Potsdam NY

§ [email protected]

§ [email protected], [email protected]

§ Visit the IBM booth #5112 and meet other IBM developers at JavaOne 2013

Page 4: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Agenda

§ Background

§ Three Key Concepts
– Implementation
– Orchestration
– Execution

§ An Example JSR 352 Application

§ Advanced Topics
– Splits and Flows
– Partitioning
– Java EE

§ Conclusion and Thoughts on What’s Next...

Page 5: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Background

Page 6: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Batch Processing

§ One of the oldest processing paradigms

§ Historically associated with mainframe computing

§ Still incredibly relevant today, with fresh challenges in an OLTP-driven world

Page 7: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Java for Batch Processing?

§ Mainframe developers have shied away from Java
– Performance concerns over native languages
– Integration concerns for legacy data
– Disparate developer skill sets between System z and Java

§ Java and Java EE have dominated the Online Transaction Processing world

§ Time to bridge the two worlds
– IBM Java for z/OS, IBM WebSphere, and Spring Batch paved new paths
– Just-in-Time compilation and garbage collection optimizations proved it out
– Adoption is widespread!

§ The only remaining challenge was the lack of a standard
– The need for JSR-352 was obvious

Page 8: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

JSR 352: Batch Applications for the Java Platform

§ Expert working group formed 29 November 2011
– IBM*, VMware, Red Hat, Oracle, Credit Suisse, independent participants
– Broad range of talent with deep batch experience

§ Final Release 24 May 2013

§ Included in Java EE 7!

Page 9: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Three Key Concepts...

Page 10: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Three Key Concepts ...

§ JSR 352 defines
– Implementation: A programming model for implementing the artifacts
– Orchestration: A Job Specification Language, which orchestrates the execution of batch artifacts within a job
– Execution: A runtime environment for executing batch applications, according to a defined lifecycle

§ Note: “key” concepts, not “new” concepts!
– Roles and abstractions should be familiar to SOA and Java EE developers

Page 11: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Anatomy of JSR352

§ Those concepts define the anatomy of JSR 352: Batch Applications for the Java Platform...

[Diagram: the anatomy of JSR 352. Execute: JobOperator, Job Repository. Orchestrate: Job, Step. Implement: Batchlet, Chunk (Reader, Processor, Writer), Listeners, Contexts, Partitioning.]

Page 12: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Implementation: The programming model

§ Chunk and Batchlet provide models for implementing a step.

§ Contexts provide job- and step-level runtime information, and provide interim data persistence.

§ Listeners provide callback hooks to respond to lifecycle events on batch artifacts.

§ Partitioning provides a mechanism for imposing parallel processing on jobs and steps.

Page 13: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Implementation: The programming model: Chunk vs Batchlet

§ Both are implementations of a step within a batch job

§ The chunk model
– Encapsulates a very common pattern: ETL
– Single “reader”, “processor” and “writer”
– Reader/Processor combination is invoked until an entire “chunk” of data is processed
– Output “chunk” is written atomically

§ Batchlet provides a “roll your own” step type
– Invoked and runs to completion, producing a return code upon exit.
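The chunk pattern described on this slide can be sketched as a plain loop. This is only an illustration of the idea, not the JSR-352 runtime: the class and method names (ChunkLoopSketch, runChunks) are hypothetical, the chunk size of 3 is an assumed value, and the transaction that a real runtime would wrap around each write is marked only by a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the loop a batch runtime drives for a chunk step.
class ChunkLoopSketch {

    // Reads and processes items one at a time, buffers the results, and hands
    // each full buffer (a "chunk") to the writer in a single call.
    static <I, O> List<List<O>> runChunks(Iterator<I> reader,
                                          Function<I, O> processor,
                                          int itemCount) {
        List<List<O>> written = new ArrayList<>(); // stands in for the writer's output
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == itemCount) {
                // A real runtime would write, checkpoint, and commit here.
                written.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            written.add(new ArrayList<>(buffer)); // final, possibly short, chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> chunks = runChunks(input.iterator(), s -> s.toUpperCase(), 3);
        System.out.println(chunks); // prints [[A, B, C], [D, E]]
    }
}
```

A batchlet has no equivalent loop: the runtime calls it once and it runs to completion.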

Page 14: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Orchestration: The Job Specification Language (JSL)

§ The JSL defines a batch job as an XML document

§ Describes a step as an assemblage of batch artifacts

§ Provides for the description of steps, step groupings, and execution sequencing

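To make the shape of a JSL document concrete, the sketch below embeds a small single-step job as a string and lists which artifacts each step assembles, using only the JDK's XML parser. The job id, step id, and artifact names (sampleJob, transformStep, sampleReader, and so on) and the item-count of 10 are made up for illustration; they are not taken from the presentation's sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class JslSketch {

    // Hypothetical JSL: one chunk step built from three named artifacts.
    static final String JSL =
          "<job id=\"sampleJob\" xmlns=\"http://xmlns.jcp.org/xml/ns/javaee\" version=\"1.0\">"
        + "  <step id=\"transformStep\">"
        + "    <chunk item-count=\"10\">"
        + "      <reader ref=\"sampleReader\"/>"
        + "      <processor ref=\"sampleProcessor\"/>"
        + "      <writer ref=\"sampleWriter\"/>"
        + "    </chunk>"
        + "  </step>"
        + "</job>";

    // Returns one line per chunk step: "stepId: reader -> processor -> writer".
    static List<String> describeSteps(String jsl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(jsl.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            NodeList steps = doc.getElementsByTagName("step");
            for (int i = 0; i < steps.getLength(); i++) {
                Element step = (Element) steps.item(i);
                lines.add(step.getAttribute("id") + ": "
                    + refOf(step, "reader") + " -> "
                    + refOf(step, "processor") + " -> "
                    + refOf(step, "writer"));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse JSL", e);
        }
    }

    // Reads the ref attribute of the first child element with the given tag.
    private static String refOf(Element step, String tag) {
        Element artifact = (Element) step.getElementsByTagName(tag).item(0);
        return artifact.getAttribute("ref");
    }

    public static void main(String[] args) {
        // prints [transformStep: sampleReader -> sampleProcessor -> sampleWriter]
        System.out.println(describeSteps(JSL));
    }
}
```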

Page 15: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Execution: The JobOperator and Repository

§ JobOperator is the runtime interface for job management, including start, stop, restart and job repository related commands

§ The Job Repository holds information about completed and executing jobs

§ To start a batch job, get a JobOperator instance and use it to start a job described by a JSL document.

Page 16: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Execution: JobInstance, JobExecution, and StepExecution

§ The state of a job is broken down into various parts, and persisted in the repository

– Submitting a job creates a JobInstance, a logical representation of a particular “run” of a job.

– A JobExecution is a single attempt to run a JobInstance. A restart attempt creates another JobExecution

– Similarly, a StepExecution is a single attempt to run a step within a job. It is created when a step starts execution.

[Diagram: relationships among JobOperator, Job Repository, Job, Step, JobInstance, JobExecution, and StepExecution, with one-to-many (*) associations.]
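The bookkeeping described on this slide can be modelled with a toy in-memory repository. The class below is a hypothetical sketch and not the javax.batch API: it only shows that starting a job creates a new instance with a first execution, while a restart adds another execution to the same instance. It mirrors the real JobOperator in one respect, namely that start and restart return an execution id and restart takes one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the repository bookkeeping; NOT the javax.batch API.
class JobRepositorySketch {
    private long nextInstanceId = 1;
    private long nextExecutionId = 1;
    private final Map<Long, String> jobNameByInstance = new HashMap<>();
    private final Map<Long, List<Long>> executionsByInstance = new HashMap<>();
    private final Map<Long, Long> instanceByExecution = new HashMap<>();

    // Submitting a job creates a new instance plus its first execution.
    long start(String jobName) {
        long instanceId = nextInstanceId++;
        jobNameByInstance.put(instanceId, jobName);
        executionsByInstance.put(instanceId, new ArrayList<>());
        return newExecution(instanceId);
    }

    // A restart is another attempt at the SAME instance.
    long restart(long executionId) {
        return newExecution(instanceOf(executionId));
    }

    long instanceOf(long executionId) {
        Long instanceId = instanceByExecution.get(executionId);
        if (instanceId == null) {
            throw new IllegalArgumentException("Unknown execution id: " + executionId);
        }
        return instanceId;
    }

    String jobNameOf(long instanceId) {
        return jobNameByInstance.get(instanceId);
    }

    List<Long> executionsOf(long instanceId) {
        return new ArrayList<>(executionsByInstance.get(instanceId));
    }

    private long newExecution(long instanceId) {
        long executionId = nextExecutionId++;
        executionsByInstance.get(instanceId).add(executionId);
        instanceByExecution.put(executionId, instanceId);
        return executionId;
    }

    public static void main(String[] args) {
        JobRepositorySketch repo = new JobRepositorySketch();
        long first = repo.start("sampleJob");
        long second = repo.restart(first);
        System.out.println(repo.executionsOf(repo.instanceOf(second))); // prints [1, 2]
    }
}
```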

Page 17: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

An Example JSR 352 Application

Page 18: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Application

‣ A typical “batch hello world”:
– Reads strings from an input file
– Performs some validation or transforms
– Writes validated or transformed strings to an output file

‣ Key capabilities
– If something goes wrong, we don’t want to discard all the prior work; we want to pick up where we left off
– We want control over the transaction scoping to prevent lock contention in high-volume periods
– We want flexibility to “plug and play” where our records come from
– For unit testing, development testing, and QA testing, records may come from a variety of sources

Page 19: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Design

‣ Let’s implement a string-transform in an extract-transform-load pattern

‣ We’ll use JSR352’s Chunk programming model
– Encapsulates the ETL pattern components as Reader, Processor, and Writer interfaces
– Loosely coupled artifacts will be orchestrated into a single-step job later
– “Free” checkpoint/restart capability
– Transaction scoping imposed externally in the job descriptor

‣ Job will be executed as a Java SE command line batch application

Page 20: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Code

Implement

‣ An ItemReader encapsulates the data access and deserialization of a record.

‣ No restriction on data access paradigm: use DAO patterns, JDBC, JPA, Hibernate, Spring Data, etc!

‣ Checkpoint/restart data is passed as a Serializable argument to “open” and returned from the “checkpointInfo” method.
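The slide's code is not reproduced in this transcript, so the following is a stand-in sketch rather than the presenter's reader. It uses the same method shape as javax.batch.api.chunk.ItemReader (open, readItem, checkpointInfo, close) but does not implement that interface, so it compiles without a JSR-352 jar; the class name and the in-memory list of records are assumptions.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Stand-in with the method shape of an ItemReader. In a real application
// this class would implement javax.batch.api.chunk.ItemReader.
class ListReaderSketch {
    private final List<String> records;
    private int next; // index of the next record to hand out

    ListReaderSketch(List<String> records) {
        this.records = records;
    }

    // checkpoint is null on a first start; on restart it is the last value
    // that checkpointInfo() returned before the failure.
    void open(Serializable checkpoint) {
        next = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    // Returning null tells the runtime there is no more input.
    Object readItem() {
        return next < records.size() ? records.get(next++) : null;
    }

    // The position to resume from if the job is restarted.
    Serializable checkpointInfo() {
        return Integer.valueOf(next);
    }

    void close() {
        // nothing to release for an in-memory list
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("r1", "r2", "r3");
        ListReaderSketch firstRun = new ListReaderSketch(data);
        firstRun.open(null);
        firstRun.readItem();
        firstRun.readItem();
        Serializable checkpoint = firstRun.checkpointInfo();

        ListReaderSketch restarted = new ListReaderSketch(data);
        restarted.open(checkpoint);
        System.out.println(restarted.readItem()); // prints r3
    }
}
```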

Page 21: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Code

Implement

§ An ItemWriter is the output counterpart to ItemReader

§ Primary difference is that writeItems accepts a “chunk” of output objects (as a list) to serialize.

§ Again, no restriction on data access paradigm!
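As with the reader, the slide's writer code is not in this transcript. The stand-in below shows only the point the slide makes: writeItems receives a whole chunk as a list. It follows the method shape of javax.batch.api.chunk.ItemWriter without implementing it, and the class name and in-memory sink are assumptions.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in with the method shape of an ItemWriter. In a real application
// this class would implement javax.batch.api.chunk.ItemWriter.
class ListWriterSketch {
    private final List<String> sink = new ArrayList<>(); // stands in for a file or table
    private int chunksWritten;

    void open(Serializable checkpoint) {
        // a file-based writer would reposition its output here on restart
    }

    // Called once per chunk with every item the processor produced for it.
    void writeItems(List<Object> items) {
        for (Object item : items) {
            sink.add(String.valueOf(item));
        }
        chunksWritten++;
    }

    Serializable checkpointInfo() {
        return Integer.valueOf(sink.size());
    }

    void close() {
        // nothing to release for an in-memory sink
    }

    List<String> written() {
        return new ArrayList<>(sink);
    }

    int chunksWritten() {
        return chunksWritten;
    }

    public static void main(String[] args) {
        ListWriterSketch writer = new ListWriterSketch();
        writer.open(null);
        writer.writeItems(Arrays.<Object>asList("A", "B"));
        writer.writeItems(Arrays.<Object>asList("C"));
        writer.close();
        // prints 2 chunks: [A, B, C]
        System.out.println(writer.chunksWritten() + " chunks: " + writer.written());
    }
}
```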

Page 22: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Code

Implement

§ An ItemProcessor encapsulates the business logic applied to each record

§ “main” here demonstrates the invocation of a batch job, using the JobOperator

§ Would typically not be in the processor implementation
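A processor in the spirit of the sample's string transform might look like the sketch below. The transform itself (trim, drop blanks, upper-case) is an assumption, since the transcript does not say what the sample does. The class mirrors the processItem method of javax.batch.api.chunk.ItemProcessor without implementing the interface. The "main" mentioned on the slide is left out because starting a job through the JobOperator needs a JSR-352 implementation on the classpath.

```java
import java.util.Locale;

// Stand-in with the method shape of an ItemProcessor. In a real application
// this class would implement javax.batch.api.chunk.ItemProcessor.
class UpperCaseProcessorSketch {

    // Validates and transforms one record. Returning null filters the record
    // out, so it is never handed to the writer.
    Object processItem(Object item) {
        String text = String.valueOf(item).trim();
        if (text.isEmpty()) {
            return null; // hypothetical validation rule: skip blank records
        }
        return text.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        UpperCaseProcessorSketch processor = new UpperCaseProcessorSketch();
        System.out.println(processor.processItem("  hello ")); // prints HELLO
        System.out.println(processor.processItem("   "));      // prints null
    }
}
```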

Page 23: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Batch Descriptor and Job Specification

§ batch.xml defines and names the batch artifacts in this application archive

§ sample.xml is an example Job Specification Language document for SampleBatchApp

Orchestrate

Page 24: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Execution

§ Package the application as a standard JAR or WAR for deployment in JavaSE or EE environments

– batch.xml goes in META-INF or WEB-INF/classes/META-INF

– JSL may go in META-INF/batch-jobs, or be submitted from an external source (up to the provider!)

Execute

Page 25: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

The Execution

§ See it live?

Execute

Page 26: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Advanced Topics

Page 27: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Job Management - Restart, Stop, Abandon

§ Had something gone wrong, what then?
– The “main” program shown was too simple... it only “started” the job

§ JobOperator exposes APIs for a variety of job management tasks: start, stop, abandon, restart
– We would have had to take advantage of these for advanced job management capabilities.

§ The door is left open for more advanced batch job management systems to be built!
– Integration into existing enterprise schedulers?
– New Java EE batch scheduling standard?
– Plenty of options, but currently left to the provider to implement

Execute

Page 28: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Java EE Integration

§ JSR-352: Java Batch is included in Java EE 7

§ Provides EE clustering, security, resource management, etc. to Java Batch applications

§ Performance benefits from dispatching into a long-running, reusable container
– JIT compilation through the first couple of runs
– Eliminates the overhead of starting / stopping the JVM

Execute

Page 29: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Parallel Job Processing

§ Splits and Flows provide a mechanism for executing job steps concurrently at the orchestration layer

§ A flow is a sequence of one or more steps which execute sequentially, but as a single unit.

§ A Split is a collection of flows that may execute concurrently

– A split may only contain “flows”; a step is not implicitly a flow

§ This is done entirely in the JSL descriptor
– Imposed on the batch application with no code changes!

Orchestrate
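In JSR 352 the split and its flows are declared in the JSL, not coded. The execution semantics can still be modelled in plain Java, which is all the sketch below attempts: each flow runs its steps in order as one unit, and the flows of a split are submitted to a thread pool so they may run concurrently. The names (SplitFlowSketch, runFlow, runSplit) and the use of strings as steps are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual model only; a JSR-352 runtime does this from the JSL descriptor.
class SplitFlowSketch {

    // A flow: its steps execute sequentially, as a single unit.
    static List<String> runFlow(List<String> steps) {
        List<String> log = new ArrayList<>();
        for (String step : steps) {
            log.add(step + " done");
        }
        return log;
    }

    // A split: every flow is submitted as one task; flows may run concurrently.
    // Results are returned in the order the flows were declared.
    static List<List<String>> runSplit(List<List<String>> flows) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, flows.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> flow : flows) {
                Callable<List<String>> task = () -> runFlow(flow);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // wait for each flow to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Split failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> flows = Arrays.asList(
            Arrays.asList("step1", "step2"),
            Arrays.asList("step3"));
        System.out.println(runSplit(flows)); // prints [[step1 done, step2 done], [step3 done]]
    }
}
```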

Page 30: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Parallel Job Processing

§ Step-level parallelism can be achieved programmatically using step partitioning

§ A partitioned step runs as multiple instances with distinct property sets

§ A partition plan defines the number of partitions and the property values for each partition
– Can be a fixed set of partitions in JSL
– Can be dynamic using a PartitionMapper implementation

Implement
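The dynamic case can be illustrated by the calculation a PartitionMapper implementation might perform. In the real API, mapPartitions() returns a PartitionPlan carrying the partition count and one Properties object per partition; the sketch below computes just that array so it runs without a JSR-352 jar. The property names firstRecord and recordCount are invented for this example.

```java
import java.util.Properties;

// Computes per-partition properties; a real PartitionMapper would wrap the
// result in a javax.batch.api.partition.PartitionPlan.
class PartitionPlanSketch {

    // Splits record indexes [0, totalRecords) into contiguous ranges, one per
    // partition, giving any remainder to the first partitions.
    static Properties[] plan(int totalRecords, int partitions) {
        Properties[] result = new Properties[partitions];
        int base = totalRecords / partitions;
        int extra = totalRecords % partitions;
        int start = 0;
        for (int i = 0; i < partitions; i++) {
            int size = base + (i < extra ? 1 : 0);
            Properties props = new Properties();
            props.setProperty("firstRecord", String.valueOf(start)); // hypothetical name
            props.setProperty("recordCount", String.valueOf(size));  // hypothetical name
            result[i] = props;
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        Properties[] plan = plan(10, 3);
        for (int i = 0; i < plan.length; i++) {
            System.out.println("partition " + i
                + " starts at " + plan[i].getProperty("firstRecord")
                + " and covers " + plan[i].getProperty("recordCount") + " records");
        }
    }
}
```

Each partition of the step would then read its own property set and process only its range, so the ten records above are divided 4, 3, and 3.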

Page 31: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Parallel Job Processing

Page 32: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Wrap up...

Page 33: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Batch Processing

§ The oldest “new thing” in Java

§ JSR 352 applies the modern thinking and abstractions of Java EE and SOA to sequential batch processing

§ The standardized programming model provides application developers vendor portability

§ Inclusion in Java EE 7 ensures widespread availability

Page 34: Three Key Concepts for Understanding JSR-352: Batch Programming for the Java Platform

Questions?

Find this presentation and more! http://ibm.co/JavaOne2013