Spring Batch

Christopher Jeffers

August 2012

Agenda

• Intro to Spring Batch and Use-Cases

• Spring Batch Technical Explanation

– Architecture

– The Batch Job

– Skipping and Retrying Steps

– Scaling Features

• Spring Batch Evaluation

– Solving Use-Cases

– Benefits

– Issues

– Integration Options

– Future Steps

Spring Batch Overview

• Lightweight framework designed to enable the development of robust batch applications used in enterprise systems

• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary

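• Provides reusable functions that are essential in processing large volumes of records, such as logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management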

• Provides scaling features, including multi-threading and massive parallelism for Spring Batch Jobs

Batch Use-Cases

• DataRoomBatch

– Physically delete all rows marked for deletion from a given bucket (DeepSix)

– Rerun user documents through publishing workflow

– Proactive auditing of the environment

• Public Records Batch Processing

– The user submits a file with search criteria for many individuals, and the program searches the database for changes in their information, returning a report of hits to the user

– Read, Process, and Write sequence

– Satisfies government and corporate requirements

Reason for Spring Batch POC

• Current batch system for public records is not powerful enough to handle very large requests

• We have had to turn away customers because of this

• A more powerful and flexible batch solution could solve this problem

Architecture

• Layered architecture

• The application layer contains all batch jobs and custom code

• Batch Core contains runtime classes necessary to launch and control a batch job

• Batch Infrastructure contains common readers and writers, and services used by both the application and the core framework

http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html

The Batch Job

• A Job entity encapsulates an entire batch process

• A Job is composed of Steps, each of which encapsulates a phase of a batch job

– A Step can be as simple or as complex as the developer wants

http://static.springsource.org/spring-batch/reference/html/domain.html
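The Job/Step relationship can be illustrated with a small plain-Java sketch that needs no Spring dependency. Here a job is an ordered list of steps, and it stops at the first step that fails. `SketchJob`, `SketchStep` and `StepOutcome` are hypothetical names chosen for illustration. They are not Spring Batch's `Job` and `Step` interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical step outcome (a stand-in, not Spring Batch's BatchStatus/ExitStatus).
enum StepOutcome { COMPLETED, FAILED }

// A step: a named phase of the job whose work reports how it ended.
final class SketchStep {
    final String name;
    private final Supplier<StepOutcome> work;

    SketchStep(String name, Supplier<StepOutcome> work) {
        this.name = name;
        this.work = work;
    }

    StepOutcome execute() { return work.get(); }
}

// A job: encapsulates the whole batch process as an ordered list of steps.
final class SketchJob {
    private final List<SketchStep> steps = new ArrayList<>();

    SketchJob next(SketchStep step) {
        steps.add(step);
        return this;
    }

    // Runs the steps in order and stops at the first FAILED one.
    // Returns the names of the steps that were executed.
    List<String> run() {
        List<String> executed = new ArrayList<>();
        for (SketchStep step : steps) {
            executed.add(step.name);
            if (step.execute() == StepOutcome.FAILED) {
                break;
            }
        }
        return executed;
    }
}
```

This sketch only models step ordering. In Spring Batch, a failed step also marks the job execution as failed, which is what makes a later restart possible.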

Chunk Processing

• Typical Spring Batch Step

– Read, Process, Write sequence

• Multiple items are read and processed before being written together as a “chunk”

– The chunk size is declared in the configuration (commit-interval), and each chunk is written and committed in a single transaction

http://static.springsource.org/spring-batch/reference/html/configureStep.html
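The chunk-oriented loop can be sketched in plain Java. The sketch makes three assumptions: an `Iterator` plays the reader, a `Function` plays the processor, and a `Consumer` that receives one list per chunk plays the writer. It follows Spring Batch's `ItemProcessor` convention that a `null` result filters the item out. The commit interval counts items read, so filtered items still count. There are no transactions here. `ChunkLoop` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkLoop {
    // Reads and processes one item at a time and hands the writer one list per
    // chunk of `commitInterval` items read. Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int chunksWritten = 0;
        int readInChunk = 0;
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            O out = processor.apply(reader.next());
            if (out != null) {          // null = filtered, but it still counts as read
                buffer.add(out);
            }
            readInChunk++;
            if (readInChunk == commitInterval) {
                writer.accept(new ArrayList<>(buffer));  // one "commit" per chunk
                chunksWritten++;
                buffer.clear();
                readInChunk = 0;
            }
        }
        if (readInChunk > 0) {          // last, partially filled chunk
            writer.accept(new ArrayList<>(buffer));
            chunksWritten++;
        }
        return chunksWritten;
    }
}
```

Running it over seven items with a commit interval of 3 produces three writes: two full chunks and a final chunk holding one item.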

Step Flow

• Steps can be configured to flow sequentially or conditionally, based on the exit status of the previous step

– Allows for fairly complex jobs

http://static.springsource.org/spring-batch/reference/html/configureStep.html
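Conditional flow means the exit status of one step selects the next step. In Spring Batch XML this is written with elements such as `<next on="FAILED" to="..."/>`. The plain-Java sketch below models only that routing idea. `ConditionalFlow` and its methods are hypothetical and are not Spring Batch's flow builder API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical conditional flow: each step returns an exit code, and the
// (step, exit code) pair selects the next step. No cycle detection.
final class ConditionalFlow {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, String> transitions = new HashMap<>();

    ConditionalFlow step(String name, Supplier<String> work) {
        steps.put(name, work);
        return this;
    }

    // When step `from` ends with `exitCode`, continue with step `to`.
    ConditionalFlow on(String from, String exitCode, String to) {
        transitions.put(from + ":" + exitCode, to);
        return this;
    }

    // Runs from `start` until no transition matches; returns the path taken.
    List<String> run(String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null) {
            Supplier<String> work = steps.get(current);
            if (work == null) {
                throw new IllegalStateException("Unknown step: " + current);
            }
            path.add(current);
            current = transitions.get(current + ":" + work.get());
        }
        return path;
    }
}
```

A job whose `load` step fails would follow `load → notifyError`. If `load` succeeds, it follows `load → process`.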

Job Repository

• The JobRepository performs CRUD operations on the metadata about Job and Step executions

– Examples: job parameters, job/step status, etc.

http://static.springsource.org/spring-batch/reference/html/domain.html
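In Spring Batch, a job instance is identified by the job name together with its identifying parameters. Each run of that instance is recorded as an execution with a status. The in-memory sketch below imitates that bookkeeping (create, read, update) so the idea can be tried without a database. `InMemoryJobMetadata` and its methods are hypothetical and are not the real `JobRepository` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory metadata store; NOT Spring Batch's JobRepository API.
final class InMemoryJobMetadata {
    static final class Execution {
        final long id;
        final String jobName;
        final Map<String, String> params;
        String status = "STARTED";

        Execution(long id, String jobName, Map<String, String> params) {
            this.id = id;
            this.jobName = jobName;
            this.params = params;
        }
    }

    private final List<Execution> executions = new ArrayList<>();

    // Create: record a new execution of a job with a copy of its parameters.
    Execution create(String jobName, Map<String, String> params) {
        Map<String, String> copy = Collections.unmodifiableMap(new HashMap<>(params));
        Execution e = new Execution(executions.size() + 1, jobName, copy);
        executions.add(e);
        return e;
    }

    // Update: change the status of an existing execution (ids start at 1).
    void updateStatus(long id, String status) {
        executions.get((int) (id - 1)).status = status;
    }

    // Read: latest execution with the same job name and parameters,
    // e.g. to decide whether a failed run should be restarted.
    Optional<Execution> lastExecution(String jobName, Map<String, String> params) {
        for (int i = executions.size() - 1; i >= 0; i--) {
            Execution e = executions.get(i);
            if (e.jobName.equals(jobName) && e.params.equals(params)) {
                return Optional.of(e);
            }
        }
        return Optional.empty();
    }
}
```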

Step Skipping

• If an exception listed as skippable in the configuration is thrown while reading, processing, or writing an item, that item is skipped and the step continues, rather than the batch execution stopping

• Used for exceptions that would be thrown again on every attempt

– FileNotFoundException, parse exceptions, etc.

• SkipListener can be used to log skipped items
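Skip handling combines a list of skippable exception types, a skip limit, and a listener callback. The plain-Java sketch below shows how those three pieces interact. `SkippingProcessor` and `SkipLimitExceeded` are hypothetical, and `onSkip` plays the role a SkipListener would. Skippable errors are counted and reported. Anything else, or one skip too many, fails the run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical skip logic; not Spring Batch's actual skip classes.
final class SkippingProcessor {
    static final class SkipLimitExceeded extends RuntimeException {
        SkipLimitExceeded(Throwable cause) { super("skip limit exceeded", cause); }
    }

    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor,
                                     Set<Class<? extends Exception>> skippable, int skipLimit,
                                     BiConsumer<I, Exception> onSkip) {
        List<O> results = new ArrayList<>();
        int skips = 0;
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (RuntimeException e) {
                boolean isSkippable = skippable.stream().anyMatch(c -> c.isInstance(e));
                if (!isSkippable) {
                    throw e;                              // not skippable: fail the step
                }
                if (++skips > skipLimit) {
                    throw new SkipLimitExceeded(e);       // too many bad items
                }
                onSkip.accept(item, e);                   // e.g. log it, like a SkipListener
            }
        }
        return results;
    }
}
```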

Retrying Steps

• If an exception listed in the configuration is thrown, the operation is attempted again

• Used for transient exceptions that may not be thrown on every attempt

– ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.

• A limit can be set on the number of retries

• RetryListener can be used to log retried items

• RetryTemplate can be used to further customize retry logic
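The core retry idea is to re-attempt an operation only for listed transient exception types and to give up after a limit. It can be sketched without RetryTemplate, as below. `SimpleRetry` is a hypothetical helper, and `maxAttempts` counts the first attempt as well. There is no backoff between attempts. `IllegalStateException` stands in for a transient error such as a deadlock in the usage test, because the Spring exception classes are not on the classpath.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical retry helper (a hand-rolled illustration, not Spring's RetryTemplate).
final class SimpleRetry {
    // Re-attempts `operation` while it throws a retryable exception, up to
    // `maxAttempts` attempts in total; the last error is rethrown when giving up.
    static <T> T execute(Supplier<T> operation,
                         Set<Class<? extends RuntimeException>> retryable, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                boolean canRetry = retryable.stream().anyMatch(c -> c.isInstance(e));
                if (!canRetry || attempt >= maxAttempts) {
                    throw e;
                }
                // A RetryListener-style hook could log the failed attempt here.
            }
        }
    }
}
```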

Scaling Features (Single Process)

• Multi-Threaded Jobs or Steps

– Using Spring’s TaskExecutor object

• Parallel Steps

– Using split flows and a TaskExecutor in Job configuration

http://static.springsource.org/spring-batch/reference/html/scalability.html
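In Spring Batch XML, a split is a `<split task-executor="...">` element containing several `<flow>` elements. The job continues only after every flow has finished. The sketch below reproduces that fork/join behavior with a plain `ExecutorService` standing in for the TaskExecutor. `ParallelSplit` is a hypothetical name, and each branch is modeled as a `Callable` that returns the step's name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSplit {
    // Runs independent branches concurrently on a fixed pool and returns their
    // results in input order once every branch has finished.
    static List<String> runSplit(List<Callable<String>> branches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = pool.invokeAll(branches); // waits for all branches
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());  // a failed branch surfaces as ExecutionException
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the split", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a parallel branch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```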

Scaling Features (Multi-Process)

• Remote Chunking

– Splits Step processing across multiple processes: a master process reads the items and sends them in chunks through some middleware to worker processes, which process and write them

http://static.springsource.org/spring-batch/reference/html/scalability.html
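Remote chunking works like this: a manager reads the items and ships them in chunks over middleware to workers, which process and write them and then acknowledge. The sketch below simulates that inside one JVM. It assumes `BlockingQueue`s in place of real middleware (in practice a message broker) and threads in place of worker processes. `RemoteChunkingSim` and its poison-pill shutdown are hypothetical. The sketch also omits failure handling: if a worker dies, the manager waits forever.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class RemoteChunkingSim {
    private static final List<Integer> STOP = new ArrayList<>();  // poison pill, compared by identity

    static List<Integer> run(List<Integer> input, int chunkSize, int workers) {
        if (chunkSize < 1 || workers < 1) {
            throw new IllegalArgumentException("chunkSize and workers must be at least 1");
        }
        BlockingQueue<List<Integer>> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> replies = new LinkedBlockingQueue<>();
        List<Integer> written = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();

        // Workers: take a chunk, process and write each item, then acknowledge it.
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    for (List<Integer> chunk = requests.take(); chunk != STOP; chunk = requests.take()) {
                        for (int item : chunk) {
                            written.add(item * item);  // "process" = square, "write" = store
                        }
                        replies.put(chunk.size());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        try {
            // Manager: read the input and send it out in chunks.
            int sent = 0;
            for (int i = 0; i < input.size(); i += chunkSize) {
                requests.put(new ArrayList<>(input.subList(i, Math.min(i + chunkSize, input.size()))));
                sent++;
            }
            // Wait for one acknowledgement per chunk before the step can finish.
            int acknowledged = 0;
            for (int i = 0; i < sent; i++) {
                acknowledged += replies.take();
            }
            if (acknowledged != input.size()) {
                throw new IllegalStateException("items were lost");
            }
            for (int w = 0; w < workers; w++) {
                requests.put(STOP);
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        List<Integer> result = new ArrayList<>(written);
        Collections.sort(result);  // workers finish in no particular order
        return result;
    }
}
```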

Scaling Features (Multi-Process)

• Step Partitioning

– Splits input and executes remote steps in parallel

– PartitionHandler sends StepExecution requests to remote steps

– Partitioner generates the input for new step executions

http://static.springsource.org/spring-batch/reference/html/scalability.html
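Spring Batch's `Partitioner` interface has a single method, `partition(int gridSize)`. It returns a `Map<String, ExecutionContext>` with one entry per partition. The sketch below produces the same shape for an id range, using plain maps in place of `ExecutionContext` so it runs without Spring. The class `RangePartitioner` and the keys `minValue`/`maxValue` are hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical range partitioner: splits [minId, maxId] into at most gridSize
// contiguous ranges, one "execution context" (here a plain map) per partition.
final class RangePartitioner {
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize;  // ceiling division
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        int n = 0;
        for (long start = minId; start <= maxId; start += size) {
            Map<String, Long> context = new LinkedHashMap<>();
            context.put("minValue", start);
            context.put("maxValue", Math.min(start + size - 1, maxId));
            partitions.put("partition" + n++, context);
        }
        return partitions;
    }
}
```

Each worker step would then read only the rows whose ids fall between its own `minValue` and `maxValue`. For ids 1 to 10 and a grid size of 3, the ranges are 1 to 4, 5 to 8 and 9 to 10.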

Job Flow with Client/Server and Partitioning

Solving the Use-Cases

• DataRoomBatch (DeepSix Example)

– The bucket's rows are read with a JdbcCursorItemReader

– Create an Item Processor to check if the row is marked for deletion and delete it if so

– Item Writer could be empty or used to output statistics

– Partitioning is easily done by dividing the rows into ranges, one range per partition
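The DeepSix processor logic might look like the sketch below. It assumes the reader yields rows carrying an id and a deletion flag, and that the actual delete is injected as a callback (in a real job, a JDBC delete by id). `DeepSixProcessor`, `Row` and the callback are hypothetical. Unmarked rows return `null`, which relies on Spring Batch's `ItemProcessor` convention that a `null` result filters the item. As a result, the writer only receives the ids of deleted rows and can report statistics on them.

```java
import java.util.function.LongConsumer;

final class DeepSixProcessor {
    // Hypothetical row shape: just an id and the "marked for deletion" flag.
    static final class Row {
        final long id;
        final boolean markedForDeletion;

        Row(long id, boolean markedForDeletion) {
            this.id = id;
            this.markedForDeletion = markedForDeletion;
        }
    }

    private final LongConsumer deleteById;  // stand-in for a JDBC delete by primary key

    DeepSixProcessor(LongConsumer deleteById) {
        this.deleteById = deleteById;
    }

    // Deletes a marked row and returns its id (for the writer's statistics);
    // returns null for unmarked rows so they are filtered out of the chunk.
    Long process(Row row) {
        if (!row.markedForDeletion) {
            return null;
        }
        deleteById.accept(row.id);
        return row.id;
    }
}
```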

Solving the Use-Cases

• Public Records Batch Processing

– The input file is read with a FlatFileItemReader

– Custom Item Processor to search the database for hits

– Custom Item Writer to compile a report of the search results

– A following step sends the report to the user

– Easy to implement a Partitioner for the input file
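The custom processor and writer for the Public Records job might be structured as in the sketch below. Several details are assumptions for illustration and do not come from the actual system: each input line holds a `last,first` search criterion, a `Map` stands in for the records database, and the report is a list of strings. `PublicRecordsSearch` is a hypothetical name. A malformed line throws an exception, which is the kind of error the skip configuration could be set up to tolerate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PublicRecordsSearch {
    private final Map<String, String> changesByName;  // "Last, First" -> description of change
    private final List<String> report = new ArrayList<>();

    PublicRecordsSearch(Map<String, String> changesByName) {
        this.changesByName = changesByName;
    }

    // Processor: parse "last,first" and look it up; null means no hit (filtered out).
    String process(String line) {
        String[] parts = line.split(",", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String key = parts[0].trim() + ", " + parts[1].trim();
        String change = changesByName.get(key);
        return change == null ? null : key + ": " + change;
    }

    // Writer: append one chunk of hits to the report.
    void write(List<String> hits) {
        report.addAll(hits);
    }

    List<String> report() {
        return report;
    }
}
```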

Benefits of Spring Batch

• Part of the Spring ecosystem

– Allows easy integration with other Spring features

– General simplicity offered by Spring

• Step flow customizable

• Basic Item Readers and Writers already available

• Features available for monitoring Jobs and Steps

• Many scaling options available

Issues with Spring Batch

• No built-in scheduler

– Not a big issue; scheduler libraries are easily integrated

• Potentially a lot of XML configuration

– Business logic spread across Java and XML files can complicate debugging and maintenance

– Annotations can help

• Anything but very basic components will need to be created as new classes

Helpful Integration Options

• Spring Batch Admin

– Web-based administration console

– Contains Spring Batch Integration, allowing use of Spring Integration messages to launch and monitor jobs

• Scheduler (cron, Spring Scheduling, Quartz)

• Clustering frameworks (Hadoop, GridGain, Terracotta)

– Ideal for improving horizontal scaling

– Spring Data Hadoop is a fairly new Spring feature that helps integrate Spring with Hadoop

Future Steps

• Set up Spring Batch in a clustered environment

– Evaluate performance

– Figure out dynamic load balancing

• Experiment with more features and integration options

– Spring Batch Admin, manual job restarting, etc.

• Integrate Spring Batch Admin into the Cobalt GUI?

• Look more into the information stored in the metadata database and how it could be used to monitor and manage jobs

• Look into Partitioning and how much work is needed to send partitions off to remote machines

• Look into job/step timeouts

Questions?
