Spring Batch Christopher Jeffers August 2012


Page 1

Spring Batch

Christopher Jeffers

August 2012

Page 2

Agenda

• Intro to Spring Batch and Use-Cases

• Spring Batch Technical Explanation

– Architecture

– The Batch Job

– Skipping and Retrying Steps

– Scaling Features

• Spring Batch Evaluation

– Solving Use-Cases

– Benefits

– Issues

– Integration Options

– Future Steps

Page 3

Spring Batch Overview

• Lightweight framework designed to enable the development of robust batch applications used in enterprise systems

• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary

• Provides reusable functions that are essential in processing large volumes of data

• Provides scaling features, including multi-threading and massive parallelism for Spring Batch Jobs

Page 4

Batch Use-Cases

• DataRoomBatch

– Physically delete all rows marked for deletion from a given bucket (DeepSix)

– Rerun user documents through publishing workflow

– Proactive auditing of the environment

• Public Records Batch Processing

– User inputs a file with search criteria for many individuals, and the program searches the database for changes in information, returning a report of hits to the user

– Read, Process, and Write sequence

– Satisfies government and corporate requirements

Page 5

Reason for Spring Batch POC

• Current batch system for public records is not powerful enough to handle very large requests

• Have had to turn away customers because of this

• A more powerful and flexible batch solution could solve this problem

Page 6

Agenda

• Intro to Spring Batch and Use-Cases

• Spring Batch Technical Explanation

– Architecture

– The Batch Job

– Skipping and Retrying Steps

– Scaling Features

• Spring Batch Evaluation

– Solving Use-Cases

– Benefits

– Issues

– Integration Options

– Future Steps

Page 7

Architecture

• Layered architecture

• The application layer contains all batch jobs and custom code

• Batch Core contains runtime classes necessary to launch and control a batch job

• Batch Infrastructure contains common readers and writers, and services used by both the application and the core framework

http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html

Page 8

The Batch Job

• A Job entity encapsulates an entire batch process

• A Job consists of Steps, each of which encapsulates one phase of the batch job

– A Step can be as simple or as complex as the developer wants

http://static.springsource.org/spring-batch/reference/html/domain.html
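To make the Job/Step relationship concrete, here is a minimal plain-Java model. `JobModel`, `Step`, and `runJob` are hypothetical names invented for illustration, not Spring Batch classes; the sketch only assumes that a Job runs its Steps in order and stops at the first Step that fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a Job made of Steps (not the Spring Batch API).
class JobModel {

    // A Step encapsulates one phase of the batch job and reports whether it succeeded.
    interface Step {
        String name();
        boolean execute();
    }

    // Runs the Steps in order and stops the Job at the first Step that fails.
    static String runJob(List<Step> steps, List<String> log) {
        for (Step step : steps) {
            log.add("start " + step.name());
            if (!step.execute()) {
                return "FAILED at " + step.name();
            }
        }
        return "COMPLETED";
    }

    // Convenience factory for a Step whose body always succeeds or always fails.
    static Step step(String name, boolean succeeds) {
        return new Step() {
            public String name() { return name; }
            public boolean execute() { return succeeds; }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        String status = runJob(List.of(step("load", true), step("report", true)), log);
        System.out.println(status + " " + log); // COMPLETED [start load, start report]
    }
}
```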

Page 9

Chunk Processing

• Typical Spring Batch Step

– Read, Process, Write sequence

• Items are read and processed one at a time, then written together as a “chunk” in a single transaction

– The chunk size is declared in the configuration (commit-interval)

http://static.springsource.org/spring-batch/reference/html/configureStep.html
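The chunk loop can be sketched in plain Java. The interfaces below are local stand-ins shaped roughly like Spring Batch 2.x's `ItemReader`, `ItemProcessor`, and `ItemWriter` (a reader returns `null` when input is exhausted, and a processor may return `null` to filter an item out); they are not the framework types. The per-chunk transaction is only noted in a comment, and the hypothetical `commitInterval` parameter plays the role of the commit-interval setting.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified model of chunk-oriented processing. The three interfaces are local stand-ins
// shaped like Spring Batch's ItemReader, ItemProcessor and ItemWriter, not the real types.
class ChunkModel {
    interface ItemReader<T> { T read(); }                     // returns null when input is exhausted
    interface ItemProcessor<I, O> { O process(I item); }      // returns null to filter an item out
    interface ItemWriter<T> { void write(List<? extends T> items); }

    // Reads and processes items one at a time, writing them in chunks of commitInterval items.
    // Returns the number of chunks written.
    static <I, O> int runStep(ItemReader<I> reader, ItemProcessor<I, O> processor,
                              ItemWriter<O> writer, int commitInterval) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();    // in Spring Batch, one chunk = one transaction
            for (int read = 0; read < commitInterval; read++) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                O result = processor.process(item);
                if (result != null) chunk.add(result);
            }
            if (!chunk.isEmpty()) {
                writer.write(chunk);              // the whole chunk is written at once
                chunks++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> input = List.of(1, 2, 3, 4, 5).iterator();
        ItemReader<Integer> reader = () -> input.hasNext() ? input.next() : null;
        ItemProcessor<Integer, String> processor = i -> "item-" + i;
        List<List<String>> written = new ArrayList<>();
        ItemWriter<String> writer = items -> written.add(new ArrayList<>(items));
        int chunks = runStep(reader, processor, writer, 2);
        System.out.println(chunks + " " + written); // 3 [[item-1, item-2], [item-3, item-4], [item-5]]
    }
}
```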

Page 10

Step Flow

• Steps can be configured to flow sequentially or conditionally, based on the previous Step's exit status

– Allows for more complex jobs

http://static.springsource.org/spring-batch/reference/html/configureStep.html
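Conditional flow amounts to choosing the next Step from the previous Step's exit status. The sketch below models that with a transition table and a `*` wildcard fallback; `FlowModel` and its methods are hypothetical, while Spring Batch itself expresses the same idea with `<next on="..." to="..."/>` elements in the Job configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of conditional step flow; each step body returns an exit status string.
class FlowModel {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, Map<String, String>> transitions = new HashMap<>();  // step -> (status -> next)

    void step(String name, Supplier<String> body) {
        steps.put(name, body);
    }

    void on(String from, String exitStatus, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(exitStatus, to);
    }

    // Follows transitions from the start step: an exact status match wins, then the "*" wildcard;
    // with no matching transition the flow ends.
    List<String> run(String start) {
        List<String> visited = new ArrayList<>();
        String current = start;
        while (current != null) {
            visited.add(current);
            String status = steps.get(current).get();
            Map<String, String> next = transitions.getOrDefault(current, Map.of());
            current = next.containsKey(status) ? next.get(status) : next.get("*");
        }
        return visited;
    }

    public static void main(String[] args) {
        FlowModel flow = new FlowModel();
        flow.step("validate", () -> "FAILED");
        flow.step("process", () -> "COMPLETED");
        flow.step("notifyError", () -> "COMPLETED");
        flow.on("validate", "FAILED", "notifyError");
        flow.on("validate", "*", "process");
        System.out.println(flow.run("validate")); // [validate, notifyError]
    }
}
```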

Page 11

Job Repository

• The JobRepository performs CRUD operations on metadata about Job and Step executions

– Examples: Job parameters, Job/Step status, etc.

http://static.springsource.org/spring-batch/reference/html/domain.html
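A minimal in-memory sketch of the kind of bookkeeping the JobRepository performs. `RepoModel` is hypothetical: it identifies a job instance by job name plus parameters, records one status per execution, and refuses to rerun an instance that already completed, loosely mirroring Spring Batch's rule that a completed JobInstance cannot be run again. The real JobRepository persists this metadata in database tables and has a different API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for job metadata storage. Spring Batch's JobRepository keeps
// comparable data in database tables; this class does not reproduce its API.
class RepoModel {
    private final Map<String, List<String>> executions = new HashMap<>();  // instance key -> execution statuses

    // A job instance is identified by the job name plus its parameters (sorted for a stable key).
    static String instanceKey(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    // Records a new execution, refusing to run again an instance that already completed.
    int createExecution(String jobName, Map<String, String> params) {
        List<String> history = executions.computeIfAbsent(instanceKey(jobName, params), k -> new ArrayList<>());
        if (history.contains("COMPLETED")) {
            throw new IllegalStateException("job instance already complete: " + instanceKey(jobName, params));
        }
        history.add("STARTED");
        return history.size() - 1;   // index of the new execution within this instance
    }

    void updateStatus(String jobName, Map<String, String> params, int execution, String status) {
        executions.get(instanceKey(jobName, params)).set(execution, status);
    }

    List<String> statuses(String jobName, Map<String, String> params) {
        return executions.getOrDefault(instanceKey(jobName, params), List.of());
    }

    public static void main(String[] args) {
        RepoModel repo = new RepoModel();
        Map<String, String> params = Map.of("runDate", "2012-08-01");
        int first = repo.createExecution("publicRecords", params);
        repo.updateStatus("publicRecords", params, first, "FAILED");
        int second = repo.createExecution("publicRecords", params);   // restarting a FAILED instance is allowed
        repo.updateStatus("publicRecords", params, second, "COMPLETED");
        System.out.println(repo.statuses("publicRecords", params)); // [FAILED, COMPLETED]
    }
}
```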

Page 12

Step Skipping

• An item is skipped, rather than the batch execution being stopped, if an exception listed in the configuration is thrown while reading, processing, or writing it (up to a configured skip-limit)

• Used for exceptions that will be thrown on every attempt, so retrying would not help

– FileNotFoundException, parse exceptions, etc.

• SkipListener can be used to log skipped items
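Skip logic can be sketched as a loop that catches exceptions item by item. `processWithSkips` is a hypothetical helper, not Spring Batch's implementation: an item whose exception is in the skippable set is passed to an `onSkip` callback (playing the role of a SkipListener) and dropped, and the step fails once `skipLimit` is exceeded or a non-skippable exception occurs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of item-level skip logic (not Spring Batch's implementation).
class SkipModel {
    // Processes every item; skippable failures are reported to onSkip and the item is dropped.
    // A non-skippable exception, or more than skipLimit skips, fails the whole step.
    static <I, O> List<O> processWithSkips(List<I> items, Function<I, O> processor,
                                           Set<Class<? extends Exception>> skippable,
                                           int skipLimit, Consumer<I> onSkip) {
        List<O> results = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (RuntimeException e) {
                boolean canSkip = skippable.stream().anyMatch(type -> type.isInstance(e));
                if (!canSkip || ++skipped > skipLimit) {
                    throw e;                 // the step fails
                }
                onSkip.accept(item);         // plays the role of a SkipListener callback
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> skippedLines = new ArrayList<>();
        Function<String, Integer> parse = Integer::parseInt;
        List<Integer> parsed = processWithSkips(List.of("1", "x", "3"), parse,
                Set.of(NumberFormatException.class), 5, skippedLines::add);
        System.out.println(parsed + " skipped=" + skippedLines); // [1, 3] skipped=[x]
    }
}
```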

Page 13

Retrying Steps

• If an exception listed in the configuration is thrown, the operation is attempted again

• Used for transient exceptions that may not be thrown on every attempt

– ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.

• A limit can be set on the number of retries

• RetryListener can be used to log retried items

• RetryTemplate can be used to further customize retry logic
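A bare-bones retry loop in the spirit of a RetryTemplate with a simple attempt-limit policy. `RetryModel.retry` is hypothetical and is not the Spring Retry API; `IllegalStateException` stands in for a transient exception such as ConcurrencyFailureException, and a real policy might also add a backoff between attempts.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical retry loop in the spirit of a RetryTemplate with a simple attempt limit
// (not the Spring Retry API).
class RetryModel {
    // Calls operation until it succeeds, the exception is not retryable, or maxAttempts is reached.
    static <T> T retry(Supplier<T> operation, Set<Class<? extends RuntimeException>> retryable,
                       int maxAttempts) {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return operation.get();
            } catch (RuntimeException e) {
                boolean canRetry = retryable.stream().anyMatch(type -> type.isInstance(e));
                if (!canRetry || attempt >= maxAttempts) {
                    throw e;   // give up: the exception propagates to the caller
                }
                // A RetryListener would be notified here; a real policy might also back off.
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Supplier<String> flaky = () -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient failure");
            return "ok after " + calls[0] + " attempts";
        };
        System.out.println(retry(flaky, Set.of(IllegalStateException.class), 5)); // ok after 3 attempts
    }
}
```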

Page 14

Scaling Features (Single Process)

• Multi-threaded Steps

– Add a Spring TaskExecutor to the Step so that each chunk is read, processed, and written in its own thread

• Parallel Steps

– Using split flows and a TaskExecutor in the Job configuration

http://static.springsource.org/spring-batch/reference/html/scalability.html
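The multi-threaded Step idea can be approximated with the JDK's own thread pool. In this hypothetical sketch an `ExecutorService` stands in for the Spring TaskExecutor, and each chunk is processed on a pool thread. Unlike a real multi-threaded Step, which gives no ordering guarantee, the results are reassembled in submission order so the demo output is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Simplified model of a multi-threaded step: each chunk is processed on a pool thread.
// A plain ExecutorService stands in for the Spring TaskExecutor (hypothetical sketch).
class MultiThreadedStepModel {
    static <I, O> List<O> run(List<I> items, Function<I, O> processor, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<O>>> futures = new ArrayList<>();
            for (int start = 0; start < items.size(); start += chunkSize) {
                List<I> chunk = items.subList(start, Math.min(start + chunkSize, items.size()));
                futures.add(pool.submit(() -> {
                    List<O> out = new ArrayList<>();
                    for (I item : chunk) out.add(processor.apply(item));
                    return out;
                }));
            }
            // Results are gathered in submission order only to keep this demo deterministic.
            List<O> results = new ArrayList<>();
            for (Future<List<O>> future : futures) {
                results.addAll(future.get());
            }
            return results;
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> squares = run(List.of(1, 2, 3, 4, 5), x -> x * x, 2, 3);
        System.out.println(squares); // [1, 4, 9, 16, 25]
    }
}
```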

Page 15

Scaling Features (Multi-Process)

• Remote Chunking

– Splits Step processing across multiple processes: a master process reads the items and sends chunks through some middleware to remote workers, which process and write them

http://static.springsource.org/spring-batch/reference/html/scalability.html
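Remote chunking can be simulated inside one JVM to show who does what. In this hypothetical model a `BlockingQueue` stands in for the middleware (for example, a message queue): the master only reads and sends chunks, while worker threads process and write the items. Upper-casing the strings is a placeholder for real processing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Single-JVM simulation of remote chunking. A BlockingQueue stands in for the middleware
// between master and workers; this is a hypothetical model, not Spring Batch code.
class RemoteChunkingModel {
    private static final List<String> STOP = new ArrayList<>();   // sentinel, compared by identity

    static List<String> run(List<String> input, int chunkSize, int workerCount) throws InterruptedException {
        BlockingQueue<List<String>> requests = new LinkedBlockingQueue<>();
        List<String> written = Collections.synchronizedList(new ArrayList<>());

        // Workers: take a chunk from the "middleware", process its items, and write the results.
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            Thread worker = new Thread(() -> {
                try {
                    for (List<String> chunk = requests.take(); chunk != STOP; chunk = requests.take()) {
                        for (String item : chunk) written.add(item.toUpperCase());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }

        // Master: reads the input and sends it out in chunks; it does no processing itself.
        for (int i = 0; i < input.size(); i += chunkSize) {
            requests.put(new ArrayList<>(input.subList(i, Math.min(i + chunkSize, input.size()))));
        }
        for (int w = 0; w < workerCount; w++) requests.put(STOP);   // one stop signal per worker
        for (Thread worker : workers) worker.join();
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = new ArrayList<>(run(List.of("a", "b", "c", "d", "e"), 2, 2));
        Collections.sort(out);   // completion order across workers is not deterministic
        System.out.println(out); // [A, B, C, D, E]
    }
}
```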

Page 16

Scaling Features (Multi-Process)

• Step Partitioning

– Splits the input and executes remote Steps in parallel

– PartitionHandler sends StepExecution requests to remote steps

– Partitioner generates the input (one ExecutionContext per partition) for the new step executions

http://static.springsource.org/spring-batch/reference/html/scalability.html
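A common partitioning strategy is to split a numeric key range into one sub-range per partition. The sketch follows the shape of Spring Batch's `Partitioner.partition(int gridSize)`, which returns one named ExecutionContext per partition, but uses plain maps in place of ExecutionContext; `RangePartitionerModel`, the `minValue`/`maxValue` keys, and the partition names are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Range partitioner in the shape of Spring Batch's Partitioner.partition(int gridSize), but with
// plain maps standing in for ExecutionContext (hypothetical stand-in, not the framework class).
class RangePartitionerModel {
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize;          // ceiling division
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        int n = 0;
        for (long start = minId; start <= maxId; start += size) {
            Map<String, Long> context = new LinkedHashMap<>();
            context.put("minValue", start);
            context.put("maxValue", Math.min(start + size - 1, maxId));
            partitions.put("partition" + n++, context);          // each entry feeds one step execution
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(partition(1, 10, 3));
        // {partition0={minValue=1, maxValue=4}, partition1={minValue=5, maxValue=8}, partition2={minValue=9, maxValue=10}}
    }
}
```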

Page 17

Job Flow with Client/Server and Partitioning

Page 18

Agenda

• Intro to Spring Batch and Use-Cases

• Spring Batch Technical Explanation

– Architecture

– The Batch Job

– Skipping and Retrying Steps

– Scaling Features

• Spring Batch Evaluation

– Solving Use-Cases

– Benefits

– Issues

– Integration Options

– Future Steps

Page 19

Solving the Use-Cases

• DataRoomBatch (DeepSix example)

– The bucket is the input to a JdbcCursorItemReader

– Create an Item Processor to check if the row is marked for deletion and delete it if so

– Item Writer could be empty or used to output statistics

– Partitioning is easily done by dividing the rows into ranges, one per partition
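The DeepSix processing logic described above might look roughly like the following plain-Java sketch. `DeepSixModel`, `Row`, and the `deleteById` callback are hypothetical; in a real job the rows would come from a JdbcCursorItemReader and the delete would be issued through JDBC. Returning `null` from the processor filters unmarked rows out, so the writer only sees deleted ids and can report statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch of the DeepSix processor and statistics writer (plain Java, no JDBC).
class DeepSixModel {
    static final class Row {
        final long id;
        final boolean markedForDeletion;
        Row(long id, boolean markedForDeletion) { this.id = id; this.markedForDeletion = markedForDeletion; }
    }

    // Processor: delete the row if it is marked and pass its id on; unmarked rows are filtered (null).
    static Long process(Row row, LongConsumer deleteById) {
        if (!row.markedForDeletion) return null;
        deleteById.accept(row.id);
        return row.id;
    }

    // Writer: only outputs statistics for the chunk, as the slide suggests.
    static String write(List<Long> deletedIds) {
        return "rows deleted: " + deletedIds.size() + " " + deletedIds;
    }

    public static void main(String[] args) {
        List<Long> fakeDatabaseDeletes = new ArrayList<>();   // records the ids "deleted"
        List<Long> chunk = new ArrayList<>();
        for (Row row : List.of(new Row(1, true), new Row(2, false), new Row(3, true))) {
            Long id = process(row, rowId -> fakeDatabaseDeletes.add(rowId));
            if (id != null) chunk.add(id);
        }
        System.out.println(write(chunk)); // rows deleted: 2 [1, 3]
    }
}
```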

Page 20

Solving the Use-Cases

• Public Records Batch Processing

– The input file is read by a FlatFileItemReader

– Custom Item Processor to search the database for hits

– Custom Item Writer to compile report of search results

– A following step sends the report to the user

– Easy to implement a Partitioner for the input file
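A plain-Java sketch of the public records read, process, and write sequence. The pipe-delimited `personId|name` input format, the in-memory map standing in for the records database, the `PublicRecordsModel` name, and the report layout are all hypothetical placeholders for the FlatFileItemReader, custom Item Processor, and custom Item Writer described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the public-records job: parse (read), search (process), report (write).
class PublicRecordsModel {
    // Read: parse one input line "personId|name" into search criteria.
    static String[] parse(String line) {
        String[] fields = line.split("\\|");
        if (fields.length != 2) throw new IllegalArgumentException("bad line: " + line);
        return fields;
    }

    // Process: look the person up; return a hit description, or null when nothing changed.
    static String search(String[] criteria, Map<String, String> changesById) {
        String change = changesById.get(criteria[0]);
        return change == null ? null : criteria[1] + ": " + change;
    }

    // Write: compile the hits of one chunk into numbered report lines.
    static List<String> report(List<String> hits) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < hits.size(); i++) lines.add((i + 1) + ". " + hits.get(i));
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> db = Map.of("42", "new address on file");
        List<String> hits = new ArrayList<>();
        for (String line : List.of("17|Ann Smith", "42|Bob Jones")) {
            String hit = search(parse(line), db);
            if (hit != null) hits.add(hit);
        }
        System.out.println(report(hits)); // [1. Bob Jones: new address on file]
    }
}
```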

Page 21

Benefits of Spring Batch

• Part of the Spring portfolio

– Allows easy integration with other Spring features

– General simplicity offered by Spring

• Customizable step flow

• Basic Item Readers and Writers already available

• Features available for monitoring Jobs and Steps

• Many scaling options available

Page 22

Issues with Spring Batch

• No built-in scheduler

– Not a big issue; scheduler libraries are easily integrated

• Potentially a lot of XML configuration

– Business logic spread across Java and XML files can complicate debugging and maintenance

– Annotations can help

• Anything but very basic components will need to be created as new classes

Page 23

Helpful Integration Options

• Spring Batch Admin

– Web-based administration console

– Contains Spring Batch Integration, allowing use of Spring Integration messages to launch and monitor jobs

• Scheduler (cron, Spring Scheduling, Quartz)

• Clustering frameworks (Hadoop, GridGain, Terracotta)

– Ideal for improving horizontal scaling

– Spring Data Hadoop is a fairly new Spring feature that helps integrate Spring with Hadoop

Page 24

Future Steps

• Set up Spring Batch in a clustered environment

– Evaluate performance

– Figure out dynamic load balancing

• Experiment with more features and integration options

– Spring Batch Admin, manual job restarting, etc.

• Integrate Spring Batch Admin into the Cobalt GUI?

• Look further into the information stored in the metadata database and figure out how to use it for monitoring and managing jobs

• Look into Partitioning and how much must be done to implement sending partitions off to remote machines

• Look into job/step timeout

Page 25

Questions?