Spring Batch overview

Korea Spring User Group, 27 Nov 2014. Spring Batch: A Quickstart guide to running batch applications with Spring. 최찬영, 에듀앤텍 Co., Ltd., [email protected]

Post on 16-Jul-2015

Page 1: Spring Batch overview

Korea Spring User Group
27 Nov 2014

Spring Batch
A Quickstart guide to running batch applications with Spring

최찬영, 에듀앤텍 Co., Ltd.
[email protected]

Page 2: Spring Batch overview

Agenda

● Batch processing

● Spring Batch high-level overview

● Quick start using Spring Batch

● Batch Specification Language

● General Principles and Guidelines

Page 3: Spring Batch overview

What are the characteristics of batch processing?

● Long-running

– Often runs outside office hours

● Non-interactive

– Often includes logic for handling errors or restarts

● Processes large volumes of data

– More than fits in memory or in a single transaction

Page 4: Spring Batch overview

Batch processing

● Close of business processing

– Order processing
– Business reporting
– Account reconciliation

● Import/export handling

– a.k.a. ETL jobs (Extract-Transform-Load)
– Instrument/position import
– Data warehouse synchronization

● Large-scale output jobs

– Loyalty scheme emails
– Bank statements

Page 5: Spring Batch overview

Batch Domain

● The batch domain adds some value to a plain business process by introducing new concepts:

– A job has an identity: it defines what needs to be done
– A job has steps
– A job instance can be restarted after a failure, which creates a new execution
– Each execution has a start time, stop time, and status
– The job instance has an overall status
– Each execution can tell us how many items were processed and how many commits, rollbacks, and skips occurred

● Adds value through robustness, reliability, and traceability (SLA)

Examples:

– Job: "The EndOfDay Job"
– Job instance: "The EndOfDay Job for 2014/11/27"
– Job execution: "The first attempt at the EndOfDay Job for 2014/11/27"
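The relationship between a job, a job instance, and a job execution can be sketched in plain Java. This is only an illustration: `JobDomainSketch` and its nested `JobInstance`, `JobExecution`, and `Status` types are hypothetical names invented for this example, not the Spring Batch classes, and the real framework keeps this information in its job repository.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the batch domain; not the Spring Batch classes.
class JobDomainSketch {

    enum Status { COMPLETED, FAILED }

    // One attempt at running a job instance.
    static final class JobExecution {
        final int attempt;
        final Status status;
        final int itemsProcessed;

        JobExecution(int attempt, Status status, int itemsProcessed) {
            this.attempt = attempt;
            this.status = status;
            this.itemsProcessed = itemsProcessed;
        }
    }

    // A job plus its identifying parameter, e.g. "EndOfDay" for 2014/11/27.
    static final class JobInstance {
        final String jobName;
        final String scheduleDate;
        final List<JobExecution> executions = new ArrayList<>();

        JobInstance(String jobName, String scheduleDate) {
            this.jobName = jobName;
            this.scheduleDate = scheduleDate;
        }

        // Each start or restart adds a new execution to the same instance.
        JobExecution record(Status status, int itemsProcessed) {
            JobExecution execution = new JobExecution(executions.size() + 1, status, itemsProcessed);
            executions.add(execution);
            return execution;
        }

        // The instance counts as complete once any execution has completed.
        boolean isComplete() {
            for (JobExecution execution : executions) {
                if (execution.status == Status.COMPLETED) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        JobInstance endOfDay = new JobInstance("EndOfDay", "2014/11/27");
        endOfDay.record(Status.FAILED, 40);      // first attempt fails
        endOfDay.record(Status.COMPLETED, 100);  // the restart is a second execution
        System.out.println(endOfDay.executions.size() + " executions, complete=" + endOfDay.isComplete());
    }
}
```

The point of the split is that a restart does not create a new instance: the failed first attempt and the successful second attempt both belong to the instance for 2014/11/27.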

Page 6: Spring Batch overview

Batch Applications for the Java Platform

API for robust batch processing targeted to Java EE and Java SE

● ItemReader reads the input data one item at a time (usually a single record);

● ItemProcessor applies business and domain logic to each item;

● ItemWriter receives the processed items, aggregated into a chunk, and writes them out

[Diagram: JobOperator, Job, Step, JobRepository, ItemReader, ItemProcessor, ItemWriter]

Page 7: Spring Batch overview

Sample: Import flat files to database

[Diagram: within a Step, an ItemReader reads from a File, an ItemProcessor processes each item, and an ItemWriter writes to a Database]

Page 8: Spring Batch overview

Job Configuration

<!-- One job with a single chunk-oriented step; a commit happens after every 100 items -->
<job id="myJob">
    <step id="myStep">
        <tasklet>
            <chunk reader="myItemReader" processor="myItemProcessor"
                   writer="myItemWriter" commit-interval="100" />
        </tasklet>
    </step>
</job>

<bean id="myItemReader" class="...MyItemReader" />
<bean id="myItemProcessor" class="...MyItemProcessor" />
<bean id="myItemWriter" class="...MyItemWriter" />

Page 9: Spring Batch overview

Batch Applications with the Java Config

// Reads Person records from a flat file (setup elided on the slide)
@Bean
public ItemReader<Person> reader() {
    FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
    ...
    return reader;
}

// Applies the business logic to each Person
@Bean
public ItemProcessor<Person, Person> processor() {
    return new PersonItemProcessor();
}

// Writes Person records to the database (setup elided on the slide)
@Bean
public ItemWriter<Person> writer(DataSource dataSource) {
    JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<Person>();
    ...
    return writer;
}

// Wires reader, processor, and writer into a step with a chunk size of 10
@Bean
public Step step1(StepBuilderFactory stepBuilder, ItemReader<Person> reader,
        ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
    return stepBuilder.get("step1")
            .<Person, Person>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}

Page 10: Spring Batch overview

● Spring Batch makes available a framework for building, deploying, and running batch applications. Spring Batch has influenced JSR 352, and it addresses three critical concerns:

– A programming model: application developers have clear, reusable interfaces for constructing batch-style applications.
– A job specification language: job writers have a powerful expression language for how to execute the steps of a batch execution.
– A batch runtime: solution integrators have a runtime API for initiating and controlling batch execution.

Page 11: Spring Batch overview

Batch Applications for the Java Platform

Batch Applications for the Java Platform, also known as JSR-352, offers application developers a model for developing robust batch processing systems. The core of this programming model is a development pattern borrowed from Spring Batch, the Reader-Processor-Writer pattern, in which developers are encouraged to adopt chunk-oriented processing.

Page 12: Spring Batch overview

Batch Programming Artifact Overview

JSR 352

– Codifies key batch programming constructs: Reader, Processor, Writer, Listener, and more
– The batch runtime orchestrates flow based on well-known patterns

Page 13: Spring Batch overview

Chunk-Oriented Processing

● Input and output can be grouped together

● Input collects items before outputting: chunk-oriented processing

● Optional ItemProcessor: delegate business logic

[Diagram: chunk with size N]
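A minimal sketch of the chunk-oriented loop in plain Java, assuming simplified stand-ins for the reader, processor, and writer. The `ChunkSketch` class and its functional parameters are hypothetical; they are not the Spring Batch or JSR-352 interfaces. Items are read and processed one at a time, collected until the chunk holds N items, and then written together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration of chunk-oriented processing; not framework code.
class ChunkSketch {

    // Reads items one by one, processes each, and writes them in chunks of chunkSize.
    // Returns the number of chunks written (one "commit" per chunk).
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk));  // hand the writer its own copy
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {                         // write the final, partial chunk
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob", "cid", "dee", "eve");
        int chunks = ChunkSketch.<String, String>run(names.iterator(),
                name -> name.toUpperCase(),
                chunk -> System.out.println("writing " + chunk),
                2);
        System.out.println(chunks + " chunks");
    }
}
```

In a real step the write of each chunk would sit inside a transaction, which is why the chunk size doubles as the commit interval.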

Page 14: Spring Batch overview

Batch Usage Patterns

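Page 15: Spring Batch overview

JobLauncher

[Sequence diagram: the Client calls start() on the JobLauncher; the JobLauncher calls execute() on the Job; the Job calls doStuff() on the Business code; when done, a JobExecution with ExitStatus.COMPLETED or FAILED is returned to the Client]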
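The launch sequence named on this slide (client, launcher, job, exit status) can be approximated in plain Java. The sketch below is an illustration under assumptions: `LauncherSketch`, its `start` method, and the nested `ExitStatus` enum are hypothetical stand-ins, not the Spring Batch `JobLauncher` API.

```java
// Hypothetical stand-in for a job launcher; not the Spring Batch API.
class LauncherSketch {

    enum ExitStatus { COMPLETED, FAILED }

    // Runs the job and translates its outcome into an exit status for the client.
    static ExitStatus start(Runnable job) {
        try {
            job.run();                   // the job calls into the business code
            return ExitStatus.COMPLETED;
        } catch (RuntimeException e) {
            return ExitStatus.FAILED;    // the failure is reported, not rethrown
        }
    }

    public static void main(String[] args) {
        ExitStatus ok = start(() -> System.out.println("doStuff()"));
        ExitStatus bad = start(() -> { throw new IllegalStateException("input file missing"); });
        System.out.println(ok + " " + bad);
    }
}
```

Reporting a status instead of propagating the exception lets a scheduler or operator decide whether to restart the job.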

Page 16: Spring Batch overview

More Readers and Writers

● Spring Batch provides many implementations of ItemReader and ItemWriter, e.g.

– Flat files

– XML

– JDBC: cursor & driving query

– Hibernate

– JMS

● Some simple jobs can be implemented with off-the-shelf components

Page 17: Spring Batch overview

● Run Tier is concerned with the scheduling and launching of the application. A vendor product is typically used in this tier to allow time-based and interdependent scheduling of batch jobs as well as providing parallel processing capabilities.

● Job Tier is responsible for the overall execution of a batch job. It sequentially executes batch steps, ensuring that all steps are in the correct state and all appropriate policies are enforced.

● Application Tier contains components required to execute the program. It contains specific modules that address the required batch functionality and enforces policies around a module's execution (e.g., commit intervals and capture of statistics).

● Data Tier provides the integration with the physical data sources, which might include databases, files, or queues.

Note: In some cases the Job tier can be completely missing, and in other cases one job script can start several batch job instances.

Page 18: Spring Batch overview

General Principles and Guidelines

● A batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind, using common building blocks when possible.

● Simplify as much as possible and avoid building complex logical structures in single batch applications.

● Process data as close to where the data physically resides as possible or vice versa (i.e., keep your data where your processing occurs).

● Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.

● Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, the following four common flaws need to be looked for:

Page 19: Spring Batch overview

General Principles and Guidelines

● Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.

● Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.

● Implement checksums for internal validation where possible. For example, flat files should have a trailer record telling the total of records in the file and an aggregate of the key fields.

● Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
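The trailer-record check mentioned above can be sketched as follows. This is a minimal illustration under assumed conventions that are not from the slides: each data line is `id,amount`, and the last line is a trailer of the form `TRAILER,<record count>,<sum of amounts>`. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical flat-file check; the layout (id,amount lines plus a TRAILER line) is assumed.
class TrailerCheckSketch {

    // Returns true when the trailer's record count and amount total match the data lines.
    static boolean isConsistent(List<String> lines) {
        if (lines.isEmpty()) {
            return false;                               // no trailer at all
        }
        String[] trailer = lines.get(lines.size() - 1).split(",");
        if (trailer.length != 3 || !trailer[0].equals("TRAILER")) {
            return false;                               // missing or malformed trailer
        }
        long expectedCount = Long.parseLong(trailer[1]);
        long expectedTotal = Long.parseLong(trailer[2]);

        long actualCount = 0;
        long actualTotal = 0;
        for (String line : lines.subList(0, lines.size() - 1)) {
            String[] fields = line.split(",");
            actualTotal += Long.parseLong(fields[1]);   // aggregate of the key field
            actualCount++;
        }
        return actualCount == expectedCount && actualTotal == expectedTotal;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("1,100", "2,250", "3,50", "TRAILER,3,400");
        System.out.println(isConsistent(file));
    }
}
```

A job would typically run a check like this before processing the file, so that a truncated transfer is rejected up front rather than discovered halfway through.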

Page 20: Spring Batch overview

Questions?

Page 21: Spring Batch overview

Reference

● http://docs.spring.io/spring-batch/batch-principles-gu

● http://docs.spring.io/spring-batch/faq.html

● http://docs.spring.io/spring-batch-core/index.html

● http://docs.spring.io/spring-batch-admin/reference/re

● http://spring.io/guides/gs/batch-processing/

● https://github.com/spring-projects/spring-batch

● https://github.com/spring-guides/gs-batch-processing

Page 22: Spring Batch overview

Spring Batch Admin

● Sub-project of Spring Batch

● Provides a web UI and RESTful interface to manage batch processes

http://static.springsource.org/spring-batch-admin/index.html

● Manager, Resources, Sample WAR

– Deployed with batch job(s) as a single app to be able to control and monitor jobs
– Or monitors external jobs only via a shared database

Page 23: Spring Batch overview

Home Page

Page 24: Spring Batch overview

Registered Jobs

Page 25: Spring Batch overview

Launching Jobs

Page 26: Spring Batch overview

Details for Job Execution

Page 27: Spring Batch overview

Page 28: Spring Batch overview

General Principles and Guidelines

● There are a great many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic). Clients are expected to create their own more specific strategies that can be plugged in to control things like commit intervals (CompletionPolicy), rules about how to deal with exceptions (ExceptionHandler), and many others.

● Generally you can expect anything at the top level of the source tree in packages org.springframework.batch.* to be public, but not necessarily sub-classable. Extending the concrete implementations of most strategies is discouraged in favour of a composition or forking approach. If your code can use only the interfaces from Spring Batch, that gives you the greatest possible portability.
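As an illustration of composition over subclassing, the sketch below wraps one strategy in another instead of extending a concrete class. The `CompletionRule` interface and both implementations are hypothetical, plain-Java stand-ins written for this example; they are not Spring Batch's `CompletionPolicy` API.

```java
// Hypothetical strategy types showing composition; not the Spring Batch API.
class CompositionSketch {

    // Decides whether the current chunk should be closed.
    interface CompletionRule {
        boolean isComplete(int itemsInChunk, long elapsedMillis);
    }

    // Completes a chunk after a fixed number of items.
    static final class CountRule implements CompletionRule {
        private final int maxItems;

        CountRule(int maxItems) {
            this.maxItems = maxItems;
        }

        public boolean isComplete(int itemsInChunk, long elapsedMillis) {
            return itemsInChunk >= maxItems;
        }
    }

    // Adds a time limit by delegating to another rule rather than subclassing it.
    static final class TimeLimitedRule implements CompletionRule {
        private final CompletionRule delegate;
        private final long maxMillis;

        TimeLimitedRule(CompletionRule delegate, long maxMillis) {
            this.delegate = delegate;
            this.maxMillis = maxMillis;
        }

        public boolean isComplete(int itemsInChunk, long elapsedMillis) {
            return elapsedMillis >= maxMillis || delegate.isComplete(itemsInChunk, elapsedMillis);
        }
    }

    public static void main(String[] args) {
        CompletionRule rule = new TimeLimitedRule(new CountRule(100), 5000);
        System.out.println(rule.isComplete(10, 1000));
        System.out.println(rule.isComplete(10, 6000));
    }
}
```

Because `TimeLimitedRule` depends only on the interface, it keeps working if the wrapped implementation changes, which is the portability argument made above.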

Page 29: Spring Batch overview

General Principles and Guidelines

● A specific implementation of the Step deals with the concern of breaking apart the business logic and sharing it efficiently between parallel processes or processors (see PartitionStep).

● There are a number of technologies that could play a role here. The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing.

● One implementation that we have had some experience with is a set of remote web services handling the business processing. We send a specific range of primary keys for the inputs to each of a number of remote calls.
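Splitting the input into primary-key ranges, one per remote call, might look like the sketch below. It assumes numeric, roughly contiguous keys; the `KeyRangeSketch` class and its `split` method are hypothetical and are not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that divides an inclusive key range into near-equal partitions.
class KeyRangeSketch {

    // Returns {from, to} pairs (both inclusive) covering minKey..maxKey without gaps or overlap.
    static List<long[]> split(long minKey, long maxKey, int partitions) {
        if (partitions < 1 || maxKey < minKey) {
            throw new IllegalArgumentException("need at least one partition and a non-empty range");
        }
        long total = maxKey - minKey + 1;
        long baseSize = total / partitions;
        long remainder = total % partitions;   // the first 'remainder' ranges get one extra key

        List<long[]> ranges = new ArrayList<>();
        long from = minKey;
        for (int i = 0; i < partitions && from <= maxKey; i++) {
            long size = baseSize + (i < remainder ? 1 : 0);
            long to = from + size - 1;
            ranges.add(new long[] { from, to });
            from = to + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : split(1, 10, 3)) {
            // each range would be sent to one remote call
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

If the keys are sparse, equal-width ranges can hold very different numbers of rows, so a real partitioner may prefer to pick boundaries from the data itself.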