1
Supporting Dynamic Migration in Tightly Coupled Grid Applications
Liang Chen, Qian Zhu, Gagan Agrawal
Computer Science & Engineering
The Ohio State University
2
Introduction - Motivation
– Grid resources vary frequently
– Tightly coupled applications in Grid
  • vs. bag of tasks
  • Pipelined applications and streaming applications
  • Features:
    – Dependencies
    – Long-running
    – Large volumes of data transfer between tasks (stages)
– Dynamically allocating new resources and migrating applications to them improves performance
3
Introduction - Challenges
– Checkpointing is a classic method to support dynamic migration
  • Take a snapshot of the system’s running state
  • Transmit it to a remote site
  • Restore the execution context and restart processes
– Pros of checkpointing
  • May be transparent to applications
– Cons of checkpointing
  • Platform dependent
  • Inefficient
4
Introduction - Our approach
– Typical processing structure of a data streaming application:
    ...
    while (true) {
        read_data_from_streams();
        process_data();
        accumulate_intermediate_results();
        reset_auxiliary_structures();
    }
    ...
– Our approach is based on the Light-weight Summary Structure (LSS)
  • A data structure that stores summary information is a light-weight summary structure
  • All other data structures are auxiliary structures
5
Introduction - Contribution
– Proposed the notion of LSS, which enables efficient process migration
– Implemented application migration using LSS in the GATES middleware
– Designed a dynamic resource allocation algorithm for pipeline processing on streaming data
– Demonstrated an architecture for resource monitoring and allocation
– Extensively evaluated the LSS implementation using 3 data stream applications
6
Middleware System Architecture
• Features of data streams
  – Data arrive continuously
  – Enormous volume; must be processed online
  – Need to be processed in real time
  – Data sources could be distributed
• Requirements for processing distributed data streams
  – A middleware running in Grid
  – Allocate Grid resources
  – Provide a self-adaptation function
7
Middleware System Architecture
– GATES (Grid-Based AdapTive Execution on Streams) middleware
  • Uses Globus Toolkit 3.0, built on OGSA
  • Allows users to specify their algorithms, implemented in Java
  • Takes care of plugging user-defined algorithms into the system and running them in Grid
  • Applications need to be broken down into a number of pipelined stages
8
Middleware System Architecture
[Figure: an application with Stage A, Stage B, and Stage C, shown with GATES services A, B, and C. Legend: GATES services; stages of an application; queues between Grid services; buffers for applications.]
9
System Architecture and Design (GATES API Functions)
    public class SecondStage implements StreamProcessing {
        …
        void work(buffer in, buffer out) {
            …
            while (true) {
                data = GATES.getFromInputBuffer(in);
                interResults = Processing(data);
                GATES.putToOutputBuffer(out, interResults);
            }
        }
    }
10
Roadmap
Introduction
  ‒ Motivation for tightly coupled applications in Grid
  ‒ Challenges and our approach
Middleware System Overview
  ‒ Introduce the system architecture and design
Implementing Dynamic Migration Using LSS
  ‒ Light-weight summary structure (LSS) and its example
  ‒ Advantages of utilizing LSS
  ‒ LSS implementation details
  ‒ Architecture of dynamic resource allocation scheme
Evaluation
  ‒ Three distributed data stream applications
  ‒ Memory usage of LSS
  ‒ Efficient migration by using LSS
  ‒ Processing accuracy and LSS migration
Related work
Conclusion
11
Light-weight Summary Structure (LSS) & its Example
• LSS is a data structure that stores summary information of processing
• All other data structures are auxiliary structures
• An application calculates the average value of all integer numbers in a stream
  – Two stages:
    • The first is the data source
    • The second calculates the sum and counts the number of integers; ave = sum/count
  – LSS would be the sum and the count
  – Auxiliary structures would be the loop index and other temporary variables
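The averaging example above can be sketched in plain Java. This is a minimal illustration, not GATES code: the class names `AverageLss` and `AverageStage` are hypothetical, and the stream is simulated with in-memory batches. It only shows which state belongs in the LSS (sum and count) and which state is auxiliary (the loop variable).

```java
import java.util.List;

/** Hypothetical LSS for the averaging example: the only state worth migrating. */
final class AverageLss {
    long sum;    // running sum of all integers seen so far
    long count;  // number of integers seen so far

    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }
}

/** Hypothetical second stage: folds each batch of the stream into the LSS. */
final class AverageStage {
    private final AverageLss lss = new AverageLss();

    void processBatch(List<Integer> batch) {
        // The loop variable is auxiliary state: it is discarded when the call returns.
        for (int value : batch) {
            lss.sum += value;
            lss.count++;
        }
    }

    AverageLss lss() {
        return lss;
    }
}

class AverageDemo {
    public static void main(String[] args) {
        AverageStage stage = new AverageStage();
        stage.processBatch(List.of(1, 2, 3));
        stage.processBatch(List.of(4, 5));
        System.out.println(stage.lss().average()); // prints 3.0
    }
}
```

Between batches, the two `long` fields are the whole processing state, which is why migrating them alone is enough to resume the computation elsewhere.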
12
Advantages of using LSS
• Efficient: only the LSS is migrated
  – Only “sum” and “count” migrate
• Does not affect the accuracy of processing
• Supports migration across heterogeneous platforms
  – “sum” and “count” are logical structures
• Reduces the effort application developers need to make an application capable of migration
13
An Example of LSS
• LSS can be used to support dynamic migration
  – GATES provides an API function to allocate memory for the LSS
  – An application stores summary information in the LSS
  – Only the LSS is transmitted to a new node, at the end of the loop
  – The LSS is restored at the new node
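The transmit-and-restore steps can be sketched as follows. This is an assumption-laden illustration rather than the GATES implementation: `SumCountLss`, `toBytes`, and `fromBytes` are hypothetical names, and the network transfer is replaced by a byte array handed from one object to another. The encoding uses `java.nio.ByteBuffer`, whose default byte order is big-endian regardless of the host platform.

```java
import java.nio.ByteBuffer;

/** Hypothetical LSS holding the sum and count from the averaging example. */
final class SumCountLss {
    final long sum;
    final long count;

    SumCountLss(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Encodes the LSS as 16 bytes: sum first, then count, both big-endian. */
    byte[] toBytes() {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(sum)
                .putLong(count)
                .array();
    }

    /** Rebuilds the LSS on the receiving side from the 16-byte encoding. */
    static SumCountLss fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long sum = buffer.getLong();
        long count = buffer.getLong();
        return new SumCountLss(sum, count);
    }
}

class LssTransferDemo {
    public static void main(String[] args) {
        SumCountLss atOldNode = new SumCountLss(15L, 5L);
        byte[] wire = atOldNode.toBytes();                      // what would be sent
        SumCountLss atNewNode = SumCountLss.fromBytes(wire);    // what the new node rebuilds
        System.out.println(atNewNode.sum + " / " + atNewNode.count);
    }
}
```

Because the payload is a fixed, explicitly ordered sequence of logical values rather than a memory snapshot, the same bytes can be decoded on a node with a different architecture.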
14
Application using LSS
    public class SecondStage implements StreamProcessing {
        …
        void work(buffer in, buffer out) {
            …
            // LSS = get an LSS from GATES
            while (true) {
                data = GATES.getFromInputBuffer(in);
                interResults = Processing(data);
                // Accumulate interResults into the LSS
                GATES.putToOutputBuffer(out, interResults);
                // Reset all auxiliary structures
                // Inform GATES that migration could be executed
            }
        }
    }
16
Architecture of dynamic resource allocation scheme
• Use an Information Service to collect resource information
• Apply the dynamic resource allocation algorithm
• Advise and assist GATES services in migrating
18
Experimental Evaluation
• Evaluation
  – Three applications
    • Counting sample
      – LSS stores intermediate top M frequently occurring numbers
    • Clustream, clustering data points in streams
      – LSS stores micro-clusters computed at the second stage
    • Dist-Freq-Counting, finding frequent itemsets in distributed streams
      – LSS stores unprocessed itemsets
27
Experimental Evaluation
• LSS migration does not impact processing accuracy
  – The counting sample application was used
  – Average accuracy of the processing results: 97.28% for the non-migration version and 97.51% for the migration version
29
Related Work
• Middleware for data stream processing
  – DataCutter, Stampede
  – Differences: run in a cluster, no self-adaptation, not specifically for real-time processing
• Continuous query systems
  – STREAM, dQUOB, TelegraphCQ, NiagaraCQ
  – Differences: centralized, no adaptation support
• Distributed continuous query systems
  – Aurora*, Medusa, Borealis
  – Differences: continuous queries, not in a Grid environment
• In-network aggregation in sensor networks
• Stream-based overlay networks
30
Conclusion
LSS enables efficient migration for distributed data stream applications
The main observations from our experiments
– Enables efficient process migration; the size of process state is reduced by a factor of 30 to 120
– Introduces a very small overhead
– Significantly improves the performance of long-running applications
– Our migration scheme does not impact the accuracy of the processing
32
Implementing Dynamic Migration Using LSS
Code running at the remote computing node:
    ...
    while (true) {
        ...
        // check if migration is needed
        if (GATES.ifMigrationNeeded()) {
            GATES.migrate(lss);
            break;
        }
    }
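The control flow of this loop can be exercised without a Grid deployment. The sketch below is a minimal stand-in under stated assumptions: `MigrationHooks` is a hypothetical interface that mirrors the two call names on the slide but is not the GATES API, the LSS is a two-element `long` array (sum, count), and the input stream is a finite in-memory queue so the loop terminates.

```java
import java.util.Queue;

/** Hypothetical stand-in for the two middleware calls named on the slide. */
interface MigrationHooks {
    boolean ifMigrationNeeded();
    void migrate(long[] lss);
}

final class SummingStage {
    /**
     * Consumes items until the input is empty or a migration is requested.
     * lss[0] holds the running sum and lss[1] the running count.
     * Returns the number of items consumed by this node.
     */
    static int run(Queue<Integer> input, long[] lss, MigrationHooks hooks) {
        int consumed = 0;
        while (!input.isEmpty()) {
            int value = input.remove();
            lss[0] += value;
            lss[1] += 1;
            consumed++;
            // Check only at the end of an iteration, when the LSS is up to date
            // and auxiliary state can be thrown away.
            if (hooks.ifMigrationNeeded()) {
                hooks.migrate(lss);
                break;
            }
        }
        return consumed;
    }
}
```

Checking at the end of the iteration means unconsumed items stay in the input queue, so the node that receives the LSS can continue from the next item without reprocessing or losing data.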