
Page 1: Balman stork cw09

Stork 1.0 and Beyond

Data Scheduling for Large-scale Collaborative Science

Mehmet Balman, Louisiana State University, Baton Rouge, LA, USA

Presented at Condor Week 2009 April 20-April 23, 2009

Page 2: Balman stork cw09

Scheduling Data Placement Jobs

• Data Placement Activities

• Modular Architecture

– Data Transfer Modules for specific protocols/services

• Throttle the maximum number of transfer operations running at once (sketched at the end of this slide)

• Keep a log of data placement activities

• Add fault tolerance to data transfers
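The throttling and logging behavior above can be pictured with a minimal sketch; this is not Stork's code, and the run_transfer_module() helper, the job fields, and the log file name are illustrative assumptions.

    # Minimal sketch of throttled scheduling of data placement jobs (not Stork's code;
    # the module invocation and log file name are illustrative assumptions).
    from concurrent.futures import ThreadPoolExecutor
    import logging, subprocess

    MAX_CONCURRENT_TRANSFERS = 4   # throttle: max transfer operations running at once
    logging.basicConfig(filename="data_placement.log", level=logging.INFO)

    def run_transfer_module(job):
        """Invoke a protocol-specific transfer module for one data placement job."""
        logging.info("start %s -> %s", job["src_url"], job["dest_url"])
        result = subprocess.run([job["module"], job["src_url"], job["dest_url"]])
        logging.info("done %s (exit %d)", job["src_url"], result.returncode)
        return result.returncode

    def schedule(jobs):
        # the pool size enforces the limit on simultaneous transfer operations
        with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TRANSFERS) as pool:
            return list(pool.map(run_transfer_module, jobs))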

Page 3: Balman stork cw09

Job Submission

[
  dap_type            = "transfer";
  src_url             = "file:///home/user/test/";
  dest_url            = "gsiftp://eric1.loni.org/scratch/user/";
  arguments           = "-p 4 -dbg -vb";
  verify_checksum     = true;
  verify_filesize     = true;
  set_permission      = "755";
  recursive_copy      = true;
  network_check       = true;
  checkpoint_transfer = true;
  output              = "user.out";
  err                 = "user.err";
  log                 = "userjob.log";
]

Page 4: Balman stork cw09

Agenda

• Error Detection and Error Classification

• Data Transfer Operations
  – Dynamic Tuning
  – Prediction Service
  – Job Aggregation

• Data Migration using Stork
  – Practical example in the PetaShare Project

• Future Directions

Page 5: Balman stork cw09

Failure-Awareness

• Dynamic environment: data transfers are prone to frequent failures
  – What went wrong during the data transfer?
  – No access to the remote resources
  – Messages get lost due to system malfunction

• Instead of waiting for a failure to happen:
  – Detect possible failures and malfunctioning services
  – Search for another data server
  – Use an alternate data transfer service

• Classify erroneous cases to make better decisions

Page 6: Balman stork cw09

Error Detection

• Use network exploration techniques (sketched below):
  – Check the availability of the remote service
  – Resolve the host and determine connectivity failures
  – Detect the available data transfer services
  – These checks should be fast and efficient so as not to burden system/network resources

• Error while a transfer is in progress? – Error_TRANSFER

• Retry or not?
  – When to re-initiate the transfer?
  – Use alternate options?
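As a rough illustration of these network exploration checks (a sketch, not Stork's implementation; the host and the GridFTP port 2811 are only examples):

    import socket

    def classify_connectivity(host, port, timeout=3.0):
        """Coarse pre-transfer check: resolve the host, then probe the service port."""
        try:
            addr = socket.gethostbyname(host)        # can the host be resolved at all?
        except socket.gaierror:
            return "ERROR_RESOLVE"                   # hostname / DNS failure
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return "SERVICE_AVAILABLE"           # remote transfer service reachable
        except socket.timeout:
            return "ERROR_TIMEOUT"                   # host known, service not answering in time
        except OSError:
            return "ERROR_CONNECT"                   # connectivity or service failure

    # e.g. classify_connectivity("eric1.loni.org", 2811)   # 2811 = default GridFTP port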

Page 7: Balman stork cw09

Error Classification

• Recover from failure:
  – Retry the failed operation
  – Postpone scheduling of failed operations

• Early error detection:
  – Initiate the transfer when the erroneous condition has recovered
  – Or use alternate options

• Data transfer protocols do not always return appropriate error codes

• Use the error messages generated by the data transfer protocol (a classification sketch follows)

• A better logging facility and classification
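One way to turn protocol error messages into retry/postpone decisions is a small pattern-based classifier; the patterns and categories below are illustrative assumptions, not Stork's actual rules.

    import re

    ERROR_CLASSES = [
        (r"connection (refused|reset)|no route to host", "TRANSIENT_NETWORK"),    # retry soon
        (r"timed? ?out",                                 "TRANSIENT_NETWORK"),
        (r"permission denied|credential|authentication", "PERMANENT_AUTH"),       # do not retry blindly
        (r"no such file|not a plain file",               "PERMANENT_SOURCE"),
        (r"disk quota|no space left",                    "POSTPONE_DESTINATION"), # postpone and reschedule
    ]

    def classify_error(message):
        msg = message.lower()
        for pattern, category in ERROR_CLASSES:
            if re.search(pattern, msg):
                return category
        return "UNKNOWN"          # keep in the log for later classification

    def should_retry(message):
        return classify_error(message) in ("TRANSIENT_NETWORK", "UNKNOWN")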

Page 8: Balman stork cw09

Error Reporting

Page 9: Balman stork cw09

Failure-Aware Scheduling

SCOOP data - Hurricane Gustav simulations

Hundreds of files (250 data transfer operations)

Small files (100 MB) and large files (1 GB, 2 GB)

Page 10: Balman stork cw09

New Transfer Modules

• Verify the successful completion of the operation by checking the checksum and file size.

• For GridFTP, the Stork transfer module can recover from a failed operation by restarting from the last transmitted file. In case of a retry after a failure, the scheduler informs the transfer module to recover and restart the transfer using the information in a rescue file created by the checkpoint-enabled transfer module (sketched below).

• Replacing Globus RFT (Reliable File Transfer)
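A minimal sketch of the checkpoint/rescue-file idea for a multi-file transfer follows; the rescue-file format and the transfer_one() callback are assumptions for illustration, not the actual module's format.

    import os

    RESCUE_FILE = "transfer.rescue"

    def load_completed():
        if not os.path.exists(RESCUE_FILE):
            return set()
        with open(RESCUE_FILE) as f:
            return set(line.strip() for line in f)

    def transfer_all(files, transfer_one):
        """Transfer files, skipping those already recorded in the rescue file."""
        completed = load_completed()
        with open(RESCUE_FILE, "a") as rescue:
            for path in files:
                if path in completed:
                    continue                   # transmitted before the failure; skip on restart
                transfer_one(path)             # protocol-specific transfer (e.g. GridFTP)
                rescue.write(path + "\n")      # checkpoint the last transmitted file
                rescue.flush()
        os.remove(RESCUE_FILE)                 # clean up once the whole operation succeeds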

Page 11: Balman stork cw09

Agenda

• Error Detection and Error Classification

• Data Transfer Operations
  – Dynamic Tuning
  – Prediction Service
  – Job Aggregation

• Data Migration using Stork
  – Practical example in the PetaShare Project

• Future Directions

Page 12: Balman stork cw09

Tuning Data Transfers

• Latency Wall
  – Buffer Size Optimization (a sizing sketch follows this list)
  – Parallel TCP Streams
  – Concurrent Transfers

• User-level, end-to-end tuning

Parallelism
• (1) the number of parallel data streams connected to a data transfer service, to increase the utilization of network bandwidth
• (2) the number of concurrent data transfer operations initiated at the same time, to better utilize system resources
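Buffer-size tuning is usually framed against the bandwidth-delay product (BDP). The arithmetic below uses the ~5 ms Ducky-to-Queenbee RTT cited later in the deck, but the 10 Gbps bandwidth and the 4 MB per-socket cap are assumptions for illustration only.

    def bdp_bytes(bandwidth_gbps, rtt_ms):
        """TCP buffer needed to keep one stream's pipe full: bandwidth x round-trip time."""
        return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)

    link_bdp = bdp_bytes(bandwidth_gbps=10, rtt_ms=5.129)   # ~6.4 MB for a 10 Gbps path
    max_tcp_buffer = 4 * 1024 * 1024                        # assumed per-socket buffer cap

    # If one socket's buffer cannot reach the BDP, parallel streams make up the difference:
    streams_needed = -(-link_bdp // max_tcp_buffer)         # ceiling division -> 2 streams here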

Page 13: Balman stork cw09

Parameter Estimation

• Come up with a good estimate of the parallelism level from:
  – Network statistics
  – Extra measurements
  – Historical data

• These might not reflect the best possible current settings (dynamic environment); a run-time probing sketch follows
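One way to cope with stale estimates, in the spirit of the dynamic tuning shown on the next slides, is to probe the parallelism level at run time. This is a sketch of that idea, not Stork's algorithm; measure_throughput(n) is a hypothetical helper that transfers a sample chunk with n streams and returns MB/s.

    def pick_stream_count(measure_throughput, max_streams=32, min_gain=0.05):
        """Keep doubling the stream count while measured throughput still improves."""
        best_n, best_tput = 1, measure_throughput(1)
        n = 2
        while n <= max_streams:
            tput = measure_throughput(n)
            if tput < best_tput * (1 + min_gain):   # diminishing returns: stop probing
                break
            best_n, best_tput = n, tput
            n *= 2                                  # probe 1, 2, 4, 8, ... streams
        return best_n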

Page 14: Balman stork cw09

Optimization Service

Page 15: Balman stork cw09

Dynamic Tuning

Page 16: Balman stork cw09

Average Throughput using Parallel Streams

Experiments in the LONI (www.loni.org) environment: transferring a file to Queenbee (QB) from a Linux machine

Page 17: Balman stork cw09

Dynamic Setting of Parallel Streams

Experiments in the LONI (www.loni.org) environment: transferring a file to Queenbee (QB) from an IBM machine

Page 18: Balman stork cw09

Dynamic Setting of Parallel Streams

Experiments in the LONI (www.loni.org) environment: transferring a file to Queenbee (QB) from a Linux machine

Page 19: Balman stork cw09

Job Aggregation

• Data placement jobs are combined and processed as a single transfer job.

• Information about the aggregated job is stored in the job queue and tied to a main job which actually performs the transfer operation, so that each member job can still be queried and reported separately.

• Hence, aggregation is transparent to the user.

• We have seen vast performance improvements, especially with small data files, simply by combining data placement jobs based on their source or destination addresses (a grouping sketch follows):
  – decreasing the amount of protocol usage
  – reducing the number of independent network connections
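A sketch of grouping data placement jobs by their endpoints; this is illustrative, not Stork's implementation. The job dictionaries mirror the submit-file fields shown earlier, and max_count stands in for the "max aggregation count" of the experiment on the next slide.

    from collections import defaultdict
    from urllib.parse import urlparse

    def aggregate(jobs, max_count=32):
        """Combine jobs that share source/destination hosts into single transfer jobs."""
        groups = defaultdict(list)
        for job in jobs:
            key = (urlparse(job["src_url"]).netloc, urlparse(job["dest_url"]).netloc)
            groups[key].append(job)

        combined = []
        for (src_host, dest_host), members in groups.items():
            # respect a maximum aggregation count per combined job
            for i in range(0, len(members), max_count):
                combined.append({"src_host": src_host, "dest_host": dest_host,
                                 "members": members[i:i + max_count]})  # still individually queryable
        return combined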

Page 20: Balman stork cw09

Job Aggregation

[Chart: total time (sec) vs. max aggregation count, comparing a single job at a time with 2, 4, 8, 16, and 32 parallel jobs]

Experiments on LONI (Louisiana Optical Network Initiative): 1024 transfer jobs from Ducky to Queenbee (avg RTT 5.129 ms), 5 MB data file per job

Page 21: Balman stork cw09

Agenda

• Error Detection and Error Classification

• Data Transfer Operations
  – Dynamic Tuning
  – Prediction Service
  – Job Aggregation

• Data Migration using Stork
  – Practical example in the PetaShare Project

• Future Directions

Page 22: Balman stork cw09

PetaShare

• Distributed Storage for Data Archive

• Global Namespace among distributed resources

• Client tools and interfaces:
  – Pcommands
  – Petashell
  – Petafs
  – Windows Browser
  – Web Portal

• Spans seven Louisiana research institutions

• Manages 300 TB of disk storage and 400 TB of tape

Page 23: Balman stork cw09

Broader Impact

Fast and Efficient Data Migration in PetaShare

Page 24: Balman stork cw09

Future Directions

Stork: Central Scheduling Framework

• Performance bottleneck
  – Hundreds of jobs submitted to a single batch scheduler, Stork

• Single point of failure

Page 25: Balman stork cw09

Future Directions

Distributed Data Scheduling

• Interaction between data schedulers
• Manage data activities with lightweight agents at each site
• Better parameter tuning and reordering of data placement jobs
  – Job delegation
  – Peer-to-peer data movement
  – Data and server striping
  – Make use of replicas for multi-source downloads

Page 26: Balman stork cw09

Questions?

Team:

Tevfik Kosar – kosar@cct.lsu.edu
Mehmet Balman – [email protected]
Yin – [email protected]
"Jacob" Cheng – [email protected]

www.petashare.org | www.cybertools.loni.org | www.storkproject.org | www.cct.lsu.edu