a dynamic data grid replication strategy to minimize the data missed ming lei, susan vrbsky, xiaoyan...

A Dynamic Data Grid Replication Strategy to Minimize the Data Missed

Ming Lei, Susan Vrbsky, Xiaoyan Hong

University of Alabama

Agenda

Background & Previous Work

Motivation

System Models

Results

Conclusion

Future Work

Background:

Large-scale, geographically distributed systems are becoming more and more popular.

Data replication is the most common solution for improving file access time.

The dynamic behavior of Grid users makes it difficult to decide how to replicate data so as to meet the system availability goal.

Previous work:

Several replica schemes compared for saving access latency and bandwidth, assuming unlimited storage [Ranganathan et al. 2002]

HotZone algorithm to minimize client-to-replica latency [Szymaniak et al. 2005]

HBR: a dynamic replication strategy that reduces data access time by avoiding network congestion [Park et al. 2003]

Motivation: As bandwidth and computing capacity have become relatively cheap, data access latency can drop dramatically.

System reliability and availability become the focus.

Any data file access failure can lead to an incorrect result or a job crash.

Users can tolerate a small delay, but not system unreliability.

Motivation:

Replicate data to maximize system data availability

Assume limited storage resources

Do not sacrifice data access latency

Architecture:

System Model: Note that system-level data availability is more important than an individual file's availability. Two new measurements are proposed:

System File Missing Rate (SFMR) = (number of files potentially unavailable) / (number of all the files requested by all the jobs)

System Bytes Missing Rate (SBMR) = (number of bytes potentially unavailable) / (total number of bytes requested by all jobs)

System Model: Given a set of jobs J = (j1, j2, j3, …, jN), each job accesses one file set F = (f1, f2, …, fk). Each file must be stored at a Storage Element (SE), so file availability depends on SE availability. For any file fi, its availability is:

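$$P_i = 1 - \prod_{j=1}^{k} \left(1 - p_{se_j}\right)$$

where se_1, ..., se_k are the storage elements holding a copy of fi and p_{se_j} is the availability of se_j.

For n jobs, where job i accesses k_i files and P_j and S_j are the availability and size of the j-th of those files:

1. SFMR:

$$\mathrm{SFMR} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{k_i} (1 - P_j)}{\sum_{i=1}^{n} k_i}$$

2. SBMR:

$$\mathrm{SBMR} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{k_i} (1 - P_j)\, S_j}{\sum_{i=1}^{n} \sum_{j=1}^{k_i} S_j}$$

Job requests can be converted to a series of file access operations.

System Model:

$$\mathrm{SFMR} = \frac{\sum_{o_i \in O} (1 - P_i)}{|O|}$$

$$\mathrm{SBMR} = \frac{\sum_{o_i \in O} (1 - P_i)\, S_i}{\sum_{o_i \in O} S_i}$$

The set O is the file access set; operation o_i accesses a file with availability P_i and size S_i. We assume the total storage limit of the whole grid system is S, so we have:

$$\sum_{i=1}^{m} C_i\, S_i \le S$$

where the sum runs over the m files, Ci denotes the number of copies of fi, Si is the size of fi, and S is the total storage available.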
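The availability and missing-rate definitions above can be checked numerically. The following is a minimal Python sketch, assuming SEs fail independently and representing the access set O as a list of file names (one entry per access operation). The function names and data layout are illustrative assumptions, not taken from the original work.

```python
from math import prod


def file_availability(se_avail):
    """P_i = 1 - prod_j (1 - p_se_j) over the SEs holding a copy of the file.

    Assumes SEs fail independently; an empty list (no copies) gives 0.0.
    """
    return 1.0 - prod(1.0 - p for p in se_avail)


def sfmr(ops, avail):
    """SFMR = sum over o_i in O of (1 - P_i), divided by |O|.

    ops: file names, one entry per access operation (repeats allowed).
    avail: mapping from file name to P_i.
    """
    return sum(1.0 - avail[f] for f in ops) / len(ops)


def sbmr(ops, avail, size):
    """SBMR = sum of (1 - P_i) * S_i over O, divided by the sum of S_i over O."""
    missed = sum((1.0 - avail[f]) * size[f] for f in ops)
    requested = sum(size[f] for f in ops)
    return missed / requested


def within_storage(copies, size, total):
    """Storage constraint: sum over files of C_i * S_i <= S."""
    return sum(copies[f] * size[f] for f in copies) <= total


# File "a" has copies on SEs with availability 0.9 and 0.8; "b" sits on one SE at 0.5.
avail = {"a": file_availability([0.9, 0.8]), "b": file_availability([0.5])}
ops = ["a", "a", "b"]
print(round(avail["a"], 4))                              # 0.98
print(round(sfmr(ops, avail), 4))                        # 0.18
print(round(sbmr(ops, avail, {"a": 1.0, "b": 1.0}), 4))  # 0.18 (equal sizes: SBMR = SFMR)
print(round(sbmr(ops, avail, {"a": 1.0, "b": 3.0}), 4))  # 0.308
```

With equal file sizes the two rates coincide; once the less available file is also the larger one, SBMR rises above SFMR.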

For each file access operation ri at instant T, we associate an important variable Vi, which is set to the number of times the file will be accessed in the future.

There are four ways to estimate Vi:

1. No Prediction: Vi = 1 at all times.

2. Bio Prediction: Vi is predicted from the file access history using a binomial distribution.

3. Zipf Prediction: Vi is predicted from the file access history using a Zipf distribution.

4. Queue Prediction: the current job queue is used to predict Vi. If the queue is empty, Queue Prediction works the same as No Prediction.
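Of the four predictors, No Prediction and Queue Prediction are simple enough to sketch directly. The Python below is a minimal sketch, assuming each queued job is represented as a set of file names and that Queue Prediction counts the queued jobs that will request the file. That counting rule is an assumption, since the slides only say the queue is used. Bio and Zipf Prediction are omitted because their fitting details are not given here.

```python
def no_prediction(file_name, job_queue):
    """No Prediction: Vi = 1 at any time."""
    return 1


def queue_prediction(file_name, job_queue):
    """Queue Prediction (assumed rule): count queued jobs whose file set includes file_name.

    job_queue: list of jobs, each a set of file names (hypothetical representation).
    Falls back to No Prediction when the queue is empty.
    """
    if not job_queue:
        return no_prediction(file_name, job_queue)
    return sum(1 for job in job_queue if file_name in job)


queue = [{"f1", "f2"}, {"f1"}, {"f3"}]
print(queue_prediction("f1", queue))  # 2
print(queue_prediction("f4", queue))  # 0
print(queue_prediction("f1", []))     # 1
```

Under this rule a file that no queued job needs gets Vi = 0, which makes it an early eviction candidate once Vi feeds into the weight Wi.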

System Model:

To achieve the optimal (lowest) SFMR and SBMR, we have to maximize the following values:

$$\sum_{i} P_i\, V_i \quad \text{and} \quad \sum_{i} P_i\, V_i\, S_i$$

If all files have the same size, SFMR = SBMR.

To better describe our scheme and algorithm, we introduce a weight value:

$$W_i = \frac{P_i\, V_i}{C_i\, S_i}$$

Algorithm: MinDmr Optimizer():

1. If the requested file fi exists in the site, then continue.

2. If the requested file fi does not exist in the site and the site has enough free space, then retrieve fi from a remote site and store it.

3. If the requested file fi does not exist in the site and the site does not have enough free space, then:

sort the files in the current SE by file weight Wi in ascending order;

fetch files from the sorted list in order and add them to the candidate list until the accumulated size of the candidate files is greater than or equal to the size of the requested file.

4. Replicate fi if the value gained by replicating it is greater than the accumulated value lost by deleting the candidate files fj from the SE:

$$\Delta P_i \cdot V_i > \sum_{j \in \mathrm{Candidates}} \Delta P_j \cdot V_j$$
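The four steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. FileInfo, min_dmr, and the representation of a file's other replicas as a list of SE availabilities are hypothetical. ΔP is computed as the change in P = 1 - ∏(1 - p_se) when the local SE gains or loses a copy, assuming independent SE failures. Step 3 compares only the candidates' total size with the requested file's size, as the slide states.

```python
from dataclasses import dataclass, field
from math import prod


def availability(se_avail):
    """P = 1 - prod(1 - p_se) over SEs holding a copy (independent failures assumed)."""
    return 1.0 - prod(1.0 - p for p in se_avail)


@dataclass
class FileInfo:  # hypothetical record for one file
    name: str
    size: float   # S_i
    value: float  # V_i, predicted future accesses
    remote: list = field(default_factory=list)  # availabilities of *other* SEs with a copy


def delta_p(f, local_p):
    """Delta P: availability change from holding a copy on the local SE."""
    return availability(f.remote + [local_p]) - availability(f.remote)


def weight(f, local_p):
    """W_i = P_i * V_i / (C_i * S_i) for a file that has a copy on the local SE."""
    p = availability(f.remote + [local_p])
    return p * f.value / ((len(f.remote) + 1) * f.size)


def min_dmr(requested, local_files, free_space, local_p):
    """Return (replicate?, names of files to evict) following steps 1-4."""
    # Step 1: already stored locally, so there is nothing to replicate.
    if any(f.name == requested.name for f in local_files):
        return False, []
    # Step 2: enough free space, so fetch and store the file.
    if free_space >= requested.size:
        return True, []
    # Step 3: take the lowest-weight files until they cover the requested size.
    candidates, freed = [], 0.0
    for f in sorted(local_files, key=lambda g: weight(g, local_p)):
        if freed >= requested.size:
            break
        candidates.append(f)
        freed += f.size
    if freed < requested.size:
        return False, []  # evicting everything would still not make room
    # Step 4: replicate only if the value gained exceeds the value lost.
    gained = delta_p(requested, local_p) * requested.value
    lost = sum(delta_p(f, local_p) * f.value for f in candidates)
    if gained > lost:
        return True, [f.name for f in candidates]
    return False, []


req = FileInfo("f1", 1.0, 5.0, [0.5])
f2 = FileInfo("f2", 1.0, 1.0, [0.8])  # cold: low V, already replicated elsewhere
f3 = FileInfo("f3", 1.0, 10.0, [])    # hot: high V, only copy is local
print(min_dmr(req, [f2, f3], 0.0, 0.9))  # (True, ['f2'])
```

Sorting by Wi in ascending order evicts files with little availability-weighted demand per stored byte first. In the example, the cold file f2, which also has a remote copy, is chosen over the hot, single-copy file f3.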

Simulation Setting

OptorSim: developed by the EU DataGrid Project to test dynamic replica schemes. Eco optimizer: an economic model in which a file is replicated if doing so maximizes the profit of the SE.

Simulation Configuration:

File set size: 200; Job set size: 10000

Files per job: 3 to 20; File size: 1 GB

Network Topology Setting:

Results - System File Missing Rate

[Figure: SFMR with varying replica optimizers. y-axis: SFMR; x-axis: replica schemes; access patterns: Sequential, Random, RandomWalkGaussian, Random Zipf.]

Results - Total Job Time

[Figure: The total job time with sequential access. y-axis: job time (in secs); x-axis: replica schemes.]

File Missing Rate

[Figure: SFMR with varying job schedulers. y-axis: SFMR; x-axis: replica schemes; schedulers: Random, Shortest Queue, Access Cost, Queue Access Cost.]

Results – System File Missing Rate

[Figure: SFMR with varying job queue length. y-axis: SFMR (×0.00001); x-axis: job queue length (4, 8, 16, 32, 64, 128, 256); access patterns: SEQ, Zipf.]

[Figure: Total job time with varying job queue length. y-axis: total job time; x-axis: job queue length (4 to 256); access patterns: SEQ, Zipf.]

Results – File Missing Rate

[Figure: SFMR vs. file space (200 to 600). Schemes: LFU, EcoBio, EcoZipf, BioMinDmr, ZipfMinDmr, MinDmrNoPrediction, MinDmrQueuePred.]

Missing Rate Gap (SBMR-SFMR)

[Figure: Missing rate gap (SBMR-SFMR) per replica scheme. y-axis: missing rate (×0.00001). Bar values, in the order shown: 1.54, 50.47, 13.14, 0.13, 24.13, 0.24, 0.17.]

SFMR with sequential access pattern

Conclusion

Proposed two data availability metrics (SFMR and SBMR) to evaluate the reliability of system data in a Data Grid system.

Discussed how we model the system availability problem.

Developed four prediction-based replica optimizers under the assumption that Grid storage space is limited.

Presented our greedy replica algorithm, which treats hot and cold data files differently and uses a weighting factor for the replacement scheme.

Simulation results indicate that our new strategies outperform the other schemes tested, overall, in terms of data availability.

Future Work:

When file sizes are not uniform, enhance our scheme to differentiate between the system file missing rate and the system bytes missing rate.

Work on new measurements to evaluate the job missing rate.

Design new schemes and prediction functions to minimize the new measurements.
