1 workload analysis of globus gridftp nicolas kourtellis joint work with:lydia prieto, gustavo...


Post on 18-Jan-2018


Page 1

Workload Analysis of Globus’ GridFTP

Nicolas Kourtellis

Joint Work with: Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser

University of South Florida & Argonne National Laboratory

Page 2

Metrics Project Dataset

• Start with ~137.5 million records (Jul’05 - Mar’07)
• ~22.8 million records with size ≤ 0
• ~1,000 records with buffer size < 0
• ~3.9 million records for directory listings
• ~4,600 records with invalid hostnames/IPs (e.g. /[B@89712e)
• ~11.4 million records from identified ANL-TeraGrid test runs
• ~16.8 million records: identified duplicate reports
• ~5.75 million records: self transfers (same hostname for source/destination)
• => In the end: ~77.2 million records, or ~56.2% of the original, remain!
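The per-record filters in this list can be sketched as a single function. This is only an illustration: the `Record` fields, the hostname pattern, and the order of the checks are assumptions, since the slides do not show the actual Metrics Project schema. The ANL-TeraGrid test filter and duplicate detection are left out because they need information beyond a single record.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names; the real schema is not shown in the slides.
    source_host: str
    dest_host: str
    num_bytes: int
    buffer_size: int
    is_directory_listing: bool


# Assumption: a usable hostname looks like a DNS name or a dotted IPv4 address.
_HOST_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$")


def rejection_reason(rec):
    """Return why a record would be dropped, or None if it is kept."""
    if rec.num_bytes <= 0:
        return "size<=0"
    if rec.buffer_size < 0:
        return "buffer<0"
    if rec.is_directory_listing:
        return "directory listing"
    if not (_HOST_RE.match(rec.source_host) and _HOST_RE.match(rec.dest_host)):
        return "invalid host"
    if rec.source_host == rec.dest_host:
        return "self transfer"
    return None
```

Note that a buffer size of exactly 0 is kept here, because the list above only excludes negative buffer sizes.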

Page 3

Metrics Project Dataset (Cont.)

• Server-to-server transfers => duplicate report of the same transfer
- Criteria to identify duplicates:
1) Window of 5 records
2) Complementary stor_or_retr code (0 or 1)
3) Same number of bytes, buffer size, and block size
4) Transfer time (end - start) within 1 sec difference
5) Start (or end) time within 60 sec difference
6) For more than one matching record, pick the one with the smallest difference in transfer time.
=> ~16.8 million server-to-server transfers

• ~5.75 million records: self transfers (same hostname for source/destination)
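The six duplicate criteria could be applied along the following lines. This is a sketch under stated assumptions, not the authors' code: the `Report` fields other than `stor_or_retr` are hypothetical names, the input is assumed to be sorted by time, and "window of 5 records" is read as "the next 5 records after the current one".

```python
from dataclasses import dataclass


@dataclass
class Report:
    stor_or_retr: int  # 0 or 1, as on the slide
    num_bytes: int     # hypothetical field names from here on
    buffer_size: int
    block_size: int
    start: float       # seconds
    end: float         # seconds


def is_candidate(a, b):
    """Criteria 2 to 5: complementary code, equal sizes, close timings."""
    return (
        a.stor_or_retr + b.stor_or_retr == 1
        and (a.num_bytes, a.buffer_size, a.block_size)
        == (b.num_bytes, b.buffer_size, b.block_size)
        and abs((a.end - a.start) - (b.end - b.start)) <= 1.0
        and (abs(a.start - b.start) <= 60.0 or abs(a.end - b.end) <= 60.0)
    )


def find_duplicates(reports, window=5):
    """Return (i, j) index pairs judged to be two reports of one transfer."""
    used = set()
    pairs = []
    for i, a in enumerate(reports):
        if i in used:
            continue
        best_j, best_diff = None, None
        # Criterion 1: only look at the next `window` records.
        for j in range(i + 1, min(i + 1 + window, len(reports))):
            b = reports[j]
            if j in used or not is_candidate(a, b):
                continue
            # Criterion 6: prefer the smallest difference in transfer time.
            diff = abs((a.end - a.start) - (b.end - b.start))
            if best_diff is None or diff < best_diff:
                best_j, best_diff = j, diff
        if best_j is not None:
            used.update((i, best_j))
            pairs.append((i, best_j))
    return pairs
```

Each record is matched at most once, so one of the two reports in every pair can then be discarded.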

Page 4

Results (1): Transfer Size Distribution

Notes:
1) 1st Peak: 16MB - 32MB, ~13 million records.
2) 2nd Peak: 512B - 1KB (low transfer size), ~7.4 million records.
3) 3rd Peak: 0-2B, ~5.2 million records.
4) Maximum bucket: 8TB-16TB, 45 records.
5) GB region buckets: ~255,000 records.

[Figure: histogram of Transfer size (power-of-two buckets labeled in Bytes, KBytes, MBytes) vs. % of total; series: Valid Transfers, ANL Testings, Self Testings, Bogus IP Transfers]
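The bucket names in the notes (0-2B, 512B - 1KB, 16MB - 32MB) suggest power-of-two buckets. A minimal sketch of such a histogram follows; the function names are hypothetical, and placing sizes 0 and 1 together in the first bucket is an assumption about how the "0-2B" bucket was formed.

```python
from collections import Counter


def bucket_lower_bound(size_bytes):
    """Lower bound of the bucket [2^k, 2^(k+1)) that contains size_bytes.

    Assumption: sizes 0 and 1 share the first bucket, with lower bound 0.
    """
    if size_bytes < 2:
        return 0
    return 1 << (size_bytes.bit_length() - 1)


def histogram_percent(sizes):
    """Map each bucket's lower bound to its share of all records, in %."""
    counts = Counter(bucket_lower_bound(s) for s in sizes)
    total = len(sizes)
    return {lo: 100.0 * n / total for lo, n in sorted(counts.items())}
```

For example, a 20 MB transfer lands in the bucket whose lower bound is 16 MB, which corresponds to the "16MB - 32MB" peak.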

Page 5

Results (2): Buffer Size Distribution

Notes:
1) 60% of the records in the original table: buffer size of 0B.
2) Most commonly used: 16 – 128KB.
3) Maximum bucket: 1-2GB, 92 records.

[Figure: histogram of Buffer Size (power-of-two buckets labeled in KBytes, MBytes) vs. % of total]

Page 6

Results (3): Average Bandwidth Distribution

Notes:
1) Peak: 128-256Mbps, ~7.7 million records.
2) Most common: 4Mbps - 1Gbps of average bandwidth (58%).

[Figure: histogram of Average Bandwidth (power-of-two buckets labeled in bps, Kbps, Mbps, Gbps) vs. % of total]
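The slides do not spell out how average bandwidth was derived. One plausible reading, given that each record carries a byte count and start/end times, is bits transferred divided by transfer time. The sketch below assumes exactly that; the function name and the handling of non-positive durations are illustrative choices.

```python
def average_bandwidth_bps(num_bytes, start, end):
    """Average bandwidth in bits per second, or None if the duration is unusable.

    Assumption: bandwidth = (bytes * 8) / (end - start), with times in seconds.
    """
    duration = end - start
    if duration <= 0:
        # Inconsistent or zero-length time fields cannot yield a bandwidth.
        return None
    return num_bytes * 8 / duration
```

A 16 MB transfer that takes one second comes out at roughly 134 million bits per second under this definition.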

Page 7

Results (4): Number of Streams Distribution

Notes:
1) ~70% of the transfers used 1 stream!!
2) Only ~20% of the transfers used 4 streams (the number suggested by ANL’s website).
3) In total, 10% of the user base used other numbers of streams.
4) Maximum of the CDF: 1010 streams(!!)

Page 8

Results (5): Average Bandwidth VS Number of Streams

Notes:
1) Only transfer sizes ≥1GB are considered.
2) Bandwidth increases by a factor of 2 to 3 for more than 10 streams.
3) Bandwidth ceiling at 800 - 900Mbps after ~32 streams => Gbps infrastructure.

[Figure: Average Bandwidth / Mbps (0 to 1000) vs. Number of Streams (log10 scale, 1 to 1000)]

Page 9

Results (6): Number of Stripes Distribution

• Summary of results:

• 1 Stripe: 99.5% of the transfers!
• 2-31 Stripes: 0.5% of the transfers!

Page 10

Results (7): User and organization evolution over time

Notes:
1) The user and organization populations keep increasing over time.
2) Forecasts: 43 new IPs and 14 new organizations (domains) per month.

Page 11

Results (8): Geographical Characterization

Notes:
1) USA: 78.4% or ~50.8 million transfers, and 82.9% or ~1.3 PB in volume.
2) Some activity from Canada, Taiwan, Japan and Spain (~14 million transfers and 346TB in volume).

Page 12

Results (9): Server to Server Transfers (a)

Notes:
1) ~257,000 transfers per month.
2) Growth rate of ~27,000 transfers per month.

[Figure legend: dashed line = # of Transfers; solid line = Volume; long-dashed line = Linear fitting]
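A monthly growth rate like the one quoted above is what the slope of a linear fit over monthly counts gives. The sketch below is an ordinary least-squares fit over evenly spaced months; it is not necessarily the fitting procedure used for the slide, and the function name is hypothetical.

```python
def linear_fit(ys):
    """Least-squares line y = slope * x + intercept for x = 0, 1, 2, ...

    `ys` holds one value per month, oldest first.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With invented counts such as `[100, 130, 160, 190]`, the slope is 30 per month and the intercept is 100; the slope plays the role of the per-month growth rate.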

Page 13

Results (10): Server to Server Transfers (b)

Notes:
1) Inter-Domain transfers are a small % of the transfers but a high % of the volume.
2) The opposite holds for Intra-Domain (InterIP) transfers.
3) High reporting of Self Transfers (more than 1/3 of the transfers).

Category        % # Transfers   % Volume
InterDomain     21.7%           72.2%
InterIP         39.5%           19.7%
SelfTransfers   38.8%           8.2%
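The three categories can be assigned from the two endpoint hostnames. The sketch below assumes that a "domain" is the last two labels of a hostname, which is a simplification: it does not handle bare IP addresses or multi-part suffixes such as `.ac.uk`, and the slides do not say how domains were actually derived.

```python
def domain_of(host):
    """Assumption: the domain is the last two dot-separated labels."""
    return ".".join(host.lower().split(".")[-2:])


def classify(source_host, dest_host):
    """Label a server-to-server transfer with one of the slide's categories."""
    if source_host.lower() == dest_host.lower():
        return "SelfTransfers"
    if domain_of(source_host) == domain_of(dest_host):
        return "InterIP"  # different hosts inside the same domain
    return "InterDomain"
```

Counting these labels over all transfers, once by number of records and once weighted by bytes, yields the two percentage columns.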

Page 14

Results (11): Year Round Comparison for the Number of transfers (per month)

Comment: The ratio decreases over time, suggesting that the # of transfers is stabilizing.

[Figure: bar chart "Year - Round Comparison GridFTP - Total Number of Transfers"; x-axis: Month - Year (08-05/08-06, 09-05/09-06, 10-05/10-06, 11-05/11-06, 12-05/12-06, 01-06/01-07, 02-06/02-07, 03-06/03-07); y-axis: Ratio (0 to 15); bar labels as extracted: 7.9, 8.7, 8.1, 5.2, 3.8, 2.4, 1.2, 14.3]
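The year-round ratio pairs each month with the same month one year earlier (e.g. 08-05/08-06). A minimal sketch, assuming the ratio is the later year's monthly total divided by the earlier year's; the slides do not state the direction of the division, and the function name is hypothetical.

```python
def year_round_ratios(monthly_totals):
    """Ratio of each month's total to the same month one year earlier.

    `monthly_totals` maps (year, month) to a count or a volume.
    Months without a usable earlier counterpart are skipped.
    """
    ratios = {}
    for (year, month), total in sorted(monthly_totals.items()):
        previous = monthly_totals.get((year - 1, month))
        if previous:  # skips missing and zero totals
            ratios[(year, month)] = total / previous
    return ratios
```

The same function applies to transfer counts and to transferred volume, since only the input totals differ.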

Page 15

Results (12): Year Round Comparison for the Volume of transfers (per month)

Comment: The ratio decreases over time, suggesting that the volume of transfers is stabilizing. There is an unexplained (?) dramatic increase in Dec ’06!

[Figure: bar chart "Year - Round Comparison GridFTP - Total Volume"; x-axis: Month - Year (08-05/08-06, 09-05/09-06, 10-05/10-06, 11-05/11-06, 12-05/12-06, 01-06/01-07, 02-06/02-07, 03-06/03-07); y-axis: Ratio (0 to 50); bar labels as extracted: 1.8, 1.9, 9.6, 11.9, 15.0, 5.7, 3.4, 49.3]

Page 16

DISCUSSION

Open Questions:

1) How can the functionalities of GridFTP be explored more for:
a) Better performance (e.g. speedup of transfers via streams, stripes, etc.)?
b) Better utilization of resources (e.g. bandwidth, storage, etc.)?
=> Tutorials, suggestions, and solutions/tools/applications for users.

2) How can the system evolution and usage analysis be useful for:
a) Prediction and provisioning of the Globus Grid's resources.
b) Designing new benchmarks for the evaluation of Grids’ resources, like data transfer components, and for more realistic simulations.
c) Why aren’t the big players of GridFTP (CERN etc.) reported? (Version or component?)
d) How much does the version of the component affect the results?
e) Bottom line: are these data logs representative of the GridFTP population?
f) If not, then what component would have representative logs, even with limited logging?

3) How can we improve the Usage Statistics (Metrics) Collection system?
a) Efficient for the vast number of reports in the future (e.g. daily summaries?)
b) Robust (to attacks, bogus data, etc.)
c) Apply reporting of bugs in the user system (in the form of live feedback)?
d) Add more details to the reports (like source AND destination(?))
e) Eliminate bugs in the reporting:
i) Duplication
ii) FTP response code (in conjunction with (c))
iii) Zero & negative buffer size values (cross-check the values used against the values allowed by the system)
iv) Time fields sometimes inconsistent
v) Use of the IP field; indexes on the DB to speed up the analysis
vi) Changes in the schema of the DB (monthly differences etc.)