![Page 1: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/1.jpg)
Workload Analysis of Globus’ GridFTP
Nicolas Kourtellis
Joint work with: Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser
University of South Florida & Argonne National Laboratory
![Page 2: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/2.jpg)
Metrics Project Dataset
• Start with ~137.5 million records (Jul’05 - Mar’07)
• ~22.8 million records with size ≤ 0
• ~1,000 records with buffer size < 0
• ~3.9 million records for directory listing
• ~4,600 records with invalid hostnames/IPs (e.g. /[B@89712e)
• ~11.4 million records from identified ANL-TeraGrid testing
• ~16.8 million records: identified duplicate reports
• ~5.75 million records: self transfers (same hostname for source/destination)
• => In the end: ~77.2 million records remain, or ~56.2% of the original!
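The per-record cleaning rules listed above can be sketched in Python as follows. This is only an illustration: the `Record` fields and the hostname check are hypothetical, since the slides do not show the actual schema or validation used.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names; the real log schema is not given in the slides.
    num_bytes: int
    buffer_size: int
    is_dir_listing: bool
    hostname: str


def is_valid_hostname(name: str) -> bool:
    # Assumption: a plausible hostname/IP uses only letters, digits, dots, hyphens.
    # A value such as "/[B@89712e" fails this check.
    return re.fullmatch(r"[A-Za-z0-9.-]+", name) is not None


def keep(record: Record) -> bool:
    """Return True if the record survives the per-record cleaning rules."""
    return (
        record.num_bytes > 0            # drop size <= 0
        and record.buffer_size >= 0     # drop negative buffer sizes (0 is kept)
        and not record.is_dir_listing   # drop directory listings
        and is_valid_hostname(record.hostname)
    )
```

Removing test traffic, duplicate reports, and self transfers needs information from more than one record, so those steps are not part of this filter.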
![Page 3: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/3.jpg)
Metrics Project Dataset (Cont.)
• Server-to-server transfers => duplicate reports of the same transfer
- Criteria to identify duplicates:
1) Window of 5 records
2) Complementary stor_or_retr code (0 or 1)
3) Same number of bytes, buffer size, and block size
4) Transfer time (end - start) within 1 sec difference
5) Start (or end) time within 60 sec difference
6) When more than one record matches, pick the one with the smallest difference in transfer time.
=> 16.8 million server-to-server transfers
• ~5.75 million records: self transfers (same hostname for source/destination)
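A minimal Python sketch of the duplicate-matching criteria is given below. It assumes that "window of 5 records" means the next five records in log order and that each record is paired at most once. The `Report` field names other than `stor_or_retr` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Report:
    stor_or_retr: int   # 0 or 1
    num_bytes: int
    buffer_size: int
    block_size: int
    start: float        # seconds
    end: float          # seconds


WINDOW = 5               # criterion 1 (assumed: the next 5 records)
MAX_DURATION_DIFF = 1.0  # criterion 4, seconds
MAX_TIME_DIFF = 60.0     # criterion 5, seconds


def duration(r: Report) -> float:
    return r.end - r.start


def is_candidate(a: Report, b: Report) -> bool:
    return (
        {a.stor_or_retr, b.stor_or_retr} == {0, 1}                    # criterion 2
        and (a.num_bytes, a.buffer_size, a.block_size)
        == (b.num_bytes, b.buffer_size, b.block_size)                 # criterion 3
        and abs(duration(a) - duration(b)) <= MAX_DURATION_DIFF       # criterion 4
        and (abs(a.start - b.start) <= MAX_TIME_DIFF
             or abs(a.end - b.end) <= MAX_TIME_DIFF)                  # criterion 5
    )


def find_duplicate_pairs(reports: List[Report]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) judged to be two reports of one transfer."""
    pairs, used = [], set()
    for i, a in enumerate(reports):
        if i in used:
            continue
        best, best_diff = None, None
        for j in range(i + 1, min(i + 1 + WINDOW, len(reports))):
            if j in used or not is_candidate(a, reports[j]):
                continue
            diff = abs(duration(a) - duration(reports[j]))
            if best is None or diff < best_diff:   # criterion 6
                best, best_diff = j, diff
        if best is not None:
            pairs.append((i, best))
            used.update((i, best))
    return pairs
```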
![Page 4: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/4.jpg)
Results (1): Transfer Size Distribution
Notes:
1) 1st peak: 16MB - 32MB, ~13 million records.
2) 2nd peak: 512B - 1KB (small transfers), ~7.4 million records.
3) 3rd peak: 0 - 2B, ~5.2 million records.
4) Maximum bucket: 8TB - 16TB, 45 records.
5) GB-region buckets: ~255,000 records.
[Figure: transfer size distribution. X-axis: transfer size in power-of-two buckets (Bytes, KBytes, MBytes); y-axis: % of total (0 to 10). Series: Valid Transfers, ANL Testings, Self Testings, Bogus IP Transfers.]
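The power-of-two buckets used in this histogram can be reproduced with a short Python sketch. It assumes half-open buckets [low, high), so that exactly 1KB falls into the 1KB-2KB bucket; the slides do not say how boundaries were handled.

```python
from collections import Counter
from typing import Iterable, Tuple

UNITS = ["B", "KB", "MB", "GB", "TB"]


def bucket_bounds(size_bytes: int) -> Tuple[int, int]:
    """Return the [low, high) power-of-two bucket containing size_bytes."""
    if size_bytes < 2:
        return (0, 2)
    low = 1 << (size_bytes.bit_length() - 1)
    return (low, 2 * low)


def format_size(n: int) -> str:
    """Format a power-of-two byte count with binary units, e.g. 1024 -> '1KB'."""
    unit = 0
    while n >= 1024 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n}{UNITS[unit]}"


def bucket_label(size_bytes: int) -> str:
    low, high = bucket_bounds(size_bytes)
    return f"{format_size(low)}-{format_size(high)}"


def histogram(sizes: Iterable[int]) -> Counter:
    """Count transfers per bucket label."""
    return Counter(bucket_label(s) for s in sizes)
```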
![Page 5: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/5.jpg)
Results (2): Buffer Size Distribution
Notes:
1) 60% of the records in the original table report a buffer size of 0B.
2) Most commonly used: 16 - 128KB.
3) Maximum bucket: 1 - 2GB, 92 records.
[Figure: buffer size distribution. X-axis: buffer size in power-of-two buckets from 4 KBytes to 32 MBytes; y-axis: % of total (0 to 10).]
![Page 6: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/6.jpg)
Results (3): Average Bandwidth Distribution
Notes:
1) Peak: 128 - 256Mbps, ~7.7 million records.
2) Most common: 4Mbps - 1Gbps of average bandwidth (58%).
[Figure: average bandwidth distribution. X-axis: average bandwidth in power-of-two buckets (bps, Kbps, Mbps, Gbps); y-axis: % of total (0 to 12).]
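The slides do not define how average bandwidth was computed. A natural assumption is bytes transferred divided by transfer time (end - start), expressed in bits per second, as in this Python sketch:

```python
from typing import Optional


def average_bandwidth_bps(num_bytes: int, start: float, end: float) -> Optional[float]:
    """Average bandwidth in bits per second, or None if the times are unusable.

    Assumption: bandwidth = 8 * bytes / (end - start), with times in seconds.
    """
    duration = end - start
    if duration <= 0:
        # Guard against zero or negative durations in the logged time fields.
        return None
    return num_bytes * 8 / duration
```

For example, 1,000,000 bytes moved in 8 seconds gives 1,000,000 bps.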
![Page 7: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/7.jpg)
Results (4): Number of Streams Distribution
Notes:
1) ~70% of the transfers used 1 stream!!
2) Only ~20% of the transfers used 4 streams (the number suggested on ANL’s website).
3) In total, 10% of the user base used other numbers of streams.
4) Maximum of the CDF: 1010 streams(!!)
![Page 8: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/8.jpg)
Results (5): Average Bandwidth vs. Number of Streams
Notes:
1) Only transfers of size ≥1GB are included.
2) Bandwidth increases by a factor of 2 or 3 for more than 10 streams.
3) Bandwidth ceiling at 800 - 900Mbps after ~32 streams => Gbps infrastructure.
[Figure: average bandwidth (Mbps, 0 to 1000) versus number of streams (log10 scale, 1 to 1000).]
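A plot like this one can be derived by averaging bandwidth per stream count over large transfers only. The Python sketch below assumes the input is a sequence of (bytes, streams, bandwidth in Mbps) tuples and that 1GB means 2^30 bytes; both are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

GB = 2**30  # assumption: binary gigabyte


def mean_bandwidth_by_streams(
    transfers: Iterable[Tuple[int, int, float]]
) -> Dict[int, float]:
    """Map number of streams -> mean bandwidth (Mbps) for transfers >= 1GB."""
    totals = defaultdict(lambda: [0.0, 0])  # streams -> [sum of Mbps, count]
    for num_bytes, streams, mbps in transfers:
        if num_bytes < GB:
            continue  # keep only transfers of at least 1GB
        acc = totals[streams]
        acc[0] += mbps
        acc[1] += 1
    return {s: total / count for s, (total, count) in sorted(totals.items())}
```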
![Page 9: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/9.jpg)
Results (6): Number of Stripes Distribution
• Summary of results:
• 1 stripe: 99.5% of the transfers!
• 2-31 stripes: 0.5% of the transfers!
![Page 10: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/10.jpg)
Results (7): User and Organization Evolution over Time
Notes:
1) The user and organization populations keep growing.
2) Forecast: 43 new IPs and 14 new organizations (domains) per month.
![Page 11: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/11.jpg)
Results (8): Geographical Characterization
Notes:
1) USA: 78.4% of the transfers (~50.8 million) and 82.9% of the volume (~1.3 PB).
2) Some activity from Canada, Taiwan, Japan and Spain (~14 million transfers and 346TB in volume).
![Page 12: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/12.jpg)
Results (9): Server to Server Transfers (a)
Notes:
1) ~257,000 transfers per month.
2) Growth rate of ~27,000 transfers per month.
[Figure legend: dashed line = # of transfers; solid line = volume; long-dashed line = linear fitting.]
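A growth rate per month can be read off the slope of a least-squares line through the monthly counts. The Python sketch below shows an ordinary least-squares fit; it is a generic illustration, not necessarily the fitting procedure used for the figure.

```python
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx          # growth per month when x is the month index
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With x as the month index (0, 1, 2, ...) and y as the number of transfers in that month, the slope is the monthly growth rate.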
![Page 13: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/13.jpg)
Results (10): Server to Server Transfers (b)
Notes:
1) Inter-domain transfers are a small % of the transfers but a high % of the volume.
2) The opposite holds for intra-domain (InterIP) transfers.
3) High reporting of self transfers (more than 1/3 of the transfers).

| Category | % # Transfers | % Volume |
|---|---|---|
| InterDomain | 21.7% | 72.2% |
| InterIP | 39.5% | 19.7% |
| SelfTransfers | 38.8% | 8.2% |
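The three categories can be assigned from the source and destination hostnames, as in this Python sketch. The self-transfer rule (same hostname) comes from the dataset slides; taking the last two labels of a hostname as its domain is a simplifying assumption that will misgroup some names (for example under country-code suffixes).

```python
def domain_of(hostname: str) -> str:
    # Assumption: the organization's domain is the last two labels of the hostname.
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def classify(source: str, destination: str) -> str:
    """Classify a server-to-server transfer by its endpoints."""
    if source.lower() == destination.lower():
        return "SelfTransfers"   # same hostname for source and destination
    if domain_of(source) == domain_of(destination):
        return "InterIP"         # different hosts inside one domain
    return "InterDomain"         # endpoints in different domains
```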
![Page 14: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/14.jpg)
Results (11): Year Round Comparison for the Number of Transfers (per month)
Comment: The ratio decreases as time goes by, suggesting a stabilizing trend in the # of transfers.
[Figure: bar chart "Year-Round Comparison, GridFTP, Total Number of Transfers". X-axis: month-year pairs 08-05/08-06, 09-05/09-06, 10-05/10-06, 11-05/11-06, 12-05/12-06, 01-06/01-07, 02-06/02-07, 03-06/03-07; y-axis: ratio (0 to 15). Bar labels: 7.9, 8.7, 8.1, 5.2, 3.8, 2.4, 1.2, 14.3.]
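The slides do not define the ratio. Assuming it is the total for a month divided by the total for the same month one year earlier, it can be computed as in this Python sketch (the dictionary layout is hypothetical):

```python
from typing import Dict, Tuple


def year_round_ratios(
    monthly_totals: Dict[Tuple[int, int], float]
) -> Dict[Tuple[int, int], float]:
    """Map (year, month) -> total / total of the same month one year earlier.

    Months with no data (or a zero total) a year earlier are skipped.
    """
    ratios = {}
    for (year, month), total in sorted(monthly_totals.items()):
        previous = monthly_totals.get((year - 1, month))
        if previous:
            ratios[(year, month)] = total / previous
    return ratios
```

The same function applies to monthly volume instead of monthly transfer counts.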
![Page 15: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/15.jpg)
Results (12): Year Round Comparison for the Volume of Transfers (per month)
Comment: The ratio decreases as time goes by, suggesting a stabilizing trend in the volume of transfers. There is an unexplained (?) dramatic increase in Dec 06!
[Figure: bar chart "Year-Round Comparison, GridFTP, Total Volume". X-axis: month-year pairs 08-05/08-06, 09-05/09-06, 10-05/10-06, 11-05/11-06, 12-05/12-06, 01-06/01-07, 02-06/02-07, 03-06/03-07; y-axis: ratio (0 to 50). Bar labels: 1.8, 1.9, 9.6, 11.9, 15.0, 5.7, 3.4, 49.3.]
![Page 16: 1 Workload Analysis of Globus GridFTP Nicolas Kourtellis Joint Work with:Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi, Dan Fraser University of South](https://reader038.vdocuments.net/reader038/viewer/2022100505/5a4d1b697f8b9ab0599b2614/html5/thumbnails/16.jpg)
DISCUSSION
Open Questions:
1) How can the functionalities of GridFTP be explored more for:
   a) Better performance (e.g. speedup of transfers with streams, stripes etc.)?
   b) Better utilization of resources (e.g. bandwidth, storage etc.)?
   => Tutorials, suggestions, solutions-tools-applications to users.
2) How can the system evolution and usage analysis be useful for:
   a) Prediction and provisioning of the Globus Grid's resources?
   b) Designing new benchmarks for evaluating Grids’ resources, such as data transfer components, and for more realistic simulations?
   c) Why aren’t the big players of GridFTP (CERN etc.) reported? (Version or component?)
   d) How much does the version of the component affect the results?
   e) Bottom line: are these data logs representative of the GridFTP population?
   f) If not, which component would have representative logs, even with limited logging?
3) How can we improve the Usage Statistics (Metrics) Collection system?
   a) Efficient given the vast number of reports in the future (e.g. daily summaries?)
   b) Robust (to attacks, bogus data etc.)
   c) Add reporting of bugs in the user system (in the form of live feedback)?
   d) Add more details to the reports (like source AND destination(?))
   e) Eliminate bugs in the reporting:
      i) Duplication,
      ii) FTP response code (in conjunction with (c)),
      iii) Zero & negative buffer size values (crosscheck the values used against the values allowed by the system),
      iv) Time fields sometimes inconsistent,
      v) Use of the IP field, indexes on the DB to speed up the analysis,
      vi) Changes in the schema of the DB (monthly differences etc.).