Dynamic Data Access to the GT/CERCS Linux Mirror Site
Mohamed Mansour, Matthew Wolf, Karsten Schwan
HPGC - IPDPS 2004
Motivation
• Testing (benchmarking) high-performance distributed streaming applications
– Scientific domain
– Enterprise applications
Scientific Data Stream
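[Figure: data-flow diagram of services connected by data channels. Services: Molecular Dynamics; Bonds (calculates bonds and radial distribution); openGL Visualization server. Channel labels: co-ordinates; co-ordinates + bonds; radial distribution data; openGL triangular data.]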
Application Specific Workloads
• Margo Seltzer et al. [1999]: test and evaluate systems with realistic workloads
– Avoid over-designing the system
– Provide rigorous insights into system capabilities
Goal
• Understand user interactions with large streaming data repositories
– Analyze FTP traces of the GT/CERCS mirror site
• A tool to replay such workloads
– StreamGen workload generation tool
Example
[Figure: the scientific data stream with two openGL Visualization servers (#1 and #2) and a StreamPerf load generator. Labels: Bonds (calculate bonds and radial dist.); co-ordinates + bonds; radial dist. data; openGL triangular data. Legend: Service, Data Channel.]
Outline
• Overview and definitions
• Method of analysis
• Results
• Summary
• Q&A
Non-Striped Traffic
[Figure: diagram of file_xxxx.rpm downloads]
Striped Traffic – Download Accelerators
[Figure: diagram of file_xxxx.rpm downloads]
Traffic Traces
[Figure: downloads of file_xxxx.rpm logged at the GT/CERCS Linux Mirror]
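The traffic traces are the mirror's FTP transfer logs (xferlog files). A minimal parsing sketch, assuming the wu-ftpd xferlog field order (timestamp, transfer-time, remote-host, file-size, filename, transfer-type, special-action-flag, direction, access-mode, username, service-name, authentication-method, authenticated-user-id, completion-status), filenames without whitespace, and timestamps treated as UTC. The `Transfer` type and `parse_xferlog_line` are hypothetical names, not the authors' tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Transfer:
    """One xferlog entry (hypothetical record type for illustration)."""
    start: float     # estimated start time, seconds since the epoch
    duration: int    # transfer-time field, in seconds
    host: str        # remote-host field
    nbytes: int      # file-size field, assumed to be bytes actually transferred
    filename: str    # path of the file on the mirror
    direction: str   # 'o' = outgoing (download), 'i' = incoming (upload)
    completed: bool  # completion-status 'c'; a missing field is treated as complete


def parse_xferlog_line(line: str) -> Transfer:
    """Parse one xferlog line; assumes the filename contains no whitespace."""
    fields = line.split()
    # The first five tokens form the timestamp, e.g. "Thu Mar  4 08:12:30 2004".
    logged = datetime.strptime(" ".join(fields[:5]), "%a %b %d %H:%M:%S %Y")
    logged = logged.replace(tzinfo=timezone.utc)  # assumption: log times are UTC
    duration = int(fields[5])
    # Assumption: the entry is written when the transfer ends, so start = end - duration.
    start = logged.timestamp() - duration
    completed = fields[17] == "c" if len(fields) > 17 else True
    return Transfer(
        start=start,
        duration=duration,
        host=fields[6],
        nbytes=int(fields[7]),
        filename=fields[8],
        direction=fields[11],
        completed=completed,
    )
```

Only outgoing (`direction == "o"`) entries matter for download analysis; real logs with spaces in filenames would need a parser that splits from both ends of the line.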
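Striping Factor
[Figure: file_xxxx.rpm download diagram]
$$\text{striping\_factor} = \frac{\text{downloaded\_bytes}}{\text{total\_bytes}}$$
where downloaded_bytes is the number of bytes of the file downloaded from the GT server and total_bytes is the size of the file.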
Striping Factor – Examples
[Figure: file_xxxx.rpm download diagram]
striping_factor = 100%
[Figure: file_xxxx.rpm download diagram]
striping_factor = 45%
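The striping factor definition and both examples can be checked with a short function. This is a sketch: the function names and the 10,000,000-byte file with its two partial transfers are made-up illustration values, and in practice the file size would have to come from site metadata such as the mirror's directory listings.

```python
def striping_factor(downloaded_bytes: int, total_bytes: int) -> float:
    """Percentage of a file's bytes that were downloaded from this server."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= downloaded_bytes <= total_bytes:
        raise ValueError("downloaded_bytes must lie in [0, total_bytes]")
    return 100.0 * downloaded_bytes / total_bytes


def file_striping_factor(transfer_sizes: list[int], total_bytes: int) -> float:
    """Striping factor for one file whose bytes arrived over several transfers
    (e.g. parallel connections opened by a download accelerator)."""
    return striping_factor(sum(transfer_sizes), total_bytes)


# Non-striped download: the whole file came from this server.
print(striping_factor(10_000_000, 10_000_000))                   # 100.0
# Striped download: two partial transfers cover 45% of the file;
# the remaining 55% was fetched from other servers.
print(file_striping_factor([2_000_000, 2_500_000], 10_000_000))  # 45.0
```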
Method of Analysis
• Reconstruct user sessions from xferlog traces
• Metadata, site heuristics and assumptions
– Limit of two concurrent connections per host
– ls-lR files with relative path information
– Idle timeout of 2 hours
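Session reconstruction can be sketched as grouping transfers by client host and starting a new session whenever the host has been idle for more than the 2-hour timeout. This assumes each log entry has already been reduced to a host, start/end times and a file; `TransferRecord`, `reconstruct_sessions` and `max_concurrent` are hypothetical names, and `max_concurrent` is only one possible way to check reconstructed sessions against the site's two-connection limit. The authors' actual heuristics may differ.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple

IDLE_TIMEOUT = 2 * 60 * 60  # idle timeout of 2 hours, in seconds


class TransferRecord(NamedTuple):
    host: str
    start: float  # seconds
    end: float    # seconds
    filename: str
    nbytes: int


def reconstruct_sessions(transfers: Iterable[TransferRecord],
                         idle_timeout: float = IDLE_TIMEOUT) -> List[List[TransferRecord]]:
    """Group transfers into per-host sessions; a host that stays idle (no transfer
    running) for longer than idle_timeout starts a new session."""
    by_host = defaultdict(list)
    for t in transfers:
        by_host[t.host].append(t)
    sessions = []
    for records in by_host.values():
        records.sort(key=lambda t: t.start)
        current, last_end = [], None
        for t in records:
            if last_end is not None and t.start - last_end > idle_timeout:
                sessions.append(current)
                current = []
            current.append(t)
            # Track the latest end time seen, since transfers may overlap.
            last_end = t.end if last_end is None else max(last_end, t.end)
        sessions.append(current)
    return sessions


def max_concurrent(session: List[TransferRecord]) -> int:
    """Largest number of overlapping transfers in a session (sweep over start/end
    events; at equal times, ends sort before starts)."""
    events = sorted([(t.start, 1) for t in session] + [(t.end, -1) for t in session])
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```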
User Sessions
[Figure: Redhat 7.1 – traffic histogram (bin size = 1 day). x-axis: time (days), 0–700; y-axis: sessions, 0–700; series: non-striped traffic, striped traffic.]
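A histogram with a bin size of one day only needs each session's start time and whether it was striped. A minimal sketch, assuming sessions are given as hypothetical `(start_time_seconds, is_striped)` pairs and that day 0 begins at the start of the trace:

```python
from collections import Counter
from typing import Iterable, Tuple

SECONDS_PER_DAY = 86_400  # bin size = 1 day


def daily_session_histogram(sessions: Iterable[Tuple[float, bool]], trace_start: float):
    """Count sessions per 1-day bin, separately for striped and non-striped traffic.

    Returns two Counters mapping day index (0 = first day of the trace)
    to the number of sessions starting on that day."""
    striped, non_striped = Counter(), Counter()
    for start, is_striped in sessions:
        day = int((start - trace_start) // SECONDS_PER_DAY)
        (striped if is_striped else non_striped)[day] += 1
    return striped, non_striped
```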
Striping Factor Distribution
[Figure: x-axis: fraction of data downloaded from GA TECH server (%), 0–100; y-axis: fraction of request trains (%), 0–30; series: SuSE 7.3, SuSE 8.0, SuSE 8.1.]
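The y-axis normalizes counts to the fraction of request trains. Assuming each request train contributes one striping factor, and using 10%-wide bins to match the x-axis ticks (the slide does not state the bin width, so this is an assumption), the distribution could be computed as follows; `bin_width` is expected to divide 100.

```python
from typing import Iterable, List


def striping_factor_distribution(factors: Iterable[float], bin_width: int = 10) -> List[float]:
    """Percentage of request trains whose striping factor (0-100%) falls in each bin.

    Bin i covers [i * bin_width, (i + 1) * bin_width); a factor of exactly 100%
    is placed in the last bin."""
    factors = list(factors)
    if not factors:
        raise ValueError("no striping factors given")
    nbins = 100 // bin_width
    counts = [0] * nbins
    for f in factors:
        if not 0 <= f <= 100:
            raise ValueError(f"striping factor out of range: {f}")
        counts[min(int(f // bin_width), nbins - 1)] += 1
    return [100.0 * c / len(factors) for c in counts]
```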
Single File Domination
[Figure: chart with y-axis 0–100% for RedHat 7.1, 7.2, 7.3, 8.0; SuSE 7.3, 8.0, 8.1; Debian Potato, Woody; series: Striped, Non-Striped.]
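The chart compares how often a session fetches just one file. A sketch of computing that share for a set of reconstructed sessions, assuming each session is given as the list of filenames it transferred (an input format not taken from the slides). Because a striped download of one file shows up as several transfers of the same name, names are de-duplicated before counting.

```python
from typing import Iterable, List


def single_file_share(sessions: Iterable[List[str]]) -> float:
    """Percentage of sessions that request exactly one distinct file."""
    sessions = list(sessions)
    if not sessions:
        raise ValueError("no sessions given")
    single = sum(1 for files in sessions if len(set(files)) == 1)
    return 100.0 * single / len(sessions)
```

For example, sessions `[["a.rpm"], ["a.rpm", "a.rpm"], ["a.rpm", "b.rpm"], ["c.iso"]]` give 75.0: the second session is one file fetched over two connections, so it still counts as a single-file session.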
Single File Distribution (striped)
[Figure: Redhat 7.3 – single file downloads – parallel download. Log-log axes; x-axis: downloaded data (bytes), 1 to 1E+09; y-axis: frequency of downloads, 1 to 1000.]
Single File Distribution (non-striped)
[Figure: Redhat 7.3 – single file downloads – no download accelerator. Log-log axes; x-axis: downloaded data (bytes), 1 to 1E+09; y-axis: frequency of downloads, 1 to 10000.]
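The single-file plots use logarithmic axes spanning 1 byte to 1E+09 bytes. A minimal way to bin per-download byte counts by powers of ten for such a plot (decade binning is an assumption about how the figures were produced; `decade_histogram` is a hypothetical name):

```python
from collections import Counter
from typing import Iterable


def decade_histogram(byte_counts: Iterable[int]) -> Counter:
    """Count downloads per decade of size: bin k holds sizes in [10**k, 10**(k+1)).

    Uses the digit count of the integer, which equals floor(log10(n)) + 1 for
    n >= 1, avoiding floating-point rounding near powers of ten."""
    hist = Counter()
    for n in byte_counts:
        n = int(n)
        if n < 1:
            continue  # zero-byte transfers cannot be placed on a log axis
        hist[len(str(n)) - 1] += 1
    return hist
```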
Results
• Strong similarity between striped and non-striped behavior
– Correlation factor between 70% and 98%
• Download accelerators are common
– Only 20–25% of users do not use them
• Striping factor is uniformly distributed over the 10–90% range
• 7–25% ‘null’ requests
• Requesting a single file is the most common pattern
– Download accelerators exhibit distinctive access patterns
Contributions
• Traffic traces
– Reconstructed from real traces
• StreamGen – a library to generate streaming workloads
– Derived from httperf
– Replays traffic traces or generates statistical patterns
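StreamGen's interface is not shown in the slides. As a hypothetical illustration of trace replay only (not StreamGen's or httperf's API), the sketch below turns timestamped requests into send offsets that preserve the original inter-arrival times, optionally compressed by a `speedup` factor, and dispatches them through a caller-supplied `send` callback.

```python
import time
from typing import Callable, Iterable, List, Tuple


def replay_schedule(trace: Iterable[Tuple[float, str]],
                    speedup: float = 1.0) -> List[Tuple[float, str]]:
    """Convert (timestamp, request) pairs into (offset_seconds, request) pairs
    relative to the first request, dividing every gap by `speedup`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    events = sorted(trace)
    if not events:
        return []
    t0 = events[0][0]
    return [((t - t0) / speedup, request) for t, request in events]


def replay(trace: Iterable[Tuple[float, str]],
           send: Callable[[str], None],
           speedup: float = 1.0) -> None:
    """Issue each request at its scheduled offset (single-threaded, best effort)."""
    start = time.monotonic()
    for offset, request in replay_schedule(trace, speedup):
        delay = offset - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        send(request)
```

Separating the pure schedule from the timing loop keeps the trace logic testable; a real load generator would also avoid blocking on each `send`, so that slow responses do not delay later requests.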
Future Directions
• More in-depth analysis of striped behavior
– Modified FTP server to collect offset data
• Use traces as realistic traffic models
References
• V. Oleson, K. Schwan, G. Eisenhauer, B. Plale, C. Pu, and D. Amin, “Operational information systems - an example from the airline industry,” In First Workshop on Industrial Experiences with Systems Software (WIESS).
• M. Wolf, Z. Cai, W. Huang, and K. Schwan, “SmartPointers: personalized scientific data portals in your hand,” In Proc. of the 2002 ACM/IEEE Conference on Supercomputing, Baltimore, Maryland, 2002, pp. 1-16.
• M. Seltzer, D. Krinsky, K. Smith, and X. Zhang, “The Case for Application-Specific Benchmarking,” In Proc. of the 1999 Workshop on Hot Topics in Operating Systems, Rio Rico, AZ, 1999.
• D. Mosberger and T. Jin, “httperf: A tool for measuring web server performance,” WISP, ACM, Madison, WI, June 1998, pp. 59-67.
• http://www.cc.gatech.edu/~mansour