Looking at the Server-side of P2P Systems
Yi Qiao, Dong Lu, Fabian E. Bustamante and Peter A. Dinda
Department of Computer Science
Northwestern University
www.cs.northwestern.edu
What is the Server-side?
There is no architectural distinction between “client” and “server” in a P2P system
Peers are heterogeneous
– Some peers act more like servers: the server side
– Some act more like clients: the client side
The server side is important for P2P performance
– Yet it has received little attention
Outline
Background and Motivation
– Why schedule the server side?
Trace Collection and Study
Scheduling Methodology
Evaluation
Conclusions
Background
Peers in a P2P data-sharing system (example: Gnutella)
– Phase 1: query and query answer
– Phase 2: download and upload
– Role as a client
  • Send queries, download objects
– Role as a server
  • Answer queries, upload objects
  • Little research attention
Background (Cont.)
Figure: Phase 1: Queries and query replies in the P2P file-sharing system. Peers P1 to P4 exchange queries (“Shark Tale”?, “Taxi”?) and query replies (“Peer 3 got it!”, “No idea!”).
Background (Cont.)
Little attention has been given to the server side so far.
Figure: Phase 2: Download/upload of shared files. Download requests such as “Give me ‘Taxi’” and “Give me ‘Shark Tale’” wait in a serving peer’s job queue (peers P1 to P4).
Motivation
The server side is a key performance bottleneck of P2P data-sharing systems
– 80% of download requests are rejected because server capacity is saturated [Saroiu 2002]
  • Capacity is limited by users, particularly the number of server threads
– 50% of all object downloads take more than one day [Gummadi 2003]
Our goal
– Characterize and analyze server load
– Design new scheduling policies that shorten the average response time of downloads
Challenge
Introducing SRPT into web server scheduling has been very successful, but applying it to the P2P server side is trickier
• Requests are often not for whole objects
• P2P servers are conservative in their resource consumption
• Popular P2P servers often operate under overload
• Fetch-at-most-once behavior means object popularity does NOT follow a Zipf distribution [Gummadi 2003]
New scheduling policies based on P2P’s own characteristics are needed
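The fetch-at-most-once point can be illustrated with a toy simulation. This is a sketch under assumptions that are not from the slides: client preferences follow a Zipf distribution with exponent 1, and the object, client, and request counts are arbitrary. The function `simulate_requests` is hypothetical. When clients never re-request an object, the most popular objects saturate at one request per client, so the head of the request distribution falls well below the Zipf curve.

```python
import random
from collections import Counter


def simulate_requests(n_objects, n_clients, per_client, at_most_once, seed=0):
    """Count requests per object when clients draw from a Zipf(1) preference.

    With at_most_once=True a client never re-requests an object it already
    fetched (fetch-at-most-once); otherwise every draw is independent.
    Object 0 is the most popular.
    """
    if at_most_once and per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, n_objects + 1)]  # Zipf, exponent 1
    objects = list(range(n_objects))
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()
        for _ in range(per_client):
            if at_most_once:
                # Only objects this client has not fetched yet are eligible.
                cands = [o for o in objects if o not in fetched]
                w = [weights[o] for o in cands]
            else:
                cands, w = objects, weights
            obj = rng.choices(cands, weights=w, k=1)[0]
            fetched.add(obj)
            counts[obj] += 1
    return counts


if __name__ == "__main__":
    plain = simulate_requests(100, 50, 40, at_most_once=False, seed=1)
    once = simulate_requests(100, 50, 40, at_most_once=True, seed=1)
    print("top object, independent draws:", plain[0])
    print("top object, fetch-at-most-once:", once[0])  # never exceeds 50 clients
```

Under fetch-at-most-once the count for any object is capped by the number of clients, which is why its popularity curve flattens at the top.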
Trace Collection and Study
Trace collection methodology
– Build “honey pots”
  • Passively monitor query strings
  • Download hot content based on query popularity
– Run “honey pots”
  • Make the collected objects available to the community
  • Record incoming download requests: arrival time, object name, requested size, downloaded size, service time, …
– Findings reported here are based on Gnutella traces
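The fields recorded for each incoming request map naturally onto a per-request record. The sketch below assumes a hypothetical tab-separated log with one request per line, in the order arrival time, object name, requested size, downloaded size, service time; the honey pots’ actual log format is not given in the slides, and `DownloadRequest`, `parse_trace`, and `interarrival_times` are illustrative names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DownloadRequest:
    arrival: float         # arrival time, seconds since trace start
    object_name: str
    requested_bytes: int   # requested chunk size
    downloaded_bytes: int  # served (downloaded) chunk size
    service_time: float    # seconds spent uploading


def parse_trace(text):
    """Parse a hypothetical tab-separated honey-pot log into records."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        t, name, req, got, svc = row
        records.append(DownloadRequest(float(t), name, int(req), int(got), float(svc)))
    return records


def interarrival_times(records):
    """Gaps between consecutive arrivals, in arrival order."""
    times = sorted(r.arrival for r in records)
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    log = "0.0\ta.avi\t65536\t65536\t3.2\n1.5\tb.mp3\t1048576\t524288\t40.0\n"
    recs = parse_trace(log)
    print(interarrival_times(recs))
```

Keeping requested and downloaded sizes as separate fields matters later, because the requested chunk size is known on arrival while the downloaded size is known only after the job ends.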
Traces in the Study
The traces differ in connection type, number of server threads, number of shared objects, and number of requests
Connection Type     Server Threads   Shared Objects   Requests
100 Mbps Ethernet   200              1,533            300,000
100 Mbps Ethernet   100              1,533            150,000
100 Mbps Ethernet   50               500              80,000
Cable Modem         20               1,533            40,000
Server Workload
Distribution of job interarrival time?
Distribution of job size?
What is the performance bottleneck?
– Why is scheduling needed?
Job Interarrivals
Job interarrivals are well modeled by an exponential distribution
– Coefficient of determination R² = 0.9943
– Almost a straight line in the semi-log plot
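An exponential distribution has P(X > x) = e^(−λx), so its log complementary CDF is a straight line of slope −λ on a semi-log plot, and the R² of a linear fit to that line measures how exponential the data look. The slides do not say exactly how their R² was computed; the sketch below assumes a least-squares fit to the empirical log CCDF, and `exponential_fit_r2` is a hypothetical name.

```python
import math
import random


def exponential_fit_r2(samples):
    """Least-squares fit of log P(X > x) = a - lam * x; returns (lam, r2)."""
    xs = sorted(samples)
    n = len(xs)
    # Empirical CCDF after the k-th smallest sample is (n - k) / n; the
    # largest sample is dropped because log(0) is undefined.
    pts = [(x, math.log((n - k) / n)) for k, x in enumerate(xs[:-1], start=1)]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pts)
    ss_tot = sum((y - my) ** 2 for _, y in pts)
    return -slope, 1.0 - ss_res / ss_tot  # rate estimate, coefficient of determination


if __name__ == "__main__":
    rng = random.Random(42)
    gaps = [rng.expovariate(2.0) for _ in range(20000)]  # synthetic, rate 2/s
    print(exponential_fit_r2(gaps))
```

Applied to real interarrival times (for example, the gaps extracted from a trace), an R² close to 1 corresponds to the nearly straight semi-log plot on the slide.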
Job Arrivals are Independent
Correlation between job arrivals is effectively nil
– Job arrivals are independent of each other
– A significant difference from web servers
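One common way to check that arrivals are independent is the sample autocorrelation of the interarrival series: for independent data it stays near zero at every lag, roughly within ±1.96/√n. The slides do not state which test was used, so this is a sketch under that assumption; `autocorrelation` and `acf` are hypothetical names.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def acf(series, max_lag):
    """Autocorrelations at lags 1..max_lag."""
    return [autocorrelation(series, k) for k in range(1, max_lag + 1)]


if __name__ == "__main__":
    rng = random.Random(3)
    gaps = [rng.expovariate(1.0) for _ in range(20000)]  # synthetic independent gaps
    band = 1.96 / len(gaps) ** 0.5
    print([round(r, 4) for r in acf(gaps, 5)], "band:", round(band, 4))
```

A trending or bursty series, by contrast, shows large positive autocorrelation at small lags.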
Job Sizes
Three different job sizes
– Full object size
– Requested data chunk size
  • Unique to P2P servers
  • A request is typically for only a small chunk
– Served data chunk size
  • Unique to P2P servers
  • A client may abort a transfer and switch to another server
  • Known only after the job is done
Job Sizes (Cont.)
Three different job sizes
– They differ by several orders of magnitude
– Each is approximated by a Bounded Pareto distribution
Figure: distributions of object size, served chunk size, and requested chunk size
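For reference, the standard Bounded Pareto distribution BP(α, k, p) has density f(x) = α k^α x^(−α−1) / (1 − (k/p)^α) for k ≤ x ≤ p, and its CDF can be inverted in closed form, which makes sampling easy. The slides do not report fitted values of α, k, or p; the parameters in the example are arbitrary, and the function names are hypothetical.

```python
import math
import random


def bp_quantile(u, alpha, k, p):
    """Inverse CDF of the Bounded Pareto BP(alpha, k, p), for 0 <= u <= 1."""
    return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)


def bp_sample(rng, alpha, k, p):
    """Draw one Bounded Pareto variate by inverse-transform sampling."""
    return bp_quantile(rng.random(), alpha, k, p)


def bp_mean(alpha, k, p):
    """Analytic mean of BP(alpha, k, p)."""
    c = 1.0 - (k / p) ** alpha  # normalizing constant
    if alpha == 1.0:
        return k * math.log(p / k) / c
    return alpha * k ** alpha * (p ** (1 - alpha) - k ** (1 - alpha)) / (c * (1 - alpha))


if __name__ == "__main__":
    rng = random.Random(5)
    xs = [bp_sample(rng, 2.5, 1.0, 100.0) for _ in range(100000)]
    print(sum(xs) / len(xs), bp_mean(2.5, 1.0, 100.0))
```

The upper bound p keeps every moment finite, which is one reason Bounded Pareto is often preferred over an unbounded Pareto when modeling job sizes for simulation.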
Server Resource Utilization
Resource utilization is conservative
– P2P servers run only in the background on ordinary computers
– Upper bounds are set on
  • Number of server threads
  • Aggregate upload bandwidth
– For our busiest honey pot
  • 1.2% to 20.0% CPU utilization
  • Up to 20 MB of memory usage
– Bottleneck resource
  • The set of server threads for uploading
Our Scheduling Problem
Given the total number of concurrent jobs that a server can take, how should incoming jobs be scheduled so that the mean response time is minimized?
Scheduling Policies
Shortest Remaining Processing Time (SRPT)
– Always serve the job with the shortest remaining processing time
First-Come-First-Served (FCFS)
– Serve incoming download requests in arrival order
– Used by Gnutella for its job scheduling
Processor Sharing (PS)
– Each job in turn receives an equal amount of service time
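The three policies can be compared on a toy workload with a small event-driven simulator of a single preemptive server. This is a sketch, not the authors’ simulator: it assumes job sizes are known exactly (so SRPT here is ideal SRPT), and it models PS as idealized fluid sharing rather than the 0.01-second time slices used in the evaluation. `Job` and `mean_response_time` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float  # arrival time in seconds
    size: float     # service demand in seconds at the full server rate


def mean_response_time(jobs, policy):
    """Simulate one preemptive server; policy is "FCFS", "SRPT", or "PS"."""
    order = sorted(range(len(jobs)), key=lambda i: jobs[i].arrival)
    remaining = {}  # job index -> unserved work
    finish = {}
    t, nxt = 0.0, 0
    while nxt < len(order) or remaining:
        # Admit every job that has arrived by time t.
        while nxt < len(order) and jobs[order[nxt]].arrival <= t:
            remaining[order[nxt]] = jobs[order[nxt]].size
            nxt += 1
        if not remaining:  # idle: jump to the next arrival
            t = jobs[order[nxt]].arrival
            continue
        # Service rate each active job receives under the policy.
        if policy == "PS":
            rates = {i: 1.0 / len(remaining) for i in remaining}
        elif policy == "FCFS":
            rates = {min(remaining, key=lambda i: (jobs[i].arrival, i)): 1.0}
        elif policy == "SRPT":
            rates = {min(remaining, key=lambda i: (remaining[i], i)): 1.0}
        else:
            raise ValueError(f"unknown policy: {policy}")
        # Advance to the next event: a completion or an arrival.
        dt = min(remaining[i] / r for i, r in rates.items())
        if nxt < len(order):
            dt = min(dt, jobs[order[nxt]].arrival - t)
        t += dt
        for i, r in rates.items():
            remaining[i] -= r * dt
            if remaining[i] <= 1e-12:  # tolerate floating-point residue
                del remaining[i]
                finish[i] = t
    return sum(finish[i] - jobs[i].arrival for i in finish) / len(jobs)


if __name__ == "__main__":
    jobs = [Job(0.0, 10.0), Job(1.0, 1.0), Job(2.0, 1.0)]
    for p in ("FCFS", "PS", "SRPT"):
        print(p, mean_response_time(jobs, p))
```

On this three-job example FCFS gives a mean response time of 10, PS gives 17/3, and SRPT gives 14/3: the two short jobs no longer wait behind the long one, while the long job finishes at the same time under all three policies.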
SRPT
Studied since the 1960s [Schrage 1968]
Used in various applications
– Packet network scheduling [Bux 1983]
– Web server scheduling [Harchol-Balter 2001]
Optimal for mean job response time in a general G/G/1 queueing system
Problem
– In most cases, the service time is unknown until the job is done
SRPT for P2P Servers
Main challenge
– It is not clear how to estimate the service time of a request: file size, requested chunk size, or served chunk size?
One possible approach
– Use the requested chunk size as the scheduling metric
  • SRPT-CS: uses the requested chunk size
Two idealized approaches (they require information known only after the job is done)
– Use the served chunk size as the scheduling metric
  • SRPT-SS: uses the served chunk size
– Ideal SRPT: uses the actual service time
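The three variants differ only in the size metric used to rank waiting requests. The sketch below shows that ranking as a priority queue that an upload thread would pop from when it becomes free. It is a minimal sketch of the ordering only: preemption of in-progress uploads is not modeled, the request values are made up, and `Request`, `METRICS`, `SizeBasedQueue`, and `dispatch_order` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    requested_chunk: int  # bytes asked for; known when the request arrives
    served_chunk: int     # bytes actually sent; known only after the job ends
    service_time: float   # seconds the upload took; known only afterwards


# Size metric used by each SRPT variant to rank waiting requests.
METRICS = {
    "SRPT-CS": lambda r: r.requested_chunk,
    "SRPT-SS": lambda r: r.served_chunk,  # oracle: needs future knowledge
    "SRPT": lambda r: r.service_time,     # ideal SRPT: also an oracle
}


class SizeBasedQueue:
    """Waiting requests, smallest metric first; ties broken by arrival order."""

    def __init__(self, variant):
        self._key = METRICS[variant]
        self._heap = []
        self._seq = 0  # arrival counter used as a tie-breaker

    def push(self, req):
        heapq.heappush(self._heap, (self._key(req), self._seq, req))
        self._seq += 1

    def pop(self):
        """Called when an upload thread becomes free."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def dispatch_order(variant, requests):
    """Order in which queued requests would be handed to free threads."""
    q = SizeBasedQueue(variant)
    for r in requests:
        q.push(r)
    return [q.pop().name for _ in range(len(q))]


if __name__ == "__main__":
    reqs = [
        Request("a", 4_000_000, 500_000, 100.0),
        Request("b", 1_000_000, 1_000_000, 90.0),
        Request("c", 2_000_000, 200_000, 5.0),
    ]
    for v in METRICS:
        print(v, dispatch_order(v, reqs))
```

In this made-up example each metric yields a different order, which is why the quality of SRPT-CS depends on how well the requested chunk size tracks the true service time.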
How well can they do?
Approximating ideal SRPT
Depends on the correlations among requested chunk size, served chunk size, and service time
But these correlations are weak
Why?
– A client can exit at any time during transmission
– A client can switch to other servers for a data chunk
– Bandwidth bottlenecks may exist elsewhere
Correlation           Service Time   Served Chunk Size   Requested Chunk Size
Service Time          1.0000         0.7023              0.2833
Served Chunk Size     0.7023         1.0000              0.2339
Requested Chunk Size  0.2833         0.2339              1.0000
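A table like this can be produced from per-request trace records by computing pairwise correlation coefficients. The slides do not say which coefficient was used; the sketch assumes Pearson correlation, and with heavy-tailed sizes a rank correlation could be a reasonable alternative. `pearson` and `correlation_table` are hypothetical names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Pairwise correlations for a dict of name -> list of per-request values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    cols = {
        "service_time": [3.2, 40.0, 0.4, 12.5],
        "served_chunk": [65536, 524288, 0, 131072],
        "requested_chunk": [65536, 1048576, 65536, 65536],
    }
    for (a, b), r in correlation_table(cols).items():
        print(f"{a:16s} {b:16s} {r:+.4f}")
```

The matrix is symmetric with ones on the diagonal, matching the layout of the table above.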
Evaluation
Evaluation setup
– A general-purpose queueing simulator
– Various scheduling policies
– Trace-driven simulations
  • Queue capacity of 500
  • System load between 0.1 and 10
  • Time slice of 0.01 seconds for PS scheduling
Metrics
– Mean response time
– Rejection rate
– Mean slowdown
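The three metrics can be computed from the simulator’s output. This sketch assumes the usual definitions, which the slides do not spell out: response time is finish time minus arrival time, slowdown is response time divided by the job’s service time, and the rejection rate is the fraction of arrivals turned away. The admission rule (reject when the queue of 500 is full) is likewise an assumption, and `Completed`, `summarize`, and `admit` are hypothetical names.

```python
from dataclasses import dataclass

QUEUE_CAPACITY = 500  # queue capacity from the evaluation setup


@dataclass
class Completed:
    arrival: float
    finish: float
    service_time: float  # time the job would take if served alone


def admit(queue_length, capacity=QUEUE_CAPACITY):
    """Assumed admission rule: reject an arrival when the queue is full."""
    return queue_length < capacity


def summarize(completed, rejected):
    """Mean response time, rejection rate, and mean slowdown."""
    response = [c.finish - c.arrival for c in completed]
    slowdown = [r / c.service_time for r, c in zip(response, completed)]
    total = len(completed) + rejected
    return {
        "mean_response_time": sum(response) / len(response),
        "rejection_rate": rejected / total,
        "mean_slowdown": sum(slowdown) / len(slowdown),
    }


if __name__ == "__main__":
    done = [Completed(0.0, 4.0, 2.0), Completed(1.0, 3.0, 2.0)]
    print(summarize(done, rejected=2))
```

Slowdown normalizes each response time by the job’s own size, so it exposes whether large jobs are being penalized in a way that the mean response time alone would hide.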
Improved Mean Response Time
Figure: mean response time under FCFS, PS, SRPT-CS, SRPT-SS, and SRPT
Ideal SRPT is the best
SRPT-CS does much better than FCFS and PS
With Lowest Rejection Rate
SRPT-based scheduling policies actually reject fewer jobs than FCFS and PS
Figure: rejection rate under SRPT-CS & SRPT-SS, SRPT, and FCFS
Without Compromising Fairness
SRPT-based scheduling policies don’t starve large jobs
Figure: mean slowdown for the 10% largest jobs under FCFS, PS, SRPT-CS, SRPT-SS, and SRPT
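The fairness metric can be computed by ranking jobs by service time and averaging the slowdown of the largest 10%. This sketch assumes slowdown is response time divided by service time, which the slides do not define explicitly; `mean_slowdown_of_largest` is a hypothetical name.

```python
def mean_slowdown_of_largest(jobs, fraction=0.10):
    """Mean slowdown of the largest `fraction` of jobs.

    jobs: list of (service_time, response_time) pairs; slowdown is
    response_time / service_time.
    """
    ranked = sorted(jobs, key=lambda j: j[0], reverse=True)
    top = ranked[:max(1, round(fraction * len(ranked)))]  # at least one job
    return sum(resp / size for size, resp in top) / len(top)


if __name__ == "__main__":
    # Made-up example: nine jobs with slowdown 1, one large job with slowdown 3.
    jobs = [(float(s), float(s)) for s in range(1, 10)] + [(10.0, 30.0)]
    print(mean_slowdown_of_largest(jobs))
```

If a policy starved large jobs, this number would grow much faster with load than the overall mean slowdown.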
Summary
The server side of P2P systems is critical to overall system performance
Not much can be learned from web server scheduling
SRPT-based scheduling policies can help
– Lowest mean response time
– Lowest rejection rate
– Without compromising fairness
Chunk size is a reasonable estimator of service time
– SRPT-CS outperforms FCFS and PS
Ongoing Work
Large performance gaps among SRPT-CS, SRPT-SS, and SRPT
– Only SRPT-CS can be implemented directly
– Possible solution: predict served chunk size and service time using time series analysis
Representativeness of the traces
Performance in a real implementation
Cooperative downloading/uploading?
Better estimators
For more information
www.aqualab.cs.northwestern.edu
Please also see our related work:
– Dong Lu, Huanyuan Sheng, and Peter A. Dinda. “Size-Based Scheduling Policies with Inaccurate Scheduling Information”. In Proc. of MASCOTS, 2004.
– Dong Lu, Peter A. Dinda, Yi Qiao, Huanyuan Sheng, and Fabián E. Bustamante. “Applications of SRPT Scheduling with Inaccurate Information”. In Proc. of MASCOTS, 2004.