
Looking at the Server-side of P2P Systems

Yi Qiao, Dong Lu, Fabian E. Bustamante and Peter A. Dinda

Department of Computer Science

Northwestern University

www.cs.northwestern.edu


What is the Server-side?

No architectural distinction between “client” and “server” in a P2P system

Heterogeneity of peers
– Some peers act more like servers: the server side
– Some act more like clients: the client side

The server side is important for P2P performance
– Little attention has been given to it


Outline

Background and Motivation
– Why schedule the server side?

Trace Collection and Study

Scheduling Methodology

Evaluation

Conclusions


Background

Peers in a P2P data-sharing system
– Example: Gnutella
– Phase 1: query, query answer
– Phase 2: download, upload
– Role as a client
• Send queries, download objects
– Role as a server
• Answer queries, upload objects
• Little research attention


Background (Cont.)

Phase 1: Queries and query replies in the P2P file-sharing system

[Figure: peers P1, P2, P3, and P4 exchange Query and Query Reply messages. Example queries: “Shark Tale”? and “Taxi”? Example replies: “Peer 3 got it!” and “No idea!”]


Background (Cont.)

Phase 2: Download/upload of shared files

[Figure: peers P1, P2, P3, and P4. Download requests “Give me ‘Taxi’” and “Give me ‘Shark Tale’” are shown together with a Job Queue.]

Little attention given to the server-side so far…


Motivation

The server side is a key performance bottleneck of P2P data-sharing systems
– 80% of download requests get rejected due to saturation of server capacity [Saroiu 2002]
• User-limited capacity, particularly the number of server threads
– 50% of all object downloads take more than one day [Gummadi 2003]

Our goal
– Server load characterization and analysis
– New scheduling policies to shorten the average response time of each download


Challenge

Introducing SRPT into web server scheduling has been very successful, but it is trickier on the P2P server side…

• Requests are often not for whole objects
• P2P servers are conservative with resource consumption
• Popular P2P servers often operate under overloaded conditions
• Fetch-at-most-once behavior means object popularity does NOT follow a Zipf distribution [Gummadi 2003]

New scheduling policies based on P2P’s own characteristics are needed


Trace Collection and Study

Trace Collection Methodology
– Build “honey pots”
• Passive monitoring of query strings
• Download hot content based on query popularity
– Run “honey pots”
• Make collected objects available to the community
• Record incoming download requests: arrival time, object name, requested size, downloaded size, service time, …
– Findings reported here are based on Gnutella traces


Traces in the Study

The traces differ in connection type, number of server threads, number of shared objects, and number of requests

Connection Type     Number of Threads   Number of Objects   Number of Requests
100 Mbps Ethernet   200                 1,533               300,000
100 Mbps Ethernet   100                 1,533               150,000
100 Mbps Ethernet   50                  500                 80,000
Cable Modem         20                  1,533               40,000


Server Workload

Distribution of job interarrival time?

Distribution of job size?

What is the performance bottleneck?
– Why scheduling?


Job Interarrivals

Job interarrivals can be well modeled by an exponential distribution
– Coefficient of determination $R^2 = 0.9943$
– Almost a straight line in the semi-log plot
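The semi-log check can be sketched as follows. This is only an illustration on synthetic data, not the Gnutella traces: it draws exponential interarrival times, builds the empirical survival function, and fits a straight line to its logarithm. The function name `semilog_fit_r2`, the sample size, and the arrival rate are assumptions made for the example.

```python
import math
import random

def semilog_fit_r2(interarrivals):
    """Least-squares fit of log(P[X > x]) = a + b*x; returns (slope b, R^2)."""
    xs = sorted(interarrivals)
    n = len(xs)
    # Empirical survival function at each sorted sample.
    # The largest sample is dropped because its survival value is 0.
    pts = [(x, math.log(1.0 - (i + 1) / n)) for i, x in enumerate(xs[:-1])]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)  # for a simple linear fit, R^2 is the squared correlation
    return slope, r2

rng = random.Random(1)
rate = 2.0  # hypothetical arrival rate, in jobs per second
samples = [rng.expovariate(rate) for _ in range(5000)]
slope, r2 = semilog_fit_r2(samples)
print(f"slope={slope:.3f} R^2={r2:.4f}")
```

For exponential data the fitted slope should be close to the negative of the arrival rate, and R^2 should be close to 1.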


Job Arrivals are Independent

Correlation among job arrivals is effectively nil
– Job arrivals are independent of each other
– A significant difference from web servers
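One common way to test for independence is the sample autocorrelation of the interarrival series. The sketch below assumes that approach and runs it on synthetic, independent interarrival times. It is not taken from the study itself, and the `autocorrelation` helper is a hypothetical name.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

rng = random.Random(7)
# Synthetic, independent interarrival times (an assumption for the example).
gaps = [rng.expovariate(1.0) for _ in range(10000)]
acf = [autocorrelation(gaps, lag) for lag in range(1, 6)]
print([round(a, 3) for a in acf])
```

Values near zero at every lag are consistent with independent arrivals. A series with a trend or with bursts would show values well away from zero.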


Job Sizes

Three different job sizes
– Full object size
– Requested data chunk size
• Unique to P2P servers
• A request is typically for only a small chunk
– Served data chunk size
• Unique to P2P servers
• A client may abort a transfer or switch to another server
• Known only after the job is done


Job Sizes (Cont.)

Three different job sizes
– Differ by several orders of magnitude
– Approximated by a Bounded Pareto distribution

[Figure: curves labeled Object Size, Served Chunk Size, and Requested Chunk Size]
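To get a feel for a Bounded Pareto workload, the sketch below draws samples by inverting the distribution's CDF, F(x) = (1 - (L/x)^a) / (1 - (L/H)^a) for L <= x <= H. The bounds and the shape parameter are hypothetical and were not fitted to the traces.

```python
import random

def bounded_pareto(rng, low, high, alpha):
    """Draw one Bounded Pareto(low, high, alpha) sample by inverse-CDF sampling."""
    u = rng.random()
    tail = 1.0 - (low / high) ** alpha
    return low / (1.0 - u * tail) ** (1.0 / alpha)

rng = random.Random(42)
# Hypothetical parameters: sizes between 1 KB and 1 GB, shape 1.1.
sizes = [bounded_pareto(rng, 1e3, 1e9, 1.1) for _ in range(10000)]
median = sorted(sizes)[len(sizes) // 2]
mean = sum(sizes) / len(sizes)
print(f"min={min(sizes):.0f} median={median:.0f} mean={mean:.0f} max={max(sizes):.0f}")
```

Most samples are small while a few are several orders of magnitude larger, which is the property that makes size-based scheduling attractive.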


Server Resource Utilization

Resource utilization is conservative
– Servers only run in the background of normal computers
– Upper bounds are set for
• Number of server threads
• Aggregate bandwidth usage for upload
– For our busiest honey pot
• 1.2% to 20.0% CPU utilization
• Up to 20 MBytes of memory usage
– Bottleneck resource
• The set of server threads for uploading


Our Scheduling Problem

Given the total number of concurrent jobs that a server can take, how should incoming jobs be scheduled so that the mean response time is minimized?


Scheduling Policies

Shortest Remaining Processing Time (SRPT)
– Always serve the job with the shortest remaining processing time

First-Come-First-Served (FCFS)
– Serve incoming download requests in arrival order
– Used by Gnutella for its job scheduling

Processor Sharing (PS)
– Each job gets an equal amount of service time in turn
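The difference between FCFS and SRPT can be seen in a small simulation. The sketch below is a simplification: it models a single preemptive server rather than a pool of server threads, it omits PS, and the four-job workload is hypothetical. It is not the simulator used in the evaluation.

```python
import heapq

def mean_response_time(jobs, policy):
    """Simulate one server. jobs: list of (arrival_time, size); policy: 'FCFS' or 'SRPT'."""
    jobs = sorted(jobs)
    ready = []  # heap of (priority, arrival_time, remaining_work)
    now, i, total, done = 0.0, 0, 0.0, 0
    while done < len(jobs):
        if not ready:  # server idle: jump to the next arrival
            now = max(now, jobs[i][0])
        while i < len(jobs) and jobs[i][0] <= now:
            arrival, size = jobs[i]
            key = arrival if policy == "FCFS" else size
            heapq.heappush(ready, (key, arrival, size))
            i += 1
        key, arrival, remaining = heapq.heappop(ready)
        # Run the chosen job until it finishes or the next job arrives.
        next_arrival = jobs[i][0] if i < len(jobs) else float("inf")
        run = min(remaining, next_arrival - now)
        now += run
        remaining -= run
        if remaining > 1e-12:
            # Not finished: requeue. Under SRPT the priority is the remaining work.
            new_key = arrival if policy == "FCFS" else remaining
            heapq.heappush(ready, (new_key, arrival, remaining))
        else:
            total += now - arrival
            done += 1
    return total / len(jobs)

# Hypothetical workload: one large job arrives first, then three small ones.
jobs = [(0.0, 10.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
fcfs = mean_response_time(jobs, "FCFS")
srpt = mean_response_time(jobs, "SRPT")
print("FCFS:", fcfs, "SRPT:", srpt)
```

For this workload the mean response time is 10.0 under FCFS and 4.0 under SRPT, because the small jobs no longer wait behind the large one.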


SRPT

Studied since the 1960s [Schrage 1968]

Used for various applications
– Packet network scheduling [Bux 1983]
– Scheduling for web servers [Harchol-Balter 2001]

Optimal for mean response time of jobs for a general G/G/1 queuing system

Problem
– In most cases, service time is unknown until the job is done


SRPT for P2P Servers

Main Challenge
– How to estimate the service time of a request is not that clear: file size, requested chunk size, or served chunk size?

One possible approach
– Use requested chunk size as the scheduling metric
• SRPT-CS: uses requested chunk size

Two optimal approaches
– Use served chunk size as the scheduling metric
• SRPT-SS: uses served chunk size
– Ideal SRPT

How well can they do?
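The choice of metric changes which requests are served first. The sketch below is a simplified, non-preemptive illustration: when server threads become free, it starts the waiting requests with the smallest value of the chosen metric. The `Request` fields, the `pick_next` helper, and all sizes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    object_size: int      # full size of the shared object, in bytes
    requested_chunk: int  # bytes the client asked for (known on arrival)
    served_chunk: int     # bytes actually sent (known only after the job ends)

def pick_next(waiting, free_threads, metric):
    """Return the requests to start now: those with the smallest metric value."""
    return sorted(waiting, key=metric)[:free_threads]

waiting = [
    Request("a", 700_000_000, 50_000_000, 2_000_000),
    Request("b", 5_000_000, 5_000_000, 5_000_000),
    Request("c", 90_000_000, 1_000_000, 1_000_000),
]
# SRPT-CS style: uses information available when the request arrives.
srpt_cs = pick_next(waiting, 2, lambda r: r.requested_chunk)
# SRPT-SS style: needs the served size, which a real server cannot know in advance.
srpt_ss = pick_next(waiting, 2, lambda r: r.served_chunk)
print([r.name for r in srpt_cs], [r.name for r in srpt_ss])
```

Here the two metrics disagree about request "a": it asks for a large chunk but the client leaves early, so only a scheduler that knows the served size would start it ahead of "b".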


Approximating ideal SRPT

Depends on the correlations between Requested Chunk Size, Served Chunk Size and Service time

But these correlations are weak

Why?
– Client can exit at any time during transmission
– Client can switch to other servers for a data chunk
– Bandwidth bottlenecks exist somewhere else

Correlation           Service Time   Served Chunk Size   Requested Chunk Size
Service Time          1.0000         0.7023              0.2833
Served Chunk Size     0.7023         1.0000              0.2339
Requested Chunk Size  0.2833         0.2339              1.0000
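A table like this can be produced with the Pearson correlation coefficient, assuming that is the statistic reported. The sketch below computes it on synthetic data with a made-up abort model, in which each client receives a random fraction of what it requested. It does not attempt to reproduce the values above.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)
requested = [rng.uniform(1e5, 1e7) for _ in range(5000)]
# Hypothetical abort model: the served size is a random fraction of the request.
served = [r * rng.random() ** 3 for r in requested]
corr = pearson(requested, served)
print(round(corr, 3))
```

Even though every served size is bounded by its requested size, early exits are enough to pull the correlation well below 1.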


Evaluation

Evaluation Setup
– A general-purpose queuing simulator
– Various scheduling policies
– Trace-driven simulations
• Queue capacity of 500
• System load between 0.1 and 10
• Time slice of 0.01 seconds for PS scheduling

Metrics
– Mean response time
– Rejection rate
– Mean slowdown
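The three metrics can be computed from a simulation log roughly as follows. The sketch assumes the usual definitions: response time is finish time minus arrival time, slowdown is response time divided by service time, and rejection rate is the fraction of all requests that were turned away. The log format and the `summarize` helper are hypothetical.

```python
def summarize(completed, rejected_count):
    """completed: list of (arrival, finish, service_time) tuples for served jobs."""
    responses = [finish - arrival for arrival, finish, _ in completed]
    slowdowns = [(finish - arrival) / service for arrival, finish, service in completed]
    total = len(completed) + rejected_count
    return {
        "mean_response_time": sum(responses) / len(responses),
        "mean_slowdown": sum(slowdowns) / len(slowdowns),
        "rejection_rate": rejected_count / total,
    }

# Hypothetical log: three served jobs and one rejected because the queue was full.
stats = summarize([(0.0, 4.0, 2.0), (1.0, 2.0, 1.0), (2.0, 8.0, 3.0)], rejected_count=1)
print(stats)
```

Slowdown normalizes by job size, so a large job that waits a long time is not automatically counted as being treated unfairly.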


Improved Mean Response Time

[Figure: mean response time curves labeled FCFS, PS, SRPT-CS, SRPT-SS, and SRPT]

Ideal SRPT is the best

SRPT-CS does much better than FCFS and PS


With Lowest Rejection Rate

SRPT-based scheduling policies actually reject fewer jobs than FCFS and PS

[Figure: rejection rate curves labeled SRPT-CS & SRPT-SS, SRPT, and FCFS]


Without Compromising Fairness

SRPT-based scheduling policies don’t starve large jobs

[Figure: mean slowdown for the 10% largest jobs under FCFS, PS, SRPT-CS, SRPT-SS, and SRPT, on an axis from 0 to 30]


Summary

Server-side of P2P is critical to overall system performance

Not much can be learned from web server scheduling

SRPT-based scheduling policies can help
– Lowest mean response time
– Lowest rejection rate
– Without compromising fairness

Chunk size is a reasonable estimator for service time
– SRPT-CS outperforms FCFS and PS


Ongoing Work

Large performance gaps between SRPT-CS, SRPT-SS, and SRPT
– Only SRPT-CS can be directly implemented
– Possible solution: predicting served chunk size and service time using time series analysis

Representativeness of the traces

Performance in real implementation

Cooperative downloading/uploading?

Better estimator


For more information

www.aqualab.cs.northwestern.edu

Please also see our related work:

Dong Lu, Huanyuan Sheng, Peter Dinda. “Size-Based Scheduling Policies with Inaccurate Scheduling Information”. In Proc. of MASCOTS, 2004.

Dong Lu, Peter A. Dinda, Yi Qiao, Huanyuan Sheng and Fabián E. Bustamante. “Applications of SRPT Scheduling with Inaccurate Information”. In Proc. of MASCOTS, 2004.
