Scheduling Your Network Connections
1
Mor Harchol-Balter, Carnegie Mellon University
Joint work with Bianca Schroeder
2
Q: Which minimizes mean response time?
[Diagram: three single-server queues, each fed by a stream of jobs and scheduled by FCFS, PS, and SRPT respectively]
“size” = service requirement
load ρ < 1
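The question on this slide can be explored with a small simulation. The following is a minimal sketch, not the authors' code: it runs one unit-speed server under FCFS, PS, and SRPT on the same arrival sequence and reports the mean response time of each. The workload (Poisson arrivals, Pareto sizes with shape 1.5, load 0.8, 2000 jobs, seed 1) and the names `mean_response_time` and `make_jobs` are assumptions chosen for illustration.

```python
import random

def mean_response_time(jobs, policy):
    """Mean response time on one unit-speed server.

    jobs   : list of (arrival_time, size), sorted by arrival_time
    policy : "FCFS", "PS", or "SRPT"
    """
    active = []                      # [remaining_work, arrival_time], in arrival order
    t, i, total, finished = 0.0, 0, 0.0, 0
    n = len(jobs)
    while finished < n:
        if not active:               # server idle: jump to the next arrival
            t = max(t, jobs[i][0])
        while i < n and jobs[i][0] <= t:
            active.append([jobs[i][1], jobs[i][0]])
            i += 1
        if policy == "PS":
            served = active          # all jobs share the server equally
        elif policy == "FCFS":
            served = [active[0]]     # earliest arrival only
        else:
            served = [min(active, key=lambda j: j[0])]   # least remaining work
        rate = 1.0 / len(served)
        to_finish = min(j[0] for j in served) / rate
        next_arrival = jobs[i][0] if i < n else float("inf")
        if next_arrival - t <= to_finish:
            dt, new_t = next_arrival - t, next_arrival
        else:
            dt, new_t = to_finish, t + to_finish
        for j in served:
            j[0] -= dt * rate
        t = new_t
        still_active = []
        for j in active:
            if j[0] <= 1e-9:         # treat floating-point residue as completion
                total += t - j[1]
                finished += 1
            else:
                still_active.append(j)
        active = still_active
    return total / n

def make_jobs(n, load, seed=1):
    """Assumed workload: Poisson arrivals, Pareto(1.5) sizes (mean 3)."""
    rng = random.Random(seed)
    mean_size = 3.0
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(load / mean_size)
        jobs.append((t, rng.paretovariate(1.5)))
    return jobs

jobs = make_jobs(2000, load=0.8)
results = {p: mean_response_time(jobs, p) for p in ("FCFS", "PS", "SRPT")}
for name, value in results.items():
    print(name, round(value, 2))
```

On any fixed arrival sequence SRPT should come out lowest on mean response time; how far FCFS and PS trail behind depends on the assumed size distribution.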
3
Q: Which best represents scheduling in web servers?
[Diagram: the same three queues: FCFS, PS, SRPT]
“size” = service requirement
load ρ < 1
4
IDEA: How about using SRPT instead of PS in web servers?
[Diagram: clients 1, 2, 3 send “Get File 1”, “Get File 2”, “Get File 3” across the Internet to a web server (Apache) running on Linux O.S.]
5
Immediate Objections
1) Can’t assume known job size.
   Many servers receive mostly static web requests (“GET FILE”).
   For static web requests, the file size is known,
   so the service requirement of a request is approximately known.
2) But the big jobs will starve ...
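The answer to the first objection rests on the file size being available before any bytes are sent. A minimal sketch of that lookup, assuming the request maps directly to a file on disk; the helper name `job_size_bytes` is hypothetical and the demo file is a throwaway created only for this example.

```python
import os
import tempfile

def job_size_bytes(path):
    """Approximate service requirement of a static GET: the file's size in bytes."""
    return os.stat(path).st_size

# Usage: write a 4096-byte temporary file and read its size back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    demo_path = f.name
demo_size = job_size_bytes(demo_path)
os.remove(demo_path)
print(demo_size)
```

The size is only an approximation of the service requirement, since network conditions also affect how long a transfer takes.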
6
Outline of Talk
THEORY
1) “Analysis of SRPT Scheduling: Investigating Unfairness”
IMPLEMENT
2) “Size-based Scheduling to Improve Web Performance”
3) “Web servers under overload: How scheduling can help”
www.cs.cmu.edu/~harchol/
7
THEORY: SRPT has a long history ...
1966 Schrage & Miller derive the M/G/1/SRPT response time
1968 Schrage proves optimality
1979 Pechinkin & Solovyev & Yashkov generalize
1990 Schassberger derives distribution on queue length
BUT WHAT DOES IT ALL MEAN?
8
THEORY: SRPT has a long history (cont.)
1990 - 97 7-year long study at Univ. of Aachen under Schreiber: SRPT WINS BIG ON MEAN!
1998, 1999 Slowdown for SRPT under adversary: Rajmohan, Gehrke, Muthukrishnan, Rajaraman, Shaheen, Bender, Chakrabarti, etc.: SRPT STARVES BIG JOBS!
Various O.S. books: Silberschatz, Stallings, Tanenbaum: Warn about starvation of big jobs ...
Kleinrock’s Conservation Law: “Preferential treatment given to one class of customers is afforded at the expense of other customers.”
9
Unfairness Question
[Diagram: two M/G/1 queues, one scheduled by SRPT and one by PS]
Let ρ = 0.9. Let G: Bounded Pareto(α = 1.1, max = 10^10)
Question: Which queue does biggest job prefer?
10
THEORY: Our Analytical Results (M/G/1):
[Diagram: SRPT queue vs. PS queue]
All-Can-Win Theorem: Under workloads with the heavy-tailed (HT) property, ALL jobs, including the very biggest, prefer SRPT to PS, provided load is not too close to 1.
Almost-All-Win-Big Theorem: Under workloads with the HT property, 99% of all jobs perform orders of magnitude better under SRPT.
I ♥ SRPT
Counter-intuitive!
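The claim about the very biggest job can be checked numerically for a Bounded Pareto workload with α = 1.1 and upper bound 10^10 at load 0.9. The sketch below uses the classical M/G/1 formulas: x/(1−ρ) for PS, and for SRPT the Schrage-Miller expression (a waiting term plus the residence term, the integral of dt/(1−ρ(t)) from 0 to x). The lower bound `K = 1` is an assumption, since the slides do not state it, and all function names are illustrative.

```python
import math

ALPHA, K, P, RHO = 1.1, 1.0, 1e10, 0.9      # K (smallest job size) is assumed

NORM = ALPHA * K**ALPHA / (1.0 - (K / P)**ALPHA)   # density: NORM * t^(-ALPHA-1)

def partial_moment(x, n):
    """Integral of t^n f(t) dt from K to x (valid for n != ALPHA)."""
    x = min(max(x, K), P)
    e = n - ALPHA
    return NORM * (x**e - K**e) / e

def cdf(x):
    x = min(max(x, K), P)
    return (1.0 - (K / x)**ALPHA) / (1.0 - (K / P)**ALPHA)

LAM = RHO / partial_moment(P, 1)            # arrival rate that gives load RHO

def rho_upto(x):
    """Load contributed by jobs of size at most x."""
    return LAM * partial_moment(x, 1)

def t_ps(x):
    return x / (1.0 - RHO)

def t_srpt(x, steps=20000):
    waiting = (LAM * (partial_moment(x, 2) + x * x * (1.0 - cdf(x)))
               / (2.0 * (1.0 - rho_upto(x))**2))
    if x <= K:
        return waiting + x                  # rho(t) = 0 below K
    # Residence time: trapezoid rule in u = log t, because t spans 10 decades.
    lo, hi = math.log(K), math.log(x)
    h = (hi - lo) / steps

    def g(u):
        t = math.exp(u)
        return t / (1.0 - rho_upto(t))

    s = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, steps))
    return waiting + K + s * h

ratio_biggest = t_srpt(P) / t_ps(P)
ratio_smallest = t_srpt(K) / t_ps(K)
print(ratio_biggest, ratio_smallest)
```

A ratio below 1 means the job of that size has a lower expected response time under SRPT than under PS. Under these assumed parameters the biggest job comes out slightly below 1 and the smallest job far below 1.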
11
What’s Heavy-Tail?
[Figure: Berkeley Unix process CPU lifetimes [HD96]. Log-log plot of the fraction of jobs with CPU duration > x against duration (x secs); the data follow Pr{Life > x} = 1/x]
12
What’s the Heavy-Tail property?
Defn: heavy-tailed distribution: $\Pr\{X > x\} \sim x^{-\alpha}, \quad 0 < \alpha < 2$
Many real-world workloads are well-modeled by a truncated HT distribution.
Key property (HT property): “Largest 1% of jobs comprise half the load.”
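The HT property can be checked in closed form for a truncated (bounded) Pareto distribution. A minimal sketch, assuming α = 1.1, upper bound 10^10, and a lower bound `K = 1` that is not given in the slides; `quantile` and `load_below` are illustrative helper names.

```python
ALPHA, K, P = 1.1, 1.0, 1e10     # K (smallest job size) is an assumption

def quantile(q):
    """Size x with Pr{X <= x} = q for Bounded Pareto(ALPHA, K, P)."""
    tail = 1.0 - q * (1.0 - (K / P) ** ALPHA)    # equals (K/x)^ALPHA
    return K * tail ** (-1.0 / ALPHA)

def load_below(x):
    """Integral of t*f(t) dt from K to x, up to a constant that cancels in ratios."""
    return (K ** (1 - ALPHA) - x ** (1 - ALPHA)) / (ALPHA - 1)

x99 = quantile(0.99)                              # 99th-percentile job size
top1_share = 1.0 - load_below(x99) / load_below(P)
print(x99, top1_share)
```

Here `top1_share` is the fraction of the total load carried by the largest 1% of jobs. Under these assumed parameters it comes out a little above 60%, in line with the "half the load" rule of thumb.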
13
THEORY: Our Analytical Results (M/G/1):
[Diagram: SRPT queue vs. PS queue]
All-Can-Win Theorem: Under workloads with the heavy-tailed (HT) property, ALL jobs, including the very biggest, prefer SRPT to PS, provided load is not too close to 1.
Almost-All-Win-Big Theorem: Under workloads with the HT property, 99% of all jobs perform orders of magnitude better under SRPT.
I ♥ SRPT
Counter-intuitive!
14
THEORY: Our Analytical Results (M/G/1):
All-distributions-win Theorem: If load < .5, for every job size distribution, ALL jobs prefer SRPT to PS.
Bounding-the-damage Theorem: For any load, for every job size distribution, for every size x,
$E[T(x)]^{\mathrm{SRPT}} \le \left(1 + \frac{\rho}{2(1-\rho)}\right) E[T(x)]^{\mathrm{PS}}$
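One way to see why a bound of the form $E[T(x)]^{\mathrm{SRPT}} \le \bigl(1 + \tfrac{\rho}{2(1-\rho)}\bigr)\,E[T(x)]^{\mathrm{PS}}$ holds is sketched below. It is a derivation from the classical Schrage-Miller formula and $E[T(x)]^{\mathrm{PS}} = x/(1-\rho)$, and not necessarily the authors' own argument. Here $f$ and $F$ denote the job size density and distribution function, $\lambda$ the arrival rate, and $\rho(x)$ the load made up of jobs of size at most $x$.

```latex
\begin{align*}
E[T(x)]^{\mathrm{SRPT}}
  &= \frac{\lambda\left(\int_0^x t^2 f(t)\,dt + x^2\,(1-F(x))\right)}{2\,(1-\rho(x))^2}
     + \int_0^x \frac{dt}{1-\rho(t)},
  \qquad \rho(x) = \lambda\int_0^x t\,f(t)\,dt, \\[4pt]
\lambda\int_0^x t^2 f(t)\,dt &\le x\,\rho(x),
  \qquad
  \lambda\,x^2\,(1-F(x)) \le x\,\lambda\int_x^\infty t\,f(t)\,dt = x\,(\rho-\rho(x)), \\[4pt]
E[T(x)]^{\mathrm{SRPT}}
  &\le \frac{x\,\rho}{2\,(1-\rho)^2} + \frac{x}{1-\rho}
   = \left(1 + \frac{\rho}{2\,(1-\rho)}\right)\frac{x}{1-\rho}
   = \left(1 + \frac{\rho}{2\,(1-\rho)}\right) E[T(x)]^{\mathrm{PS}}.
\end{align*}
```

The last step uses $\rho(t) \le \rho(x) \le \rho$ in both denominators.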
15
IMPLEMENT: From theory to practice
What does SRPT mean within a Web server?
• Many devices: Where to do the scheduling?
• No longer one job at a time.
16
IMPLEMENT: Server’s Performance Bottleneck
[Diagram: clients 1, 2, 3 send “Get File 1”, “Get File 2”, “Get File 3” through the rest of the Internet and the site’s ISP to the web server (Apache) on Linux O.S.]
Site buys limited fraction of ISP’s bandwidth.
We model bottleneck by limiting bandwidth on server’s uplink.
17
IMPLEMENT: Network/O.S. insides of traditional Web server
Sockets take turns draining --- FAIR = PS.
[Diagram: web server feeding Sockets 1, 2, 3 (one per client: Client 1, Client 2, Client 3), which drain into the network card; the network card is the BOTTLENECK]
18
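IMPLEMENT: Network/O.S. insides of our improved Web server
Socket corresponding to file with smallest remaining data gets to feed first.
[Diagram: web server feeding Sockets 1, 2, 3 (Client 1, Client 2, Client 3) through priority queues into the network card; sockets with small (S), medium (M), and large (L) remaining data feed 1st, 2nd, and 3rd; the network card is the BOTTLENECK]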
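The difference between sockets taking turns (FAIR) and the smallest-remaining socket feeding first (SRPT) can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the modified kernel: all three sockets are ready at time 0, the network card sends one packet per time step, and the packet counts for S, M, and L are made up.

```python
import heapq
from collections import deque

def drain_fair(sockets):
    """Round-robin: each socket sends one packet per turn. Returns finish times."""
    queue = deque(sockets.items())
    t, done = 0, {}
    while queue:
        name, remaining = queue.popleft()
        t += 1
        remaining -= 1
        if remaining == 0:
            done[name] = t
        else:
            queue.append((name, remaining))
    return done

def drain_srpt(sockets):
    """The socket with the least remaining data always sends next."""
    heap = [(remaining, name) for name, remaining in sockets.items()]
    heapq.heapify(heap)
    t, done = 0, {}
    while heap:
        remaining, name = heapq.heappop(heap)
        t += 1
        remaining -= 1
        if remaining == 0:
            done[name] = t
        else:
            heapq.heappush(heap, (remaining, name))
    return done

sockets = {"S": 3, "M": 6, "L": 12}      # packets left per socket (illustrative)
fair = drain_fair(sockets)               # {'S': 7, 'M': 14, 'L': 21}
srpt = drain_srpt(sockets)               # {'S': 3, 'M': 9, 'L': 21}
print(fair, srpt)
```

In this toy case the small and medium transfers finish much earlier under SRPT, while the large transfer finishes at the same time under both policies because the total amount of data to send is unchanged.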
19
Experimental Setup
Implementation of SRPT-based scheduling:
1) Modifications to Linux O.S.: 6 priority levels
2) Modifications to Apache Web server
3) Priority algorithm design.
[Diagram: Apache web server on Linux O.S., connected through a switch to three Linux client machines, each generating clients 1, 2, 3, ..., 200 and each attached to a WAN EMU]
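With only 6 priority levels available, exact SRPT has to be approximated by grouping connections into bands of remaining size. The sketch below shows one possible mapping; the byte cutoffs and the function name `priority_level` are hypothetical and are not the values used in the actual implementation.

```python
import bisect

# Hypothetical byte cutoffs separating the 6 priority levels.
CUTOFFS = [1_000, 5_000, 20_000, 100_000, 500_000]

def priority_level(remaining_bytes):
    """Return 0 (highest priority, least data left) through 5 (lowest)."""
    return bisect.bisect_right(CUTOFFS, remaining_bytes)

# A 2 MB response moves to higher priority as it drains.
levels = [priority_level(r) for r in (2_000_000, 400_000, 50_000, 3_000, 500)]
print(levels)    # [5, 4, 3, 1, 0]
```

Re-evaluating the level as data is sent is what makes this an approximation of shortest remaining processing time and not just shortest job first.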
20
Experimental Setup
Trace-based workload:
• Number of requests made: 1,000,000
• Size of file requested: 41 B -- 2 MB
• Distribution of file sizes requested has HT property.
Variations:
• Flash / Apache
• WAN EMU / geographically-dispersed clients
• 10 Mbps uplink / 100 Mbps uplink
• Surge / trace-based
• Open system / partly-open
• Load < 1 / transient overload
+ Other effects: initial RTO; user abort/reload; persistent connections, etc.
21
Preliminary Comments
• Job throughput, byte throughput, and bandwidth utilization were the same under SRPT and FAIR scheduling.
• Same set of requests complete.
• No additional CPU overhead under SRPT scheduling. Network was bottleneck in all experiments.
22
Results: Mean Response Time
[Figure: mean response time (sec) vs. load, FAIR vs. SRPT]
23
Results: Mean Slowdown
[Figure: mean slowdown vs. load, FAIR vs. SRPT]
24
Mean Response Time vs. Size Percentile
[Figure: mean response time (s) vs. percentile of request size, FAIR vs. SRPT, at load = 0.8]
25
Summary so far ...
• SRPT scheduling yields significant improvements in Mean Response Time at the server.
• Negligible starvation.
• No CPU overhead.
• No drop in throughput.
26
More questions …
• So far only showed LAN results. Are the effects of SRPT in a WAN as strong?
• So far only showed load < 1. What happens under SRPT vs. FAIR when the server runs under transient overload? -> new analysis -> implementation study
27
WAN EMU results
Propagation delay has additive effect. Reduces improvement factor.
[Figure: FAIR vs. SRPT]
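Why an additive delay shrinks the improvement factor can be seen from a one-line inequality. The sketch assumes, as a simplification, that the same propagation delay $d$ is added to the response time under both policies; $T_{\mathrm{FAIR}}$ and $T_{\mathrm{SRPT}}$ stand for the response times without that delay.

```latex
\[
\frac{T_{\mathrm{FAIR}} + d}{T_{\mathrm{SRPT}} + d} \;<\; \frac{T_{\mathrm{FAIR}}}{T_{\mathrm{SRPT}}}
\quad \text{whenever } T_{\mathrm{FAIR}} > T_{\mathrm{SRPT}} > 0 \text{ and } d > 0,
\qquad
\lim_{d \to \infty} \frac{T_{\mathrm{FAIR}} + d}{T_{\mathrm{SRPT}} + d} = 1 .
\]
```

Cross-multiplying reduces the inequality to $d\,T_{\mathrm{SRPT}} < d\,T_{\mathrm{FAIR}}$, so the ratio falls toward 1 as the delay grows even though the absolute gap $T_{\mathrm{FAIR}} - T_{\mathrm{SRPT}}$ stays the same.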
28
WAN EMU results
Loss has quadratic effect. Reduces improvement factor a lot.
[Figure: FAIR vs. SRPT]
29
WAN results: geographically-dispersed clients
[Figures: results at load 0.9 and at load 0.7]
30
Overload – 5 minute overview
[Cartoon: person under overload, “Zzzzzzzzzz...”]
31
Q: What happens under overload?
A: Buildup in number of connections.
[Figure: FAIR vs. SRPT]
Q: What happens to response time?
32
Web server under overload
[Diagram: clients send connection requests to the server, where they pass through the SYN-queue, then the ACK-queue, then the Apache processes]
When the SYN-queue limit is reached, the server drops all connection requests.
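The drop behaviour at the SYN-queue limit can be modelled as a bounded queue. This is a toy sketch, not kernel code: the class name `SynQueue`, the limit of 2, and the client labels are hypothetical, and real TCP stacks have further mechanisms (for example SYN cookies) that are ignored here.

```python
from collections import deque

class SynQueue:
    """Bounded queue of pending connection requests."""

    def __init__(self, limit):
        self.limit = limit
        self.pending = deque()
        self.dropped = 0

    def on_syn(self, client):
        """Queue a new connection request, or drop it if the queue is full."""
        if len(self.pending) >= self.limit:
            self.dropped += 1
            return False
        self.pending.append(client)
        return True

    def accept(self):
        """Hand the oldest pending request to the server, if there is one."""
        return self.pending.popleft() if self.pending else None

q = SynQueue(limit=2)
admitted = [q.on_syn(c) for c in ("c1", "c2", "c3")]   # third request is dropped
first = q.accept()                                      # frees one slot
retry = q.on_syn("c4")                                  # now fits
print(admitted, first, retry, q.dropped)
```

Under overload, requests arrive faster than `accept` is called, so the queue stays full and new requests keep getting dropped.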
33
Transient Overload
34
Transient Overload - Baseline
[Figure: mean response time, SRPT vs. FAIR]
35
Transient overload: Response time as function of job size
[Figure: FAIR vs. SRPT]
small jobs win big!
big jobs aren’t hurt!
WHY?
36
FACTORS
• Baseline case
• WAN propagation delays: RTT: 0 – 150 ms
• WAN loss: Loss: 0 – 15%
• WAN loss + delay: Loss: 0 – 15%, RTT: 0 – 150 ms
• Persistent connections: 0 – 10 requests/conn.
• Initial RTO value: RTO = 0.5 sec – 3 sec
• SYN cookies: ON/OFF
• User abort/reload: Abort after 3 – 15 sec, with 2, 4, 6, 8 retries.
• Packet length: 536 – 1500 Bytes
• Realistic scenario: RTT = 100 ms; Loss = 5%; 5 requests/conn.; RTO = 3 sec; pkt len = 1500 B; user aborts after 7 sec and retries up to 3 times.
37
Transient Overload - Realistic
[Figure: mean response time, FAIR vs. SRPT]
38
Conclusion
• SRPT scheduling is a promising solution for reducing mean response time seen by clients, particularly when the load at server bottleneck is high.
• SRPT results in negligible or zero unfairness to large requests.
• SRPT is easy to implement.
• Results corroborated via implementation and analysis.