Page 1: Bufferbloat - Dark Buffers in the Internetmirrors.bufferbloat.net/Talks/BayBloat04262011/BayBloat.pdf · Bufferbloat Dark Buffers in the Internet or: how I became paranoid, and you

Bufferbloat - Dark Buffers in the Internet

or: how I became paranoid, and you should be paranoid too

Jim Gettys

Bell Labs

April 25, 2011

[email protected], [email protected]

Bufferbloat, April 25, 2011 © Alcatel-Lucent 2010, 2011

“If I have seen a little further, it is because I am on the shoulders of giants” – Isaac Newton

This is a personal history – but I've uncovered only a few pieces of this puzzle, and assembled the puzzle. It's not a pretty picture.

Many more pieces, and much more important pieces, come to me from many other people, including Rich Woundy, Dave Clark, Dave Reed, Van Jacobson, Nick Weaver, Vern Paxson, and many others, some of whom I do not even know... My apologies if I have overlooked your contributions to solving the puzzle.

“The Internet is slow today, Daddy!”

You may already be familiar with some of the puzzle pieces; they fall into place once you know the picture I assemble in this talk

Few of the pieces described here will surprise anyone familiar with networking

The picture is, however, surprising

You've probably heard “The Internet is slow today” for years. I've heard this for years from my family. I even tried to debug my network many times and generated service calls. Every time I would go look, the network stopped misbehaving.

April, 2010 – a trivial bandwidth/latency test (scp & ping simultaneously) clearly demonstrated a problem: poor latency during continuous data transfers: 1-2 seconds of latency, with very rapidly varying jitter of 1-2 seconds

In following up in July, I first thought Comcast PowerBoost might be the problem (though I should have known better)

Some History of buffer queue management in routers

During the 1980's and 1990's, Internet traffic grew at a phenomenal rate

Internet routers were commonly congested all over the Internet

These routers had sufficient memory that their buffers would fill under load, causing poor latency

Research in AQM (Active Queue Management) continued, resulting in the development of RED (Floyd and Jacobson, 1993) and other AQM algorithms, which had become widely deployed in Internet routers by the late 1990's

AQM algorithms signal TCP and other protocols to slow down by dropping packets, signaling congestion, keeping buffer queues from filling and adding latency

The situation eased by the late 1990's, once enough fiber had been deployed and routers caught up with the demand, and RED was deployed

Most people even aware of the issue thought the problem was “solved”; I sure did...
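To make the RED idea on this slide concrete, here is a minimal sketch of the basic mechanism: track a moving average of the queue length and drop arriving packets with a probability that rises linearly between two thresholds. It is a simplification and not the algorithm as published (it omits, for example, the spacing of drops by packet count and idle-time handling). The class name and every parameter value are illustrative assumptions, not taken from the talk.

```python
import random


class RedQueue:
    """Simplified RED sketch: early drops driven by the *average* queue length."""

    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002,
                 limit=50, rng=None):
        self.min_th = min_th    # below this average, never drop early
        self.max_th = max_th    # at or above this average, drop every arrival
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # EWMA weight for the average queue length
        self.limit = limit      # hard tail-drop limit, in packets
        self.avg = 0.0
        self.queue = []
        self.rng = rng or random.Random(0)  # seeded for repeatable runs

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def enqueue(self, packet):
        """Return True if the packet was queued, False if it was dropped."""
        # Update the moving average from the instantaneous queue length.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if len(self.queue) >= self.limit:
            return False                      # buffer full: forced drop
        if self.rng.random() < self.drop_probability():
            return False                      # early drop signals congestion
        self.queue.append(packet)
        return True

    def dequeue(self):
        return self.queue.pop(0) if self.queue else None
```

Even this stripped-down version has four knobs (two thresholds, a maximum probability and an averaging weight), which gives a feel for why tuning comes up later in the talk.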

TCP Congestion Control and Avoidance

TCP will fill any buffer just before the choke point on a path

TCP's (usually unstated) design assumption is that a congested network will generate timely notification of congestion, by packet loss or ECN (Explicit Congestion Notification, not deployed to date). To do otherwise destroys the control loops of TCP and other congestion-avoiding protocols

We judge other Internet transport protocols by whether they are “at least as good” as TCP at avoiding congestion

What happens if TCP's timeliness assumption is violated by a lot?

What happens if packet loss is avoided by buffering? Dropping a packet is evil, isn't it? But....

Some small amount of timely packet loss is not only normal, it is essential for the correct operation of packet networks when congested!
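The claim that TCP fills whatever buffer sits in front of the choke point can be illustrated with a toy model. The sketch below is not real TCP: it models a single additive-increase/multiplicative-decrease flow one round trip at a time, with no slow start, no timeouts and no competing traffic. The function name, the 1000 packets/s bottleneck and the 10 ms base RTT are all assumptions chosen for illustration.

```python
def simulate_aimd(buffer_pkts, rate_pps=1000.0, base_rtt_s=0.010, rounds=5000):
    """Toy per-round-trip model of one AIMD flow through a tail-drop bottleneck.

    Returns (max_rtt_s, losses).
    """
    pipe_pkts = rate_pps * base_rtt_s   # packets the path holds with an empty queue
    cwnd = 1                            # congestion window, in packets
    max_rtt_s = base_rtt_s
    losses = 0
    for _ in range(rounds):
        queue = max(0.0, cwnd - pipe_pkts)   # packets that have to wait
        queued = min(queue, buffer_pkts)     # the buffer holds at most this many
        max_rtt_s = max(max_rtt_s, base_rtt_s + queued / rate_pps)
        if queue > buffer_pkts:
            losses += 1                 # overflow is the only congestion signal
            cwnd = max(1, cwnd // 2)    # multiplicative decrease
        else:
            cwnd += 1                   # additive increase
    return max_rtt_s, losses


small = simulate_aimd(buffer_pkts=10)      # buffer of about one bandwidth-delay product
bloated = simulate_aimd(buffer_pkts=1000)  # buffer of about 100 bandwidth-delay products
print("small buffer:   max RTT %.3f s, %d losses" % small)
print("bloated buffer: max RTT %.3f s, %d losses" % bloated)
```

In this model the flow keeps growing its window until the buffer overflows, so the large buffer does not prevent loss; it only delays the loss signal and lets the round-trip time climb from 10 ms to roughly a second.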

Network Map – Home to MIT Co-Lo center

[Figure: network map; labels: CPE, CMTS, 3G carrier network, RNC, Internet, hotel, MOE Peru]

Lunch with Comcast... July 15 2010 - Many puzzle pieces

Lunch with Rich Woundy of Comcast provided many, many, puzzle pieces that fell into place later

He suggested the “big buffers” problem, which he had been chasing at Dave Clark's suggestion for over two years, but for which he had had no proof

I could drop back to DOCSIS 2, and the differences between DOCSIS 2 and DOCSIS 3 could be used to rule out PowerBoost

RED (the most commonly available form of AQM, Active Queue Management) is often not enabled in major networks. AQM is often distrusted due to the need to tune RED, which has made many network operators averse to it. Some network managers run RED or other AQM, some run without

ECN (Explicit Congestion Notification) is blocked in some networks – Dave Oran later explained to me that ECN packets crash many (old) home routers

Pointer to ICSI netalyzr

The Smoking Gun – bufferbloat is killing latency in the Internet

“We rushed into the captain's cabin . . . there he lay with his brains smeared over the chart of the Atlantic . . . while the chaplain stood with a smoking pistol in his hand at his elbow.”

Sherlock Holmes story, The Gloria Scott, by Arthur Conan Doyle

dslreports smokeping service observing my home link – 7/16

RTT of this path is less than 10ms!

rsync of X Consortium archives from my house to expo.x.org.

The periods of “good behavior” are when I suspended the copy to get work done.

I decided to do a simple scp and ping, and take a packet trace and see what was up

Bursts of really, really ill behavior of TCP: what the f*** is going on!

The plots look like no TCP behavior I've ever seen....

Bursts of duplicate acks; bursts of retransmits; lots of SACK's; excessive packet drops – on long timescales

A cable move at my house due to lightning meant that a technician tested my cable at the home end, and Comcast had technicians check my cable at the CMTS end. The cable was nominal (once my interior TV wiring was removed, anyway): as good as cable service ever gets outside of a lab

Others later reproduce the behavior: Partha, at Georgia Tech, Nick Weaver at ICSI, and in labs

“Typical” tcptrace plot of a perfect TCP session

Yellow: Receive Window; Red: Instantaneous outstanding data; Blue: average outstanding data; Green: weighted average outstanding data

HUH? .... Half a megabyte in flight over a 10ms path??? Spikes???

Yellow: Receive Window; Red: Instantaneous outstanding data; Blue: average outstanding data; Green: weighted average outstanding data

HUH? ..... WTF?

Throughput graph

HUH? ..... WTF?

RTT graph

HUH? ..... WTF?

Red: Instantaneous outstanding data; Blue: average outstanding data; Green: weighted average outstanding data

The Plot Thickens....... My In-laws' FIOS service.

[Figure: network map; labels: CPE, 3G carrier network, RNC, Internet, hotel, MOE Peru]

The Plot Thickens....... My In-laws' FIOS service.

My in-laws' wired FIOS service in Summit, New Jersey, 25/25 service, 7/30/2010

1/4Mbyte outstanding data on a 20ms path? 200ms latency?
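Why a quarter megabyte in flight is startling on this path can be seen from the bandwidth-delay product, the amount of data needed in flight to keep the link busy. The sketch below reads “25/25 service” as 25 Mbit/s in each direction and “1/4Mbyte” as 250,000 bytes; both readings, and the function name, are assumptions.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes needed in flight to keep the path full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


link_bps = 25_000_000          # "25/25 service", read as 25 Mbit/s each way
path_rtt_ms = 20               # unloaded round-trip time quoted on the slide
observed_in_flight = 250_000   # "1/4Mbyte", taken as 250,000 bytes

bdp = bdp_bytes(link_bps, path_rtt_ms)
excess_ratio = observed_in_flight / bdp
print(f"BDP: {bdp:,.0f} bytes; observed in flight is {excess_ratio:.1f}x that")
```

Under these assumptions the path needs only about 62,500 bytes in flight, so the observed amount is roughly four times what the link can carry at once; the remainder has to be sitting in a queue somewhere.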

And thickens.... Wireless side of the FIOS home router

Wireless side of in-laws' FIOS service, Verizon-provided router, > 400ms, 9% loss

I called the fire chiefs/consulting detectives for help

Some of you are real TCP experts; I'm not...

Dave Clark, Dave Reed, Vern Paxson, Greg Chesson, Van Jacobson, and Dick Sites have all looked this over, and agree with the conclusions

Van Jacobson says the timestamps in my data prove the case for bufferbloat, since both ends were recent enough Linux systems to implement TCP timestamps by default

What is more, the bloated buffers are defeating congestion avoidance since the buffers do not allow timely notification

Traffic classification (QOS) can not help you. These are stupid devices. Even if they did classify, that would only move where and when the pain occurs.

This then raises the question: how common and widespread is bufferbloat?

"Netalyzr: Illuminating Edge Network Neutrality, Security, and Performance", C. Kreibich, N. Weaver, B. Nechaev, and V. Paxson

[Figure: Netalyzr scatter plots; panels: Uplink, Downlink]

Arrow direction is increasing latency

Green diagonal line == .5 second latency; black diagonal line == 4 second latency

Note: telephony standards for latency are only 150ms!!!

This data is a lower bound on the severity of the broadband bufferbloat problem; there was a bug in netalyzr causing it to sometimes fail to fill buffers, and therefore fail to detect their sizes

This data mixes wireless and wired traffic, so may be contaminated with home router bufferbloat

Triggers – example applications that saturate a path

Uploads and downloads – uploads may often be worse than downloads due to details of hardware and asymmetric provisioning, but some examples include...

YouTube uploads, Crash dump uploads

Email with large attachments

BitTorrent

File copies/backup

CIFS (Samba)

TCP-based IPSEC tunnels can totally lose

Downloads of movies to disk

Video teleconferencing

Web browsing many web sites, particularly image heavy sites like YouTube, Google Images, Bing, etc.

What happens when a network is slow due to bufferbloat? - protocols fail due to both packet loss & high latency timeouts

Once a network/link exhibits high latency and bad packet loss, other statistically insignificant but mission-critical packets can't do their jobs

DNS – adding hundreds of milliseconds or seconds of latency to lookups kills web browser performance, and losses cause lookup failures

ARP - relies on timely resolution to find other devices on your network

DHCP - if these packets are lost or excessively delayed, machines can't get on the network

RA and ND - essential for IPv6 functioning

VOIP - a flow needs about one packet every 10ms to work well, and less than 30ms of jitter.

Gamers - will get fragged more often with latencies above their twitch factor

Responsiveness of all network applications, web or otherwise, suffers

What protocols do you run that will die with timeouts & excessive packet loss?

Paranoia sets in

ICSI has proven the broadband edge is broken!

But...

I think bufferbloat is (almost) everywhere...

During my tests, I caught my home router doing terrible things!

8 second latency over a path of less than 10ms! Much worse than my broadband-only test

scp of files from my house to expo.x.org, while running speedtest.net

Recent commercial home router, 50/10 Comcast service, 9/10/2010

The Plot Thickens....... Home router experiments

[Figure: network map; labels: CPE, 3G carrier network, RNC, Internet, hotel, MOE Peru]

Home router bufferbloat led me to host bufferbloat

Eight second latencies? Insanity... What is going on?

I replicated this problem on several different modern home routers

Given the experience with broadband, and the ICSI data, I had become seriously paranoid: you should be too!

Eventually, I installed OpenWRT on a router, so I could understand better

I had realized that Linux's transmit queue might be a problem

I set the txqueuelen knob to zero on the router, but nothing happened...

More hair lost, and then I realized: the queue is on my laptop in the upload direction!

And on the home router in the download direction.

Any time broadband's bandwidth exceeds wireless transfer rates at home, the bottleneck becomes the device/home router hop

This is an increasing problem, which I already suffer from frequently due to the chimneys in my house plus high broadband speed
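For readers who want to look at the txqueuelen knob mentioned above on their own Linux machines, here is a minimal read-only sketch. It relies on the `tx_queue_len` attribute that Linux exposes for each interface under `/sys/class/net`; the function names are made up for this example, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def tx_queue_len(interface, sysfs_root="/sys/class/net"):
    """Return the transmit queue length (in packets) Linux reports for one interface."""
    knob = Path(sysfs_root) / interface / "tx_queue_len"
    return int(knob.read_text().strip())


def all_tx_queue_lens(sysfs_root="/sys/class/net"):
    """Map each interface name to its transmit queue length."""
    result = {}
    for entry in sorted(Path(sysfs_root).iterdir()):
        knob = entry / "tx_queue_len"
        if knob.is_file():          # skip entries that are not interfaces
            result[entry.name] = int(knob.read_text().strip())
    return result
```

Calling `all_tx_queue_lens()` on a Linux laptop and on the router shows the queue on each side of the wireless hop. As the slide notes, this is only one of the places packets can queue, so a small value here does not by itself rule out bufferbloat.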

Host bufferbloat, and your home router

Since home routers are usually using general purpose operating systems under the covers (e.g. Linux), the problem is on both sides of your wireless link

Buffers hide in multiple places in modern OS's + hardware (Linux, Macintosh, Windows alike) – more about this later

Let's do a simple calculation, presuming 10Mbps actual data transfer rate:

256 packets is of order 3,000,000 bits, so that is about 1/3 of a second of delay (one way)

What happens at a busy conference, where your “fair share” might be 100Kbps? 30 seconds: applications (and people) time out entirely....

And your excessive packet loss rate induced by bufferbloat is increasing failures further
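The arithmetic on this slide can be checked with a few lines. This sketch assumes full-size 1500-byte packets, which the slide does not state explicitly; the helper name is illustrative.

```python
PACKET_BYTES = 1500  # assumption: full-size Ethernet frames


def drain_time_s(packets, bandwidth_bps, packet_bytes=PACKET_BYTES):
    """Seconds needed to transmit a full queue of `packets` at `bandwidth_bps`."""
    return packets * packet_bytes * 8 / bandwidth_bps


queued_bits = 256 * PACKET_BYTES * 8                 # 3,072,000 bits
at_home = drain_time_s(256, 10_000_000)              # 10 Mbps actual transfer rate
at_conference = drain_time_s(256, 100_000)           # 100 Kbps "fair share"
print(f"{queued_bits:,} bits: {at_home:.2f} s at 10 Mbps, {at_conference:.1f} s at 100 Kbps")
```

The same 256-packet queue costs about a third of a second at 10 Mbps and about thirty seconds at 100 Kbps: a fixed packet count means the delay grows as the link gets slower.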

But you may say, why don't I see bufferbloat on ethernet?

You DO see bufferbloat trivially on Mac OSX and Linux on 100Mbps ethernet:

Use a test program capable of saturating the connection....

The network is slower than your 1Gbps NIC, so buffers build on your machine

The driver ring buffers are/appear to be about 256 packets in size on modern hardware:

you get ~10ms of latency with simple file copies

Windows is “interesting”... you get good latency, relative to Mac OSX and Linux

The fastest any version of Windows will run (by default) is about 85Mbps; so the wire is faster than the OS, and therefore the OS buffers never fill

A Microsoft tech note explains that to get better multimedia experience, Windows bandwidth shapes outgoing traffic

I think Microsoft noticed ethernet bufferbloat, but did not fully understand it, and implemented a pragmatic band-aid

Where does host bufferbloat hurt?

[Figure: network map; labels: CPE, 3G carrier network, RNC, Internet, hotel, MOE Peru]

Aggregate network behavior – Back to the Future

What happens if a network has buffers “all over”? Such as any wired network without AQM enabled in its routers... Classic congestion of the 1990's, yet again

Wireless: first seen in satellite links in the '80s.

Base station protocol adaptation, to cover error correction in high error environments

Static buffers used to cover radio bandwidth variation

Wireless protocols themselves (802.11, 3G)

Buffers start working in an aggregate fashion

Latency will go up when loaded if there are buffers all over; but you won't observe much packet drop - Expect a diurnal (daily) pattern: people timeout before packets do

This is exactly what Dave Reed reported in 2009 on the end-to-end-interest list about 3G networks

I've observed up to 6 second latency: Dave Reed has observed up to 30 seconds!

Bufferbloat is now known to exist in some (all?) RNC's, and observed in back haul networks

Back-haul networks are also failing to run RED or other AQM when they should

Conversations with Van Jacobson

Over ten years ago, Kathie Nichols walked into Van's office one afternoon, and showed him that RED has two bugs

Van & Kathie twice failed to get the paper with RED's problems and fixes published

Van says RED has several problems; these problems mean that RED requires tuning, and the 100 or more papers about RED tuning in the last decade confirm this.

Ergo, network operators' reluctance to enable RED is understandable, even if their fears are (usually, but not completely) excessive: RED must be used carefully!

A 1999 draft of “RED in a Different Light” did escape; a finished paper should be available soon (I hope). Warning: the nRED algorithm in that old draft still has a bug

Van warns that time based behavior in Internet gear could also cause congestion and instability. Note that a number of recent broadband technologies clearly have this property. Van wishes one could pace packet transmission, but hardware support is lacking. Bunching of initially well paced packets has also been observed already. Synchronization and global resonance do occur!

Aggregate bufferbloat locations

[Figure: network map; labels: CPE, Head end, 3G carrier network, RNC, Internet, hotel, MOE Peru]

Fat Subnets: e.g. 802.11 or many other wireless technologies

Concrete example, 25 802.11 nodes, each with a single packet buffer, trying to transmit to an access point.

Some nodes are far away, and the AP adapts down to, say 2Mbps?

You have 25 * 1500 bytes of buffering; this is > .15 seconds excluding any overhead, if everything goes well.

What happens if:

You buffer 20 packets instead of 1 on each node? 200? or 1000?

You keep trying to retransmit packets in the name of “reliability”? (Some MACs are known to try to transmit up to 255 times; 8 times is common.)

And, in the name of “reliability”, any inherently unreliable multicast/broadcast traffic drops the radio bandwidth to minimum?

You then try to run WDS or 802.11s, which both forward packets and/or respond to any multicast (e.g. ARP) with routing messages?
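The “what happens if” questions about per-node buffer depth can be answered numerically. The sketch below reproduces the slide's 25-node, 2Mbps example and sweeps the per-node queue depth; it ignores retransmissions, MAC overhead and rate changes, so the results are optimistic lower bounds. The function name is illustrative.

```python
def subnet_drain_time_s(nodes, packets_per_node, link_bps, packet_bytes=1500):
    """Seconds to drain every node's queue once over a shared link (no overhead)."""
    return nodes * packets_per_node * packet_bytes * 8 / link_bps


# 25 nodes sharing a link that has adapted down to 2 Mbps, as on the slide.
delays = {depth: subnet_drain_time_s(25, depth, 2_000_000)
          for depth in (1, 20, 200, 1000)}
for depth, seconds in delays.items():
    print(f"{depth:>4} packets per node -> {seconds:6.2f} s of queued data")
```

With one packet per node the shared medium already holds 0.15 seconds of data; at 20 packets per node it is 3 seconds, and at 200 or 1000 it is 30 or 150 seconds, far past the point where applications and people give up.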

What do you get? An ugly, very complicated and hard to debug sight

Conferences using 802.11

Schools

Hotels

Some network operators' networks

Painful personal example/failure: OLPC's network melted under load

Do you know what happens in Mongolia, where the teachers have the children all open their laptops at the same minute in the morning?

Net result: no packets would get through

Well before absolute meltdown, our applications failed due to timeouts

We had several problems including mesh routing problems; but we missed bufferbloat entirely in our failure analysis

Buffers are only detectable when they are next to a saturated bottleneck – at all other times they are “dark”

Your hosts, in your applications, and in socket buffers and network layers

Your MAC itself may have packet buffers internally;

Network device drivers themselves

Your network interface's ring buffer potentially buffers large numbers of packets, added to hardware years ago to hide x86 SMM (system management mode) behavior and for marketing

The VM system your OS may be running on top of may add yet more layers

Your wireless access, at both ends

3G has buffers for fragmentation reassembly (how big?); I don't know about LTE

802.11 has similar issues: long packet delays destroy timely notification

Your switch fabric (8 ms/switch at 1Gbps): how many hops, how congested?

Your home router – potentially megabytes

Your CPE/cable modem/FIOS box – potentially megabytes

The head-ends of those connections (e.g. DSLAM, CMTS, etc.)

Each and every router in your path, and the line cards in those routers

Speed of light in glass and vacuum

Other Locations of Bufferbloat

Corporate networks

Smokeping made me suspect my corporate network wasn't right: latency spikes were present

ALU corporate network example: sophisticated classification is present, but no AQM enabled; when we convert to Windows 7 and other systems, this becomes an immediate problem, as all hosts will be able to saturate all network links

Satellite Links

I have observed > 20 second RTT's on links to the MOE in Peru, for example

Example Middlebox

Tunnel devices

Detected in my company's IPSEC infrastructure, at the firewall complex where the tunnels land

Firewall relays?

Haven't looked; but I expect so; if not in the OS, then in the relay applications

Elsewhere?

Encryption buffers?

Where do bloated buffers hide?

[Figure: network map; labels: CPE, 3G carrier network, RNC, Internet, hotel, MOE Peru]

Web Browser/servers – TCP initial congestion window changes

The HTTP/1.1 spec said clients should not open more than 2 connections to the same server: current Chrome, Firefox, IE may open up to 6 (or sometimes many more) connections to the same server complex. Initial in-flight packets are therefore at least 12 data packets in current browsers; yet more possible if the TCP connection lasts long enough for the window to open further

A proposal is currently before the IETF to increase TCP's initial congestion window to 8-10 packets

Microsoft accidentally had a load balancer that set the initial window to infinity; Google has upped its initial window to 8 packets; Linux has just upped its initial window in its last release

The “impulse” into the network when visiting a web page is easily > 100 packets

> 1 megabit of data may easily be transmitted almost simultaneously, to arrive *SPLAT* at your home bottleneck, which is between 1 and 50 megabits in bandwidth. I've observed >150ms latency @ my 50Mbps broadband bottleneck, where no classification (QOS) or fair queuing is deployed; multiply accordingly – multi second delays are easily possible: what about VOIP? Gaming? Other users?

The consequences in the developing parts of the world may be even more extreme!

Are these changes a good idea? I think – NOT! I'd like to be able to use my Internet connection for applications other than solely web surfing, without complaints from my family, strangely...

Replace HTTP TCP (ab)use with some sanity: e.g. SPDY, HTTP/1.1 pipelining
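To see how the connection count and the initial congestion window multiply into the “impulse” described above, here is a small sketch. It assumes the slide's “at least 12” comes from 6 connections times a 2-packet initial window, that packets are 1500 bytes, and, purely as a hypothetical, that a page pulls from 2 server complexes; none of these numbers are stated in that form in the talk.

```python
def page_load_impulse(connections, initial_window_pkts,
                      server_complexes=1, packet_bytes=1500):
    """Return (packets, bits) that can be sent before any congestion feedback."""
    packets = connections * initial_window_pkts * server_complexes
    return packets, packets * packet_bytes * 8


def drain_time_s(bits, bottleneck_bps):
    """Seconds for the bottleneck link to transmit the whole impulse."""
    return bits / bottleneck_bps


today_pkts, today_bits = page_load_impulse(6, 2)
proposed_pkts, proposed_bits = page_load_impulse(6, 10, server_complexes=2)
slow = drain_time_s(proposed_bits, 1_000_000)     # 1 Mbps bottleneck
fast = drain_time_s(proposed_bits, 50_000_000)    # 50 Mbps bottleneck
print(f"{today_pkts} packets today; {proposed_pkts} packets with a 10-packet window")
print(f"drain time: {slow:.2f} s at 1 Mbps, {fast * 1000:.1f} ms at 50 Mbps")
```

Under these assumptions a single page visit can put 120 packets (about 1.4 megabits) on the wire at once, which takes well over a second to drain at 1 Mbps; anything else sharing that queue, such as VOIP or gaming traffic, waits behind it.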

Network Neutrality Implications of Bufferbloat

I believe (part of) BitTorrent's problems were incorrectly diagnosed. Consumer ISP's may believe control of applications is an existential issue

Buffers were already grossly excessive then, and uplinks were much slower

Windows XP does not enable TCP Window scaling, so it was not obvious what was happening as it then takes multiple TCP connections to saturate most links

Carrier's telephony currently has a major quality advantage over SIP VOIP or Skype

I do not believe it is intentional: ISP's get the service calls due to bufferbloat and everyone who builds equipment has been making the same mistake

Innovation is at risk

Reliable low latency applications (games, hosted desktops, immersive teleconferencing) have become impossible to deploy

Fixing bufferbloat is essential to innovation in the Internet

Mitigation/solution depends on where the buffers are

High-end home routers sometimes have QOS knobs that can immediately partially mitigate the damage (see following two slides)

Can't deal with dynamic bandwidth availability (e.g. PowerBoost, TOD)

Can't deal with possible upstream buffers in devices like DSLAMs or CMTSs

In networks not running with AQM, “just” turn it on

Education may be hard in some organizations – RED has its problems

Improved AQM algorithms need research and deployment

I'm trying to extricate Van & Kathie's RED paper; it should be available soon

Hosts need both immediate mitigation and better queue management solutions

Home routers are often Linux based; but classic RED won't work to solve 802.11 or 3G bufferbloat due to the highly variable bandwidth of wireless: SFB, RED Light or other algorithms are needed

dslreports smokeping service observing my home link – Before

RTT of this path is less than 10ms!

scp of X Consortium archives from my house to expo.x.org.

The periods of “good behavior” are when I suspended the copy to get work done.

dslreports smokeping service observing my home link – After

QOS knobs in router set empirically – stable latency at all loads

From > 1 second, to less than 10 milliseconds jitter and latency – 100 times better!

(DSL reports observation site is 20ms away...)

Mitigation/solution depends on where the buffers are

Market solutions

Tests to make the problem visible

“gamer” routers/modems/CPE that implement queue management

Existing plant may or may not be fixable at all, or allow only partial mitigation

Set buffers to something reasonable and small, if firmware update is possible: e.g. bandwidth delay product, where delay is based on geography

Smarter algorithms between head end and CPE equipment – may require modifying standards in this area (e.g. DOCSIS, others)

LEDBAT algorithms might be useful for routers to detect their broadband connection's queues are growing to implement smart bandwidth shaping...

Hybrid: SOC's with existing silicon may be able to solve the problem: AQM everywhere!

New silicon
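
The "bandwidth delay product" sizing suggested on this slide can be worked through with a small sketch. The link rate, the 100 ms delay figure, and the 1500-byte packet size used in the example are assumptions for illustration; an appropriate delay would depend on the geography actually being served.

```python
import math


def bdp_buffer_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: (bits/s * ms) / (8 bits/byte * 1000 ms/s)."""
    return bandwidth_bps * rtt_ms // 8000


def bdp_buffer_packets(bandwidth_bps: int, rtt_ms: int, packet_bytes: int = 1500) -> int:
    """Same product expressed in full-size packets, rounded up, at least one."""
    return max(1, math.ceil(bdp_buffer_bytes(bandwidth_bps, rtt_ms) / packet_bytes))


# Hypothetical example: a 1 Mbit/s uplink with a 100 ms delay budget.
print(bdp_buffer_bytes(1_000_000, 100))    # 12500 bytes
print(bdp_buffer_packets(1_000_000, 100))  # 9 packets
```

Under these assumed numbers the buffer comes to only a handful of packets, far below the buffer sizes commonly shipped.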

And the bufferbloat problem is getting worse...

Generation follows generation of hardware; downward compatibility without thought gets you worse behavior each generation (e.g. ethernet, 802.11, or DOCSIS evolution)

More and more applications saturate the edge networks

Windows XP is finally being retired: everything else already trivially saturates edge links

Memory cost is now zero for most products: you can't buy small enough chips for the “right” size buffers. And “right” depends on delay, and bandwidth and number of flows; and bandwidth may depend on radio propagation. There is no single “right answer”.

Sizing buffers on measured RTT, rather than first principles: self fulfilling prophecy

Since there are only bandwidth tests available, if a vendor wants “on” to an ISP's “approved hardware list”, and if they have an underlying problem they can paper over by increasing buffer sizes, what do you think they do?

This market behavior is absolutely known to be occurring

We must change bandwidth marketing wars to bandwidth/latency wars.

What is the best figure of merit? Better be something simple. We have the opportunity to suggest an answer
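
To see why there is no single "right" buffer size, it may help to compute how much delay one fixed buffer adds when full at different link rates. The 256 KiB buffer and the link rates below are hypothetical values chosen for illustration, not measurements of any product.

```python
def full_buffer_delay_seconds(buffer_bytes: int, bandwidth_bps: float) -> float:
    """Time to drain a completely full buffer at the given link rate."""
    return buffer_bytes * 8 / bandwidth_bps


BUFFER_BYTES = 256 * 1024  # hypothetical fixed buffer size

# The same buffer, drained at three assumed link rates.
for rate_bps in (1_000_000, 10_000_000, 100_000_000):
    delay = full_buffer_delay_seconds(BUFFER_BYTES, rate_bps)
    print(f"{rate_bps / 1e6:6.0f} Mbit/s -> {delay * 1000:8.1f} ms of queueing delay")
```

A buffer that adds about 21 ms at 100 Mbit/s adds about 2.1 seconds at 1 Mbit/s. That suggests latency under load, rather than bandwidth alone, is one candidate for a simple figure of merit.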

What should the IETF do?

Update RFC 2309 – “The RED Manifesto”, given the history and additional knowledge

BCP on router buffering – research shows conventional wisdom that more is better in routers is almost exactly backwards! (e.g. Nick McKeown's research)

Education – break down layered thinking!

Standardization & testing of AQM algorithms – particularly for the edge & hosts

Advice on applicability of AQM algorithms – needs to be a living document(s), both for guidance of network operators and for those building equipment and systems

Start mitigating bufferbloat while we work on actual solutions that “just work”

How to get ECN actually debugged and turned on in the net?

The HTTP group needs to talk to transport to sort out the issues around # of simultaneous TCP connections with the initial congestion window change proposal, in light of bufferbloat and the desire to run other protocols simultaneously over our broken Internet of the present

Replace HTTP (I said this over 10 years ago!); at least get pipelining deployed.

What to do about the failure of HTTP pipelining due to lack of sequence numbers in HTTP to enable out of order delivery and deal with buggy implementations?

What puzzle pieces do Microsoft, Apple and Google hold?

Operating system bufferbloat

Mitigate immediately and then solve bufferbloat using AQM

Handsets – Cellular and Wifi alike

Middleboxes(?) (e.g. IPsec tunnels, wireless routers)

Market pressure on vendors

Browsers

Urgently develop/deploy technologies such as SPDY & HTTP/1.1 pipelining

Do not further increase #connections and consider rollback of recent changes – most of the world does not have broadband

Major web sites & Networks (e.g. GoogleWiFi, customer services)

Rethink ICW changes: making a bad situation worse

Urgent Research – e.g. AQMs? Can we deploy ECN now? Everywhere? Or Somewhere?

Status – April, 2011

Publication: my blog, MH & IETF talks

Soon: IEEE Internet Computing editorial, ACM Queue case study -> CACM

Bufferbloat.net is available for projects & mailing lists (> 220 current subs.)

SFB queue discipline is now upstream in latest Linux; eBDP has been reimplemented and both need testing: RED Light, soon...

John Linville is maintaining a “debloat-testing” tree for Linux

Initial builds of OpenWRT for debloat testing just available: much help needed testing: WNDR3700 is the initial home router target

Cable industry is mitigating bufferbloat: DOCSIS spec has been modified and CMTS's will be able to control cable modem buffering soon

Tons of work to do! E.g. Unified buffer management, drivers, testing frameworks on many platforms, working AQM everywhere: fix your pieces

Please come help; at a minimum, you can help by testing home routers!

Professor Moriarty is poisoning the Internet with the deadly Bufferfish...

Questions?

Remember, we are all in this bloat together!

Please come help before we sink!

My Blog – http://gettys.wordpress.com

Bufferbloat.net – http://bufferbloat.net

http://lists.bufferbloat.net

http://trac.tools.ietf.org/group/irtf/trac/wiki/ICCRG

This talk – http://mirrors.bufferbloat.net/Talks/BayBloat

Full length version of this talk:

http://mirrors.bufferbloat.net/Talks/BellLabs01192011/