Quality of Service vs. Any Service at All
10th IEEE/IFIP Conference on Network Operations and Management Systems (NOMS 2006), Vancouver, BC, Canada, April 2006
Randy H. Katz, Computer Science Division, Electrical Engineering and Computer Science Department, University of California, Berkeley, Berkeley, CA 94720-1776


TRANSCRIPT

Page 1

Quality of Service vs. Any Service at All

10th IEEE/IFIP Conference on Network Operations and Management Systems (NOMS 2006)
Vancouver, BC, Canada
April 2006

Randy H. Katz
Computer Science Division
Electrical Engineering and Computer Science Department
University of California, Berkeley
Berkeley, CA 94720-1776

Page 2

Networks Under Stress

Page 3

[Chart from Vern Paxson, ICIR, “Measuring Adversaries”: = 60% growth/year]

Page 4

[Chart from Vern Paxson, ICIR, “Measuring Adversaries”: = 596% growth/year]

“Background” radiation dominates traffic in many of today’s networks
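A back-of-the-envelope reading of these growth figures (assuming exponential year-over-year growth, which the slides do not state explicitly):

```latex
% Assuming exponential growth at rate r per year, the doubling time is
T_{\text{double}} = \frac{\ln 2}{\ln(1+r)}, \qquad
r = 0.60 \;\Rightarrow\; T_{\text{double}} \approx \frac{0.693}{0.470} \approx 1.5\ \text{years}, \qquad
r = 5.96 \;\Rightarrow\; T_{\text{double}} \approx \frac{0.693}{1.940} \approx 0.36\ \text{years} \approx 4.3\ \text{months}
```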

Page 5

Network Protection

• Internet reasonably robust to point problems like link and router failures (“fail stop”)

• Successfully operates under a wide range of loading conditions and over diverse technologies

• During 9/11/01, the Internet worked well under heavy traffic conditions and despite some major facilities failures in Lower Manhattan

Page 6

Network Protection

• Networks awash in illegitimate traffic: port scans, propagating worms, p2p file swapping
– Legitimate traffic starved for bandwidth
– Essential network services (e.g., DNS, NFS) compromised

• Need: active management of network services to achieve good performance and resilience even in the face of network stress (a minimal control-loop sketch follows this list)
– Self-aware network environment
– Observing and responding to traffic changes
– Sustaining the ability to control the network
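A minimal sketch of the observe-and-respond loop this slide calls for, in Python; the metric names, thresholds, and the poll_metrics/shape_traffic hooks are hypothetical stand-ins for real iBox and monitoring interfaces:

```python
import time

# Hypothetical hooks: in a real deployment these would query iBoxes/SNMP and push shaping rules.
def poll_metrics():
    """Return a dict of observed metrics, e.g. {'dns_qps': ..., 'ingress_mbps': ...}."""
    return {"dns_qps": 1200.0, "ingress_mbps": 750.0}

def shape_traffic(traffic_class, rate_limit_mbps):
    print(f"shaping {traffic_class} to {rate_limit_mbps} Mbps")

THRESHOLDS = {"dns_qps": 1000.0, "ingress_mbps": 900.0}
ACTIONS = {"dns_qps": ("email", 50.0), "ingress_mbps": ("p2p", 100.0)}

def control_loop(poll_interval_s=30, iterations=3):
    """Observe metrics, compare against thresholds, respond by shaping the offending class."""
    for _ in range(iterations):
        metrics = poll_metrics()
        for name, value in metrics.items():
            if value > THRESHOLDS[name]:
                traffic_class, limit = ACTIONS[name]
                shape_traffic(traffic_class, limit)
        time.sleep(poll_interval_s)

if __name__ == "__main__":
    control_loop(poll_interval_s=0, iterations=1)
```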

Page 7

Berkeley Experience

• Campus Network
– Unanticipated traffic renders the network unmanageable
– DoS attacks, latest worm, newest file sharing protocol largely indistinguishable--surging traffic
– In-band control is starved, making it difficult to manage and recover the network

• Department Network
– Suspected DoS attack against DNS
– Poorly implemented spam appliance overloads DNS
– Difficult to access Web or mount file systems

Page 8

Why and How Networks Fail

• Complex phenomenology of failure
• Traffic surges break enterprise networks
• “Unexpected” traffic as deadly as high net utilization
– Cisco Express Forwarding: random IP addresses --> flood route cache --> force traffic thru slow path --> high CPU utilization --> dropped router table updates (a toy simulation of this effect follows this list)
– Route Summarization: powerful misconfigured peer overwhelms weaker peer with too many router table entries
– SNMP DoS attack: overwhelm SNMP ports on routers
– DNS attack: response-response loops in DNS queries generate traffic overload
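A toy, self-contained simulation of the Cisco Express Forwarding failure mode above (an illustration, not from the talk): randomly spread destination addresses thrash a bounded LRU route cache, so nearly every packet takes the slow path, while a concentrated destination set stays cached.

```python
import random
from collections import OrderedDict

def simulate(cache_size=10_000, packets=200_000, distinct_dsts=1_000_000, seed=1):
    """Fraction of packets forced onto the slow path for a given destination spread."""
    random.seed(seed)
    cache = OrderedDict()          # LRU route cache: destination -> cached forwarding entry
    slow_path = 0
    for _ in range(packets):
        dst = random.randrange(distinct_dsts)   # random destinations: almost every packet misses
        if dst in cache:
            cache.move_to_end(dst)
        else:
            slow_path += 1                      # miss: route lookup on the CPU (slow path)
            cache[dst] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)       # evict the least recently used entry
    return slow_path / packets

if __name__ == "__main__":
    print(f"slow-path fraction, random destinations:     {simulate():.2%}")
    print(f"slow-path fraction, 5k popular destinations: {simulate(distinct_dsts=5_000):.2%}")
```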


Page 9

Technology Trends

• Integration of servers, storage, switching, and routing
– Blade Servers, Stateful Routers, Inspection-and-Action Boxes (iBoxes)

• Packet flow manipulations at L4-L7
– Inspection/segregation/accounting of traffic (a per-flow accounting sketch follows this list)
– Packet marking/annotating

• Building blocks for network protection
– Pervasive observation and statistics collection
– Analysis, model extraction, statistical correlation and causality testing
– Actions for load balancing and traffic shaping
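A minimal sketch of the per-flow accounting mentioned above, keyed on the classic 5-tuple; the Packet record and the sample flows are illustrative assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    proto: str      # "tcp" or "udp"
    src_port: int
    dst_port: int
    length: int     # bytes on the wire

class FlowAccountant:
    """Per-flow byte and packet counters keyed on the 5-tuple."""
    def __init__(self):
        self.bytes = defaultdict(int)
        self.packets = defaultdict(int)

    def observe(self, pkt: Packet):
        key = (pkt.src_ip, pkt.dst_ip, pkt.proto, pkt.src_port, pkt.dst_port)
        self.bytes[key] += pkt.length
        self.packets[key] += 1

    def top_flows(self, n=5):
        """Heaviest flows by byte count, for dashboards or segregation decisions."""
        return sorted(self.bytes.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    acct = FlowAccountant()
    acct.observe(Packet("10.0.0.5", "192.0.2.1", "udp", 53124, 53, 80))
    acct.observe(Packet("10.0.0.5", "192.0.2.1", "udp", 53124, 53, 120))
    print(acct.top_flows())
```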

[Figures: Load Balancing; Traffic Shaping]

Page 10

Generic Network Element Architecture

[Diagram: Input Ports and Output Ports with Buffers, joined by an Interconnection Fabric; a bank of Classification Processors (CP) and Action Processors (AP), “Tag” Memory, and a Rules & Programs store]
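A hypothetical rendering of the classify-tag-act pipeline in the diagram: classification rules write a tag that the action processor later consumes. The rule format and action names are assumptions for illustration, not the element's actual programming model.

```python
# Hypothetical classify -> tag -> act pipeline mirroring the diagram:
# classification rules write a tag into "tag memory"; the action processor reads it back.

RULES = [
    # (predicate over a packet dict, tag to write)
    (lambda p: p["dst_port"] == 53,         "dns"),
    (lambda p: p["dst_port"] in (80, 443),  "web"),
    (lambda p: True,                        "other"),     # default rule
]

ACTIONS = {"dns": "forward", "web": "forward", "other": "rate-limit"}

def classification_processor(packet, tag_memory):
    """Apply the first matching rule and record its tag for this packet."""
    for predicate, tag in RULES:
        if predicate(packet):
            tag_memory[id(packet)] = tag
            return

def action_processor(packet, tag_memory):
    """Consume the packet's tag and return the action to apply."""
    tag = tag_memory.pop(id(packet), "other")
    return ACTIONS[tag]

if __name__ == "__main__":
    tag_memory = {}
    pkt = {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.1", "dst_port": 53}
    classification_processor(pkt, tag_memory)
    print(action_processor(pkt, tag_memory))    # -> "forward"
```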

Page 11

Active Network Elements

• Server Edge
• Network Edge
• Device Edge

[Diagram: an iBox at each edge. Labels: Server Load Balancing, Storage Nets; NAT, Access Control, Network-Device Configuration; Firewall, IDS, Traffic Shaper]

Page 12

iBoxes: Observe, Analyze, Act

Inspection-and-Action Boxes:
– Deep multiprotocol packet inspection
– No routing; observation & marking
– Policing points: drop, fence, block
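As a hypothetical illustration of deep multiprotocol packet inspection with observation and marking, the sketch below parses a synthetic IPv4/UDP packet using the standard struct module and marks it as DNS when port 53 is involved; the packet bytes are fabricated for the example.

```python
import struct
import socket

def inspect_ipv4(packet: bytes):
    """Parse IPv4 + UDP headers and mark DNS traffic (port 53)."""
    version_ihl, _, total_len = struct.unpack("!BBH", packet[:4])
    ihl = (version_ihl & 0x0F) * 4                 # IPv4 header length in bytes
    proto = packet[9]                              # protocol field (17 = UDP)
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    mark = "other"
    if proto == 17:
        sport, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
        if 53 in (sport, dport):
            mark = "dns"
    return {"src": src, "dst": dst, "proto": proto, "mark": mark}

if __name__ == "__main__":
    # Synthetic 20-byte IPv4 header (proto=17/UDP) followed by an 8-byte UDP header to port 53.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                     socket.inet_aton("10.0.0.5"), socket.inet_aton("192.0.2.1"))
    udp = struct.pack("!HHHH", 53124, 53, 8, 0)
    print(inspect_ipv4(ip + udp))
```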

[Diagram: Enterprise Network Architecture: an Internet or WAN Edge, a Distribution Tier of routers (R) with iBoxes (I) alongside them, Access Tiers, and User End Hosts, Server End Hosts, and Network Services End Hosts (E)]

Page 13

Observe-Analyze-Act

• Control exercised, traffic classified, resources allocated
• Statistics collection, prioritizing, shaping, blocking, …
• Minimize/mitigate effects of attacks & traffic surges
• Classify traffic into good, bad, and ugly (suspicious)
– Good: standing patterns and operator-tunable policies
– Bad: evolves faster, harder to characterize
– Ugly: cannot immediately be determined as good or bad

• Filter the bad, slow the suspicious, preserve for the good (a classification-to-action sketch follows this list)
– Sufficient to reduce false positives
– Suspicious-looking good traffic may be slowed down, but won’t be blocked
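A sketch of how the good/bad/ugly split can map onto forward, rate-limit, and drop; the flow attributes and example flows are assumptions for illustration, not rules from the talk:

```python
from dataclasses import dataclass

@dataclass
class Flow:
    dst_port: int
    known_signature: bool   # matches a known-bad pattern (worm, scan, ...)
    on_allow_list: bool     # matches a standing, operator-approved pattern

def classify(flow: Flow) -> str:
    """Good / bad / ugly classification in the spirit of the slide."""
    if flow.known_signature:
        return "bad"
    if flow.on_allow_list:
        return "good"
    return "ugly"            # cannot yet tell: treat as suspicious

# Filter the bad, slow the suspicious, preserve resources for the good.
ACTION = {"good": "forward", "ugly": "rate-limit", "bad": "drop"}

if __name__ == "__main__":
    flows = [
        Flow(443, known_signature=False, on_allow_list=True),    # ordinary web traffic
        Flow(1434, known_signature=True, on_allow_list=False),   # worm-like traffic
        Flow(6881, known_signature=False, on_allow_list=False),  # unknown, possibly p2p
    ]
    for f in flows:
        print(f.dst_port, ACTION[classify(f)])
```

Note that suspicious ("ugly") traffic is only rate-limited, never dropped, which matches the false-positive point above: good traffic that looks odd loses some speed but is not blocked.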

Page 14

Scenario

[Diagram: Internet Edge, Distribution Tier, Access Edge with user PCs, and a Server Edge containing the Mail Server (MS), File Server (FS), Spam Filter, and DNS]

Page 15

Observed Operational Problems

• User visible services:
– NFS mount operations time out
– Web access also fails intermittently due to time outs

• Failure causes (a simple service-probe sketch follows this list):
– Independent or correlated failures?
– Problem in access, server, or Internet edge?
– File server failure?
– Internet denial of service attack?
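One way to start separating these candidate causes is to time each service independently. A minimal probe sketch follows (hostnames and URLs are placeholders, not hosts from the talk): a slow DNS lookup alongside a fast HTTP fetch points at the name service rather than the web or file servers.

```python
import socket
import time
import urllib.request

def time_dns(hostname: str) -> float:
    """Seconds for a name lookup via the system resolver (blocks until the resolver gives up)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, 80)
    except OSError:
        return float("inf")
    return time.monotonic() - start

def time_http(url: str, timeout_s: float = 3.0) -> float:
    """Seconds for an HTTP GET of the first kilobyte, or inf on failure/timeout."""
    start = time.monotonic()
    try:
        urllib.request.urlopen(url, timeout=timeout_s).read(1024)
    except OSError:
        return float("inf")
    return time.monotonic() - start

if __name__ == "__main__":
    # Placeholder targets; substitute local service names when probing a real network.
    print("DNS lookup:", time_dns("www.example.org"))
    print("HTTP fetch:", time_http("http://www.example.org/"))
```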

Page 16

Network Dashboard

[Dashboard charts, each vs. time: b/w consumed shows a gentle rise in ingress b/w; FS CPU utilization shows no unusual pattern; MS CPU utilization shows mail traffic growing; DNS CPU utilization shows an unusual step jump in DNS transaction rates; Access Edge b/w consumed shows a decline in access edge b/w. Legend: In Web, Out Web, Email]
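The unusual step jump in DNS transaction rates is the kind of feature a simple change-point check can flag automatically. A minimal sketch, with illustrative numbers rather than measured data:

```python
from statistics import mean, pstdev

def step_jump(series, window=10, threshold_sigmas=3.0):
    """True if the mean of the last `window` samples jumped well above the preceding window."""
    if len(series) < 2 * window:
        return False
    before = series[-2 * window:-window]
    after = series[-window:]
    spread = pstdev(before) or 1e-9            # avoid division by zero on a flat baseline
    return (mean(after) - mean(before)) / spread > threshold_sigmas

if __name__ == "__main__":
    dns_qps = [100, 102, 99, 101, 98, 100, 103, 97, 101, 100,    # baseline
               400, 410, 395, 405, 398, 402, 407, 399, 401, 404]  # step jump
    print(step_jump(dns_qps, window=10))        # -> True
```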

Page 17

Network Dashboard

[Same dashboard charts as Page 16 (b/w consumed, FS CPU utilization, MS CPU utilization, DNS CPU utilization, Access Edge b/w consumed, each vs. time), with the same annotations: gentle rise in ingress b/w; no unusual pattern; mail traffic growing; unusual step jump in DNS transaction rates; decline in access edge b/w. Legend: In Web, Out Web, Email. New annotation: CERT Advisory! DNS Attack!]

Page 18

Observed Correlations

• Mail traffic up
• MS CPU utilization up
– Service time up, service load up, service queue longer, latency longer

• DNS CPU utilization up
– Service time up, request rate up, latency up

• Access edge b/w down

Causality no surprise!
How does mail traffic cause DNS load?
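The observed correlations can be quantified before any experiment is run. A minimal sketch using illustrative samples (not measurements from the talk); a high Pearson coefficient is consistent with the causal story but does not by itself establish it:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Illustrative hourly samples: mail messages per minute vs. DNS server CPU %.
    mail_rate = [40, 55, 80, 120, 200, 320, 500]
    dns_cpu   = [12, 15, 22,  30,  48,  70,  95]
    print(f"corr(mail, DNS CPU) = {pearson(mail_rate, dns_cpu):.2f}")
```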

Page 19

Run Experiment: Shape Mail Traffic

[Charts vs. time: MS CPU utilization (mail traffic limited); DNS CPU utilization (DNS down); Access Edge b/w consumed (access edge b/w returns). Legend: In Web, Out Web, Email]

Root cause:
Spam appliance --> DNS lookups to verify sender domains;
Spam attack hammers internal DNS, degrading other services: NFS, Web
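The experiment on this slide can be framed as a before/after comparison around a single intervention. A hedged sketch of that protocol follows; apply_rate_limit and read_dns_cpu are hypothetical hooks standing in for the shaping action and the dashboard metric:

```python
import time
from statistics import mean

# Hypothetical hooks: in practice these would call a shaper/iBox API and the metrics store.
def apply_rate_limit(traffic_class: str, mbps: float) -> None:
    print(f"limiting {traffic_class} to {mbps} Mbps")

def read_dns_cpu() -> float:
    return 85.0    # placeholder sample; real code would query the dashboard

def shaping_experiment(samples=5, interval_s=60):
    """Shape mail traffic, then compare DNS CPU before vs. after; a large drop suggests causality."""
    before = [read_dns_cpu() for _ in range(samples)]
    apply_rate_limit("email", mbps=10.0)
    time.sleep(interval_s)                      # let the shaped rate take effect
    after = [read_dns_cpu() for _ in range(samples)]
    drop = mean(before) - mean(after)
    return {"before": mean(before), "after": mean(after), "drop": drop,
            "suggests_causality": drop > 0.2 * mean(before)}

if __name__ == "__main__":
    print(shaping_experiment(samples=3, interval_s=0))
```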

Page 20

Policies and Actions: Restore the Network

• Shape mail traffic
– Mail delay acceptable to users?
– Can’t do this forever unless mail is filtered at the Internet edge

• Load balance DNS services
– Increase resources faster than incoming mail rate
– Actually done: dedicated DNS server for Spam appliance (a resolver-selection sketch follows this list)

• Other actions? Traffic priority, QoS knobs
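The dedicated DNS server for the spam appliance amounts to steering one client's lookups to its own resolver. A minimal policy sketch follows; all IP addresses are placeholders:

```python
# Hypothetical resolver-selection policy: the spam appliance gets its own DNS server,
# so its sender-domain lookups no longer compete with everyone else's queries.

DEDICATED_RESOLVERS = {
    "10.1.2.50": "10.1.9.53",      # spam appliance -> dedicated DNS server (placeholder IPs)
}
DEFAULT_RESOLVER = "10.1.0.53"     # shared department resolver (placeholder)

def resolver_for(client_ip: str) -> str:
    """Pick the DNS server a given client should use."""
    return DEDICATED_RESOLVERS.get(client_ip, DEFAULT_RESOLVER)

if __name__ == "__main__":
    print(resolver_for("10.1.2.50"))   # spam appliance -> dedicated resolver
    print(resolver_for("10.1.4.77"))   # ordinary host  -> shared resolver
```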

Page 21


Analysis

• Root causes difficult to diagnose
– Transitive and hidden causes (a small dependency-graph sketch follows this list)

• Key is pervasive observation
– iBoxes provide the needed infrastructure
– Observations to identify correlations
– Perform active experiments to “suggest” causality
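Transitive causes are naturally modeled as a dependency graph: a stressed component degrades everything that directly or indirectly relies on it. The sketch below encodes the scenario's dependencies as read from the slides (my own rendering) and walks the graph:

```python
# DEPENDS_ON[A] lists the services A relies on (and therefore sends load to).
DEPENDS_ON = {
    "spam_appliance": ["dns"],            # sender-domain verification lookups
    "mail_server":    ["spam_appliance"],
    "nfs":            ["dns"],
    "web":            ["dns"],
}

def impacted_by(component, graph=DEPENDS_ON):
    """Return every service that transitively depends on `component`."""
    reverse = {}
    for svc, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(svc)
    seen, stack = set(), [component]
    while stack:
        node = stack.pop()
        for dependent in reverse.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return sorted(seen)

if __name__ == "__main__":
    print(impacted_by("dns"))   # -> ['mail_server', 'nfs', 'spam_appliance', 'web']
```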

Page 22

Many Challenges

• Policy specification: how to express? Service Level Objectives? (one possible SLO encoding is sketched after this list)
• Experimental plan
– Distributed vs. centralized development
– Controlling the experiments … when the network is stressed
– Sequencing matters, to reveal “hidden” causes

• Active experiments
– Making things worse before they get better
– Stability, convergence issues

• Actions
– Beyond shaping of classified flows, load balancing, server scaling?
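One possible answer to the policy-specification question is to encode Service Level Objectives as data the observe-analyze-act loop can evaluate. The sketch below is a hypothetical encoding, not a format proposed in the talk; the services, metrics, and actions are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SLO:
    """A Service Level Objective the observe-analyze-act loop can evaluate."""
    service: str       # e.g. "dns", "nfs", "web"
    metric: str        # e.g. "p95_latency_ms", "cpu_pct"
    limit: float       # objective: metric must stay at or below this value
    action: str        # what to do on violation, e.g. "shape:email", "load_balance:dns"

POLICIES = [
    SLO("dns", "p95_latency_ms", 50.0, "load_balance:dns"),
    SLO("web", "p95_latency_ms", 500.0, "shape:email"),
]

def violations(observed, policies=POLICIES):
    """Return the actions triggered by observed (service, metric) -> value samples."""
    return [p.action for p in policies if observed.get((p.service, p.metric), 0.0) > p.limit]

if __name__ == "__main__":
    samples = {("dns", "p95_latency_ms"): 180.0, ("web", "p95_latency_ms"): 320.0}
    print(violations(samples))   # -> ['load_balance:dns']
```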

Page 23

Implications for Network Operations and Management

• Processing-in-the-Network is real
• Enables pervasive monitoring and actions
• Statistical models to discover correlations and to detect anomalies
• Automated experiments to reveal causality
• Policies drive actions to reduce network stress

Page 24


Networks Under Stress