switching 97
TRANSCRIPT
-
8/2/2019 Switching 97
What is it all about?
How do we move traffic from one part of the network to another?
Connect end-systems to switches, and switches to each other
Data arriving at an input port of a switch have to be moved to one or more of the output ports
-
Types of switching elements
Telephone switches
switch samples
Datagram routers
switch datagrams
ATM switches
switch ATM cells
-
Classification
Packet vs. circuit switches
packets have headers and samples don't
Connectionless vs. connection oriented
connection oriented switches need a call setup
setup is handled in control plane by switch controller
connectionless switches deal with self-contained datagrams
                  Connectionless (router)    Connection-oriented (switching system)
Packet switch     Internet router            ATM switching system
Circuit switch    -                          Telephone switching system
-
Other switching element functions
Participate in routing algorithms
to build routing tables
Resolve contention for output trunks
scheduling
Admission control
to guarantee resources to certain streams
We'll discuss these later
Here we focus on pure data movement
-
Requirements
Capacity of switch is the maximum rate at which it can move information, assuming all data paths are simultaneously active
Primary goal: maximize capacity
subject to cost and reliability constraints
Circuit switch must reject call if it can't find a path for samples from input to output
goal: minimize call blocking
Packet switch must reject a packet if it can't find a buffer to store it awaiting access to output trunk
goal: minimize packet loss
Don't reorder packets
-
A generic switch
-
Outline
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
-
Circuit switching
Moving 8-bit samples from an input port to an output port
Recall that samples have no headers
Destination of sample depends on time at which it arrives at the switch
actually, relative order within a frame
We'll first study something simpler than a switch: a multiplexor
-
Multiplexors and demultiplexors
Most trunks time division multiplex voice samples
At a central office, trunk is demultiplexed and distributed to active circuits
Synchronous multiplexor
N input lines
Output runs N times as fast as input
-
More on multiplexing
Demultiplexor
one input line and N outputs that run N times slower
samples are placed in output buffer in round robin order
Neither multiplexor nor demultiplexor needs addressing
information (why?)
Can cascade multiplexors
need a standard
example: DS hierarchy in the US and Japan
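One way to see why neither device needs addressing information is to model both in a few lines. This is a minimal sketch, assuming every input line delivers exactly one sample per frame time; the function names are illustrative and do not come from the slides.

```python
def multiplex(lines):
    """Interleave N equal-length input lines into a sequence of frames.

    Slot i of every frame always carries a sample from line i, so the
    position within the frame is the only 'address' a sample needs.
    """
    return [list(frame) for frame in zip(*lines)]


def demultiplex(frames, n):
    """Deal the samples of each frame back out to N output lines, round robin."""
    outputs = [[] for _ in range(n)]
    for frame in frames:
        for slot, sample in enumerate(frame):
            outputs[slot].append(sample)
    return outputs


lines = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]
frames = multiplex(lines)
restored = demultiplex(frames, len(lines))
```

Here `frames` holds one list per frame time, and `restored` reproduces the original lines purely from slot order.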
-
Inverse multiplexing
Takes a high bit-rate stream and scatters it across multiple trunks
At the other end, combines multiple streams
resequencing to accommodate variation in delays
Allows high-speed virtual links using existing technology
-
A circuit switch
A switch that can handle N calls has N logical inputs and N logical outputs
N up to 200,000
In practice, input trunks are multiplexed
example: DS3 trunk carries 672 simultaneous calls
Multiplexed trunks carry frames = set of samples
Goal: extract samples from frame, and depending on position in frame, switch to output
each incoming sample has to get to the right output line and the right slot in the output frame
demultiplex, switch, multiplex
-
Call blocking
Can't find a path from input to output
Internal blocking
slot in output frame exists, but no path
Output blocking
no slot in output frame is available
Output blocking is reduced in transit switches
need to put a sample in one of several slots going to the desired next hop
-
Time division switching
Key idea: when demultiplexing, position in frame determines output trunk
Time division switching interchanges sample position within a frame: time slot interchange (TSI)
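A time slot interchange can be pictured as a buffer written in arrival order and read back in a permuted order. The sketch below assumes a hypothetical `schedule` list fixed at call setup, where `schedule[out_slot]` names the incoming slot to place there; neither name appears in the slides.

```python
def time_slot_interchange(frame, schedule):
    """Return the outgoing frame for one incoming frame.

    The whole frame is written into memory in arrival order, then read
    out in the order given by schedule[out_slot] = in_slot.
    """
    memory = list(frame)                                  # sequential write
    return [memory[in_slot] for in_slot in schedule]      # permuted read


incoming = ["s0", "s1", "s2", "s3"]
schedule = [2, 0, 3, 1]   # hypothetical: output slot 0 takes input slot 2, and so on
outgoing = time_slot_interchange(incoming, schedule)
```

Because the demultiplexor downstream maps slot position to output trunk, reordering slots is equivalent to switching.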
-
How large a TSI can we build?
Limit is time taken to read and write to memory
For 120,000 circuits
each sample must be written to and read from memory once every 125 microseconds (one frame time)
so each operation must take around 0.5 ns => impossible with current technology
Need to look to other techniques
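The 0.5 ns figure follows from dividing one frame time among all memory operations. A small sketch of that arithmetic, assuming exactly one write and one read per circuit per frame:

```python
FRAME_TIME_S = 125e-6   # one frame every 125 microseconds


def tsi_access_time_ns(circuits):
    """Time budget per memory operation in nanoseconds.

    Each circuit contributes one write and one read per frame time.
    """
    operations_per_frame = 2 * circuits
    return FRAME_TIME_S / operations_per_frame * 1e9


budget_ns = tsi_access_time_ns(120_000)   # roughly 0.52 ns per operation
```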
-
Space division switching
Each sample takes a different path through the switch, depending on its destination
-
Crossbar
Simplest possible space-division switch
Crosspoints can be turned on or off
For multiplexed inputs, need a switching schedule (why?)
Internally nonblocking but need N^2 crosspoints
time taken to set each crosspoint grows quadratically
vulnerable to single faults (why?)
-
Multistage crossbar
In a crossbar during each switching time only one crosspoint per row or column is active
Can save crosspoints if a crosspoint can attach to more than one input line (why?)
This is done in a multistage crossbar
Need to rearrange connections every switching time
-
Multistage crossbar
Can suffer internal blocking
unless sufficient number of second-level stages
Number of crosspoints < N^2
Finding a path from input to output requires a depth-first-search
Scales better than crossbar, but still not too well
120,000 call switch needs ~250 million crosspoints
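The crosspoint savings can be checked numerically. The slides do not say which multistage design lies behind the ~250 million figure; the sketch below assumes a symmetric three-stage Clos arrangement with 2n - 1 middle switches (the usual strictly nonblocking condition), and the group size of 240 is an illustrative choice.

```python
def crossbar_crosspoints(n_ports):
    """A single N x N crossbar needs one crosspoint per input/output pair."""
    return n_ports * n_ports


def three_stage_crosspoints(n_ports, group_size):
    """Crosspoints in a symmetric three-stage switch.

    Assumes n_ports is divisible by group_size and uses
    2 * group_size - 1 middle switches (Clos nonblocking condition).
    """
    groups = n_ports // group_size
    middle = 2 * group_size - 1
    first_and_third = 2 * groups * group_size * middle
    second = middle * groups * groups
    return first_and_third + second


single = crossbar_crosspoints(120_000)               # 14.4 billion
multistage = three_stage_crosspoints(120_000, 240)   # about 235 million
```

Under these assumptions the multistage count lands in the same range as the slide's estimate and is about 60 times smaller than the single crossbar.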
-
Time-space switching
Precede each input trunk in a crossbar with a TSI
Delay samples so that they arrive at the right time for the space division switch's schedule
-
Time-space-time (TST) switching
Allowed to flip samples both on input and output trunk
Gives more flexibility => lowers call blocking probability
-
Outline
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
-
Packet switching
In a circuit switch, path of a sample is determined at time of connection establishment
No need for a sample header: position in frame is enough
In a packet switch, packets carry a destination field
Need to look up destination port on-the-fly
Datagram
lookup based on entire destination address
Cell
lookup based on VCI
Other than that, very similar
-
Repeaters, bridges, routers, and gateways
Repeaters: at physical level
Bridges: at datalink level (based on MAC addresses) (L2)
discover attached stations by listening
Routers: at network level (L3)
participate in routing protocols
Application level gateways: at application level (L7)
treat entire network as a single hop
e.g. mail gateways and transcoders
Gain functionality at the expense of forwarding speed
for best performance, push functionality as low as possible
-
Port mappers
Look up output port based on destination address
Easy for VCI: just use a table
Harder for datagrams:
need to find longest prefix match
e.g. packet with address 128.32.1.20
entries: (128.32.*, 3), (128.32.1.*, 4), (128.32.1.20, 2)
A standard solution: trie
-
Tries
Two ways to improve performance
cache recently used addresses in a CAM
move common entries up to a higher level (match longer strings)
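A longest-prefix lookup with a trie can be sketched as follows. This is a simplified illustration that branches on whole dotted octets rather than on individual bits, using the three example entries from the port mapper slide; the class and function names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next octet -> TrieNode
        self.port = None     # output port if some prefix ends at this node


def insert(root, prefix, port):
    """Add a prefix such as '128.32.*' or a full address such as '128.32.1.20'."""
    node = root
    for octet in prefix.split("."):
        if octet == "*":
            break
        node = node.children.setdefault(octet, TrieNode())
    node.port = port


def lookup(root, address):
    """Walk the trie octet by octet, remembering the last port seen."""
    node, best = root, root.port
    for octet in address.split("."):
        node = node.children.get(octet)
        if node is None:
            break
        if node.port is not None:
            best = node.port
    return best


root = TrieNode()
for prefix, port in [("128.32.*", 3), ("128.32.1.*", 4), ("128.32.1.20", 2)]:
    insert(root, prefix, port)
```

With this table, `lookup(root, "128.32.1.20")` follows the deepest path and picks port 2, while an address that diverges earlier falls back to the port of the longest prefix it did match.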
-
Blocking in packet switches
Can have both internal and output blocking
Internal
no path to output
Output
trunk unavailable
Unlike a circuit switch, cannot predict if packets will block (why?)
If packet is blocked, must either buffer or drop it
-
Dealing with blocking
Overprovisioning
internal links much faster than inputs
Buffers
at input or output
Backpressure
if switch fabric doesn't have buffers, prevent packet from entering until path is available
Parallel switch fabrics
increases effective switching capacity
-
Outline
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
-
Three generations of packet switches
Different trade-offs between cost and performance
Represent evolution in switching capacity, rather than in technology
With same technology, a later generation switch achieves greater capacity, but at greater cost
All three generations are represented in current products
-
First generation switch
Most Ethernet switches and cheap packet routers
Bottleneck can be CPU, host-adaptor or I/O bus, depending
-
Example
First generation router built with 133 MHz Pentium
Mean packet size 500 bytes
Interrupt takes 10 microseconds, word access takes 50 ns
Per-packet processing takes 200 instructions = 1.504 microseconds
Copy loop (listing not preserved in this transcript)
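The figures above can be combined into a rough per-packet time and throughput estimate. The clock rate, packet size, interrupt cost, word access time, and instruction count come from the slide; the copy loop cost is an assumption, since the transcript does not preserve the loop. The sketch assumes 4-byte words and 4 instructions plus 2 memory accesses per copied word.

```python
CLOCK_HZ = 133e6
PACKET_BYTES = 500
INTERRUPT_NS = 10_000
WORD_ACCESS_NS = 50
PER_PACKET_INSTRUCTIONS = 200

# Assumptions not in the transcript: word size and copy loop cost.
WORD_BYTES = 4
COPY_INSTRUCTIONS_PER_WORD = 4
COPY_ACCESSES_PER_WORD = 2   # one read, one write

instruction_ns = 1e9 / CLOCK_HZ
processing_ns = PER_PACKET_INSTRUCTIONS * instruction_ns
copy_word_ns = (COPY_INSTRUCTIONS_PER_WORD * instruction_ns
                + COPY_ACCESSES_PER_WORD * WORD_ACCESS_NS)
copy_ns = (PACKET_BYTES / WORD_BYTES) * copy_word_ns
total_ns = INTERRUPT_NS + processing_ns + copy_ns

# bits per nanosecond is Gbps; scale to Mbps
throughput_mbps = PACKET_BYTES * 8 / total_ns * 1e3
```

Under these assumptions copying dominates the per-packet time, and the router forwards somewhat under 150 Mbps. A different copy loop would change the result.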
-
Second generation switch
Port mapping intelligence in line cards
ATM switch guarantees hit in lookup cache
Ipsilon IP switching assumes underlying ATM network
by default, assemble packets
if a flow is detected, ask upstream to send on a particular VCI, and install entry in port mapper => implicit signaling
-
Third generation switches
Bottleneck in second generation switch is the bus (or ring)
Third generation switch provides parallel paths (fabric)
-
Third generation (contd.)
Features
self-routing fabric
output buffer is a point of contention
unless we arbitrate access to fabric
potential for unlimited scaling, as long as we can resolve contention for output buffer
-
Outline
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
-
Switch fabrics
Transfer data from input to output, ignoring scheduling and buffering
Usually consist of links and switching elements
-
Crossbar
Simplest switch fabric
think of it as 2N buses in parallel
Used here for packet routing: crosspoint is left open long enough to transfer a packet from an input to an output
For fixed-size packets and known arrival pattern, can compute schedule in advance
Otherwise, need to compute a schedule on-the-fly (what does the schedule depend on?)
-
Buffered crossbar
What happens if packets at two inputs both want to go to same output?
Can defer one at an input buffer
Or, buffer crosspoints
-
Broadcast
Packets are tagged with output port #
Each output matches tags
Need to match N addresses in parallel at each output
Useful only for small switches, or as a stage in a large switch
-
Switch fabric element
Can build complicated fabrics from a simple element
Routing rule: if 0, send packet to upper output, else to lower output
If both packets go to same output, buffer or drop
-
Features of fabrics built with switching elements
NxN switch with bxb elements has $\lceil \log_b N \rceil$ stages with $\lceil N/b \rceil$ elements per stage
Fabric is self routing
Recursive
Can be synchronous or asynchronous
Regular and suitable for VLSI implementation
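The stage and element counts are easy to tabulate. The sketch below computes the number of stages, ceil(log_b N), and the elements per stage, ceil(N/b), with integer arithmetic to avoid floating-point rounding in the logarithm; the function name is illustrative.

```python
def fabric_dimensions(n_ports, b):
    """Return (stages, elements_per_stage) for an N x N fabric of b x b elements."""
    stages, reach = 0, 1
    while reach < n_ports:        # smallest s with b**s >= N
        reach *= b
        stages += 1
    per_stage = -(-n_ports // b)  # ceiling division
    return stages, per_stage
```

For example, an 8x8 fabric of 2x2 elements needs 3 stages of 4 elements each.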
-
Banyan
Simplest self-routing recursive fabric
(why does it work?)
What if two packets both want to go to the same output?
output blocking
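Self-routing and blocking can both be seen in a short simulation. The slides do not specify the wiring, so this sketch models the fabric as an omega (shuffle-exchange) network, one member of the banyan family: each stage applies a perfect shuffle and then a 2x2 element picks its upper or lower output from one destination bit, most significant bit first. Losing packets are simply dropped here.

```python
def route(packets, n_bits):
    """Self-route packets through an omega network with 2**n_bits ports.

    packets maps input port -> destination port. Returns (delivered, blocked):
    delivered maps output port -> input port, and blocked lists input ports
    whose packet lost a contest for an element output.
    """
    mask = (1 << n_bits) - 1
    position = {src: src for src in packets}   # current line of each packet
    blocked = []
    for stage in range(n_bits):
        taken = {}
        for src in sorted(position):
            bit = (packets[src] >> (n_bits - 1 - stage)) & 1
            # shuffle wiring, then the element selects upper (0) or lower (1)
            line = ((position[src] << 1) & mask) | bit
            if line in taken:
                blocked.append(src)            # collision: drop the later packet
            else:
                taken[line] = src
        position = {src: line for line, src in taken.items()}
    delivered = {line: src for src, line in position.items()}
    return delivered, blocked
```

Two packets for the same output always collide in the last stage, which is output blocking. Packets for different outputs can also collide earlier, for example inputs 0 and 4 sending to outputs 0 and 1, which is internal blocking.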
-
Blocking
Can avoid with a buffered banyan switch
but this is too expensive
hard to achieve zero loss even with buffers
Instead, can check if path is available before sending packet
three-phase scheme
send requests
inform winners
send packets
Or, use several banyan fabrics in parallel
intentionally misroute and tag one of a colliding pair
divert tagged packets to a second banyan, and so on to k stages
expensive
can reorder packets
output buffers have to run k times faster than input
-
Sorting
Can avoid blocking by choosing order in which packets appear at input ports
If we can
present packets at inputs sorted by output
remove duplicates
remove gaps
precede banyan with a perfect shuffle stage
then no internal blocking
For example, [X, 010, 010, X, 011, X, X, X] -(sort)-> [010, 010, 011, X, X, X, X, X] -(remove dups)-> [010, 011, X, X, X, X, X, X] -(shuffle)-> [010, X, 011, X, X, X, X, X]
Need sort, shuffle, and trap networks
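The sort, trap, and shuffle steps can be traced on the slide's eight-port example. This is a functional sketch on Python lists rather than a hardware network; `IDLE` stands in for the X entries, and all function names are illustrative.

```python
IDLE = None   # an input with no packet (X on the slide)


def sort_packets(ports):
    """Sort by destination, idle inputs last (the sorter's job)."""
    return sorted(ports, key=lambda d: (d is IDLE, "" if d is IDLE else d))


def trap_duplicates(ports):
    """Keep the first packet per destination and close up the gaps.

    Expects sorted input, so duplicates are adjacent.
    """
    kept = []
    for dest in ports:
        if dest is not IDLE and (not kept or kept[-1] != dest):
            kept.append(dest)
    return kept + [IDLE] * (len(ports) - len(kept))


def perfect_shuffle(ports):
    """Interleave the first and second halves of the list."""
    half = len(ports) // 2
    out = []
    for a, b in zip(ports[:half], ports[half:]):
        out += [a, b]
    return out


arrivals = [IDLE, "010", "010", IDLE, "011", IDLE, IDLE, IDLE]
ordered = sort_packets(arrivals)
trapped = trap_duplicates(ordered)
shuffled = perfect_shuffle(trapped)
```

After the three steps the two surviving packets sit at inputs 0 and 2, with every other input idle.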
-
Sorting
Build sorters from merge networks
Assume we can merge two sorted lists
Sort pairwise, merge, recurse
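The recursive structure can be illustrated with Batcher's bitonic sorter, used here as one concrete merge network; the slides do not say which of Batcher's constructions they draw. The sketch assumes the input length is a power of two, and every data-dependent step is a compare-exchange, which is what makes it suitable for hardware.

```python
def compare_exchange(a, i, j, ascending):
    """Swap a[i] and a[j] if they are out of order for the given direction."""
    if (a[i] > a[j]) == ascending:
        a[i], a[j] = a[j], a[i]


def bitonic_merge(a, lo, n, ascending):
    """Merge a bitonic run of length n starting at lo into sorted order."""
    if n > 1:
        half = n // 2
        for i in range(lo, lo + half):
            compare_exchange(a, i, i + half, ascending)
        bitonic_merge(a, lo, half, ascending)
        bitonic_merge(a, lo + half, half, ascending)


def bitonic_sort(a, lo=0, n=None, ascending=True):
    """Sort halves in opposite directions, then merge, recursively."""
    if n is None:
        n = len(a)
    if n > 1:
        half = n // 2
        bitonic_sort(a, lo, half, True)
        bitonic_sort(a, lo + half, half, False)
        bitonic_merge(a, lo, n, ascending)


values = [5, 3, 7, 0, 6, 1, 4, 2]
bitonic_sort(values)
```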
-
Merging
-
Putting it together: Batcher-Banyan
What about trapped duplicates?
recirculate to beginning or run output of trap to multiple banyans (dilation)
-
Effect of packet size on switching fabrics
A major motivation for small fixed packet size in ATM is ease of building large parallel fabrics
In general, smaller size => more per-packet overhead, but more preemption points/sec
At high speeds, overhead dominates!
Fixed size packets help build synchronous switch
But we could fragment at entry and reassemble at exit
Or build an asynchronous fabric
Thus, variable size doesn't hurt too much
Maybe Internet routers can be almost as cost-effective as ATM switches
-
Outline
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
-
Buffering
All packet switches need buffers to match input rate to service rate
or cause heavy packet losses
Where should we place buffers?
input
in the fabric
output
shared
-
Input buffering (input queueing)
No speedup in buffers or trunks (unlike output queued switch)
Needs arbiter
Problem: head of line blocking
with randomly distributed packets, utilization at most 58.6%
worse with hot spots
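The 58.6% figure is the large-switch limit 2 - sqrt(2) for saturated FIFO input queues with uniformly random destinations. A small Monte Carlo sketch can reproduce it approximately, assuming every input always has a packet waiting and each output serves one randomly chosen contender per slot; the port count, slot count, and seed are arbitrary choices.

```python
import random


def hol_throughput(n_ports, slots, seed=1):
    """Fraction of output slots used by a saturated input-queued switch.

    Only the head-of-line packet at each input can compete for an output.
    """
    rng = random.Random(seed)
    head = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL destinations
    sent = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(head):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            head[winner] = rng.randrange(n_ports)  # next packet moves to the head
            sent += 1
    return sent / (slots * n_ports)


measured = hol_throughput(n_ports=32, slots=5000)
limit = 2 - 2 ** 0.5   # about 0.586
```

For a finite switch the measured value sits slightly above the limit and approaches it as the port count grows.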
-
Dealing with HOL blocking
Per-output queues at inputs
Arbiter must choose one of the input ports for each output port
How to select?
Parallel Iterated Matching
inputs tell arbiter which outputs they are interested in
output selects one of the inputs
some inputs may get more than one grant, others may get none
if >1 grant, input picks one at random, and tells output
losing inputs and outputs try again
Used in DEC Autonet 2 switch
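The request, grant, and accept rounds can be sketched in software as below. This is an illustrative model rather than the Autonet 2 implementation: it iterates until no unmatched input can still reach a free output, and the function name and seed parameter are assumptions.

```python
import random


def parallel_iterated_matching(requests, seed=0):
    """Match inputs to outputs; requests maps input -> set of wanted outputs.

    Returns a dict input -> output in which no output is used twice.
    """
    rng = random.Random(seed)
    match = {}
    free_outputs = set().union(*requests.values())
    while True:
        # Request: unmatched inputs ask for every free output they want.
        asks = {}
        for inp, wanted in requests.items():
            if inp not in match:
                for out in wanted & free_outputs:
                    asks.setdefault(out, []).append(inp)
        if not asks:
            return match
        # Grant: each requested output picks one input at random.
        grants = {}
        for out in sorted(asks):
            grants.setdefault(rng.choice(asks[out]), []).append(out)
        # Accept: each input holding grants picks one at random.
        for inp in sorted(grants):
            out = rng.choice(grants[inp])
            match[inp] = out
            free_outputs.discard(out)


full = {i: set(range(4)) for i in range(4)}
matching = parallel_iterated_matching(full)
```

Every round that has requests produces at least one new pair, so the loop terminates, and the result is a maximal matching though not necessarily the largest possible one.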
-
Output queueing
Doesn't suffer from head-of-line blocking
But output buffers need to run much faster than trunk speed (why?)
Can reduce some of the cost by using the knockout principle
unlikely that all N inputs will have packets for the same output
drop extra packets, fairly distributing losses among inputs
-
Shared memory
Route only the header to output port
Bottleneck is time taken to read and write multiported memory
Doesn't scale to large switches
But can form an element in a multistage switch
-
Datapath: clever shared memory design
Reduces read/write cost by doing wide reads and writes
1.2 Gbps switch for $50 parts cost
-
Buffered fabric
Buffers in each switch element
Pros
Speed up is only as much as fan-in
Hardware backpressure reduces buffer requirements
Cons
costly (unless using single-chip switches)
scheduling is hard
-
Hybrid solutions
Buffers at more than one point
Becomes hard to analyze and manage
But common in practice
-
Outline
Circuit switching
Packet switching
Switch generations
Switch fabrics
Buffer placement
Multicast switches
-
Multicasting
Useful to do this in hardware
Assume port mapper knows list of outputs
Incoming packet must be copied to these output ports
Two subproblems
generating and distributing copies
VCI translation for the copies
-
Generating and distributing copies
Either implicit or explicit
Implicit
suitable for bus-based, ring-based, crossbar, or broadcast switches
multiple outputs enabled after placing packet on shared bus
used in Paris and Datapath switches
Explicit
need to copy a packet at switch elements
use a copy network
place # of copies in tag
element copies to both outputs and decrements count on one of them
collect copies at outputs
Both schemes increase blocking probability
-
Header translation
Normally, in-VCI to out-VCI translation can be done either at input or output
With multicasting, translation easier at output port (why?)
Use separate port mapping and translation tables
Input maps a VCI to a set of output ports
Output port swaps VCI
Need to do two lookups per packet
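The two-lookup scheme can be sketched with a pair of tables. All table contents, port numbers, and VCI values below are hypothetical, and the keying of the output tables by (input port, incoming VCI) is an assumption made for illustration.

```python
# Input-side port mapper: input port -> incoming VCI -> set of output ports.
port_map = {0: {17: {1, 3}}}

# Output-side translation: output port -> (input port, incoming VCI) -> outgoing VCI.
translation = {
    1: {(0, 17): 42},
    3: {(0, 17): 99},
}


def forward(in_port, in_vci, payload):
    """Return one (output port, outgoing VCI, payload) tuple per multicast copy."""
    copies = []
    outputs = port_map.get(in_port, {}).get(in_vci, ())      # lookup 1, at the input
    for out_port in sorted(outputs):
        out_vci = translation[out_port][(in_port, in_vci)]    # lookup 2, at the output
        copies.append((out_port, out_vci, payload))
    return copies


copies = forward(0, 17, "cell")
```

Each copy leaves with its own VCI, which is why translating at the output port is the simpler arrangement: the input only has to decide where the copies go.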