ECE 526 Network Processing Systems Design Lecture 3-6: Implementation Models

TRANSCRIPT

Page 1: ECE 526  Network Processing Systems Design

ECE 526 Network Processing Systems Design

Lecture 3-6: Implementation Models

Page 2: ECE 526  Network Processing Systems Design

A rather small set of key concepts is enough. Only by learning the essence of each topic, and by carrying along the least amount of mental baggage at each step, will the student emerge with a good overall understanding of the subject!

- Carver Mead and Lynn Conway

Ning Weng

Page 3: ECE 526  Network Processing Systems Design

Goals of Models
• Before we can play with improvements, we must know "the rules of the game".
• Algorithmics uses four separate areas: protocols, architecture, operating systems, and algorithms.
• Question: How can a logic designer understand protocols, and how can an algorithm designer understand hardware?
• Answer:
  ─ Use simple models that have explanatory and predictive power.
  ─ Leave details to the specialist.

Page 4: ECE 526  Network Processing Systems Design

Outline of Models
• Networking protocols
• Hardware
• Architecture
• Operating systems

Page 5: ECE 526  Network Processing Systems Design

Transport Models

• Presents the illusion of two shared queues (one in each direction) despite the unreliable network, using sequence numbers and retransmission
• A connection runs between local queues named by ports at IP addresses

Page 6: ECE 526  Network Processing Systems Design

Routing Protocols

• Forwarding: the router determines the next hop based on its forwarding table in order to send a packet. Speed-critical.
• Routing: routers work together to populate/build the forwarding table
  ─ RIP (exchanges distance estimates)
  ─ OSPF (link state)
  ─ BGP (path vector, based on policy)

Page 7: ECE 526  Network Processing Systems Design

Abstract Model of Protocols
• Interfaces: used by applications or higher-level protocols
• Packet (message) formats: IP, TCP, UDP
• Typical protocol operations
  ─ Data copies: from application to kernel; from input link to output link via the switch
  ─ Demultiplexing
  ─ Looking up state: longest prefix match (see the sketch below)
  ─ Setting timers and receiving alarms
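
A minimal sketch of the longest-prefix-match lookup step, assuming a toy IPv4 forwarding table and a linear scan; the table contents and function name are made up, and real routers use tries or TCAMs instead:

```python
import ipaddress

# Hypothetical forwarding table: (prefix, next hop).
FORWARDING_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth2"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),   # default route
]

def longest_prefix_match(dst: str) -> str:
    """Return the next hop of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for net, hop in FORWARDING_TABLE:
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop

print(longest_prefix_match("10.1.2.3"))  # eth2: the /16 beats the /8 and the default route
```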

Page 8: ECE 526  Network Processing Systems Design

Measures
• Throughput: number of jobs completed per second
  ─ The focus of industry; ISPs wish to maximize it
• Latency: time to complete a job
  ─ E.g. the time for a packet to go from one end to another (RTT)
  ─ Ideal interactive time is 100 ms (today ~250 ms across the US)

Page 9: ECE 526  Network Processing Systems Design

Performance facts
• Backbone link speeds:
  ─ Optical carrier: OC-n → n × 51.84 Mbit/s; OC-48 ~ 2.5 Gbps, OC-192 ~ 10 Gbps
• TCP and web dominance: Web (70%) and TCP (90%) of traffic
• Small transfers: 50% of accessed files are < 50 Kbytes
• Poor locality: ~1M concurrent packet types (flows) in the backbone, only ~5 packets to the same destination
• Latencies are large, e.g. 100 ms for the wide area
• Wire-speed forwarding
  ─ Half of packets are 40 bytes; many are 1500 bytes
  ─ It takes only 8 ns to send 40 bytes at 40 Gbps (worked out below)
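
A quick back-of-the-envelope check of that last number, counting serialization time only (no propagation or queueing delay):

```python
def serialization_time_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time to clock a packet onto the wire: bits / (Gbit/s) comes out in ns."""
    return packet_bytes * 8 / link_gbps

print(serialization_time_ns(40, 40))    # 8.0 ns per minimum-size packet at 40 Gbps
print(serialization_time_ns(1500, 40))  # 300.0 ns for a 1500-byte packet
```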

Page 10: ECE 526  Network Processing Systems Design

Case study: iSCSI
• Large data centers connect disks and computers with a SAN (storage area network) to enable disk sharing. Expensive today.
• A single SCSI command can READ 10 Mbytes from a remote disk.
• iSCSI over TCP/IP: must implement TCP in hardware and add an iSCSI header carrying a length field.
• SCSI messages can be issued out of order, but TCP does not allow out-of-order delivery!
• Lesson of the case study: protocol features greatly affect application performance.

Page 11: ECE 526  Network Processing Systems Design

Models
• Networking (protocols)
• Hardware
• Architecture
• Operating systems

Page 12: ECE 526  Network Processing Systems Design

Hardware
• Combinatorial logic
  ─ From transistors to gates
  ─ Timing and power
  ─ High-level building blocks (standard cells)
• Memories
  ─ Registers, SRAM, DRAM
  ─ Fancy memory tricks (page mode, interleaving)

Page 13: ECE 526  Network Processing Systems Design

Combinatorial logic

• A function from digital inputs to outputs
  ─ Wires, transistors, resistors, capacitors
  ─ Gates: AND, NAND, NOT, etc. (~60 ps)
  ─ More complex blocks: adders, multiplexers
• Different designs for the same Boolean function offer different trade-offs:
  ─ Time for the output to stabilize
  ─ Number of transistors
  ─ Power dissipation P = C·V²·f (illustrated below)
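
A trivial helper just to put numbers on the power formula; the capacitance, voltage, and frequency below are made-up illustrative values, not figures from the slides:

```python
def dynamic_power_watts(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Switching (dynamic) power: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz

# Hypothetical numbers: 10 pF of switched capacitance at 1.0 V and 1 GHz.
print(dynamic_power_watts(10e-12, 1.0, 1e9))  # 0.01 W, i.e. 10 mW
```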

Page 14: ECE 526  Network Processing Systems Design

Priority encoder timing

• Input I and output O are N-bit vectors such that
  ─ O[j] = 1 iff I[j] = 1 and I[k] = 0 for all k < j
  ─ i.e. find the first 1 and represent its position in one-hot notation
• Design 1: implement with N AND gates, where the gate for output j takes inputs 1 through j
  ─ O(N²) transistors, but appears to take O(1) time (see the model below)
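
A behavioural model of the function being built (what the outputs should be), independent of any gate-level design:

```python
def priority_encode(bits):
    """One-hot of the first set bit: out[j] = 1 iff bits[j] = 1 and bits[k] = 0 for k < j."""
    out = [0] * len(bits)
    for j, b in enumerate(bits):
        if b:
            out[j] = 1   # bits[0] is the highest-priority input
            break
    return out

print(priority_encode([0, 0, 1, 1, 0]))  # [0, 0, 1, 0, 0]
```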

Page 15: ECE 526  Network Processing Systems Design

Priority encoder timing
• Design 2: every output bit requires the AND of the complements of the first j-1 inputs
  ─ Construct recursively: P[j] = P[j-1] · ~I[j], so that O[j] = I[j] · P[j-1]
  ─ Uses N 2-input AND gates in series
  ─ O(N) transistors, but takes O(N) time (a software analogue is sketched below)
• Summary: Design 1 is fast and fat; Design 2 is slow and lean
• Design 3: a trade-off design
  ─ Idea: use a balanced binary tree to compute the P[j] values and combine partial results
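
Design 2 expressed in software as the same serial chain; this is a sketch of the recurrence, not a circuit:

```python
def priority_encode_serial(bits):
    """Design 2 as a serial chain: O(N) two-input ANDs, evaluated one after another."""
    out = []
    none_set_so_far = 1                  # running AND of the complements of earlier bits
    for b in bits:
        out.append(b & none_set_so_far)  # O[j] = I[j] AND ~I[0] AND ... AND ~I[j-1]
        none_set_so_far &= 1 - b         # extend the chain by ~I[j]
    return out

print(priority_encode_serial([0, 0, 1, 1, 0]))  # [0, 0, 1, 0, 0]
```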

Page 16: ECE 526  Network Processing Systems Design

Reduction using standard cells

• Build a four-input mux from 2-input muxes (a sketch follows).
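
A behavioural sketch of that reduction, with hypothetical function names: a 4-input mux as a two-level tree of three 2-input muxes:

```python
def mux2(sel, a, b):
    """2-input mux: output a when sel == 0, b when sel == 1."""
    return b if sel else a

def mux4(s1, s0, a, b, c, d):
    """4-input mux built from three 2-input muxes arranged as a two-level tree."""
    return mux2(s1, mux2(s0, a, b), mux2(s0, c, d))

print(mux4(1, 0, "a", "b", "c", "d"))  # 'c' (select index 0b10 = 2)
```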

Page 17: ECE 526  Network Processing Systems Design

Crossbar scheduler
• PPE (programmable priority encoder): like a priority encoder, but with a rotating priority
  ─ Find the first 1 at or after position P
• Arises in switch arbitration
• Design 1 (sketched below):
  ─ Use a barrel shifter to shift I to the left by P bits
  ─ Apply a priority encoder (PE)
  ─ Shift back by P bits
• Two shifts + PE ~ 3 log N gate delays
  ─ Each barrel shift uses a tree of 2-input muxes ~ log N levels
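
Design 1 in software: the barrel shift becomes a list rotation, followed by a plain priority encode and the inverse rotation (a behavioural sketch, not the hardware):

```python
def priority_encode(bits):
    """One-hot of the first set bit (same behavioural model as the earlier sketch)."""
    out = [0] * len(bits)
    for j, b in enumerate(bits):
        if b:
            out[j] = 1
            break
    return out

def ppe_rotate(bits, p):
    """PPE Design 1: rotate left by p, priority-encode, rotate back."""
    n = len(bits)
    encoded = priority_encode(bits[p:] + bits[:p])   # the "barrel shift" by p positions
    return encoded[n - p:] + encoded[:n - p]         # undo the rotation

print(ppe_rotate([1, 0, 0, 1, 0, 1], p=2))  # [0, 0, 0, 1, 0, 0]: first 1 at or after bit 2
```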

Page 18: ECE 526  Network Processing Systems Design

Crossbar scheduler
• Smarter scheme due to Gupta/McKeown (sketched below):
  ─ First check whether any bit is set at or after P
  ─ If yes, apply the PE to the bits at or after P
  ─ If no, apply the PE to the bits before P
• Used in Tiny Tera and Abrizio
• Tested using TI libraries: ~2x faster and ~3x less area for a 32-port router
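
The same idea as a software sketch; in hardware the two halves would be encoded in parallel and one result selected, which is what removes the barrel shifters:

```python
def ppe_two_halves(bits, p):
    """Gupta/McKeown-style PPE: encode the bits at or after P if any is set, else the bits before P."""
    out = [0] * len(bits)
    search = range(p, len(bits)) if any(bits[p:]) else range(p)
    for j in search:
        if bits[j]:
            out[j] = 1
            break
    return out

print(ppe_two_halves([1, 0, 0, 1, 0, 1], p=2))  # [0, 0, 0, 1, 0, 0]: same answer as Design 1
print(ppe_two_halves([1, 1, 0, 0, 0, 0], p=2))  # [1, 0, 0, 0, 0, 0]: wraps around to the bits before P
```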

Page 19: ECE 526  Network Processing Systems Design

Memories
• Router forwarding is done in logic, but packets are stored in memories.
• Memories are a significant bottleneck
  ─ Slower than logic delays
• Different subsystems require different memory types
  ─ 200 ms worth of packet buffering (DRAM)
  ─ 1 Mbyte of fast SRAM for forwarding tables
• So we need different models

Page 20: ECE 526  Network Processing Systems Design

Registers

• Need to store a bit so that it stays indefinitely (in the absence of further writes or power failures)
• A register is an ordered collection of flip-flops
• Access from logic to a register takes ~0.5-1 ns

Page 21: ECE 526  Network Processing Systems Design

SRAM

• N registers addressed by log N address bits
• Self-refreshing (no periodic refresh needed)
• Needs a decoder to translate the address into unary, so add decode delay
• On-chip SRAM 1-2 ns, off-chip 5-10 ns

Page 22: ECE 526  Network Processing Systems Design

DRAM

• An SRAM cell (flip-flop) needs 5 transistors.
• A DRAM cell needs one transistor plus a capacitor (so DRAM is denser).
• Leakage is fixed by periodic refresh.
• Slower, because the output is not driven by the power supply as in SRAM.
• Access time 40-60 ns.

Page 23: ECE 526  Network Processing Systems Design

Memories

• Page-mode DRAM (latency 40-60 ns)
• Direct decoding is hard: use divide and conquer
  ─ Decode in stages, row first and column next
  ─ Reduces the decoder from O(N) to O(√N) gates
• Accesses within the same row do not require a new row address strobe (RAS).

Page 24: ECE 526  Network Processing Systems Design

Interleaved DRAMs

• Increase DRAM throughput (at the same latency)
• While Bank 1 works on one access, send addresses to Bank 2, and so on.
• SDRAM uses 2 banks, RDRAM uses 16 banks.
• If consecutive accesses go to different banks, bandwidth increases by a factor of B, the number of banks (see the model below).
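
A back-of-the-envelope model of the factor-of-B claim, assuming the 60 ns access time from the DRAM slide and perfectly interleaved accesses (no bank conflicts):

```python
def words_retrieved(window_ns: float, access_ns: float = 60, banks: int = 1) -> int:
    """Words fetched in a time window when accesses rotate over `banks` independent banks."""
    return banks * int(window_ns // access_ns)

print(words_retrieved(600))             # 10 words from a single bank in 600 ns
print(words_retrieved(600, banks=16))   # 160 words with 16-way interleaving (same per-word latency)
```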

Page 25: ECE 526  Network Processing Systems Design

Example: Pipelined flow id lookup

• Look up among 1M 96-bit flow IDs for accounting, using a binary tree, within 128 ns (OC-48)
• SRAM is infeasible; DRAM is too slow (a lookup needs ~20 dependent accesses)
• Use interleaved memory plus pipelining
• Direct RDRAM runs at 800 MHz, with a read access time of ~60 ns
• A binary tree over 16 banks reaches only 2^16 entries, which is too small, so use RAMBUS page mode to retrieve two 96-bit keys plus three 20-bit pointers (a 3-way tree node) in one 256-bit access (arithmetic checked below).
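
A quick check of the arithmetic behind this example, assuming one tree level per RDRAM bank; the 3-way branching follows from fetching two keys and three pointers per access:

```python
import math

flows = 1_000_000
print(math.ceil(math.log2(flows)))   # 20: a binary tree needs ~20 dependent DRAM accesses
print(2 ** 16)                       # 65,536: a binary tree limited to 16 levels (one per bank) is too small
print(2 * 96 + 3 * 20)               # 252 bits: two keys + three pointers fit in one 256-bit access
print(3 ** 16)                       # 43,046,721: a 3-way tree over 16 banks easily covers 1M flows
```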

Page 26: ECE 526  Network Processing Systems Design

Pin count limitations matter
• Example: a router with five 10 Gbps links
• Overall buffering: 200 ms × 50 Gbps
• Want to use DRAM
  ─ Bandwidth required: 2 × 50 = 100 Gbit/s, plus overhead ~ 200 Gbps
• Use a single RDRAM with 16 banks
  ─ Peak bandwidth of 1.6 Gbytes/s, i.e. ~13 Gbps
• Accessing each RDRAM requires 64 pins for data and ~25 for address/control, ~90 pins in all; a 200 Gbps memory system needs 16 RDRAMs ~ 1440 pins
• Chips have pin limitations (typically < 1000 pins)
• Thus the design requires at least one more memory chip (the arithmetic is worked out below).
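
The arithmetic restated as a script; the factor of two for overhead is one reading of the slide's "+ overhead ~ 200 Gbps", and the per-chip pin count is rounded to 90 as on the slide:

```python
import math

line_rate_gbps = 5 * 10                        # five 10 Gbps links -> 50 Gbps aggregate
buffer_gbits   = 0.200 * line_rate_gbps        # 200 ms rule of thumb -> 10 Gbit of buffering

mem_bw_gbps = 2 * line_rate_gbps * 2           # write + read each packet, then ~2x for overhead
rdram_gbps  = 1.6 * 8                          # one 16-bank RDRAM: 1.6 Gbyte/s ~ 13 Gbps

chips = math.ceil(mem_bw_gbps / rdram_gbps)    # 16 RDRAM chips to reach ~200 Gbps
pins  = chips * 90                             # ~64 data + ~25 address/control pins per chip

print(buffer_gbits, chips, pins)               # 10.0 Gbit of buffering, 16 chips, ~1440 pins
```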

Page 27: ECE 526  Network Processing Systems Design

Real world: chip design process

• A box architect partitions functions between chips.
• Design teams write specifications; logic designers write RTL in Verilog.
• Gates are synthesized from the RTL to generate circuits.
• Timing checks feed back into design changes.
• Finally the chip tapes out, is manufactured, and yield is inspected.
• It is easy to add features before RTL; second spins are expensive.

Page 28: ECE 526  Network Processing Systems Design

Next Lecture
• Networking (protocols)
• Hardware
• Device architecture
• Operating systems

Page 29: ECE 526  Network Processing Systems Design

Architecture Models
• Optimizing network performance requires optimizing all data paths
  ─ Through the internals of the source node, the sink node, and the routers
• End-node architectures are optimized for general computation.
• Router architectures are optimized for Internet communication.

Page 30: ECE 526  Network Processing Systems Design

End Node Architecture

• Memory hierarchy: main memory (1 GB, 50-80 ns), L2 cache (on-chip SRAM), registers
• Direct-mapped caches: cache lines, spatial locality, page mode
• Memory-mapped I/O, DMA, I/O bus versus processor bus

Page 31: ECE 526  Network Processing Systems Design

Alternative endnode architecture

• Packets from the network go to Mem2.
• The processor reads from Mem1.
• A switch can alternate the accesses.
• E.g., InfiniBand as a replacement for PCI.

Page 32: ECE 526  Network Processing Systems Design

What a Router Looks Like

Page 33: ECE 526  Network Processing Systems Design

Router Architectures
• Major tasks:
  ─ Lookup
  ─ Classification
  ─ Switching
  ─ Queuing
• Less time-critical:
  ─ Header verification
  ─ Checksum
  ─ Route computation

Page 34: ECE 526  Network Processing Systems Design

Generic Router Architecture

Slide courtesy of Nick McKeown

Page 35: ECE 526  Network Processing Systems Design

Generic Router Architecture

Page 36: ECE 526  Network Processing Systems Design

Network Processors
• General-purpose processors optimized for network traffic
  ─ Motivated by unpredictable router tasks
• Often just multiple processors that work on different packets at the same time
  ─ The Intel IXP has 6 processors with a 5.6 ns clock
• Use multithreading to switch contexts quickly
  ─ The IXP has 4 contexts
• Special-purpose instructions for address lookup and other functions

Page 37: ECE 526  Network Processing Systems Design

Why NPUs seem like a good idea
• What makes a CPU appealing for a PC:
  ─ Flexibility: supports many applications
  ─ Time to market: allows quick introduction of new applications
  ─ Future proof: supports as-yet unthought-of applications
• No one would consider using fixed-function ASICs for a PC.

Page 38: ECE 526  Network Processing Systems Design

Why NPUs seem like a good idea
• What makes an NPU appealing:
  ─ Flexibility: protocols and standards change
  ─ Time to market: saves the ~18 months of building an ASIC; code re-use
  ─ Future proof: new protocols emerge
  ─ Less risk: bugs are more easily fixed in software
• Surely no one would consider using fixed-function ASICs for new networking equipment?

Page 39: ECE 526  Network Processing Systems Design

Network Processors: Load-balancing

Incoming packets are dispatched to one of the following (see the sketch below):

1. an idle processor, or
2. a processor dedicated to packets in this flow (to prevent mis-sequencing), or
3. a special-purpose processor for the flow, e.g. security, transcoding, or application-level processing.
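
One plausible encoding of those rules as a dispatch function; the ordering of the checks and all names here are illustrative, not taken from the slides:

```python
from dataclasses import dataclass

@dataclass
class Processor:
    name: str
    idle: bool = True

def dispatch(packet, processors, flow_table, special_processors):
    """Pick a processor for a packet using the slide's three rules (one possible ordering)."""
    flow = (packet["src"], packet["dst"], packet["proto"])
    if flow in special_processors:             # 3. flow needs special-purpose processing
        return special_processors[flow]
    if flow in flow_table:                     # 2. keep a flow on one processor to prevent mis-sequencing
        return flow_table[flow]
    target = next((p for p in processors if p.idle), processors[0])  # 1. otherwise pick an idle processor
    target.idle = False
    flow_table[flow] = target
    return target

cpus = [Processor("me0"), Processor("me1")]
pkt  = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp"}
print(dispatch(pkt, cpus, flow_table={}, special_processors={}).name)  # me0
```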

Page 40: ECE 526  Network Processing Systems Design

Network Processors: Pipelining

Processing is broken down into (hopefully balanced) steps; each processor performs one step of the processing.

Page 41: ECE 526  Network Processing Systems Design

Models
• Networking (protocols)
• Hardware
• Device architecture
• Operating systems

Page 42: ECE 526  Network Processing Systems Design

Operating System
• Routers usually have a lightweight OS
  ─ Different from traditional OSes
• End-node performance depends heavily on the OS
• Abstractions the OS provides:
  ─ Uninterrupted computation: no interrupts
  ─ Infinite memory: just an illusion
  ─ Simple I/O: avoid dealing directly with devices (simple reads and writes)

Page 43: ECE 526  Network Processing Systems Design

Uninterrupted computation
• Underlying mechanisms:
  ─ Context switching
  ─ Scheduling
  ─ Protection
• Flavors of "process", in order of increasing complexity:
  ─ Interrupt handlers, threads, processes

Page 44: ECE 526  Network Processing Systems Design

Receiver livelock in BSD

• The kernel keeps processing incoming packets only to discard them, because the applications never get to run.
• Latency depends on the processing context: interrupts (~10 us), context switches (10-100 us)

Page 45: ECE 526  Network Processing Systems Design

Infinite memory

• Provided via virtual memory
  ─ Page mapping (avoids having to find contiguous physical locations)
  ─ Demand paging (use more space than physical memory)
• Slow DRAM page-table lookups are avoided with a fast TLB
• Protection comes from allowing only the OS to modify page tables.

Page 46: ECE 526  Network Processing Systems Design

Simple I/O using system calls

• For abstraction alone, I/O could be implemented as libraries.
• For security, I/O is handled by device drivers.
• System calls trap into the kernel's protection level.
• More expensive than a function call, because of the privilege escalation.

Page 47: ECE 526  Network Processing Systems Design

Next class…
• Implementation principles
• Read Chapters 3 and 4
• Quiz 1