Hot Chips 8/17/99
Broadcom BCM5600 StrataSwitch
A Highly Integrated Ethernet Switch On A Chip
Andrew Essen and James Mannos
Broadcom Corporation

Outline
• Introduction
• Networking Basics
• Description of BCM5600
• Design Process
• Vital Statistics
• Conclusion

Background
• A single modern processor can now saturate a LAN which used to be easily shared by multiple users.
• Multimedia applications, especially those with real-time requirements, place a huge strain on bandwidth

New Technology to the Rescue
• New Network Architectures - replace shared media and hubs with dedicated media and switches
• Provision for Increased Bandwidth - replace 10 Mb/s with 100 Mb/s (FE) and 1000 Mb/s (Gig)
• New Protocols - L4 to L7 filtering, Class of Service (802.1p), Virtual LANs (802.1Q)
• Advancing Semiconductor Technology - DSM (< .25µ), System-on-chip architectures, and large amounts of integrated memory.

Project Goals
Features and their Benefits
Switch On Chip
  - Lowest Cost/Port
  - Faster System Development
  - Modules can be easily "bolted" on
24 10/100 & 2 Gig Ports, Non-blocking
  - High Capacity
  - High Performance
Non-blocking L2/L3 Switch
  - Line speed switching and routing
Line speed L4-L7 filtering
  - Fast Filter Processor & Flexible Rules Engine
  - Content Aware traffic classification based on any combination of designated fields.
Advanced Features
  - 4 levels of COS (Class of Service).
  - Support of Virtual LANs, Trunking, Flow Control, Mirroring, Tagging & Spanning Tree
  - Elimination of Head of Line Blocking
  - Broadcast/Multicast
On board SRAM caches
  - Allows use of low-cost, external SDRAM
PCI interface
  - Flexible connectivity and expandability.

What Makes a SOC so Tough?
• Network traffic is chaotic and asynchronous
• Can have large differential flows of data
• Behavior often appears non-deterministic because of packet drop
• Differing packet sizes & formats along with a huge amount of state - very difficult to verify
• 60M transistors, 1 MB SRAM, 4M+ gates.

Traditional Network
Each L2 broadcast domain is connected by a router (before switched LANs)
[Figure: a Layer 2 domain and another Layer 2 domain joined by a router; the router operates at L3]

Layer 2 Switching
• Appears to end stations as though everybody is still on a single shared medium
• Look up destination MAC address and forward packet to correct port
• Numerous packets can be "on the wire" simultaneously
  – Up to 52 with the BCM5600
    • 26 full duplex ports
  – Versus 1 with a traditional shared Ethernet
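The lookup-and-forward step can be sketched in a few lines of Python. This is an illustration only, not the BCM5600's implementation: the class name `L2Switch`, the dict-based table, address learning, and flooding on an unknown destination are assumptions drawn from standard Ethernet bridging. Only the 26-port count comes from the slides.

```python
class L2Switch:
    """Toy learning bridge: maps MAC addresses to ports (hypothetical model)."""

    def __init__(self, num_ports=26):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port number

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame should be sent to."""
        # Learn (or refresh) the port on which the sender was seen.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            # Destination sits on the ingress port; nothing to forward.
            return []
        return [out_port]


switch = L2Switch()
flooded = switch.forward(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
direct = switch.forward(5, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")
print(len(flooded), direct)
```

The first frame is flooded because its destination has not been learned yet; the reply goes to a single port because the first frame taught the table where the original sender lives.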

Layer 3 Switching
• End stations must be kept aware of routing configuration
• L3 Switching is performed when the L2 destination is the router itself
  – Look up IP destination in routing table
  – Update L2 addresses
  – Decrement TTL
  – Forward packet to correct port
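The four L3 steps above can be sketched as follows. This is a minimal sketch under stated assumptions: `ROUTER_MAC`, the `ROUTES` table, the dict packet format, and longest-prefix matching by linear scan are all hypothetical choices made for illustration, not details from the slides. A real IPv4 router would also update the header checksum after decrementing the TTL, which is omitted here.

```python
import ipaddress

ROUTER_MAC = "00:10:18:00:00:01"  # hypothetical MAC of the router interface

# Hypothetical routing table: prefix -> (egress port, next-hop MAC)
ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): (3, "00:aa:bb:cc:dd:03"),
    ipaddress.ip_network("10.1.2.0/24"): (7, "00:aa:bb:cc:dd:07"),
    ipaddress.ip_network("0.0.0.0/0"): (25, "00:aa:bb:cc:dd:19"),
}


def l3_forward(packet):
    """Return (egress port, rewritten packet), or None if the packet is not routed."""
    if packet["dst_mac"] != ROUTER_MAC:
        return None  # L2 destination is not the router: handled by L2 switching
    if packet["ttl"] <= 1:
        return None  # TTL would reach zero: do not forward
    dst = ipaddress.ip_address(packet["dst_ip"])
    # Look up the IP destination: the most specific matching prefix wins.
    matches = [net for net in ROUTES if dst in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    port, next_hop_mac = ROUTES[best]
    # Update the L2 addresses and decrement the TTL.
    routed = dict(packet, src_mac=ROUTER_MAC, dst_mac=next_hop_mac,
                  ttl=packet["ttl"] - 1)
    return port, routed


pkt = {"src_mac": "00:aa:bb:cc:dd:01", "dst_mac": ROUTER_MAC,
       "dst_ip": "10.1.2.9", "ttl": 64}
print(l3_forward(pkt))
```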

Layer 4-7 Processing
• Pattern matches result in actions
  – Kind of like a simple awk script
  – Actions include drop, change priority, change destination, send to CPU
• Applications include:
  – Detect traffic type based on TCP port and reprioritize
  – Support VoIP and multimedia applications such as real-time video
  – Simple firewall by examining IP addresses
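The "pattern matches result in actions" idea can be illustrated with a small rule table. This is a sketch only: the `Rule` structure, the field names, the first-match-wins policy, and the example addresses and ports are assumptions for illustration and say nothing about how the chip's filter processor encodes its rules.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    match: dict          # field name -> required value (all must match)
    action: str          # "drop", "set_priority", "redirect", or "to_cpu"
    arg: object = None   # e.g. the new priority or destination port


# Hypothetical rule set: a one-address firewall plus two reprioritization rules.
RULES = [
    Rule({"src_ip": "192.0.2.66"}, "drop"),
    Rule({"proto": "tcp", "dst_port": 80}, "set_priority", 1),
    Rule({"proto": "udp", "dst_port": 5060}, "set_priority", 3),
]


def classify(packet, rules=RULES):
    """Return (action, arg) for the first rule whose fields all match."""
    for rule in rules:
        if all(packet.get(field) == value for field, value in rule.match.items()):
            return rule.action, rule.arg
    return "forward", None  # no rule matched: forward unchanged


print(classify({"src_ip": "10.0.0.1", "proto": "udp", "dst_port": 5060}))
print(classify({"src_ip": "192.0.2.66", "proto": "tcp", "dst_port": 80}))
```

Like an awk script, each rule pairs a pattern with an action; unlike awk, this sketch stops at the first match rather than running every matching rule.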

StrataSwitch 24+2 Switch
[Figure: system block diagram of a 24+2 switch. Labels recoverable from the diagram:]
• Switch blocks: CMIC, EPIC (x3), GPIC (x2), PMMU
• Fast Ethernet side: RJ45 x 24, Magnetics, LEDs, Octal PHY (x3), RMII
• Gigabit side (x2): Fiber Xcvr or RJ45+Magnetics, SERDES or GMII Phy, GMII, 8B/10B
• Control side: CPU, PCI Bridge, System Bus, SDRAM, Flash, Serial Port
• PCI Bus: 32 bit, 33/66 MHz
• I2C Bus: 100/400 KHz (very low bandwidth path); Optional I2C Components
• Other labels: ORION SA, ORION SDRAM, 128 bits/125 MHz
Additional Silicon Required
• 3-5218
• 2-5400
• Integrated Microcontroller
• 4-64 Mbytes SDRAM

Simplified Block Diagram
All packets travel as cells along a very wide central bus
[Figure: block diagram showing Ingress0, Ingress1, ..., Ingress25; Egress0, Egress1, ..., Egress25; MMU + SRAM; PCI Controller]
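A short sketch of what carrying packets as cells involves: slicing on ingress and reassembly on egress. The slides do not give the cell size, so `CELL_SIZE = 64` is a purely hypothetical value, and the function names are illustrative.

```python
CELL_SIZE = 64  # bytes; hypothetical, the slides do not state the cell size


def slice_into_cells(packet: bytes, cell_size: int = CELL_SIZE):
    """Split a packet into fixed-size cells; the last cell may be short."""
    return [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)]


def reassemble(cells):
    """Rebuild the original packet from its cells, in order."""
    return b"".join(cells)


for size in (64, 65, 128, 129):
    cells = slice_into_cells(bytes(size))
    print(size, "bytes ->", len(cells), "cell(s)")
```

With any fixed cell size, a packet one byte over a cell boundary costs a whole extra cell, so bus efficiency depends on packet length.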

Ingress Details
[Figure: ingress block diagram with MAC, Data Buffers, L2 Logic, L3 Logic, Filter Logic, and Internal Bus]
• Table sizes, in diagram order:
  – L2 Tables: 8k entries
  – L3 Tables: 2k entries
  – Filter Tables: 128 entries
• L2/L3 determines destination port(s)
• L4-L7 processing
• Priority assignment
• Cell slicing
• Drop decision

MMU Details
[Figure: MMU block diagram with SRAM Packet Buffer, Packet Queues, SDRAM Interface, Memory Allocation & Packet Storage Control, Packet Read Control & Memory Dealloc, and Internal Bus]
• 4 - 64 Mbytes of SDRAM (2 Gbytes/s peak bandwidth)
• 27 concurrent packet writes, backpressure status, write buffer mgt
• 27 concurrent packet schedules, DRAM to SRAM prefetches, SRAM to Egress reads
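A back-of-the-envelope check of the 2 Gbytes/s figure against the port count. Two assumptions are made here and neither is stated on the slides: that the "128 bits/125 MHz" label on the system diagram describes the SDRAM path, and that in the worst case every received byte is written to SDRAM once and read back once.

```python
FE_PORTS, FE_BPS = 24, 100_000_000      # 24 Fast Ethernet ports
GE_PORTS, GE_BPS = 2, 1_000_000_000     # 2 Gigabit ports

# Aggregate receive rate with every port running at line rate.
ingress_bps = FE_PORTS * FE_BPS + GE_PORTS * GE_BPS

# Assumed worst case: each byte is written to the buffer once and read once.
buffer_bytes_per_s = 2 * ingress_bps // 8

# Assumed SDRAM path: 128 bits wide at 125 MHz.
peak_bytes_per_s = (128 // 8) * 125_000_000

print("ingress:", ingress_bps / 1e9, "Gb/s")
print("buffer traffic:", buffer_bytes_per_s / 1e9, "Gbytes/s")
print("peak:", peak_bytes_per_s / 1e9, "Gbytes/s")
print("utilization:", buffer_bytes_per_s / peak_bytes_per_s)
```

Under these assumptions the 128-bit, 125 MHz path works out to exactly 2 Gbytes/s, and full line-rate traffic on all 26 ports would use a little over half of it.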

Egress Details
[Figure: egress block diagram with Internal Bus, Data Buffers, and MAC]
• Packet reassembly & transmit control
• MMU requests

PCI Details
• 32 bit / 66 MHz operation
• Also behaves as port 27
  – Contains mini-ingress & egress
  – Allows easy processing of unusual packets
    • E.g., routing non-IP packets

Simulation
• "Cycle accurate" C++ model
• Validate architecture
  – First implementation of initial concepts
  – Try out various algorithms
  – Find bottlenecks
• Optimize queue and buffer sizes
  – With regard to finite real estate
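To show the kind of question such a model answers, here is a toy cycle-stepped queue simulation. It is a Python sketch and not the authors' C++ model: the single egress queue, the one-cell-per-cycle drain rate, and the bursty arrival pattern are all hypothetical.

```python
def simulate(queue_depth, arrivals):
    """Step one egress queue cycle by cycle.

    arrivals[i] is the number of cells arriving in cycle i; the egress side
    drains at most one cell per cycle. Returns (cells sent, cells dropped).
    """
    occupancy = 0
    sent = 0
    dropped = 0
    for cells_in in arrivals:
        for _ in range(cells_in):
            if occupancy < queue_depth:
                occupancy += 1
            else:
                dropped += 1  # queue full: the cell is lost
        if occupancy:
            occupancy -= 1
            sent += 1
    return sent, dropped


# Hypothetical traffic: a 4-cell burst every 4th cycle (average load of 1 cell/cycle).
arrivals = [4, 0, 0, 0] * 25

for depth in (1, 2, 4, 8):
    print(depth, simulate(depth, arrivals))
```

For this traffic pattern a depth of 4 is the smallest that avoids drops and a depth of 8 buys nothing more, which is the sort of trade-off against silicon area that the slide describes.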

Simulation Results
Wire speed operation with various packet sizes
[Figure: forwarding rate (axis 0 to 1.2) versus packet size (64, 65, 128, 129, 256, 257, 512, 513, 1024, 1025, 1500), 24+2 streaming to 24+2]
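For reference, the packet rate that "wire speed" implies at each tested size can be computed from standard Ethernet framing. The 8-byte preamble and 12-byte minimum inter-frame gap are standard Ethernet values rather than figures from the slides, and treating the plotted packet sizes as frame lengths including the CRC is an assumption.

```python
PREAMBLE_BYTES = 8   # preamble + start-of-frame delimiter (standard Ethernet)
IFG_BYTES = 12       # minimum inter-frame gap (standard Ethernet)


def wire_speed_pps(link_bps, frame_bytes):
    """Maximum frames per second on one link for back-to-back frames."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) * 8
    return link_bps / bits_on_wire


for size in (64, 65, 128, 129, 256, 257, 512, 513, 1024, 1025, 1500):
    fe = wire_speed_pps(100e6, size)    # one Fast Ethernet port
    ge = wire_speed_pps(1000e6, size)   # one Gigabit port
    total = 24 * fe + 2 * ge            # all 24+2 ports receiving at line rate
    print(f"{size:5d} B  FE {fe:10.0f} pps  GE {ge:11.0f} pps  24+2 {total:11.0f} pps")
```

Small frames are the demanding case: at 64 bytes a single Fast Ethernet port carries roughly 148,810 frames per second, so lookup and filtering must keep up with several million frames per second across all ports.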

More Results
[Figure: forwarding rate (axis 0 to 1.2) versus packet size (64 to 1500), 2 Gig spraying to 20 Fast]
[Figure: forwarding rate (axis 0 to 0.7) versus packet size (64 to 1500), 24+2 spraying to 24+2]
Annotations on the plots: "Wire speed"; "Fast ports are overloaded"

Simulated Packet Flow
Timeline of individual packets in simulation

Verification
• Verification is complicated by…
  – Multithreaded, asynchronous nature
  – Long operations
    • 160k cycles for a 1500B packet at 10 Mb/s
  – Long time to steady state
    • Filling 64 Mbytes takes a while at 100 Hz
  – Dropped or redirected packets may be correct
• Multiple solutions
  – Extensive unit testing
  – Full chip testing with automated checking
  – Emulation
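The 160k-cycle figure can be reproduced from numbers elsewhere in the deck, assuming the cycles are counted at the 133 MHz clock listed under Vital Statistics. The wall-clock estimate additionally reads "100 Hz" as the number of simulated cycles per second, which is an interpretation and not something the slide states.

```python
CLOCK_HZ = 133_000_000  # clock rate from the Vital Statistics slide


def cycles_for_packet(packet_bytes, link_bps, clock_hz=CLOCK_HZ):
    """Clock cycles that elapse while one packet is on the wire."""
    return packet_bytes * 8 * clock_hz // link_bps


cycles = cycles_for_packet(1500, 10_000_000)
print(cycles)  # 159600, i.e. about 160k

# Assumed interpretation: the simulator advances 100 cycles per wall-clock second.
SIM_CYCLES_PER_SECOND = 100
print(cycles / SIM_CYCLES_PER_SECOND / 60, "minutes of simulation per packet")
```

Under that reading, a single maximum-size packet on a 10 Mb/s port takes roughly 27 minutes to simulate.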

Checker
[Figure: RTL/gates (Ingress blocks, MMU, Egress blocks); PLI (Verilog/C); Checker (C++); real-time pass/fail]
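A minimal sketch of automated checking that tolerates legitimate drops. The actual checker was written in C++ and attached to the RTL through PLI; the Python class below, its method names, and the `drop_allowed` flag are hypothetical and only illustrate the idea that a missing packet is a failure only when a drop was not permitted.

```python
class Checker:
    """Toy scoreboard comparing observed egress packets with expectations."""

    def __init__(self):
        self.pending = {}    # packet id -> (expected port, drop allowed?)
        self.failures = []

    def injected(self, pkt_id, expected_port, drop_allowed=False):
        """Record what a reference model expects for an injected packet."""
        self.pending[pkt_id] = (expected_port, drop_allowed)

    def observed(self, pkt_id, port):
        """Check a packet seen at an egress port as soon as it appears."""
        entry = self.pending.pop(pkt_id, None)
        if entry is None:
            self.failures.append(f"packet {pkt_id}: unexpected at port {port}")
        elif entry[0] != port:
            self.failures.append(
                f"packet {pkt_id}: seen at port {port}, expected {entry[0]}")

    def finish(self):
        """Flag packets that never appeared and were not allowed to drop."""
        for pkt_id, (port, drop_allowed) in self.pending.items():
            if not drop_allowed:
                self.failures.append(f"packet {pkt_id}: never reached port {port}")
        self.pending.clear()
        return not self.failures


checker = Checker()
checker.injected(1, expected_port=3)
checker.injected(2, expected_port=5, drop_allowed=True)  # e.g. congested port
checker.injected(3, expected_port=7)
checker.observed(1, 3)
checker.observed(3, 8)
passed = checker.finish()
print(passed, checker.failures)
```

Packet 2 never appears but was allowed to drop, so the only failure reported is packet 3 arriving at the wrong port.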

Emulation
• 50 KHz is a lot better than 50 Hz!
  – Allows software development
  – Millions of packets per day
  – Test environment is similar to that for silicon debug
[Figure: test setup with Emulator, SmartBits traffic generator / analyzer, Host PC (w/RTOS), and Workstation; links labeled MII, PCI, MII]

Design Timeline
6/98 - 7/99
6/98 Architecture Spec
8/98 Simulator Complete
10/98 RTL Integration
1/99 RTL Feature Complete
2/99 Begin Quickturn
5/99 Quickturn Stable
6/99 Tape-out
7/99 Switching Packets!

Vital Statistics
• 24+2 ports
• 60 million transistors
• 1 MB of embedded SRAM
• 133 MHz
• 0.25µ 5 metal CMOS
• 2.5V core, 3.3V I/O
• 600 ball TBGA package
(Die Photo)

Summary
• First integration of 24 FE + 2 GE, line-speed, L2-L7 switch on a single chip
• Enables convergence of voice, video, and data to the desktop.
• First pass silicon in less than 12 months, read PCI ID in minutes, switching packets the next day!
• Good flow from solid spec through emulation was crucial to the success.