Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

John DeHart [email protected] http://www.arl.wustl.edu/projects/techX Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress


Page 1: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

John DeHart [email protected]
http://www.arl.wustl.edu/projects/techX
Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Page 2: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

2 - John DeHart - 04/20/23

Revision History

10/11/06 (JDD):
» Created
10/23/06 (JDD):
» Finished for presentation on 10/24/06
10/24/06 (JDD):
» Updates from comments during review.
» Added more TCAM info
» Added information on format of Database entry files

Page 3: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Guidelines for Design Reviews
Definition of interfaces (In/Out)
Block diagram of module
» Including list of files where code for each block/module exists.
Macros:
» List macros and files where they can be found
» For each macro, provide a few lines of comments in the code that describe the macro.
» Document local and global registers used by macro.
» Memory assumptions: what addresses are pre-defined, etc.
Initialization of Memory
Data Structures
Control Blocks
Details of memory accesses, xfer register usage, signal usage.
Critical path
Testing
» Develop a well-defined acceptance test that convinces you that your block works
» Document acceptance test (Pktgen “project” file?)
Known bugs
Areas and suggestions for improvements.

Page 4: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

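Contents
[Figure: block diagrams of the three pipelines covered in this review.]
» IPv4 MR blocks: Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx
» LC Ingress blocks: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx, SWITCH
» LC Egress blocks: SWITCH, Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx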

Page 5: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

File locations
Code
» src/applications/LC_Ingress/src/lookup/PL/lookup.uc
» src/applications/LC_Egress/src/lookup/PL/lookup.uc
» src/applications/IPv4_MR/src/lookup/PL/lookup.uc
Configuration and Database Entry Files:
» src/applications/LC_Ingress/build/PL/LCI_config.txt
  LC_Ingress_Database_64bKey_64bResult_BothQM.txt
» src/applications/LC_Egress/build/PL/LCE_config.txt
  LC_Egress_Database_24bKey_64bResult.txt
» src/applications/IPv4_MR/build/PL/IPv4_config.txt
  GM_Database_144Key_128bResult.txt
IDT Includes
» src/IDT_NSE/data_plane_IXP2XXX/include/Iipc.uc
  Which then includes Iipc.h from the same directory
IDT Simulation Library
» Typical installed location:
  C:/IDT_NSE/simulation/windows/IDT75K234.dll
» Repository location:
  src/IDT_NSE/simulation/windows/IDT75K234.dll
» Directions for adding the simulation library to a simulation project:
  Simulation menu: select Options
  Simulation Options window: select Foreign Model tab
  Foreign Model DLLs panel: click on New (Insert) icon
  Use locator to go to the Repository location listed above
  Hit return after selecting dll file
  Add Instance information in bottom panel:
    Click in Instance Name box and enter: IDT75K234
    Click in Priority box and enter: 1
    Click in Initialization String box and enter: IPv4_config.txt

Page 6: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Documentation
Docs are sprinkled through the different installation directories
» We have gathered most of the important stuff here:
  /project/techX/DataSheets/IDT
» The following documents are located in the above directory:
  Datasheet (under non-disclosure):
  » 75K72234_datasheet.pdf
  User Manual:
  » 75K72234_UserManual.pdf
  Instruction Latency Application Note:
  » 75K72234_latency.pdf
  SLAM (Simulation):
  » IDT75K234SLAM_UsersManual.pdf
  Dataplane Macros:
  » NSEDataPlaneMacroAPIGuide.pdf
  IMS API:
  » IMS_API.pdf

Page 7: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

WU Macros
LC Ingress:
» dl_nn_ring_init
» dl_source_1ME_NN_4words
» dl_sink_1ME_NN_4words
IPv4_MR:
» dl_nn_ring_init
» dl_source_1ME_NN_9words
» dl_sink_1ME_NN_4words
LC Egress:
» dl_nn_ring_init
» dl_source_1ME_NN_4words
» dl_sink_1ME_NN_5words
Diagnostics:
» GetTimeStamp
» CompareTimeStamps

Page 8: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IDT Macros
IipcStartTimestamp
» Does a CAP read and write to set the bit in MISC_CONTROL that starts the timestamp counter.
IipcFormContextFromCsrMeCtx
» Sets up the Context field for the TCAM command word based on the ME and context
» 128 Contexts per LA-1 Interface
IipcMakeBase
» Forms the base address word for any instruction for this context
» Address is a 22-bit WORD address, covering a 16 MByte address space
IipcMakeDirectInstruction
» Forms the command word for any of the 4 Direct instructions
» The results of IipcMakeBase and IipcMakeDirectInstruction are passed as the two address parameters to sram[write]:
  sram[write, $w00, iipc_base_word, iipc_command_word, count]
IipcDelayUsingFutureCount(cycles)
» Sets the Future Count register to this many cycles
» Sets the Future Count Signal register
» Ctx_arb on that signal
IipcSramRead
» Performs an SRAM read until the Done bit is set in the result.
» We don’t use this any more.
» We do the sram[read] ourselves now and check the Done bit.
  This allows us to more easily perform diagnostics and performance testing.

Page 9: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Lookup Initialization and Control
XScale utility to initialize NSE and Databases
Control Plane and XScale mechanisms to read and write TCAM entries while system is active.

Page 10: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Lookup Miscellany
Bugs: No known bugs
Testing:
» Minimal testing done so far
» Some simple functional tests to show distribution of packets across all output ports based on Key fields for each of the three projects.
» More complete test plan needed.
Still To Do:
» Add information on how to configure Filters for Lookup engine.
» Handle init_done signal from Rx
» Turn on optimizer
» Substrate only lookup for IPv4_MR GPENPE pkts
» Add second database to IPv4 MR
  DB1: GM/EM Database
  DB2: Route Lookup
» LD bit in Lookup Result
» Clean up definition of DB Ids.
» Consider making Lookup code one common file with #ifdef’s to differentiate
» Consider removing #ifdef DONE_BIT_FIX code
  Refers to a Done bit bug in the Dual Port QDR (which is what we have)
  I have not seen this bug mentioned anywhere else.
  I have not witnessed any such bug and I have not enabled this code
  We’ll probably keep this code around, at least until we have done more thorough testing.
» Performance testing with both LC Ingress and LC Egress operating.
» Performance testing with second IPv4 MR Database
Data Structures: None
Performance
» Current analysis is with OPTIMIZER turned off! Turning it on should give immediate gains via branch and ctx_arb deferral slots.
» How does TCAM perform when both LC Ingress and LC Egress are operating?

Page 11: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Entries in Simulation
Four Parts to a TCAM Entry in simulation:
» dbindex
  Slot in database occupied by entry.
  Starts at 0
  Incremented by 1 for each entry
  Not dependent on size
» core
  What is matched against a provided key
» mask
  Indicates what part of the entry (core) has to match the key supplied to give a hit
» data
  Results data
Configuration and Database Entry files
» src/applications/LC_Ingress/build/PL/LCI_config.txt
  LC_Ingress_Database_64bKey_64bResult_BothQM.txt
» src/applications/LC_Egress/build/PL/LCE_config.txt
  LC_Egress_Database_24bKey_64bResult.txt
» src/applications/IPv4_MR/build/PL/IPv4_config.txt
  GM_Database_144Key_128bResult.txt

Page 12: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Entries in Simulation
LC Ingress Database entry from file:
» src/applications/LC_Ingress/build/PL/LC_Ingress_Database_64bKey_64bResult_BothQM.txt

{
  dbindex 0x0;
  core 0x51C0A80002110001;
    # SL Type: 0x5
    # Port: 1
    # IP DA=192.168.0.2
    # IP Proto: 17 (UDP)
    # UDP DPort: 0x0001
  # Exact Match everything, except wildcard Port
  mask 0xf0ffffffffffffff;
  data 0x0001004A01100001;
    # VLAN(16b)=0x0001
    # Stats_Index(16b)=74(0x4A)
    # DA=0x01
    # Port=1
    # QID=1
}
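The core and mask values in the LC Ingress entry can be cross-checked with a small sketch. It packs the commented key fields most-significant first and applies the mask as a ternary match. The helper names (`pack_fields`, `ipv4_to_int`, `tcam_match`) are illustrative only and are not part of the IDT simulation library; the match rule assumes that a 1 bit in the mask means "must match", which is what the "wildcard Port" comment implies.

```python
def pack_fields(fields):
    """Pack (value, width_in_bits) pairs into one integer, MSB field first."""
    word = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "field does not fit its width"
        word = (word << width) | value
    return word


def ipv4_to_int(dotted):
    """Convert dotted-quad text to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def tcam_match(key, core, mask):
    """Ternary match: only bit positions set in mask have to agree."""
    return (key & mask) == (core & mask)


def lc_ingress_key(sl_type, port, ip_da, proto, udp_dport):
    """64-bit LC Ingress key, field order taken from the entry comments."""
    return pack_fields([
        (sl_type, 4),
        (port, 4),
        (ipv4_to_int(ip_da), 32),
        (proto, 8),
        (udp_dport, 16),
    ])


core = lc_ingress_key(0x5, 1, "192.168.0.2", 17, 0x0001)
mask = 0xF0FFFFFFFFFFFFFF  # Port nibble wildcarded

other_port = lc_ingress_key(0x5, 3, "192.168.0.2", 17, 0x0001)
other_dport = lc_ingress_key(0x5, 1, "192.168.0.2", 17, 0x0002)

print(hex(core))
print(tcam_match(other_port, core, mask), tcam_match(other_dport, core, mask))
```

A key arriving on a different port still hits this entry, while a key with a different UDP destination port does not.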

Page 13: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Entries in Simulation
IPv4 MR Database entry from file:
» src/applications/IPv4_MR/build/PL/GM_Database_144Key_128bResult.txt

{
  dbindex 0x0;
  core 0x0AAA0002C0A84001C0A82002000100020011;
    # MR ID (VLAN) = 0x0AAA
    # UDP DPort=0x0002
    # IP DA=192.168.64.1
    # IP SA=192.168.32.2
    # TCP/UDP SPort=0x0001
    # TCP/UDP DPort=0x0002
    # TCP_FLAGS_Proto=0x0011 (Proto=UDP, no TCP Flags)
  mask 0xffffffffffffffffffffffffffffffffffff;
    # Exact match everything
  data 0x0000003780FC99F95555666601000001;
    # Reserved(3b), Drop Bit(1b)
    # Reserved(12b)
    # Cntr_Index(16b)=55(0x37)
    # Tx IP DAddr=128.252.153.249
    # Tx UDP DPort=0x5555
    # Tx UDP SPort=0x6666
    # DA=0x01
    # Port=0
    # QID=1
}
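As a sanity check on the 128-bit data word in the IPv4 MR entry, the sketch below splits it into fields. The field names and widths are read off the entry's comments, with the last word assumed to be DA(8b), Port(4b), QID(20b) as on the interface slides. `unpack_fields` and `RESULT_LAYOUT` are hypothetical names introduced only for this illustration.

```python
def unpack_fields(word, layout, total_bits):
    """Split word into named fields; layout lists (name, width), MSB first."""
    assert sum(width for _, width in layout) == total_bits
    fields = {}
    shift = total_bits
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def int_to_ipv4(value):
    """Render a 32-bit integer as dotted-quad text."""
    return ".".join(str((value >> s) & 0xFF) for s in (24, 16, 8, 0))


# Layout assumed from the comments in the database entry file.
RESULT_LAYOUT = [
    ("reserved_hi", 3),
    ("drop", 1),
    ("reserved_lo", 12),
    ("cntr_index", 16),
    ("tx_ip_daddr", 32),
    ("tx_udp_dport", 16),
    ("tx_udp_sport", 16),
    ("da", 8),
    ("port", 4),
    ("qid", 20),
]

data = 0x0000003780FC99F95555666601000001
result = unpack_fields(data, RESULT_LAYOUT, 128)

print(result["cntr_index"], int_to_ipv4(result["tx_ip_daddr"]))
print(hex(result["tx_udp_dport"]), hex(result["tx_udp_sport"]))
print(result["da"], result["port"], result["qid"])
```

The decoded values agree with the per-field comments in the entry (counter index 55, Tx IP destination 128.252.153.249, DA 1, Port 0, QID 1).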

Page 14: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Entries in Simulation
LC Egress Database entry from file:
» src/applications/LC_Egress/build/PL/LC_Egress_Database_24bKey_64bResult.txt

{
  dbindex 0x0;
  core 0x11000100;
    # IP Proto (8b) = 0x11 (UDP)
    # UDP SPort (16b) = 1
    # Rsvd(8b) = 0
  mask 0xffffffff;
    # Exact Match.
  data 0x000101000021;
    # Rsvd(4b) = 0
    # VLAN(12b)=0x001
    # Rsvd(4b)=0
    # Port(4b)=1
    # Rsvd(4b)
    # QID(20b)=33 (0x00021)
}

Page 15: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Basics of TCAM Operation
Instruction is given to TCAM as an sram write:
» Address bus gives instruction
  4 Direct Instructions:
    Lookup: This is all we use right now.
    MultiHit Lookup (MHL) or Simultaneous Multi-Database Lookup
      – Which one is determined by a bit in a config register
    Preload
    Indirect: Uses data field to specify subinstruction
» Data bus gives:
  Subinstruction for Indirect instructions (There are 16 subinstructions)
  Data for all instructions
    Our lookup keys go here.
» Example: IPv4 MR Lookup (Key of 144 bits in 5 words):
  Load xfer registers $w00, $w01, $w02, $w03, $w04 with the lookup key
  sram[write, $w00, iipc_base_word, iipc_command_word, 5]
  More about iipc_base_word and iipc_command_word later
  5: number of data words needed for key
Result is read back from Context’s Results Mailbox
» This is an SRAM read, not a TCAM Read instruction.
» Example: IPv4_MR Lookup result of 4 words:
  sram[read, $r00, iipc_base_word, 0, 4]
» Result is valid only when the high order bit of the first word in the mailbox is set.
  So, multiple reads may be necessary
  We can predict the latency of the TCAM instruction
  More about this later when we look at the macros used.
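The read-until-valid behaviour described above can be modelled in a few lines. This is only a host-side sketch of the control flow, not microengine code: `FakeMailbox` is a hypothetical stand-in for the context's Results Mailbox, and `max_reads` is an assumed safety limit that does not appear in the slides.

```python
DONE_BIT = 1 << 31  # high-order bit of the first 32-bit mailbox word


def read_until_done(read_mailbox, max_reads=8):
    """Re-read the mailbox until the Done bit is set; return (words, reads)."""
    for reads in range(1, max_reads + 1):
        words = read_mailbox()
        if words[0] & DONE_BIT:
            return words, reads
    raise TimeoutError("Done bit not set after %d reads" % max_reads)


class FakeMailbox:
    """Hypothetical mailbox whose result becomes valid on the Nth read."""

    def __init__(self, ready_on_read, result_words):
        self.ready_on_read = ready_on_read
        self.result_words = list(result_words)
        self.reads = 0

    def __call__(self):
        self.reads += 1
        if self.reads < self.ready_on_read:
            return [0] * len(self.result_words)  # Done bit still clear
        return [self.result_words[0] | DONE_BIT] + self.result_words[1:]


mailbox = FakeMailbox(
    ready_on_read=3,
    result_words=[0x00000037, 0x80FC99F9, 0x55556666, 0x01000001],
)
words, reads = read_until_done(mailbox)
print(reads, hex(words[0]))
```

Delaying the first read by the predicted TCAM latency should make the first read succeed in the common case, which is why the lookup code inserts a timestamp delay before reading the result.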

Page 16: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC Ingress Lookup
Main functions:
» Perform TCAM Lookup
» Pass Through Data:
  Buf Handle
  IP Pkt Length and Ethernet Header Length
Single code path with possible loop around Result Read
NN communication
Uses 8 threads
[Figure: LC Ingress pipeline blocks: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx, SWITCH]

Page 17: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC Ingress: Lookup Block Interfaces
[Figure: LC Ingress pipeline (Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx, SWITCH) with the Lookup block's input and output formats.]
» Input fields: Lookup Key[63-32] (32b); Lookup Key[31-0] (32b); Buf Handle (32b); IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b)
» Output fields: Buf Handle (32b); IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b); VLAN (16b), Stats Index (16b); DAddr (8b), Port (4b), QID (20b)
» Lookup Key: SL (4b), Port (4b), D_Addr[31:8] (24b), D_Addr[7:0] (8b), Protocol (8b), UDP DPort (16b)
» Lookup Result: VLAN (16b), Stats Index (16b), DAddr (8b), Port (4b), QID (20b); two Rsvd (4b) fields also appear in the figure

Page 18: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC Ingress Lookup Block Diagram
[Figure: flow diagram of the Lookup block.]
» Stages: dl_source() (Wait for prev ctx, Signal next ctx, NN Dequeue (4W)); Load Xfer Regs; Send Lookup Request (SRAM Write: 2W); TimeStamp Delay; Read Result (SRAM Read: 2W); Check Done Bit; Reformat Output; dl_sink() (Wait for prev ctx, Signal next ctx, NN Enqueue (4W))
» Other labels: init, signal, mem access, ctx_swap (twice)
Stage costs annotated in the figure, in extraction order:
» 15 cycles + 2 abort cycles
» 7 cycles + 2 abort cycles
» 1 cycle + 2 abort cycles
» 5 cycles + 0 abort cycles
» 12 cycles + 8 abort cycles
» 1 cycle + 2 abort cycles
Totals:
» 41 processing cycles
» 16 abort cycles

Page 19: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IPv4 MR Lookup
[Figure: IPv4 MR pipeline blocks: Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx]
Main functions:
» Perform TCAM Lookup
» Pass Through Data:
  Buf Handle
  IP Pkt Length and Offset
  Slice Data Ptr
  Exception Bits
Single code path with possible loop around Result Read
NN communication
Uses 8 threads

Page 20: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IPv4 MR Lookup Block Interfaces
[Figure: IPv4 MR pipeline (Rx, DeMux, Parse, Lookup, Header Format, QM, Tx) with the Lookup block's input and output formats.]
» Input fields: Lookup Key[143-112] Slice ID/Rx UDP DPort (32b); Lookup Key[111-80] DA (32b); Lookup Key[79-48] SA (32b); Lookup Key[47-16] Ports (32b); Lookup Key[15-0] Proto/TCP_Flags (16b), Exception Bits (12b); Buf Handle (32b); IP Pkt Length (16b), IP Pkt Offset (16b); Slice Data Ptr (32b); Reserved (28b), Code (4b)
» Output fields: Buf Handle (32b); IP Pkt Length (16b), IP Pkt Offset (16b); LFlags (4b), DA (8b), Port (4b), QID (20b); Tx IP DAddr (32b); Tx UDP SPort (16b), Tx UDP DPort (16b); Rsvd (1b), H (1b), LD (1b), D (1b), Exception Bits (12b), Cntr Index (16b); Rx UDP DPort (16b), Slice ID (VLAN) (16b); Slice Data Ptr (32b); Reserved (28b), Code (4b)
» Lookup Key (144b): Slice ID/Rx UDP DPort (32b), IP DAddr (32b), IP SAddr (32b), SPort (16b), DPort (16b), Proto/TCP_Flags (16b)

Page 21: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IPv4 MR Functional Block Results
[Figure: lookup key and the two forms of the lookup result.]
» Lookup Key (144b): Slice ID/Rx UDP DPort (32b), IP DAddr (32b), IP SAddr (32b), SPort (16b), DPort (16b), Proto/TCP_Flags (16b)
» Lookup Result (128b), stored in TCAM: TCAM Status Bits Done (1b), Hit (1b), MHit (1b); D (1b); LD (1b); Reserved (11b); Cntr Index (16b); Tx IP DAddr (32b); Tx UDP SPort (16b), Tx UDP DPort (16b); DA (8b), Port (4b), QID (20b)
» Lookup Result (128b), as given to HF: Rsvd (1b), Hit (1b), LD (1b), D (1b); Exception Bits (12b); Cntr Index (16b); Tx IP DAddr (32b); Tx UDP SPort (16b), Tx UDP DPort (16b); DA (8b), Port (4b), QID (20b)

Page 22: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IPv4 MR Lookup Block Diagram
[Figure: flow diagram of the Lookup block.]
» Stages: dl_source() (Wait for prev ctx, Signal next ctx, NN Dequeue (9W)); Load Xfer Regs; Send Lookup Request (SRAM Write: 5W); TimeStamp Delay; Read Result (SRAM Read: 4W); Check Done Bit; Reformat Output; dl_sink() (Wait for prev ctx, Signal next ctx, NN Enqueue (9W))
» Other labels: init, signal, mem access, ctx_swap (twice)
Stage costs annotated in the figure, in extraction order:
» 25 cycles + 2 abort cycles
» 7 cycles + 2 abort cycles
» 1 cycle + 2 abort cycles
» 5 cycles + 0 abort cycles
» 17 cycles + 8 abort cycles
» 2 cycles + 2 abort cycles
Totals:
» 57 processing cycles
» 16 abort cycles

Page 23: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC Egress Lookup
Main functions:
» Perform TCAM Lookup
» Pass Through Data:
  Buf Handle
  IP Pkt Length and Ethernet Header Length
  IP Destination Address
Single code path with possible loop around Result Read
NN communication
Uses 8 threads
[Figure: LC Egress pipeline blocks: SWITCH, Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx]

Page 24: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC Egress: Lookup Block Interfaces
[Figure: LC Egress pipeline (SWITCH, Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx) with the Lookup block's input and output formats.]
» Input fields: Lookup Key: UDP SPort (16b), IP Proto (8b), Reserved (8b); Buf Handle (32b); IP DAddr (32b); IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b)
» Output fields: Buf Handle (32b); IP DAddr (32b); Lookup Result[63-32] (32b); Lookup Result[31-0] (32b); IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b)
» Lookup Key: Protocol (8b), UDP SPort (16b), Reserved (8b)
» Lookup Result: VLAN (12b), Stats Index (16b), Port (4b), QID (20b), three Rsvd (4b) fields

Page 25: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC Egress Lookup Block Diagram
[Figure: flow diagram of the Lookup block.]
» Stages: dl_source() (Wait for prev ctx, Signal next ctx, NN Dequeue (4W)); Load Xfer Regs; Send Lookup Request (SRAM Write: 1W); TimeStamp Delay; Read Result (SRAM Read: 2W); Check Done Bit; Reformat Output; dl_sink() (Wait for prev ctx, Signal next ctx, NN Enqueue (5W))
» Other labels: init, signal, mem access, ctx_swap (twice)
Stage costs annotated in the figure, in extraction order:
» 14 cycles + 2 abort cycles
» 7 cycles + 2 abort cycles
» 1 cycle + 2 abort cycles
» 5 cycles + 0 abort cycles
» 13 cycles + 8 abort cycles
» 3 cycles + 2 abort cycles
Totals:
» 43 processing cycles
» 16 abort cycles
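The totals on the three block diagrams can be re-derived from the per-stage annotations. The sketch below only sums the (processing, abort) pairs in the order they appear on each slide; it makes no claim about which stage each pair belongs to, and `stage_costs` is just an illustrative name.

```python
# (processing cycles, abort cycles) per annotated stage, in slide order
stage_costs = {
    "LC Ingress": [(15, 2), (7, 2), (1, 2), (5, 0), (12, 8), (1, 2)],
    "IPv4 MR": [(25, 2), (7, 2), (1, 2), (5, 0), (17, 8), (2, 2)],
    "LC Egress": [(14, 2), (7, 2), (1, 2), (5, 0), (13, 8), (3, 2)],
}


def totals(costs):
    """Return (total processing cycles, total abort cycles)."""
    processing = sum(p for p, _ in costs)
    abort = sum(a for _, a in costs)
    return processing, abort


for block, costs in stage_costs.items():
    processing, abort = totals(costs)
    print(f"{block}: {processing} processing, {abort} abort")
```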

Page 26: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Performance

Page 27: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Packet Sizes
Ethernet VLAN Header: 18B
Substrate Header:
  IPv4 Header: 20B
  UDP Header: 8B
Metanet Frame:
  GPE to MPE: n
  IPv4 Header: 20B
  UDP Header: 8B
  Payload: n
Ethernet Pad: 0
Ethernet FCS: 4B
Total: 78B + internal + payload
Ethernet IFS: 12B
Total Physical: 90B + internal + payload

Page 28: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Cycle Budget (min eth packets)
To hit 5 Gb rate:
» 76B per min IPv4 packet (64B min Eth + 12B IFS)
» 1.4 GHz clock rate
» 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mp/sec
» 1.4 Gcycle/sec * 1 sec/8.22 Mp = 170.3 cycles per packet
» compute budget: 170 cycles
» latency budget: (threads*170)
  8 threads: 1360 cycles
To hit 10 Gb rate:
» 76B per min IPv4 packet (64B min Eth + 12B IFS)
» 1.4 GHz clock rate
» 10 Gb/sec * 1B/8b * packet/76B = 16.44 Mp/sec
» 1.4 Gcycle/sec * 1 sec/16.44 Mp = 85.16 cycles per packet
» compute budget: 85 cycles
» latency budget: (threads*85)
  8 threads: 680 cycles

Page 29: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Cycle Budget (IPv4 MN packets)
To hit 5 Gb rate:
» 90B per min IPv4 packet (78B min IPv4 MN + 12B IFS)
» 1.4 GHz clock rate
» 5 Gb/sec * 1B/8b * packet/90B = 6.94 Mp/sec
» 1.4 Gcycle/sec * 1 sec/6.94 Mp = 201.7 cycles per packet
» compute budget: 201 cycles
» latency budget: (threads*201)
  8 threads: 1608 cycles
To hit 10 Gb rate:
» 90B per min IPv4 packet (78B min IPv4 MN + 12B IFS)
» 1.4 GHz clock rate
» 10 Gb/sec * 1B/8b * packet/90B = 13.88 Mp/sec
» 1.4 Gcycle/sec * 1 sec/13.88 Mp = 100.86 cycles per packet
» compute budget: 100 cycles
» latency budget: (threads*100)
  8 threads: 800 cycles
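The budget arithmetic on the two Cycle Budget slides follows one formula, sketched below. The function name `cycle_budget` is illustrative. Because this version does not round the packet rate before dividing, its fractional cycles per packet come out slightly lower than the slide figures (for example about 170.2 rather than 170.3), but the integer compute and latency budgets are the same.

```python
CLOCK_HZ = 1.4e9  # NPU clock rate from the slides
THREADS = 8       # threads used by the Lookup block


def cycle_budget(line_rate_bps, wire_bytes, clock_hz=CLOCK_HZ, threads=THREADS):
    """Return (Mpackets/sec, cycles per packet, compute budget, latency budget)."""
    packets_per_sec = line_rate_bps / 8 / wire_bytes
    cycles_per_packet = clock_hz / packets_per_sec
    compute_budget = int(cycles_per_packet)       # whole cycles available per packet
    latency_budget = compute_budget * threads     # cycles one thread may take end to end
    return packets_per_sec / 1e6, cycles_per_packet, compute_budget, latency_budget


for label, wire_bytes in (("min eth", 76), ("IPv4 MN", 90)):
    for gbps in (5, 10):
        mpps, cpp, compute, latency = cycle_budget(gbps * 1e9, wire_bytes)
        print(f"{label:8s} {gbps:2d} Gb/s: {mpps:5.2f} Mp/s, "
              f"{cpp:6.2f} cycles/pkt, compute {compute}, latency {latency}")
```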

Page 30: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Instruction Latency Analysis
QDR Clock: 200 MHz, 5 ns period
TCAM core Clock: 200 MHz, 5 ns period
NPU Clock: 1400 MHz, 0.714 ns period
» 1 QDR cycle == 1 TCAM cycle == 7 NPU cycles
TCAM Lookup Latencies:
» QDR xfer: 1 cycle per word in key
» Instruction Fifo: constant 2 cycles
» Synchronizer: constant 3 cycles
» Execution Latency: fct(key width, output data width)
  Table in IDT Latency Application Note
» Re-Synchronizer: constant 1 cycle

Page 31: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Instruction Latency Analysis
IPv4 MR
» Key: 144 bit (5 words)
» Output data: 128 bit
» QDR Xfer: 5 cycles
» Constants: 2 + 3 + 1 = 6 cycles
» Execution Latency: 36 cycles
» Total Latency: 47 TCAM cycles (235 ns) (329 NPU cycles)
LC Ingress
» Key: 64 bit (2 words)
» Output data: 64 bit
» QDR Xfer: 2 cycles
» Constants: 2 + 3 + 1 = 6 cycles
» Execution Latency: 32 cycles
» Total Latency: 40 TCAM cycles (200 ns) (280 NPU cycles)

LC Egress
» Key: 24 bit (1 word)
» Output data: 64 bit
» QDR Xfer: 1 cycle
» Constants: 2 + 3 + 1 = 6 cycles
» Execution Latency: 34 cycles
» Total Latency: 41 TCAM cycles (205 ns) (287 NPU cycles)
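The latency totals are the sum of the components listed on the previous slide, converted with the 5 ns TCAM period and the 7:1 NPU-to-TCAM clock ratio. A minimal sketch follows, with `lookup_latency` as an illustrative name and the execution latencies taken from this slide rather than from the IDT application note itself.

```python
TCAM_PERIOD_NS = 5             # 200 MHz TCAM/QDR clock
NPU_CYCLES_PER_TCAM_CYCLE = 7  # 1400 MHz / 200 MHz
INSTRUCTION_FIFO = 2           # constant cycles
SYNCHRONIZER = 3               # constant cycles
RESYNCHRONIZER = 1             # constant cycles


def lookup_latency(key_words, execution_cycles):
    """Return (TCAM cycles, nanoseconds, NPU cycles) for one lookup."""
    tcam_cycles = (key_words  # QDR transfer: one cycle per key word
                   + INSTRUCTION_FIFO + SYNCHRONIZER + RESYNCHRONIZER
                   + execution_cycles)
    return (tcam_cycles,
            tcam_cycles * TCAM_PERIOD_NS,
            tcam_cycles * NPU_CYCLES_PER_TCAM_CYCLE)


ipv4_mr = lookup_latency(key_words=5, execution_cycles=36)
lc_ingress = lookup_latency(key_words=2, execution_cycles=32)
print("IPv4 MR:", ipv4_mr)
print("LC Ingress:", lc_ingress)
```

The same function can be applied to any other key width once its execution latency has been looked up in the application note's table.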

Page 32: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Performance (Rates in M/sec)
[Figure: TCAM performance chart or table with entries marked LC_Egress, LC_Ingress, and IPv4 MR; no values survived extraction.]

Page 33: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

TCAM Performance (Rates in M/sec)
Columns: Lookup Size | #LA-1 Words | Core Size | Assoc. Data | Single LA-1 Max Rate | Max Core Rate | Avg Shared Rate (Each of 2 LA-1s)

32 1 36 32 50 50 25

64 50 25

128 25 12.5

36 2 36 32 50 50 25

64 50 25

128 25 12.5

64 2 72 32 100 100 50

64 50 25

128 25 12.5

72 3 72 32 67 100 50

64 50 25

128 25 12.5

128 4 144 32 50 100 50

64 50 25

128 25 12.5

144 5 144 32 40 100 40

64 50 25

128 25 12.5

160 5 288 32 40 50 40

64 50 25

128 25 12.5

Rows marked in the figure: LC_Ingress, LC_Egress, IPv4 MR

Page 34: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IPv4: Performance Snapshot
IPv4 MR lookup
» Unloaded
[Figure: timeline snapshot, ~610 Cycles end to end. Labeled segments: dl_source & Xfer reg loads; sram write; Timestamp Delay setup; Timestamp Delay; sram read; dl_sink ctx_arb; dl_sink processing. Annotation: Ctx_arb vs br_signal optimization.]

Page 35: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

IPv4: Performance Snapshot
IPv4 MR lookup
» Hack to Parse: loop and repeatedly call dl_sink with same buf_handle
  Should guarantee that there is always something in NN ring for lookup to pick up
» Hack to HF: set dlNextBlock to IX_DROP
  Keep Tx from trying to transmit something bad.
[Figure: timeline snapshot. Write issued at 33333; write issued at 34016.]
34016 – 33333 = 683 Cycles

Page 36: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC_Ingress: Performance Snapshots
LC Ingress lookup
» Unloaded
[Figure: timeline snapshot, >=563 Cycles]

Page 37: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC_Ingress: Performance Snapshots
LC Ingress lookup
» Hack to KE stub: loop and repeatedly call dl_sink with same buf_handle
  Should guarantee that there is always something in NN ring for lookup to pick up
» Hack to HF stub: set dl_next_block to IX_DROP
  Keep Tx from trying to transmit something bad.
[Figure: timeline snapshot. Write issued at 59888; write issued at 60494.]
60494 – 59888 = 606 Cycles

Page 38: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC_Egress: Performance Snapshots
LC Egress lookup
» Unloaded
[Figure: timeline snapshot, ~560 Cycles]

Page 39: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

LC_Egress: Performance Snapshots
LC Egress lookup
» Loaded with KE and HF hacks.
[Figure: timeline snapshot, ~610 Cycles]

Page 40: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Performance Summary
Processing Cycles:
» LC Ingress: 41
» IPv4 MR: 57
» LC Egress: 43
Abort Cycles:
» LC Ingress: 16
» IPv4 MR: 16
» LC Egress: 16
Latency Cycles:
» LC Ingress: 560 – 57 = 503?
» IPv4 MR: 610 – 73 = 537?
» LC Egress: 560 – 59 = 501?
Expected performance
» LC Ingress: 10Gb/s
» IPv4 MR: 5Gb/s +
» LC Egress: 10Gb/s
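The Latency Cycles figures are the unloaded loop time minus processing and abort cycles. The sketch below reproduces that subtraction and adds a rough per-packet estimate from the loaded snapshots. The estimate rests on an assumption that is not stated in the slides: that each of the 8 threads completes one packet per loop iteration, so the block as a whole emits one packet every loop/8 cycles. The names `measured`, `loaded_loop`, and `cycles_per_packet` are illustrative.

```python
THREADS = 8

# (unloaded loop cycles, processing cycles, abort cycles) from the summary
measured = {
    "LC Ingress": (560, 41, 16),
    "IPv4 MR": (610, 57, 16),
    "LC Egress": (560, 43, 16),
}

# Loaded loop times read off the performance snapshots (approximate for LC Egress)
loaded_loop = {"LC Ingress": 606, "IPv4 MR": 683, "LC Egress": 610}


def latency_cycles(total, processing, abort):
    """Cycles not accounted for by instruction execution or aborts."""
    return total - (processing + abort)


def cycles_per_packet(loop_cycles, threads=THREADS):
    """Assumed steady state: every thread finishes one packet per loop."""
    return loop_cycles / threads


for block, (total, processing, abort) in measured.items():
    print(f"{block}: latency {latency_cycles(total, processing, abort)} cycles, "
          f"~{cycles_per_packet(loaded_loop[block]):.1f} cycles/packet loaded")
```

Comparing the per-packet estimate with the compute budgets from the Cycle Budget slides gives a quick, approximate feel for which line rate each block can sustain.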

Page 41: Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

Optimization Possibilities
There may still be some code we can move out of the processing loop, or at least move to between the sram write or read and the ctx swap.
dl_sink has a possible improvement.
» ctx_arb vs. br_signal/br_!signal
