Router Architecture

Upload: kory-terry

Post on 21-Jan-2016

Page 1: Router Architecture. December 21, 2015SoC Architecture2 Network-on-Chip Information in the form of packets is routed via channels and switches from one

Router Architecture

April 21, 2023 SoC Architecture 2

Network-on-Chip

Information in the form of packets is routed via channels and switches from one terminal node to another

The interface between the interconnection network and the terminals (client) is called network interface

[Figure: network diagram with 16 switches (S) and 16 terminal nodes (T)]

Router Architecture

The discussion concentrates on a typical virtual-channel router

Modern routers are pipelined and work at the flit level

Head flits proceed through pipeline stages that perform routing and virtual channel allocation

All flits pass through switch allocation and switch traversal stages

Most routers use credits to allocate buffer space

A typical virtual channel router

A router’s functional blocks can be divided into:

Datapath: handles storage and movement of a packet’s payload
  Input buffers
  Switch
  Output buffers

Control plane: coordinates the movement of packets through the resources of the datapath
  Route computation
  VC allocator
  Switch allocator

A typical virtual channel router

[Figure: router block diagram. Labels: Routing, VC Allocation, Output Port Allocation, Switch Allocation, VC Deallocation, Switching]

A typical virtual channel router

The input unit contains a set of flit buffers and maintains the state for each virtual channel:
  G = Global State
  R = Route
  O = Output VC
  P = Pointers
  C = Credits
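The per-channel state listed above can be pictured as a small record. The following is only a sketch: the class and field names are invented for illustration, the pointer field P is simplified to a count of buffered flits, and the example values (output port 2, output VC 1, 4 credits) are hypothetical. The state letters I, R, V and A are the ones used in the pipeline walk-through of these slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GlobalState(Enum):
    IDLE = "I"         # no packet assigned to this virtual channel
    ROUTING = "R"      # head flit is waiting for route computation
    WAITING_VC = "V"   # waiting for an output virtual channel
    ACTIVE = "A"       # output VC assigned; flits may bid for the switch


@dataclass
class InputVCState:
    g: GlobalState = GlobalState.IDLE  # G: global state
    r: Optional[int] = None            # R: output port chosen by routing
    o: Optional[int] = None            # O: output VC granted by the allocator
    p: int = 0                         # P: simplified here to a count of buffered flits
    c: int = 0                         # C: credits (free downstream buffers)


vc = InputVCState()
vc.g = GlobalState.ROUTING      # head flit is being routed
vc.r = 2                        # hypothetical route result: output port 2
vc.g = GlobalState.WAITING_VC
vc.o, vc.c = 1, 4               # hypothetical grant: output VC 1 with 4 credits
vc.g = GlobalState.ACTIVE
print(vc)
```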

Virtual channel state fields (Input)

A typical virtual channel router

During route computation the output port for the packet is determined

Then the packet requests an output virtual channel from the virtual-channel allocator

A typical virtual channel router

Flits are forwarded via the virtual channel by allocating a time slot on the switch and output channel using the switch allocator

Flits are forwarded to the appropriate output during this time slot

The output unit forwards the flits to the next router in the packet’s path

Virtual channel state fields (Output)

Packet Rate and Flit Rate

The control of the router operates at two distinct frequencies:

Packet rate (performed once per packet)
  Route computation
  Virtual-channel allocation

Flit rate (performed once per flit)
  Switch allocation
  Pointer and credit count update

The Router Pipeline

A typical router pipeline includes the following stages:
  RC (Routing Computation)
  VA (Virtual Channel Allocation)
  SA (Switch Allocation)
  ST (Switch Traversal)

[Figure: pipeline timing diagram with no pipeline stalls]
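The stall-free timing can be tabulated with a few lines of code. This is a sketch under stated assumptions, not a model of any particular router: flit k is assumed to arrive in cycle k, each stage takes one cycle, only the head flit uses RC and VA, and later flits wait in the buffer and follow the head through SA and ST one cycle apart. The function name is made up for this example.

```python
def pipeline_schedule(num_flits):
    """Return {flit_index: {stage: cycle}} for one packet with no stalls."""
    schedule = {}
    for k in range(num_flits):
        stages = {}
        if k == 0:
            # Only the head flit performs routing and VC allocation.
            stages["RC"] = 1
            stages["VA"] = 2
        # Every flit needs a switch time slot, one flit per cycle.
        stages["SA"] = 3 + k
        stages["ST"] = 4 + k
        schedule[k] = stages
    return schedule


for flit, stages in pipeline_schedule(4).items():
    print(flit, stages)
```

With four flits, the head traverses the switch in cycle 4 and the tail in cycle 7.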

The Router Pipeline

Cycle 0: Head flit arrives and the packet is directed to a virtual channel of the input port (G = I)

The Router Pipeline

Cycle 1: Routing computation
  Virtual channel state changes to routing (G = R)
  Head flit enters RC stage
  First body flit arrives at router

The Router Pipeline

Cycle 2: Virtual Channel Allocation
  Route field (R) of the virtual channel is updated
  Virtual channel state is set to “waiting for output virtual channel” (G = V)
  Head flit enters VA stage
  First body flit enters RC stage
  Second body flit arrives at router

The Router Pipeline

Cycle 2: Virtual Channel Allocation
  The result of the routing computation is input to the virtual channel allocator
  If successful, the allocator assigns a single output virtual channel
  The state of the virtual channel is set to active (G = A)

The Router Pipeline

Cycle 3: Switch Allocation
  All further processing is done on a flit basis
  Head flit enters SA stage
  Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC

The Router Pipeline

Cycle 3: Switch Allocation
  If successful, pointer field is updated
  Credit field is decremented
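The bidding condition and the bookkeeping after a successful switch allocation can be sketched as two small functions. The function names are hypothetical, and the pointer field P is again reduced to a count of buffered flits; a real router would track head and tail pointers into the flit buffer.

```python
def can_bid(state, buffered_flits, credits):
    """A VC bids only if it is active, holds a flit, and has a credit."""
    return state == "A" and buffered_flits > 0 and credits > 0


def grant(buffered_flits, credits):
    """Apply a successful switch allocation: one flit leaves, one credit is used."""
    return buffered_flits - 1, credits - 1


# Hypothetical input VC: active, 2 buffered flits, 1 credit.
flits, credits = 2, 1
if can_bid("A", flits, credits):
    flits, credits = grant(flits, credits)
print(flits, credits, can_bid("A", flits, credits))  # 1 0 False
```

After the grant the channel still holds a flit but has no credit left, so it cannot bid again until a credit returns.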

The Router Pipeline

Cycle 4: Switch Traversal
  Head flit traverses the switch

Cycle 5:
  Head flit starts traversing the channel to the next router

The Router Pipeline

Cycle 7: Tail traverses the switch
  Output VC set to idle
  Input VC set to idle (G = I), if buffer is empty
  Input VC set to routing (G = R), if another head flit is in the buffer

The Router Pipeline

Only the head flits enter the RC and VA stages

The body and tail flits are stored in the flit buffers until they can enter the SA stage

Pipeline Stalls

Pipeline stalls can be divided into:

Packet stalls
  Can occur if the virtual channel cannot advance to its R, V, or A state

Flit stalls
  Occur if a virtual channel is in the active state and the flit cannot successfully complete switch allocation due to
    Lack of flit
    Lack of credit
    Losing arbitration for the switch time slot

Example for Packet Stall

Virtual-channel allocation stall

The head flit of packet A can enter the VA stage only when the tail flit of packet B completes switch allocation and releases the virtual channel

Example for Flit Stalls

Switch allocation stall

Second body flit fails to allocate the requested connection in cycle 5

Example for Flit Stalls

Buffer empty stall

Body flit 2 is delayed three cycles. However, since it does not have to enter the RC and VA stages, the output is delayed by only one cycle!

Credits

A buffer is allocated in the SA stage on the upstream (transmitting) node

To reuse the buffer, a credit is returned over a reverse channel after the same flit departs the SA stage of the downstream (receiving) node

When the credit reaches the input unit of the upstream node, the buffer is available and can be reused

Credits

The credit loop can be viewed by means of a token that
  Starts at the SA stage of the upstream node
  Travels downstream with the flit
  Reaches the SA stage at the downstream node
  Returns upstream as a credit
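The credit loop can be mimicked with a counter on the upstream side. The class below is an illustrative sketch only: the name `CreditLink` and its methods are invented here, and pipeline and wire delays are ignored, so a credit becomes visible upstream immediately after the flit leaves the downstream buffer.

```python
from collections import deque


class CreditLink:
    """Upstream credit counter for one virtual channel (illustrative only)."""

    def __init__(self, downstream_buffers):
        self.credits = downstream_buffers   # one credit per free downstream buffer
        self.downstream = deque()           # flits held in the downstream buffer

    def send(self, flit):
        """Upstream SA stage: forward a flit only if a credit is available."""
        if self.credits == 0:
            return False                    # credit stall
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def drain(self):
        """Downstream SA stage: the flit leaves and a credit returns upstream."""
        flit = self.downstream.popleft()
        self.credits += 1                   # credit arrives (loop latency ignored)
        return flit


link = CreditLink(downstream_buffers=2)
results = [link.send(f) for f in ("head", "body", "tail")]
link.drain()
results.append(link.send("tail"))
print(results)  # [True, True, False, True]
```

The third send fails because both downstream buffers are occupied; it succeeds once a flit has left the downstream node and its credit has come back.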

Credit Loop Latency

The credit loop latency $t_{crt}$, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth

$t_{crt}$ in flit times is given by

$t_{crt} = t_f + t_c + 2T_w + 1$

where $t_f$ is the flit pipeline delay, $t_c$ is the credit pipeline delay, and $T_w$ is the one-way wire delay

Credit Loop Latency

If the number of buffers available per virtual channel is F, the duty factor of the channel will be

$d = \min(1, F / t_{crt})$

The duty factor will be 100% as long as there are sufficient flit buffers to cover the round trip latency
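As a quick numeric check of the two formulas, the sketch below evaluates the credit loop latency and the duty factor. The function names are made up for this example; the input values (flit pipeline delay 4, credit pipeline delay 2, wire delay 2, 4 flit buffers) are the ones used in the credit stall example of these slides.

```python
def credit_loop_latency(t_f, t_c, t_w):
    """t_crt = t_f + t_c + 2*T_w + 1, in flit times."""
    return t_f + t_c + 2 * t_w + 1


def duty_factor(num_buffers, t_crt):
    """d = min(1, F / t_crt)."""
    return min(1.0, num_buffers / t_crt)


t_crt = credit_loop_latency(t_f=4, t_c=2, t_w=2)
print(t_crt)                      # 11
print(duty_factor(4, t_crt))      # 4/11, about 0.36
print(duty_factor(11, t_crt))     # 1.0
```

With only 4 buffers the channel is busy roughly a third of the time; 11 buffers are needed to cover the round trip and reach full bandwidth.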

Credit Stall

Virtual channel router with 4 flit buffers

$t_f = 4,\ t_c = 2,\ T_w = 2 \Rightarrow t_{crt} = 11$

[Figure: credit loop timing diagram. White: upstream pipeline stages; grey: downstream pipeline stages. Labels: $t_f$, $t_c$, $T_w$, Credit Transmit, Credit Update, $t_{crt}$]

Flit and Credit Encoding

(a) Flits and credits are sent over separate lines with separate widths

(b) Flits and credits are transported via the same line. This can be done by
  Including credits in flits
  Multiplexing flits and credits at the phit level

Network Interface

Slides are adapted from previous slides by

Ingo Sander and Axel Jantsch.

Network Interface

Different terminals with different interfaces shall be connected to the network

The network uses a specific protocol and all traffic on the network has to comply with the format of this protocol

[Figure labels: Switch, Network, Network Interface, Terminal Node (Resource)]

Network Interface

The network interface plays an important role in a network-on-chip:
  It shall translate between the terminal protocol and the protocol of the network
  It shall enable the client to communicate at the speed of the network
  It shall not further reduce the available bandwidth of the network
  It shall not increase the latency imposed by the network

A poorly designed network interface is a bottleneck and can increase the latency considerably

Network Interfaces

For message passing: symmetric
  Processor-Network Interface

For shared memory: asymmetric, load & store
  Processor-Network Interface
  Memory-Network Interface

Packet admission/ejection (line-fabric) interface
  May reside in a switch or router
  Input queuing and output queuing

Basic Functionality of Network Interfaces

Packetization/depacketization
  The network delivers packets. It does not know about messages and transactions.
  Sender side: packetization (messages to packets); receiver side: depacketization (packets to messages)

Multiplexing/demultiplexing
  Scheduling packets to be sent and received
  Multiple threads running
  Sender: multiplexing; receiver: demultiplexing

Re-ordering
  A network service may not guarantee order

End-to-end flow control
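Packetization, depacketization and re-ordering can be illustrated together. The sketch below assumes a made-up packet format, a (sequence number, payload) pair, and a hypothetical payload size of 4 bytes; real network interfaces add headers such as destination and message identifiers.

```python
def packetize(message, payload_size):
    """Split a message (bytes) into (sequence_number, payload) packets."""
    return [
        (seq, message[start:start + payload_size])
        for seq, start in enumerate(range(0, len(message), payload_size))
    ]


def depacketize(packets):
    """Re-order packets by sequence number and rebuild the message."""
    return b"".join(payload for _, payload in sorted(packets))


packets = packetize(b"hello network", payload_size=4)
packets.reverse()                       # pretend the network delivered them out of order
print(depacketize(packets))             # b'hello network'
```

The sequence number is what lets the receiver restore the original order when the network service does not guarantee it.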

Network Interfaces for message passing

  Two-register interface
  Register-mapped interface
  Descriptor-based interface
  Message reception

Two-Register Interface

For sending, the processor writes to a specific Net-out register

For receiving, the processor reads a specific Net-in register

Pro:
  Efficient for short messages

Cons:
  Inefficient for long messages
  Processor acts as DMA controller
  Not safe, because, for longer messages, the processor may block network resources forever

[Figure labels: Network, Net out, Net in, registers R0 to R31]

Descriptor-Based Interface

The processor composes a message in a set of dedicated message descriptor registers

Each descriptor contains
  An immediate value, or
  A reference to a processor register, or
  A reference to a block of memory

A co-processor steps through the descriptors and composes the messages

Safe because the network is protected from the processor’s SW

[Figure labels: Immediate, RN, Addr, Length, END (descriptors); R0 to R31 (registers); Memory; Send; Start]
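How a co-processor might step through the descriptors can be sketched as follows. The tuple encoding of descriptors ("imm", "reg", "mem", "end") is an assumption made for this example and is not a format defined in the slides; registers and memory are modelled as plain Python lists.

```python
def compose_message(descriptors, registers, memory):
    """Walk the descriptor list, as the co-processor would, and gather message words."""
    words = []
    for kind, *args in descriptors:
        if kind == "imm":                 # immediate value
            words.append(args[0])
        elif kind == "reg":               # reference to a processor register
            words.append(registers[args[0]])
        elif kind == "mem":               # reference to a memory block: (address, length)
            addr, length = args
            words.extend(memory[addr:addr + length])
        elif kind == "end":               # END marker closes the message
            break
    return words


registers = [0] * 32
registers[5] = 99
memory = list(range(100, 116))
descriptors = [("imm", 7), ("reg", 5), ("mem", 4, 3), ("end",)]
print(compose_message(descriptors, registers, memory))  # [7, 99, 104, 105, 106]
```

Because the co-processor, not the processor's software, assembles and injects the message, a stalled program cannot leave a half-sent message occupying the network.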

Receiving Messages

A co-processor or a dedicated thread is triggered upon reception of an incoming message

It unpacks the message and stores it in local memory

It informs the receiving task via an interrupt or a status register update

Shared Memory Interfaces

The interconnection network is used to transmit memory read/write transactions between processors and memories

We will further discuss
  Processor-Network Interface
  Memory-Network Interface

Processor-Network Interface

Requests are stored in the request register

Requests are tagged so that each answer can be associated with its request

In case of a cache miss, requests are stored in an MSHR (miss status holding register)

Processor-Network Interface

An uncacheable read request would result in a pending read

After forming and transmitting the message, status changes to read requested

When the network returns the message, status changes to read complete

Completed MSHRs are forwarded to the reply register, status changes to idle
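The status changes of an MSHR for an uncacheable read form a small state machine. The sketch below uses the four statuses named above; the event names and the `step` helper are hypothetical labels introduced only for this illustration.

```python
from enum import Enum, auto


class MSHRStatus(Enum):
    IDLE = auto()
    READ_PENDING = auto()
    READ_REQUESTED = auto()
    READ_COMPLETE = auto()


# Event names are hypothetical labels for the steps described above.
TRANSITIONS = {
    (MSHRStatus.IDLE, "uncacheable_read"): MSHRStatus.READ_PENDING,
    (MSHRStatus.READ_PENDING, "message_sent"): MSHRStatus.READ_REQUESTED,
    (MSHRStatus.READ_REQUESTED, "reply_received"): MSHRStatus.READ_COMPLETE,
    (MSHRStatus.READ_COMPLETE, "forwarded_to_reply_register"): MSHRStatus.IDLE,
}


def step(status, event):
    """Return the next status, or raise if the event is not valid in this status."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in {status.name}") from None


status = MSHRStatus.IDLE
for event in ("uncacheable_read", "message_sent",
              "reply_received", "forwarded_to_reply_register"):
    status = step(status, event)
    print(event, "->", status.name)
```

A table of allowed transitions makes it easy to reject out-of-order events, for example a reply arriving for an entry that never issued a request.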

Processor-Network Interface

Cache coherence protocols change the operation of the processor-network interface

1. Complete cache lines are loaded into the cache

2. Protocol requires a larger vocabulary
     Exclusive read request
     Invalidation and updating of cache lines

3. Cache coherence protocol requires the interface to send messages and update state in response to received messages

Memory-Network Interface

The interface receives memory request messages and sends replies

Messages received from the network are stored in the TSHR (transaction status holding register)

Memory-Network Interface

The request queue is used to hold request messages when all TSHRs are busy

A TSHR tracks messages in the same way as an MSHR

Bank Control and Message Transmit Unit monitors changes in the TSHR

Memory-Network Interface

A read request initializes a TSHR with status read pending

Subsequent memory access changes status to bank activated

Two cycles before first word is returned from memory bank, status is changed to read complete

Message transmit unit formats message and injects it into the network and the TSHR entry is marked idle

Requests can be handled out of order

Memory-Network Interface

Cache coherence protocols can be implemented with this structure; however, the TSHR must be extended

Packet Admission/Ejection (Line-Fabric) Interface

Network has higher bandwidth than the input and output lines, but links may be blocked due to congestion.

Packets aiming for different destinations come from the same input port.

Queues are needed to store packets that
  cannot enter the network because of congestion in the network
  cannot enter the terminal

Packet Admission/Ejection Interface

Why parallel queues rather than a single FIFO?
  If there are traffic classes with different priorities, there should be a queue for every traffic class
    High-priority traffic is not blocked by low-priority traffic
  Alleviate head-of-line blocking
  Implement an admission/ejection control policy based on priority, rate, etc.
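The benefit of parallel queues can be seen in a toy admission controller. Everything in this sketch is assumed for illustration: the class name, the convention that a lower class number means higher priority, and the `can_enter` callback that stands in for whatever congestion information the interface has.

```python
from collections import deque


class AdmissionQueues:
    """One FIFO per traffic class; lower class number means higher priority (assumed)."""

    def __init__(self, num_classes):
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, traffic_class, packet):
        self.queues[traffic_class].append(packet)

    def dequeue(self, can_enter):
        """Return the first admissible head packet, scanning classes by priority."""
        for queue in self.queues:
            if queue and can_enter(queue[0]):
                return queue.popleft()
        return None


q = AdmissionQueues(num_classes=2)
q.enqueue(1, {"id": "low-1", "dest": "congested"})
q.enqueue(0, {"id": "high-1", "dest": "free"})
q.enqueue(1, {"id": "low-2", "dest": "free"})

sent = q.dequeue(lambda p: p["dest"] != "congested")
print(sent["id"])  # high-1
```

In a single FIFO, `high-1` would wait behind the blocked `low-1`. Note that `low-2` is still stuck behind `low-1` inside its own class; removing that remaining head-of-line blocking would need further queues, for example one per destination.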

Summary

Network interfaces bridge processor and processor, processor and memory
  Message passing interfaces
  Shared memory interfaces, complicated by cache coherency

Packet admission and ejection interfaces at the network boundary are also important to use the network better (higher throughput, lower latency).