Special Course in Computer Science: Local Networks (users.abo.fi/lpetre/localnet12/lecture9.pdf)

Lecture 9 9.5.2012 Special Course in Computer Science: Local Networks


Page 1: Special Course in Computer Science: Local Networksusers.abo.fi/lpetre/localnet12/lecture9.pdfNIU (network interface units) make the packets from intercore data Routers and switches

Lecture 9 9.5.2012

Special Course in Computer Science: Local Networks


Roadmap of the Course
  So far
    Basic telecom concepts
    General study of LANs
    Local Networks
      Ethernet
      Token bus
      Token ring
      ATM LAN
      Wi-Fi
    LAN performance, connecting LANs, LAN management
  Today
    Networks-on-Chip (NoC)
  Next
    Åbo Akademi’s local network (Friday 11.5) – Jan Wennström
    Sensor networks (Wednesday 16.5 or Friday 18.5) – Maryam Kamali
    Efficiency in ÅA’s network, Bluetooth, Personal networks, etc. (remaining date)


Network-on-Chip (NoC)
  Why NoC
  OCP Standard
  Fundamentals of NoC
  Emerging NoC Paradigms


Why
  Microprocessor
    Critical for computer technology
    Connects computational engine to a memory system
  Multiple technologies
    Dedicated functional units added
    ASICs (application specific integrated circuits)
    SoC born (System-on-Chip)
  Became more complex
    On-chip communication difficult
    Hard for buses to maintain performance at reasonable cost
    Design productivity affected too


From simpler to more complex chips


On-chip communication


Problems to solve
  Cores that perform different functions work at different clock frequencies
    Graphics processing
    Digital signalling
  Communication over larger area => latency
  Processors shrink but contain more transistors and cores => interconnection more complex
  Power
  Performance


Some driving forces
  Scalability
    Bus technology not very scalable with respect to
      High performance
      Effective on-chip communication
      Reasonable cost
  Segmenting buses
    Bus: long wire, globally clocked, stretching across the whole chip
    Bus segments: separate groups of locally clocked, bundled wires, bridged with each other at each end
    Difficult for complex SoCs and multicore chips
    Need manual design for different segments suiting specific chip architectures (core placements and configurations)
    Expensive and time-consuming


Some driving forces
  Technical issues
    Wires get closer to one another => physical problems leading to bad bus performance
    Parasitic capacitance
      Unwanted storage of an electrical charge between close wires
  Design productivity gap
    Increasingly smaller transistors => more fit on chips


NoC in a nutshell
  First NoC research from Philips, 2001
  Circuit-switching
    Could benefit e.g. high-performance computing
    Dedicated circuits => increased bandwidth efficiency
    No routing flexibility => network congestion is a problem
  Packet-switching
    NIU (network interface units) make the packets from intercore data
    Routers and switches connect cores via copper wiring
    Routing algorithms and tables
    Routing: simple logic to save host-chip performance, power
    Host-chip limitations => small, energy-efficient, fast routers needed


NoC illustration


Networking techniques
  On-chip communication should be
    Very fast, latency-free, flexible => simple networking => trade-offs (fewer capabilities)
  TCP/IP does not work
    Too much latency
  Open Core Protocol International Partnership
    OCP standard for on-chip communications
  Proprietary approaches
    NoC Transaction and Transport protocol
      Universal data-translation protocol for intercore communication
      From Arteris chip designer


OCP standard for on-chip communications


OCP for on-chip communication
  Standard => no need to repeatedly define, verify, document, support proprietary interface protocols
  Reusable IP cores
  Clearly delineated design boundaries
    Simplifies system verification and testing
    Boundary can be observed, controlled, validated
  Optimized die area
  Any on-chip interconnect can be interfaced to OCP:
    Dedicated p2p communication (e.g. pipelined signal processing applications – video encoding)
    Simple slave-only applications (e.g. slow peripheral interfaces)
    High-performance, latency-sensitive, multi-threaded applications


OCP Characteristics
  IP core
    Simple peripheral core or high-performance microprocessor or on-chip communication subsystem (wrapped on-chip bus)
  OCP defines a point-to-point interface between 2 communicating entities
    One entity: master
      Can present commands, is controlling
    One entity: slave
      Responds to commands: accepts data or presents data
  For p2p communication: 2 OCP instances


Wrapped Bus and OCP instances


Flexibility of OCP
  Several useful models of how existing cores communicate with each other
    Pipelining
      Improves bandwidth and latency characteristics
    Multiple-cycle access models
      Signals are held static for several clock cycles to simplify timing analysis and reduce implementation area
    Synchronous handshake signals
      Master and slave control when signals are allowed to change
  Highly configurable interface
    Data flow, control, verification and test signals


Some fundamental OCP concepts: Commands
  Commands
    Read, Write, WriteNonPost, Broadcast, ReadExclusive, ReadLinked, WriteConditional
  WriteNonPost
    Explicitly instructs slave not to post a write
  Broadcast
    Master indicates it attempts to write to several/all remote targets connected to the other side of a slave
  Synchronization between system initiators
    ReadExclusive: paired with Write/WriteNonPost, blocking semantics
    ReadLinked: paired with WriteConditional, non-blocking semantics


Some fundamental OCP concepts: Address/Data
  OCP address: 1 byte
  Many IP cores have data field widths much greater than 1 byte
  Configurable data width
    Chosen data field width: word size of the OCP
    A word is the natural transfer unit of the block
  OCP word sizes: power-of-two and non-power-of-two (needed for 12-bit DSP core)


Some fundamental OCP concepts: Pipelining, Response, Burst
  Pipelining
    Of transfers
    Returning read data can be delayed after presentation of respective pipelined request
  Response
    Slave can accept command request from master on one cycle and respond in a later cycle
      Pipelining
    Responses for Write commands or completing them immediately without explicit responses
  Burst
    Set of transfers linked together into a transaction having defined address sequence and number of transfers
    Imprecise, precise, single


Some fundamental OCP concepts: In-band Information, Tags
  Core-specific information can be passed in-band together with other information being exchanged for requests and responses, read and write data
    Example: pass cacheable information
  Tags
    Control the ordering of responses
    Link the response back to the original request
    Without tags
      Slave must return responses in the order the requests were issued by the master
      Writes must be committed in order
    With tags
      Responses can be returned out-of-order and write data can be committed out-of-order with respect to requests as long as the transactions target different addresses
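
The role of tags in linking an out-of-order response back to its request can be illustrated with a small Python sketch. It is not the OCP signal-level protocol: the function name `match_responses` and the tuple layout of requests and responses are assumptions made for illustration only.

```python
def match_responses(requests, responses):
    """Pair each response with the address of the request that carried the same tag.

    requests:  list of (tag, address) in issue order (hypothetical layout)
    responses: list of (tag, data) in arrival order, possibly out of order
    """
    pending = dict(requests)          # tag -> address of the outstanding request
    matched = []
    for tag, data in responses:
        address = pending.pop(tag)    # the tag identifies the original request
        matched.append((address, data))
    return matched


requests = [(0, 0x10), (1, 0x20), (2, 0x30)]
responses = [(2, "c"), (0, "a"), (1, "b")]   # slave answers out of order
print(match_responses(requests, responses))
```

Without tags the master could only rely on arrival order, which is why an untagged slave has to respond in request order.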


Some fundamental OCP concepts: Threads
  Support for concurrency and out-of-order processing of transfers
  Transactions within different threads
    No ordering requirements
    Independent flow control from one another
  Within a single thread of data flow
    OCP transfers remain ordered unless tags are in use
  Threads and tags are hierarchical
    Each thread has its own flow control
    Ordering within a thread either follows the request order strictly or is governed by tags


Some fundamental OCP concepts: Sideband signalling
  Moving data between cores: central
  Other communication types also important
    Different types of control signalling
    Dedicated point-to-point data communication
    Notification of errors unrelated to address/data transfers
  Reporting errors
    The error response code in the response field describes errors resulting from OCP transfers that provide responses
    Write-type commands without responses cannot use the in-band reporting mechanism
    Out-of-band error signals report more generic sideband errors, including those associated with posted write commands


Fundamentals of NoC


Introduction
  Network-on-chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology
    “routes packets, not wires”
  NoCs use packets to route data from the source to the destination PE via a network fabric that consists of
    switches (routers)
    interconnection links (wires)


Introduction
  NoCs are an attempt to scale down the concepts of large-scale networks, and apply them to the embedded system-on-chip (SoC) domain
  NoC properties
    Regular geometry that is scalable
    Flexible QoS guarantees
    Higher bandwidth
    Reusable components
      Buffers, arbiters, routers, protocol stack
    No long global wires (or global clock tree)
      No problematic global synchronization
      GALS: globally asynchronous, locally synchronous design
    Reliable and predictable electrical and physical properties


Introduction ISO/OSI network protocol stack model


NoC Topology
  Direct topologies
    each node has a direct point-to-point link to a subset of other nodes in the system, called neighboring nodes
    nodes consist of computational blocks and/or memories, as well as a NI block that acts as a router
    e.g. Nostrum, SOCBUS, Proteo, Octagon
    as the number of nodes in the system increases, the total available communication bandwidth also increases
    fundamental trade-off is between connectivity and cost


NoC Topology
  Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space
    routing for such networks is fairly simple
    e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon
  2D mesh is the most popular topology
    all links have the same length
      eases physical design
    area grows linearly with the number of nodes
    must be designed in such a way as to avoid traffic accumulating in the center of the mesh


NoC Topology
  Torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension
  k-ary 1-cube (1-D torus) is essentially a ring network with k nodes
    limited scalability as performance decreases when more nodes are added
  k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular mesh
    except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels
    long end-around connections can, however, lead to excessive delays
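
The difference between a mesh and a torus (k-ary n-cube) comes down to the wrap-around links, which a short Python sketch can make concrete. The function names `neighbors` and `hop_distance` and the coordinate-tuple representation are assumptions for illustration; physical wire length, which is what makes the wrap-around links slow, is not modelled.

```python
def neighbors(node, k, wrap):
    """Direct neighbors of `node` (a tuple of n coordinates, each in 0..k-1).

    wrap=False models a mesh, wrap=True a torus with wrap-around channels.
    """
    result = []
    for dim, coord in enumerate(node):
        for step in (-1, 1):
            c = coord + step
            if wrap:
                c %= k                      # edge nodes connect to the opposite edge
            elif not 0 <= c < k:
                continue                    # mesh: no link beyond the edge
            candidate = node[:dim] + (c,) + node[dim + 1:]
            if candidate != node and candidate not in result:
                result.append(candidate)
    return result


def hop_distance(a, b, k, wrap):
    """Minimal number of hops between nodes a and b."""
    total = 0
    for x, y in zip(a, b):
        d = abs(x - y)
        total += min(d, k - d) if wrap else d
    return total


corner = (0, 0)
print(neighbors(corner, 4, wrap=False))           # mesh corner: 2 neighbors
print(neighbors(corner, 4, wrap=True))            # torus: always 2n neighbors for k > 2
print(hop_distance((0, 0), (3, 3), 4, wrap=False),
      hop_distance((0, 0), (3, 3), 4, wrap=True))
```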


NoC Topology
  Folded torus topology overcomes the long link limitation of a 2-D torus
    links have the same size
  Meshes and tori can be extended by adding bypass links
    to increase performance at the cost of higher area


NoC Topology
  Octagon topology is another example of a direct network
    messages being sent between any 2 nodes require at most two hops
    more octagons can be tiled together to accommodate larger designs by using one of the nodes as a bridge node
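
The two-hop bound can be checked with a small breadth-first search. The wiring used here is an assumption, since the slide does not list the links: eight nodes on a ring, each also linked to the node directly opposite, which is how the Octagon topology is commonly described. The function names are illustrative.

```python
from collections import deque


def octagon_links():
    """Adjacency sets for the assumed wiring: ring of 8 nodes plus 4 cross links."""
    links = {node: set() for node in range(8)}
    for node in range(8):
        for other in ((node + 1) % 8, (node + 4) % 8):
            links[node].add(other)
            links[other].add(node)
    return links


def hops(links, src, dst):
    """Shortest hop count from src to dst by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nb in links[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("destination unreachable")


links = octagon_links()
worst = max(hops(links, s, d) for s in range(8) for d in range(8))
print("links:", sum(len(v) for v in links.values()) // 2, "worst-case hops:", worst)
```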


NoC Topology
  Indirect topologies
    each node is connected to an external switch, and switches have point-to-point links to other switches
    switches do not perform any information processing, and correspondingly nodes do not perform any packet switching
    e.g. SPIN, crossbar topologies
  Fat tree topology
    nodes are connected only to the leaves of the tree
    more links near root, where bandwidth requirements are higher


NoC Topology
  k-ary n-fly butterfly network
    blocking multi-stage network – packets may be temporarily blocked or dropped in the network if contention occurs
    k^n nodes, and n stages of k^(n-1) k x k crossbars
    e.g. 2-ary 3-fly butterfly network
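
The size formula can be evaluated for the 2-ary 3-fly example with a few lines of Python. The second function sketches destination-tag routing, a well-known scheme for butterflies that the slide does not mention: each stage selects its output port from one base-k digit of the destination. Using the most significant digit first is an assumption that depends on how the stages are wired; all names are illustrative.

```python
def butterfly_size(k, n):
    """Node and switch counts of a k-ary n-fly butterfly."""
    switches_per_stage = k ** (n - 1)
    return {
        "nodes": k ** n,
        "stages": n,
        "switches_per_stage": switches_per_stage,
        "total_switches": n * switches_per_stage,
    }


def destination_tag_ports(dest, k, n):
    """Output port chosen at each stage: the base-k digits of dest, most significant first."""
    digits = []
    for _ in range(n):
        digits.append(dest % k)
        dest //= k
    return digits[::-1]


print(butterfly_size(2, 3))              # 8 nodes, 3 stages of 4 switches
print(destination_tag_ports(6, 2, 3))    # binary 110
```

Because every source reaches a given destination through exactly one path, two packets that need the same internal link contend, which is why the network is blocking.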


NoC Topology
  (m, n, r) symmetric Clos network
    three-stage network in which each stage is made up of a number of crossbar switches
    m is the number of middle-stage switches
    n is the number of input/output nodes on each input/output switch
    r is the number of input and output switches
    e.g. (3, 3, 4) Clos network
    non-blocking network
    expensive (several full crossbars)
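
The cost and blocking behaviour of an (m, n, r) Clos network can be estimated with a short Python sketch. The two conditions used are the classical Clos results, which the slide does not state: strict-sense non-blocking requires m >= 2n - 1, and rearrangeably non-blocking requires m >= n. The crosspoint count assumes plain crossbars in every stage; the function name is illustrative.

```python
def clos_properties(m, n, r):
    """Size and blocking class of a symmetric three-stage (m, n, r) Clos network."""
    # r input switches (n x m), m middle switches (r x r), r output switches (m x n)
    crosspoints = 2 * r * n * m + m * r * r
    terminals = n * r
    return {
        "terminals": terminals,
        "crosspoints": crosspoints,
        "single_crossbar_crosspoints": terminals * terminals,
        "strictly_non_blocking": m >= 2 * n - 1,
        "rearrangeably_non_blocking": m >= n,
    }


print(clos_properties(3, 3, 4))
```

By these criteria the (3, 3, 4) example is non-blocking in the rearrangeable sense (m = n = 3), while a strict-sense non-blocking network with n = 3 would need m >= 5.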


NoC Topology
  Benes network
    rearrangeable network in which paths may have to be rearranged to provide a connection, requiring an appropriate controller
    Clos topology composed of 2 x 2 switches
    e.g. (2, 2, 4) rearrangeable Clos network constructed using two (2, 2, 2) Clos networks with 4 x 4 middle switches


NoC Topology
  Irregular or ad hoc network topologies
    customized for an application
    usually a mix of shared bus, direct, and indirect network topologies
    e.g. reduced mesh, cluster-based hybrid topology


Switching strategies
  Determine how data flows through routers in the network
  Define granularity of data transfer and applied switching technique
    phit is a unit of data that is transferred on a link in a single cycle


Switching strategies
  Two main modes of transporting flits in a NoC are circuit switching and packet switching
  Circuit switching
    physical path between the source and the destination is reserved prior to the transmission of data
    message header flit traverses the network from the source to the destination, reserving links along the way
    Advantage: low latency transfers, once path is reserved
    Disadvantage: pure circuit switching does not scale well with NoC size
      several links are occupied for the duration of the transmitted data, even when no data is being transmitted, for instance in the setup and tear down phases


Switching strategies
  Virtual circuit switching
    creates virtual circuits that are multiplexed on links
    number of virtual links (or virtual channels (VCs)) that can be supported by a physical link depends on buffers allocated to link
    possible to allocate either one buffer per virtual link or one buffer per physical link
  Allocating one buffer per virtual link
    depending on how virtual circuits are spatially distributed in the NoC, routers can have a different number of buffers
    can be expensive due to the large number of shared buffers
    multiplexing virtual circuits on a single link also requires scheduling at each router and link (end-to-end schedule)
    conflicts between different schedules can make it difficult to achieve bandwidth and latency guarantees


Switching strategies
  Allocating one buffer per physical link
    virtual circuits are time multiplexed with a single buffer per link
    uses time division multiplexing (TDM) to statically schedule the usage of links among virtual circuits
    flits are typically buffered at the NIs and sent into the NoC according to the TDM schedule
    global scheduling with TDM makes it easier to achieve end-to-end bandwidth and latency guarantees
    less expensive router implementation, with fewer buffers
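
A static TDM schedule can be pictured as one slot table per link. The Python sketch below is a simplified model, not a scheme taken from the slides: it assumes a flit injected in slot s crosses the i-th link of its path in slot (s + i) mod S, so a circuit is admitted only if all of those slots are still free. The function name and the link labels are hypothetical.

```python
def reserve_circuit(tables, circuit, path, num_slots):
    """Reserve one injection slot for `circuit` along `path` (a list of link names).

    tables maps link name -> {slot index: circuit name}.
    Returns the injection slot, or None if no contention-free slot exists.
    """
    for start in range(num_slots):
        needed = [(link, (start + hop) % num_slots) for hop, link in enumerate(path)]
        if all(slot not in tables.get(link, {}) for link, slot in needed):
            for link, slot in needed:
                tables.setdefault(link, {})[slot] = circuit
            return start
    return None


tables = {}
print(reserve_circuit(tables, "A", ["L1", "L2"], num_slots=2))   # admitted
print(reserve_circuit(tables, "B", ["L2", "L3"], num_slots=2))   # admitted
print(reserve_circuit(tables, "C", ["L1", "L2"], num_slots=2))   # no free slot left
print(tables)
```

Since every flit's slot on every link is fixed in advance, flits of different circuits never compete for a link, which is what allows a single buffer per link and makes end-to-end guarantees easier to give.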


Switching strategies
  Packet switching
    packets are transmitted from source and make their way independently to receiver
      possibly along different routes and with different delays
    zero start-up time, followed by a variable delay due to contention in routers along packet path
    QoS guarantees are harder to make in packet switching than in circuit switching
    three main packet switching scheme variants
  SAF (store-and-forward) switching
    packet is sent from one router to the next only if the receiving router has buffer space for entire packet
    buffer size in the router is at least equal to the size of a packet
    Disadvantage: excessive buffer requirements


Switching strategies
  VCT (virtual cut-through) switching
    reduces router latency over SAF switching by forwarding first flit of a packet as soon as space for the entire packet is available in the next router
    if no space is available in receiving buffer, no flits are sent, and the entire packet is buffered
    same buffering requirements as SAF switching
  WH (wormhole) switching
    flit from a packet is forwarded to receiving router if space exists for that flit
    parts of the packet can be distributed among two or more routers
    buffer requirements are reduced to one flit, instead of an entire packet
    more susceptible to deadlocks due to usage dependencies between links
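
A rough feel for the three packet switching variants comes from a textbook-style zero-load latency model, sketched below in Python. The model is an assumption and not taken from the slides: one flit crosses a link per cycle, every router adds a fixed `router_delay`, and contention is ignored. Under those assumptions SAF pays the full packet time at every hop, while VCT and WH pay it only once.

```python
def zero_load_latency(scheme, hops, flits, router_delay=1):
    """Cycles for a packet of `flits` flits to cross `hops` routers with no contention."""
    if scheme == "SAF":
        # whole packet is received before it is forwarded, at every hop
        return hops * (router_delay + flits)
    if scheme in ("VCT", "WH"):
        # header is forwarded as soon as it arrives; the body follows in a pipeline
        return hops * router_delay + flits
    raise ValueError(f"unknown scheme: {scheme}")


def min_buffer_flits(scheme, flits):
    """Minimum per-router buffer space, in flits, as stated on the slides."""
    return 1 if scheme == "WH" else flits


for scheme in ("SAF", "VCT", "WH"):
    print(scheme, zero_load_latency(scheme, hops=4, flits=8), min_buffer_flits(scheme, 8))
```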


Routing algorithms
  Responsible for correctly and efficiently routing packets or circuits from the source to the destination
  Choice of a routing algorithm depends on trade-offs between several potentially conflicting metrics
    minimizing power required for routing
    minimizing logic and routing tables to achieve a lower area footprint
    increasing performance by reducing delay and maximizing traffic utilization of the network
    improving robustness to better adapt to changing traffic needs
  Routing schemes can be classified into several categories
    static or dynamic routing
    distributed or source routing
    minimal or non-minimal routing


Routing algorithms
  Static and dynamic routing
    static routing: fixed paths are used to transfer data between a particular source and destination
      does not take into account current state of the network
    advantages of static routing:
      easy to implement, since very little additional router logic is required
      in-order packet delivery if single path is used
    dynamic routing: routing decisions are made according to the current state of the network
      considering factors such as availability and load on links
      path between source and destination may change over time as traffic conditions and requirements of the application change
      more resources needed to monitor state of the network and dynamically change routing paths
      able to better distribute traffic in a network


Routing algorithms
  Distributed and source routing
    static and dynamic routing schemes can be further classified depending on where the routing information is stored, and where routing decisions are made
    distributed routing: each packet carries the destination address
      e.g., XY co-ordinates or number identifying destination node/router
      routing decisions are made in each router by looking up the destination addresses in a routing table or by executing a hardware function
    source routing: packet carries routing information
      pre-computed routing tables are stored at a node’s NI
      routing information is looked up at the source NI and added to the header of the packet (increasing packet size)
      when a packet arrives at a router, the routing information is extracted from the routing field in the packet header
      does not require a destination address in a packet, any intermediate routing tables, or functions needed to calculate the route
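
The contrast between the two styles can be sketched in Python using dimension-order (XY) routing on a 2D mesh, a common static scheme chosen here as an assumption; the slides only mention XY co-ordinates as an address format. `xy_next_port` plays the role of the per-router function in distributed routing, and `source_route` pre-computes the port list that a source NI would place in the packet header. Port names and function names are illustrative.

```python
MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_next_port(current, dest):
    """Distributed routing: decide the output port at one router from the destination address."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:                       # first correct the X co-ordinate
        return "E" if dx > cx else "W"
    if cy != dy:                       # then the Y co-ordinate
        return "N" if dy > cy else "S"
    return "LOCAL"                     # arrived: deliver to the attached core


def source_route(src, dest):
    """Source routing: the full list of output ports, computed once at the source NI."""
    route = []
    node = src
    while True:
        port = xy_next_port(node, dest)
        if port == "LOCAL":
            return route
        route.append(port)
        step = MOVES[port]
        node = (node[0] + step[0], node[1] + step[1])


print(xy_next_port((3, 0), (1, 2)))    # one hop decided locally
print(source_route((0, 0), (2, 1)))    # whole path carried in the header
```

The header produced by `source_route` grows with the path length, which is the packet-size overhead the slide refers to.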


Routing algorithms
  Minimal and non-minimal routing
    minimal routing: length of the routing path from the source to the destination is the shortest possible length between the two nodes
      e.g. in a mesh NoC topology (where each node can be identified by its XY co-ordinates in the grid) if source node is at (0, 0) and destination node is at (i, j), then the minimal path length is |i| + |j|
      source does not start sending a packet if minimal path is not available
    non-minimal routing: can use longer paths if a minimal path is not available
      by allowing non-minimal paths, the number of alternative paths is increased, which can be useful for avoiding congestion
      disadvantage: overhead of additional power consumption
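
The |i| + |j| example generalizes to any pair of mesh nodes, and counting the minimal paths shows how much choice a minimal router already has. In the Python sketch below the function names are illustrative, and the path count uses the standard binomial formula for monotone grid paths, which is not given on the slide.

```python
from math import comb


def minimal_hops(src, dest):
    """Minimal path length between two nodes of a 2D mesh (Manhattan distance)."""
    return abs(dest[0] - src[0]) + abs(dest[1] - src[1])


def minimal_path_count(src, dest):
    """Number of distinct minimal paths: choose which of the hops go in the X direction."""
    dx = abs(dest[0] - src[0])
    dy = abs(dest[1] - src[1])
    return comb(dx + dy, dx)


print(minimal_hops((0, 0), (3, 2)), minimal_path_count((0, 0), (3, 2)))
```

Nodes that share a row or column have exactly one minimal path, so a congested link on it can only be avoided by non-minimal routing.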


Routing algorithms
  Routing algorithm must ensure freedom from deadlocks
    common in WH switching
    e.g. cyclic dependency
    freedom from deadlocks can be ensured by allocating additional hardware resources or imposing restrictions on the routing
    usually dependency graph of the shared network resources is built and analyzed either statically or dynamically
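
A static analysis of the dependency graph amounts to a cycle check, sketched below in Python with a depth-first search. The graph representation (channel name mapped to the channels a packet holding it may request next) and the channel names are assumptions for illustration.

```python
def has_cycle(graph):
    """Return True if the directed dependency graph contains a cycle.

    graph maps each channel to the channels that a packet holding it may wait for.
    """
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:          # back edge: a cyclic dependency
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(node, WHITE) == WHITE and visit(node) for node in list(graph))


# Four channels around a ring, each waiting for the next one: potential deadlock.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# A routing restriction that forbids the c3 -> c0 turn breaks the cycle.
restricted = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
print(has_cycle(ring), has_cycle(restricted))
```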


Routing algorithms
  Routing algorithm must ensure freedom from livelocks
    livelocks are similar to deadlocks, except that states of the resources involved constantly change with regard to one another, without making any progress
    occur especially when dynamic (adaptive) routing is used
    e.g. can occur in deflective “hot potato” routing if a packet is bounced around over and over again between routers and never reaches its destination
    livelocks can be avoided with simple priority rules
  Routing algorithm must ensure freedom from starvation
    under scenarios where certain packets are prioritized during routing, some of the low priority packets never reach their intended destination
    can be avoided by using a fair routing algorithm, or reserving some bandwidth for low priority data packets


Flow control schemes
  Goal of flow control is to allocate network resources for packets traversing a NoC
    can also be viewed as a problem of resolving contention during packet traversal
  At the data link-layer level, when transmission errors occur, recovery from the error depends on the support provided by the flow control mechanism
    e.g. if a corrupted packet needs to be retransmitted, flow of packets from the sender must be stopped, and request signaling must be performed to reallocate buffer and bandwidth resources
  Most flow control techniques can manage link congestion
  But not all schemes can (by themselves) reallocate all the resources required for retransmission when errors occur
    either error correction or a scheme to handle reliable transfers must be implemented at a higher layer


ACK/NACK flow control scheme
  when flits are sent on a link, a local copy is kept in a buffer by sender
  when ACK is received by sender, it deletes copy of flit from its local buffer
  when NACK is received, sender rewinds its output queue and starts resending flits, starting from the corrupted one
  implemented either end-to-end or switch-to-switch
  sender needs to have a buffer of size 2N + k
    N is number of buffers encountered between source and destination
    k depends on latency of logic at the sender and receiver
  overall a minimum of 3N + k buffers are required
  fault handling support comes at cost of greater power, area overhead
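
The sender side of the scheme can be sketched as a small Python class. It is a behavioural model under stated assumptions: flits are identified by a sequence number (the slide does not say how the corrupted flit is identified), ACKs are cumulative, and the link itself is not simulated. The class and method names are hypothetical.

```python
class AckNackSender:
    """Keeps copies of sent flits until they are ACKed; rewinds on NACK."""

    def __init__(self, flits):
        self.flits = list(flits)
        self.next_to_send = 0      # sequence number of the next flit to put on the link
        self.oldest_unacked = 0    # flits below this number are ACKed and freed

    def send(self):
        """Send the next flit, or return None if everything has been sent."""
        if self.next_to_send >= len(self.flits):
            return None
        seq = self.next_to_send
        self.next_to_send += 1
        return seq, self.flits[seq]

    def on_ack(self, seq):
        """ACK for flit `seq`: its local copy (and earlier ones) can be deleted."""
        self.oldest_unacked = max(self.oldest_unacked, seq + 1)

    def on_nack(self, seq):
        """NACK for flit `seq`: rewind and resend starting from the corrupted flit."""
        self.next_to_send = seq

    def buffered(self):
        """Copies currently held: sent but not yet ACKed."""
        return self.flits[self.oldest_unacked:self.next_to_send]


sender = AckNackSender("abcd")
for _ in range(3):
    sender.send()                  # a, b, c are in flight
sender.on_ack(0)                   # a arrived intact
print(sender.buffered())           # copies of b and c are still held
sender.on_nack(1)                  # b was corrupted
print(sender.send())               # b is sent again, c will follow
```

The copies held in `buffered()` are what the 2N + k sender buffer on the slide has to accommodate.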


Flow control schemes
Network and Transport-Layer Flow Control
  Flow Control without Resource Reservation
    Technique #1: drop packets when receiver NI full
      improves congestion in short term but increases it in long term
    Technique #2: return packets that do not fit into receiver buffers to sender
      to avoid deadlock, rejected packets must be accepted by sender
    Technique #3: deflection routing
      when packet cannot be accepted at receiver, it is sent back into network
      packet does not go back to sender, but keeps hopping from router to router till it is accepted at receiver
  Flow Control with Resource Reservation
    credit-based flow control with resource reservation
    credit counter at sender NI tracks free space available in receiver NI buffers
    credit packets can piggyback on response packets
    end-to-end or link-to-link
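The credit counter mentioned above can be illustrated with a small sketch. The class and method names (`CreditSender`, `send`, `on_credit`) are hypothetical and chosen only for this example; the model assumes one credit corresponds to one free buffer slot in the receiver NI.

```python
class CreditSender:
    """Illustrative credit counter kept at the sender NI (hypothetical model)."""

    def __init__(self, receiver_buffer_slots):
        # Start with one credit per free slot in the receiver NI buffer.
        self.credits = receiver_buffer_slots

    def can_send(self):
        """A packet may only be injected while at least one credit is left."""
        return self.credits > 0

    def send(self):
        """Consume one credit for each packet sent towards the receiver."""
        if self.credits == 0:
            raise RuntimeError("no credits left: receiver buffer may be full")
        self.credits -= 1

    def on_credit(self, count=1):
        """Credits returned by the receiver, e.g. piggybacked on a response packet."""
        self.credits += count
```

Because the sender never transmits without a credit, the receiver buffer cannot overflow, which is the point of reserving resources in advance.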


Clocking schemes
Fully synchronous
  single global clock is distributed to synchronize entire chip
  hard to achieve in practice, due to process variations and clock skew
Mesochronous
  local clocks are derived from a global clock
  not sensitive to clock skew
  phase between clock signals in different modules may differ
    deterministic for regular topologies (e.g. mesh)
    non-deterministic for irregular topologies
  synchronizers needed between clock domains
Plesiochronous
  clock signals are produced locally
Asynchronous
  clocks do not have to be present at all


Quality of Service (QoS)
QoS refers to the level of commitment for packet delivery
  refers to bounds on performance (bandwidth, delay, and jitter)
Three basic categories
  best effort (BE)
    only correctness and completion of communication is guaranteed
    usually packet switched
    worst case times cannot be guaranteed
  guaranteed service (GS)
    makes a tangible guarantee on performance, in addition to basic guarantees of correctness and completion for communication
    usually (virtual) circuit switched
  differentiated service
    prioritizes communication according to different categories
    NoC switches employ priority based scheduling and allocation policies
    cannot provide strong guarantees


NoC Architectures examples


Æthereal
Developed by Philips
Synchronous indirect network
WH switching
Contention-free source routing based on TDM
GT as well as BE QoS
GT slots can be allocated statically at initialization phase, or dynamically at runtime
BE traffic makes use of non-reserved slots, and any unused reserved slots
  also used to program GT slots of the routers
Link-to-link credit-based flow control scheme between BE buffers
  to avoid loss of flits due to buffer overflow
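The way GT and BE traffic share TDM slots on one link can be sketched as below. This is a simplified illustration under stated assumptions, not the actual Æthereal router logic: the slot table is a plain list for a single output link, and the names `pick_traffic`, `gt_ready`, and `be_ready` are hypothetical.

```python
def pick_traffic(slot_table, cycle, gt_ready, be_ready):
    """Decide which traffic class may use the link in the given cycle.

    slot_table: list where entry i is the GT connection owning slot i, or None.
    gt_ready:   set of GT connections that have a flit to send right now.
    be_ready:   True if some best-effort flit is waiting.
    """
    owner = slot_table[cycle % len(slot_table)]
    if owner is not None and owner in gt_ready:
        return ("GT", owner)   # reserved slot used by its owner
    if be_ready:
        return ("BE", None)    # non-reserved or unused reserved slot
    return None                # link stays idle this cycle


# Example: slots 0 and 2 are reserved, slots 1 and 3 are free for BE traffic.
table = ["c0", None, "c1", None]
```

With this table, connection `c0` always gets slot 0 when it has data, while BE traffic picks up slot 2 whenever `c1` has nothing to send.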


HERMES
Developed at the Faculdade de Informática PUCRS, Brazil
Direct network
2-D mesh topology
WH switching with minimal XY routing algorithm
8 bit flit size; first 2 flits of packet contain header
Header has target address and number of flits in the packet
Parameterizable input queuing
  to reduce the number of switches affected by a blocked packet
Connectionless: cannot provide any form of bandwidth or latency GS
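Minimal XY routing on a 2-D mesh, as named on this slide, can be sketched as follows. The function names and the `(x, y)` coordinate convention are assumptions for the example; this is the generic textbook algorithm, not the HERMES hardware description.

```python
def xy_next_hop(current, destination):
    """Return the next (x, y) router on a minimal XY route, or None on arrival."""
    (cx, cy), (dx, dy) = current, destination
    if cx != dx:
        # First travel along the X dimension until the column matches.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # Then travel along the Y dimension until the row matches.
        return (cx, cy + (1 if dy > cy else -1))
    return None  # already at the destination: deliver to the local core


def xy_route(source, destination):
    """List every router visited from source to destination (inclusive)."""
    path = [source]
    while (nxt := xy_next_hop(path[-1], destination)) is not None:
        path.append(nxt)
    return path
```

For example, `xy_route((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)`, `(2, 1)`: all X hops first, then the Y hop, so the hop count equals the Manhattan distance.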


MANGO
Message-passing Asynchronous Network-on-chip providing GS over open core protocol (OCP) interfaces
Developed at the Technical University of Denmark
Clockless NoC that provides BE as well as GS services
NIs (or adapters) convert between the synchronous OCP domain and asynchronous domain
Routers allocate separate physical buffers for VCs
  for simplicity, when ensuring GS
BE connections are source routed
  BE router uses credit-based buffers to handle flow control
  length of a BE path is limited to five hops
Static scheduler gives link access to higher priority channels
  admission controller ensures low priority channels do not starve


Nostrum
Developed at KTH in Stockholm
Direct network with a 2-D mesh topology
SAF switching with hot potato (or deflective) routing
Support for
  switch/router load distribution
  guaranteed bandwidth (GB)
  multicasting
GB is realized using looped containers
  implemented by VCs using a TDM mechanism
  container is a special type of packet which loops around VC
  multicast: simply have container loop around on VC having recipients
Switch load distribution requires each switch to indicate its current load by sending a stress value to its neighbors


Octagon
Developed by STMicroelectronics
Direct network with an octagonal topology
  8 nodes and 12 bidirectional links
  any node can reach any other node with a max of 2 hops
Can operate in packet switched or circuit switched mode
Nodes route a packet in packet switched mode according to its destination field
  node calculates a relative address and then packet is routed either left, right, across, or into the node
Can be scaled if more than 8 nodes are required
  Spidergon
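The relative-address routing step can be sketched as below. The slide does not give the exact rule, so the thresholds used here (relative address 1 or 2 goes one way round the ring, 6 or 7 the other way, 3 to 5 across) and the mapping of "right" to increasing node numbers are assumptions for illustration.

```python
NODES = 8  # ring of 8 nodes plus 4 "across" links = 12 bidirectional links


def octagon_next(node, destination):
    """Return (direction, next node) for one routing step in an 8-node Octagon."""
    relative = (destination - node) % NODES
    if relative == 0:
        return ("into", node)                  # packet has arrived
    if relative in (1, 2):
        return ("right", (node + 1) % NODES)   # assumed: right = increasing numbers
    if relative in (6, 7):
        return ("left", (node - 1) % NODES)
    return ("across", (node + 4) % NODES)      # relative address 3, 4 or 5


def octagon_hops(source, destination):
    """Count the hops a packet takes from source to destination."""
    hops, node = 0, source
    while True:
        direction, nxt = octagon_next(node, destination)
        if direction == "into":
            return hops
        node, hops = nxt, hops + 1
```

Under these assumptions, checking all 64 source/destination pairs gives a worst case of two hops, in line with the bound stated on the slide.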


QNoC
Developed at Technion in Israel
Direct network with an irregular mesh topology
WH switching with an XY minimal routing scheme
Link-to-link credit-based flow control
Traffic is divided into four different service classes
  signaling, real-time, read/write, and block-transfer
  signaling has highest priority and block transfers lowest priority
  every service level has its own small buffer (few flits) at switch input
Packet forwarding is interleaved according to QoS rules
  high priority packets able to preempt low priority packets
Hard guarantees not possible due to absence of circuit switching
  instead statistical guarantees are provided
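The per-service-level buffers and priority-based forwarding can be sketched as follows. This is an illustrative model only: the class name `QosInputPort` is hypothetical, buffers are unbounded for simplicity, and preemption is modelled at flit granularity by always serving the highest non-empty level first.

```python
from collections import deque

# Service levels listed from highest to lowest priority, as on the slide.
SERVICE_LEVELS = ["signaling", "real-time", "read/write", "block-transfer"]


class QosInputPort:
    """Illustrative switch input port with one small buffer per service level."""

    def __init__(self):
        self.buffers = {level: deque() for level in SERVICE_LEVELS}

    def enqueue(self, level, flit):
        """Store an arriving flit in the buffer of its service level."""
        self.buffers[level].append(flit)

    def next_flit(self):
        """Forward from the highest-priority non-empty buffer, or return None."""
        for level in SERVICE_LEVELS:
            if self.buffers[level]:
                return level, self.buffers[level].popleft()
        return None
```

A signaling flit that arrives in the middle of a block transfer is forwarded before the remaining block-transfer flits, which is how a high priority packet interleaves with and preempts a low priority one.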


SOCBUS
Developed at Linköping University
Mesochronous clocking with signal retiming is used
Circuit switched, direct network with 2-D mesh topology
Minimum path length routing scheme is used
Circuit switched scheme is
  deadlock free
  requires simple routing hardware
  very little buffering (only for the request phase)
  results in low latency
Hard guarantees are difficult to give because it takes a long time to set up a connection


SPIN
Scalable programmable integrated network (SPIN)
Fat-tree topology, with two one-way 32-bit link data paths
WH switching, and deflection routing
Virtual socket interface alliance (VSIA) virtual component interface (VCI) protocol to interface between PEs
Flits of size 4 bytes
First flit of packet is header
  first byte has destination address (max. 256 nodes)
  last byte has checksum
Link level flow control
Random hiccups can be expected under high load
  GS is not supported


Xpipes
Developed by the Univ. of Bologna and Stanford University
Source-based routing, WH switching
Supports OCP standard for interfacing nodes with NoC
Supports design of heterogeneous, customized (possibly irregular) network topologies
Go-back-N retransmission strategy for link level error control
  errors detected by a CRC (cyclic redundancy check) block running concurrently with the switch operation
XpipesCompiler and NetChip compilers
  tools to tune parameters such as flit size, address space of cores, max. number of hops between any two network nodes, etc.
  generate various topologies such as mesh, torus, hypercube, Clos, and butterfly


Emerging NoC paradigms
Overall goal of on-chip communication: transmit data with low latencies and high throughput using the least possible power and resources


Limitations of a Traditional NoC

Multi-hop wireline communication
  • High latency and energy dissipation

[Figure: multi-hop path from a source core to a destination core; legend: core, NoC interface, NoC switch]

80% of chip power will be from on-chip interconnects in the next 5 years – ITRS, 2007


Novel Interconnect Paradigms for Multicore Designs
  Three Dimensional Integration
  Optical Interconnects
  Wireless/RF Interconnects
High Bandwidth and Low Energy Dissipation


Motivation
Modern on-chip electrical interconnects (EIs) are realized using copper wires
On-chip EIs can be classified into two categories
  local interconnects: used for short distance communication, delay of less than a clock cycle
  global interconnects: fewer in number, used for long distance communication, delay spanning multiple cycles
Long global EIs typically have high RC constants
  which results in greater propagation delay, transition time, and crosstalk noise
Repeater insertion and increasing wire width can reduce the propagation delay somewhat


Motivation
Increasingly harder for copper-based EI to satisfy design requirements
  delay, power, bandwidth, delay uncertainty
Steep rise in parasitic resistance and capacitance of copper interconnects poses serious challenges for
  interconnect delay (especially at the global level)
  power dissipation
  interconnect reliability


Motivation
Copper interconnects also constitute up to 70% of total on-chip capacitance
  major sources of power dissipation
According to the International Technology Roadmap for Semiconductors (ITRS), interconnect innovation with new technologies is vital to
  satisfy performance, reliability, power requirements in long term
  support ultra-high data rates (greater than 100 Gbps/pin)
  be scalable enough to support tens to hundreds of concurrent communication streams


Emerging Alternatives
  Optical Interconnects
  RF/Wireless Interconnects
  CNT Interconnects


Optical Interconnects
Optical interconnects (OIs)
  potential to overcome communication bottleneck
  replace electrical wires with optical waveguides
OIs offer many advantages over traditional electrical (copper-based) interconnects
  can support enormous intrinsic data bandwidths in the order of several Gbps using only simple on–off modulation schemes
  relatively immune to electrical interference due to crosstalk and parasitic capacitances
  power dissipation completely independent of transmission distance at the chip level
  routing and placement simplified since it is possible to physically intersect light beams with minimal crosstalk
  once a path is acquired, transmission latency of the optical data is very small


Optical Interconnects
Board-to-board and chip-to-chip OIs proposed
  actively under development
  feasibility of on-chip OIs is an open research problem
High speed, electrically driven, on-chip monolithic light source still remains to be realized


Optical Interconnects
(Off-chip) laser source
  provides light to modulator
Transmitter
  electro-optical modulator
    transduces electrical data supplied from electrical driver into a modulated optical signal
  several high speed electro-optical modulators proposed
    change refractive index or absorption coefficient of an optical path when an electrical signal is injected
  Mach-Zehnder interferometer-based silicon modulators
    higher modulation speeds (several GHz)
    large power consumption
    greater silicon footprint (around 10 mm)
  Microresonator-based P-I-N diode type modulators
    compact in size (10–30 μm)
    low power consumption
    low modulation speeds (several MHz)


Optical Interconnects

Waveguide
  path through which light is routed
  refractive index of waveguide material has a significant impact on bandwidth, latency, and area of an OI
  two promising alternatives for waveguide material that trade off propagation speed and bandwidth
    Silicon (Si)
      lower propagation speed, but low area footprint (higher bandwidth density)
    Polymer
      higher propagation speed but greater area (reduced bandwidth density)


Optical Interconnects
Receiver
  responsible for converting the optical signal back to an electrical signal
  photo-detector
    must have high value for quantum efficiency
      lower losses when converting optical information into an electrical form
    e.g. interdigitated metal–semiconductor–metal (MSM) receivers
      fast response and excellent quantum efficiency
  wave selective filter
    optional: only used when wavelength-division multiplexing (WDM) is used
    selects among different received wavelengths
  trans-impedance amplifier (TIA) stage
    converts photo-detector current to a voltage which is thresholded by subsequent stages to digital levels


Optical Interconnects
OIs have intrinsic advantage of low signal propagation delay in waveguides
  due to the absence of RLC impedances


Optical Interconnects
OIs require an electrical signal to be first converted into an optical signal and then back into an electrical signal
  conversion delay independent of interconnect length
OIs will have a delay advantage over EIs only if waveguide propagation delay dominates overall delay
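The trade-off on this slide can be made concrete with a simple delay model. This is a sketch under stated assumptions, not a model from the lecture: the OI delay is taken as a fixed conversion delay plus a per-millimetre waveguide delay, the EI delay is taken as linear in length (roughly what a repeated wire gives), and all function and parameter names are hypothetical. Any numbers passed in are placeholders, not measured values.

```python
def oi_delay_ps(length_mm, conversion_delay_ps, waveguide_ps_per_mm):
    """Optical link delay: fixed E/O + O/E conversion plus waveguide propagation."""
    return conversion_delay_ps + waveguide_ps_per_mm * length_mm


def ei_delay_ps(length_mm, wire_ps_per_mm):
    """Electrical link delay, assumed linear in length (repeated wire)."""
    return wire_ps_per_mm * length_mm


def crossover_length_mm(conversion_delay_ps, waveguide_ps_per_mm, wire_ps_per_mm):
    """Length beyond which the optical link is faster, or None if it never is."""
    if wire_ps_per_mm <= waveguide_ps_per_mm:
        return None  # the wire is at least as fast per mm, so the OI never wins
    return conversion_delay_ps / (wire_ps_per_mm - waveguide_ps_per_mm)
```

In this model the conversion delay is a fixed penalty, so the optical link only wins on interconnects longer than the crossover length, which matches the statement that the waveguide propagation delay must dominate the overall delay.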


Optical Interconnects: Open Problems

Efficient transmitter and receiver components
  high speed, low power, and small feature-size electro-optical modulators and photo-detector receivers need to be developed
Integrated on-chip light source
  such as the Indium Phosphide Hybrid Silicon Laser from Intel and UCSB [2008]
Polymer waveguide
  prohibitive manufacturing cost and complexity
  require suitable modulators
Temperature management
  OIs sensitive to temperature variations
  active or passive optical control method required to maintain stable device operation


RF/Wireless Interconnects
Replace on-chip wires with integrated on-chip antennas communicating via electromagnetic waves
Data is converted from baseband (i.e., digital) to RF/microwave signals and transmitted through free space or guided mediums
Free space signal broadcasting and reception is a common practice in modern wireless systems
  due to low cost implementation and excellent channeling capabilities
However, free space transmission and reception of RF/microwave signals requires an antenna size that is comparable to its wavelength
  even at near 100 GHz operating and cut-off frequencies in the future, the aperture size of the antenna = 1 mm², which is too large
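To see why antenna size is a concern, it helps to compute the free-space wavelength at the frequencies mentioned. The sketch below uses only the relation wavelength = c / f; the function name is hypothetical, and the effect of the chip materials (which shortens the wavelength relative to free space) is deliberately ignored.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact SI value


def free_space_wavelength_mm(frequency_hz):
    """Free-space wavelength in millimetres for a given frequency in hertz."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz * 1e3


# At 100 GHz the free-space wavelength is about 3 mm.
wavelength_100ghz = free_space_wavelength_mm(100e9)
```

An antenna comparable in size to a wavelength of about 3 mm is large relative to on-chip feature sizes, which is why the slide calls free-space transmission impractical even at 100 GHz.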


RF/Wireless Interconnects
Microwave transmission in guided mediums such as microstrip transmission line (MTL) or coplanar waveguide (CPW) is a more viable alternative to free space
  low attenuation up to at least 200 GHz
  requires a smaller antenna size
Simulation results have shown that a 1 cm long CPW experiences extremely low loss (1.6 dB at 100 GHz), and low dispersion (less than 2 dB for 50–150 GHz)
  EIs have 60 and 115 dB loss per cm at 100 GHz, and a freq dispersion of 30–40 dB
Thus microwave transmission over MTL or CPW has a clear advantage over conventional EIs
  especially for global interconnects operating in multi-GHz range


RF/Wireless Interconnects
On-chip antennas fabricated on Si substrates
  microstrip, dipole, and loop antennas
  [Figure: antenna shapes: linear, meander, zig-zag, folded]
Combining different antenna structures such as folded and meander can provide a higher power gain and a more compact on-chip antenna structure


RF/Wireless Interconnects

Paths formed by
  passage through air
  refraction through the SiO2 layer and reflection at the interface between the silicon substrate and the underlying dielectric layer (AlN)
  refraction through the SiO2 and dielectric layers, and reflection by the metal chuck that emulates a heat sink in the back of a die


RF/Wireless Interconnects: Research
Wireless interconnects may not only reduce the wires in integrated circuits, but can also be used to replace I/O pins
Signals propagating on these multiple paths constructively and destructively interfere
It was found by O et al. [ICCAD 2005] that by increasing the AlN thickness, destructive signal interference is significantly reduced


RF/Wireless Interconnects: Open Problems

Packaging and interference issues
  metal structures near antennas can change input impedances and phase of received signals
  interference effects between the transmitted/received signal and switching noise of nearby circuits
Ultra-high frequency requirements
  for antenna sizes to be feasible for on-chip fabrication, RF circuits must operate in the ultra-high frequency domain (i.e., ~100 GHz range)
  unsuitable for applications in the very near future
  may be feasible in 5 years if scaling trends continue
Power overhead
  each transceiver has its own dedicated RF and CDR circuits
  heavy circuit overhead as well as large power consumption


RF/Wireless Interconnects: Open Problems

On-chip antennas
  lots of research on fabricating antennas on lossless or lower loss substrates such as polytetrafluoroethylene (PTFE), quartz, duroid, and GaAs in the millimeter wave range
  not sufficient research in the area of fabricating antennas on silicon substrate
    much more lossy than other types of substrates
    reduces the antenna efficiency, requiring possibly greater power
Reference crystal oscillator
  required for FDMA
  cannot be easily implemented on-chip: has a large size
Security
  RF/wireless interconnects may be susceptible to hackers


CNT Interconnects
Carbon nanotubes (CNTs) are sheets of graphite rolled into cylinders of diameters varying from 0.6 to about 3 nm
Demonstrate either metallic or semiconducting properties depending on direction in which they are rolled
CNTs are promising candidates as on-chip interconnects
  high mechanical and thermal stability
  high thermal conductivity
  large current carrying capacity
  highly resistant to electromigration and other sources of breakdown
  much better conductivity properties than Cu
    due to longer electron mean free path (MFP) lengths in the micrometer range, compared to nanometer range MFP lengths for Cu


CNT Interconnects
It is predicted that isolated CNTs can replace Cu at the local interconnect level
  because of their much lower lateral capacitance, which improves latency for very short distances
However, for longer on-chip interconnections a bundle of CNTs conducting current in parallel is more suitable
  because of high intrinsic resistance of an isolated CNT (> 6.45 kΩ)
A bundle of CNTs consists of
  metallic nanotubes that contribute to current conduction
  semiconducting nanotubes that do not contribute to current conduction in an interconnect
CNTs broadly classified into
  SWCNT: isolated (single walled) CNT
  MWCNT: multi-walled CNT, made up of concentric SWCNTs
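The benefit of bundling can be illustrated with a short calculation. The 6.45 kΩ figure on the slide matches the quantum resistance h / (4e²) of a single ideal nanotube; treating it that way, and modelling a bundle as identical metallic tubes in parallel with ideal contacts, are simplifying assumptions of this sketch. The function names are hypothetical.

```python
PLANCK_CONSTANT_J_S = 6.62607015e-34      # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19     # exact SI value


def cnt_quantum_resistance_ohm():
    """Lower bound on the resistance of one isolated metallic CNT: h / (4 e^2)."""
    return PLANCK_CONSTANT_J_S / (4 * ELEMENTARY_CHARGE_C ** 2)


def bundle_resistance_ohm(metallic_tubes, single_tube_ohm):
    """Resistance of identical metallic tubes conducting in parallel.

    Semiconducting tubes are not counted because they do not conduct.
    """
    if metallic_tubes <= 0:
        return float("inf")
    return single_tube_ohm / metallic_tubes
```

`cnt_quantum_resistance_ohm()` evaluates to roughly 6.45 kΩ, and under this idealised model a bundle with 100 metallic tubes at that value would present about 64.5 Ω.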


Multi-Wall Carbon Nanotubes (MWCNT)

MWCNTs may have diameters in a wide range varying from a few to hundreds of nanometers
  SWCNTs have diameters in the few nanometer range
MWCNT
  made up of SWCNTs with varying diameters
  have many of the properties of SWCNTs
  if properly connected to contacts, all MWCNT shells can conduct


Carbon Nanotube Interconnect: Open Problems

Inefficient metal-nanotube contacts
  make propagation delay with CNT interconnects higher than with Cu interconnects
Small MFP length
  MFP lengths need to be increased to reduce CNT propagation delay
Density of nanotube bundles
  needed for global CNT interconnects that perform better than Cu interconnects
Inductive effects at high frequencies
  currently inductive effects are ignored but expected to become significant at very high frequencies


Conceptual transmitter and receiver


Small-world topology
Very short average path length
  no. of hops between any two nodes
  less than log(N), N = no. of nodes => interesting for communication with minimal resources
Can be constructed from a locally connected network by rewiring connections randomly and creating shortcuts in the network
  by following probability distributions depending on inter-node distances and frequency of interactions
Metaheuristics for optimizing topologies
  evolutionary algorithms, simulated annealing
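The effect of shortcuts on average path length can be demonstrated with a small experiment. This sketch is a simplification of what the slide describes: it starts from a ring as the locally connected network and adds shortcuts uniformly at random instead of rewiring existing links with distance-dependent probabilities. All function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n):
    """Locally connected network: node i is linked to its two ring neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_shortcuts(adjacency, count, seed=0):
    """Return a copy of the network with `count` random shortcut links added."""
    rng = random.Random(seed)
    network = {node: set(links) for node, links in adjacency.items()}
    nodes = list(network)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in network[u]:          # skip pairs that are already linked
            network[u].add(v)
            network[v].add(u)
            added += 1
    return network


def average_path_length(adjacency):
    """Mean hop count over all reachable ordered node pairs, using BFS."""
    total = pairs = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs
```

Comparing `average_path_length(ring_lattice(32))` with the same network after `add_shortcuts(..., 8)` shows the average hop count dropping, since a shortcut can only shorten paths and never lengthen them.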


3D and NoCs
3D stacking technologies
  Massive parallelism
  Distributed memory architecture
  3D NoCs faster
3D
  Multiple silicon layers
  Attractive solution for growing pace of SoC
  Easier heterogeneous layers
    One layer -> one technology
NoCs: necessity for 3D chips
  Arbitrarily good scalability
  Efficient parallelism of computation and communication


Some comparison