[slide] a generic network interface architecture for a networked processor array - nepa

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh

EECS @ University of California, Irvine

2

Outline

Introduction
Network-on-Chip
Related Works
Generic Network Interface
– Related Works
– Networked Processor Array (NePA) Architecture
– Generic Network Interface
– Programming Sequence
– Modular Wrapper for a Slave IP Core
– Case Studies: Memory/Turbo Decoder IP Cores
Summary

Introduction

[Figure: delay versus year, 2003 to 2018. Left axis: Gate Delay in Times (ps), 0 to 200. Right axis: Wire Delay in Times (ps), 0 to 9000. Series: Gate Delay (HP), Gate Delay (LOP), Gate Delay (LSTP), Global Wire Delay 1, Global Wire Delay 2, Metal1 Wire Delay 1, Metal1 Wire Delay 2, Int. Wire Delay 1, Int. Wire Delay 2. From ITRS 2004 Report.]

In 2018, the interconnection delay is estimated to be 1000 times greater than gate delay [ITRS]
The interconnection network among multiple IPs becomes another challenging issue in System-on-Chip (SoC) design

4

Introduction (cont’d)

Current Trends in VLSI Technology
– Requirements
• Computation-intensive applications
• Highly integrated + low power
– Increasing # of computing resources in SoC
• CPUs, DSPs, ASPs
– System platforms
• MPSoC (Multi Processor System-on-Chip) or CMP (Chip Multi Processor)
• Homogeneous/Heterogeneous processors
• Similarity to a small-scale distributed computer system

Interconnection?

5

Network-on-Chip Interconnection

[Figure: Network-on-Chip block diagram. Labels: Interconnect Network, Switch, Link, Network Interface, CPU, RAM, ROM, DSP, I/O, Co-Processor, Communication, Peripheral.]

The use of switching-based technology
– Has been used extensively in computer networks
– Communication between IPs can be packet based

The key efficiency of NoC
– Communication resources are SHARED!

6

Network-on-Chip (cont’d)

Network-on-Chip (NoC) Architecture
– Network-like interconnection
• Insertion of routers
• Shortened wiring requirement
• Alleviates scalability limits and frees the design from the limitations of complex wiring
– Difference from computer network technology (Internet TCP/IP)
• Simple and light-weight modification
• Low power requirement for mobile applications
• Performance and cost

Differing interface specifications among integrated components make it considerably harder to adopt NoC techniques

7

Generic Network Interface

The reuse of IP cores in a plug-and-play manner can be achieved by using a generic network interface (NI)
– Reduce design time of a new system
– Translate packet-based communication into a higher-level protocol
– Decouple computation from communication
– Hide the implementation details of the interconnection

8

Related Works

Different packetization strategies
– Software library, on-core and off-core implementation
– A hardware wrapper implementation has the lowest area overhead and latency
NI for standard interfaces such as OCP, DTL and AXI
– Improves reuse of IP cores
– Performance is penalized because of increased latency
Generic architecture and automatic generation of interface
– Existing research limits the embedded IP cores to CPUs (ARM7 and MC68000)

Wrapper designs for application-specific cores still lack generic aspects

9

NePA Architecture

System Platform

[Figure: NePA system platform. Nodes shown: Processing Elements (PE), Memory Stations (MS), Host I/Fs (HI) and IP cores (IP1 to IP4). PE detail: Router, Processor Core (ARM™ / MIPS etc.), Program RAM, Data RAM, NI. IPx detail: Router, Specific IP (FFT, Viterbi or Turbo coder), Network Interface. A further detail block shows Router, Data RAM, NI and Memory Controller.]

10

NePA Architecture: High-Performance Router Architecture

Interconnect through FIFOs between neighboring PEs
– Simple interconnect wiring
Minimal (shortest) adaptive routing
– Livelock-free
Point-to-point single or block transfer
Two disjoint sub-networks for the west-to-east and east-to-west traffic
– Network avoids a cyclic dependency
– Resulting in deadlock-freedom
Prioritized packet delivery
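The following is a minimal sketch of how the two disjoint sub-networks and minimal adaptive routing could fit together. The slides do not specify the details, so these are assumptions for illustration: the (x, y) mesh coordinates, the convention that x grows eastward and y grows southward, the string port names, and the choice to carry same-column traffic on the west-to-east network.

```python
def select_subnetwork(src, dst):
    """Pick the sub-network when the packet is injected.

    Assumption: packets heading east (or staying in the same column) use the
    west-to-east network ("right"); packets heading west use the east-to-west
    network ("left").
    """
    return "right" if dst[0] >= src[0] else "left"


def candidate_outputs(cur, dst, subnet):
    """Return the productive (distance-reducing) output ports at this node.

    An adaptive router may pick any returned port, for example the less
    congested one. Every hop moves closer to the destination, so the path
    stays minimal.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx == 0 and dy == 0:
        return ["INT"]  # arrived: deliver to the local core
    suffix = "1" if subnet == "right" else "2"  # N1/S1 vs N2/S2 (assumed mapping)
    ports = []
    if subnet == "right" and dx > 0:
        ports.append("E")
    if subnet == "left" and dx < 0:
        ports.append("W")
    if dy > 0:
        ports.append("S" + suffix)
    if dy < 0:
        ports.append("N" + suffix)
    return ports
```

Because a packet never leaves the sub-network it was injected into, west-to-east and east-to-west traffic never wait on each other in this sketch, which is the intuition behind the cyclic-dependency claim above.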

[Figure: router structure. Right Router ports as labeled: W Input, N1 Input, S1 Input, IntR Input; E Output, N1 Output, S1 Output. Left Router ports as labeled: W Input, N2 Input, S2 Input, IntL Input; W Output, N2 Output, S2 Output. An Internal Router (INT) connects Int R and Int L to the local core.]

11

Network Interface Prototype: Packetization Unit

Builds the packet header and converts the data into flits
– Header builder: forms the head flit based on the information provided by registers
– DMA controller: automatically generates the address and read signals for the internal memory
– Flit controller: wraps up the head flit and body flits into a packet
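The three blocks above can be sketched in software as follows. This is only an illustrative model, not the hardware design: the register keys (`dest`, `type`, `tag`, `read_addr`, `count`) and the `Flit` record are hypothetical, and the slides do not give flit bit layouts.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    kind: str        # "head" or "body"
    payload: object  # header fields (dict) or one data word


def build_header(regs):
    """Header builder: form the head flit from register contents."""
    fields = {"dest": regs["dest"], "type": regs["type"], "tag": regs["tag"]}
    return Flit("head", fields)


def dma_read(memory, start, count):
    """DMA controller: step through internal memory addresses on its own."""
    for addr in range(start, start + count):
        yield memory[addr]


def packetize(regs, memory):
    """Flit controller: wrap the head flit and body flits into one packet."""
    packet = [build_header(regs)]
    for word in dma_read(memory, regs["read_addr"], regs["count"]):
        packet.append(Flit("body", word))
    return packet
```

A call such as `packetize({"dest": 3, "type": "BLOCK", "tag": 7, "read_addr": 1, "count": 3}, memory)` yields one head flit followed by three body flits read from addresses 1 to 3.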

12

Network Interface Prototype: Depacketization Unit

Receives data from the interconnection network
– Flit controller: selects the head flit from a packet and passes it to the header parser
– Header parser: extracts control information from the head flit and asserts an interrupt signal to the OpenRISC core
– DMA controller: automatically writes the body flit data into the internal memory
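The receive path can be modeled the same way. This is a rough sketch under assumptions: the head flit is represented as a dict with hypothetical `type` and `tag` keys, the interrupt is a plain callback, and the write address is passed in directly rather than being set by the core after the interrupt.

```python
def parse_header(head_flit):
    """Header parser: extract control information from the head flit."""
    return {"type": head_flit["type"], "tag": head_flit["tag"]}


def depacketize(packet, memory, write_addr, raise_interrupt):
    """Model the flit controller and DMA controller for one received packet.

    `packet` is a list whose first element is the head flit (a dict) and whose
    remaining elements are body-flit data words.
    """
    head, body = packet[0], packet[1:]   # flit controller separates the head flit
    info = parse_header(head)
    raise_interrupt(info)                # tell the core that a packet arrived
    for offset, word in enumerate(body):
        memory[write_addr + offset] = word   # DMA write into internal memory
    return info
```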

13

Network Interface Prototype: Programming Sequence

Sending a SINGLE packet
– All required parameters are set in the associated registers
– Writing the command register generates a complete packet
Sending a BLOCK packet
– sDataReg represents the number of data items
– sReadAddrReg indicates the start address of the data in memory
Receiving SINGLE/BLOCK packets
– Parameters are accessed by the interrupt service routine
– Accessing rDataReg completes the procedure for the current packet
– For a BLOCK packet, OpenRISC sets the corresponding write address (wWriteAddrReg) for internal memory access
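The send side of this sequence might look like the sketch below, written against a software stand-in for the NI. Only `sDataReg` and `sReadAddrReg` come from the slides. The names `sDestReg` and `sCmdReg`, the command codes, and the assumption that `sDataReg` carries the data word itself for a SINGLE packet are hypothetical.

```python
class MockNI:
    """Software stand-in for the NI register file (not the real hardware)."""

    CMD_SINGLE, CMD_BLOCK = 1, 2  # hypothetical command codes

    def __init__(self, memory):
        self.memory = memory
        self.regs = {"sDestReg": 0, "sDataReg": 0, "sReadAddrReg": 0}
        self.sent = []  # (destination, payload) pairs handed to the network

    def write(self, reg, value):
        if reg == "sCmdReg":      # writing the command register triggers the send
            self._send(value)
        else:
            self.regs[reg] = value

    def _send(self, cmd):
        dest = self.regs["sDestReg"]
        if cmd == self.CMD_SINGLE:
            body = [self.regs["sDataReg"]]   # assumed: register holds the word
        else:
            start = self.regs["sReadAddrReg"]
            count = self.regs["sDataReg"]    # BLOCK: register holds the count
            body = self.memory[start:start + count]
        self.sent.append((dest, body))


def send_block(ni, dest, start, count):
    """Set the parameters first, then write the command register last."""
    ni.write("sDestReg", dest)
    ni.write("sReadAddrReg", start)
    ni.write("sDataReg", count)
    ni.write("sCmdReg", MockNI.CMD_BLOCK)
```

The ordering in `send_block` mirrors the sequence above: parameters first, command register last, so that a single write completes the packet.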

14

Generic Network Interface: Modification of Packet for NI Access

A slave IP core is not able to write the registers in the current NI
These registers are accessed by other cores over the network
– The opcode and operand of an instruction are located in the Tag and Data fields of a SINGLE packet
– The Type field indicates that the packet contains an instruction for the NI
– The instruction decoder in the header parser fetches the opcode and operand from the packet
– The decoder then updates the internal registers
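A minimal sketch of this decoding step is shown below. The Type encodings, the opcode numbers, and the mapping from opcode to register are hypothetical; the slides only state that the opcode sits in the Tag field and the operand in the Data field.

```python
TYPE_DATA, TYPE_NI_INSTR = 0, 1  # hypothetical Type field encodings

# Hypothetical opcode-to-register map for the instruction decoder.
OPCODES = {0x1: "sDestReg", 0x2: "sReadAddrReg", 0x3: "sDataReg"}


def handle_single_packet(packet, ni_regs, core_inbox):
    """Steer a SINGLE packet to the NI registers or to the IP core."""
    if packet["type"] == TYPE_NI_INSTR:
        reg = OPCODES[packet["tag"]]    # opcode travels in the Tag field
        ni_regs[reg] = packet["data"]   # operand travels in the Data field
    else:
        core_inbox.append(packet["data"])  # ordinary data goes to the core
```

With this in place, a remote core can configure the NI of a slave IP by sending, for instance, `{"type": TYPE_NI_INSTR, "tag": 0x2, "data": 64}`.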

15

Generic Network Interface: Modular Wrapper for a Slave IP Core

Un-buffered mode: data is exchanged as a data stream without an intermediate buffer
Buffered mode: data is saved temporarily in an intermediate buffer

16

Generic Network Interface: Modular Wrapper for a Memory

Maintains data shared among a number of PEs
Assumes a synchronous SRAM model
Wrapper design
– Core type is slave IP
– There are no control signals for initialization or status monitoring
– Data interface is realized in the un-buffered mode, removing the FIFOs between NI and memory
Programming sequence
– The base address is set to the desired value using a SINGLE packet
– Sending a BLOCK packet stores data into memory
– A read operation is done by sending a SINGLE packet
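The memory programming sequence can be illustrated with a small software model. The class and method names are hypothetical, and how a read request encodes its length is an assumption (here a word count), since the slides do not say.

```python
class MemoryNode:
    """Stand-in for an SRAM behind the modular wrapper in un-buffered mode."""

    def __init__(self, size):
        self.sram = [0] * size
        self.base = 0

    def on_single_set_base(self, addr):
        """SINGLE packet: set the base address for later accesses."""
        self.base = addr

    def on_block(self, words):
        """BLOCK packet: stream the words straight into SRAM, no FIFO between."""
        for i, word in enumerate(words):
            self.sram[self.base + i] = word

    def on_single_read(self, count):
        """SINGLE packet: read request; returns the reply payload."""
        return self.sram[self.base:self.base + count]
```

A PE would therefore issue a SINGLE packet to set the base, a BLOCK packet to store, and another SINGLE packet to read the data back.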

17

Generic Network Interface: Modular Wrapper for a Turbo Decoder

Stand-alone turbo decoder that processes data block by block
Wrapper design
– Core type is slave IP
– There are six signals that are used for initialization and mode selection
– Data interface adopts the buffered mode, inserting FIFOs between the NI and the core
Programming sequence
– Before turbo decoding starts, the core is initialized by sending a packet that accesses the input control signals
– Data is sent to the core using a BLOCK packet
– When decoding of one block is completed, the NI automatically starts to send a packet to the other node
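The buffered-mode flow above can be sketched as follows. The decoder itself is replaced by a caller-supplied placeholder function, so this models only the wrapper behavior (control setup, input FIFO, automatic result packet). The class name, the `mode` signal name, and the fixed block length are hypothetical.

```python
from collections import deque


class BufferedCoreNode:
    """Stand-in for a block-oriented slave IP behind a buffered-mode wrapper."""

    def __init__(self, core_fn, block_len, result_dest):
        self.core_fn = core_fn        # placeholder for the IP core
        self.block_len = block_len
        self.result_dest = result_dest
        self.control = {}             # input control signals set via packets
        self.in_fifo = deque()        # FIFO between the NI and the core
        self.outbox = []              # packets the NI sends on its own

    def on_control(self, signal, value):
        """Initialization packet: drive one input control signal."""
        self.control[signal] = value

    def on_block(self, words):
        """BLOCK packet: buffer the data, run the core once a block is full."""
        self.in_fifo.extend(words)
        while len(self.in_fifo) >= self.block_len:
            block = [self.in_fifo.popleft() for _ in range(self.block_len)]
            result = self.core_fn(block, self.control)
            # The NI sends the result without a request from another node.
            self.outbox.append((self.result_dest, result))
```

A usage example would pass a simple stand-in such as a sign-based hard decision for `core_fn`; a real turbo decoder core would take its place.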

18

Summary

Introduced Networked Processor Array (NePA)
Proposed network interface architecture for OpenRISC core
Classified the possible IP cores for processing elements
Proposed a modular wrapper for embedded IP cores
– Allocation table was used for the configuration of the modular wrapper
– Programming model was presented
– Case studies in memory and turbo decoder cores demonstrated feasibility and efficiency of the proposal
