pycoram: yet another implementation of coram memory architecture for modern fpga-based computing...

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing. Shinya Takamaeda-Yamazaki†‡, Kenji Kise†, James C. Hoe*. †Tokyo Institute of Technology, ‡JSPS Research Fellow, *Carnegie Mellon University. Dec 7, 2013, CARL2013@Davis, CA


DESCRIPTION

Presentation slide for CARL2013 (Co-located with MICRO-46) at Davis, CA. PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing

TRANSCRIPT

Page 1: PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing (CARL2013 co-located with MICRO-46)

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing

Shinya Takamaeda-Yamazaki†‡, Kenji Kise†, James C. Hoe*

†Tokyo Institute of Technology ‡JSPS Research Fellow

*Carnegie Mellon University

Dec 7, 2013 CARL2013@Davis, CA

Page 2:

Agenda

■ Background
■ PyCoRAM Overview
■ PyCoRAM Microarchitecture
■ Evaluation
■ Conclusion

Page 3:

Background

Page 4:

FPGA as SoC

■ Put together various components on a single FPGA
  ● CPU core
    • MicroBlaze (soft macro)
    • Cortex-A9 (hard macro)
  ● Hardware accelerator logic
    • Modeled in traditional RTL
      – Verilog HDL, VHDL
    • Modeled in new modeling tools
      – Bluespec, AutoESL, Chisel, …
  ● DDRx DRAM interface
  ● PCI Express
  ● Ethernet, …

[Figure: an FPGA containing a CPU, two HW accelerators, a DRAM I/F, Ethernet, and PCI-E, all attached to an interconnect]

Page 5:

Portability Issue of Application Design

■ How to support various FPGA platforms?
  ● Different logic sizes, memory interfaces, peripherals, and I/O properties ☹

[Figure: three example platforms: Digilent Atlys (Xilinx Spartan-6 LX45); Xilinx ML605 (Xilinx Virtex-6 LX240T); ScalableCore System (our FPGA system, Xilinx Spartan-6 LX16 × 128 nodes)]

Page 6:

IP-core Based System Development

■ To build a system, add IP-cores and connect them ☺
  ● IP-cores are connected through a standard on-chip interconnect
  ● EDK automatically generates the on-chip interconnect and (some) device-dependent interfaces
    • No (or few) annoying steps!

[Figure: Xilinx Platform Studio (XPS) screenshot showing the IP-core list, the interconnect, and the IP-core instances, next to an FPGA block diagram (CPU, two HW accelerators, DRAM I/F, Ethernet, and PCI-E on an interconnect, with off-chip DRAM)]

Page 7:

Abstract Memory System for FPGAs

■ CoRAM (Connected RAM) [FPGA'11]
  ● High-level abstraction for memory management
    • Decouples computing logic from memory access behavior
    • Memory access patterns are written as a software model (C language)

[Figure: CoRAM model. Labels: HW kernels (computing logic); CoRAM memory (abstracted on-chip memories) with read, write, and manage arrows; control threads (memory access pattern in C); CoRAM channel (communication FIFOs or registers) with read/write arrows; off-chip memory]

Page 8:

What "Runs" CoRAM? (from CoRAM Tutorial @FPGA'13)

[Figure: CoRAM architecture and microarchitecture. Labels: control thread programs, core logic, FPGA, fabric, SRAM, control logic, cluster DMA, Network-on-Chip, memory interfaces and caches, memory translation (TLBs). Annotations: high-level synthesis from C to RTL using LLVM; RTL conversion; CONNECT NoC generator]

Page 9:

PyCoRAM Overview

Page 10:

Motivation: CoRAM for EDK

■ Integrate the CoRAM memory architecture into the modern EDK-based development flow with standard IP-cores

[Figure: system stack. Labels: accelerator logic, CoRAM abstraction, standard IP-core, CPU core, standard on-chip interconnect, device-dependent interfaces. Captions: "Portable application design with CoRAM" and "Cooperation with standard IP-cores"]

Page 11:

PyCoRAM

■ Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs
  ● CoRAM memory abstraction for the EDK development flow
■ Key features
  ● Control thread in Python
    • We developed a Python-to-Verilog HLS compiler from scratch
  ● AMBA AXI4 for the on-chip interconnect
    • For IP-core based development on Xilinx Platform Studio (XPS)
  ● Parameterized RTL design support for user logic
    • Generate statements and parameter statements are analyzed by our original Verilog analysis tool-chain (Pyverilog)

Page 12:

Comparison with Original CoRAM

| Item | CoRAM | PyCoRAM |
|---|---|---|
| Language for control thread | C | Python |
| Supported memory operations | (Blocking/Non-blocking) Read/Write | (Blocking/Non-blocking) Read/Write |
| On-chip interconnect | CONNECT NoC [FPGA'12] | AMBA AXI4 |
| FSM granularity in control thread | LLVM-IR | Python AST node |
| Generate-statement support for user logic | No | Yes |
| Supported FPGAs | Xilinx ML605, Altera Terasic DE-4 | Any FPGA supporting the AXI bus |
| # Lines of code | 11,682 lines (w/o CONNECT) | 4,922 lines (w/o Pyverilog) |

FSM: Finite State Machine; LLVM-IR: Low Level Virtual Machine Intermediate Representation; AST: Abstract Syntax Tree
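The "Python AST node" granularity can be pictured with the standard `ast` module. The following is only a rough sketch, not the PyCoRAM compiler's algorithm: it assumes, for illustration, that each statement node of a control thread becomes one FSM state. The `enumerate_states` helper and the sample thread are hypothetical.

```python
import ast

# Hypothetical control-thread source, used only to have something to parse.
CONTROL_THREAD = '''
def calc_sum(times):
    addr = 0
    total = 0
    for i in range(times):
        total += addr
        addr += 128
    return total
'''

def enumerate_states(source):
    """Give every statement node a state number, ordered by source position.

    Assumption for illustration: one statement node maps to one FSM state.
    """
    tree = ast.parse(source)
    stmts = [n for n in ast.walk(tree) if isinstance(n, ast.stmt)]
    stmts.sort(key=lambda n: (n.lineno, n.col_offset))
    return [(i, type(n).__name__, n.lineno) for i, n in enumerate(stmts)]

for state, kind, lineno in enumerate_states(CONTROL_THREAD):
    print(f"state {state}: {kind} (source line {lineno})")
```

A real HLS flow would also need transitions (loop back-edges, branch targets) and possibly several states per node; the sketch only shows where the state boundaries could come from.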

Page 13:

PyCoRAM Development Flow

■ PyCoRAM generates an IP-core package from user-logic RTL and control thread scripts written in Python
  ● Each part can be replaced with the original CoRAM's component

[Figure: development flow. Inputs (the portable application design): user logic (Verilog HDL) and control threads (Python). PyCoRAM tool-chain: logic hierarchy analysis, Python-to-Verilog compilation, control signal insertion, control signal port addition, IP-core packing (RTL, .mpd, and .pao). Vendor EDA flow: IP-core integration on EDK, synthesis, FPGA bit file. Annotations: Python-to-Verilog HLS; RTL conversion; IP-core generation with AXI4 interface; top design synthesis with AXI4]

Page 14:

FPGA Accelerator with PyCoRAM IP-core

[Figure: an FPGA containing the PyCoRAM IP (application) and another AXI IP-core or CPU, attached to an AXI4 interconnect and a DRAM controller with off-chip DRAM. Labels inside the PyCoRAM IP: HW kernels (computing logic), CoRAM memories and CoRAM streams with DMACs and AXI I/Fs grouped into DMA clusters, CoRAM channel, control thread FSM]

Page 15:

PyCoRAM Microarchitecture

Page 16:

PyCoRAM Microarchitecture (Logical View)

[Figure: logical view. Labels: user I/O, user logic, CoRAM channel, CoRAM register, CoRAM memory with DMAC, CoRAM stream with DMAC, control thread FSM, GPIO]

Page 17:

PyCoRAM Microarchitecture (Logical View)

[Figure: the same logical view as on the previous slide, with two annotations: "Modeled in RTL (Verilog HDL)" and "Memory access pattern in Python"]

Page 18:

PyCoRAM Microarchitecture (Physical View)

[Figure: physical view. An FPGA with the PyCoRAM IP, an AXI4 interconnect, and a DRAM controller. Labels inside the PyCoRAM IP: user I/O, user logic, CoRAM channel, CoRAM register, CoRAM memory and CoRAM stream (each with a DMAC and an AXI I/F), control thread FSM, GPIO]

Page 19:

PyCoRAM Microarchitecture (Physical View)

[Figure: the same physical view as on the previous slide, with three annotations: "AXI4 master interface", "Control thread in Python", and "Parameterized RTL design support"]

Page 20:

Control Thread in Python

■ Operations for CoRAM objects
  ● To/from CoRAM memory
    • Data movement pattern with DMA operations between on-chip CoRAM memory and DRAM
  ● To/from CoRAM channel
    • Token communication between user logic and control thread

    def calc_sum(times):
        ram = CoramMemory(idx=0, datawidth=32, size=1024)
        channel = CoramChannel(idx=0, datawidth=32)
        addr = 0
        sum = 0
        for i in range(times):
            ram.write(0, addr, 128)   # Transfer (off-chip DRAM to BRAM)
            channel.write(addr)       # Notification to user logic
            sum += channel.read()     # Wait for notification from user logic
            addr += 128 * (32/8)
        print('sum=', sum)            # $display Verilog system task
    calc_sum(8)

[Figure: user I/O, user logic, CoRAM channel, CoRAM memory with DMAC, and the control thread FSM]
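To try the control-thread pattern above without any FPGA tooling, one option is to stub out the CoRAM objects in plain Python. The sketch below is not part of PyCoRAM: the two classes are hypothetical software stand-ins, the `user_logic` callback is invented for the example, and reading `ram.write(0, addr, 128)` as (BRAM address, DRAM address, word count) is an assumption based on the slide's comment.

```python
from collections import deque

class CoramMemory:
    """Hypothetical stand-in: records DMA requests instead of issuing them."""
    def __init__(self, idx, datawidth, size):
        self.idx, self.datawidth, self.size = idx, datawidth, size
        self.transfers = []  # one (bram_addr, dram_addr, words) per write()

    def write(self, bram_addr, dram_addr, words):
        # On the slide this call transfers data from off-chip DRAM to BRAM.
        self.transfers.append((bram_addr, dram_addr, words))

class CoramChannel:
    """Hypothetical stand-in: a FIFO answered by a fake user logic callback."""
    def __init__(self, idx, datawidth, user_logic):
        self.idx, self.datawidth = idx, datawidth
        self.user_logic = user_logic
        self.replies = deque()

    def write(self, token):
        # Notify the user logic; the fake one replies immediately.
        self.replies.append(self.user_logic(token))

    def read(self):
        # In hardware this would block until the user logic enqueues a token.
        return self.replies.popleft()

def calc_sum(times, ram, channel):
    addr = 0
    total = 0
    for _ in range(times):
        ram.write(0, addr, 128)
        channel.write(addr)
        total += channel.read()
        addr += 128 * (32 // 8)  # 128 words of 32 bits = 512 bytes
    return total

ram = CoramMemory(idx=0, datawidth=32, size=1024)
# Fake user logic: pretend every 128-word block sums to 128.
channel = CoramChannel(idx=0, datawidth=32, user_logic=lambda addr: 128)
result = calc_sum(8, ram, channel)
print("sum=", result)
```

The recorded `ram.transfers` list makes the memory access pattern (eight 128-word blocks at 512-byte strides) easy to inspect before committing to hardware.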

Page 21:

CoRAM Objects in User Logic

■ CoRAM objects appear as standard BRAMs or FIFOs
  ● Their interfaces are very similar to those of standard memory components
  ● User logic accesses their contents in the same way
■ Essential parameters define the object characteristics
  ● Thread name, ID, data width, address length, …

(a) CoRAM Memory

    CoramMemory1P
    #(
      .CORAM_THREAD_NAME("thread_name"),
      .CORAM_ID(0),
      .CORAM_ADDR_LEN(ADDR_LEN),
      .CORAM_DATA_WIDTH(DATA_WIDTH)
    )
    inst_memory
    (.CLK(CLK),
     .ADDR(mem_addr),
     .D(mem_d),
     .WE(mem_we),
     .Q(mem_q)
    );

(b) CoRAM Channel

    CoramChannel
    #(
      .CORAM_THREAD_NAME("thread_name"),
      .CORAM_ID(0),
      .CORAM_ADDR_LEN(CHANNEL_ADDR_LEN),
      .CORAM_DATA_WIDTH(CHANNEL_DATA_WIDTH)
    )
    inst_channel
    (.CLK(CLK),
     .RST(RST),
     .D(comm_d),
     .ENQ(comm_enq),
     .FULL(comm_full),
     .Q(comm_q),
     .DEQ(comm_deq),
     .EMPTY(comm_empty)
    );

Page 22:

AXI4 Master Interface

■ The DMA controller works as an AXI4 master IP-core interface

[Figure: DMA cluster between the HW kernels (computing logic) and the AXI4 interconnect.
  ● Control thread FSM to DMA controller: DramAddr, BramAddr, Size, WrEn, RdEn, Busy, Ready
  ● DMA controller to AXI master interface (protocol conversion): Addr, Size, RdEn, Ready, and FIFO-style ports (Enque, WrData, AlmFull, Deque, RdData, Empty)
  ● AXI master interface to AXI4 interconnect:
    • Write address channel: AWADDR, AWLEN, AWVALID, AWREADY
    • Write data channel: WDATA, WVALID, WREADY, BVALID
    • Read address channel: ARADDR, ARLEN, ARVALID, ARREADY
    • Read data channel: RDATA, RVALID, RREADY
  ● CoRAM memory (BRAM) ports on both sides: Addr, WrData, RdData, WrEn
  ● CoRAM channel ports: Enque, WrData, AlmFull, Deque, RdData, Empty]
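One piece of the protocol conversion between a DMA request (address, size) and the AXI4 address channels is choosing burst lengths. The sketch below is not taken from PyCoRAM; it only applies two general AXI4 rules: an INCR burst carries at most 256 beats with AxLEN set to the beat count minus one, and a burst must not cross a 4 KB address boundary. The function name and the restriction to bus-aligned transfers are simplifications made for this example.

```python
def split_into_bursts(addr, num_bytes, bus_bytes, max_beats=256):
    """Split a bus-aligned transfer into (address, AxLEN) pairs.

    bus_bytes is the data-bus width in bytes, assumed to be a power of two.
    """
    if addr % bus_bytes or num_bytes % bus_bytes:
        raise ValueError("this sketch only handles bus-aligned transfers")
    bursts = []
    while num_bytes > 0:
        to_boundary = 4096 - (addr % 4096)   # bytes left before the 4 KB boundary
        chunk = min(num_bytes, to_boundary, max_beats * bus_bytes)
        beats = chunk // bus_bytes
        bursts.append((addr, beats - 1))     # AxLEN encodes beats - 1
        addr += chunk
        num_bytes -= chunk
    return bursts

# A 1 KB read on a 128-bit (16-byte) bus that starts 256 bytes below a boundary.
for burst_addr, axlen in split_into_bursts(0x0F00, 1024, 16):
    print(hex(burst_addr), axlen)
```

Each returned pair would drive ARADDR/ARLEN (or AWADDR/AWLEN) for one transaction; handshaking with the VALID/READY signals is left out.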

Page 23:

For Parameterized RTL Design Support

■ Generate-statement support by an advanced RTL analyzer
  ● Not supported by the original CoRAM compiler
■ Pyverilog: Python-based tool-chain for Verilog HDL design
  ● Parser
  ● Dataflow analysis
  ● Optimization
  ● RTL code generation
  ● Control-flow analysis
  ● Graphical output

[Figure: example graphical outputs: a dataflow graph and a state machine]

Page 24:

Evaluation

Page 25:

Evaluation

■ Point: maximum memory bandwidth utilization
  ● PyCoRAM is a memory abstraction framework
■ Setup
  ● 2 FPGA boards
    • Digilent Atlys
      – Spartan-6 LX45
      – DDR2-800 DRAM 128MB (1.2GB/s*)
      – AXI4 128-bit, 100MHz (1.6GB/s)
    • Xilinx ML605
      – Virtex-6 LX240T
      – DDR3-800 DRAM 512MB (6.4GB/s)
      – AXI4 256-bit, 200MHz (6.4GB/s)
  ● EDK
    • Xilinx Platform Studio (14.6)

*Due to 300MHz operation
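The peak figures in the setup follow from bytes per transfer times transfers per second. The short check below reproduces them; the AXI4 widths and clocks come from the slide, while the DRAM data-bus widths (16 bits on the Atlys, 64 bits on the ML605) are assumptions that are not stated in the slides.

```python
def peak_gbps(width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s = bytes per transfer x transfers per second."""
    return (width_bits / 8) * mega_transfers_per_s * 1e6 / 1e9

# AXI4: one transfer per clock cycle (values from the slide).
atlys_axi = peak_gbps(128, 100)
ml605_axi = peak_gbps(256, 200)

# DRAM: bus widths below are assumptions, not taken from the slides.
atlys_dram = peak_gbps(16, 2 * 300)  # DDR2 at 300 MHz, two transfers per clock
ml605_dram = peak_gbps(64, 800)      # DDR3-800, 800 MT/s

print(atlys_axi, atlys_dram, ml605_axi, ml605_dram)
```

Under these assumptions the DRAM is the narrower link on the Atlys, while on the ML605 the DRAM and the AXI4 bus have the same peak.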

Page 26:

Evaluation: Application

■ Array-sum: calculates the sum of an array
  ● Two CoRAM memories are used as a double buffer
  ● The SIMD width (= number of simultaneous operations) is varied to check its effect on memory bandwidth utilization
    • 4, 8, 16, 32, 64 (bytes)

[Figure: array-sum datapath. Labels: CoRAM Memory 0 (from DMA controller 0) and CoRAM Memory 1 (from DMA controller 1), each supplying D[0] to D[3]; a MUX; adders with partial sums s0 to s3; a second MUX and adder producing sum; output]
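The double-buffered array-sum can be modeled in a few lines of software. This is only a behavioural sketch under assumptions: the two Python lists stand in for CoRAM Memory 0 and 1, the refill of the idle buffer happens sequentially here (in hardware it would overlap with the summation), and `simd_lanes` counts words per step rather than bytes.

```python
def array_sum_double_buffered(data, block_words, simd_lanes):
    """Sum `data` block by block, alternating between two buffers."""
    buffers = [[], []]              # stand-ins for CoRAM Memory 0 and 1
    lane_sums = [0] * simd_lanes    # partial sums s0 .. s(n-1)
    blocks = [data[i:i + block_words] for i in range(0, len(data), block_words)]
    if blocks:
        buffers[0] = blocks[0]      # prefetch the first block
    for n in range(len(blocks)):
        current = buffers[n % 2]
        if n + 1 < len(blocks):
            # The "DMA" fills the other buffer while this one is consumed.
            buffers[(n + 1) % 2] = blocks[n + 1]
        for i, word in enumerate(current):
            lane_sums[i % simd_lanes] += word
    return sum(lane_sums)           # final reduction of the partial sums

total = array_sum_double_buffered(list(range(1000)), block_words=128, simd_lanes=4)
print(total)
```

Widening `simd_lanes` shortens the compute phase per block, which is why the SIMD width decides whether the kernel or the memory transfer becomes the bottleneck.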

Page 27:

Memory Bandwidth Utilization

■ Good bandwidth utilization
  ● Atlys: 85.5% (at 16-byte SIMD width)
  ● ML605: 84.9% (at 64-byte SIMD width)
■ Reasons for degradation
  ● Each DMA controller issues one transaction at a time (sequentially)
    • Memory latency therefore directly degrades performance

[Figure: two bar charts, Atlys (Spartan-6) and ML605 (Virtex-6), plotting bandwidth utilization (0 to 1) against SIMD size in bytes]
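Why a single outstanding transaction caps utilization can be seen with a deliberately simple model. This is an illustration, not a measurement or the authors' analysis: it assumes every transaction pays a fixed latency during which the bus is idle and then streams its data at full bus rate. All numbers in the example call are hypothetical.

```python
def utilization(burst_bytes, bus_bytes_per_cycle, latency_cycles):
    """Fraction of cycles spent moving data when transactions never overlap."""
    transfer_cycles = burst_bytes / bus_bytes_per_cycle
    return transfer_cycles / (transfer_cycles + latency_cycles)

# Hypothetical example: 4 KB transactions on a 16-byte bus, 40 idle cycles each.
print(round(utilization(4096, 16, 40), 3))
```

In this model, longer transactions or overlapping several outstanding requests would hide the latency term and push utilization toward 1.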

Page 28:

Conclusion and …

Page 29:

Conclusion

■ PyCoRAM: Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs
■ Future work
  ● Further evaluation on more realistic applications
  ● AXI4 slave feature for the control thread
  ● Tutorial slides ☺

[Figure: system stack. Labels: accelerator logic, CoRAM abstraction, standard IP-core, CPU core, standard on-chip interconnect, device-dependent interfaces. Annotation: "Automatically managed by EDK". Captions: "Portable application design with CoRAM" and "Cooperation with standard IP-cores"]

Page 30:

PyCoRAM and Pyverilog are publicly available!

■ PyCoRAM (0.7.0-public)
  ● https://github.com/shtaxxx/PyCoRAM
■ Pyverilog (0.6.0-public)
  ● https://github.com/shtaxxx/Pyverilog

Thanks!