from dataflow specifications to customised … · opencl fpga model intel fpga sdk for opencl: ......

24
Universidad Politécnica de Madrid (UPM) School of Telecommunications Systems and Engineering (ETSIST) Research Center on Software Technologies and Multimedia Systems (CITSEM) From Dataflow Specifications to Customised Reconfigurable Datapaths Using HLS: the OpenCL Case for FPGAs Rubén Salvador [Kindly hosted by INSA: KDesnos, MPelcat, JFNezan, DMenard, LMorin…] Dataflow Workshop Rennes, 12-14 December 2017

Upload: tranquynh

Post on 28-Apr-2018

269 views

Category:

Documents


2 download

TRANSCRIPT

Universidad Politécnica de Madrid (UPM)School of Telecommunications Systems and Engineering (ETSIST)

Research Center on Software Technologies and Multimedia Systems (CITSEM)

From Dataflow Specifications to Customised Reconfigurable Datapaths

Using HLS:the OpenCL Case for FPGAs

Rubén Salvador[Kindly hosted by INSA: KDesnos, MPelcat, JFNezan, DMenard, LMorin…]

Dataflow WorkshopRennes, 12-14 December 2017

Rubén Salvador From Dataflow to Customised FPGA Datapaths 2

Context

Rubén Salvador From Dataflow to Customised FPGA Datapaths 3

Context

Rubén Salvador From Dataflow to Customised FPGA Datapaths 4

OUTLINE

Motivation

OpenCL FPGA

Dataflow Mapping On Top Of OpenCL FPGA

Implementation Steps

Rubén Salvador From Dataflow to Customised FPGA Datapaths 5

OUTLINE

Motivation

OpenCL FPGA

Dataflow Mapping On Top Of OpenCL FPGA

Implementation Steps

Rubén Salvador From Dataflow to Customised FPGA Datapaths 6

Dataflow: a (naive) view from a newcomer

Dataflow

A flow of data… …moves… …is transformed…

between (2) points in space

along its way

…and sinks.

Spatial Computing

datapathhardware

FPGA

architecturepoint to point comms

on-chip memory

Rubén Salvador From Dataflow to Customised FPGA Datapaths 7

Customised FPGA-based datapaths for dataflow graphs

A

B

C

Dataflow FPGA HLS

System Level Integration

(SW) developers loveWide

CommunityEmbrace

Rubén Salvador From Dataflow to Customised FPGA Datapaths 8

Conquering Computing Community Embrace

OpenCL Dataflow

Pros

• Functionally portable• Wide community acceptance• Support for HLS• …

• Graph analysis & Guarantees• Schedulability, deadlocks, FIFO

sizing• Concurrent execution model• Comms interaction

Cons

• No dataflow (streaming) friendly• Global memory comms

• Compute accelerator model• Data offload (writes/reads)• Throughput oriented (vs latency)

• Niche domain• Most work for multi/manycore

What can dataflow bring to the OpenCL community?

Rubén Salvador From Dataflow to Customised FPGA Datapaths 9

OUTLINE

Motivation

OpenCL FPGA

Dataflow Mapping On Top Of OpenCL FPGA

Implementation Steps

Rubén Salvador From Dataflow to Customised FPGA Datapaths 10

OpenCL: framework for heterogeneous/parallel computing

Work Group (WG) Work Item (WI)

Data parallelism

Task parallelism

SIMD

Rubén Salvador From Dataflow to Customised FPGA Datapaths 11

OpenCL FPGA Model

Intel FPGA SDK for OpenCL: Best

Practices Guide

https://www.altera.com/en_US/pdfs/lit

erature/hb/opencl-sdk/aocl-best-

practices-guide.pdf

https://www.altera.com/products/design-

software/embedded-software-

developers/opencl/developer-zone.html

Rubén Salvador From Dataflow to Customised FPGA Datapaths 12

OUTLINE

Motivation

OpenCL FPGA

Dataflow Mapping On Top Of OpenCL FPGA

Implementation Steps

Rubén Salvador From Dataflow to Customised FPGA Datapaths 13

Dataflow On Top Of OpenCL: SoC FPGAs

Desired Features& Expected Gains

Hardware acceleration (custom datapath)

Reduced processor (communication) overhead

Reduced memory transactions

Self-timed Execution

Rubén Salvador From Dataflow to Customised FPGA Datapaths 14

Dataflow On Top Of OpenCL FPGA

Recent (2017) proposal: add dataflow semantics to OpenCL standard

OpenCL Khronos Group Standard

Tool Expertise & Design Space Exploration

Leverage OpenCL FPGA constructs to generate

efficient dataflow

Dataflow-driven “OpenCL” code generation

OpenCL Community

Dataflow Community

Rubén Salvador From Dataflow to Customised FPGA Datapaths 15

MoCs semantics for OpenCL Pipes

Synchronous Dataflow (SDF) Bulk Synchronous Parallel (BSP)

MoCs semantics to OpenCL

Proposal for the OpenCL Standard

Kapre, Nachiket, and Hiren Patel. Applying Models of Computation to

OpenCL Pipes for FPGA Computing. Proc. 5th IWOCL. ACM, 2017.

a.k.a.: compiler’s job

OpenCL compute model + MoC Comms Schemes

Rubén Salvador From Dataflow to Customised FPGA Datapaths 16

OpenCL Increasing Streaming Support

Pipes (OpenCL 2.0)

Standard OpenCL Kernel-to-Kernel communication

Channels (Intel FPGA)

Preferred Kernel-to-Kernel communication

Host-Kernel Pipes

… only prototype demo so far

Kang, K., and P. Yiannacouras. Host Pipes: Direct Streaming Interface

Between OpenCL Host and Kernel. Proc. 5th IWOCL. ACM, 2017.

Overlap multi-kernel operation

Self-triggered kernels (free run decoupled from host)

Rubén Salvador From Dataflow to Customised FPGA Datapaths 17

Kernel Operation Possibilities

Autorun kernels

No host-kernel communication logic

Autostart & Auto-restart

Communicate through channels

Intel FPGA SDK for OpenCL: Best Practices Guide

https://www.altera.com/en_US/pdfs/literature/hb/opencl-

sdk/aocl-best-practices-guide.pdf

Rubén Salvador From Dataflow to Customised FPGA Datapaths 18

Channels

Channels

Kernel execution decoupled from host

Blocking/Non-blocking Read/Write API

Synchronization mechanisms

I/O Channels -> Streaming DSP

Intel FPGA SDK for OpenCL: Programming

Guide

https://www.altera.com/content/dam/altera-

www/global/en_US/pdfs/literature/hb/opencl-

sdk/aocl_programming_guide.pdf

Rubén Salvador From Dataflow to Customised FPGA Datapaths 19

OUTLINE

Motivation

OpenCL FPGA

Dataflow Mapping On Top Of OpenCL FPGA

Implementation Steps

Rubén Salvador From Dataflow to Customised FPGA Datapaths 20

Mapping (Pi)SDF Dataflow Graphs To OpenCL Model

1.- Actor Firing Rules within OpenCL Kernels

Check overhead vs. performance

2.- Code Generatrion from PREESM

2.1.- Actor firing rules - scheduler

Acceptable?

2.2.- FIFO analysis & Buffer generation

2.3.- Memory Access Optimization

Kernel

K

A

B

C

Rubén Salvador From Dataflow to Customised FPGA Datapaths 21

Mapping (PiSDF) Dataflow Graphs To OpenCL Model

2.1.- Actor firing rules (scheduler)

Actor I/O IFs, firing rules, templates

Kernel

K

Enough with channels sync?

Borrow from CAPH ¿?• Host code:

• Platform initialization: automatic• Job management: automatic

• input data & result data• "only" necessary for the host/device frontier• pointers mapped to device buffers

• Kernel code:

• I/O interfaces (firing rules): automatic• Functionality: manual (provided by user)

Rubén Salvador From Dataflow to Customised FPGA Datapaths 22

Mapping (PiSDF) Dataflow Graphs To OpenCL Model

2.1.- Actor firing rules (scheduler)

Kernel

K

2.2.- Buffer generation

Leverage current PREESM buffer generation

Pipes vs Channels vs Ad-hoc Buffer

2.3.- Memory Accesses Optimization

Streaming Dataflow

Shared/Global Memory

Local FPGA DDRs (kernel only)Different workloads?

Out-of-order accesses?

Enough with channels sync?

Borrow from CAPH ¿?

Actor I/O IFs, firing rules, templates

Rubén Salvador From Dataflow to Customised FPGA Datapaths 23

future(future)

3.- Hack the Flow

Kernel (actor) functionality Component library

Host-Device communications

Area / Latency / Throughput

Device Wrapper

Plug HDL / CAPH

Compute upper bound?

Open Run Time ?¿

Dynamic Reconfiguration ?¿

Intel Xeon + FPGA

Graph Reconfiguration

HPC community

DSE: Predictability

DSE: Predictability

4.- New devices

Rubén Salvador From Dataflow to Customised FPGA Datapaths 24

Thanks for your attention!!

[email protected]

https://twitter.com/RubenSalvadorP

http://blogs.upm.es/rubensalvador/