Page 1: Solving the Challenges of Increasingly Complex FPGA ... · Title: The Xilinx All Programmable PowerPoint Template Author: Sergei Storojev Keywords: Public Created Date: 1/19/2015

© Copyright 2015 Xilinx .

Solving the Challenges of Increasingly Complex FPGA Designs Using a High-Level Synthesis Approach with Vivado HLS

Sergei Storojev, Tools and Design Methodology Applications, Xilinx

Agenda

• Vivado HLS - Key Technology in Xilinx Solution
• Vivado HLS Fundamental Concepts
• Case Study: Multichannel FIR Filter Architectural Exploration
• Demo

Vivado HLS - Key Technology in Xilinx Solution

Xilinx Technology Evolution

Programmable Logic Devices: Enable Programmable Logic

All Programmable Devices: Enable All Programmable & Smarter Systems

28nm

Xilinx Multi-Node Product Portfolio Offering

Increasing Performance & Integration

2M LC 4.4M LC

Vivado Design Suite Technology Advantages

Integrated Design Environment

Debug and Analysis

Shared Scalable Data Model: Scalable to 100M Gates

Accelerating System Integration: IP and System-centric Integration with Fast Verification

Accelerating Implementation: Fast, Hierarchical and Deterministic Closure Automation with ECO

Accelerates Algorithmic C to RTL IP integration

Comprehensive Integration with the Xilinx Design Environment

Algorithmic Specification: C, C++ or SystemC

Micro Architecture Exploration

RTL Implementation: VHDL or Verilog

System IP Integration

Rapid RTL architecture exploration via Directives

Co-optimization with RTL synthesis for optimal QoR

Generates AXI4-based IP for Vivado IP Integrator

Libraries:

• Arbitrary Precision
• Video, OpenCV
• Math
• Linear algebra
• DSP: FFT and FIR

Case Study                                    Traditional   C-based   Acceleration
Design Time: Radar Design                     60 days       5 days    12x
Verification Time: Video Design (10 frames)   ~2 days       10 sec    12,000x

Design Integration: IP Centric Design Flow

IP Catalog

C-based IP Creation

Libraries

Arbitrary Precision

Video, OpenCV

Math

Linear algebra

DSP: FFT and FIR

System Integration

Vivado IP Integrator

C/C++, SystemC, OpenCL

VHDL or Verilog

System Generator for DSP

Vivado RTL

All Programmable SoCs & Vivado HLS

SW Spec HW Spec

Requirements

Verify Iterate Verify Iterate

Accelerates Algorithmic C to Co-Processing Accelerator Integration

Vivado High-Level Synthesis Serves a Wide Range of Applications across Markets

Communications
• LTE MIMO receiver
• Advanced wireless antenna positioning

Audio, Video, Broadcast
• 3D cameras
• Video transport

Consumer
• 3D television
• eReaders

Aerospace and Defense
• Radar, Sonar
• Signals Intelligence

Industrial, Scientific, Medical
• Ultrasound systems
• Motor controllers

Automotive
• Infotainment
• Driver assistance

Computing & Storage
• High performance computing
• Database acceleration

Test & Measurement
• Communications instruments
• Semiconductor ATE

Accelerated Development for Hardware Engineers

“The combination of Vivado IPI and HLS has been invaluable to our development. The combination of these abstractions allowed us to develop our algorithms in C++ and rapidly integrate the resulting IP, saving greater than 15X in development costs versus an RTL approach.”

~Ties Bos, director of Software and FPGA at Gainspeed, Inc.

Vivado IPI and HLS

All Programmable Abstractions Programming FPGA/SoC at Software Abstraction Level

Platform Integrator

System Engineer

HW Designer

DSP SW Engineer

System Engineer

Faster time to differentiation & revenue

Libraries

State of the Art

Implementation Co-optimized with Architecture

HLS, System Generator, IP Integrator

Platform & IP Integration

HW/SW Interface (drivers…)

Vivado Design Suite

SDx™

SDAccel™, SDNet™,

MathWorks Zynq Design

C models / TLM

IP HLS IP

RTL IP

System Engineer

SW Engineer

Vivado HLS Fundamental Concepts

Vivado HLS

C-based code: abstract, no clocks!

?

IP: specific, timed

Accelerates Algorithmic C to RTL Creation

Vivado HLS – Synthesis

Accelerates Algorithmic C to RTL Creation

FSM Datapath

MUL

ADD

DIV

FPU

C, C++, SystemC, OpenCL C: abstract, untimed

Connectable IP / Verified RTL: target optimized, timed, connectivity ready

• Directives / Pragmas

• Constraints

• Libraries

• Arbitrary Precision

• Video

• Math

• Linear algebra

• IP: FFT and FIR

Coding style impacts hardware realization

Loops Optimization: Latency & Throughput Examples

Unrolling

void foo_top (...) {
  ...
  add: for (i=0;i<=3;i++) {
    b = a[i] + b;
  ...

Default: 4 cycles

Unroll: 1 cycle

Pipelining

void foo_top (...) {
  ...
  add: for (i=1;i<=2;i++) {
    op_READ;
    op_COMPUTE;
    op_WRITE;
  }
  ...

Without pipelining: loop latency = 6 cycles, throughput = 3 (a new iteration starts every 3 cycles)

With pipelining: loop latency = 4 cycles, throughput = 1 (a new iteration starts every cycle)
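The two loop optimizations can be sketched in C++ roughly as follows. This is an illustration under stated assumptions, not code from the slides: the function names `sum4_unrolled`, `scale_pipelined`, and `loop_latency` are hypothetical, and the `#pragma HLS` lines only take effect in Vivado HLS (an ordinary C++ compiler ignores them). The cycle model assumes each iteration takes `depth` cycles and a new iteration starts every `ii` cycles.

```cpp
#include <cassert>

// Sum four elements. UNROLL asks HLS to create all four additions at once
// instead of reusing one adder over four iterations.
int sum4_unrolled(const int a[4]) {
    int b = 0;
add:
    for (int i = 0; i <= 3; i++) {
#pragma HLS UNROLL
        b = a[i] + b;
    }
    return b;
}

// Read, compute, write in each iteration. PIPELINE II=1 asks HLS to start
// a new iteration on every clock cycle.
void scale_pipelined(const int in[2], int out[2]) {
    for (int i = 0; i < 2; i++) {
#pragma HLS PIPELINE II=1
        int x = in[i];      // op_READ
        int y = 3 * x + 1;  // op_COMPUTE (arbitrary example operation)
        out[i] = y;         // op_WRITE
    }
}

// Assumed cycle model: the last iteration starts at (trips - 1) * ii
// and needs `depth` more cycles to finish.
int loop_latency(int trips, int depth, int ii) {
    assert(trips > 0 && depth > 0 && ii > 0);
    return (trips - 1) * ii + depth;
}
```

For two iterations of three one-cycle operations, `loop_latency(2, 3, 3)` gives 6 and `loop_latency(2, 3, 1)` gives 4, which matches the latencies quoted on the slide.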

Arrays – Partitioning and Reshaping

Arrays are the fundamental construct to describe memories…

– Array accesses can often be performance bottlenecks

Partitioning splits an array into independent arrays

– Arrays can be partitioned on any of their dimensions

Example: ARRAY_PARTITION set to “cyclic”, factor of 2

Initial array: 0 1 2 … N-3 N-2 N-1

… becomes two arrays: 0 2 … N-2 and 1 … N-3 N-1

Reshaping combines array elements into wider containers

– Different arrays into a single physical memory

– New RTL memories are automatically generated without changes to C code
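As a rough sketch of the cyclic example, the code below models the index mapping in plain C++ and then shows how the same split might be requested with a directive. The names `cyclic_split` and `sum_pairs` and the length `N = 8` are assumptions made for illustration; the `#pragma HLS` lines are only meaningful to Vivado HLS.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t N = 8;  // hypothetical array length (even, for illustration)

// Software model of a cyclic partition with factor 2:
// element i lands in bank (i % 2) at offset (i / 2).
void cyclic_split(const int src[N], int even[N / 2], int odd[N / 2]) {
    for (std::size_t i = 0; i < N; i++) {
        if (i % 2 == 0) {
            even[i / 2] = src[i];
        } else {
            odd[i / 2] = src[i];
        }
    }
}

// In HLS the split is requested with a directive; the C code itself is unchanged.
int sum_pairs(const int data[N]) {
    int buf[N];
#pragma HLS ARRAY_PARTITION variable=buf cyclic factor=2 dim=1
    for (std::size_t i = 0; i < N; i++) {
        buf[i] = data[i];
    }
    int acc = 0;
    for (std::size_t i = 0; i < N; i += 2) {
#pragma HLS PIPELINE II=1
        // buf[i] and buf[i + 1] sit in different banks after partitioning,
        // so both reads can be served in the same cycle.
        acc += buf[i] * buf[i + 1];
    }
    return acc;
}
```

The partitioning matters here because a single memory offers a limited number of ports; with two banks, the even and odd elements no longer compete for the same port.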

Vivado HLS – Interfaces

Accelerates Algorithmic C to RTL Creation

C-based code: abstract, untimed

• Directives / Pragmas

• AXI

• FIFO Interface

• BRAM Interface

• Default HLS protocols

• Handshake Interface

Connectable IP: ready-to-use IP block
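A minimal sketch of how interface directives might be attached to a top-level function is shown below. The function `scale_stream`, its arguments, and the length 16 are hypothetical; `axis` and `s_axilite` are Vivado HLS interface modes for AXI4-Stream and AXI4-Lite, and other modes such as `ap_fifo`, `bram`, and `ap_hs` correspond to the FIFO, BRAM, and handshake options listed above.

```cpp
// Hypothetical top-level function: names and sizes are illustrative only.
void scale_stream(const int in[16], int out[16], int gain) {
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
#pragma HLS INTERFACE s_axilite port=gain
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < 16; i++) {
#pragma HLS PIPELINE II=1
        out[i] = gain * in[i];  // arrays are accessed strictly in order, as a stream requires
    }
}
```

Compiled with an ordinary C++ compiler the pragmas are ignored and the function behaves as plain C++, which is what allows the same source to be used for C simulation.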

Vivado HLS – Verification

Accelerates Algorithmic C to RTL Creation

C, C++ Testbench and C, C++, SystemC

C simulation

C / RTL Co-simulation

• GCC

• G++

• SystemC

Connectable IP / Verified RTL

• Xsim
• ISim
• Questa SIM
• VCS
• NCSim
• Riviera
• OSCI

Abstract, untimed testbench

Automatic generation of RTL testbench

Single testbench for C-Sim and RTL-Sim

Creating a self-checking test bench is highly recommended!
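A self-checking test bench could look roughly like the sketch below. Everything here is an assumption for illustration: `dut_accumulate` stands in for the HLS top-level function, `LEN` is an arbitrary length, and the golden values come from a closed-form sum rather than from any design in the slides.

```cpp
#include <cstdio>

const int LEN = 8;  // hypothetical vector length

// Stand-in for the design under test (the HLS top-level function).
void dut_accumulate(const int in[LEN], int out[LEN]) {
    int acc = 0;
    for (int i = 0; i < LEN; i++) {
        acc += in[i];
        out[i] = acc;  // running sum
    }
}

// Self-checking body: returns 0 when every output matches, else the mismatch count.
int run_selfcheck() {
    int in[LEN];
    int out[LEN];
    for (int i = 0; i < LEN; i++) {
        in[i] = i + 1;
    }
    dut_accumulate(in, out);

    int errors = 0;
    for (int i = 0; i < LEN; i++) {
        int expected = (i + 1) * (i + 2) / 2;  // 1 + 2 + ... + (i + 1)
        if (out[i] != expected) {
            std::printf("mismatch at %d: got %d, expected %d\n", i, out[i], expected);
            errors++;
        }
    }
    return errors;
}
```

The test bench's `main()` would simply return `run_selfcheck()`. Vivado HLS treats a nonzero return value from `main()` as a failure in both C simulation and C/RTL co-simulation, so the same check guards both runs.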

Design Methodology (UG902)

5-step process to improve your design

Case Study:

Multichannel FIR Filter Architectural Exploration

Goal: explore different architecture solutions

– Targeting ZYNQ 7045

– At 50 MSPS fixed data rate, 81 coefficient filter

– 32-bit integer or 32-bit Floating-Point (FP) I/O samples and coefficients

13 different architectures considered

– Fixed the 50 MSPS data rate, 3 different clock frequencies:

• 50, 150, 300 MHz with respectively II=1, II=3, II=6
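The pairing of clock frequency and initiation interval follows from the fixed sample rate: II is the number of clock cycles available per input sample. A small sketch of that arithmetic, with the hypothetical helper name `initiation_interval`:

```cpp
#include <cassert>

// Clock cycles available per input sample.
// Assumes the clock is an integer multiple of the sample rate.
int initiation_interval(int clock_mhz, int sample_rate_msps) {
    assert(sample_rate_msps > 0 && clock_mhz % sample_rate_msps == 0);
    return clock_mhz / sample_rate_msps;
}
```

At 50 MSPS this gives II=1 at 50 MHz, II=3 at 150 MHz, and II=6 at 300 MHz. A larger II leaves more cycles per sample in which hardware can be reused.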

FIR Filter

Odd-symmetric

25x18 bits integer math accuracy (II=3)

32x32 bits integer math accuracy (II=3, II=6)

32-bit floating point accuracy (II=6)

Asymmetric

25x18 bits integer math accuracy (II=1, II=3, II=6)

32x32 bits integer math accuracy (II=1, II=3, II=6)

32-bit floating point accuracy (II=1, II=3, II=6)

Systolic MAC FIR Filter
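The difference between the two filter families can be sketched as below. The slides do not show the FIR source, so this is only an illustration: it assumes that "odd-symmetric" means symmetric coefficients with an odd tap count, and the names `fir_asym` and `fir_sym` are hypothetical. Each function computes one output sample from a window of 81 input samples.

```cpp
const int TAPS = 81;  // coefficient count from the case study

// Asymmetric form: one multiplication per tap (81 per output sample).
long long fir_asym(const int x[TAPS], const int c[TAPS]) {
    long long acc = 0;
    for (int k = 0; k < TAPS; k++) {
        acc += (long long)c[k] * x[k];
    }
    return acc;
}

// Symmetric form, valid when c[k] == c[TAPS - 1 - k]: mirrored samples are
// added first, so 40 pairs plus the center tap need 41 multiplications.
long long fir_sym(const int x[TAPS], const int c[TAPS]) {
    long long acc = (long long)c[TAPS / 2] * x[TAPS / 2];  // center tap
    for (int k = 0; k < TAPS / 2; k++) {
        acc += (long long)c[k] * ((long long)x[k] + x[TAPS - 1 - k]);
    }
    return acc;
}
```

With symmetric coefficients both functions return the same value, but the folded form needs roughly half the multipliers, which is why the symmetric variants are attractive when DSP resources are tight.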

Overall Performance Summary

Architecture               II   MHz   BRAM   DSP48       FF      LUT
25x18 asym MAC FIR          1    50      0     243     6467     6124
25x18 asym MAC FIR          3   150      0      81    16778     9285
25x18 asym MAC FIR          6   300      0      41    19528    12560
25x18 odd-sym MAC FIR       3   150      0      41    12258     7890
32 float asym MAC FIR       1    50      0    1701   156066   297217
32 float asym MAC FIR       3   150      0     567    87177   130629
32 float asym MAC FIR       6   300      0     123    85451    91959
32 float odd-sym MAC FIR    6   300      0      63    62171    54835
32x32 asym MAC FIR          1    50      0     972    16037     8968
32x32 asym MAC FIR          3   150      0     324    27688    13506
32x32 asym MAC FIR          6   300      0     164    27529    18242
32x32 odd-sym MAC FIR       3   150      0     164    18304    10932
32x32 odd-sym MAC FIR       6   300      0      84    18642    13183
ZYNQ 7045 max resources               1090     900   437200   218600

Column groups: expected (II, MHz); Synthesis Estimation (BRAM, DSP48, FF, LUT)
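One way to read the summary is to compare each row with the ZYNQ 7045 limits. The sketch below hard-codes the figures from the table above; the type `Row` and the functions `fits` and `count_over_budget` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

struct Row {
    const char* name;
    int ii;
    int bram, dsp48, ff, lut;
};

// Synthesis estimates copied from the performance summary.
const Row ROWS[] = {
    {"25x18 asym",       1, 0,  243,   6467,   6124},
    {"25x18 asym",       3, 0,   81,  16778,   9285},
    {"25x18 asym",       6, 0,   41,  19528,  12560},
    {"25x18 odd-sym",    3, 0,   41,  12258,   7890},
    {"32 float asym",    1, 0, 1701, 156066, 297217},
    {"32 float asym",    3, 0,  567,  87177, 130629},
    {"32 float asym",    6, 0,  123,  85451,  91959},
    {"32 float odd-sym", 6, 0,   63,  62171,  54835},
    {"32x32 asym",       1, 0,  972,  16037,   8968},
    {"32x32 asym",       3, 0,  324,  27688,  13506},
    {"32x32 asym",       6, 0,  164,  27529,  18242},
    {"32x32 odd-sym",    3, 0,  164,  18304,  10932},
    {"32x32 odd-sym",    6, 0,   84,  18642,  13183},
};
const std::size_t NUM_ROWS = sizeof(ROWS) / sizeof(ROWS[0]);

// ZYNQ 7045 maximum resources from the last table row.
const int MAX_BRAM = 1090, MAX_DSP48 = 900, MAX_FF = 437200, MAX_LUT = 218600;

// True when every estimate is within the device limits.
bool fits(const Row& r) {
    return r.bram <= MAX_BRAM && r.dsp48 <= MAX_DSP48 &&
           r.ff <= MAX_FF && r.lut <= MAX_LUT;
}

// Number of configurations that exceed at least one limit.
std::size_t count_over_budget() {
    std::size_t n = 0;
    for (std::size_t i = 0; i < NUM_ROWS; i++) {
        if (!fits(ROWS[i])) n++;
    }
    return n;
}
```

By these estimates, two of the 13 configurations exceed the device: the II=1 floating-point and II=1 32x32 asymmetric filters, both on DSP48 count (the floating-point one also on LUTs).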

Effort to complete this exploration:

• RTL approach: days

• Vivado HLS: only 6 hours

– 4 hours: C coding

– 2 hours: Tcl scripting, HLS execution

Demo

Conclusion

Vivado HLS

• Key technology in Xilinx solution

• Reduces design time by >10x and verification time by >100x

• Enables rapid RTL architecture exploration for optimal QoR