

DEIS

University of Bologna

Multi processor systems with configurable hardware acceleration

Ph.D in Electronics, Computer Science and Telecommunications

Ph.D Student: Davide Rossi

Ph.D Tutor: Prof. Roberto Guerrieri

Outline

Motivations
  Electronic systems requirements and issues

The Morpheus Platform
  Heterogeneous multi-core
  Reconfigurable platform

The Manyac Platform
  Homogeneous and regular multi-core platform
  Configurable and reconfigurable acceleration

Results
  Programming productivity
  Performance (area, power)
  Impact on manufacturing costs

Motivations (1)

New generation embedded applications are pushing signal processing systems to improve:
  Performance
  Energy efficiency
  Flexibility
  Programmability
  Time to market

*source ITRS

Motivations (2)

Increase of product development costs (NRE):
  Design costs
    Front-end
    Implementation
    Verification
    Testing
    Software development
  Mask costs
    Significant impact on small volume products

*source SEMATECH

*source PHILIPS

Morpheus: Main Goals

Programming legacy through:
  ARM processor acting as system supervisor

Flexibility and performance gain through three heterogeneous reconfigurable processing cores:
  Fine grain fabric (Abound Logic Flexeos eFPGA)
  Medium grain fabric (STMicroelectronics DREAM)
  Coarse grain fabric (PACT XPP-III)

Programming productivity through:
  High level programming approaches for reconfigurable engines

Morpheus: Architecture

[Figure: Morpheus block diagram. Labelled blocks: ARM9, DREAM, XPP, eFPGA, PCM, DNA, Bridge, NoC, main memory, configuration memory, external memory controller, AMBA main bus, AMBA configuration bus]

ARM core
  Standard peripheral set

3 communication domains
  Synchronization and control: main bus (AHB)
  Data transfers: 8-node 64-bit NoC (STNoC)
  Configuration: configuration bus (AHB)

Hardware services:
  Predictable Configuration Manager (PCM)
  Direct Network Accesses (DNA)

Dynamic Frequency Scaling over 4 domains

Morpheus: Reconfigurable engines

Encapsulated into three independent clock islands

Local buffers act as domain crossing mechanism (DPDC memories)

PACT XPP
  Coarse grain device (16-bit)
  Streaming applications with regular computation patterns
  Programming: NML (Native Mapping Language)

DREAM
  Medium grain, computation-intensive device (4-bit)
  Iterative applications with complex addressing patterns
  Programming: Griffy-C

eFPGA
  Fine grain device (1-bit LUT)
  Applications handling bit manipulations, configurable I/O
  Programming: VHDL

Morpheus: Chip description and Measurements

[Figure: chip micrograph. Labelled blocks: PACT XPP, DREAM, eFPGA, PCM, ARM, memories]

Technology: CMOS090GP
Supply voltage: 1 V
Transistor count: 97 M
Chip area: 110 mm2
Static power: 235 mW
Max frequency: 250 MHz
Peak power: 3 W

Subsystem          Max freq @ 1 V    Dynamic power
ARM domain         250 MHz           2.4 mW/MHz
XPP subsystem      150 MHz           7.5 mW/MHz
DREAM subsystem    200 MHz           2.1 mW/MHz
eFPGA subsystem    100 MHz           0.8 mW/MHz
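The per-domain figures above can be combined into a rough power estimate. The sketch below multiplies each domain's dynamic power coefficient by its clock frequency and adds the static power. It assumes that dynamic power scales linearly with frequency and that the four listed domains account for all dynamic power; both are simplifications, and the function names are illustrative only.

```python
# Figures copied from the chip measurements: (max frequency in MHz, mW per MHz).
SUBSYSTEMS = {
    "ARM": (250, 2.4),
    "XPP": (150, 7.5),
    "DREAM": (200, 2.1),
    "eFPGA": (100, 0.8),
}
STATIC_POWER_MW = 235


def dynamic_power_mw(freq_mhz, mw_per_mhz):
    """Dynamic power of one clock domain, assuming linear scaling with frequency."""
    return freq_mhz * mw_per_mhz


def total_power_mw(frequencies_mhz):
    """Static power plus the dynamic power of every domain at the given clocks."""
    dynamic = sum(
        dynamic_power_mw(freq, SUBSYSTEMS[name][1])
        for name, freq in frequencies_mhz.items()
    )
    return STATIC_POWER_MW + dynamic


# All domains at their maximum frequency.
max_freqs = {name: freq for name, (freq, _) in SUBSYSTEMS.items()}
at_max = total_power_mw(max_freqs)

# Same workload with the XPP clock halved (hypothetical operating point).
xpp_scaled = total_power_mw({**max_freqs, "XPP": 75})

print(f"all domains at max frequency: {at_max:.1f} mW")
print(f"XPP scaled to 75 MHz:         {xpp_scaled:.1f} mW")
```

Under these assumptions the estimate with every domain at maximum frequency is about 2.46 W, which stays below the 3 W peak power quoted for the chip.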

Manyac: Main Goals

Flexibility and programmability through:
  Multi-processor approach

Performance gain through:
  Application specific hardware accelerators

Programming/design productivity through:
  High level programming approach based on OpenCL
  Automatic synthesis of accelerators from high-level language (Griffy-C)

Reduction of costs through:
  Platform-based design approach
  Regular replication of identical tiles
  Regular silicon structures for implementation of accelerators

Manyac: Architecture

Regular replication of identical computational tiles + one I/O tile

Communication: ring topology NoC (STNoC)

Memory infrastructure with 3 hierarchy levels:
  Private memory
  Local memory
  Global memory

Hardware synchronization

Hardware accelerators
  Regular gate arrays

The architectural parameters are configurable at design time

Manyac: Configurable Hardware Accelerators (STMicroelectronics)

Pipelined datapaths targeting three kinds of configurable gate array:

Run-time programmable gate array
  Routing and functionalities are programmed through SRAMs
  Post-fabrication programmability

Via-programmable gate array
  Routing and functionalities are programmed through one via layer
  Customization: 1 metal layer

Metal-programmable gate array
  Functionalities are mapped on a library of metal programmable cells
  Customization: 9 metal layers

[Figures: customizations through vias; customizations through metals]

Manyac: Programming Model

Based on OpenCL
  Sequential code executes on a host processor
  Parallel and hardware accelerated code executes on the parallel device

Two programming models
  Data parallel (homogeneous)
  Task parallel (heterogeneous)

Hardware accelerated functions are encapsulated within parallel kernels and tasks
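The two programming models can be illustrated with a small host-side emulation. This is a sketch in plain Python, not OpenCL and not the Manyac toolchain: a thread pool stands in for the parallel device, and `accelerated_abs_diff`, `abs_diff_kernel`, `run_data_parallel` and `run_task_parallel` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def accelerated_abs_diff(a, b):
    """Stand-in for a hardware accelerated function (hypothetical)."""
    return abs(a - b)


def abs_diff_kernel(global_id, frame, reference, out):
    """Data-parallel kernel body: each work-item handles one element."""
    out[global_id] = accelerated_abs_diff(frame[global_id], reference[global_id])


def run_data_parallel(kernel, global_size, *args):
    """Host side: run the same kernel once per work-item index."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda gid: kernel(gid, *args), range(global_size)))


def run_task_parallel(tasks):
    """Host side: run different tasks concurrently, results in submission order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tasks]
        return [future.result() for future in futures]


# Sequential host code prepares the data.
frame = [10, 20, 30, 40]
reference = [12, 18, 30, 45]
diff = [0] * len(frame)

# Data parallel (homogeneous): one kernel, many work-items.
run_data_parallel(abs_diff_kernel, len(frame), frame, reference, diff)

# Task parallel (heterogeneous): different functions run side by side.
results = run_task_parallel([(sum, (diff,)), (max, (diff,))])

print(diff)     # [2, 2, 0, 5]
print(results)  # [9, 5]
```

In the data-parallel call every work-item wraps the same accelerated function, while in the task-parallel call each task could wrap a different one.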

Manyac: Design environment

OpenCL compiler
  Allocates functions and variables according to OpenCL qualifiers
  Generates host and device code

TLM simulation platform
  High level exploration of architectural parameters

RTL platform
  Cycle-accurate simulation platform
  Entry point for physical implementation

Griffy environment
  Accelerators design, simulation models and implementation

Manyac: Implementation

Technology: CMOS40LP, 1.1 V

Configuration technology: metal programmable

CT area (post layout): 0.8 mm2

Metal programmable area (targeting motion detection application, post layout): 0.2 mm2

4-tile cluster area (post synthesis): ~5 mm2

Max frequency (post layout): 250 MHz (wc, 125°C, 1.0 V)

Power consumption: 45 mW @ 250 MHz (nc, 25°C, 1.1 V)

[Figures: computational tile area breakdown by logic entity; computational tile layout]

Results: analysis of programming productivity

Programming effort required to implement signal processing applications on different computational platforms

Objective: evaluate programming productivity improvement due to high level approaches

Efforts are estimated according to programming language tables (*SPR) based on Function Point Analysis (FPA), extended to the VHDL language

Griffy-C and NML treated as ASM

Reduction of design effort with respect to VHDL: 1.3x to 2x

Language    Average source statements per FP    Average productivity per staff month
C           128                                 9 FP
ASM         213                                 5 FP
VHDL        19                                  18 FP
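One plausible reading of how the table is used: a source statement count is converted to function points with the "statements per FP" column, and function points are converted to staff months with the productivity column. The sketch below follows that reading. The statement counts are made-up placeholders rather than measurements from this work, and `estimate_effort` is a hypothetical helper name.

```python
# Values copied from the table above.
LANGUAGE_TABLE = {
    "C": {"statements_per_fp": 128, "fp_per_staff_month": 9},
    "ASM": {"statements_per_fp": 213, "fp_per_staff_month": 5},
    "VHDL": {"statements_per_fp": 19, "fp_per_staff_month": 18},
}

# Griffy-C and NML are treated as ASM, as stated in the slide.
ALIASES = {"Griffy-C": "ASM", "NML": "ASM"}


def estimate_effort(source_statements, language):
    """Return the estimated effort in staff months for a given code size."""
    row = LANGUAGE_TABLE[ALIASES.get(language, language)]
    function_points = source_statements / row["statements_per_fp"]
    return function_points / row["fp_per_staff_month"]


# Placeholder statement counts, chosen only to keep the arithmetic readable.
vhdl_months = estimate_effort(3420, "VHDL")
griffy_months = estimate_effort(6390, "Griffy-C")

print(f"VHDL:     {vhdl_months:.1f} staff months")
print(f"Griffy-C: {griffy_months:.1f} staff months")
```

Comparing the two estimates for implementations of the same function gives an effort ratio of the kind reported in the slide.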

Results: Morpheus performance

[Figures: performance (GOPS); energy efficiency (GOPS/W)]

Application fields selected for characterization:
  Image processing (edge detection, binarization, RGB2YUV)
  Video processing (motion estimation, motion compensation)
  Telecommunications (CRC, AES, Ethernet)

Performance (measured): 1.6 to 15 GOPS

Energy efficiency (measured): 2.7 to 52.9 GOPS/W

Reduction of dynamic power due to frequency scaling: 1.5x to 5.5x

Results: Manyac performance by configuration technology

Technology node: CMOS65LP

8-core platform

All figures are estimated

Std-cell based accelerators:
  Performance: 5.5 to 25 GOPS
  Energy efficiency: 24 to 113 GOPS/W
  Area efficiency: 0.6 to 3 GOPS/mm2

Metal programmable accelerators: overhead is negligible

Via programmable accelerators
  Performance overhead: 1.25x
  Energy efficiency overhead: 2.9x
  Area efficiency overhead: 4.7x

Run-time programmable accelerators
  Performance overhead: 1.25x
  Energy efficiency overhead: 3.7x
  Area efficiency overhead: 10x
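The overhead factors can be applied to the std-cell ranges to get a feel for the absolute figures of each configuration technology. The sketch below assumes that an overhead of N means the std-cell figure is divided by N, and that the factor applies uniformly across the whole range; the slide does not state either point explicitly. The dictionary keys and the `derate` helper are illustrative names.

```python
# Std-cell baseline ranges (min, max), copied from the slide.
STD_CELL = {
    "performance_gops": (5.5, 25.0),
    "energy_eff_gops_per_w": (24.0, 113.0),
    "area_eff_gops_per_mm2": (0.6, 3.0),
}

# Overhead factors relative to std-cell, copied from the slide.
OVERHEADS = {
    "via programmable": {
        "performance_gops": 1.25,
        "energy_eff_gops_per_w": 2.9,
        "area_eff_gops_per_mm2": 4.7,
    },
    "run-time programmable": {
        "performance_gops": 1.25,
        "energy_eff_gops_per_w": 3.7,
        "area_eff_gops_per_mm2": 10.0,
    },
}


def derate(baseline, overhead):
    """Divide each baseline range by its overhead factor (assumed interpretation)."""
    return {
        metric: (low / overhead[metric], high / overhead[metric])
        for metric, (low, high) in baseline.items()
    }


via = derate(STD_CELL, OVERHEADS["via programmable"])
run_time = derate(STD_CELL, OVERHEADS["run-time programmable"])

for name, figures in (("via programmable", via), ("run-time programmable", run_time)):
    for metric, (low, high) in figures.items():
        print(f"{name:22s} {metric:24s} {low:6.2f} to {high:6.2f}")
```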

Results: Manyac manufacturing costs by configuration technology

[Figures: manufacturing cost per configuration technology; technology nodes trends]

Assumptions:
  Technology node: CMOS65LP
  5 customizations (or re-spins) of the same platform

Run-time programmable and via programmable technologies are convenient only for very low market volumes
  Run-time programmable: <5K pieces
  Via programmable: 5K to 12K pieces

Metal programmable technology is convenient for larger market volumes

Perspectives:
  As technology nodes scale, reconfigurable technologies are becoming even more convenient
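The trade-off behind these volume thresholds can be sketched with a simple linear cost model: each technology pays a one-off cost per customization plus a per-piece cost. Every number below is a hypothetical placeholder in arbitrary cost units, picked only so that the crossovers fall at the 5K and 12K pieces quoted above; none of them is taken from the underlying study, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technology:
    name: str
    nre_per_customization: float  # hypothetical, arbitrary cost units
    unit_cost: float              # hypothetical, arbitrary cost units per piece


def total_cost(tech, volume, customizations=5):
    """One-off cost for all customizations plus the cost of the produced pieces."""
    return tech.nre_per_customization * customizations + tech.unit_cost * volume


def cheapest(techs, volume, customizations=5):
    """Name of the technology with the lowest total cost at this volume."""
    return min(techs, key=lambda t: total_cost(t, volume, customizations)).name


# Placeholder parameters: cheaper customization goes with a costlier piece.
TECHS = [
    Technology("run-time programmable", 0.0, 30.0),
    Technology("via programmable", 10_000.0, 20.0),
    Technology("metal programmable", 34_000.0, 10.0),
]

for volume in (2_000, 8_000, 20_000):
    print(f"{volume:6d} pieces: {cheapest(TECHS, volume)}")
```

The model captures only the qualitative shape of the comparison: low customization cost wins at small volumes, low per-piece cost wins at large ones.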

Conclusion

Two multi-core platforms with configurable/reconfigurable acceleration have been presented:
  The Morpheus platform (heterogeneous, reconfigurable)
  The Manyac platform (homogeneous, configurable)

Improvement of design/programming productivity due to high-level approaches: 1.3x to 2x

Multi-processor systems with accelerators implemented on reconfigurable and structured ASIC technologies are able to provide high performance, while still showing some overhead in terms of power and area with respect to the traditional standard-cell based approach.

The proposed approaches provide an effective way to reduce manufacturing costs, especially for low volume products.

Collaborations

The PhD is in collaboration with STMicroelectronics

Collaborations within 2 European projects:
  MORPHEUS (FP6)
  MODERN (ENIAC)

Publications

Book chapters:

N. Voros et al., “Dynamic System Reconfiguration in Heterogeneous Platforms”, Chapter 5: “The DREAM Digital Signal Processor”, Chapter 8: “The MORPHEUS Data Communication and Storage Infrastructure”, Springer, 2009.

Conference Papers:

D. Rossi et al., “A Heterogeneous Digital Signal Processor Implementation for Dynamically Reconfigurable Computing”, Custom Integrated Circuits Conference (CICC), 2009.

D. Rossi et al., “A Multi-Core Signal Processor for Heterogeneous Reconfigurable Computing”, International Symposium on System-on-Chip, Proceedings, 2009.

F. Campi et al., “RTL-to-Layout Implementation of an Embedded Coarse Grained Architecture for Dynamically Reconfigurable Computing in Systems-on-Chip”, Proceedings, 2009.

Journal Papers:

D. Rossi et al., “A Heterogeneous Digital Signal Processor for Dynamically Reconfigurable Computing”, IEEE Journal of Solid-State Circuits (JSSC), 2010.

D. Rossi, C. Mucci, F. Campi, S. Spolzino, L. Vanzolini, H. Sahlbach, S. Whitty, R. Ernst, W. Putzke-Röming, and R. Guerrieri, “Application Space Exploration of a Heterogeneous Run Time Configurable Digital Signal Processor”, IEEE Transactions on Very Large Scale Integration (TVLSI) Systems, 2012.

Thanks for your attention