PULP Project Update

ORCONF 2018, Gdansk, Poland. 21.09.2018

Davide Rossi (1), [email protected]

Antonio Pullini (2), Davide Schiavone (2), Francesco Conti (1), Florian Gasler (1), Florian Zaruba (2), Stefan Mach (2), Germain Haugou (2), Manuele Rusci (1), Alessandro Capotondi (1), Giuseppe Tagliavini (1), Daniele Palossi (2), Andrea Marongiu (1,2), Fabio Montagna (1), Victor Javier Kartsch Morinigo (1), Simone Benatti (1), Eric Flamand (2), Frank K. Gürkaynak (2), Luca Benini (1,2)

(1) Department of Electrical, Electronic and Information Engineering
(2) Integrated Systems Laboratory



Page 2: Computing for the Internet of Things

Battery + harvesting powered → a few mW power envelope

Sense: 100 µW ÷ 2 mW
§ MEMS IMU
§ MEMS Microphone
§ ULP Imager
§ EMG/ECG/EIT

Analyze and Classify: 1 ÷ 10 mW
§ µController (e.g. Cortex-M)
§ IOs

Transmit: Idle ~1 µW, Active ~50 mW
§ Short range, medium BW
§ Long range, low BW
§ Low rate (periodic) data
§ SW update, commands

Page 3: Which Computing Platform?

SENSOR BANDWIDTH    COMPUTATIONAL DEMAND    COMPUTING PLATFORM
~1 bps              <1 MOPS                 e.g. Cortex-M0
~1 Kbps             ~10 MOPS                e.g. Cortex-M3
~100 Kbps           ~100 MOPS               e.g. Cortex-M4/M4F
~1000 Kbps          ~1000 MOPS              ?

Page 4: PULP: Parallel Computing For The IoT

The output of our research on PULP is open source (Solderpad License)

§ Compiler Infrastructure
§ Processor & Hardware IPs
§ Virtualization Layer
§ Programming Model
§ Low-Power Silicon Technology

Page 5: PULP Open-Source Releases and External Contributions

Releases
§ February 2016: First release of PULPino, our single-core microcontroller
§ May 2016: Toolchain and compiler for our RISC-V implementation (RI5CY), DSP extensions
§ August 2017: PULPino updates, new cores Zero-riscy and Micro-riscy, FPU, toolchain updates
§ February 2018: PULPissimo, ARIANE
§ March 2018: Open PULP, SDK, VP
§ September 2018: Multi-Cluster PULP: Hero

Community Contributions
§ June 2017: Porting of Verilator and BEEBS benchmarks to PULPino
  https://github.com/embecosm/ri5cy
§ September 2017: Porting of ARM CMSIS to PULPino
  https://github.com/misaleh/CMSIS-DSP-PULPino
§ November 2017: Numerous bug fixes to RI5CY in PULPino
  https://github.com/pulp-platform/riscv
§ December 2017: STING: Open-Source Verification Environment for PULPino
  http://valtrix.in/programming/running-sting-on-pulpino

Page 6: The PULP family explained

§ RISC-V Cores
§ Peripherals
§ Interconnect
§ Accelerators
§ Platforms

Page 7: RISC-V Cores

We have developed several optimized RISC-V cores

§ RI5CY (32b)
§ Micro-riscy (32b)
§ Zero-riscy (32b)
§ Ariane (64b)

Page 8: Our RISC-V cores

32 bit
§ Low Cost Core (comparable ARM core: Cortex-M0+)
  § Zero-riscy: RV32-ICM
  § Micro-riscy: RV32-EC
§ Core with DSP enhancements (comparable ARM core: Cortex-M4)
  § RI5CY: RV32-ICMX
  § SIMD
  § HW loops
  § Bit manipulation
  § Fixed point
§ Floating-point capable Core (comparable ARM core: Cortex-M4F)
  § RI5CY+FPU: RV32-ICMFX

64 bit
§ Linux capable Core (comparable ARM core: Cortex-A55)
  § Ariane: RV64-ICM

Page 9: RI5CY – Our workhorse 32-bit core

§ 4-stage pipeline, optimized for energy efficiency
§ 40 kGE, 30 logic levels, 3.19 Coremark/MHz
§ Includes various extensions (X) to RISC-V for DSP applications

Page 10: Zero/Micro-riscy, small area core for control applications

§ Only 2-stage pipeline, simplified register file
§ Zero-Riscy (RV32-ICM), 19 kGE, 2.44 Coremark/MHz
§ Micro-Riscy (RV32-EC), 12 kGE, 0.91 Coremark/MHz
§ Used as SoC level controller in newer PULP systems
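One way to read the area and benchmark figures quoted for the three 32-bit cores is to normalize Coremark/MHz by gate count. The sketch below does only that arithmetic on the slide numbers (RI5CY: 40 kGE, 3.19; Zero-riscy: 19 kGE, 2.44; Micro-riscy: 12 kGE, 0.91). The table layout and the names `coremark_per_kge` and `rank_by_area_efficiency` are illustrative and not part of any PULP tooling, and the ratio says nothing about frequency, power or ISA coverage.

```python
# Figures as quoted on the slides: (area in kGE, Coremark/MHz).
CORES = {
    "RI5CY": (40.0, 3.19),
    "Zero-riscy": (19.0, 2.44),
    "Micro-riscy": (12.0, 0.91),
}


def coremark_per_kge(area_kge, coremark_per_mhz):
    """Benchmark score per MHz, normalized by core area in kGE."""
    if area_kge <= 0:
        raise ValueError("area must be positive")
    return coremark_per_mhz / area_kge


def rank_by_area_efficiency(cores):
    """Return (name, ratio) pairs sorted from highest to lowest ratio."""
    ratios = {name: coremark_per_kge(*vals) for name, vals in cores.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_by_area_efficiency(CORES):
        print(f"{name:12s} {ratio:.3f} Coremark/MHz per kGE")
```

On these numbers Zero-riscy has the highest score per kGE, with RI5CY and Micro-riscy close to each other below it.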

Page 11: ARIANE: Our Linux Capable 64-bit core

§ Tuned for high frequency, 6 stage pipeline, integrated cache
§ In order issue, out-of-order write-back, in-order-commit
§ Supports privilege spec 1.11, M, S and U modes
§ Hardware Page Table Walker
§ Implemented in GF22nm (Poseidon), and UMC65 (Scarabaeus)
§ In 22nm: 910 MHz worst case conditions (SSG, 125/-40C, 0.72V)
§ 8-way 32 kByte Data cache and 4-way 32 kByte Instruction Cache
§ Core area: 175 kGE

[Figure: pie chart of core area by block (PC Gen, IF, ID, Issue, Ex, Reg File, CSR); slice values 7%, 8%, 3%, 21%, 44%, 9%, 8%]
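The cache figures quoted for Ariane fix the capacity of each way directly: total size divided by associativity. The sketch below does that division and, for a line size the slides do not give, derives the number of sets. It assumes "32 kByte" means 32 × 1024 bytes, and `ASSUMED_LINE_BYTES = 16` is purely a placeholder, not a documented Ariane parameter.

```python
KIB = 1024

# Placeholder only: the slides do not state the cache line size.
ASSUMED_LINE_BYTES = 16


def way_bytes(total_bytes, ways):
    """Capacity of a single way of a set-associative cache."""
    if ways <= 0 or total_bytes % ways != 0:
        raise ValueError("total size must be a positive multiple of ways")
    return total_bytes // ways


def num_sets(total_bytes, ways, line_bytes):
    """Number of sets: one cache line per way in each set."""
    per_way = way_bytes(total_bytes, ways)
    if line_bytes <= 0 or per_way % line_bytes != 0:
        raise ValueError("way size must be a positive multiple of line size")
    return per_way // line_bytes


if __name__ == "__main__":
    for label, ways in (("data", 8), ("instruction", 4)):
        size = 32 * KIB
        print(label, way_bytes(size, ways), "bytes per way,",
              num_sets(size, ways, ASSUMED_LINE_BYTES), "sets (assumed line size)")
```

With the slide figures this gives 4 KiB per way for the data cache and 8 KiB per way for the instruction cache; the set counts change with whatever line size is actually used.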

Page 12: Ariane mapped to FPGA boots Linux

§ Full FPGA implementation
  § Xilinx Virtex-7 – VC707
  § Core: 50 – 100 MHz
  § Core: 15 kLUTs
  § 1 GB DDR3
§ Mapping to simpler boards underway

Page 13: Processing cores alone are not enough, we need more

RISC-V Cores
§ RI5CY (32b)
§ Micro-riscy (32b)
§ Zero-riscy (32b)
§ Ariane (64b)

Peripherals
§ DMA, GPIO, I2S, UART, SPI, JTAG

Interconnect
§ AXI4 – Interconnect
§ APB – Peripheral Bus
§ Logarithmic interconnect

Accelerators
§ Neurostream (ML)
§ HWCrypt (crypto)
§ PULPO (1st order opt)
§ HWCE (convolution)

Page 14: The PULP platforms put everything together

RISC-V Cores: RI5CY (32b), Micro-riscy (32b), Zero-riscy (32b), Ariane (64b)
Peripherals: DMA, GPIO, I2S, UART, SPI, JTAG
Interconnect: AXI4 – Interconnect, APB – Peripheral Bus, Logarithmic interconnect
Accelerators: Neurostream (ML), HWCrypt (crypto), PULPO (1st order opt), HWCE (convolution)

Platforms
§ Single Core
  § PULPino
  § PULPissimo
  § Kerbin

[Diagram: single-core platform with R5, M, I/O and A blocks around an interconnect]

Page 15: PULPino, our first single core platform

§ Simple design
  § Meant as a quick release
§ Separate Data and Instruction memory
  § Makes it easy in HW
  § Not meant as a Harvard arch.
§ Can be configured to work with all our 32bit cores
  § RI5CY, Zero/Micro-Riscy
§ Peripherals copied from its larger brothers
  § Any AXI and APB peripherals could be used

[Diagram: PULPino block diagram with RISC-V core, Data Mem, Inst Mem, I$, AXI interconnect, APB interconnect, Bus Adapt, GPIO, SPI M, SPI S, UART, I2C, Boot ROM]

Page 16: PULPissimo, the improved single core platform

§ Shared memory
  § Unified Data/Instruction Memory
  § Uses the multi-core infrastructure
§ uDMA for I/O subsystem
  § Can copy data directly from I/O to memory without involving the core
§ Support for Accelerators
  § Direct shared memory access
  § Programmed through APB bus
  § Number of TCDM access ports determines max. throughput
§ Used as a fabric controller in larger PULP systems

[Diagram: PULPissimo block diagram with RI5CY (instr/data ports), Ibuf / I$, Event Unit, Tightly Coupled Data Memory Interconnect, six Mem Banks, uDMA, APB / Peripheral Interconnect, Clock / Reset Generator, FLLs, Debug Unit, JTAG, I/O intfs (UART, SPI, I2S, I2C, SDIO, CPI), Hardware Accelerator, Ext]

Page 17: Kerbin, the single core support structure for Ariane

§ Still work in progress
§ Current version is very simple
§ Useful for in-house testing
§ A more advanced version will likely be developed soon.

[Diagram: Kerbin block diagram with Coreplex (Ariane, I$, D$), AXI interconnect, APB interconnect, Bus Adapt, AXI2Per, Timer, Debug, FLL, GPIO, SPI M, SPI S, UART, I2C, Boot ROM]

Page 18: PULP in the IoT domain: Latest version Mr. Wolf

Page 19: Mr. Wolf will (hopefully) solve problems

§ TSMC 40LP, 9 mm²
§ 64 pin QFN package
§ Cluster with
  § 8x RI5CY
  § 2x shared FPUs
  § 64 kByte TCDM
§ SoC domain with
  § micro-riscy core to control power modes
  § DC-DC converter (Dolphin) to regulate power
  § 512 kBytes L2

Page 20: GAP8: PULP-Based Industry IoT processor

§ GAP-8 processor:
  § 8-core PULP system
  § Neural network based pattern matching engine
  § Up to 8 GOPS, 300 MOPS for 1 mW
§ Commercialized by GreenWaves Technologies, a French startup
§ Developed in collaboration with University of Bologna and ETH Zurich
§ GAPUINO evaluation boards available: www.greenwaves-technologies.com

Applications
§ Image, sound and vibrations pattern matching at unmatched power efficiency for the smart IoT
§ GreenOFDM: Disruptive low power OFDM wireless communication for the smart IoT
  § SW Defined Modem for any IoT wireless communication
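Taking the GAP8 figure quoted above (300 MOPS for 1 mW) at face value, the implied efficiency and energy per operation follow from a single division. The sketch below only restates that arithmetic; the function names are illustrative, and what counts as one "operation" is not defined on the slide. The 8 GOPS peak is not paired with a power figure there, so it is left out.

```python
def gops_per_watt(ops_per_second, power_watts):
    """Throughput per unit power, in giga-operations per second per watt."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return ops_per_second / power_watts / 1e9


def picojoules_per_op(ops_per_second, power_watts):
    """Average energy spent per operation, in picojoules."""
    if ops_per_second <= 0:
        raise ValueError("throughput must be positive")
    return power_watts / ops_per_second * 1e12


if __name__ == "__main__":
    ops = 300e6      # 300 MOPS, as quoted on the slide
    power = 1e-3     # 1 mW, as quoted on the slide
    print(f"{gops_per_watt(ops, power):.0f} GOPS/W")
    print(f"{picojoules_per_op(ops, power):.2f} pJ per operation")
```

That works out to about 300 GOPS/W, or roughly 3.3 pJ per operation.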

Page 21: Finally, for HPC applications we have multi-cluster systems

RISC-V Cores: RI5CY (32b), Micro-riscy (32b), Zero-riscy (32b), Ariane (64b)
Peripherals: DMA, GPIO, I2S, UART, SPI, JTAG
Interconnect: AXI4 – Interconnect, APB – Peripheral Bus, Logarithmic interconnect
Accelerators: Neurostream (ML), HWCrypt (crypto), PULPO (1st order opt), HWCE (convolution)

Platforms, from IoT to HPC
§ Single Core
  § PULPino
  § PULPissimo
§ Multi-core
  § Fulmine
  § Mr. Wolf
§ Multi-cluster
  § Hero

[Diagrams: single-core platform (R5, M, I/O, A, interconnect); multi-core cluster (several R5 cores and M banks on a cluster interconnect, with I/O); multi-cluster system (four clusters, each with R5 cores, M banks and a cluster interconnect, joined by a top-level interconnect with M and I/O)]

Page 22: Silicon and Open Hardware fuel PULP success

§ Many companies (we know of) are actively using PULP
§ They value that it is silicon proven
§ They like that it uses a permissive open source license

Companies that are using/evaluating PULP
§ GreenWaves Technologies
§ Dolphin
§ IQ Analog (14nm chips)
§ Embecosm
§ lowRISC
§ Mentor Graphics
§ Cadence Design Systems
§ ST Microelectronics (IT,F)
§ Micron
§ Google
§ IBM
§ NXP
§ SIAE Microelectronica
§ Advanced Circuit Pursuit
§ Shanghai Xidian Technology
§ SCS Zurich
§ IMT technologies
§ Microsemi
§ Arduino
§ RacyICs

[Side labels on the slide: Companies with announced products, business; Companies that use PULP internally or for training; Companies exploring opportunities]

Research Centers/Universities using PULP
§ Stanford
§ Cambridge
§ UCLA
§ CEA/LETI
§ EPFL
§ National Chiao Tung University
§ Politecnico di Milano
§ Politecnico di Torino
§ Universita Roma I
§ Instituto Superior Tecnico – U. de Lisboa
§ Fondazione Bruno Kessler
§ Zagreb FER
§ Universita di Genova
§ Istanbul Technical U.
§ RWTH Aachen
§ Lund
§ USI – Lugano
§ Bar-Ilan
§ TU-Kaiserslautern
§ TU-Graz
§ UC San Diego
§ CSEM
§ IBM Research

Page 23: Thanks for your attention!!!

www.pulp-platform.org

pulp.ethz.ch

www-micrel.deis.unibo.it/pulp-project

https://twitter.com/pulp_platform

The fun is just [email protected]