65nm rad-hard hi-rel high-performance dsp

25
Ramon Chips RC64 65nm Rad-Hard Hi-Rel Manycore High-Performance DSP for Software Defined Payloads 1 Ramon Chips is named in memory of Col. Ilan Ramon, Israeli astronaut who died on board the Columbia space shuttle, 1/2/2003 Ran Ginosar CEO, Ramon Chips Professor, EE, Technion—Israel DSP Day, 15 June 2016 © 2016

Upload: others

Post on 28-Apr-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 65nm Rad-Hard Hi-Rel Manycore High-Performance DSP for Software Defined Payloads

1

Ramon Chips is named in memory of Col. Ilan Ramon, Israeli astronaut who died on board the Columbia space shuttle, 1/2/2003

Ran Ginosar CEO, Ramon Chips Professor, EE, Technion—Israel

DSP Day, 15 June 2016

© 2016

Page 2: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

Contents

Ramon Chips RC64 Architecture & Implementation RC64 Programming Model RC64 Software RC64 Applications

Evening poster: MODEM on RC64 Tomorrow: Image processing on RC64

2 © 2016

Page 3: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

Ramon Chips Government funded, in Israel, since 2002 Make ITAR-free rad-hard hi-reliability high-performance

processors for space Deliver & support for 30 years Combined leadership & heritage in

– RH & HR – Semiconductors – Architecture – Software – Applications – Engineering – Production – Support

3 © 2016

Page 4: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 4

Present Products COBHAM GAISLER

GR712RC 2-core LEON3 Image

Compression

Plastic PQFP

OPSAT 3000

MASCOT on HAYABUSA-2

© 2016

Page 5: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 motivation

Meet and exceed the NG-DSP challenge – From 1 GFLOPS to 40 GFLOPS and 150 GOPS

Enable payload supercomputing for space – Replace ASICs, FPGAs, GPUs, CPUs – Planned for 30 years (2020-2050)

Chip 64 processors

Board many chips

Multi-boards supercomputer

5 © 2016

Page 6: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 64 DSP cores

– CEVA X1643 – 300 MHz, 40 GFLOPS,

150 GOPS HW scheduler Modem HW accelerators 4 Mbyte shared memory Fast I/O Rad-Hard, FDIR 65nm LP TSMC 10 Watt PBGA & CCGA 624 Designed for

SOFTWARE-DEFINED-PAYLOADS

6

Shared Memory

M M M M M M M M

SpFi/sRIO DDR2/3 AD/DA SpW NVM DMA

scheduler FEC

DSP

$

DSP

$

DSP

$

DSP

$ DSP

$ DSP

$

DSP

$

DSP

$

M M M M M M M M

M M M M M M M M

© 2016

Page 7: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 7

316 mm2 Wire-bonded, IO around periphery

20.4 mm

15.5

mm

(32+16) × 0.8 Gb/s

12 × 2 × 6.25 Gb/s

© 2016

Page 8: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 performance

DVB-S2 modem: 2 Gb/s transmit, 1 Gb/s receive FFT (complex 16 bit fixed-point): 150 GOPS FFT (complex SP FP): 18 GFLOPS

None of these use DDR3 external memory.

Only streaming

10 Watt

8 © 2016

Page 9: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 9

RC64 vs other space processors

© 2016

Presenter
Presentation Notes
Maestro was designed by Boeing, funded by DTRA (US Government) as part of the OPERA program. It contained 49 cores from Tilera and was intended for fabrication on IBM 90nm SOI-CMOS process [Malone, MAPLD 2009]. The project seems to have ceased in 2011 and there have been no publications since then on any progress. Tilera, the IP provider, was acquired by EZchip and no longer supports the IP core. ESA-21469 was an attempt by ESA (NGDSP effort) to buy license for the 21469 single-core DSP from Analog Devices (USA) and convert it into a space ASIC in Europe. The project was aborted due to very high cost of the license [analyzed by John Franklin, Astrium-UK as described on ESA DSP Day in 2012 and explained in presentations by Dr. Roland Trautner in ESA DSP Day 2014]. Proton 200k is a USA-made card with TI TMS320C6713 running as “temporal TMR” [Czajkowski & McCartha, “Ultra Low-Power Space Computer Leveraging Embedded SEU Mitigation,” IEEE Aerospace, 2003]. SSDP (Scalable Sensor Data Processor) is ESA-funded project by TAS-E and Recore to create a rad-hard DSP. Its first generation will include two Recore DSP cores and implement in 180nm CMOS using IMEC DARE library [DSP_Day_2014_-_SSDP_Development_Status_TASE.pdf, ESA DSP day, Sept 2014] RADSPEED was a hardened version of the UK-based ClearSpeed CSX700 SIMD floating point accelerator [J. Marshall, MAPLD 2011]. ClearSpeed has ceased operations.
Page 10: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 Power Dissipation

64 DSP cores (300 MHz): 8 Watt 12 SpFi links (6.25 Gbps): 2 Watt Power is scalable by # cores, frequency, I/O

– One core at 150 MHz, no SpFi: 60 mWatt

Power—Performance – 0.1 mW / MFLOP

10,000 MFLOPS / Watt

– 0.05 mW / MOP (Add or Mult) – 0.1 mW / MegaMAC

10,000 MegaMAC / Watt 20,000 MegaOPS / Watt

10 © 2016

Page 11: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 Package Options

Thermal cycling control by HW & SW – Temp sensors on-chip, SW maintains fixed temp – Mitigation of column / ball shearing due to cycles

1. PBGA 624 – Wire bonded

2. CCGA 624 – Wire bonded

3. CLGA 624

11 © 2016

Page 12: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 12

Pinout

CONFIDENTIAL © 2016

Page 13: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 RC64 RC64 RC64

RC64 RC64 RC64 RC64

RC64 RC64 RC64 RC64

RC64 RC64 RC64 RC64

CTRLCPU

CTRLCPU

CTRLCPU

CTRLCPU

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

13

Software-Defined Payload

D/A

C

ON

VER

TER

S

A/D

C

ON

VER

TER

S

© 2016

Page 14: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 14

Software-Defined Payload

RC64 RC64 RC64 RC64

RC64 RC64 RC64 RC64

RC64 RC64 RC64 RC64

RC64 RC64 RC64 RC64

CTRLCPU

CTRLCPU

CTRLCPU

CTRLCPU

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

DDR3

FLASH

SSMM CNTL

COMPRESSEX

PLO

IT

ENCO

DEM

ODU

LATE

CAM

ERA

IN

ENCRYPT

D/A

C

ON

VER

TER

S

A/D

C

ON

VER

TER

S

© 2016

Page 15: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 Near-Term Development Plan

15

2016 2017 2018 2019

Eval Board

SDK & SysSW

RC64 design Fab Test Screening Qualification

VPX Board 1

Reference Applications

Simulators

FPGA

Board 2

© 2016

Page 16: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 Task-Oriented Programming Model Shared memory (PRAM)

– No message passing among processors

Single program, bare metal, no OS – Many-core, not multi-core

Code in TWO parts: – Task-dependency-graph – Sequential task codes

16

Shared memory

I/O duplicable task lock-free sharing

© 2016

Presenter
Presentation Notes
Duplicable tasks are designed for data parallelism: Written once, executed in many copies (instances) Each instance receives a unique number (when dispatched by the scheduler). The instance number enables the task code in accessing its unique data in shared memory. Each core can execute any task and any instance. The scheduler dispatches a task (or instance) by sending TWO numbers to a core: code entry point, and instance number. When the core completes execution, it sends a token back to the scheduler.
Page 17: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

Code Example

Convert (independent) loop iterations for ( i=0; i<10000; i++ ) {

a[i] = b[i]*c[i];

}

into duplicable (parallel) tasks set_task_quota(ABC, 10000)

void ABC(id) { a[id] = b[id]*c[id]; }

duplicable ABC Task graph

Each task is sequential !

17 © 2016

Instance number

Presenter
Presentation Notes
Task map is a single box in this case (shadow means duplicable task) Each instance computes ONE data element. Number of instances equals number of data elements. Rather than equal to the number of cores !
Page 18: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 18

RC64 SW Development Tools

Compiler, ASM, Linker

Task Compiler

Parallel Program Simulator

Compiler tool chain

Parallel Programming

Parallel Program Profiler

Event Recorder (time stamp

tracer)

Optimization & performance

tuning

Parallel DSP Kernels & Libraries

Core DSP Libraries

RC64 Cycle-Accurate

Simulator

Simulation Many Core Debugger

© 2016

Page 19: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64 Run Time Model

19

Hardware (RC64 and Peripherals)

RC64 HW DMA Engines RC64 HW Scheduler

IO API Many-Task API Boot

Application Tasks Network

Messaging Host

Command Control

Message Routing

Error Correcting DDR and Flash

MP (Multi- RC64)

HW

Kernel

System Services

Distributed Executive

Boot and

FDIR

© 2016

Page 20: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 20

Transparent / Regenerative SW-Defined PLD

MO

D

MO

D

DEM

OD

D

EMO

D

DEM

OD

REGENERATIVE PACKET

PROCESSOR

RF

RF

RF

RF

ADC

ADC

ADC

ADC

DEMUX

DEMUX

DEMUX

DEMUX

Transparent Switch

Level CTRL

Level CTRL

Level CTRL

Level CTRL

MUX

MUX

MUX

MUX

DAC

DAC

DAC

DAC

RF

RF

RF

RF

Digital Beam

Forming Network

MO

D

MOD ENCOD

CONTROL SPECTRUM MONITOR

DEMOD DECOD

© 2016

Page 21: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

EOS Software-Defined Payload

21

Sensors

Solid State Mass Memory

DDR2/3

RC64 Protocol

conversion

RC64

memory control

RC64 Compress

RC64 Encrypt

RC64 Encode & Modulate

DAC

RC64 Exploitation

© 2016

Page 22: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

RC64

CLK

22

3U VPX cards

RC64

DD

R3

FLAS

H

CLK

RC64

DD

R3

FLAS

H

CLK

ADC

ADC C

LK

RC64

DD

R3

FLAS

H

CLK

DAC

DAC

CLK

RC64

DD

R3

FLAS

H

CLK

ADC

ADC

CLK

COMPUTE

WB TRANSMITTER

RECEIVER

WB RECEIVER

Page 23: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 23

3U VPX 4X card

RC64

CLK

RC64

RC64

RC64 DDR3

FLASH

Page 24: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips

Summary

High performance DSP/CPU for space 64 cores, large shared memory, high speed I/O Simpler shared memory programming For software-defined payloads HW: chips, boards, multi-board modules SW: SDK, System SW, Reference Applications ES 2017

EM 2018 FM 2019

Evening poster, another talk tomorrow

24 © 2016

Page 25: 65nm Rad-Hard Hi-Rel High-Performance DSP

Ramon Chips 25

www.ramon-chips.com

© 2016