
Building a Universal Silicon Compiler

Andreas Olofsson, Program Manager

Defense Advanced Research Projects Agency (DARPA)

The Salishan Conference on High Speed Computing, Gleneden Beach, Oregon

April 24

Distribution Statement “A” (Approved for Public Release, Distribution Unlimited)

Modern day hardware design has a cost problem!

[Figure: chart with axis ticks 100, 10K, 1M, 100M, 10B]

Complexity is the root cause!

Why can’t we build low volume systems at low cost?

Disruption

Source: General Dynamics

Source: Wikipedia

The last 50+ Years of Moore’s Law

Automobiles:

                        1946     Today
Speed, mph (S)          78       102
Efficiency, mpg (E)     14.6     22
Cost, $K (C)            1.7      27
(S * E) / C             669      83

Computers:

                        1946     Today
Speed, OPS/S (S)        359      17.9e15
OPS/W (E)               0.002    2e9
Cost, $M (C)            6.5      97
(S * E) / C             0.11     3e23

https://en.wikipedia.org/wiki/Titan_(supercomputer)

https://www.caranddriver.com/chevrolet/cruze

https://www.allpar.com/history

https://en.wikipedia.org/wiki/ENIAC
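The combined metric (S * E) / C used on this slide can be recomputed from the listed values. The following is a minimal sketch in Python; the function name `figure_of_merit` and the dictionary layout are illustrative choices, not something taken from the talk.

```python
def figure_of_merit(speed, efficiency, cost):
    """Return (S * E) / C, the slide's combined speed/efficiency/cost metric."""
    return speed * efficiency / cost

# Values copied from the slide (cost in $K for cars, in $M for computers).
cars = {"1946": (78, 14.6, 1.7), "Today": (102, 22, 27)}
computers = {"1946": (359, 0.002, 6.5), "Today": (17.9e15, 2e9, 97)}

car_fom = {era: figure_of_merit(*values) for era, values in cars.items()}
computer_fom = {era: figure_of_merit(*values) for era, values in computers.items()}

for era in ("1946", "Today"):
    print(f"{era}: car {car_fom[era]:.1f}, computer {computer_fom[era]:.2e}")
```

With these inputs the car metric falls by roughly 8x between 1946 and today, while the computer metric rises by more than 24 orders of magnitude.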

The Next 50 Years of Moore’s Law? (2015-2065)

https://en.wikipedia.org/wiki/HAL_9000

1 Nonillion FLOPS

1 ExaWatt (@ 1 TFLOPS/W) = 100,000 x Earth’s Power Budget

$E_{\text{Min-Bit}} = kT \ln 2 = 3 \times 10^{-21}$ Joules

We need to keep reaching for the Landauer Limit!
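Both headline numbers on this slide follow from short arithmetic, sketched below. The temperature of 300 K is an assumption (the slide does not state T), and the "implied budget" is simply the exawatt figure divided by the slide's factor of 100,000, not an independent estimate.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # exact SI value of k
ROOM_TEMPERATURE_K = 300           # assumption: the slide does not state T

# Landauer limit: minimum energy to erase one bit, E = k * T * ln(2).
landauer_joules = BOLTZMANN_J_PER_K * ROOM_TEMPERATURE_K * math.log(2)

target_flops = 10**30              # 1 nonillion FLOPS
flops_per_watt = 10**12            # 1 TFLOPS/W
power_watts = target_flops // flops_per_watt            # 10**18 W = 1 exawatt
implied_earth_budget_watts = power_watts // 100_000     # from "100,000 x" on the slide

print(f"Landauer limit at {ROOM_TEMPERATURE_K} K: {landauer_joules:.2e} J per bit")
print(f"Power needed: {power_watts:.0e} W, implied Earth budget: {implied_earth_budget_watts:.0e} W")
```

At 300 K the limit comes out near 2.9e-21 J, which the slide rounds to 3 x 10^-21 J.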

1970’s: Heroic human chip design efforts

• Intel 4004 (1971-1981)
• 10-µm feature size
• 2,300 transistors

Rubylith operators

Source: https://en.wikipedia.org/wiki/Intel_400

Source: http://www.computerhistory.org/revolution/artifact/287/1614

1980’s: Tackling chip design complexity with CAD

Birth of Modern EDA
• Intel 80386 (1985-2007)
• 1-µm feature size
• 275,000 transistors

Synthesis

Layout Systems

Place and Route

Frameworks

Synopsys, Cadence, Mentor Graphics

Source: https://en.wikipedia.org/wiki/Intel_80386

Source: http://opencircuitdesign.com/magic/

Source: Introduction to VLSI Systems by Carver Mead

Source: http://venividiwiki.ee.virginia.edu/mediawiki/

Source: https://en.wikipedia.org/wiki/Logic_synthesis

1990’s-Today: Barely keeping up with complexity

• NVIDIA V100 (2017-)
• 0.012-µm feature size
• 21,000,000,000 transistors

“It took several thousand engineers several years to create, at a development cost of $3 billion.” –Jensen Huang

Death by a million papercuts… functional correctness, security, safety, reliability, performance, IP integration, power management, firmware, system integration, wire delays, placement, routing, clocking, design rules, antenna effects, ESD, multi-voltage domains, power gating, multi-threshold, floor-planning, I/O structures, flip-chip, wirebonding, 2.5D packaging, TSVs, RDL routing, area minimization, routing congestion, on-chip variability, self-heating, stress and proximity effects, electromigration, SEUs, signal integrity, power delivery networks, decoupling, modeling, low voltage operations, cooling, scan insertion, BIST, ATPG, STA, yield optimization, power minimization, and all the EDA tools needed to make this work.

Source: https://nvidianews.nvidia.com/file?fid=59129280a138351b9447113c

Epiphany-V: My last chip before joining DARPA

Parameter                     Value
Process                       TSMC 16FF+
Transistors                   4.5B
Die Area                      117 mm2
Flip Chip Bumps               3,460
I/O Signals                   1,040
Clock Domains                 1,152
Voltage Domains               2,052

Parameter                     Value
Frequency                     500 MHz*
32-bit Performance (Peak)     2 TFLOPS
64-bit Performance (Peak)     1 TFLOPS
Memory Bandwidth              16 TB/sec
NOC Bisection Bandwidth       0.75 TB/sec
Typical Power                 ~10 W
Minimum Power                 1 mW

A. Olofsson, “EPIPHANY-V: A TFLOPS scale 16nm 1024-core 64-bit RISC Array Processor”, HOTCHIPS29, 2017

Epiphany-V: Performance Comparisons

                   GPU      CPU1     CPU2     E5
Process            16nm     14nm     14nm     16nm
Processor Nodes    56       72       24       1024
Area (mm2)         610      683      456      117
Transistors        15.3B    7.1B     7.2B     4.5B
Power (W)          250      245      145      ~10
TFLOPS             4.7      3.6      1.3      1
GFLOPS/mm2         7.7      5.27     2.85     8.55
GFLOPS/W           18.8     14.69    9.08     100
SRAM/mm2           0.034    0.05     0.15     0.54
Nodes/mm2          0.09     0.11     0.05     8.75

Callouts: 5X, 79x

World leading performance at less than $1M design cost
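The derived rows of the comparison (GFLOPS/mm2, GFLOPS/W, Nodes/mm2) can be recomputed from the raw rows. This is a rough sketch, assuming 10 W for the E5 entry listed as "~10"; the variable names are illustrative, and matching the 5X and 79x callouts to specific ratios is an inference from the listed values rather than something the slide states.

```python
# name: (processor nodes, area in mm2, power in W, peak TFLOPS), as listed on the slide.
# E5 power is given as "~10" W; exactly 10 W is assumed here.
chips = {
    "GPU":  (56, 610, 250, 4.7),
    "CPU1": (72, 683, 245, 3.6),
    "CPU2": (24, 456, 145, 1.3),
    "E5":   (1024, 117, 10, 1.0),
}

def derived_metrics(nodes, area_mm2, power_w, tflops):
    """Recompute the density and efficiency rows from the raw rows."""
    gflops = tflops * 1000
    return {
        "gflops_per_mm2": gflops / area_mm2,
        "gflops_per_w": gflops / power_w,
        "nodes_per_mm2": nodes / area_mm2,
    }

metrics = {name: derived_metrics(*row) for name, row in chips.items()}
for name, values in metrics.items():
    print(name, {key: round(value, 2) for key, value in values.items()})

# Ratios taken from the listed (rounded) derived values; these appear to match the callouts.
efficiency_ratio = 100 / 18.8    # E5 vs GPU, GFLOPS/W
density_ratio = 8.75 / 0.11      # E5 vs CPU1, Nodes/mm2
print(round(efficiency_ratio, 1), round(density_ratio, 1))
```

Most recomputed values agree with the slide. The CPU2 GFLOPS/W figure comes out near 8.97 rather than the listed 9.08, which would be consistent with the TFLOPS input having been rounded.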

Epiphany-V: Low Cost Hardware Design

[Figure: 12-month schedule (months 01-12) with rows Architecture, RTL, Verification, IP, EDA work, Floorplan, Optimization, Tapeout]

“Server Farm”:
• One 2010 Dell PowerEdge T610 with a quad-core Xeon 5500 and 32GB DDR3
• One RTL to GDS EDA license

Designer             Responsibility      Effort (hours)
Contractor A         FPU                 200
Contractor B         Verification        200
Contractor C         EDA Services        112
Ola Jeppsson         Simulator/SDK       500
Andreas Olofsson     Everything else     4,100

Why is low cost chip development not commonplace?

…why I joined DARPA

Modern Software Compilation

A Universal No Human In the Loop Hardware Compiler

Software compilers moved beyond humans in the loop 50 years ago!

[Figure: hardware compiler flow. Labels: Source Code, Schematics, Constraints; Synthesis/Generators; Qualified Source Code; Circuits; Qualified Circuits; Knowledge Driven Domain Specific Models; IDEA: Intelligent No Human In the Loop Layout Engine; Partial layout; Qualified Layout; Linker; Final layout; POSH]

A no-human-in-the-loop hardware compiler addresses cost, schedule, resource, and trust challenges in the current SoC design cycle.

Source: mattturck.com/bigdata2017

How is hardware “compilation” (layout) handled today?

Package Design:
• Excel spreadsheet input
• 1K signals
• 100% manual labor
• 2-4 experts
• 2-8 weeks

Digital Design:
• Verilog netlist input, constraints, scripts
• 10M-1B signals
• 99% automated place and route
• 1-100 experts
• 3-18 months

Analog Design:
• Schematic input
• 1K-100K signals
• 100% EDA assisted manual labor
• 2-4 experts
• 3-18 months

Board Design:
• Schematic input
• 1-10K signals
• 100% EDA assisted manual labor
• 2-4 experts
• 3-6 months

Sources: Axis, Adapteva

Sources: Intel, NVIDIA, Adapteva

Sources: EETimes

Sources: Analog Devices, Raspberry Pi

Why now? What has changed?

Traditional Algorithmic EDA Research

[Figure: timeline with year markers 1988, 1987, 1983]

Synopsys, Cadence, Mentor Graphics

1993-2018 (stable evolutionary progress): Optimization algorithms, Productivity & Integration

Ref-2003: The Tides of EDA, Alberto Sangiovanni-Vincentelli

A New Machine Learning Based EDA Approach

• ML Algorithm Innovations
• Data driven
• Massive compute (Moore’s Law)
• Replacing existing heuristics/humans

Can we map a layout cost function to ML?

Can we access/label enough quality data?

Source: NVIDIA

Why now? (how this approach is different)

Today: Designer provides manual constraints to layout person (or EDA tool)

[Figure: circuit schematic with nets VDD, IBIAS, VSS, VIN, VIP, VOUT, annotated with manual constraints]
• Place dummies, interdigitize
• Common centroid layout
• Max 10µm from main supply, 0.5µm width

IDEA: Circuit Classifier, Assign Strategies, Auto-Placement, Auto-Routing, Model Training

[Figure: the same schematic (nets VDD, IBIAS, VSS, VIN, VIP, VOUT) on the IDEA side]

Common Vocabulary of Strategies: Centroid, Mirroring, Isolation

Novelty: Auto-create layout constraints by classifying circuit patterns and applying strategies from a knowledge database.
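The classify-then-apply-strategies idea can be illustrated with a toy example. The sketch below is only a rule-based stand-in: the slide lists Model Training, which suggests a learned classifier, and none of the names here (`Transistor`, `classify_pair`, `KNOWLEDGE_DB`) or the pattern-to-strategy pairings come from the talk. Only the strategy words themselves are taken from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transistor:
    name: str
    kind: str     # "nmos" or "pmos"
    gate: str     # net names
    source: str
    drain: str

# Hypothetical knowledge database: circuit pattern -> layout strategies.
# Strategy words are from the slide; which pattern gets which strategy is assumed.
KNOWLEDGE_DB = {
    "differential_pair": ["common centroid", "interdigitize", "place dummies"],
    "current_mirror": ["mirroring", "place dummies"],
    "unknown": [],
}

def classify_pair(a, b):
    """Toy rule-based classifier for a pair of transistors."""
    if a.kind != b.kind:
        return "unknown"
    diode_connected = a.gate == a.drain or b.gate == b.drain
    if a.gate == b.gate and a.source == b.source and diode_connected:
        return "current_mirror"
    if a.source == b.source and a.gate != b.gate:
        return "differential_pair"
    return "unknown"

def layout_constraints(a, b):
    """Look up the strategies assigned to the classified pattern."""
    return KNOWLEDGE_DB[classify_pair(a, b)]

m1 = Transistor("M1", "nmos", gate="VIN", source="TAIL", drain="OUTN")
m2 = Transistor("M2", "nmos", gate="VIP", source="TAIL", drain="VOUT")
m3 = Transistor("M3", "pmos", gate="OUTN", source="VDD", drain="OUTN")
m4 = Transistor("M4", "pmos", gate="OUTN", source="VDD", drain="VOUT")

print(classify_pair(m1, m2), layout_constraints(m1, m2))
print(classify_pair(m3, m4), layout_constraints(m3, m4))
```

The point of the sketch is the separation of concerns: recognition of a circuit pattern is one step, and the layout constraints are then looked up from shared knowledge rather than written by hand for each design.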

IDEA Program Objective

IDEA will create a no-human-in-the-loop hardware compiler for translating source code to layouts of System-On-Chips, System-In-Packages, and Printed Circuit Boards in less than 24 hours.

Machine generated chip and package layout

Intent driven system generation

Machine generated board layout

Image source: Raspberry Pi

Reinventing Board Development

Today: Manual Part Selection, Manual Schematic, Manual Layout
• 100% Manual
• Error prone
• Rarely optimal

IDEA:

[Figure: board generation flow. Labels: Inexact Description, Intent, Open Parts DB, System Generator, Potential Solutions, Schematics, Pruning, F1 F2, Goal Optimizer, Schematic, Layout Generator]

New Concept: Machine synthesized board from intent and open COTS parts library.

Source: Raspberry Pi

An Open 5M+ Component IC Database

Today
• 5M+ parts in circulation
• Information embedded in datasheets and reference designs
• No standard models
• Automatic optimization not possible

IDEA
• IC standard models (LEF, LIB, IP-XACT)
• Extend standards for boards / SIPs
• Creation of 5M+ part DB
• Model all properties needed for constraint based system optimization

[Figure: part taxonomy tree, levels listed as extracted]
Root
Connectors, Active, Passives
Res, Ind, Diode, Trans, IC
Mem, Proc, ADC, DAC, PMIC
Ind
SRAM, ONFI, DRAM
DDR4, DDR3, DDR2

Part properties shown in the figure:
• Capacity, Width, Freq, Power, Temp, Package, Cost
• Type, Tolerance, Temp Coeff, Temp Rating, Voltage rating, Package, Cost, Inventory, Obsolescence
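Constraint based selection over a parts database with modeled properties might look roughly like the sketch below. The field names loosely follow the property list on this slide; every part name and value is invented for illustration, and `select_part` is a hypothetical helper, not an IDEA interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryPart:
    # Fields loosely follow the slide's property list; all values below are made up.
    name: str
    kind: str
    capacity_gb: float
    width_bits: int
    freq_mhz: int
    power_w: float
    max_temp_c: int
    package: str
    cost_usd: float

PARTS_DB = [
    MemoryPart("hypo-ddr3-a", "DDR3", 2, 16, 800, 0.9, 85, "BGA96", 3.10),
    MemoryPart("hypo-ddr4-a", "DDR4", 4, 16, 1200, 1.1, 85, "BGA96", 5.40),
    MemoryPart("hypo-ddr4-b", "DDR4", 4, 16, 1600, 1.3, 105, "BGA96", 7.25),
    MemoryPart("hypo-ddr4-c", "DDR4", 8, 16, 1600, 1.6, 95, "BGA96", 9.80),
]

def select_part(parts, constraints, objective):
    """Keep parts that satisfy every constraint, then return the one minimizing the objective."""
    feasible = [part for part in parts if all(check(part) for check in constraints)]
    return min(feasible, key=objective) if feasible else None

constraints = [
    lambda part: part.capacity_gb >= 4,
    lambda part: part.max_temp_c >= 95,
    lambda part: part.power_w <= 1.5,
]
best = select_part(PARTS_DB, constraints, objective=lambda part: part.cost_usd)
print(best.name if best else "no feasible part")
```

Selection like this only works once each property is available as structured data, which is the gap the slide describes for parts whose information lives in datasheets and reference designs.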

Reinventing the Chip IP Stack

Software

[Figure: example software stack. Labels: User Content; Billion Dollar Company Code; LINUX, MySQL, MemCache, Thrift, Cassandra, Apache, PhP, Jenkins; $15B+; Open Source; Infinite Layer Stack]

Current SoC Hardware Design

[Figure: Labels: Chip Company A, Chip Company B, Chip Company C, Chip Company D; IP Vendors]

POSH will create a viable open source hardware design and verification ecosystem that enables cost effective design of ultra-complex SoCs.

The State of Open Source Hardware

Still a long way to go!

How is Hardware Different from Software?

                  Software     Hardware
Programmers       Millions     Thousands
Writing Code      Easy         Hard
Reading Code      Hard         Very hard
Debugging         Hard         Near impossible
Cost of bugs      Low          Very high

What technologies are needed to make open source hardware viable?

Selection of Open Source Library Components

Digital Circuit IP Blocks
• FPGA Fabric
• Multi-core 64-bit RISC-V processor sub-system
• GPU (OpenGL ES 3.0)
• PCI Express Controller
• Ethernet Controller
• Memory Controllers
• USB 3.0 Controller
• MIPI Camera Serial Interface controller
• CPU Subsystem
• H264 encoder/decoder
• AES256 encrypt/decrypt
• SHA-2/SHA-3 accelerator
• Secure Digital Controller
• High Definition Multimedia Interface
• Serial ATA Controller
• JESD204B Controller
• NAND Flash Controller
• CAN Controller

Mixed Signal Circuit IP Blocks     Description
Standard I/O interface PHYs        DDR, PCIe, SATA, USB, XAUI, CPRI
PLL                                Range: 10MHz – 10GHz
DLL                                Range: 10MHz – 10GHz
Analog to Digital Converters       Range: 1 – 10,000 MSPS
Digital to Analog Converters       Range: 1 – 10,000 MSPS
Voltage Regulators                 Input: 1.8V – 12V, Output: 0.25V – 1.8V
Monitor circuits                   Temperature, voltage, process

How can we cost effectively develop and maintain a high quality catalog of portable open source digital and analog components?

IDEA/POSH End State – A Universal Hardware Compiler

Source: Shutterstock

Source: Amazon

Source: NVIDIA

Creating a viable low volume electronic ecosystem

[Figure: chart of MONEY ($1 to $1,000,000,000, decade ticks) versus UNITS SOLD (1 to 1E+09, decade ticks). Series: $0.1/chip, $1/chip, $10/chip, $100/chip, $1000/chip; $1K NRE, $10K NRE, $100K NRE, $1M NRE, $10M NRE, $100M NRE, $1B NRE. Annotations: Today, Tomorrow]

Source: General Atomics

Image Sources: General Atomics, Tesla, Apple
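One way to read the chart is as a break-even plot: each per-chip line is money earned as units are sold, and each flat NRE line is a one-time cost to recover. The sketch below rests on that reading, which is an assumption, and it ignores per-unit manufacturing cost. The function name `break_even_units` is illustrative.

```python
import math
from fractions import Fraction

def break_even_units(nre_dollars, price_per_chip):
    """Smallest unit count at which price * units covers the one-time NRE.

    price_per_chip is passed as a string so that values like "0.1" stay exact.
    """
    return math.ceil(Fraction(nre_dollars) / Fraction(price_per_chip))

# NRE levels and per-chip prices taken from the chart's legend.
nre_levels = [1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000, 1_000_000_000]
chip_prices = ["0.1", "1", "10", "100", "1000"]

for nre in nre_levels:
    row = [break_even_units(nre, price) for price in chip_prices]
    print(f"${nre:>13,} NRE:", row)
```

Under this reading, lowering NRE by a factor of ten lowers the volume needed to recover it by the same factor at any given chip price, which is what makes low volume products viable.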

Specialization: The future of supercomputing!

               General Purpose       Scientific Machine     Machine Learning*
               (Exascale Report)     (ref. Anton)
OPS (Exa)      1                     1                      1
DRAM (TB)      3,600                 100                    10
Power (MW)     68                    2                      0.2
Cost           $500M                 $10M                   $2M

*$4/GB DRAM, 5 TOPS/W, shared MPW, IDEA/POSH working, market silicon pricing at 14nm

Source: IBM

Source: Cray

https://www.scientificcomputing.com
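The Machine Learning column can be partly checked against its footnote. The sketch below recomputes power from 5 TOPS/W and the DRAM share of cost from $4/GB, assuming decimal units (1 TB = 1000 GB). The breakdown of the rest of the $2M is not given on the slide and is not modeled here.

```python
EXA = 10**18
TERA = 10**12
GB_PER_TB = 1000          # assumption: decimal terabytes

ops_per_second = 1 * EXA  # 1 ExaOPS, from the table
ops_per_watt = 5 * TERA   # 5 TOPS/W, from the footnote
dram_tb = 10              # from the table
dram_dollars_per_gb = 4   # from the footnote

power_mw = ops_per_second / ops_per_watt / 1e6
dram_cost_dollars = dram_tb * GB_PER_TB * dram_dollars_per_gb

print(f"Power: {power_mw} MW")
print(f"DRAM cost: ${dram_cost_dollars:,}")
```

The power figure reproduces the table's 0.2 MW, and DRAM at the footnote's price accounts for about $40,000 of the listed $2M.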

www.darpa.mil