ucb ace lab - risc-v · 2016-09-20 · uc berkeley...
TRANSCRIPT
![Page 1: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/1.jpg)
UC Berkeley
The Berkeley Out-‐of-‐Order Machine (BOOM!): Computer Architecture Research Using an
Industry-‐CompeBBve, Synthesizable, Parameterized RISC-‐V Processor
Christopher Celio, Krste Asanovic, David PaLerson
2015 [email protected]
Tuesday, June 30, 15
![Page 2: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/2.jpg)
UC Berkeley What is BOOM?
§ superscalar, out-‐of-‐order processor wriLen in Berkeley’s Chisel RTL
§ It is synthesizable§ It is parameterizable§ We hope to use it as a plaQorm for architecture research
2
BOOM is a work-in-progress. Results shown in the talk are preliminary and subject to
change!
Tuesday, June 30, 15
![Page 3: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/3.jpg)
UC Berkeley Other Berkeley RISC-‐V Processors
3
§ Sodor CollecBon- RV32I -‐ Bny, educaBonal, not-‐synthesizable
§ Z-‐scale - RV32IM -‐ micro-‐controller
§ Rocket- RV64G -‐ in-‐order, single-‐issue applicaBon
core§ BOOM
- RV64G -‐ out-‐of-‐order, superscalar applicaBon core
Tuesday, June 30, 15
![Page 4: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/4.jpg)
UC Berkeley Why OoO?
§ Great for ...- tolera'ng variable latencies- finding ILP in code (instruc'on-‐level parallelism)- complex method for fine-‐grain data prefetching- plays nicely with poor compilers and lazily wri<en code
4
Performance!
Tuesday, June 30, 15
![Page 5: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/5.jpg)
UC Berkeley OoO widely used in industry
§ Intel Xeon/i-‐series (10-‐100W)§ ARM Cortex mobile chips (1W)§ Intel Atom§ Sun/Oracle Niagara UltraSPARC§ Play Sta'on
5Tuesday, June 30, 15
![Page 6: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/6.jpg)
UC Berkeley Academic OoO Research
§ general lack of effort in academia to build, evaluate OoO designs
§ most research uses so[ware simulators- cannot produce area, power numbers- hard to trust, verify results- McPAT is calibrated against 90nm Niagara, 65nm Niagara 2, 65nm Xeon, and 180nm Alpha 21364
- very slow§ Other Academic OoO RTL efforts...- Illinois Verilog Model, Princeton Sharing Architecture, NCSU FabScalar (Alpha, PISA)
- other ISAs can be very challenging to implement fully- rely on SW simulators for performance numbers- hopefully RISC-‐V can make everybody’s lives easier!
6Tuesday, June 30, 15
![Page 7: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/7.jpg)
Perf (CoreMark/s) vs. Area (um2)
UC Berkeley Design-‐space exploraKon
§ Very preliminary§ Parameters- fetch width- issue width- ROB size- IW size- LSU size- Regfile size- # of branch tags
§ 3x range in area§ 2x range in performance
7
data collected byOrianna DeMasi
1wide
2wide4wide
pareto curve
Tuesday, June 30, 15
![Page 8: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/8.jpg)
UC Berkeley Research Methodology
§ Which benchmarks?§ How many cycles do we need to run?§ State of the art
- “SimPoints”- run 4-‐10 snapshots per SPEC2000/2006 benchmark- each snapshot runs for ~10M instrucBons
§ What other people do (ISCA 2014 results)-~50M instrucKons / workload-~200B instrucKons / paper
§ What we can do-map design to an FPGA- run 50 MHz (~1T cycles/6hrs)- run full reference benchmark (~2 Trillion instrucBons avg)- run on FPGA cluster (~1-‐2 weeks simulaBon in one day, or ~30-‐60T instrucKons/day) 8
Tuesday, June 30, 15
![Page 9: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/9.jpg)
UC Berkeley
Berkeley Architecture Research Infrastructure
§ RISC-‐V ISA§ Chisel HCL (hardware construcBon language)§ Rocket-‐chip SoC generator
9Tuesday, June 30, 15
![Page 10: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/10.jpg)
UC Berkeley The RISC-‐V ISA is easy to implement!
§ relaxed memory model§ accrued FP excepBon flags§ no integer side-‐effects (e.g., condiBon codes)§ no cmov or predicaBon§ no implicit register specifiers - JAL requires explicit rd
§ rs1, rs2, rs3, rd always in same space-allows decode, rename to proceed in parallel
10Tuesday, June 30, 15
![Page 11: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/11.jpg)
UC Berkeley The RISC-‐V ISA
§ BOOM supports “M” (mul/div/rem)- imul can be either pipelined or unpipelined
§ BOOM supports “A” -AMOs+LR/SC
§ BOOM supports “FD” - single, double-‐precision floa'ng point- IEEE 754-‐2008 compliant FPU- SP, DP FMA with hw support for subnormals
§ RV64G
11Tuesday, June 30, 15
![Page 12: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/12.jpg)
UC Berkeley Rocket-‐Chip SoC Generator
12
§ open-‐source§ taped out 10 Bmes by Berkeley
§ runs at 1.6 GHz in IBM 45nm§ makes for a great library of processor components!
Tuesday, June 30, 15
![Page 13: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/13.jpg)
UC BerkeleySupports Privileged ISA (“S”), Virtual Memory
§ boots Linux!§ just released Privileged ISA v1.7§ instant to update- Privileged ISA nearly en'rely isolated to Control/Status Register (CSR) File, TLBs
- updated git submodule pointers- changed “tohost” to “mtohost” in one line
13Tuesday, June 30, 15
![Page 14: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/14.jpg)
UC Berkeley Chisel
§ Hardware Construc'on Language embedded in Scala
§ not a high-‐level synthesis language§ hardware module is a data structure in Scala
§ Full power of Scala for wri'ng generators- object-‐oriented programming- factory objects, traits, overloading
- funcBonal programming- high-‐order funs, anonymous funcs, currying
§ generated C++ simulator is 1:1 copy of Verilog designs
14Tuesday, June 30, 15
![Page 15: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/15.jpg)
UC Berkeley Chisel Hardware ConstrucKon Language
§ object-‐oriented, funcBonal programming§ powerful for wriBng hw generators§ 12 days (+1092 loc) to add SP,DP floaBng point § 9 days (+900 loc) to go from no VM to booBng Linux
15Tuesday, June 30, 15
![Page 16: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/16.jpg)
UC Berkeley BOOM
16
in-‐orderfront-‐half
out-‐of-‐orderback-‐half
Fetch Decode &Rename
Issue Window
Unified PhysicalRegister
File
Functional Unit
Tuesday, June 30, 15
![Page 17: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/17.jpg)
UC Berkeley BOOM
17
§ PRF - explicit renaming- holds specula've and commi<ed data- holds both x-‐regs, f-‐regs
§ Unified Issue Window- holds all instruc'ons
§ split ROB/issue window design
Fetch Decode &Rename
Issue Window Unified
PhysicalRegister
File (PRF)
FPU
ALU
Rename Map Tables & Freelist
ROB
Commit
Tuesday, June 30, 15
![Page 18: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/18.jpg)
UC Berkeley Parameterized Superscalar
18
OR
val exe_units = ArrayBuffer[ExecutionUnit]()exe_units += Module(new ALUExeUnit(is_branch_unit = true , has_fpu = true , has_mul = true ))exe_units += Module(new ALUMemExeUnit(fp_mem_support = true , has_div = true ))Issue
SelectRegfileWriteback
dual-issue (5r,3w)
bypassing
ALU
div
LSUAgen D$
bypassing
ALU
FPU
bypassnetwork
RegfileRead
imul
exe_units += Module(new ALUExeUnit(is_branch_unit = true))exe_units += Module(new ALUExeUnit(has_fpu = true , has_mul = true ))exe_units += Module(new ALUExeUnit(has_div = true))exe_units += Module(new MemExeUnit())
Issue Select
RegfileWriteback
Quad-issue (9r,4w)
ALU
div
LSUAgen D$
ALU
imul
FPU
ALU
bypassing
bypassnetwork
RegfileRead
Tuesday, June 30, 15
![Page 19: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/19.jpg)
UC Berkeley Full Branch SpeculaKon Support
§ next-‐line predictor (NLP)- BTB, BHT, RAS- combinaBonal
§ backing predictor (BPD)- global history predictor- SRAM (1 r/w port)
19
Branch Prediction
I$
Fetch Buffer
Fetch1
μDecPC1
Fetch2
NLP
ExeBrTarget
NPC
Front-end
BPD
PC2
Front-end
TakePC
BHT Target >>
Tuesday, June 30, 15
![Page 20: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/20.jpg)
UC Berkeley Load/Store Unit
§ load/store queue with store ordering- loads execute fully out-‐of-‐order wrt stores, other loads- store-‐data forwarded to loads as required
§ non-‐blocking data cache
20Tuesday, June 30, 15
![Page 21: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/21.jpg)
UC Berkeley Synthesizable§ Runs on FPGA- (Zynq zedboard and Zynq zc706)
§ 2GHz (30 FO4) in TSMC 45nm- speed of logic (SRAM is slower)
212-wide BOOM layout.
Regfile
LLC Data
LLC Data (256k)ROB
D$ (32k)
Issue
Uncore
I$ (32k)Rename
Exe
Exe
I$
bpd
RenUncore
1.7mm2 @ 45nm
preliminary resultsTuesday, June 30, 15
![Page 22: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/22.jpg)
UC Berkeley Benefits of using Chisel§ ~9,000 loc in BOOM github repo§ addiBonal ~11,500 loc instanBated from other libraries- ~5,000 loc from Rocket core repository- func'onal units, caches, PTWs, etc.
- ~4,500 loc from uncore- coherence hubs, L2 caches, networks, host/target interfaces
- ~2000 loc from hardfloat- floa'ng point hard units
22Tuesday, June 30, 15
![Page 23: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/23.jpg)
UC Berkeley Feature Summary
23
Feature BOOM
ISA RISC-V (RV64G)
Synthesizable √FPGA √
Parameterized √floating point √
AMOs+LR/SC √caches √
VM √Boots Linux √Multi-core √
lines of code 9k + 11k
Tuesday, June 30, 15
![Page 24: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/24.jpg)
UC Berkeley That’s BOOM!
24
Issue Select
RegfileWriteback
Quad-issue (9r,4w)
ALU
div
LSUAgen D$
ALU
imul
FPU
ALU
bypassing
bypassnetwork
RegfileRead
Tuesday, June 30, 15
![Page 25: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/25.jpg)
Category ARM Cortex-A9 RISC-V BOOM-2w
ISA 32-bit ARM v7 64-bit RISC-V v2 (RV64G)
Architecture 2 wide, 3+1 issue Out-of-Order 8-stage
2 wide, 3 issue Out-of-Order 6-stage
Performance 3.59 CoreMarks/MHz 3.91 CoreMarks/MHz
Process TSMC 40GPLUS TSMC 40GPLUS
Area with 32K caches
~2.5 mm2 ~1.00 mm2
Area efficiency 1.4 CoreMarks/MHz/mm2 3.9 CoreMarks/MHz/mm2
Frequency 1.4 GHz 1.5 GHz
UC Berkeley Comparison against ARM
25
Category ARM Cortex-A9 RISC-V BOOM-2w
ISA 32-bit ARM v7 64-bit RISC-V v2 (RV64G)
Architecture 2 wide, 3+1 issue Out-of-Order 8-stage
2 wide, 3 issue Out-of-Order 6-stage
Performance 3.59 CoreMarks/MHz 3.91 CoreMarks/MHz
Process TSMC 40GPLUS TSMC 40GPLUS
Area with 32K caches
~2.5 mm2 ~1.00 mm2
Area efficiency 1.4 CoreMarks/MHz/mm2 3.9 CoreMarks/MHz/mm2
Frequency 1.4 GHz 1.5 GHz
+9%!
2-wide BOOM layout.
Regfile
LLC Data
LLC Data (256k)ROB
D$ (32k)
Issue
Uncore
I$ (32k)Rename
Exe
Exe
I$
bpd
RenUncore
note: not to scale
preliminary resultsTuesday, June 30, 15
![Page 26: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/26.jpg)
0
1.00
2.00
3.00
4.00
5.00
6.00CoreMark/MHz
Cor
eMar
k/M
Hz
UC Berkeley Industry Comparisons
26
BOOM
-‐4w
BOOM
-‐2w
Rocket
in-‐orderprocessors
out-‐of-‐orderprocessors
Ivy Bri
dge
Cortex
-‐A15
Cortex
-‐A9
Cortex
-‐A8
Cortex
-‐A5MIPS74
k
preliminary resultsTuesday, June 30, 15
![Page 27: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/27.jpg)
UC Berkeley Industry Comparisons
27
Processor Core Area CoreMark/MHz Freq (MHz) IPC
Intel Xeon E5 2668 (Ivy) ~12 mm2@22nm 5.60 3,300 1.96
ARM Cortex-A15 2.8 mm2@28nm 4.72 2,116 1.50
BOOM-4wide 1.1 mm2@45nm 4.70 1,000 1.50
BOOM-2wide 0.8 mm2@45nm 3.91 1,500 1.26
ARM Cortex-A9 2.5 mm2@40nm 3.59 1,400 1.27
MIPS 74K 2.5 mm2@65nm 2.50 1,600 -
Rocket (RV64G) 0.5 mm2@45nm 2.32 1,500 0.76
ARM Cortex-A5 0.5 mm2@40nm 2.13 - -
48x
preliminary resultsTuesday, June 30, 15
![Page 28: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/28.jpg)
UC Berkeley Ivy Bridge Tile Comparison
282-wide BOOM layout.
Regfile
LLC Data
LLC Data (256k)ROB
D$ (32k)
Issue
Uncore
I$ (32k)Rename
Exe
Exe
I$
bpd
RenUncore
2-wide BOOM layout.
Regfile
LLC Data
LLC Data (256k)ROB
D$ (32k)
Issue
Uncore
I$ (32k)Rename
Exe
Exe
I$
bpd
RenUncore
Ivy Bridge-‐EP Tile (32kB/32kB + 256kB caches)
~12nm @ 22nm
BOOM-‐2w Chipscaled to 0.4mm2 @ 22nm
BOOM-2w Chip (32kb/32kB + 256kB caches) 1.7mm2 @ 45nm
preliminary resultsTuesday, June 30, 15
![Page 29: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/29.jpg)
UC Berkeley Synthesis Results
290
100000
200000
300000
400000
500000
600000
700000
No FPU FPU BOOM-1w BOOM-2w BOOM-4w
Core Area (um^2)
Issue UnitRename Stage (maptables)RRd Stage (bypasses)Register FileROBLSUBr PredictorFreelistBusyTableFetchBufferFPUImulOther
Rocket
BOOM
preliminary resultsTuesday, June 30, 15
![Page 30: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/30.jpg)
UC Berkeley Synthesis Results
30
Issue Unit Rename Stage (maptables)RRd Stage (bypasses) Register FileROB LSUBr Predictor FreelistBusyTable FetchBufferFPU ImulOther
0
100000
200000
300000
400000
500000
600000
700000
No FPU BOOM-1w BOOM-4w
Core Area (um^2)
0
200000
400000
600000
800000
1000000
1200000
Rck-I Rck-G BOOM-1wBOOM-2wBOOM-4w
Tile Area (um^2)
CoreI$ (16 KB)D$ (16 KB)
preliminary resultsTuesday, June 30, 15
![Page 31: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/31.jpg)
UC Berkeley Lessons§ RISC-‐V is a great ISA
- it gets out of your way- the instrucBon count difference is greater between gcc versions than between ISAs
§ code-‐reuse is great- leveraging exisBng Rocket-‐chip infrastructure
§ Way too much of my Bme is wasted on corralling benchmarks-we should share our efforts- hLps://github.com/ccelio/Speckle/- make generaBng portable SPEC CPU2006 easy
§ Debugging is hard- good verificaBon tests are more valuable than good RTL- use asserts EVERYWHERE- use an ISA simulator in parallel with RTL simulaBon
31Tuesday, June 30, 15
![Page 32: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/32.jpg)
UC Berkeley “Speckle” -‐ a wrapper for SPEC CPU2006
§ SPEC is designed to be run naBvely- a pain for cross-‐compiling, running on a simulator or FPGA
§ If you have a copy of CPU2006...-modify the provided cfg file- Speckle will compile and generate a portable directory of binaries, input files, and input arguments, and a run script
§ hLps://github.com/ccelio/Speckle/
32Tuesday, June 30, 15
![Page 33: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/33.jpg)
UC Berkeley Conclusion§ BOOM supports full RV64G + privileged ISA (VM support)§ Able to boot Linux and run CoreMark, SPECINT, and Dhrystone benchmarks
§ BOOM is 9,000 loc and 3 person-‐years of work
§ Future Work- bring-‐up more interes'ng applica'ons- add ROCC interface- explore new µarch designs- tape-‐out this fall- open-‐source by winter workshop
33Tuesday, June 30, 15
![Page 34: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/34.jpg)
UC Berkeley QuesKons?
34Tuesday, June 30, 15
![Page 35: UCB ACE Lab - RISC-V · 2016-09-20 · UC Berkeley The$Berkeley$Outof-Order$Machine$(BOOM!Computer$Architecture$Research$Using$an$ Industry-CompeBBve,$Synthesizable,$Parameterized$](https://reader033.vdocuments.net/reader033/viewer/2022060212/5f04f9217e708231d410a044/html5/thumbnails/35.jpg)
UC Berkeley Funding Acknowledgements
35
§ Research par*ally funded by DARPA Award Number HR0011-‐12-‐2-‐0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corpora*on program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel, Google, Huawei, Nokia, NVIDIA, Oracle, and Samsung.
§ Approved for public release; distribu*on is unlimited. The content of this presenta*on does not necessarily reflect the posi*on or the policy of the US government and no official endorsement should be inferred.
§ Any opinions, findings, conclusions, or recommenda*ons in this paper are solely those of the authors and does not necessarily reflect the posi*on or the policy of the sponsors.
Tuesday, June 30, 15