manage performance & power using 40-nm fpgasvaughn/papers/manage_power... · 2012. 4. 13. ·...
TRANSCRIPT
Seyi Verma
Vaughn Betz
Manage Performance & Power Using 40-nm FPGAs
Copyright © 2008 Altera Corporation
Agenda
� Design requirements
� Stratix® IV FPGAs: Highest performance and lowest power− Processing techniques
− Architectural innovations
− Quartus® II software CAD system
� High-end FPGA competitive comparison
Copyright © 2008 Altera Corporation
Design Requirements
Designer GoalsWhat You
Need FromYour FPGA
What You Need FromYour FPGA Software
Functionality High density
�Ease of use
�Maximum utilization
�Low compile times
Meet timing constraintsHigh performance
�Detailed constraint entry
�Powerful timing analysis tool
�Optimization of all timing requirements
Meet power budget Low power� Automatic power optimization
� Accurate power analysis/reports
Maximize performance and minimize power
Stratix IV High-End FPGA Highlights
Copyright © 2008 Altera Corporation
Stratix IV FPGAs� Highest density
− Up to 680K LEs
− Up to 22.4-Mbits internal RAM
− Up to 1,360 18x18 multipliers
� Highest bandwidth and performance− Up to 48 transceiver blocks operating
at up to 8.5 Gbps
− Maximum clock rates of 600 MHz
� Broad protocol support
− Including hard IP for PCI Express Gen1 and Gen2
� Lowest power− 40-nm process benefits including
0.9-V core voltage
− Programmable Power Technology (PPT)
− Quartus II PowerPlay technology
5
Managing Performance and Power Through Process Technology
Copyright © 2008 Altera Corporation
NMOS
PMOS
40-nm Process�Aggressive gate length
− Better density and higher speed than 45 nm
�Second-generation strained silicon
− Increases electron mobility by up to 30%
− Transistor level performance is up to 40% higher
�Strained silicon benefit can be converted to
− Higher speed
− Lower power (standby and dynamic)
In 40 nm, Altera converts benefits of strain to lower power
Copyright © 2008 Altera Corporation
Leading Edge Process Technology at 40 nm
� Advanced 40-nm process at 0.9 V− Lower voltage � reduces dynamic power by 33%
− 30% capacitance reduction � reduces dynamic power
− 39% shorter channel length compared to 65 nm �increases performance
� Multiple-gate oxide thicknesses (triple oxide) − Trade-off static power vs. speed per transistor
� Multiple-threshold voltages− Trade-off static power vs. speed per transistor
� Low-k inter-metal dielectric− Reduces dynamic power, increases performance
� Super strained silicon− Increases electron and hole mobility by 30%
− Balance between power and performance
� Copper interconnect− Increased performance, reduced IR drop
Best-in-class process for power and performance
40-nm
40 nm
Architecture Innovation:Better Performance, Reduced Power
Copyright © 2008 Altera Corporation
Design-Specific Power Optimization� Only a small fraction of logic is performance critical
Slack Histogram
Performancecritical
Not performance critical
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
0-10%
10-20%
20-30%
30-40%
40-50%
50-60%
60-70%
70-80%
80-90%
90-100%
Slack %
Nu
mb
er
of
co
nn
ec
tio
ns
Copyright © 2008 Altera Corporation
Programmable Speed vs. LeakageProgrammable Speed vs. Leakage
Note: A simple “model” showing Programmable Power Technology. Actual implementation varies and is patented.
Source
Body
Drain
Gate
0 V
< 0 V
High speed (HS)
Low power (LP)
High speed
Low power
VT
VBody
VBody – Automatically controlled by software
Copyright © 2008 Altera Corporation12
Programmable Power Technology
Performance where you need it, lowest power everywhere else
Logic array
High-speed logic
Timing criticalpath
Low-power logic
Unused low-power logic
Copyright © 2008 Altera Corporation
Quartus II Software Automatically Uses PPT
Placement and routing
Synthesis
Timing analysis
Tiles with timing slack���� low power
No Yes
All high-speed tiles
Mostly low-power tiles
Unused tiles ���� low power
Done?
Copyright © 2008 Altera Corporation
Stratix IV FPGA DDR3 Support
Interconnect Performance
DDR3>533 MHz/1067
Mbps
DDR2400 MHz/800
Mbps
QDR II 350 MHz
QDR II+ 400 MHz
RLDRAM II 400 MHz
LVDS 1.60 Gbps
DDR3 DIMM support
at 533 MHz through
read/write leveling
Read and write leveling and resynchronization capability
Read
Write
1 :2demux
Sync block
Sync block
DQ Hard IO 1 :2demux
Syncblock
Syncblock
Dynamic
on-chip
termination
Variable delay
for dynamic trace
compensation
Programmable
drive strength
and slew rate
control
I/O FeatureStratix IV FPGAs
Benefit
Dynamic On-Chip Termination (OCT)
���� Saves power
DDR3 Read/Write Leveling
����Required for
DIMM support
Variable I/O Delay ����Allows signal
de-skew
Copyright © 2008 Altera Corporation
Write Read
Dynamic OCT
�Write: Rs on, Rt off → Matching line impedance
�Read: Rs off, Rt on → Terminating far end
Stratix IV FPGA Memory chip Stratix IV FPGA Memory chip
Copyright © 2008 Altera Corporation
Power Reduction with DDR3and Dynamic OCT
� DDR3 consumes 30% lower power than DDR2
− DDR2 requires 1.8-V VCC rails
− DDR3 requires 1.5-V VCC rails
� Dynamic OCT reduces termination power by 1 W/72-bits
Save 1.9W per 72-bit DIMM at 1067 Mbps
Copyright © 2008 Altera Corporation
Stratix IV GX Embedded Transceivers
Data Rate Transceiver Power
3.125 Gbps 100 mW
6.375 Gbps 135 mW
8.5 Gbps 165 mW
Highest system bandwidth and power efficiency
� Up to 48 transceivers
− 3G, 6G, and 8G channels
� Comprehensive protocol support
AU61
Slide 17
AU61 In figure (which I can't change), please change the following:Delete extra "c" in TX phase compensation... box.Change 8b / 10b to 8B/10B Altera User, 24/06/2008
Copyright © 2008 Altera Corporation
Complete PCI Express Solution (Hard IP)
PCS/PMA
PCI Express Hard IP
Legend
Soft Logic
Integrated Gen1/Gen2 support
PCI Express Gen2
Feature x8 x4 x2
Bandwidth 40 Gbps 20 Gbps 10 Gbps
Dynamic Power 600 mW 440 mW 350 mW
� Gen1 and Gen2 hard IP protocol stack (x2, x4, and x8)
− Guaranteed timing closure− Low latency− Minimizes power− Saves logic and memory− No IP fee
Quartus II Software CAD Takes Full Advantage Of Silicon
Copyright © 2008 Altera Corporation
Timing Analysis and Closure
� Easy to use− Enter constraints via script or GUI− Extensive reports and visualization
� Powerful timing constraints− Uses SDC syntax− Complex clocking schemes− Source-synchronous design
� Complete analysis− Rise/fall delays− On-die variation (min/max)− Jitter models− Multiple corners (3 for Stratix IV FPGAs)
� Accurate delays− Full non-linear circuit simulator− Within 1% of SPICE, but 10,000 times
faster− Complex end-of-life effects modeled
� Full optimization− Optimize all timing constraints− Across all process corners
Copyright © 2008 Altera Corporation
Power: Analysis and Optimization
Designentry
Constraints
Speed ����Area ����Power ����
Place and route
Optimize power ����
PowerPlay power
analyzer
Power-optimized design �
Synthesis
Optimize power ����
Accurate power modeling
�Physics-based models
�Proven methodology and correlation
Accurate modeling enables good optimization
�Routing, logic, RAM, static
Copyright © 2008 Altera Corporation
Dynamic Power Optimization
0%
5%
10%
15%
20%
25%
30%
35%
40%
1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77
Design
Dyn
am
ic p
ow
er
red
uc
tio
n v
s.
min
imu
m e
ffo
rt
Extra effort
Normal effort
On average,15%
dynamic power
reduction
On average,15%
dynamic power
reduction
Copyright © 2008 Altera Corporation
Higher Productivity Through Lower Compile Times
� Problem:− FPGA capacity outpacing CPUs
− 68x FPGA logic
− 16x CPU speed
− 4.3x gap
� Solution:− Faster algorithms
− Parallel code
− Incremental compile
0
100
200
300
400
500
600
700
1998 2000 2002 2004 2006 2008
Lo
gic
Ele
me
nts
(T
ho
us
an
ds
)
0
10
20
30
40
50
60
70
Re
lative
Sp
ecIN
T
Logic ElementsRelative CPU Speed
Quartus II Compilation Time History
100%
Jan-03 Jan-04 Jan-05 Jan-06 Jan-07 Jan-08
Quartus II Version
Re
lati
ve
Co
mp
ile
Tim
e (
Lo
g s
ca
le)
50%
75%
25%
QII 6.1
QII 4.0
QII 8.0
Stratix
Stratix II
Stratix III
Stratix III w/ Parallel Compile
Competitive Comparison
Stratix IV FPGAs and Virtex-5 FPGAs
Copyright © 2008 Altera Corporation25
Stratix IV FPGA Power Saving Techniques
Power Minimizing Technique
Stratix IV
FPGAs
Virtex-5
FPGAs
Silicon process optimizations � �
Programmable power technology �
DDR3 with dynamic OCT �
PCI Express Gen2 hard IP �
Software static power optimization �
Copyright © 2008 Altera Corporation
Static Power Comparison
Stratix IV FPGAs double the density AND minimize static power
Stratix IV E Family(Typical Design)
Virtex-5 LXFast/Medium
Virtex-5 LX(Slow)
Sta
tic p
ow
er
(W)
Logic elements
0.000
1.000
2.000
3.000
4.000
5.000
6.000
0 100000 200000 300000 400000 500000 600000 700000
Copyright © 2008 Altera Corporation
PCI Express Hard IP Minimizes Core Power
FPGA industry’s lowest power PCI Express solution
Xilinx’s PCIe soft IP Altera’s PCIe hard IP
Po
wer
Assumptions:
Altera: Hard IP in Gen2 x4 configuration
Xilinx: Soft IP with bandwidth equivalent to Gen2 x4 derived using resource count from Xilinx CoreGen and XPE v10.1
Conditions: Toggle Rate = 20%
50% Less Power
Copyright © 2008 Altera Corporation
50% Less power
45% Less power
29% Less power
Reduce power consumption by 50%
Save Multiple Watts/Device (300K LEs)Save Multiple Watts/Device (300K LEs)
Copyright © 2008 Altera Corporation
Bu
ild
of
mate
rial
(BO
M)
co
st
per
devic
e
Device
$0.000
$10.00
$20.00
$30.00
$40.00
$50.00
$60.00
XC5VLX330T EP3SL340 (1.1V) EP3SL340 (0.9V) EP4SGX360
$70.00
$80.00
57%Savings
46%Savings
36%Savings
PCB Cost Cost for Caps Cost for Heatsink and Fan
Hidden CostsHidden Costs
Reduce hidden costs with Stratix IV FPGAsby 57% per device!
Copyright © 2008 Altera Corporation
Core Performance
On average, Stratix IV FPGAs are 35% fasterthan Virtex-5 FPGAs
0.90
1.00
1.10
1.20
1.30
1.40
1.50
1.60
1.70
1.80
1.90
2.00
2.10
Real customer designs
Fm
ax r
ati
oS
trati
x I
V/V
irte
x-5
Stratix IV advantage
Virtex-5 advantage
Stratix IV FPGAsare on average 35% faster
Conditions•Best effort (Xplorer, DSE, and seed sweep)•Fastest speed grade•Quartus II software version 8.0 and ISE 10.1i•Real customer designs
Copyright © 2008 Altera Corporation
Stratix IV FPGAs: Unprecedented Core PerformanceStratix IV FPGAs: Unprecedented Core Performance
Achieve higher performance and lower costs with Stratix IV FPGAs
Stratix IV FPGAsVirtex-5 FPGAs
0.00
0.20
0.40
0.60
0.80
1.00
1.20
1.40
1.60
1.80
Slow
35%
Medium Fast
Rela
tive F
PG
A c
ore
perf
orm
an
ce
Copyright © 2008 Altera Corporation
Stratix IV FPGAs: Unprecedented I/O Performance
InterconnectVirtex-5 FPGAs
(65 nm)Stratix IV FPGAs
(40 nm)
DDR2 333 MHz 400+ MHz
DDR3 No DIMM support 533 MHz
QDR II 300 MHz 350 MHz
QDR II+ No DIMM support 400 MHz
RLDRAM II 300 MHz 400 MHz
LVDS 1.25 Gbps 1.6 Gbps
Transceivers 3G, 6G 3G, 6G, 8G
PCI-Express Hard IP x8 No solution 40G
Higher bandwidth with Stratix IV FPGAs
Copyright © 2008 Altera Corporation
� Highest bandwidth and performance− Core clock speeds of 600 MHz− Up to 48 transceiver blocks operating
at up to 8.5 Gbps− DDR3 at 1067 Mbps− LVDS at 1.6 Gbps− Quartus II software performance optimization
� Lowest power− 0.9-V core voltage on an optimal 40-nm process− Programmable Power Technology− Lowest transceiver power− DDR3 with dynamic OCT− Integration of hard IP for PCI Express Gen1 and
Gen2− Quartus II PowerPlay technology
Highest system bandwidth and power efficiency
SummarySummary
40-nm andarchitecturalinnovations
Stratix IV FPGAs
Copyright © 2008 Altera Corporation
Resources� Visit Stratix IV FPGA power web page
− http://www.altera.com/products/devices/stratix-fpgas/stratix-iv/overview/power/stxiv-power.html
� Visit Stratix IV FPGA performance web page
− http://www.altera.com/products/devices/stratix-fpgas/stratix-iv/overview/performance/stxiv-performance.html
� See Stratix IV FPGA transceiver (8G) and LVDS (1.5 Gbps) demos
− http://www.altera.com/b/40-nm-stratix-iv-video.html
� Download Stratix IV FPGA power white paper
− http://www.altera.com/literature/wp/wp-01059-stratix-iv-40nm-power-management.pdf
� Stratix IV EPE – power estimation spreadsheet
− http://www.altera.com/support/devices/estimator/pow-powerplay.jsp
� See what Programmable Power Technology did for Stratix III FPGAs
− http://www.altera.com/products/devices/stratix-fpgas/stratix-iii/overview/power/st3-power.html
Copyright © 2008 Altera Corporation
Additional Resources
� How Altera performs FPGA benchmarks
− http://www.altera.com/literature/wp/wpfpgapbm.pdf
� Guidance on accurately benchmarking FPGAs white paper
− http://www.altera.com/literature/wp/wp-01040.pdf
� Reproduce results through open cores
− http://www.altera.com/products/devices/stratix-fpgas/stratix-iii/overview/architecture/performance/st3-opencores.html
Thank You!
For more information visit: www.altera.com