FPGA / SoC technology - today and in the future
TRANSCRIPT
An Introduction to Xilinx All Programmable Solutions
FPGA Seminar
NOVI – Ålborg
May 31st, 2017
© Copyright 2016 Xilinx.
Contact details:
Agenda
Update on Xilinx FPGA / SOC solutions
Roadmap: where is FPGA / SoC technology taking us, and what does the future hold?
Development tools for FPGA / SoC: now and in the future
Xilinx ReVision
An Expanding All Programmable Portfolio
Industry View of 20nm Technology Cost
*Source: Nvidia, 2013 International Trade Partner Conference
Mid-Range Kintex® Portfolio for Price-Performance-per-Watt
Performance
1.7X Performance/Watt
Most cost-effective
Mainstream protocols
Highest DSP bandwidth
16G backplane support
The only FinFET mid-range FPGA
High-end features in the mid-range
2.4X Performance/Watt
1X
Kintex® Portfolio: Expanding Mid-Range Capabilities
Maximum Values                           Kintex-7     Kintex UltraScale   Kintex UltraScale+
Logic Cells / System Logic Cells (K)(1)  478          1,451               1,143
Block RAM (Mb)                           34           76                  34.6
UltraRAM (Mb)                            -            -                   36
DSP Slices                               1,920        5,520               3,528
Peak DSP Performance (GMACs)             2,845        8,180               6,287
Transceiver Count                        32           64                  76
Peak Transceiver Line Rate (Gb/s)        12.5         16.3                32.75
Peak Transceiver Bandwidth (Gb/s)        800          2,086               3,268
Integrated PCI Express®                  Gen2 x8      Gen3 x8             Gen3 x16, Gen4 x8
Memory Interface Performance (Mb/s)      DDR3-1866    DDR4-2400           DDR4-2666
I/O Pins                                 500          832                 668
1: UltraScale™ & UltraScale+™ devices are measured in System Logic Cells
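The peak transceiver bandwidth row can be cross-checked against the count and line-rate rows. The sketch below assumes the bandwidth figure counts both directions (transmit plus receive); that factor of 2 is an inference that happens to reproduce the first two columns, not something the slide states. The third column is left out because that family mixes transceiver types with different line rates. The function name is hypothetical.

```python
def peak_transceiver_bandwidth_gbps(count, line_rate_gbps, directions=2):
    """Aggregate bandwidth in Gb/s.

    directions=2 is an assumption (TX + RX counted together); it is
    not stated on the slide.
    """
    return count * line_rate_gbps * directions


# First column of the table: 32 transceivers at 12.5 Gb/s
first_column = peak_transceiver_bandwidth_gbps(32, 12.5)
# Second column of the table: 64 transceivers at 16.3 Gb/s
second_column = peak_transceiver_bandwidth_gbps(64, 16.3)

print(first_column, round(second_column))
```

Under that assumption the results line up with the 800 and 2,086 Gb/s entries in the table.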
Cost Optimized Solutions
Introducing the new Cost-Optimized Portfolio
• Better processor scalability with single-core ARM Cortex-A9
Artix®-7
Zynq®-7000
Spartan®-6
• Smaller Densities
• Win 10 ISE® Tool Support
I/O Optimized
Transceiver Optimized
Artix®-7
System Optimized
Zynq®-7000
Spartan®-6
Spartan-7
• 2.5X Performance/Watt
• Industry Leading Vivado Tool Support
Continuing the Spartan Heritage
Timeline, 1998 to 2016: Spartan, Spartan-XL, Spartan-II/IIE, Spartan-3, Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3AN, Spartan-3A DSP, Spartan-6, Spartan-7
Process nodes: 0.5um, 90nm, 45nm, 28nm
Nearly two decades, and three quarters of a billion devices shipped
5 New Devices and One New Family: The Broadest Cost-Optimized All Programmable Portfolio
Devices ordered from Value to Mid-Range:
Spartan-6: LX4, LX9, LX16, LX25, LX45, LX75, LX100, LX150
Artix-7: A12T, A15T, A25T, A35T, A50T, A75T, A100T, A200T
Zynq-7000: Z-7007S, Z-7012S, Z-7014S, Z-7010, Z-7015, Z-7020
Spartan-7: S6, S15, S25, S50, S75, S100
Spartan-7 FPGA Overview
Industry’s best performance-per-watt for cost-sensitive applications
• Security: encryption and authentication; AES256 CBC & SHA-256
• XADC & SYSMON: 1MSPS ADC; thermal monitoring
• Small Package Form Factor: only 28nm device in an 8x8mm package
• High-Range I/O: low cost interfacing; up to 1.25G LVDS; 3.3V
• DDR3-800: up to 800Mb/s; flexible soft controller
• DSP: wider 25x18 multiplier; 160 slices, 176GMACs
• Block RAM: 36K/18K blocks; up to 4.2Mb total
• 2.5X Perf/Watt: 50% lower power & 30% faster than Spartan-6
Spartan®-7 FPGAs: I/O Optimization at the Lowest Cost and Highest Performance-per-Watt
Part Number                            XC7S6      XC7S15     XC7S25     XC7S50     XC7S75     XC7S100
Logic Cells                            6,000      12,800     23,360     52,160     76,800     102,400
Slices                                 938        2,000      3,650      8,150      12,000     16,000
CLB Flip-Flops                         7,500      16,000     29,200     65,200     96,000     128,000
Max. Distributed RAM (Kb)              70         150        313        600        832        1,100
Block RAM/FIFO w/ ECC (36 Kb each)     5          10         45         75         90         120
Total Block RAM (Kb)                   180        360        1,620      2,700      3,240      4,320
Clock Mgmt Tiles (1 MMCM + 1 PLL)      2          2          3          5          8          8
Max. Single-Ended I/O Pins             100        100        150        250        400        400
Max. Differential I/O Pairs            48         48         72         120        192        192
DSP Slices                             10         20         80         120        140        160
Analog Mixed Signal (AMS) / XADC       0          0          1          1          1          1
Configuration AES / HMAC Blocks        0          0          1          1          1          1
Commercial Speed Grade                 -1,-2      -1,-2      -1,-2      -1,-2      -1,-2      -1,-2
Industrial Speed Grade                 -1,-2,-1L  -1,-2,-1L  -1,-2,-1L  -1,-2,-1L  -1,-2,-1L  -1,-2,-1L
Package(1)   Body Area (mm)            Available User I/O: 3.3V SelectIO™ HR I/O
CPGA196      8x8                       100        100        -          -          -          -
CSGA225      13x13                     100        100        150        -          -          -
CSGA324      15x15                     -          -          150        210        -          -
FTGB196      15x15                     100        100        100        100        -          -
FGGA484      23x23                     -          -          -          250        338        338
FGGA676      27x27                     -          -          -          -          400        400
Notes:
1. Packages with the same last letter and number sequence, e.g., A484, are footprint compatible with all other Spartan-7 devices with the same sequence. The footprint compatible devices within this family are outlined.
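The "Total Block RAM" row follows directly from the block count row, since each block holds 36 Kb. A minimal sketch of that arithmetic, using only the block counts from the table (the dictionary and variable names are illustrative):

```python
BLOCK_SIZE_KB = 36  # each Block RAM/FIFO block is 36 Kb, per the table row label

# 36 Kb block counts per device, copied from the table
block_counts = {
    "XC7S6": 5,
    "XC7S15": 10,
    "XC7S25": 45,
    "XC7S50": 75,
    "XC7S75": 90,
    "XC7S100": 120,
}

# Total Block RAM in Kb for each device
total_block_ram_kb = {
    part: blocks * BLOCK_SIZE_KB for part, blocks in block_counts.items()
}

for part, total in total_block_ram_kb.items():
    print(f"{part}: {total} Kb")
```

The computed totals agree with the table row (180 Kb for the XC7S6 up to 4,320 Kb for the XC7S100).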
Artix®-7 FPGA Overview
The industry’s cost-optimized performance leader
• Security: encryption & authentication; AES256 CBC & SHA-256
• XADC & SYSMON: 1Msps ADC reduces BOM cost; complies with reliability standards
• Small Package Form Factor: smallest for 35K-215K LCs; meets stringent SWAP-C
• High-Range I/O: low cost interfacing; up to 300Gb/s LVDS bandwidth
• 6.6Gb/s GTP: up to 211Gb/s bandwidth
• DDR3-1066: low-cost DRAM; up to 1,066Mb/s; flexible soft controller
• DSP: wider 25x18 multiplier; up to 740 slices and 931GMACs @ 629MHz
• Block RAM: 36K/18K blocks; up to 12.8Mb total
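The DSP figure above (740 slices, 931 GMACs at 629MHz) can be reproduced with a simple product. The sketch below assumes each slice is credited with two multiply-accumulates per clock cycle; that factor is an inference that matches the quoted number, not something the slide states, and the function name is hypothetical.

```python
def peak_gmacs(dsp_slices, fmax_mhz, macs_per_slice_per_cycle=2):
    """Peak DSP throughput in GMACs.

    macs_per_slice_per_cycle=2 is an assumption chosen because it
    reproduces the slide's figure; the slide does not give the formula.
    """
    return dsp_slices * fmax_mhz * macs_per_slice_per_cycle / 1000.0


artix7_peak = peak_gmacs(740, 629)
print(round(artix7_peak))
```

With those inputs the result rounds to the 931 GMACs quoted on the slide.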
Artix®-7 FPGAs: Transceiver Optimization at the Lowest Cost and Highest DSP Bandwidth (1.0V, 0.95V, 0.9V)
Notes:
1. Supports PCI Express Base 2.1 specification at Gen1 and Gen2 data rates.
2. Represents the maximum number of transceivers available. Note that the majority of devices are available without transceivers. See the Package section of this table for details.
3. Leaded package option available for all packages. See DS180, 7 Series FPGAs Overview for details.
4. Device migration is available within the Artix-7 family for like packages but is not supported between other 7 series families.
Part Number XC7A12T XC7A15T XC7A25T XC7A35T XC7A50T XC7A75T XC7A100T XC7A200T
Logic Resources
Logic Cells 12,800 16,640 23,360 33,280 52,160 75,520 101,440 215,360
Slices 2,000 2,600 3,650 5,200 8,150 11,800 15,850 33,650
CLB Flip-Flops 16,000 20,800 29,200 41,600 65,200 94,400 126,800 269,200
Memory Resources
Maximum Distributed RAM (Kb) 171 200 313 400 600 892 1,188 2,888
Block RAM/FIFO w/ ECC (36 Kb each) 20 25 45 50 75 105 135 365
Total Block RAM (Kb) 720 900 1,620 1,800 2,700 3,780 4,860 13,140
Clock Resources CMTs (1 MMCM + 1 PLL) 3 5 3 5 5 6 6 10
I/O Resources
Maximum Single-Ended I/O 150 250 150 250 250 300 300 500
Maximum Differential I/O Pairs 72 120 72 120 120 144 144 240
Embedded Hard IP Resources
DSP Slices 40 45 80 90 120 180 240 740
PCIe® Gen2(1) 1 1 1 1 1 1 1 1
Analog Mixed Signal (AMS) / XADC 1 1 1 1 1 1 1 1
Configuration AES / HMAC Blocks 1 1 1 1 1 1 1 1
GTP Transceivers (6.6 Gb/s Max Rate)(2) 2 4 4 4 4 8 8 16
Speed Grades
Commercial -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2 -1, -2
Extended -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3 -2L, -3
Industrial -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L -1, -2, -1L
Package(3), (4) / Dimensions (mm) / Ball Pitch (mm) / Available User I/O: 3.3V SelectIO™ HR I/O (GTP Transceivers)
CPG236 10 x 10 0.5 106 (2) 106 (2) 106 (4) 106 (2) 106 (2)
CSG324 15 x 15 0.8 210 (0) 210 (0) 210 (0) 210 (0) 210 (0)
CSG325 15 x 15 0.8 150 (2) 150 (4) 150 (4) 150 (4) 150 (4)
FTG256 17 x 17 1.0 170 (0) 170 (0) 170 (0) 170 (0) 170 (0)
SBG484 / SBV484 19 x 19 0.8 285 (4)
Footprint Compatible
FGG484 23 x 23 1.0 250 (4) 250 (4) 250 (4) 285 (4) 285 (4)
FBG484 / FBV484 23 x 23 1.0 285 (4)
Footprint Compatible
FGG676 27 x 27 1.0 300 (8) 300 (8)
FBG676 / FBV676 27 x 27 1.0 400 (8)
FFG1156 / FFV1156 35 x 35 1.0 500 (16)
Migrating from Spartan-6: Spartan-7 or Artix-7?
Vivado support enables customers to build scalable, cost-optimized platforms
For designs requiring…
• Logic + GTs (migrating from Spartan-6 LXT)
• Logic only (migrating from Spartan-6 LX)
Xilinx Processing Heritage
10+ years, 4 generations (timeline axis: 2001, 2003, 2005, 2007, 2012; vertical axis: performance)
• 130nm: 405 core, 300+ MHz, 450+ DMIPS
• 90nm: dual 405 cores, 450+ MHz, 700+ DMIPS
• 65nm: dual 440 cores, 550+ MHz, 1100+ DMIPS
• 28nm: dual Cortex-A9 MPCore, 1 GHz, 5000 DMIPS
Introducing single ARM Cortex™-A9 devices built on the proven Zynq-7000 architecture
Offering the highest integration at the lowest cost within the Cost-Optimized Portfolio
New devices fortify processor scalability from the entry-level to the high-end for embedded designs
Introducing Zynq-7000S Devices
Single-Core ARM® Devices Enhance Scalable Processing Portfolio
Lower Cost Entry Points Enhance Scalable Processing Portfolio
Zynq-7000S Offers Scalability in Motor Control
Zynq-7000S (Z-7014S), maximum capabilities:
• 2 full drives
• Fieldbus protocols via PL: Profibus, CANopen, others
Zynq-7000 (Z-7020), maximum capabilities:
• 4 full drives
• 2nd Cortex-A9 enables AnyBus IP: EtherCAT, Profinet, Powerlink, EtherNet/IP, Modbus
Block diagram labels: Z-7014S (Processing System with one ARM Cortex-A9, Programmable Logic); Z-7020 (Processing System with two ARM Cortex-A9 cores, Programmable Logic); functions shown: motor control computations, fieldbus IP, AnyBus IP
Introducing Zynq-7000S Devices
Application Processors (A9)
• Zynq-7000S: single-core, up to 766MHz
• Zynq-7000: dual-core, up to 1GHz
Integrated Memory Mapped Peripherals
• e.g. USB2.0, GigE
Integrated Analog
• Dual multi-channel 12-bit ADC
• Up to 1Msps
• Temp & voltage sensors
Programmable Logic
• Zynq-7000S: Artix-7 series FPGA, 23K-65K logic cells
• Zynq-7000: 7 series FPGA, 28K-440K logic cells
Extensive IP Portfolio
• Standardized AXI4 interfaces
• Enables peripheral expansion
• Includes software drivers
Tightly Coupled Domains
• 3000+ PS/PL interconnects
• Low latency
• Up to 100Gb/s of bandwidth
High Bandwidth Memory
• L1/L2 CPU caches
• Dedicated On-Chip Memory (OCM)
• DDR3, DDR2, LPDDR2 w/ ECC
Extending Scalability Across the Zynq® Portfolio
Tiers: Cost-Optimized, Mid-Range, High-End
• Single-core ARM® Cortex™-A9; 28nm Artix®-7 FPGA
• Dual-core ARM Cortex-A9; 28nm Artix-7 FPGA
• Dual-core ARM Cortex-A9; 28nm Kintex®-7 FPGA
• Dual-core ARM Cortex-R5; dual-core ARM Cortex-A53; 16nm FinFET+ logic
• Dual-core ARM Cortex-R5; quad-core ARM Cortex-A53; ARM Mali™-400 MP2; 16nm FinFET+ logic
• Dual-core ARM Cortex-R5; quad-core ARM Cortex-A53; ARM Mali-400 MP2; H.264/H.265 video codec; 16nm FinFET+ logic
Zynq®-7000 AP SoC Family
Cost-Optimized Devices: Z-7007S, Z-7012S, Z-7014S, Z-7010, Z-7015, Z-7020
Mid-Range Devices: Z-7030, Z-7035, Z-7045, Z-7100
Device Name              Z-7007S   Z-7012S   Z-7014S   Z-7010    Z-7015    Z-7020    Z-7030     Z-7035     Z-7045     Z-7100
Part Number              XC7Z007S  XC7Z012S  XC7Z014S  XC7Z010   XC7Z015   XC7Z020   XC7Z030    XC7Z035    XC7Z045    XC7Z100
Processing System (PS)
Processor Core:
• Z-7007S, Z-7012S, Z-7014S: Single-Core ARM® Cortex™-A9 MPCore™, up to 766MHz
• Z-7010, Z-7015, Z-7020: Dual-Core ARM Cortex-A9 MPCore, up to 866MHz
• Z-7030, Z-7035, Z-7045, Z-7100: Dual-Core ARM Cortex-A9 MPCore, up to 1GHz(1)
Processor Extensions: NEON™ SIMD Engine and Single/Double Precision Floating Point Unit per processor
L1 Cache: 32KB Instruction, 32KB Data per processor
L2 Cache: 512KB
On-Chip Memory: 256KB
External Memory Support(2): DDR3, DDR3L, DDR2, LPDDR2
External Static Memory Support(2): 2x Quad-SPI, NAND, NOR
DMA Channels: 8 (4 dedicated to PL)
Peripherals: 2x UART, 2x CAN 2.0B, 2x I2C, 2x SPI, 4x 32b GPIO
Peripherals w/ built-in DMA(2): 2x USB 2.0 (OTG), 2x Tri-mode Gigabit Ethernet, 2x SD/SDIO
Security(3): RSA Authentication of First Stage Boot Loader, AES and SHA 256b Decryption and Authentication for Secure Boot
Processing System to Programmable Logic Interface Ports (Primary Interfaces & Interrupts Only): 2x AXI 32b Master, 2x AXI 32b Slave; 4x AXI 64b/32b Memory; AXI 64b ACP; 16 Interrupts
Programmable Logic (PL)
Device Name              Z-7007S   Z-7012S   Z-7014S   Z-7010    Z-7015    Z-7020    Z-7030     Z-7035     Z-7045     Z-7100
7 Series PL Equivalent   Artix®-7  Artix-7   Artix-7   Artix-7   Artix-7   Artix-7   Kintex®-7  Kintex-7   Kintex-7   Kintex-7
Logic Cells              23K       55K       65K       28K       74K       85K       125K       275K       350K       444K
Look-Up Tables (LUTs)    14,400    34,400    40,600    17,600    46,200    53,200    78,600     171,900    218,600    277,400
Flip-Flops               28,800    68,800    81,200    35,200    92,400    106,400   157,200    343,800    437,200    554,800
Total Block RAM          1.8Mb     2.5Mb     3.8Mb     2.1Mb     3.3Mb     4.9Mb     9.3Mb      17.6Mb     19.1Mb     26.5Mb
# 36Kb Blocks            50        72        107       60        95        140       265        500        545        755
DSP Slices               60        120       170       80        160       220       400        900        900        2,020
PCI Express®             -         Gen2 x4   -         -         Gen2 x4   -         Gen2 x4    Gen2 x8    Gen2 x8    Gen2 x8
Analog Mixed Signal (AMS) / XADC(2): 2x 12 bit, MSPS ADCs with up to 17 Differential Inputs
Security(3): AES & SHA 256b Decryption & Authentication for Secure Programmable Logic Config
Speed Grades             Z-7007S to Z-7014S   Z-7010 to Z-7020   Z-7030 to Z-7045   Z-7100
Commercial               -1                   -1                 -1                 -1
Extended                 -2                   -2, -3             -2, -3             -2
Industrial               -1, -2               -1, -2, -1L        -1, -2, -2L        -1, -2, -2L
Notes:
1. 1 GHz processor frequency is available only for -3 speed grades for devices in flip-chip packages. Please see the data sheet for more details.
2. Z-7007S and Z-7010 in CLG225 have restrictions on PS peripherals, memory interfaces, and I/Os. Please refer to the Technical Reference Manual for more details.
3. Security block is shared by the Processing System and the Programmable Logic.
Zynq®-7000 All Programmable SoC Family: HR I/O, HP I/O, PS I/O, and Transceivers (GTP or GTX)
Cost-Optimized Devices: Z-7007S, Z-7012S, Z-7014S, Z-7010, Z-7015, Z-7020
Mid-Range Devices: Z-7030, Z-7035, Z-7045, Z-7100
Each entry reads "HR I/O, HP I/O / PS I/O(2), transceivers" (GTP for cost-optimized devices, GTX for mid-range devices), with one entry per device offered in the package.
Package Footprint, Dimensions (mm)(1)
CLG225, 13x13: 54, 0 / 84(3), 0; 54, 0 / 84(3), 0
CLG400, 17x17: 100, 0 / 128, 0; 125, 0 / 128, 0; 100, 0 / 128, 0; 125, 0 / 128, 0
CLG484, 19x19: 200, 0 / 128, 0; 200, 0 / 128, 0
CLG485(4), 19x19: 150, 0 / 128, 4; 150, 0 / 128, 4
SBG485 / SBV485(4), 19x19: 50, 100 / 128, 4
FBG484 / FBV484, 23x23: 100, 63 / 128, 4
FBG676 / FBV676(1), 27x27: 100, 150 / 128, 4; 100, 150 / 128, 8; 100, 150 / 128, 8
FFG676 / FFV676(1), 27x27: 100, 150 / 128, 4; 100, 150 / 128, 8; 100, 150 / 128, 8
FFG900 / FFV900, 31x31: 212, 150 / 128, 16; 212, 150 / 128, 16; 212, 150 / 128, 16
FFG1156 / FFV1156, 35x35: 250, 150 / 128, 16
Notes:
1. Devices in the same package are footprint compatible. FBG676 / FBV676 and FFG676 / FFV676 are also footprint compatible.
2. PS I/O count does not include dedicated DDR calibration pins.
3. PS DDR and PS MIO pin count is limited by package size. See DS190, Zynq-7000 All Programmable SoC Overview for details.
4. CLG485 and SBG485 / SBV485 are pin-to-pin compatible. See product data sheets and user guides for more details.
See DS190, Zynq-7000 All Programmable SoC Overview for package details.
New Low-Cost Kits for Cost-Optimized Devices
• Zynq-7000S: Attack ASSPs needing companion FPGAs
Avnet MiniZed Z007S Kit in June 2017
• Spartan-7: First production 7S50 silicon in June
S7 ARTY 7S50 Kit in July 2017
S7 ARTY 7S25 Kit in Dec 2017
• Artix-7: Enable new 7A25T & 7A12T design starts now!
ARTY 7A35T Kit available now
Price label on slide: $89
Cost-Optimized Portfolio Supported with Free Vivado WebPACK™
Family Devices
ALL
ALL
ALL Zynq®-7000S + Zynq-7000 up to Z-7030
Drag and drop hundreds of Xilinx & partner 7 series IP blocks
– Includes MicroBlaze™ soft processor and AXI block-level interconnect
Industry’s only no-cost, mixed-language simulator with no code line limits
Best-in-class quality-of-results
Portfolio at a Glance
                                  FPGAs                                               SoCs(1)
                                  Spartan®-6       Spartan-7        Artix®-7          Zynq®-7000
Process Node                      45nm             28nm             28nm              28nm
Processor                         MicroBlaze™      MicroBlaze       MicroBlaze        Single- or Dual-Core
                                  Soft Processor   Soft Processor   Soft Processor    ARM® Cortex™-A9
Logic Density Range (Logic Cells) 4K → 150K        6K → 102K        12K → 200K        28K → 85K
Max Memory Interface (Mb/s)       DDR3-800         DDR3-800         DDR3-1066         DDR3-1066
LVDS I/O Performance              1.08Gb/s         1.25Gb/s         1.25Gb/s          1.25Gb/s
Transceiver Max Gb/s              3.2Gb/s          N/A              6.6Gb/s           6.25Gb/s
1: Cost-optimized devices based on Artix-7 programmable logic
• 20nm UltraScale Update
Block-Level Innovations Optimize Critical Paths for Massive Bandwidth and Processing (Co-Optimized)
• DSP (27x18): wider multipliers, fewer blocks per function
• DDR4 memory I/O: 30% higher data rates, 20% lower power
• Block RAM: hardened data cascading; improved power, performance
• Transceivers: 12.5G low speed grade; 16G & 28G backplane; 33G chip-to-chip
• Integrated IP: 100G Ethernet MAC; 150G Interlaken; PCI Express Gen3
• SSI Technology: virtual monolithic die
• Security: AES-GCM mode, greater key protection, more authentication schemes
UltraScale Architecture Re-Designs the Core
Figure labels: effect of routing resources & analytical placement across 40nm, 28nm, and 20nm; logic cells O(N²) vs. interconnect tracks O(N); wire length; partially used CLB; clock domains 1, 2, and 3
Integrated 100G Ethernet MAC, 150G Interlaken
Configuration Options
Hard IP                       Lanes x Line Rate
150G Interlaken               Up to 12 x 12.5Gb/s, up to 6 x 25Gb/s
100GE MAC                     10 x 10Gb/s, 4 x 25Gb/s
Resource Savings: 80%, 90% (chart labels: 7-Series Soft IP vs. UltraScale Integrated IP, for Interlaken (12 lane, 10G) and Ethernet MAC + PCS (10x10G))
Interlaken (12 lane, 10G)     7-Series Soft IP   UltraScale Hard IP
LUTs                          32,700             0
Fabric Flip Flops             46,200             1,536
BRAM                          16                 0
Transceivers                  12                 12
Ethernet MAC + PCS (10x10G)   7-Series Soft IP   UltraScale Hard IP
LUTs                          70,000             0
Fabric Flip Flops             65,000             1,280
BRAM                          41                 0
Transceivers                  10                 10
Feature / Benefit
Large Scale Integration
• More headroom for power budget
• Lower latency and higher performance
• Frees up logic for additional functionality, e.g., packet processing
• Simplified flow and easier routing for shorter run-times
• No licensing requirements
Multiple configuration options
• Flexibility to meet existing and future design requirements
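The soft-IP versus hard-IP tables above can be turned into per-resource reductions. The sketch below uses only the figures from those tables; the helper name is hypothetical, and it does not attempt to reproduce the slide's headline 80% and 90% savings, whose derivation is not given.

```python
def reduction_percent(soft_ip, hard_ip):
    """Percentage of a fabric resource freed by moving from soft IP to hard IP."""
    return 100.0 * (soft_ip - hard_ip) / soft_ip


# (7-Series soft IP, UltraScale hard IP) pairs copied from the tables
interlaken = {"LUTs": (32700, 0), "Fabric Flip Flops": (46200, 1536), "BRAM": (16, 0)}
ethernet = {"LUTs": (70000, 0), "Fabric Flip Flops": (65000, 1280), "BRAM": (41, 0)}

interlaken_savings = {k: reduction_percent(*v) for k, v in interlaken.items()}
ethernet_savings = {k: reduction_percent(*v) for k, v in ethernet.items()}

for name, savings in (("Interlaken", interlaken_savings), ("Ethernet", ethernet_savings)):
    for resource, pct in savings.items():
        print(f"{name} {resource}: {pct:.1f}% fewer fabric resources")
```

LUT and BRAM usage drops to zero in both cases, and only a small number of fabric flip-flops remain.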
2nd Generation 3D IC Infrastructure Enables Virtual Monolithic Design
Feature / Benefit
~20,000 registered routing lines between die
• Enables >500 MHz datapath performance between SLRs
• Deterministic, predictable timing
Clocking architecture spans SLR boundaries
• Abundant clock resources to meet demanding applications
Footprint compatibility between SSI and non-SSI devices
• Ability to seamlessly migrate from monolithic to 3D-IC devices
Diagram labels: SLR0, SLR1, SLR2; passive interposer; substrate
UltraScale Demos – Delivering What We Promised
High Performance Proven in System Applications
Kintex® UltraScale™ FPGAs
Device Name KU025(1) KU035 KU040 KU060 KU085 KU095 KU115
Logic Resources
System Logic Cells (K) 318 444 530 726 1,088 1,176 1,451
CLB Flip-Flops 290,880 406,256 484,800 663,360 995,040 1,075,200 1,326,720
CLB LUTs 145,440 203,128 242,400 331,680 497,520 537,600 663,360
Memory Resources
Maximum Distributed RAM (Kb) 4,230 5,908 7,050 9,180 13,770 4,800 18,360
Block RAM/FIFO w/ECC (36Kb each) 360 540 600 1,080 1,620 1,680 2,160
Block RAM/FIFO (18Kb each) 720 1,080 1,200 2,160 3,240 3,360 4,320
Total Block RAM (Mb) 12.7 19.0 21.1 38.0 56.9 59.1 75.9
Clock Resources
CMT (1 MMCM, 2 PLLs) 6 10 10 12 22 16 24
I/O DLL 24 40 40 48 56 64 64
I/O Resources
Maximum Single-Ended HP I/Os 208 416 416 520 572 650 676
Maximum Differential HP I/O Pairs 96 192 192 240 264 288 312
Maximum Single-Ended HR I/Os 104 104 104 104 104 52 156
Maximum Differential HR I/O Pairs 48 48 48 48 56 24 72
Integrated IP Resources
DSP Slices 1,152 1,700 1,920 2,760 4,100 768 5,520
System Monitor 1 1 1 1 2 1 2
PCIe® Gen1/2/3 1 2 3 3 4 4 6
Interlaken 0 0 0 0 0 2 0
100G Ethernet 0 0 0 0 0 2 0
16.3Gb/s Transceivers (GTH/GTY) 12 16 20 32 56 64 64
Speed Grades
Commercial -1 -1 -1 -1 -1 -1 -1
Extended -2 | -2, -3 | -2, -3 | -2, -3 | -2, -3 | -2 | -2, -3
Industrial -1, -2 | -1, -1L, -2 | -1, -1L, -2 | -1, -1L, -2 | -1, -1L, -2 | -1, -2 | -1, -1L, -2
Package Footprint(2, 3, 4) / Package Dimensions (mm) / HR I/O, HP I/O, GTH/GTY
A784 23x23(5) 104, 364, 8 104, 364, 8
A676 27x27 104, 208, 16 104, 208, 16
A900 31x31 104, 364, 16 104, 364, 16
A1156 35x35 104, 208, 12 104, 416, 16 104, 416, 20 104, 416, 28 52, 468, 28
A1517 40x40 104, 520, 32 104, 520, 48 104, 520, 48
Footprint Compatible with Virtex® UltraScale Devices
C1517 40x40 52, 468, 40
D1517 40x40 104, 234, 64
B1760 42.5x42.5 104, 572, 44 52, 650, 48 104, 598, 52
A2104 47.5x47.5 156, 676, 52
B2104 47.5x47.5 52, 650, 64 104, 598, 64
D1924 45x45 156, 676, 52
F1924 45x45 104, 520, 56 104, 624, 64
Notes:
1. Certain advanced configuration features are not supported in the KU025. Refer to the Configuring FPGAs section in DS890, UltraScale Architecture and Product Overview.
2. Maximum achievable performance is device and package dependent; consult the associated data sheet for details.
3. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
4. See UG575, UltraScale Architecture Packaging and Pinouts User Guide for more information.
5. 0.8mm ball pitch. All other packages listed 1mm ball pitch.
Disclaimer: This document contains preliminary information and is subject to change without notice. Information provided herein relates to products and/or services not yet available for sale, and provided solely for information purposes and are not intended, or to be construed, as an offer for sale or an attempted commercialization of the
products and/or services referred to herein. Please contact your Xilinx representative for the latest information.
Virtex® UltraScale™ FPGAs
Device Name XCVU065 XCVU080 XCVU095 XCVU125 XCVU160 XCVU190 XCVU440
Logic Resources
System Logic Cells (K) 783 975 1,176 1,567 2,027 2,350 5,541
CLB Flip-Flops 716,160 891,424 1,075,200 1,432,320 1,852,800 2,148,480 5,065,920
CLB LUTs 358,080 445,712 537,600 716,160 926,400 1,074,240 2,532,960
Memory Resources
Maximum Distributed RAM (Kb) 4,830 3,980 4,800 9,660 12,690 14,490 28,710
Block RAM/FIFO w/ECC (36Kb each) 1,260 1,421 1,728 2,520 3,276 3,780 2,520
Block RAM/FIFO (18Kb each) 2,520 2,842 3,456 5,040 6,552 7,560 5,040
Total Block RAM (Mb) 44.3 50.0 60.8 88.6 115.2 132.9 88.6
Clock Resources
CMT (1 MMCM, 2 PLLs) 10 16 16 20 28 30 30
I/O DLL 40 64 64 80 120 120 120
Transceiver Fractional PLL 5 8 8 10 13 15 0
I/O Resources
Maximum Single-Ended HP I/Os 468 780 780 780 650 650 1,404
Maximum Differential HP I/O Pairs 216 360 360 360 300 300 648
Maximum Single-Ended HR I/Os 52 52 52 52 52 52 52
Maximum Differential HR I/O Pairs 24 24 24 24 24 24 24
Integrated IP Resources
DSP Slices 600 672 768 1,200 1,560 1,800 2,880
System Monitor 1 1 1 2 3 3 3
PCIe® Gen1/2/3 2 4 4 4 4 6 6
Interlaken 3 6 6 6 8 9 0
100G Ethernet 3 4 4 6 9 9 3
GTH16.3Gb/s Transceivers 20 32 32 40 52 60 48
GTY30.5Gb/s Transceivers 20 32 32 40 52 60 0
Speed Grades
Commercial -1
Extended -1H, -2, -3 | -1H, -2, -3 | -1H, -2, -3 | -1H, -2, -3 | -1H, -2, -3 | -1H, -2, -3 | -2, -3
Industrial -1, -2 | -1, -2 | -1, -2 | -1, -2 | -1, -2 | -1, -2 | -1, -1L, -2
Package Footprint(1, 2) / Package Dimensions (mm) / HR I/O, HP I/O, GTH 16.3Gb/s, GTY 30.5Gb/s
Footprint Compatible with Kintex® UltraScale Devices
C1517 40x40 52, 468, 20, 20 52, 468, 20, 20 52, 468, 20, 20
D1517 40x40 52, 286, 32, 32 52, 286, 32, 32 52, 286, 40, 32
B1760 42.5x42.5 52, 650, 32, 16 52, 650, 32, 16 52, 650, 36, 16
A2104 47.5x47.5 52, 780, 28, 24 52, 780, 28, 24 52, 780, 28, 24
B2104 47.5x47.5 52, 650, 32, 32 52, 650, 32, 32 52, 650, 40, 36 52, 650, 40, 36 52, 650, 40, 36
C2104 47.5x47.5 52, 364, 32, 32 52, 364, 40, 40 52, 364, 52, 52 52, 364, 52, 52
B2377 50x50 52, 1248, 36, 0
A2577 52.5x52.5 0, 448, 60, 60
A2892 55x55 52, 1404, 48, 0
Notes:
1. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
2. See UG575, UltraScale Architecture Packaging and Pinouts User Guide for more information.
• 16nm UltraScale+ Update
New & Enhanced UltraScale+™ Capabilities
DDR4
Tuned Process for Optimal Performance/Watt
Optimal Operating Voltage Selection
Normalized Fabric Performance   1.0x   1.2x   1.6x   1.2x
Normalized Total Power          1.0x   0.7x   0.8x   0.5x
Performance/Watt                1.0x   1.7x   2x     2.4x
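The performance/watt row is simply the normalized performance divided by the normalized power for each column. A minimal sketch using the four columns of the table (the column identities are not recoverable from the transcript, so they are indexed by position only):

```python
# Normalized values copied from the table, one entry per column
fabric_performance = [1.0, 1.2, 1.6, 1.2]
total_power = [1.0, 0.7, 0.8, 0.5]

# Performance per watt is the ratio of the two normalized figures
performance_per_watt = [
    perf / power for perf, power in zip(fabric_performance, total_power)
]

for column, ratio in enumerate(performance_per_watt, start=1):
    print(f"column {column}: {ratio:.1f}x")
```

Rounded to one decimal place, the ratios reproduce the 1.0x, 1.7x, 2x, and 2.4x entries in the table.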
UltraRAM: New Memory Technology
Up to 360Mb to replace external memory for cost, power, performance
UltraRAM Capabilities
UltraRAM vs. Block RAM Comparison (Sub-Set)
Different Capabilities for Different Use Models
Features Block RAM UltraRAM
Density per block 36K/18K 288K
Configurable Port Width -
Asynchronous Clocking -
Built-in FIFO -
ECC
Unused site gating
Sleep mode
Deep-sleep mode (3-clk cycle wake-up time) -
Hardened data output cascading
Hardened data input & address cascade -
Hard cascade across column - deterministic latency -
Optional input cascade/pipelines stages -
Hardened address decoder -
Diagram labels: 72-bit DIN, ADDR
New Integrated PCIe Gen3x16 and Gen4x8 Block
New Features / Benefits
Gen3 x16 (8 Gb/s per lane)
• Performance for today’s high-end systems, e.g., 100G data center
Gen4 x8 (16 Gb/s per lane)
• Enables next generation system topologies
Hardened SR-IOV (4 Physical, 252 Virtual Functions)
• Expanded virtualization for demanding data center applications
Increased Number of Tags
• 256 managed tags and 256 user managed tags
• Enables more outstanding RD requests for greater system performance
New DMA IP
• Complete end-to-end solution
Capable of Multi-100G Ports
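The two link configurations listed above carry the same raw bandwidth, which a short calculation makes explicit. The sketch below takes the lane counts and per-lane rates from the slide; the 128b/130b line-encoding overhead is standard for PCIe Gen3 and Gen4 but is not mentioned on the slide, and the function names are hypothetical.

```python
def raw_link_gbps(lanes, rate_per_lane_gbps):
    """Raw bandwidth of one link direction in Gb/s."""
    return lanes * rate_per_lane_gbps


def payload_link_gbps(lanes, rate_per_lane_gbps):
    """Bandwidth after 128b/130b encoding (an addition not taken from the slide)."""
    return raw_link_gbps(lanes, rate_per_lane_gbps) * 128 / 130


gen3_x16 = raw_link_gbps(16, 8)   # Gen3 x16 at 8 Gb/s per lane
gen4_x8 = raw_link_gbps(8, 16)    # Gen4 x8 at 16 Gb/s per lane

print(gen3_x16, gen4_x8, round(payload_link_gbps(16, 8), 1))
```

Both configurations give 128 Gb/s raw per direction, which is consistent with the slide's "Multi-100G" claim.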
Multi-Node Footprint Migration
20nm 16nm
Leverage system level investment across platforms
Future-proof migration path to 16nm
Virtex® UltraScale+™ FPGAs
Device Name VU3P VU5P VU7P VU9P VU11P VU13P
Logic
System Logic Cells (K) 862 1,314 1,724 2,586 2,822 3,763
CLB Flip-Flops (K) 788 1,201 1,576 2,364 2,580 3,441
CLB LUTs (K) 394 601 788 1,182 1,290 1,720
Memory
Max. Distributed RAM (Mb) 12.0 18.3 24.1 36.1 38.7 51.6
Total Block RAM (Mb) 25.3 36.0 50.6 75.9 70.9 94.5
UltraRAM (Mb) 90.0 132.2 180.0 270.0 270.0 360.0
Clocking Clock Management Tiles (CMTs) 10 20 20 30 12 16
Integrated IP
DSP Slices 2,280 3,474 4,560 6,840 8,928 11,904
PCIe® Gen3 x16 / Gen4 x8 2 4 4 6 3 4
150G Interlaken 3 4 6 9 6 8
100G Ethernet w/ RS-FEC 3 4 6 9 9 12
I/O
Max. Single-Ended HP I/Os 520 832 832 832 624 832
GTY 32.75Gb/s Transceivers 40 80 80 120 96 128
Speed Grades
Extended -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3 -1 -2L -3
Industrial -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2 -1 -1L -2
Footprint(1,2) Dimensions (mm) HP I/O, GTY 32.75Gb/s
Footprint Compatible with 20nm UltraScale Devices
C1517 40x40 520, 40
F1924(3) 45x45 624, 64
A2104 47.5x47.5 832, 52 832, 52 832, 52
A2104 52.5x52.5(4) 832, 52
B2104 47.5x47.5 702, 76 702, 76 702, 76 624, 76
B2104 52.5x52.5(4) 702, 76
C2104 47.5x47.5 416, 80 416, 80 416, 104 416, 96
C2104 52.5x52.5(4) 416, 104
A2577 52.5x52.5 448, 120 448, 96 448, 128
Notes:
1. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
2. All packages are 1.0mm ball pitch.
3. GTY transceiver up to 16.3Gb/s. Refer to data sheet for details.
4. These 52.5x52.5mm packages have the same PCB ball footprint as the 47.5x47.5mm packages and are footprint compatible.
Kintex® UltraScale+™ FPGAs
Device Name                       KU3P     KU5P     KU9P     KU11P    KU13P    KU15P
Logic
System Logic Cells (K)            356      475      600      653      747      1,143
CLB Flip-Flops (K)                325      434      548      597      683      1,045
CLB LUTs (K)                      163      217      274      299      341      523
Memory
Max. Distributed RAM (Mb)         4.7      6.1      8.8      9.1      11.3     9.8
Total Block RAM (Mb)              12.7     16.9     32.1     21.1     26.2     34.6
UltraRAM (Mb)                     13.5     18.0     0        22.5     31.5     36.0
Clocking
Clock Management Tiles (CMTs)     4        4        4        8        4        11
Integrated IP
DSP Slices                        1,368    1,824    2,520    2,928    3,528    1,968
PCIe® Gen3 x16 / Gen4 x8          1        1        0        4        0        5
150G Interlaken                   0        0        0        2        0        4
100G Ethernet w/RS-FEC            0        1        0        1        0        4
I/O
Max. Single-Ended HD I/Os         96       96       96       96       96       96
Max. Single-Ended HP I/Os         208      208      208      416      208      572
GTH 16.3Gb/s Transceivers         0        0        28       32       28       44
GTY 32.75Gb/s Transceivers        16(1)    16(1)    0        20       0        32
Speed Grades
Extended                          -1 -2L -3 (all devices)
Industrial                        -1 -1L -2 (all devices)
Packaging
Footprint(2,3) / Dimensions (mm) / HD I/O, HP I/O, GTH 16.3Gb/s, GTY 32.75Gb/s (one entry per device offered in the package)
B784 23x23(4): 96, 208, 0, 16; 96, 208, 0, 16
A676 27x27: 48, 208, 0, 16; 48, 208, 0, 16
B676 27x27: 72, 208, 0, 16; 72, 208, 0, 16
D900 31x31: 96, 208, 0, 16; 96, 208, 0, 16; 96, 312, 16, 0
E900 31x31: 96, 208, 28, 0; 96, 208, 28, 0
A1156 35x35: 48, 416, 28, 0; 48, 468, 28, 0
E1517 40x40: 96, 416, 32, 20; 96, 416, 32, 24
A1760 42.5x42.5: 96, 416, 44, 32
E1760 42.5x42.5: 96, 572, 32, 24
Notes:
1. GTY maximum data rate is limited.
2. Maximum achievable performance is device and package dependent; consult the associated data sheet for details.
3. For full part number details, see the Ordering Information section in DS890, UltraScale Architecture and Product Overview.
4. The B784 package is only offered in 0.8mm ball pitch. All other packages are 1.0mm ball pitch.
• Zynq UltraScale+ EG & EV
The First All Programmable Multiprocessing SoC (MPSoC)
The Right Engines for the Right Tasks
Delivering 64-bit Performance and Terabyte Address Space
Delivering an Extra Node of Value
Zynq® UltraScale+™ System Features
Zynq® UltraScale+™ Block Diagram
Unprecedented System Power Management: Designed with Lower Power Applications In Mind
Zynq® UltraScale+™ Connection Diagram
Application Processing System: ARM Cortex-A53
Feature / Benefit
ARMv8-A architecture, Multicore Cortex-A53 up to 1.5 GHz
• 64-bit increases compute capability while maintaining 32-bit compatibility
• ARM’s most power-efficient A5x APU & most widely used 64-bit processor
• 1 terabyte physical address space
• 2.7X performance/watt (DMIPS) vs. predecessor (processor comparison only)
NEON Technology
• SIMD engine accelerates multimedia, signal & image processing algorithms
Floating-Point Unit (FPU)
• Hardware support for FP operations in half-, single- and double-precision
• IEEE754-2008 compliant (current Floating Point standard)
Hardware Virtualization
• Enables multiple SW environments & apps to access system resources simultaneously
Application Processing Unit (block diagram): 4x ARM Cortex™-A53, each with NEON™, Floating Point Unit, I-Cache w/Parity, and D-Cache w/ECC; SCU; 1MB L2 w/ECC
Diagram axis labels: Performance, Power
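The "1 terabyte physical address space" item corresponds to a 40-bit physical address. The sketch below assumes byte addressing and a 40-bit address width; the width is an inference from the 1 terabyte figure rather than a number given on the slide. It also shows the 4 GiB limit of a 32-bit address for comparison.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


GIB = 1 << 30
TIB = 1 << 40

# 40-bit width is an assumption that matches the slide's "1 terabyte"
space_40_bit = addressable_bytes(40)
space_32_bit = addressable_bytes(32)

print(space_40_bit // TIB, "TiB with 40 address bits")
print(space_32_bit // GIB, "GiB with 32 address bits")
```

The jump from 32 to 40 address bits multiplies the reachable memory by 256.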
Real-Time Processing System: ARM Cortex-R5
Real-Time Processing Unit (block diagram): 2x ARM Cortex™-R5, each with Vector Floating Point Unit, Memory Protection Unit, 128 KB TCM w/ECC, 32 KB I-Cache w/ECC, and 32 KB D-Cache w/ECC; GIC
Feature / Benefit
ARMv7-R Architecture, up to 600MHz
• Flagship ARM series for deterministic processing for critical real-time operation
• Offloads APU to perform compute-intensive tasks, reducing overall system power
• Supports Real-Time Operating Systems (RTOS) or Bare Metal
Dual-Core for Multi-Mode Operation
• Lock-Step Mode for fault tolerance and fault detection, doubles TCM to 256KB
• Split-Mode with each real-time core operating autonomously
128KB Memory with ECC
• Tightly coupled with processor for deterministic and low-latency response
• Ideal for critical code structures such as interrupt service routines
Safety Certifiable
• Industry-proven to meet safety-critical standards
• e.g., IEC 61508 (industrial) and ISO 26262 (automotive)
Lock-Step Configuration
COMPARE
#include <stdio.h>
main ()
{
char *string;
string = "..";
printf("%s", string);
if (m_cust.valid == "F")
{ m_app.status = "Reject";
m_cust.eligible = false;
}
if (m_car.type == "S")
{ m_rent.perDay = 80;
};
if (m_Car….
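The lock-step configuration shown above runs the same code on both real-time cores and compares the results to detect faults. The following is only a software analogy of that idea, not the hardware mechanism: two callables stand in for the cores, and every name (`lockstep`, `LockstepMismatch`, `healthy_core`, `faulty_core`) is hypothetical.

```python
class LockstepMismatch(Exception):
    """Raised when the two redundant executions disagree."""


def lockstep(core_a, core_b, value):
    """Run the same input through both 'cores' and compare the outputs."""
    out_a = core_a(value)
    out_b = core_b(value)
    if out_a != out_b:
        raise LockstepMismatch(f"core A produced {out_a}, core B produced {out_b}")
    return out_a


def healthy_core(x):
    return x * x + 1


def faulty_core(x):
    # Same computation with one result bit flipped, standing in for a fault
    return (x * x + 1) ^ 0x4


print(lockstep(healthy_core, healthy_core, 7))

try:
    lockstep(healthy_core, faulty_core, 7)
except LockstepMismatch as err:
    print("fault detected:", err)
```

The comparison catches the fault but cannot tell which core is wrong, which is why lock-step is described as fault detection rather than correction.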
ARM-Based Graphics ProcessorFeature Benefit
ARM Mali™-400 MP2 up to 667MHz
• Most power-optimized ARM GPU with Full HD support (1080p)
• Ideal for 2D vector graphics and 3D graphics (e.g., HMI, waveform processing)
• Supports open standards, e.g., OpenGL ES 1.1 & 2.0
Native Embedded Linux Support Out-of-the-box drivers and libraries for graphics support
Dual Pixel Processors Up to 1.3 GPix/s (fill rate) and 20 GFLOPS (shader rate)
Optimized Memory Interface Tightly coupled w/memory controller for efficient communication with DisplayPort controller
[Block diagram: ARM Mali™-400 MP2 with one Geometry Processor, two Pixel Processors, Memory Management Units, and 64 KB L2 Cache]
2.5D/3D Visualization
On-Screen Displays
1080p Resolution
Intensive fill rate for smoother transition and frame rate
High performance shaders for complex 3D scenes
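As a rough feel for what the quoted 1.3 GPix/s fill rate means at 1080p, the short calculation below compares it with the pixel rate of a single full-screen layer. The 60 fps refresh rate is an assumption for illustration; the slide gives only the fill rate and the resolution.

```python
FILL_RATE_PIX_PER_S = 1.3e9   # from the slide: up to 1.3 GPix/s
WIDTH, HEIGHT = 1920, 1080    # 1080p
FPS = 60                      # assumed refresh rate, not stated on the slide

# Pixels needed per second to draw every screen pixel exactly once per frame.
pixels_per_second = WIDTH * HEIGHT * FPS

# How many times each screen pixel could be written per frame at peak fill rate.
overdraw_budget = FILL_RATE_PIX_PER_S / pixels_per_second

print(f"1080p at {FPS} fps needs {pixels_per_second / 1e6:.1f} Mpixel/s")
print(f"Peak fill rate covers about {overdraw_budget:.1f} full-screen layers per frame")
```

Under these assumptions the peak figure leaves room for roughly ten full-screen layers per frame, which is the headroom that blended overlays and transitions draw on.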
Integrated H.264 / H.265 Video Codec Engine
Feature Benefit
Integrated Video Codec Unit @ up to 667MHz
• Broad application ranging from surveillance, digital cameras, broadcasting
• Up to 8 simultaneous streams coming from FPGA fabric or Processing System
• Higher display density, faster encoding, and lower power vs. soft implementation
• Up to 4Kx2K (60 fps) or 8Kx4K (15 fps)
Power Management, Performance Monitoring
• Clock gating (dynamic savings), power gating (static/dynamic savings)
• Measure task execution time, bandwidth, and latency for fast design optimization
[Block diagram: Video Codec Unit with Encoder (x4) and Decoder (x2) blocks and a Memory Controller, connected to Camera, Ethernet, and DisplayPort interfaces]
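The two headline modes, 4Kx2K at 60 fps and 8Kx4K at 15 fps, work out to the same pixel throughput, which suggests they describe one throughput budget split in different ways. The sketch below checks the arithmetic. It assumes 4Kx2K means 3840x2160 and 8Kx4K means 7680x4320, and the eight 1080p30 streams are an illustrative assumption (the slide says only "up to 8 simultaneous streams").

```python
def pixel_rate(width, height, fps, streams=1):
    """Pixels per second for `streams` identical video streams."""
    return width * height * fps * streams


modes = {
    "4Kx2K @ 60 fps": pixel_rate(3840, 2160, 60),
    "8Kx4K @ 15 fps": pixel_rate(7680, 4320, 15),
    # Assumed stream format, for illustration only.
    "8 streams of 1080p @ 30 fps": pixel_rate(1920, 1080, 30, streams=8),
}

for name, rate in modes.items():
    print(f"{name}: {rate / 1e6:.1f} Mpixel/s")
```

All three entries come to the same value, about 498 Mpixel/s.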
Platform Management Unit
Dedicated Hardware for Power Management and Safety
Feature Benefit
Power Management
Power Domains & Islands
• ASIC-like, domain- & block-level power control to use only what’s needed when needed
• Eliminate static power of unused blocks
Power Management Framework
• Xilinx-provided library to simplify & customize power control for application requirements
• Systematic power coordination between processing elements for reliable shutdown & resume
Functional Safety & System Management
SW Test Library & Error Handling Xilinx-provided libraries to manage key processing elements & detect errors
Triple-Redundancy Processor Continuous & reliable operation in the event of an error
[Block diagram: Platform Management Unit, containing a Triple Redundant Processor, 32KB ROM, 128KB RAM with ECC, Power Domain Controls, Peripheral & Memory Access, IO Unit & Interrupt Controller, and Wake Signals. The surrounding system is split into the Battery Power Domain (VCC_PSBATT), Low Power Domain, Full Power Domain, and PL Domain; in the example shown, individual A53 cores and other blocks are marked Off or Power Down.]
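The benefit claimed for the triple-redundant processor (continued operation in the event of an error) rests on 2-of-3 majority voting. The sketch below shows the generic technique only; how the PMU actually votes is not described in the slides, and majority_vote and disagreeing_units are hypothetical helper names.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant integer results."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_units(a: int, b: int, c: int) -> list:
    """Indices (0, 1, 2) of the units whose output differs from the vote."""
    voted = majority_vote(a, b, c)
    return [index for index, value in enumerate((a, b, c)) if value != voted]


good = 0b1011_0010
upset = good ^ 0b0000_1000   # model a single bit flip in one unit

voted = majority_vote(good, upset, good)
faulty = disagreeing_units(good, upset, good)
```

With one corrupted unit the vote still returns the correct value, and the index list identifies which unit disagreed.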
UltraScale+™ Programmable Logic
Security, Reliability: Decryption, Anti-Tamper, SEU Resilience
External Memory: DDR4 at 2,666Mb/s
DSP: Floating & Fixed Point Enhanced
Block RAM: Hardened cascading
UltraRAM: Massive Capacity, SRAM replacement
Networking IP: 100G Ethernet, 150G Interlaken
Transceivers: 16G & 28G backplane, 32.75G chip-to-chip
PCI Express®: Gen3 x16, Gen4 x8
I/O Interfacing: High-Density I/O, MIPI D-PHY Support
Embedded Software Development Tools
Feature Benefit
Eclipse-Based IDE Familiar software development environment
Linaro GCC Tool Chain Industry standard compiler tool chain for Embedded Linux & Bare Metal (included in SDK)
Multi-Core Debug Debug & cross triggering for Cortex-A53s, Cortex-R5s, and MicroBlaze™ Processor
Performance Profiling & Analysis Analyze interfaces across processing and programmable logic domains
Ecosystem Development Tools
• Broad support for 3rd party dev tools & debug, e.g., ARM DS-5, Lauterbach Trace-32
• Designers use their preferred development & debug environment
Xilinx Software Development Kit (SDK) for SW Dev and Project, Build, & Tool Chain Management
Reference Designs
Examples of System Topologies to Jump-Start Differentiation
Start System Development Immediately
[Diagram: Example Design (SMP Linux / RPU Split). The Reference Design layer (e.g., Boot Loaders, Firmware, Framework, OSs) is provided by Xilinx and runs SMP Linux on the APU and FreeRTOS on the RPU R5 cores, linked by an Inter-Processor Framework (Message Passing). User Apps in C-Code sit on top.]
Features Details & Benefits
Common System Topologies
• Pre-built & validated
• Enables immediate application development
“Mini-Reference Designs”
• Incrementally build to full system solution, e.g.,
• OS implementation
• ‘Hello World’ for each processor on top of OS
• Processing System & FPGA logic integration
• SDSoC software acceleration
• OpenAMP communication
Available Topologies
SMP Linux / RPU Split
• APU: SMP Linux
• RPU: Baremetal (R5 #1), FreeRTOS (R5 #2)
SMP Linux / RPU Lock-Step
• APU: SMP Linux
• RPU: Baremetal (R5 #1), FreeRTOS (R5 #2)
Hypervisor
• APU: SMP Linux
• RPU: Baremetal (R5 #1), FreeRTOS (R5 #2)
Baremetal
UltraZed-EG SOM
Xilinx Zynq UltraScale+ MPSoC
DDR4 SDRAM (2GB)
QSPI Flash (64MB)
eMMC Flash (8GB)
Gigabit Ethernet PHY
USB 2.0 PHY
PMBus Voltage Regulators
UltraZed-EG SOM Mechanical Dimensions
• Zynq UltraScale+ CG
Different Applications Have Different Processing Needs
Motion Control
Machine Vision
Application Processor x2
Real-Time Processor x2
Real-Time Processor x2
Application Processor x4
Graphics Processor
Video Codec
ISM Applications
Scalable Common Architecture - Feature and cost optimized by application
Zynq® UltraScale+™ MPSoC: CG Devices
Application Processor: 64-bit Dual-Core
Zynq® UltraScale+™ MPSoC: EG & EV Devices
Application Processor: 64-bit Quad-Core
Real-Time Processors: 32-bit Dual-Core
Platform & Power Management: Granular Power Control, Functional Safety
Configuration & Security Unit: Anti-Tamper & Trust, Industry Standards
Fabric Acceleration: Customizable Engines, High Speed Connectivity
Video Codec: 8K4K (15fps), 4K2K (60fps)
High Speed Peripherals: Key Interfaces
Graphics Processor: ARM Mali-400MP2
Memory Subsystem: High Bandwidth, Low Latency
High-End
Mid-Range
Low-End
Extending the Zynq® Portfolio
Dual-core ARM® Cortex™-A9
28nm Artix®-7 FPGA
Dual-core ARM Cortex-A9
28nm Kintex®-7 FPGA
Dual-Core ARM Cortex-R5
Dual-Core ARM Cortex-A53
16nm FinFET+ Logic
Dual-Core ARM Cortex-R5
Quad-Core ARM Cortex-A53
ARM Mali™-400 MP2
16nm FinFET+ Logic
Dual-Core ARM Cortex-R5
Quad-Core ARM Cortex-A53
ARM Mali-400 MP2
H.264/H.265 Video Codec
16nm FinFET+ Logic
Completing the Zynq UltraScale+ MPSoC Portfolio
Seven New CG Devices for Increased Market Reach
EV Devices for Applications Requiring a Video Codec
Extended Range of EG Devices for Greater Flexibility
Dual-Core RPU
Dual-Core APU
Quad-Core APU
Dual-Core RPU
GPU
Quad-Core APU
Dual-Core RPU
GPU
VCU
Processor Scalability to meet diverse market requirements
Zynq UltraScale+ MPSoC Device Migration Table
Zynq® UltraScale+™ MPSoC
Pkg mm CG Devices EG Devices EV Devices
ZU2CG ZU3CG ZU4CG ZU5CG ZU6CG ZU7CG ZU9CG ZU2EG ZU3EG ZU4EG ZU5EG ZU6EG ZU7EG ZU9EG ZU11EG ZU15EG ZU17EG ZU19EG ZU4EV ZU5EV ZU7EV
A484 19 X X X X
A625 21 X X X X
C784 23 X X X X X X X X X
B900 31 X X x X X X X X X
C900 31 X X x X X
B1156 35 X X x X X
C1156 35 x x X X
B1517 40 X X X
F1517 40 x x X X
C1760 42.5 X X X
D1760 42.5 X X
E1924 45 X X
16nm UltraScale+ Is Now In Production
Expanding On Our One Year Lead at 16nm
KU3P, KU5P, KU9P Devices
VU3P Device
ZU2, ZU3, ZU6, ZU9 EG/CG Devices
Roadmap
Where is FPGA / SOC technology taking us, and what is the future?
Bandwidth-Hungry Applications Drive Memory Solutions
Growing bandwidth gap between commodity memory solutions vs. requirements of high-end systems
4K/8K Multi-Pass Video Processing
HPC Analytics & Image Recognition
Network Function Virtualization & Bridging
[Chart: bandwidth vs. year (2008, 2011, 2014, 2017) for Ethernet, Video, DSP Capability, and DDR]
Ethernet Trend: 10G, 40G, 100G, 400G
Video Trend: 1080P, 2K, 4K, 8K
DDR Trend: 2,133 (DDR3), 2,667 (DDR4)
FPGA DSP Trend: 2,000 (40nm), 12,000 (16nm)
A revolutionary increase in memory bandwidth is needed
Obtaining Superior Bandwidth-per-Watt
DDR4 DIMM: Standard commodity memory used in servers and PCs
RLDRAM-3: Low Latency DRAM for packet buffering applications
HMC: Hybrid Memory Cube, serial DRAM
HBM: High Bandwidth Memory, DRAM integrated into the FPGA package

             DDR4 DIMM   RLDRAM-3    HMC        HBM
Bandwidth    21.3 GB/s   12.8 GB/s   160 GB/s   460 GB/s
Depth        16 GB       2 GB        4 GB       8 GB
Cost / GB    $           $$          $$$        $$
PCB Req      High        High        Med        None
pJ / bit     ~27         ~40         ~30        ~7
Latency      Med         Low         High       Med

* Configurations compared: Single DDR4 DIMM, Two x36 RLDRAM-3, Single HMC Device, Single FPGA with HBM
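The bandwidth and pJ/bit figures can be combined into an actual bandwidth-per-watt number. The sketch below is a back-of-the-envelope calculation only: it reads the four data columns as DDR4 DIMM, RLDRAM-3, HMC, and HBM, assumes 1 GB/s = 10^9 bytes/s, and assumes the quoted energy per bit applies at peak bandwidth.

```python
memories = {
    # name: (bandwidth in GB/s, energy in pJ/bit), values as quoted on the slide
    "DDR4 DIMM": (21.3, 27),
    "RLDRAM-3": (12.8, 40),
    "HMC": (160, 30),
    "HBM": (460, 7),
}


def interface_power_w(gb_per_s, pj_per_bit):
    """Power in watts when moving data at the full quoted bandwidth."""
    bits_per_s = gb_per_s * 1e9 * 8
    return bits_per_s * pj_per_bit * 1e-12


def bandwidth_per_watt(gb_per_s, pj_per_bit):
    """GB/s delivered per watt of memory-interface power."""
    return gb_per_s / interface_power_w(gb_per_s, pj_per_bit)


for name, (bw, pj) in memories.items():
    print(f"{name}: {interface_power_w(bw, pj):.1f} W at peak, "
          f"{bandwidth_per_watt(bw, pj):.1f} GB/s per W")

# Raw bandwidth ratio of the HBM figure to the single-DIMM figure.
hbm_vs_ddr4 = memories["HBM"][0] / memories["DDR4 DIMM"][0]
```

Under these assumptions HBM delivers roughly four times the bandwidth per watt of a DDR4 DIMM, and its raw bandwidth is a little over 21 times higher.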
Introducing Virtex UltraScale+ HBM Devices
20X more bandwidth than a DDR4 DIMM
DRAM stacks integrated using SSI Technology
Dedicated hardened interface to the HBM for maximized bandwidth
Built on the proven Virtex UltraScale+ FPGA platform
Memory Controller uses AXI interface for easy integration using Vivado IPI
HBM Gen2 represents the highest DRAM bandwidth available
Hardened Cache Coherent Interconnect (CCIX) Ports
Built Using Proven Assembly Technology
Xilinx pioneered CoWoS (SSI Technology) back at 28nm
– This is the 3rd generation of Xilinx devices using CoWoS (Chip-on-Wafer-on-Substrate)
CoWoS is the lowest risk assembly for Virtex UltraScale+ HBM
CoWoS is the de facto standard assembly for HBM integration
– GPU vendors are already using this assembly
White Paper circa 2012
Virtex® UltraScale+™ HBM FPGAs
Device Name VU31P VU33P VU35P VU37P
Logic
System Logic Cells (K) 970 970 1,915 2,860
CLB Flip-Flops (K) 887 887 1,751 2,615
CLB LUTs (K) 444 444 876 1,308
Memory
Max. Distributed RAM (Mb) 12.5 12.5 24.6 36.7
Total Block RAM (Mb) 23.6 23.6 47.3 70.9
UltraRAM (Mb) 90 90 180 270
HBM DRAM (Gb) 32 64 64 64
HBM AXI Ports 32 32 32 32
Clocking
Clock Management Tiles (CMTs) 4 4 8 12
Integrated IP
DSP Slices 2,880 2,880 5,952 9,024
PCIe® Gen3 x16 / Gen4 x8 4 4 5 6
CCIX Ports(2) 4 4 4 4
150G Interlaken 0 0 2 4
100G Ethernet w/ RS-FEC 2 2 5 8
I/O
Max. Single-Ended HP I/Os 208 208 416 624
GTY 32.75Gb/s Transceivers 32 32 64 96
Speed Grades Extended(1) -1, -2L, -3 -1, -2L, -3 -1, -2L, -3 -1, -2L, -3
Packaging
Footprint(1) Dimensions (mm) HP I/O, GTY 32.75Gb/s
H1924 45x45 208, 32
H2104 47.5x47.5 208, 32 416, 64
H2892 55x55 416, 64 624, 96
Notes:
1. All packages are 1.0mm ball pitch.
2. A CCIX port requires the use of a PCIe Gen3 x16 / Gen4 x8 block
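Two quick sanity checks on the table: converting the HBM capacity from gigabits to gigabytes, and estimating the bandwidth available per AXI port. The 460 GB/s total is the "Single FPGA with HBM" figure quoted earlier in the deck; dividing it evenly across the 32 ports, and applying it to every device, are assumptions made here for illustration.

```python
HBM_BANDWIDTH_GB_S = 460   # figure quoted earlier in the deck for an FPGA with HBM
HBM_AXI_PORTS = 32         # from the table, same for all four devices

# HBM DRAM capacity in gigabits, from the table.
hbm_capacity_gbit = {"VU31P": 32, "VU33P": 64, "VU35P": 64, "VU37P": 64}

# Assumes the total bandwidth is shared evenly by all AXI ports.
per_port_gb_s = HBM_BANDWIDTH_GB_S / HBM_AXI_PORTS

# 8 bits per byte: gigabits to gigabytes.
capacity_gbyte = {device: gbit // 8 for device, gbit in hbm_capacity_gbit.items()}

print(f"About {per_port_gb_s:.1f} GB/s per AXI port")
print(capacity_gbyte)
```

The 64 Gb devices come out at 8 GB, which matches the "Depth 8 GB" entry in the earlier memory comparison.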
56G PAM4 Transceivers Coming to 16nm
“There Is One More Thing…”
CONFIDENCE
56G Test Chip, Jan 2016 (Demo Video)
4th Generation Adaptive RX Equalization
Proven Foundation
Virtex UltraScale+: Swap GTYs for GTMs
Test Chips in Progress
More Details Later This Year
Timed with Optics Availability
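For readers new to PAM4: it carries two bits per symbol on four amplitude levels, so a 56 Gb/s lane runs at about 28 GBaud. The sketch below illustrates the mapping with a Gray-coded level table, a common convention; it is not taken from the slides and does not model the GTM transceiver, FEC, or any line-coding overhead.

```python
# Gray-coded mapping of bit pairs to four amplitude levels (illustrative).
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_encode(bits):
    """Map a flat list of bits to PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate_gbaud(line_rate_gbps, bits_per_symbol=2):
    """Symbol rate needed for a given bit rate, ignoring coding overhead."""
    return line_rate_gbps / bits_per_symbol


symbols = pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])
baud = symbol_rate_gbaud(56)
```

With Gray coding, adjacent levels differ in only one bit, so mistaking a level for its neighbour costs a single bit error.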
The First All Programmable RFSoC
Integrated RF-Class Analog Technology
Full Programmability Across the Analog-Digital Signal Chain
Delivering up to 50-70% Power and Footprint Reduction
Reduced Power, Form Factor, and Design Cycle
[Figure, panels Power, Form Factor, Design Cycle: a discrete design (ADC and DAC devices connected through JESD204 Converter Interface IP and Transceivers, with separate Analog Design, Analog Interface, Digital Design, and System Design steps) compared with an All Programmable RFSoC that integrates the ADCs, DACs, and Processing System (Embedded Design, Digital Design, System Design). Power labels in the figure: 1 Watt, 1.75 Watts, 2.25 Watts.]
[Figure, I/O Timing Closure: Virtex® UltraScale™ VU35P with HBM as an All Programmable Device for a NIC with half the height & length; blocks: Role (IPSec, SSL, Firewall, GZIP, OSV, SHA-1/2), HBM Controller, PCIe/CCIX, 400GE MAC]
Advantages of All Programmable RFSoC
RF Sampling for Platform Flexibility
• RF-design moved to the digital domain for full programmability
• Reduces & minimizes analog signal processing components
Shorter Design Cycle
• Simplified system design with fewer components
• Eliminates JESD204B/C analog interface design
Dramatic System Footprint Reduction
• Eliminates discrete converters
• Enables scalability for increasing channel count
Reduced System Power
• Reduces data converter power
• Eliminates FPGA-to-Analog interface power
Prior Experience with Analog Design & Integration
2012: 28nm Test Chip Designed & Validated
2014: Published Research Results, Integrated ADC & DAC with Virtex-7 FPGA
2016: 16nm FinFET Test Chip Designed & Validated
Fully Integrated Test Chip
12-bit 4 GSPS ADCs
14-bit 6.4 GSPS DACs
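To see why removing the external converter interface matters, it helps to work out the raw data rate of one converter at the quoted resolutions and sample rates. The calculation below is a rough sketch: the lane count assumes JESD204B lanes at 12.5 Gb/s with 8b/10b encoding (10 Gb/s of payload per lane) and ignores sample padding and framing, so a real link budget would differ.

```python
import math


def raw_rate_gbps(bits_per_sample, gsps):
    """Raw converter data rate in Gb/s (resolution x sample rate)."""
    return bits_per_sample * gsps


# Assumed JESD204B lane: 12.5 Gb/s line rate, 8b/10b encoded.
LANE_PAYLOAD_GBPS = 12.5 * 8 / 10


def lanes_needed(rate_gbps):
    """Minimum number of serial lanes for one converter, ignoring framing."""
    return math.ceil(rate_gbps / LANE_PAYLOAD_GBPS)


adc_rate = raw_rate_gbps(12, 4.0)   # 12-bit ADC at 4 GSPS
dac_rate = raw_rate_gbps(14, 6.4)   # 14-bit DAC at 6.4 GSPS

print(f"ADC: {adc_rate:.1f} Gb/s, at least {lanes_needed(adc_rate)} lanes")
print(f"DAC: {dac_rate:.1f} Gb/s, at least {lanes_needed(dac_rate)} lanes")
```

Every one of those lanes costs transceivers, board routing, and interface power in a discrete design, which is the overhead the integrated approach aims to remove.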
Development tools for FPGA / SOC: now and in the future
Vivado Design Suite
High-level Synthesis
Standards-based IP reuse
Fast simulation and HW co-simulation
IP Integrator
Tcl, SDC
[Chart: runtime, ISim vs. Vivado: 3X]
230+ LogiCORE & SmartCore IP
SDSoC: HW Acceleration from C/C++ Applications
Move C/C++ functions to hardware
Full system generation including driver and hardware connectivity
System-level debug and profile
Rapid HW partitioning and exploration
C/C++ Applications
System-level Profiling
Specify Functions for Acceleration
Full System Generation
Performance Estimation
Before SDSoC: HW/SW Partition Exploration
HW-SW partition spec
PS
Application (tool: SDK; language: C/C++)
Driver (tool: SDK, OS Tools; language: C)
PL
Datamover, PS-PL interface (tool: IP Integrator; IPI project)
IP (tool: Vivado, HLS; language: Verilog, VHDL)
Met Req?
Involves Multiple Disciplines to Explore Architecture
SDSoC: Full-system Generation from Exploration
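[Diagram: starting from the C/C++ application, select functions for PL (Func1(); Func2(); Func3();). SDSoC then generates the PS side (Application, Driver) and the PL side (IP, Datamover, PS-PL interface). The loop repeats until "Met Req?" is satisfied.]
C/C++ Applications to System in hours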
SDSoC: Embedded C/C++ Applications Programming Experience
C/C++ Development
Easy to use Eclipse IDE
One click to accelerate functions in Programmable Logic (PL)
Optimized libraries
– Xilinx, ARM and Partners
– DSP, Video, fixed point, linear algebra, BLAS, OpenCV
Support for Linux, FreeRTOS and baremetal
– Additional OS support in future releases
SDSoC: System Level Profiling
Rapid system performance estimation
– Full system estimation (programmable logic, data communication, processing system)
– Reports SW/HW cycle level performance and hardware utilization
Automated performance measurement
– Runtime measurement by instrumentation of cache, memory, and bus utilization
SDSoC: Full System Optimizing Compiler
Rapid software configurable application acceleration using C/C++
– Automated function acceleration in programmable logic
– Up to 100X increase in performance vs. software
– System optimized for latency, bandwidth, and hardware utilization
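An "up to 100X" figure for an accelerated function translates into a smaller gain for the whole application, depending on how much of the runtime that function accounts for. The sketch below applies Amdahl's law to illustrate this. The runtime fractions are hypothetical, and the model ignores data-movement overhead between PS and PL.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only part is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Hypothetical shares of total runtime spent in the accelerated function.
for fraction in (0.5, 0.9, 0.99):
    gain = overall_speedup(fraction, kernel_speedup=100)
    print(f"{fraction:.0%} of runtime accelerated 100X: {gain:.1f}X overall")
```

This is why the profiling step matters: moving a function to programmable logic pays off mainly when that function dominates the runtime.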
Machine learning uses exposure to data to learn, rather than programmed rules.
Multi-layer neural networks are used to develop intelligent systems.
CNNs (Convolutional Neural Networks) are used for image detection.
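The core operation of a CNN layer is a small kernel slid across the image. A minimal pure-Python sketch is shown below, using a hand-written vertical-edge kernel; in a trained network the kernel weights are learned from data rather than written by hand. The function name conv2d_valid is illustrative and not taken from any framework.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and return the feature map.

    As in most CNN frameworks, this is cross-correlation: the kernel is
    applied without being flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Tiny image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, vertical_edge)
```

The feature map is non-zero only where the window straddles the dark-to-bright boundary, which is how a convolutional layer localizes a feature.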
For deployment you always need 3 things!
• Framework: free & open-source SW environment used to train and optimize your network model
Frameworks
Libraries and Tools
Development Kits
DNN
CNN
GoogLeNet
SSD
FCN …
reVISION: Enabling Software Defined Development Flow
C/C++/OpenCL Creation
Profiling to Identify Bottlenecks
System Optimizing Compiler
Computer Vision
Machine Learning
Scheduling of Pre-Optimized Neural Network Layers
Optimized Accelerators & Data Motion Network
.prototxt & Trained Weights
DNN
CNN
GoogLeNet
SSD
FCN …