Beyond CMOS Computing: The Interconnect Challenge
Irene M. Qualters
With significant contributions by S. Basu and Y. Solihin
National Science Foundation
November 29, 2017
In the beginning… John von Neumann (1945)
The First Draft of a Report on the EDVAC
Sec. 12 Capacity of the Memory M. General Principles
It will be seen that it is desirable to have a capacity of minor cycles which is a power of two. This makes the choices of 8,000 or 2,000 minor cycles of a convenient approximate size: They lie very near to powers of two. We consider accordingly these two total memory capacities: 8,192 = 2^13 or 2,048 = 2^11 minor cycles, i.e. 262,144 = 2^18 or 65,536 = 2^16 units. For the purposes of the discussions which follow we will use the first higher estimate…
This result deserves to be noted. It shows in a most striking way where the real difficulty, the main bottleneck, of an automatic very high speed computing device lies: At the memory.
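The capacity arithmetic in the quoted passage can be checked with a short sketch. It assumes 32 units per minor cycle, a figure inferred here from the ratio of the quoted unit and minor-cycle totals rather than stated on the slide; the function and constant names are illustrative.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

# Assumption: 32 units per minor cycle, inferred from the quoted totals.
UNITS_PER_MINOR_CYCLE = 32

def memory_capacities(exponents=(13, 11)):
    """Map each exponent e to (minor cycles, total units) for 2**e minor cycles."""
    result = {}
    for e in exponents:
        minor_cycles = 2 ** e
        result[e] = (minor_cycles, minor_cycles * UNITS_PER_MINOR_CYCLE)
    return result

capacities = memory_capacities()
for e, (cycles, units) in capacities.items():
    print(f"2^{e} = {cycles:,} minor cycles -> {units:,} units")
```

Both unit totals are themselves powers of two, which is why the round figures of 8,000 and 2,000 minor cycles are described as convenient approximations.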
Evolution of transistors and memory
Photo credits: Fairchild Camera and Instrument Corporation, Intel Corporation (Note that images are not to scale), Nvidia Corporation
Magnetic core memory
256-bit bipolar Fairchild SRAM, Illiac IV
SanDisk SSD module for IBM
DEC VAX memory with Intel 1103 DRAM
Photo credits: http://www.computerhistory.org/timeline/memory-storage/ , Micron
Micron NVDIMM-N: 32 GB DRAM / 64 GB NAND flash
SRC ‘Emerging Interconnect Technologies’
Workshop Summary Report
Workshop Date: June 18, 2012
Location: Stanford University
Organized by the SRC’s Interconnect & Packaging Sciences Area
Co-Leaders: Rod Augur (GLOBALFOUNDRIES), James Clarke (Intel), Jon Candelaria (SRC)
Interconnect technology identified as the largest single factor limiting performance, power, and reliability
Credit: N. Magen (Intel), et al; SLIP Workshop 2004 (from A. Naeemi, Georgia Tech)
Interconnect Performance, Reliability, and Cost Trends Rapidly Deteriorating – Design space is disappearing
Credit: O. Aubel (GLOBALFOUNDRIES) (from Rod Augur; SRC Emerging Interconnect Technology Workshop, 2013)
Credit: James Clarke (Intel); SRC Emerging Interconnect Technology Workshop, 2013
Total Cross-Sectional Resistance (TCR) scaling
Credit: James Clarke (Intel); SRC Emerging Interconnect Technology Workshop, 2012
Interconnect scaling limiting circuit performance
Themes…
• Resistance of Metallic Nanowires
• Nanowire Interconnects
• Plasmonics and Nanophotonics for Chip-scale Information Processing and Transport
• Carbon-Based Interconnects… reducing line resistance
• Quantum Metallics… resistivity engineering: phonon scattering, grain boundary scattering, sidewall scattering
• Inverse Design of Dielectric Materials… electrical reliability and packaging reliability
Prioritized Research Gaps and Recommendations: (not in any particular order)
➢ Conductor materials research that aims to minimize or eliminate the effects of grain boundary scattering
➢ Research aimed at discovering new conductor materials that are either self-cladding or do not require (diffusion) barriers
➢ Conductor materials research that is aimed at significantly surpassing the electromigration and stress voiding lifetime of copper
➢ Research to identify conductor materials that exhibit ‘specular surface scattering’, including research that comprehends the potential synergies between nanowire research and phonon engineering
➢ Novel processes and structures research that discovers alternatives to conventional dual-damascene scaling methods, including directed self-assembly (i.e., ‘bottom-up’) or other similar approaches which avoid patterned (i.e., ‘top-down’) issues such as edge damage, misalignment challenges, etc.
➢ Novel dielectric materials research
➢ Techniques that minimize or eliminate the edge effects that cause increased resistance
➢ Optimization and integration of a substrate material that optimizes the resistance and physical characteristics of graphene nanoribbons (e.g., high-k/low-k hybrid substrates, charge screening effect incorporation, etc.)
➢ Methods to achieve 100% metallic CNTs and/or post-processing techniques for conversion to this state
➢ Methods to improve the scalable placement and routing of connected CNT interconnect systems
➢ Methods to repeatably lower the contact resistance for CNTs and GNRs
➢ Novel circuit and system architectures that maximize the ‘design window’ for interconnect systems considering performance, energy, cost, and reliability criteria
Workshop on Emerging Technologies for Interconnects
Final Report
David H. Albonesi, Avinash Kodi, Vladimir Stojanovic, Editors
July 15, 2013
Contributors: David Albonesi, Alyssa Apsel, Rajeev Balasubramonian, Keren Bergman, Shekhar Borkar, Pradip Bose, Luca Carloni, Patrick Chiang, Shaya Fainman, Ron Ho, Avinash Kodi, Michal Lipson, Jose Martinez, Li-Shiuan Peh, Milos Popovic, Rajeev Ram, Vladimir Stojanovic.
Working Group Leaders:
Electrical Interconnect Architecture: Shekhar Borkar, Pradip Bose
Photonic Interconnect Architecture: Keren Bergman, Norm Jouppi
Crosscutting Tools: Luca Carloni, Shaya Fainman
From the 1970s to 2004, Moore’s Law scaling drove performance improvements
Communication begins to dominate the increasingly constrained power budget
Figure 2: Comparison of the energy scaling trends for computation and global communication. At 45nm, the energy for each was roughly equal. At 7nm, the energy for global communication is projected to be 3.75X that used for computation. [Source: Shekhar Borkar]
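To make the ratio in the caption concrete, the sketch below converts it into the share of a combined energy budget that goes to global communication. It assumes computation energy is normalized to 1.0 and that one unit of communication accompanies one unit of computation; that pairing and the function name are illustrative assumptions, not part of the figure.

```python
def communication_share(comm_to_compute_ratio: float) -> float:
    """Fraction of (compute + communication) energy spent on communication.

    Assumes compute energy is normalized to 1.0, so communication
    energy equals the given ratio.
    """
    if comm_to_compute_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return comm_to_compute_ratio / (1.0 + comm_to_compute_ratio)

# Ratios taken from the figure caption: roughly equal at 45nm, 3.75X at 7nm.
share_45nm = communication_share(1.0)
share_7nm = communication_share(3.75)
print(f"45nm: {share_45nm:.0%} of combined energy on communication")
print(f"7nm:  {share_7nm:.0%} of combined energy on communication")
```

Under these assumptions, communication grows from about half of the combined energy at 45nm to roughly 79% at 7nm.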
Processor-memory performance gap continues to increase
Interconnect Research Challenges
Application Drivers:
• Big Data: e.g. astrophysics; genomics; medical imaging
• Graph Processing: unstructured data
• Streaming: face recognition, UAV
Credit: Y. Hezaveh, Stanford U: ALMA(NRAO/ESO/NAO); NASA/ESA/Hubble
• System Architecture: Network-on-chip (NoC) architecture, resilience
• Electronics Microarchitecture, Circuits, and Devices: Packaging, power, performance, density, reliability, heterogeneity
• Nanophotonics, Microarchitecture, Circuits and Devices: Technologies, topologies
• Cross-cutting tools – integrated design environment: System modeling, physical design models, interdisciplinary
Summary of Recommendations: “No Magic Bullet”
1. A holistic research approach should be supported that spans the entire stack from devices to algorithms/applications.
2. Research is needed in NoC architectures for many-core systems, including work in hierarchical, heterogeneous systems employing mixed technologies, network switching approaches, and 3D integration.
3. The memory sub-system bottleneck must be addressed through improving the efficiency of memory data movement throughout the system...
4. Interconnect network resilience is a significant pending concern, and requires efforts in understanding the defect and failure modes of the components of emerging interconnect technologies, and the development of cross-layer mitigation approaches that are cost effective and energy efficient.
5. Significant work is needed in developing new electrical solutions in the areas of novel circuits, new materials, devices/circuits/architecture co-design, packaging, and power delivery and management.
6. Nanophotonic networks hold great potential and research is required at multiple levels, including novel devices and circuits, network topologies, easing parallel system programmability, power management, and improving resiliency.
7. Significant work on crosscutting modeling tools should be supported, including cost performance modeling tools and behavioral/cross-domain simulation tools that accelerate design and integration.
8. The creation of industry-relevant photonic technology should be supported by strengthening device and platform fabrication opportunities and forming a national center for nanophotonic platform fabrication.
9. Joint programs should be created with industry (perhaps with SRC mediation) to explore photonic integration opportunities in the definition of processes beyond 22 nm and special customizations at older generations.
“Processor chips since around 2000 are power, not area limited. All of the power is spent moving data around. It is important to optimize the entire interconnect system – the wire, the circuit, and the NoC together – not just each of the three in isolation.”
B. Dally (NVIDIA), 2012 (also quoting C. Moore (AMD), 2011)
Architecture 2030 Workshop @ISCA 2016
Luis Ceze, Mark D. Hill, Thomas F. Wenisch
Opportunities identified by ARCH2030
TPU Block Diagram
Source: https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
Opportunity: Accelerate tools and frameworks to democratize specialized hardware design
Source: NVIDIA
Volta Features Blazing Fast 16Gbps GDDR6 Memory
To date, specialized hardware has been viable only for huge markets
• NRE costs and time are prohibitive
• New tools: efficient simulation and architecture exploration support for heterogeneous architectures and emerging technologies
• New Interfaces: programmable Fabrics/Better Abstractions -> e.g. synthesizable IP blocks, chiplets integrated at manufacture
• Research readiness is high: 175 papers in 2016 major architecture conferences; GPUs and application-specific accelerators; machine learning
Center for Domain-Specific Computing (CDSC)
Expeditions award (0853165), InTrans award (1436827)
Faculty team: Reinman (UCLA), Palsberg (UCLA), Sadayappan (Ohio-State), Sarkar (Associate Dir, Rice), Vese (UCLA), Potkonjak (UCLA), Aberle (UCLA), Baraniuk (Rice), Bui (UCLA), Cong (Director, UCLA), Cheng (UCSB), Chang (UCLA)
• A diversified faculty team: 8 in Comp. Sc. & Eng; 1 in EE; 2 in medical school; 1 in applied math
• 15-20 postdocs and graduate students in four universities – UCLA, Rice, Ohio-State, and UC Santa Barbara
www.cdsc.ucla.edu
Customizable Heterogeneous Platform (CHP)
[Figure: CHP block diagram. Labels: $ (×4), FixedCore (×4), CustomCore (×4), ProgFabric (×4), DRAM (×2), I/O, CHP (×3); reconfigurable RF-I bus; reconfigurable optical bus; transceiver/receiver; optical interface]
Overview of the Customizable Heterogeneous Platform (CHP)
[Figure: CHP design flow. Labels: domain-specific modeling (healthcare applications); CHP creation (customizable computing engines, customizable interconnects); architecture modeling; customization setting; CHP mapping (source-to-source CHP mapper, reconfiguring & optimizing backend, adaptive runtime); design once, invoke many times]
A general methodology for designing “supercomputer-in-a-box” with 100X performance/power improvement
The First Innovation Transitions (InTrans) Award from Intel + NSF
Opportunity: Adopt “cloud” abstraction for cross-layer architectural innovation
• Virtualization: Transparent introduction of hardware and software innovations; focused optimization
• Scaling benefits of specialized architectures across markets: Microsoft “Catapult”; Cavium “ThunderX”; Google “Tensor Processing Unit”
Source: Microsoft
Opportunity: Going Vertical
3D integration: new dimension of scalability; heterogeneous manufacturing technologies -> energy efficiency, higher bandwidth, lower latency
• Resurgence of 90’s interest in “near-data computing” and “processing-in-memory” architectures
• Rethinking memory and storage hierarchies (e.g. persistent data objects)?
• Manufacturability challenges: high reliability and yield
3D packaging… Nonvolatile memory… Carbon Nanotubes/Graphene…
Credit: Intel
Instead of relying on silicon-based devices, researchers at Stanford University and MIT have built a new chip that uses carbon nanotubes and resistive random-access memory (RRAM) cells. The two are built vertically over one another, making a new, dense 3-D computer architecture with interleaving layers of logic and memory. This work was funded by DARPA, NSF, SRC, STARnet SONIC, and member companies of the Stanford SystemX Alliance.
Credit: MIT
New 3-D chip combines computing and data storage
Credit: MIT July 5, 2017
The results are published today in the journal Nature, by lead author Max Shulaker, an assistant professor of electrical engineering and computer science at MIT. Shulaker began the work as a PhD student alongside H.-S. Philip Wong and his advisor Subhasish Mitra, professors of electrical engineering and computer science at Stanford. The team also included professors Roger Howe and Krishna Saraswat, also from Stanford.
• “Compatible with today’s silicon infrastructure, both in terms of fabrication and design” says Howe.
• “The key in this work is that carbon nanotube circuits and RRAM memory can be fabricated at much lower temperatures, below 200 C.”
• “…researchers took advantage of the ability of carbon nanotubes to also act as sensors…over 1 million carbon nanotube-based sensors, used to detect and classify ambient gases”
• “In addition to improved devices, 3-D integration can address another key consideration in systems: the interconnects within and between chips,” Saraswat adds.
Evolution of Research: Nano-engineered Computing Systems Technology (N3XT)
Source: Mohamed M. Sabry Aly et al., “Energy-Efficient Abundant-Data Computing: The N3XT 1,000x”, IEEE Computer, vol. 48, no. 12, 2015, pp. 24-33
Opportunity: Architectures closer to Physics
• Emerging data choices beyond traditional memory/storage hierarchy with fundamentally different cost, density, latency, throughput, reliability and endurance tradeoffs.
• Carbon nanotubes
• Quantum computing and superconducting logic
• Biological substrates, DNA computing/archival storage/self assembly, biomolecules such as proteins for computing
“DNA-based storage has the potential to be the ultimate archival storage solution: it is extremely dense and durable. While this is not practical yet due to the current state of DNA synthesis and sequencing, both technologies are improving at an exponential rate with advances in the biotechnology industry. Given the impending limits of silicon technology, we believe that hybrid silicon and biochemical systems are worth serious consideration: time is ripe for computer architects to consider incorporating biomolecules as an integral part of computer design. DNA-based storage is one clear example of this direction.” *
* A DNA-Based Archival Storage System. James Bornholt, Randolph Lopez, Douglas M. Carmean, Luis Ceze, Georg Seelig, and Karin Strauss. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016
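As a rough illustration of why DNA is dense as a storage medium, the sketch below maps each pair of bits to one of the four nucleotides. This fixed two-bits-per-base mapping is a simplifying assumption for illustration; it is not the encoding used in the cited paper, and it ignores practical constraints such as synthesis and sequencing errors. The table and function names are hypothetical.

```python
# Assumed illustrative mapping: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Encode bytes as a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"NSF")
print(strand, "->", decode(strand))
```

A real system would add redundancy and avoid sequences that are hard to synthesize or read, so the achievable density is lower than this idealized two bits per base.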
Opportunity: Machine Learning Workloads
• Large-scale model training
• Fast turnaround for new models
Source: Yan Solihin/Internet
Observations
• Identify application drivers/space to drive architecture
• 3D integration with heterogeneous technologies is a powerful and necessary bridging capability to the future
• Cross-layer design tools, abstractions, and multidisciplinary teams are needed to promote innovation
References
• von Neumann, John (1945), First Draft of a Report on the EDVAC (PDF), retrieved August 24, 2011
• SRC ‘Emerging Interconnect Technologies’ Workshop Summary Report
• ‘Emerging Interconnect Technologies’ by Prof. K. Saraswat/Stanford
• Workshop on Emerging Technologies for Interconnects Final Report
• New 3-D chip combines computing and data storage
• 21st Century Computer Architecture
• Energy-Efficient Abundant-Data Computing: The N3XT 1,000x
• A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services