
Beyond CMOS Computing: The Interconnect Challenge

Irene M. Qualters

With significant contributions by S. Basu and Y. Solihin

National Science Foundation

November 29, 2017


In the beginning… John von Neumann (1945)

The First Draft of a Report on the EDVAC, Sec. 12: Capacity of the Memory M. General Principles

It will be seen that it is desirable to have a capacity of minor cycles which is a power of two. This makes the choices of 8,000 or 2,000 minor cycles of a convenient approximate size: They lie very near to powers of two. We consider accordingly these two total memory capacities: 8,192 = 2^13 or 2,048 = 2^11 minor cycles, i.e. 262,144 = 2^18 or 65,536 = 2^16 units. For the purposes of the discussions which follow we will use the first higher estimate…

This result deserves to be noted. It shows in a most striking way where the real difficulty, the main bottleneck, of an automatic very high speed computing device lies: At the memory.
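The capacity arithmetic in the quoted passage can be checked with a short sketch. The function names `is_power_of_two` and `describe` are hypothetical helpers introduced here for illustration. The ratio of units to minor cycles is derived from the quoted figures rather than stated on the slide.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


def describe(minor_cycles: int, units: int) -> tuple[int, int, int]:
    """Return (log2 of minor_cycles, log2 of units, units per minor cycle)."""
    if not (is_power_of_two(minor_cycles) and is_power_of_two(units)):
        raise ValueError("both capacities must be powers of two")
    return (minor_cycles.bit_length() - 1,
            units.bit_length() - 1,
            units // minor_cycles)


if __name__ == "__main__":
    # The two memory capacities considered in the quoted passage.
    for minor_cycles, units in [(8192, 262_144), (2048, 65_536)]:
        print(minor_cycles, units, describe(minor_cycles, units))
```

Both capacities give the same ratio of units per minor cycle, which is what makes the two pairs of figures consistent with each other.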


https://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC

Evolution of transistors and memory

Photo credits: Fairchild Camera and Instrument Corporation, Intel Corporation (Note that images are not to scale), Nvidia Corporation

Magnetic core memory; 256-bit bipolar Fairchild SRAM (Illiac IV)

SanDisk SSD module for IBM; DEC VAX memory with Intel 1103 DRAM

Photo credits: http://www.computerhistory.org/timeline/memory-storage/ , Micron

Micron NVDIMM-N, 32 GB DRAM / 64 GB NAND flash


http://www.computerhistory.org/siliconengine/timeline/


http://www.computerhistory.org/timeline/memory-storage/

SRC ‘Emerging Interconnect Technologies’

Workshop Summary Report

Workshop Date: June 18, 2012

Location: Stanford University

Organized by the SRC’s Interconnect & Packaging Sciences Area

Co-Leaders: Rod Augur (GLOBALFOUNDRIES), James Clarke (Intel), Jon Candelaria (SRC)


Interconnect technology identified as largest single factor limiting both performance and power as well as reliability

Credit: N. Magen (Intel), et al; SLIP Workshop 2004 (from A. Naeemi, Georgia Tech)


Interconnect Performance, Reliability, and Cost Trends Rapidly Deteriorating – Design space is disappearing

Credit: O. Aubel (GLOBALFOUNDRIES) (from Rod Augur; SRC Emerging Interconnect Technology Workshop, 2013)

Credit: James Clarke (Intel); SRC Emerging Interconnect Technology Workshop, 2013

Total Cross-Sectional Resistance (TCR) scaling

Credit: James Clarke (Intel); SRC Emerging Interconnect Technology Workshop, 2012

Interconnect scaling limiting circuit performance
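The resistance trend behind these slides can be illustrated with the textbook relation R = ρL / (W·H). The sketch below is a lower-bound illustration only: it assumes a constant bulk copper resistivity, whereas real nanoscale wires see higher effective resistivity from surface and grain boundary scattering. The wire dimensions and the function name `wire_resistance` are hypothetical.

```python
RHO_CU_BULK = 1.68e-8  # ohm*m, bulk copper near room temperature (assumed constant)


def wire_resistance(length_m: float, width_m: float, height_m: float,
                    rho: float = RHO_CU_BULK) -> float:
    """Resistance of a rectangular wire: R = rho * L / (W * H)."""
    return rho * length_m / (width_m * height_m)


if __name__ == "__main__":
    length = 1e-3  # hypothetical 1 mm wire
    # Halving both cross-sectional dimensions quarters the area.
    for width, height in [(40e-9, 80e-9), (20e-9, 40e-9)]:
        r = wire_resistance(length, width, height)
        print(f"{width * 1e9:.0f} nm x {height * 1e9:.0f} nm: {r / 1e3:.1f} kOhm")
```

Even with resistivity held fixed, shrinking width and height by half raises the resistance of a wire of the same length by a factor of four.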


Themes….

• Resistance of Metallic Nanowires

• Nanowire Interconnects

• Plasmonics and Nanophotonics for Chip-scale Information Processing and Transport

• Carbon-Based Interconnects….reducing line resistance

• Quantum Metallics…resistivity engineering: phonon scattering, grain boundary scattering, sidewall scattering

• Inverse Design of Dielectric Materials…electrical reliability and packaging reliability


Prioritized Research Gaps and Recommendations: (not in any particular order)

➢ Conductor materials research that aims to minimize or eliminate the effects of grain boundary scattering

➢ Research aimed at discovering new conductor materials that are either self-cladding or do not require (diffusion) barriers

➢ Conductor materials research that is aimed at significantly surpassing the electromigration and stress voiding lifetime of copper

➢ Research to identify conductor materials that exhibit ‘specular surface scattering’, including research that comprehends the potential synergies between nanowire research and phonon engineering

➢ Novel processes and structures research that discovers alternatives to conventional dual-damascene scaling methods, including directed self-assembly (i.e., ‘bottom-up’) or other similar approaches which avoid patterned (i.e., ‘top-down’) issues such as edge damage, misalignment challenges, etc.

➢ Novel dielectric materials research

➢ Techniques that minimize or eliminate the edge effects that cause increased resistance

➢ Optimization and integration of a substrate material that optimizes the resistance and physical characteristics of graphene nanoribbons (e.g., high-k/low-k hybrid substrates, charge screening effect incorporation, etc.)

➢ Methods to achieve 100% metallic CNTs and/or post-processing techniques for conversion to this state

➢ Methods to improve the scalable placement and routing of connected CNT interconnect systems

➢ Methods to repeatably lower the contact resistance for CNTs and GNRs

➢ Novel circuit and system architectures that maximize the ‘design window’ for interconnect systems considering performance, energy, cost, and reliability criteria


Workshop on Emerging Technologies for Interconnects: Final Report

David H. Albonesi, Avinash Kodi, Vladimir Stojanovic, Editors

July 15, 2013

Contributors: David Albonesi, Alyssa Apsel, Rajeev Balasubramonian, Keren Bergman, Shekhar Borkar, Pradip Bose, Luca Carloni, Patrick Chiang, Shaya Fainman, Ron Ho, Avinash Kodi, Michal Lipson, Jose Martinez, Li-Shiuan Peh, Milos Popovic, Rajeev Ram, Vladimir Stojanovic.


Working Group Leaders:

Electrical Interconnect Architecture: Shekhar Borkar, Pradip Bose

Photonic Interconnect Architecture: Keren Bergman, Norm Jouppi

Crosscutting Tools: Luca Carloni, Shaya Fainman

From the 1970s to 2004, Moore’s Law scaling drove performance improvements


Communication begins to dominate the increasingly constrained power budget

Figure 2: Comparison of the energy scaling trends for computation and global communication. At 45nm, the energy for each was roughly equal. At 7nm, the energy for global communication is projected to be 3.75X that used for computation. [Source: Shekhar Borkar]
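To put the caption's ratios in budget terms, the sketch below converts a communication-to-computation energy ratio into the share of energy spent on global communication. It assumes, for illustration only, that the budget consists of just these two components. The function name `communication_share` is hypothetical.

```python
def communication_share(comm_to_compute_ratio: float) -> float:
    """Fraction of (computation + global communication) energy spent on communication."""
    if comm_to_compute_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return comm_to_compute_ratio / (1.0 + comm_to_compute_ratio)


if __name__ == "__main__":
    # Ratios taken from the Figure 2 caption: roughly equal at 45nm, 3.75X at 7nm.
    for node, ratio in [("45nm", 1.0), ("7nm", 3.75)]:
        print(f"{node}: {communication_share(ratio):.1%} of energy on global communication")
```

Under this two-component assumption, the share rises from half of the energy at 45nm to roughly four fifths at 7nm.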

Processor-memory performance gap continues to increase


Interconnect Research Challenges

Application Drivers:

• Big Data: e.g. astrophysics; genomics; medical imaging

• Graph Processing: unstructured data

• Streaming: face recognition, UAV

Credit: Y. Hezaveh, Stanford U: ALMA(NRAO/ESO/NAO); NASA/ESA/Hubble

• System Architecture: Network-on-chip (NoC) architecture, resilience

• Electronics Microarchitecture, Circuits, and Devices: Packaging, power, performance, density, reliability, heterogeneity

• Nanophotonics, Microarchitecture, Circuits and Devices: Technologies, topologies

• Cross-cutting tools – integrated design environment: System modeling, physical design models, interdisciplinary


Summary of Recommendations: “No Magic Bullet”

1. A holistic research approach should be supported that spans the entire stack from devices to algorithms/applications.

2. Research is needed in NoC architectures for many-core systems, including work in hierarchical, heterogeneous systems employing mixed technologies, network switching approaches, and 3D integration.

3. The memory sub-system bottleneck must be addressed through improving the efficiency of memory data movement throughout the system...

4. Interconnect network resilience is a significant pending concern, and requires efforts in understanding the defect and failure modes of the components of emerging interconnect technologies, and the development of cross-layer mitigation approaches that are cost effective and energy efficient.

5. Significant work is needed in developing new electrical solutions in the areas of novel circuits, new materials, devices/circuits/architecture co-design, packaging, and power delivery and management.

6. Nanophotonic networks hold great potential and research is required at multiple levels, including novel devices and circuits, network topologies, easing parallel system programmability, power management, and improving resiliency.

7. Significant work on crosscutting modeling tools should be supported, including cost performance modeling tools and behavioral/cross-domain simulation tools that accelerate design and integration.

8. The creation of industry-relevant photonic technology should be supported by strengthening device and platform fabrication opportunities and forming a national center for nanophotonic platform fabrication.

9. Joint programs should be created with industry (perhaps with SRC mediation) to explore photonic integration opportunities in the definition of processes beyond 22 nm and special customizations at older generations.

“Processor chips since around 2000 are power, not area limited. All of the power is spent moving data around. It is important to optimize the entire interconnect system – the wire, the circuit, and the NoC together – not just each of the three in isolation.”

B. Dally (NVIDIA), 2012 (also quoting C. Moore (AMD), 2011)

Architecture 2030 Workshop @ISCA 2016

Luis Ceze, Mark D. Hill, Thomas F. Wenisch


Opportunities identified by ARCH2030


TPU Block Diagram

Source: https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu

Opportunity: Accelerate tools and frameworks to democratize specialized hardware design

Source: NVIDIA

Volta Features Blazing Fast 16Gbps GDDR6 Memory



To date, specialized hardware has been viable only for huge markets

• NRE costs and time are prohibitive

• New tools: efficient simulation and architecture exploration support for heterogeneous architectures and emerging technologies

• New interfaces: programmable fabrics and better abstractions, e.g. synthesizable IP blocks, chiplets integrated at manufacture

• Research readiness is high: 175 papers in 2016 major architecture conferences; GPUs and application-specific accelerators; machine learning


Center for Domain-Specific Computing (CDSC)

Expeditions award (0853165), InTrans award (1436827)

Faculty: Reinman (UCLA), Palsberg (UCLA), Sadayappan (Ohio-State), Sarkar (Associate Director, Rice), Vese (UCLA), Potkonjak (UCLA), Aberle (UCLA), Baraniuk (Rice), Bui (UCLA), Cong (Director, UCLA), Cheng (UCSB), Chang (UCLA)

• A diversified faculty team: 8 in Comp. Sc. & Eng; 1 in EE; 2 in medical school; 1 in applied math

• 15-20 postdocs and graduate students in four universities – UCLA, Rice, Ohio-State, and UC Santa Barbara

www.cdsc.ucla.edu

Customizable Heterogeneous Platform (CHP)

[Figure: CHP block diagram. Labels: caches ($), fixed cores, custom cores, programmable fabric, DRAM, I/O, CHP. Legend: reconfigurable RF-I bus, reconfigurable optical bus, transceiver/receiver, optical interface.]

Overview of the Customizable Heterogeneous Platform (CHP)

[Figure: CHP design flow. Labels: domain-specific modeling (healthcare applications); CHP creation (customizable computing engines, customizable interconnects); architecture modeling; customization setting; CHP mapping (source-to-source CHP mapper, reconfiguring & optimizing backend, adaptive runtime). Design once, invoke many times.]

A general methodology for designing “supercomputer-in-a-box” with 100X performance/power improvement


The First Innovation Transitions (InTrans) Award from Intel + NSF

Opportunity: Adopt “cloud” abstraction for cross-layer architectural innovation

• Virtualization: transparent introduction of hardware and software innovations; focused optimization

• Scaling benefits of specialized architectures across markets: Microsoft “Catapult”; Cavium “ThunderX”; Google “Tensor Processing Unit”


Source: Microsoft

Opportunity: Going Vertical

3D integration: new dimension of scalability; heterogeneous manufacturing technologies -> energy efficiency, higher bandwidth, lower latency

• Resurgence of 1990s interest in “near-data computing” and “processing-in-memory” architectures

• Rethinking memory and storage hierarchies (e.g. persistent data objects)?

• Manufacturability challenges: high reliability and yield
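One way to see why a third dimension can lower latency is to compare average hop counts in mesh networks with the same number of nodes. The sketch below is a toy model, not a result from the cited workshops: it assumes dimension-ordered routing on a regular mesh, uniform random traffic, and vertical links that cost the same as horizontal ones. The function name `mesh_avg_hops` and the mesh sizes are hypothetical.

```python
from itertools import product


def mesh_avg_hops(dims: tuple[int, ...]) -> float:
    """Average Manhattan distance over all ordered node pairs (self-pairs included)."""
    nodes = list(product(*(range(d) for d in dims)))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return total / len(nodes) ** 2


if __name__ == "__main__":
    # Both layouts hold 64 nodes: a planar 8x8 mesh and a stacked 4x4x4 mesh.
    for dims in [(8, 8), (4, 4, 4)]:
        print(dims, mesh_avg_hops(dims))
```

In this idealized model the stacked layout shortens the average path; real 3D stacks differ in vertical link cost, thermal limits, and yield, which the sketch ignores.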


3D packaging….Nonvolatile memory….Carbon Nanotubes/Graphene…

Credit: Intel

Instead of relying on silicon-based devices, researchers at Stanford University and MIT have built a new chip that uses carbon nanotubes and resistive random-access memory (RRAM) cells. The two are built vertically over one another, making a new, dense 3-D computer architecture with interleaving layers of logic and memory. This work was funded by DARPA, NSF, SRC, STARnet SONIC, and member companies of the Stanford SystemX Alliance.

Credit: MIT


New 3-D chip combines computing and data storage

Credit: MIT, July 5, 2017


The results are published today in the journal Nature, by lead author Max Shulaker, an assistant professor of electrical engineering and computer science at MIT. Shulaker began the work as a PhD student alongside H.-S. Philip Wong and his advisor Subhasish Mitra, professors of electrical engineering and computer science at Stanford. The team also included professors Roger Howe and Krishna Saraswat, also from Stanford.

• “Compatible with today’s silicon infrastructure, both in terms of fabrication and design,” says Howe.

• “The key in this work is that carbon nanotube circuits and RRAM memory can be fabricated at much lower temperatures, below 200 C.”

• “…researchers took advantage of the ability of carbon nanotubes to also act as sensors…over 1 million carbon nanotube-based sensors, used to detect and classify ambient gases”

• “In addition to improved devices, 3-D integration can address another key consideration in systems: the interconnects within and between chips,” Saraswat adds.

Evolution of Research: Nano-engineered Computing Systems Technology (N3XT)

Source: Mohamed M. Sabry Aly et al., “Energy-Efficient Abundant-Data Computing: The N3XT 1,000x”, IEEE Computer, vol. 48, no. 12, 2015, pp. 24-33

Opportunity: Architectures closer to Physics

• Emerging data choices beyond traditional memory/storage hierarchy with fundamentally different cost, density, latency, throughput, reliability and endurance tradeoffs.

• Carbon nanotubes

• Quantum computing and superconducting logic

• Biological substrates, DNA computing/archival storage/self assembly, biomolecules such as proteins for computing


“DNA-based storage has the potential to be the ultimate archival storage solution: it is extremely dense and durable. While this is not practical yet due to the current state of DNA synthesis and sequencing, both technologies are improving at an exponential rate with advances in the biotechnology industry. Given the impending limits of silicon technology, we believe that hybrid silicon and biochemical systems are worth serious consideration: time is ripe for computer architects to consider incorporating biomolecules as an integral part of computer design. DNA-based storage is one clear example of this direction.” *

* A DNA-Based Archival Storage System. James Bornholt, Randolph Lopez, Douglas M. Carmean, Luis Ceze, Georg Seelig, and Karin Strauss. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016

http://homes.cs.washington.edu/~bornholt/papers/dnastorage-asplos16.pdf
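As a rough illustration of how digital data can map onto DNA, the sketch below packs two bits into each nucleotide. This naive mapping is an assumption made here for illustration and is not the encoding used in the cited paper; practical schemes add constraints and redundancy to cope with synthesis and sequencing errors. The names `BASES`, `encode`, and `decode` are hypothetical.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"EDVAC")
    print(strand, decode(strand))
```

At two bits per base, each byte occupies four nucleotides, which hints at the density argument in the quotation above.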

Opportunity: Machine Learning Workloads

• Large-scale model training

• Fast turnaround for new models

Source: Yan Solihin/Internet

Observations

• Identify application drivers/space to drive architecture

• 3D integration with heterogeneous technologies is a powerful and necessary bridging capability to the future

• Cross-layer design tools, abstractions, and multidisciplinary teams are needed to promote innovation


References

• von Neumann, John (1945), First Draft of a Report on the EDVAC (PDF), retrieved August 24, 2011: https://sites.google.com/site/michaeldgodfrey/vonneumann/vnedvac.pdf?attredirects=0&d=1 (author: https://en.wikipedia.org/wiki/John_von_Neumann)

• SRC ‘Emerging Interconnect Technologies’ Workshop Summary Report

• ‘Emerging Interconnect Technologies’ by Prof. K. Saraswat/Stanford: https://linxconferences.com/wp-content/uploads/2016/04/01-01-Sarawsat-Emerging-Interconnect-Technologies-for-Nanoelectronics-1.pdf

• Workshop on Emerging Technologies for Interconnects Final Report: http://weti.cs.ohiou.edu/WETI_Report.pdf

• New 3-D chip combines computing and data storage: http://news.mit.edu/2017/new-3-d-chip-combines-computing-and-data-storage-0705

• 21st Century Computer Architecture: https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/21stcenturyarchitecturewhitepaper.pdf

• Energy-Efficient Abundant-Data Computing: The N3XT 1,000x: http://ieeexplore.ieee.org/document/7368008/

• A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Catapult_ISCA_2014.pdf
