Computing Beyond Moore’s Law: Architecture and Device Innovations


Upload: fujitsu-global

Post on 06-Jan-2017


TRANSCRIPT

Page 1: Computing Beyond Moore’s Law: Architecture and Device Innovations

0 Copyright 2016 FUJITSU

Fujitsu Forum 2016

#FujitsuForum

Page 2: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing Beyond Moore’s Law: Architecture and Device Innovations

TAKESHI HORIE Head of Computer Systems Laboratory FUJITSU LABORATORIES LTD.

Page 3: Computing Beyond Moore’s Law: Architecture and Device Innovations


Why computing now?

Data explosion

Data is generated by many IoT devices, and the amount of data is exploding.

Computing creates knowledge and intelligence from data, but traditional computing cannot handle this volume.

End of Moore’s law

For 50 years we have enjoyed device technology scaling, but that is ending.

Fundamentally rethink computing architecture

Page 4: Computing Beyond Moore’s Law: Architecture and Device Innovations


Demand for Computing and Fujitsu Computer Systems

Page 5: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computer performance

Since ENIAC was developed 70 years ago, computer performance has doubled every 1.5 years.

[Figure: computations per second per computer (log scale, 1E+00 to 1E+12) by year, 1930 to 2010, starting from ENIAC (1946, U.S. federal government). Trend: 2x / 1.5 years.]
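To get a feel for what a doubling every 1.5 years adds up to, the short sketch below computes the total growth factor over a given span. The function name `growth_factor` is illustrative and not from the talk; it assumes the doubling period stays constant.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Total multiplicative growth after `years`, given a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # 70 years at one doubling per 1.5 years is about 46.7 doublings.
    print(f"Growth over 70 years: {growth_factor(70):.2e}")
```

Under that assumption, 70 years of doubling gives a factor on the order of 10^14.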

Page 6: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing demand for scientific applications

Although computing has enabled applications in a variety of fields, much higher computing power is still required to solve complex real-world problems.

Heart simulation: joint research with the University of Tokyo

Tsunami simulation: joint research with Tohoku University - International Research Institute for Disaster

Life science and drug manufacturing

Global change prediction for reducing disaster

Industrial innovation

New material and energy creation

Origin of matter and the universe

Page 7: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing demand for financial applications

Tokyo Stock Exchange, Inc. (TSE) is one of the world's top trading markets and lists around 3,800 issues. Daily trading value exceeds three trillion yen.

Trading volume is increasing year by year.

For high-frequency trading, response time was reduced from 2 ms to 500 μs in 5 years.

[Figure: trading volume in the TSE 1st section, 1949 to 2015 (millions, axis 0 to 900).]

[Figure: response time of TSE: 2 ms (2010), 900 μs (2012), 500 μs (2015).]

Page 8: Computing Beyond Moore’s Law: Architecture and Device Innovations


Fujitsu computer systems

[Figure: timeline from 1950 to 2010 across four product lines: Supercomputer, Mainframe, Enterprise Servers, Ubiquitous Terminal.]

FACOM100 (1954)
FACOM230-10 (1965)
M-190 (1976)
OASYS100 (1980)
VP-100 (1982)
M-780 (1985)
FM TOWNS (1989)
M-1800 (1990)
DS90 (1991)
VPP-500 (1992)
FM V (1993)
GS21 (2002)
PRIMEQUEST (2005)
PRIMEHPC FX10 (2011)
Arrows (2011)
SPARC M10 (2013)

Page 9: Computing Beyond Moore’s Law: Architecture and Device Innovations


Fujitsu microprocessors

[Figure: processor roadmap by period (to 1999, 2000 to 2003, 2004 to 2007, 2008 to 2011, 2012 to 2015, 2016 onward), showing the technologies introduced along the way.]

Mainframe: GS8600, GS8800, GS8800B, GS8900, GS21, GS21 600, GS21 900, GS21 M2600
UNIX: SPARC64, SPARC64 II, SPARC64 GP, SPARC64 V, SPARC64 V+, SPARC64 VI, SPARC64 VII, SPARC64 X, SPARC64 X+
Supercomputer: SPARC64 VIIIfx (K computer), SPARC64 IXfx, SPARC64 XIfx

High performance: Store Ahead; Branch History; Prefetch; Single-chip CPU; Non-Blocking $; O-O-O Execution; Super-Scalar; L2$ on Die; Multi-core, Multi-thread; HPC-ACE; System on Chip; Hardware Barrier; Virtual Machine Architecture; Software On Chip; High-speed Interconnect
High reliability: $ECC; Register/ALU Parity; Instruction Retry; $ Dynamic degradation; RC/RT/History

Page 10: Computing Beyond Moore’s Law: Architecture and Device Innovations


Fujitsu high performance computing

Fujitsu provides many HPC solutions to satisfy various customer demands.

Support for both supercomputers with original CPUs and x86 cluster systems

Post-K will be developed in collaboration with RIKEN and ARM.

Supercomputer PRIMEHPC: PRIMEHPC FX10, PRIMEHPC FX100
K computer (co-developed with RIKEN)
Post-K (co-developed with RIKEN and ARM)
Large-scale SMP system: RX900
x86 cluster: CX400/CX600 (KNL), BX900/BX400

Page 11: Computing Beyond Moore’s Law: Architecture and Device Innovations


IoT and Data Explosion

Page 12: Computing Beyond Moore’s Law: Architecture and Device Innovations


IoT connects everything

By 2020, 50 billion devices will be connected and generate data constantly.

[Figure (source: Cisco): billions of connected devices (axis 10 to 50) by year, 1990 to 2020, plotted against world population. Annotations: only 1 million PCs were connected to the Internet; the number of devices exceeded the world population; more than 50 billion devices in 2020.]

Page 13: Computing Beyond Moore’s Law: Architecture and Device Innovations


Data explosion

As the amount of data explodes, it exceeds the capability of traditional ICT.
New processing is needed to create valuable information from unstructured data.

The amount of data will reach 40 zettabytes by 2020 and 1 yottabyte by 2030 (1 ZB = 10^21 bytes, 1 YB = 10^24 bytes).

[Figure: amount of data by year, 1990 to 2020, marking 1 ZB, 40 ZB, and 1 YB. Structured data: business data, RDB. Unstructured data: IoT, sensors.]
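The two forecast points on this slide (40 ZB by 2020, 1 YB by 2030) imply an average annual growth rate, which the sketch below works out. The helper name `implied_annual_growth` is illustrative, and the calculation assumes smooth compound growth between the two points.

```python
ZB = 10 ** 21  # bytes in one zettabyte
YB = 10 ** 24  # bytes in one yottabyte


def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = implied_annual_growth(40 * ZB, 1 * YB, 2030 - 2020)
    print(f"Implied growth: {rate:.1%} per year")
```

Going from 40 ZB to 1 YB is a 25x increase in ten years, or roughly 38% per year under this assumption.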

Page 14: Computing Beyond Moore’s Law: Architecture and Device Innovations


New information processing for data explosion

[Figure: axes labeled volume and quality for value; stages labeled numeric data computing, information, knowledge, and intelligence/knowledge.]

Page 15: Computing Beyond Moore’s Law: Architecture and Device Innovations


Technology Trend for Computing

Page 16: Computing Beyond Moore’s Law: Architecture and Device Innovations


Microprocessor trend

Transistor counts are growing exponentially, following Moore’s law.

Single-thread performance
• Increased by 60%/year (until 2005)
• Slowed to +20%/year (since 2005)

Power and operating frequency
• Power restrictions have limited operating frequency (since 2005)

Performance growth is limited by power consumption.

[Figure: transistor counts (K), performance, frequency (MHz), power (W), and core counts over time. Source: Stanford, K. Rupp]

Page 17: Computing Beyond Moore’s Law: Architecture and Device Innovations


Memory trend

[Figure: memory IC capacity (Gb/die, log scale from 10^-3 to 10^3) by year, 2000 to 2016. Annual capacity growth: DRAM +18%, NAND +32%, MRAM +52%, PCM +95%, ReRAM +140%. Source: ISSCC, VLSI Circuits & Tech., ASSCC, IEDM.]

[Figure: memory hierarchy by access time (ns to ms): SRAM (CPU cache), DRAM (memory), SSD (flash), HDD (magnetic). A 1000x performance gap separates DRAM from flash; next-generation memory targets this gap.]

Next generation memories are required to fill DRAM-NAND gap

DRAM density saturated. NAND Flash density growing with limited endurance

Big performance gap between DRAM and NAND Flash
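The annual capacity growth rates quoted on this slide are easier to compare as doubling times. The sketch below does that conversion; the dictionary and function names are illustrative, and it assumes each rate holds steady from year to year.

```python
import math

# Annual growth in capacity per die, as quoted on the slide.
ANNUAL_GROWTH = {"DRAM": 0.18, "NAND": 0.32, "MRAM": 0.52, "PCM": 0.95, "ReRAM": 1.40}


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    for name, growth in ANNUAL_GROWTH.items():
        print(f"{name}: doubles every {doubling_time_years(growth):.1f} years")
```

At +18% per year DRAM capacity takes a little over four years to double, while the emerging memories double in under two.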

Page 18: Computing Beyond Moore’s Law: Architecture and Device Innovations


Moore’s law

For the past 50 years, device technology scaling has brought both higher performance and higher power efficiency.

The trade-off line between performance and power efficiency is determined by the device technology of each generation. As technology scales, the trade-off line moves upward.

Power efficiency × (Performance)^2 = K ∝ s^5, where s is the scaling factor

The technology node will reach 7 nm in 2020 (the physical limit of current transistor technology).

Technology scaling will no longer be a driver for computing.

[Figure: power efficiency (a.u., 1 to 10^4) against performance (a.u., 10^2 to 10^5) on log-log axes, with trade-off lines for 1990, 2000, 2010, and 2020 spanning mobile to server designs; the advancing lines show Moore’s limit line moving upward.]
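The trade-off relation on this slide (power efficiency times performance squared equals a constant K, with K proportional to the fifth power of the scaling factor s) can be explored numerically. The sketch below is only an illustration in arbitrary units; the function names are not from the talk.

```python
def power_efficiency(performance: float, k: float) -> float:
    """Power efficiency on the trade-off line: efficiency * performance**2 = k."""
    return k / performance ** 2


def scaled_k(k: float, s: float) -> float:
    """Trade-off constant after scaling the technology by factor s (K grows as s**5)."""
    return k * s ** 5


if __name__ == "__main__":
    k = 1.0e6  # arbitrary units, chosen only for illustration
    # On one line, doubling performance costs a factor of 4 in efficiency.
    print(power_efficiency(100.0, k), power_efficiency(200.0, k))
    # Scaling by s = 2 lifts the whole line by a factor of 32.
    print(scaled_k(k, 2.0) / k)
```

On log-log axes this relation is a straight line of slope -2, and each technology generation shifts it upward, which is what the chart shows.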

Page 19: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing innovations beyond Moore’s law

To overcome the limits of Moore’s law in both performance and power efficiency, realize beyond-Moore’s-law computing through two approaches: computing architecture innovation and device innovation.

[Figure: power efficiency (a.u., 1 to 10^4) against performance (a.u., 10^2 to 10^5) on log-log axes. Moore’s law progress stays below Moore’s limit line; computing architecture innovation and device innovation reach the beyond-Moore’s-law region above it.]

Page 20: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing Architecture Innovation

Page 21: Computing Beyond Moore’s Law: Architecture and Device Innovations


Data explosion and challenges

Overcome challenges to realize new information processing

[Figure: amount of data by year, 2000 to 2030, rising through structured and unstructured data to 40 ZB (40 × 10^21 B) and 1 YB (10^24 B). The data explosion runs into the limits of power, transmission, integration, and processing. Data is refined into information, knowledge, and intelligence (the essence of intelligence).]

Challenges
• Process Technology
• Network Bandwidth
• Power Consumption
• Computing Power

Page 22: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing architecture innovation

Create new computing paradigm for data explosion

[Figure: amount of data by year, 2000 to 2030, rising through structured and unstructured data to 40 ZB (40 × 10^21 B) and 1 YB (10^24 B). The data explosion runs into the limits of power, transmission, integration, and processing. Data is refined into information, knowledge, and intelligence (the essence of intelligence). Labels added on this slide: New Computing Architecture, Moore’s Law Computing, Cloud Computing System, Hyperconnected Cloud.]

Challenges
• Process Technology
• Network Bandwidth
• Power Consumption
• Computing Power

Page 23: Computing Beyond Moore’s Law: Architecture and Device Innovations


Hyperconnected Cloud

R&D vision and strategy: “Hyperconnected Cloud”

Web-scale ICT provides computing and data processing power through service-oriented connection.

AI and security are embedded at every layer to create knowledge for a safe and secure society.

Page 24: Computing Beyond Moore’s Law: Architecture and Device Innovations


New computing architecture

[Figure: computing approaches plotted by specialization against processing type (numeric, media, knowledge, intelligence): conventional computing, supercomputers, accelerators, neural computing (learning), neural computing (inference), brain inspired computing, quantum computers. A marker shows the end of Moore’s law.]

Evolving from numeric computing to intelligence computing

Page 25: Computing Beyond Moore’s Law: Architecture and Device Innovations


Approach for new computing architecture

[Figure: chart of computing approaches by specialization and processing type (numeric, media, knowledge, intelligence), with the first stage highlighted: Conventional Computing.]

Evolving from numeric computing to intelligence computing

Page 26: Computing Beyond Moore’s Law: Architecture and Device Innovations


Approach for new computing architecture

[Figure: chart of computing approaches by specialization and processing type (numeric, media, knowledge, intelligence), with two stages highlighted: Conventional Computing and Domain Specific Computing.]

Evolving from numeric computing to intelligence computing

Page 27: Computing Beyond Moore’s Law: Architecture and Device Innovations


Approach for new computing architecture

[Figure: chart of computing approaches by specialization and processing type (numeric, media, knowledge, intelligence), with three stages highlighted: Conventional Computing, Domain Specific Computing, and New Computing Paradigm.]

Evolving from numeric computing to intelligence computing

Page 28: Computing Beyond Moore’s Law: Architecture and Device Innovations


Approach for new computing architecture

[Figure: chart of computing approaches by specialization and processing type (numeric, media, knowledge, intelligence), with four stages highlighted: Conventional Computing, Domain Specific Computing, New Computing Paradigm, and Future Computing Technologies.]

Evolving from numeric computing to intelligence computing

Page 29: Computing Beyond Moore’s Law: Architecture and Device Innovations


What is domain specific computing?

Achieve extremely high performance, simple operation, and low cost by specializing hardware and software for specific application domains.
• Optimize the architecture to the characteristics of the specific domain
• Optimize hardware and software to the major functions of the domain

[Figure: domain specific computing, a hardware configuration optimized for the domain. Example domains: media search; big data analysis; control and compression; encryption and attack detection; combinatorial optimization.]

Page 30: Computing Beyond Moore’s Law: Architecture and Device Innovations


Three areas for domain specific computing

[Figure: the domain specific computing diagram (media search; big data analysis; control, compression; encryption, attack detection; combinatorial optimization), highlighting three focus areas:]
• Media processing
• Rivalling quantum computing
• Neural computing

Page 31: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing Architecture Innovation

Rivalling Quantum Computing for Combinatorial Optimization Demonstration 1

Page 32: Computing Beyond Moore’s Law: Architecture and Device Innovations


What is combinatorial optimization?

Examples: power delivery, disaster recovery, investment portfolio

Combinatorial optimization example: find the shortest tour that visits every city.

Number of combinations: (N-1)!/2. For example, 32 cities give on the order of 10^33 combinations.

Combinatorial explosion
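The (N-1)!/2 count quoted above can be checked directly. The sketch below counts distinct round trips through N cities, treating a tour and its reverse as the same; the function name `distinct_tours` is illustrative.

```python
import math


def distinct_tours(n_cities: int) -> int:
    """Number of distinct round trips through n cities: (n - 1)! / 2.

    Fixing the start city removes rotations, and halving removes the
    mirror-image tour that covers the same route in reverse.
    """
    if n_cities < 3:
        raise ValueError("need at least 3 cities for the formula to apply")
    return math.factorial(n_cities - 1) // 2


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, f"{distinct_tours(n):.3e}")
```

For 32 cities the result is a 34-digit number, which matches the order of 10^33 stated on the slide and shows why exhaustive search is impractical.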

Page 33: Computing Beyond Moore’s Law: Architecture and Device Innovations


Strategy to solve combinatorial optimization

Create a high-speed and widely applicable architecture

[Figure: speed (slow to fast) against applicability (limited to certain problems, up to applicable to practical problems), positioning the conventional processor, the quantum computer (quantum annealing type), and our goal.]

Example problems shown:
• Locating power grid failure
• Pick-up and delivery of 2000 depots
• Locating failures in 20-breaker power grid
• Map coloring

Page 34: Computing Beyond Moore’s Law: Architecture and Device Innovations


Proposed new computing architecture

An architecture that meets the usability and scalability needs of combinatorial optimization:
• Solve practical problems by using CMOS digital design
• Realize scalability for larger problems and speed enhancement

Features
• Minimize the volume of data to move with a parallel and hierarchical structure
• Accelerate the search for paths by parallel score calculation and transition facilitation

[Figure: speed-up by parallel score calculation and transition facilitation within an engine; further speed-up achieved by parallelism; multiple engines for larger problems.]

Press release on Oct. 20th 2016
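The slide names two ideas, parallel score calculation and transition facilitation, without giving details. The sketch below is one possible reading, not Fujitsu's design. It assumes a QUBO-style problem (binary variables with pairwise weights). It evaluates the score change of every single-bit flip at each step, which is the part that hardware could do in parallel. When no flip is accepted, it raises an acceptance offset so the search can leave a local minimum. All names and parameters (`flip_deltas`, `anneal`, `offset_step`, the cooling schedule) are hypothetical.

```python
import math
import random


def energy(q, x):
    """QUBO score: sum of q[i][j] * x[i] * x[j] over i <= j (q is upper-triangular)."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def flip_deltas(q, x):
    """Score change of flipping each bit, computed for all bits in one pass.

    Each entry is independent of the others, so dedicated hardware could
    evaluate them in parallel; this loop only models the idea.
    """
    n = len(x)
    deltas = []
    for i in range(n):
        field = q[i][i] + sum(q[min(i, j)][max(i, j)] * x[j] for j in range(n) if j != i)
        deltas.append(field if x[i] == 0 else -field)
    return deltas


def anneal(q, steps=2000, t_start=2.0, t_end=0.05, offset_step=0.1, seed=0):
    """Simulated annealing with an escape offset (all parameters are assumptions)."""
    rng = random.Random(seed)
    n = len(q)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = energy(q, x)
    best_x, best_e = x[:], e
    offset = 0.0
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        deltas = flip_deltas(q, x)
        # Metropolis test on every candidate flip, with the offset subtracted.
        accepted = [
            i for i, d in enumerate(deltas)
            if d - offset <= 0 or rng.random() < math.exp(-(d - offset) / t)
        ]
        if accepted:
            i = rng.choice(accepted)
            x[i] ^= 1
            e += deltas[i]
            offset = 0.0
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            # No candidate passed: raise the offset so the next step is more permissive.
            offset += offset_step
    return best_x, best_e


if __name__ == "__main__":
    # Tiny example: minimize -x0 - x1 - x2 + 2*x0*x1 + 2*x1*x2.
    q = [[-1, 2, 0], [0, -1, 2], [0, 0, -1]]
    print(anneal(q))
```

Checking every candidate at each step means a move is found whenever at least one is acceptable, instead of wasting steps on single rejected proposals. That is one plausible source of the speed-up the slide attributes to parallel score calculation.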

Page 35: Computing Beyond Moore’s Law: Architecture and Device Innovations


Evaluation of our prototype

12,000x speedup confirmed on a 32-city traveling salesman problem

Engine performance evaluated using an FPGA implementation

[Figure: time to solution (sec, log scale from 0.1 to 10,000) for the conventional processor*, the FPGA, parallel score calculation, and transition facilitation (this work). Labeled speedups: 2x, 1000x, and 6x, for 12,000x overall.]

*3.5-GHz Intel Xeon E5

Page 36: Computing Beyond Moore’s Law: Architecture and Device Innovations


Current status and future plan High-speed, widely applicable architecture for optimization

Operates 12,000 times faster than conventional processor

1,000,000 times speedup envisioned using higher-layer parallelism

Engine

Integrating many engines

Upper layer parallelism

Achieved 12,000 times speedup using internal-engine parallelism

Further speed up and larger network size by using upper layers
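As a quick check on these figures, the sketch below multiplies the three speedup factors labeled on the evaluation chart (2x, 1000x, 6x) and works out how much more the upper layers would have to add to reach the envisioned 1,000,000x. The pairing of each factor with a step is an assumption read off the chart labels.

```python
import math

# Speedup factors labeled on the evaluation chart; the step names are assumed.
engine_factors = {
    "FPGA implementation": 2,
    "parallel score calculation": 1000,
    "transition facilitation": 6,
}

achieved = math.prod(engine_factors.values())
target = 1_000_000
remaining = target / achieved

if __name__ == "__main__":
    print(f"Achieved inside the engine: {achieved:,}x")
    print(f"Still needed from upper-layer parallelism: about {remaining:.0f}x")
```

The product reproduces the 12,000x figure, leaving a factor of roughly 83 to come from integrating many engines.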

Page 37: Computing Beyond Moore’s Law: Architecture and Device Innovations


Ecosystem of combinatorial optimizer

[Figure: ecosystem of the combinatorial optimizer, linking Fujitsu, universities, and research institutes across three layers: architecture, software development environment, and application.]

Combinatorial optimizer engine: high-speed engine, scalable solution
Application to practical problems: delivery and distribution; manufacturing and CAD; decision making AI
Next step: make an SDK with the engine available for joint research projects

Page 38: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing Architecture Innovation

Neural Computing Demonstration 5

Page 39: Computing Beyond Moore’s Law: Architecture and Device Innovations


Neural computing comes back again

Since 2012, deep learning algorithms and enhanced computing capability have enabled much higher object recognition rates than before.

Manual design: input image → feature extraction (manually designed features) → classification → results
Automatic extraction (deep learning): input image → feature extraction (automatically learned features) → classification → results

[Figure: general object recognition rate by year, 2011 to 2015 (axis 0.00 to 0.30), comparing conventional machine learning algorithms with neural computing: a large difference appears, and results improve every year.]

[Figure: feedforward neural network with weights w_ij from input to outputs y_1, y_2, ..., y_n, used for learning and inference.]
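The feedforward network on this slide maps inputs to outputs y_1 ... y_n through weights w_ij. A minimal inference sketch in plain Python follows. The sigmoid activation, the bias terms, and the example weights are assumptions made for illustration; the slide does not specify them.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(x, w, b):
    """One layer: y_j = sigmoid(b[j] + sum_i w[i][j] * x[i])."""
    return [
        sigmoid(b[j] + sum(w[i][j] * x[i] for i in range(len(x))))
        for j in range(len(b))
    ]


def feedforward(x, layers):
    """Inference: pass the input through each (weights, biases) layer in turn."""
    for w, b in layers:
        x = layer_forward(x, w, b)
    return x


if __name__ == "__main__":
    # Hypothetical 2-3-1 network with made-up weights.
    hidden = ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1, -0.1])
    output = ([[1.0], [-1.0], [0.5]], [0.2])
    print(feedforward([1.0, 0.5], [hidden, output]))
```

Learning adjusts the entries of `w` and `b` to reduce the error on training data, while inference only runs this forward pass, which is why the slide treats the two as separate workloads.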

Page 40: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing for deeper neural network

To achieve higher accuracy, neural networks have become deeper and larger.
• Processing speed: learning with a deeper neural network is time consuming
• Processing capacity: the limited memory size on a GPU is critical for larger neural networks

[Figure: neural network size trend. Memory size (GB, 0 to 18) by year, 1998 to 2016, comparing GPU memory size with NN size (batch = 8) for LeNet, AlexNet, VGGNet, and ResNet, approaching ~16 GB.]

Page 41: Computing Beyond Moore’s Law: Architecture and Device Innovations


Fastest learning w/ HPC technology

Developed high-speed technology to process deep learning

Using "AlexNet," 64 GPUs in parallel achieve 27 times the speed of a single GPU for world's fastest processing

Press release on Aug. 9th 2016

[Figure: learning speed at the same accuracy. Compared with 1 GPU, our approach on 64 GPUs gives 27x faster learning speed (60x faster execution speed), 1.8x faster than the conventional approach on 64 GPUs.]
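The speedup figures on this slide can be turned into a parallel efficiency, a common way to judge how well a workload scales across GPUs. The sketch below uses only the numbers quoted here; the function name is illustrative, and the conventional 64-GPU speedup is an inference from the "1.8x faster" label rather than a number stated in the talk.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup / number of devices."""
    return speedup / n_devices


our_speedup = 27.0                           # 64 GPUs vs. 1 GPU, from the slide
conventional_speedup = our_speedup / 1.8     # inferred from "1.8x faster"

if __name__ == "__main__":
    print(f"Our approach: {parallel_efficiency(our_speedup, 64):.0%} of linear scaling")
    print(f"Conventional (inferred): {parallel_efficiency(conventional_speedup, 64):.0%}")
```

A 27x speedup on 64 GPUs corresponds to about 42% of ideal scaling, which shows how much communication overhead remains even in a record-setting result.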

Page 42: Computing Beyond Moore’s Law: Architecture and Device Innovations


Doubles deep learning neural network scale

Developed technology to streamline the use of internal GPU memory, supporting the growing neural network scale that raises machine learning accuracy

Enabled neural network machine learning at up to twice the scale possible with previous technology

Response after press release

“How A New Technology Promises To Make Learning More Powerful Than It Already Is” By Kelvin Murae, Forbes

[Figure: conventional vs. our approach with the same memory: 2x more images, 4% more accuracy.]

Press release on Sep. 21st 2016

Page 43: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing Architecture Innovation

Media Processing Demonstration 2

Page 44: Computing Beyond Moore’s Law: Architecture and Device Innovations


Needs for image retrieval

Offices routinely create and store numerous documents that contain images, such as presentation materials.

The massive stock of stored image materials is not reused sufficiently.

Office workers waste 10% of their work time searching for documents.

A more intuitive search method is needed: “search by image” increases productivity.

Page 45: Computing Beyond Moore’s Law: Architecture and Device Innovations


Partial image retrieval

Find images based on matches with a part of the query image

[Figure: a query image is searched against a massive image DB; the search results include partial matches and enlarged or reduced images.]

A general-purpose server needs a long processing time for the massive calculations of partial matching.

Partial image retrieval must be accelerated to search for a target image intuitively and efficiently.

Page 46: Computing Beyond Moore’s Law: Architecture and Device Innovations


Image search acceleration system

We developed technology for instantaneous searches of a target image from a massive volume of images.

[Figure: a client with a visual, intuitive user interface sends a query by image to a server and receives the results (“I found it!”). The server holds the database and the partial image retrieval engine, which combines a CPU and an FPGA to handle matching, feature extraction, I/O processing, and overall control.]

Press release on Feb. 2nd 2016

Page 47: Computing Beyond Moore’s Law: Architecture and Device Innovations


Demonstration

Page 48: Computing Beyond Moore’s Law: Architecture and Device Innovations


Performance

Search performance: more than 50 times
Power consumption: less than 1/30†
Cubic volume of space: less than 1/50†
† for equivalent search performance

[Figure: throughput of a conventional server (200 images/sec) and the media domain specific server (12,000 images/sec), more than 50 times higher.]

“Search by image” makes document creation more productive

Page 49: Computing Beyond Moore’s Law: Architecture and Device Innovations


Device Innovation

Page 50: Computing Beyond Moore’s Law: Architecture and Device Innovations


Device innovations for beyond Moore’s law

Novel Packaging Technology

System in Package to be replaced by new and different types of integration and scaling

• 2.5D integration with Interposer

• 3D stacked ICs

Beyond CMOS

New technology that may take the place of silicon CMOS technology

• New channel materials: compound semiconductors, graphene, and CNTs (carbon nanotubes)

• New principle devices: tunneling FET, spin FET, Mott FET, …

Device innovation accelerates further innovation in architecture

Page 51: Computing Beyond Moore’s Law: Architecture and Device Innovations


Summary

Page 52: Computing Beyond Moore’s Law: Architecture and Device Innovations


Computing architecture innovations

The demand for computing performance is unlimited.
We will continue to innovate computing architecture and penetrate new applications as data explodes.

[Figure: penetration against computing paradigm shift, with labels: vector, graphics processing, accelerators, neural computing, brain inspired computing, quantum computers.]

Page 53: Computing Beyond Moore’s Law: Architecture and Device Innovations
