Page 1: Architectural Musings - IBM

Architectural Musings
Rethinking Computer Systems Architecture & Evaluation

Christopher Vick

March 23, 2014

Page 2: Architectural Musings - IBM

Introduction

§ Vision Talk
  §  How should we analyze, reason about, and evaluate Computer System Architecture in the 21st century?
  §  What can history tell us about these questions?
  §  What does this mean for the research community?
§ Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture
§ Vast new opportunities for research of great interest to and great relevance for industry

Page 3: Architectural Musings - IBM

Outline

§ Computer System Architecture
§ Then (Circa 1970)
  §  Scarce Resources & Bottlenecks
  §  Optimizations
  §  Evaluation
§ Now (Mobile Computing Platforms)
  §  Scarce Resources & Bottlenecks
  §  Optimizations?
  §  Evaluation?
§ Questions?

Page 4: Architectural Musings - IBM

COMPUTER SYSTEM ARCHITECTURE

Page 5: Architectural Musings - IBM

Computer System Architecture

§ Hardware
  §  The 5 classic components (Patterson & Hennessy)
  §  Input, Output, Memory, Datapath, Control
§ Software
  §  System Virtual Machine (Hypervisor, VM, or VMM)
  §  Operating System
  §  Compilers & Tools
§ Definitions
  §  The way components fit together
  §  The arrangement of the various devices in a complete computer system or network
  §  The instruction set plus a model of the execution of the instruction set (Amdahl et al.)
§ Computer System Architecture
  §  The selection and combination of hardware and software components to assemble an effective computer system

Page 6: Architectural Musings - IBM

Combination

[Diagram: layered system stack. Software labels: Application Programs, Libraries, Virtual Machine, Operating System (Drivers, Memory Manager, Scheduler), Hypercall Interface. Hardware labels: Multicore Execution Unit, Interconnect, Memory, IO Devices.]

Page 7: Architectural Musings - IBM

Effective

§ An optimization problem
  §  Many variables
    §  Selection of hardware/software components
    §  Selection of interfaces/interconnects
  §  Many constraints
    §  Physical, sociological, technical & cost constraints
§ Scarce Resources and Bottlenecks
  §  Maximize utilization of scarce resources
  §  Minimize impact of bottlenecks
§ Evaluation
  §  How do you measure effectiveness?
  §  What effect does the evaluation have on the optimization?

Page 8: Architectural Musings - IBM

THEN (CIRCA 1970)

Page 9: Architectural Musings - IBM

Scarce Resources

§ CPU Cycles
  §  CPUs expensive
  §  Slow clock rates
§ Memory Locations
  §  Random Access Memory expensive
  §  Address/Data paths into CPU expensive
§ Skilled Programmers
  §  Relatively new discipline
  §  Poor language and tools support

Page 10: Architectural Musings - IBM

Bottlenecks

§ Programmer Productivity
  §  Software development slow and expensive
  §  Low level programming paradigms
§ Memory Latency
  §  RAM latency gated overall speed (~2-3 MHz)
  §  Small RAM backed by vastly slower storage
§ I/O Bandwidth
  §  Limited CPU connectivity
  §  Crude communication mechanisms

Page 11: Architectural Musings - IBM

Optimizations

§ Time Sharing
  §  Effective sharing of limited resource
§ Virtual Memory
  §  Effective sharing, and backing with cheaper alternative
§ Hardware Improvements
  §  Smaller features provide more resource and faster clock
  §  Large Scale Integration
  §  Better signaling to improve bandwidth
§ High Level Programming Languages
  §  Broadens productive programmer community
  §  Abstracts away some hardware complexity
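To make the Virtual Memory bullet concrete, here is a minimal Python sketch of demand paging: a few RAM frames backed by a larger, slower store. The class name `TinyVirtualMemory`, the 256-byte page size, and the least-recently-used eviction policy are assumptions chosen for illustration; the slides do not specify any of them.

```python
from collections import OrderedDict

PAGE_SIZE = 256  # bytes per page (illustrative value, not from the slides)


class TinyVirtualMemory:
    """Toy demand-paged memory: a few RAM frames backed by a larger store."""

    def __init__(self, num_frames, num_pages):
        self.num_frames = num_frames
        # Backing store: the cheaper, slower home of every page.
        self.backing = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        # Resident pages, ordered from least to most recently used.
        self.resident = OrderedDict()
        self.page_faults = 0

    def _frame_for(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return self.resident[page]
        self.page_faults += 1
        if len(self.resident) >= self.num_frames:
            # Evict the least recently used page and write it back.
            victim, frame = self.resident.popitem(last=False)
            self.backing[victim][:] = frame
        frame = bytearray(self.backing[page])  # copy the page into "RAM"
        self.resident[page] = frame
        return frame

    def read(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        return self._frame_for(page)[offset]

    def write(self, vaddr, value):
        page, offset = divmod(vaddr, PAGE_SIZE)
        self._frame_for(page)[offset] = value


vm = TinyVirtualMemory(num_frames=2, num_pages=8)
vm.write(0, 7)      # page 0
vm.write(300, 9)    # page 1
vm.write(700, 5)    # page 2: evicts page 0, which is written back
print(vm.read(0), vm.page_faults)  # page 0 is faulted back in from the store
```

The program sees eight pages of address space while only two are ever resident, which is the sharing and cheap backing that the slide describes. The cost shows up as page faults.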

Page 12: Architectural Musings - IBM

Evaluation

§ Started with primitive measures
  §  MIPS
  §  SLOC
§ Worked towards more sophisticated evaluation tools
  §  Hennessy & Patterson very influential
  §  SPEC CPU
  §  TPM
  §  Defect rate
§ Cost is always a factor

Page 13: Architectural Musings - IBM

Examples

§ Digital PDP-11
  §  16-bit address space
  §  Orthogonal instruction set
  §  Memory mapped I/O
  §  Unix, DOS, many others
§ IBM System/370
  §  24-bit address space
  §  Virtual Memory
  §  MVS, VM/370, DOS/VS
  §  Backward compatibility with System/360

Page 14: Architectural Musings - IBM

NOW (MOBILE COMPUTING)

Page 15: Architectural Musings - IBM

Scarce Resources

§ Energy
  §  Fixed energy budget for mobile devices
  §  Thermal issues at all scales
  §  Tradeoff between performance and energy
  §  Process shrinks no longer significantly improve energy consumption
§ Memory Bandwidth
  §  Providing bandwidth is expensive
  §  Memory interconnect consumes significant energy

Page 16: Architectural Musings - IBM

Bottlenecks

§ Memory Latency
  §  Increasing gap between CPU speed and DRAM latency
  §  Physical distance to DRAM devices a factor
§ Concurrency
  §  Shortage of programmers who can handle this
  §  Inadequate language/tools support
§ I/O Bandwidth/Latency
  §  Wireless bandwidth lower than wired
  §  Consumes large amounts of energy

Page 17: Architectural Musings - IBM

Example

§ Samsung Galaxy S5
  §  Processor: 2.5 GHz Qualcomm® Snapdragon™ 801 (Quad Core)
  §  GPU: Qualcomm® Adreno 330
  §  OS: Android™ 4.4.2
  §  Memory RAM: 2 GB LPDDR3
  §  Memory Storage: 16/32/64 GB onboard storage
  §  Display: 5.1” AMOLED 1920 x 1080 HD
  §  Network: LTE Cat 4, CDMA, UMTS/HSPA, GSM/GPRS/EDGE
  §  Battery: 2800 mAh
  §  Camera (Main): 16 megapixel, Ultra HD
  §  Dimensions: 142 x 73 x 8.1 mm
§ This is a General Purpose Computer!

Page 18: Architectural Musings - IBM

Optimizations?

§ Multi-core
  §  Aggressive addition of cores and threads
  §  Hardware concurrency outstripping software
  §  New Concurrent Programming Models/Tools?
§ Memory Subsystem
  §  Significant contributor to total energy consumption
  §  Adding bandwidth is expensive
  §  New technologies addressing some energy issues
§ Wireless bandwidth enhancements (LTE Advanced, etc.)
§ Solutions from desktop/server or embedded worlds may not directly apply in mobile space!

Page 19: Architectural Musings - IBM

Memory System Energy

§ Retaining data (one second)
  §  DRAM: ~1-10 pJ/bit self-refresh
  §  SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]
    §  4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC ‘08]
  §  Flash, PCM, STT RAM…: Zero!
§ Moving Data
  §  32-bit value:
    §  Recompute: 60 pJ (Razor)
    §  Send 1 mm: 10 pJ
    §  Retain in cache for 1 ms: 38 pJ
    §  Retain in DRAM for 1 second: 32+ pJ
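The 32-bit figures on this slide follow from the per-bit retention rates, and a short Python sketch makes the arithmetic explicit. The constants are the numbers quoted on the slide. The function names are illustrative, and the linear scaling of energy with time and with wire length is an assumption that the slide does not state.

```python
# Figures quoted on the slide; retention rates are per bit, per second.
BITS_PER_VALUE = 32
SRAM_RETAIN_PJ_PER_BIT_S = 1200.0  # "1200+ pJ/bit", used here as a lower bound
DRAM_RETAIN_PJ_PER_BIT_S = 1.0     # low end of "~1-10 pJ/bit"
RECOMPUTE_PJ = 60.0                # recompute one 32-bit value
SEND_PJ_PER_MM = 10.0              # move one 32-bit value by 1 mm


def retain_pj(pj_per_bit_s, seconds, bits=BITS_PER_VALUE):
    """Energy to hold `bits` bits for `seconds` (assumes linear scaling in time)."""
    return pj_per_bit_s * bits * seconds


def cache_breakeven_seconds():
    """Retention time beyond which recomputing is cheaper than keeping the value in SRAM."""
    return RECOMPUTE_PJ / (SRAM_RETAIN_PJ_PER_BIT_S * BITS_PER_VALUE)


def wire_breakeven_mm():
    """Distance beyond which recomputing is cheaper than sending (assumes linear scaling)."""
    return RECOMPUTE_PJ / SEND_PJ_PER_MM


print(f"cache, 1 ms:  {retain_pj(SRAM_RETAIN_PJ_PER_BIT_S, 1e-3):.1f} pJ")
print(f"DRAM, 1 s:    {retain_pj(DRAM_RETAIN_PJ_PER_BIT_S, 1.0):.1f} pJ")
print(f"cache break-even: {cache_breakeven_seconds() * 1e3:.2f} ms")
print(f"wire break-even:  {wire_breakeven_mm():.1f} mm")
```

Under these assumptions, holding a value in cache for more than about 1.5 ms, or moving it more than about 6 mm, costs more energy than recomputing it.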

Page 20: Architectural Musings - IBM

Reducing Memory System Energy

§ Move less!
  §  Caches physically close to CPU
  §  Locality, locality, locality (the first rule of chip real estate)
§ Retain less!
  §  Power off unused cache lines [Kaxiras et al., ISCA ‘01]
  §  “Drowsy” caches [Flautner et al., ISCA ‘02]
    §  … with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]
  §  Don’t refresh unused DRAM
    §  … e.g. with garbage collection [Chen et al., CODES+ISSS ‘03]
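The "power off unused cache lines" idea can be illustrated with a toy Python simulation. This is a simplified sketch under stated assumptions and does not reproduce the mechanism in the cited paper: each line holds one fixed datum, a line is switched off once it has been idle for `decay_interval` ticks, and a later access to a switched-off line counts as an extra miss. The function name and parameters are hypothetical.

```python
def simulate_decay(accesses, num_lines, decay_interval, total_ticks):
    """Toy model of switching off idle cache lines.

    accesses: dict mapping tick -> index of the line touched at that tick.
    Returns (powered_line_ticks, decay_misses), where powered_line_ticks is a
    proxy for leakage energy and decay_misses counts refetches caused by decay.
    """
    powered = [False] * num_lines
    ever_loaded = [False] * num_lines
    last_used = [0] * num_lines
    powered_line_ticks = 0
    decay_misses = 0

    for tick in range(total_ticks):
        line = accesses.get(tick)
        if line is not None:
            if not powered[line]:
                if ever_loaded[line]:
                    decay_misses += 1  # contents were lost; must refetch
                powered[line] = True
                ever_loaded[line] = True
            last_used[line] = tick
        # Switch off every line that has been idle for the full interval.
        for i in range(num_lines):
            if powered[i] and tick - last_used[i] >= decay_interval:
                powered[i] = False
        powered_line_ticks += sum(powered)

    return powered_line_ticks, decay_misses


trace = {0: 0, 1: 1, 10: 0}  # hypothetical access trace: tick -> line
print(simulate_decay(trace, num_lines=2, decay_interval=4, total_ticks=12))
print(simulate_decay(trace, num_lines=2, decay_interval=100, total_ticks=12))
```

Comparing the two runs shows the tradeoff: a short decay interval cuts the time that lines spend powered on, at the price of an extra miss when a switched-off line is touched again.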

Page 21: Architectural Musings - IBM

Extending the Memory Model

§ Maintaining the illusion of a single flat memory address space is too expensive
  §  On-chip caches can be major consumers of area and energy
  §  Coherence protocols are expensive and difficult to scale
•  Alternative: software-managed memory hierarchies
  –  Tightly-coupled memory (TCM), scratchpads
  –  Do not require tag memory, address comparison logic
  –  More area- and energy-efficient
  –  Help bridge gap between bandwidth and throughput

Page 22: Architectural Musings - IBM

New Challenges and Opportunities

§ Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas
§ Major implications on memory management
  §  Scratchpad allocation strategies
  §  Data partitioning strategies
  §  Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics
§ Opportunities for compile-time and runtime optimization
§ Challenges in both Hardware and Software!
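A minimal Python sketch of what "software explicitly orchestrates all transfers" can look like: data is partitioned into tiles that fit a small scratchpad, and each tile is copied in, processed, and copied back. The scratchpad capacity, the function name, and the use of Python lists to stand in for DRAM and on-chip memory are all illustrative assumptions. Real scratchpad code would typically use DMA and a systems language.

```python
SCRATCHPAD_WORDS = 4  # hypothetical on-chip capacity, in words


def scale_in_tiles(dram, factor, scratchpad_words=SCRATCHPAD_WORDS):
    """Scale every element of `dram` in place, one scratchpad-sized tile at a time.

    Returns the number of words moved between "DRAM" and the scratchpad.
    """
    scratchpad = [0] * scratchpad_words
    words_moved = 0
    for base in range(0, len(dram), scratchpad_words):
        n = min(scratchpad_words, len(dram) - base)
        scratchpad[:n] = dram[base:base + n]   # explicit copy-in
        for i in range(n):                     # compute only on on-chip data
            scratchpad[i] *= factor
        dram[base:base + n] = scratchpad[:n]   # explicit copy-out
        words_moved += 2 * n
    return words_moved


data = list(range(10))
moved = scale_in_tiles(data, 3)
print(data, moved)
```

No tags or address comparisons are involved: the program itself decides what lives on chip and when. That choice is where the allocation and partitioning strategies listed above come in.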

Page 23: Architectural Musings - IBM

Evaluation

§ Energy/Power
  §  Both matter
  §  MIPS/Watt
  §  Battery life
  §  Hard to measure and lacking in precision
§ Performance
  §  Currently rather primitive
    §  Linpack, CaffeineMark, CoreMark, Quadrant
    §  SPEC CPU
  §  Following similar track to early PC evaluation, so should get more sophisticated
  §  Need to more accurately measure/reflect the utility of the device
    §  Balancing peak performance, throughput, battery life, etc.
§ Cost
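The two energy metrics named above can be written down directly. The sketch below is a minimal illustration: the function names are hypothetical, and every input in the usage lines is a made-up value rather than a measurement of any device. Since a watt is one joule per second, MIPS/Watt is equivalent to millions of instructions per joule.

```python
def mips_per_watt(instructions, seconds, avg_power_w):
    """Millions of instructions per second, divided by average power in watts."""
    mips = instructions / seconds / 1e6
    return mips / avg_power_w


def battery_life_hours(capacity_mah, nominal_voltage_v, avg_power_w):
    """Runtime at a constant average power draw.

    Assumes the whole rated capacity is usable at the nominal voltage,
    which real batteries only approximate.
    """
    energy_wh = capacity_mah / 1000.0 * nominal_voltage_v
    return energy_wh / avg_power_w


# Hypothetical inputs, chosen only to exercise the formulas.
print(mips_per_watt(instructions=2e9, seconds=1.0, avg_power_w=2.0))
print(battery_life_hours(capacity_mah=3000, nominal_voltage_v=4.0, avg_power_w=1.5))
```

The first metric rewards efficiency per unit of work and the second rewards low average draw, which is one way to read "both matter": a design can score well on one and poorly on the other.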

Page 24: Architectural Musings - IBM

Thank You

Page 25: Architectural Musings - IBM

Photo Copyright Notices