EE 126 Computer Engineering · 2019-09-04

EE 126

Computer Engineering

Fall 2017

Tufts University

Instructor: Prof. Mark Hempstead

mark@ece.tufts.edu

Lecture Outline

• Administrative details

• Why take EE 126? What will you learn?

• What is Computer Architecture?

• Moore’s Law and Future Challenges for Computer Architects

• Information sheet

Instructor

• Instructor: Mark Hempstead (mark@ece.tufts.edu), Halligan Hall 235A

• Office Hours:

– Mondays 3:30 pm – 4:30 pm

– Tuesdays 3:00 pm – 4:00 pm

• My Background

– Tufts undergrad in Computer Engineering

– PhD at Harvard June 2009

– Research Intern at Intel

– Recently at ARM R&D in Cambridge UK

– Assistant Professor, Drexel University 2010 - 2015

Instructor: My Research

• Power Aware Computing and Low Power VLSI Design

• Accelerator-centric computing

– Selecting accelerators using static characterization and ASTs

– Security of the thermal side-channel in many accelerator workloads

• Characterizing communication in workloads

• Memory systems

– Cache replacement policies and prefetching

– Non-volatile memory technologies

• SynchroTrace for fast simulation and design exploration

• Energy efficient structures for high performance processors

• Power-agile computing systems

• Power modeling of mobile devices (Android phones)

[Figure: chip layout, 2 mm × 2 mm, with labeled blocks: SRAM1, SRAM2, Microcontroller, Message Processor, Filter, Event Processor, Timer, Tester]

◼ Power-Agile Computing for Android Smartphones

◼ Power consumption and computational needs change rapidly

◼ Combines hardware and software systems to automatically stay under energy constraints

◼ Selecting Hardware Accelerators for Energy-Efficient Computing

◼ Future of computing is threatened by increasing power density

◼ Traditional microprocessors are not enough. New application specific hardware is required

◼ Using software compilers and high-level synthesis to discover accelerators before design begins

Prof. Mark Hempstead

Associate Professor

Electrical and Computer Engineering

“Energy-Efficient Computing from Hardware to Software”

Tufts Computer Architecture Lab

Improving the energy consumption of smartphones

Accelerating common applications with hardware

Energy-Performance Tradeoff

Out-of-Core Accelerators

Resources

• Text: "Computer Organization and Design" by Patterson & Hennessy (5th Ed., 2013)

– Morgan Kaufmann

– Print Book ISBN: 9780124077263

– eBook ISBN: 9780124078864

• The material in the 4th revised Ed. of the textbook is the same as in our edition, but the homework problems are different.

Prerequisites

• ES 4 Digital Logic
– Binary Addition
– Logic Gates and Flip-Flops
– Design of combinational logic
– Design of state machines
– Implementing and debugging digital systems in multiple ways (schematic, truth table, state diagram, RTL)

• Assembly programming and basic machine organization; EE 14 (Proc lab) or COMP 40
– ISAs and instructions
– Assembly programming
– Interrupts and interrupt routines
– Basic Caches and interacting with memory (load-store)

• VHDL or Verilog and experience with large digital designs
– ES 4 with EE 26 (Digital lab) recommended

• C Programming, UNIX

• Compilers, OS, Circuits/VLSI background is a plus but not required

Course Expectations

• Homework Assignments

– Completed Individually.

– Submitted during class on paper.

• Quizzes (4 over the semester)

• Midterm + Final

– Midterm is scheduled when the calendar says it is

– Final will be comprehensive and held during the exam period.

• Labs

– VHDL Implementation of a processor

– Handouts will be provided this week

Sucks up all your time

New this year – pipeline tracker ☺

Why a pipeline tracker?

• It’s your lightweight intro to verification

• 2016 industry survey

– 55% of engineers have the title “verification eng”

– 35% are design – but spend ½ of their time in verif!

– CAGR = 10% for verif. eng, 4% for design eng.

• Turn VHDL-lab lemons into lemonade

– Less work than before (if you use the tracker)

– Add a useful skill to your resume

– Probably do a little debug competition later

Grading

• Grade Formula

– Quizzes – 10%

– Midterm – 20%

– Final – 30%

– Labs + final project – 30%

– Homework – 10%

• Late days for HW/Lab assignments

– 5 late days per quarter per student

– After all late days are used, the grade will be reduced by 10% for each day late.

• Lab makeup policy

– Resubmit labs for up to ½ of the credit lost

– Must be submitted before turning in the next lab
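
As a rough illustration of the grade formula and late policy above, the weighted total can be sketched in C. The function names and the 0 to 100 score scale are assumptions made for this example and are not part of the course materials. The sketch also assumes that the 10% penalty applies to an assignment's own score, and only to days not covered by a remaining late day.

```c
#include <assert.h>

/* Weights from the grade formula above, in percent. */
enum { W_QUIZ = 10, W_MIDTERM = 20, W_FINAL = 30, W_LABS = 30, W_HW = 10 };

/* Hypothetical helper: combine component scores (each 0-100) into a total. */
double course_total(double quiz, double midterm, double final_exam,
                    double labs, double hw)
{
    assert(W_QUIZ + W_MIDTERM + W_FINAL + W_LABS + W_HW == 100);
    return (W_QUIZ * quiz + W_MIDTERM * midterm + W_FINAL * final_exam +
            W_LABS * labs + W_HW * hw) / 100.0;
}

/* Hypothetical helper: apply the late policy to one assignment score.
   Assumption: 10% of the score is lost per day that is not covered by
   a remaining late day, and the result never drops below zero. */
double late_adjusted(double score, int days_late, int late_days_left)
{
    int penalized = days_late - late_days_left;
    if (penalized <= 0)
        return score;
    double factor = 1.0 - 0.10 * penalized;
    return factor > 0.0 ? score * factor : 0.0;
}
```

For example, scores of 80 (quizzes), 70 (midterm), 90 (final), 100 (labs), and 60 (homework) combine to a total of 85.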

Topics of Study & why we care

• Get through the basics of modern processor design
– single-threaded 5-stage pipeline; 1980s technology

• Learn about pipelined systems
– everything is pipelined

• Understand the interfaces between architecture and system software (compilers, OS)
– Essential to understand OS/compilers/PL
– For everyone else, it can help you write better code!

• Implement your own processor in VHDL
– As previously discussed…

After this course…

• Computer architects strive to give maximum performance with programmer abstraction

– Compilers, OS part of this abstraction

– e.g. pipelining, superscalar, speculative execution, branch prediction, caching, virtual memory…

• Technology has brought us to an inflection point

– Multiple processors on a single chip -- Why?

• Design complexity, ILP/pipelining-limits, power dissipation, etc

– How to provide the abstraction?

– Some burden will shift back to programmers

Estimated Schedule

• Review of Assembly Programming and Machine Organization

– Instructions and ISAs

– The ALU and single cycle implementation

– Introduce the MIPS ISA

– 5-stage Pipelining, hazards, branches

• Memory Hierarchy and Caches

– Associative caches

– Cache coherence

• Security holes; superscalar processors; multi-cores

The class calendar is always the up-to-date schedule

Slide Credits

• Many of the slides and teaching materials have been adapted from the work of others:

• Elsevier publishing company supporting material for Patterson & Hennessy text.

• Mary Jane Irwin, PSU, CSE 431

• David Brooks, Harvard

Review: Some Basic Definitions

• Kilobyte – 2^10 or 1,024 bytes (KB or KiB)

• Megabyte – 2^20 or 1,048,576 bytes (MB or MiB)

– sometimes “rounded” to 10^6 or 1,000,000 bytes

• Gigabyte – 2^30 or 1,073,741,824 bytes

– sometimes rounded to 10^9 or 1,000,000,000 bytes

• Terabyte – 2^40 or 1,099,511,627,776 bytes

– sometimes rounded to 10^12 or 1,000,000,000,000 bytes

• Petabyte – 2^50 or 1024 terabytes

– sometimes rounded to 10^15 or 1,000,000,000,000,000 bytes

• Exabyte – 2^60 or 1024 petabytes

– sometimes rounded to 10^18 or 1,000,000,000,000,000,000 bytes
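
The gap between the binary definition and the "rounded" decimal one grows with each prefix. The following C sketch computes both values and the difference between them; the helper names are illustrative and are not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Binary definition: 2^(10*n) bytes, n = 1 for kilo, 2 for mega, ... 6 for exa. */
uint64_t binary_bytes(unsigned n)
{
    assert(n >= 1 && n <= 6);   /* 2^60 still fits in 64 bits */
    return (uint64_t)1 << (10 * n);
}

/* "Rounded" decimal definition: 10^(3*n) bytes. */
uint64_t decimal_bytes(unsigned n)
{
    assert(n >= 1 && n <= 6);   /* 10^18 still fits in 64 bits */
    uint64_t value = 1;
    for (unsigned i = 0; i < 3 * n; i++)
        value *= 10;
    return value;
}

/* How much larger the binary unit is than the decimal one,
   in whole percent (fraction truncated). */
unsigned excess_percent(unsigned n)
{
    uint64_t b = binary_bytes(n), d = decimal_bytes(n);
    return (unsigned)((b - d) * 100 / d);
}
```

With these definitions, excess_percent(1) is 2 for the kilobyte, while excess_percent(6) is 15 for the exabyte, so the "rounding" is far from harmless at large sizes.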

Quick quiz

• 10^15 shops = ?

• One million aches = ?

• 10^12 bulls = ?

• Reminders:
– Kilobyte – 2^10 or 1,024 bytes (KB or KiB)
– Megabyte – 2^20 ≈ 10^6 bytes
– Gigabyte – 2^30 ≈ 10^9 bytes
– Terabyte – 2^40 ≈ 10^12 bytes
– Petabyte – 2^50 or 10^15 bytes
– Exabyte – 2^60 or 10^18 bytes

Answers: 10^15 shops = 1 pet shop; one million aches = 1 MegaHertz; 10^12 bulls = 1 terrible

What is Computer Architecture?

[Figure: layered view of computing. Software layers, shaped by Application Trends: Applications (AI, DB, Graphics); Prog. Lang, Compilers; Operating System. Interface: Instruction Set Architecture. Hardware layers, shaped by Technology Trends: Microarchitecture; System Architecture; VLSI/Hardware Implementations.]

Where does this course fit into the world of computing?

Below the Program

• System software
– Operating system – supervising program that interfaces the user’s program with the hardware (e.g., Linux, MacOS, Windows)
  • Handles basic input and output operations
  • Allocates storage and memory
  • Provides for protected sharing among multiple applications
– Compiler – translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute

• Which of these two software layers “should” care about computer architecture?

[Figure: nested layers labeled Applications software, Systems SW, and Hardware, with a “126” marker]

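Below the Program, Cont’d

• High-level language program (in C)

    void swap(int v[], int k)
    {
        int temp;          /* holds v[k] while it is overwritten */
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

  (C compiler: one-to-many translation)

• Assembly language program (for MIPS)

    swap: sll $2, $5, 2      # $2 = k * 4, the byte offset of v[k]
          add $2, $4, $2     # $2 = address of v[k]
          lw  $15, 0($2)     # $15 = v[k]
          lw  $16, 4($2)     # $16 = v[k+1]
          sw  $16, 0($2)     # v[k] = old v[k+1]
          sw  $15, 4($2)     # v[k+1] = old v[k]
          jr  $31            # return to caller

  (assembler: one-to-one translation)

• Machine (object, binary) code (for MIPS)

    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    . . .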
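
The two machine-code words on this slide can be reproduced from the first two assembly instructions by packing the fields of the MIPS R-type format (op, rs, rt, rd, shamt, funct). The C sketch below is illustrative only; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a MIPS R-type instruction:
   op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6). */
uint32_t encode_rtype(uint32_t op, uint32_t rs, uint32_t rt,
                      uint32_t rd, uint32_t shamt, uint32_t funct)
{
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* sll $2, $5, 2   ->  rd = 2, rt = 5, shamt = 2, funct = 0x00 */
uint32_t encode_sll_example(void)
{
    return encode_rtype(0, 0, 5, 2, 2, 0x00);
}

/* add $2, $4, $2  ->  rd = 2, rs = 4, rt = 2, funct = 0x20 */
uint32_t encode_add_example(void)
{
    return encode_rtype(0, 4, 2, 2, 0, 0x20);
}
```

Here encode_sll_example() yields 0x00051080 and encode_add_example() yields 0x00821020, the same 32-bit patterns that the slide shows in binary. Each assembly instruction maps to exactly one word, which is what "one-to-one" refers to.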

Advantages of Higher-Level Languages?

• What are some advantages?
– Allow the programmer to think in a more natural language suited to the intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
– Improve programmer productivity – more understandable code that is easier to debug and validate
– Improve program maintainability
– Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
– Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine

• As a result, very little programming is done today at the assembler level

Instruction Set Architecture (ISA)

• ISA, or simply Architecture – the abstract interface between the hardware and the lowest level software that encompasses all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, …
– Enables implementations of varying cost and performance to run identical software
– A great business idea – but how well do you think it works?

• The combination of the basic instruction set (the ISA) and the operating system interface is called the application binary interface (ABI)
– ABI – The user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.

Under the Covers

• Five classic components of a computer – input, output, memory, datapath, and control

❑ datapath + control = processor (CPU)

History of the proc world

• and why the future might be interesting…

Moore’s Law

❑ In 1965, Gordon Moore (who later co-founded Intel) predicted that the number of transistors that can be integrated on a single chip would double every year; in 1975 he revised the pace to about every two years

Moore’s Law is the tail wagging a very big dog!

[Figure courtesy of Intel ®]

Technology Scaling Road Map (ITRS)

Year                  2004  2006  2008  2010  2012  2014  2017
Feature size (nm)       90    65    45    32    22    14    10
Intg. Capacity (BT)      2     4     8    16    33    83   162
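
Each step in the feature-size row shrinks linear dimensions by roughly 0.7×, which roughly halves the area of a transistor and fits the roughly doubling capacity row. The C sketch below checks this against the table values; the array and function names are illustrative, not from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Feature sizes (nm) copied from the roadmap table above. */
static const double feature_nm[] = { 90, 65, 45, 32, 22, 14, 10 };
enum { NUM_NODES = sizeof feature_nm / sizeof feature_nm[0] };

/* Linear shrink factor going from node i to node i + 1. */
double linear_shrink(size_t i)
{
    assert(i + 1 < NUM_NODES);
    return feature_nm[i + 1] / feature_nm[i];
}

/* Area shrink factor for the same step: both dimensions scale. */
double area_shrink(size_t i)
{
    double s = linear_shrink(i);
    return s * s;
}
```

For the 90 nm to 65 nm step, linear_shrink(0) is about 0.72 and area_shrink(0) is about 0.52, so about twice as many transistors fit in the same area.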

• Fun facts about 45nm transistors

– 30 million can fit on the head of a pin

– You could fit more than 2,000 across the width of a human hair

– If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent

Another Example of Moore’s Law Impact

DRAM capacity growth over 3 decades

What would you do with endless transistors?

• Your ideas?

But What Happened to Clock Rates and Why?

❑ Clock rates hit a “power wall”

[Figure: plot with two vertical axes, Clock Rate (MHz), log scale from 1 to 10000, and Power (Watts), from 0 to 120]

[Taylor, DAC and DaSi 2012]

Power Density creating the “Dark Silicon Problem”

How have we used these transistors?

• More functionality on one chip

– Early 1980s – 32-bit microprocessors

– Late 1980s – On Chip Level 1 Caches

– Early/Mid 1990s – 64-bit microprocessors, superscalar (ILP)

– Late 1990s – On Chip Level 2 Caches

– Early 2000s – Chip Multiprocessors, On Chip Level 3 Caches

– Early 2010s – Many-Core, SoC integration, specialized hardware

• What is next?

– How much more cache can we put on a chip? (Itanium2)

– How many more cores can we put on a chip? (Niagara, etc)

– What else can we put on chips? (Accelerators)

Example: Intel Kaby Lake Quad Core (Core i7/i5 7400-7700)

• Introduced August 2016

• Quad core out-of-order (14-19 stages of pipeline)

– Supports 8 threads

• 64-bit datapath

• 14nm technology

• Three levels of caches (L1, L2, L3) on chip

• Integrated memory controller

• Integrated graphics

• 3.6 GHz clock, turbo boost up to 4.2 GHz

https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake

Example Processor: Apple A10 Fusion

• Introduced 2016

– iPhone 7

• 3.3 Billion Transistors

• 16 nm technology

• Integrated GPU

• 4 cores

– 2 high power 2.34 GHz ARMv8-A cores

– 2 Energy-efficient cores

Example Processor: Apple A11 Bionic

                        A10 Fusion               A11 Bionic
Phone                   iPhone 7                 iPhone 8, X
Technology              16nm                     10nm
Number of cores         4 (two slow, two fast)   6 (four slow, two fast)
Number of transistors   3.3B                     4.3B
Freq                    2.34 GHz                 2.4 GHz
Has a TV ad             No                       Yes

• https://www.youtube.com/watch?v=QN1jHqIFEbQ

• Bionic: dedicated neural-net hardware accelerator, powers FaceID & other tasks

Crossroads: Conventional Wisdom in Comp. Arch

• Old Conventional Wisdom: Power is free, transistors are expensive

• New Conventional Wisdom: “Power wall”: power is expensive, transistors are free (can put more on a chip than we can afford to turn on)

• Old CW: ILP can be increased sufficiently via compilers and innovation (out-of-order, speculation, VLIW, …)

• New CW: “ILP wall”: law of diminishing returns on more HW for ILP

• Old CW: Multiplies are slow, memory access is fast

• New CW: “Memory wall”: memory is slow, multiplies are fast (200 clock cycles to DRAM memory, 4 clocks for multiply)

• Old CW: Uniprocessor performance 2X / 1.5 yrs

• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall

– Uniprocessor performance now 2X / 5(?) yrs

Sea change in chip design: multiple “cores” (2X processors per chip / ~ 2 years)

• A larger number of simpler processors is more power efficient
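
To get a feel for the numbers on this slide, the following C sketch compares the two uniprocessor doubling rates and the DRAM and multiply latencies. The 15-year horizon and the helper names are assumptions chosen for illustration, and the 5-year doubling time carries the slide's own question mark.

```c
#include <assert.h>
#include <stdint.h>

/* Performance multiple after `months`, if performance doubles every
   `doubling_months`. Only whole doublings are counted. */
uint64_t speedup_after(unsigned months, unsigned doubling_months)
{
    assert(doubling_months > 0);
    unsigned doublings = months / doubling_months;
    assert(doublings < 64);     /* result must fit in 64 bits */
    return (uint64_t)1 << doublings;
}

/* How many multiplies fit in the time of one DRAM access. */
unsigned multiplies_per_dram_access(unsigned dram_cycles, unsigned multiply_cycles)
{
    assert(multiply_cycles > 0);
    return dram_cycles / multiply_cycles;
}
```

Over 15 years (180 months), doubling every 18 months gives speedup_after(180, 18) = 1024, while doubling every 60 months gives speedup_after(180, 60) = 8. With the slide's latencies, multiplies_per_dram_access(200, 4) = 50, which is the "memory wall" in one number.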

“For the P6, success criteria included performance above a certain level and failure criteria included power dissipation above some threshold.”

Bob Colwell, Pentium Chronicles

Summary

• Welcome to EE 126

• Architecture is the “glue” between system software/applications and VLSI implementations

• Need to create abstractions to deal with complexity

Questions?

Information Sheet

• Please fill this out

• Designed to provide an understanding of your background and experience

• Be honest … this is not graded