Accelerating Asynchronous Programs through Event Sneak Peek Gaurav Chadha, Scott Mahlke, Satish Narayanasamy 17 June 2015 University of Michigan Electrical Engineering and Computer Science


Page 1

Accelerating Asynchronous Programs through Event Sneak Peek

Gaurav Chadha, Scott Mahlke, Satish Narayanasamy
17 June 2015

University of Michigan
Electrical Engineering and Computer Science

Page 2

Asynchronous programs are ubiquitous

• Internet-of-Things
• Mobile
• Web
• Servers (node.js)
• Sensor networks

Page 3

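Asynchronous programming hides I/O latency

[Figure: timelines of Task 1, Task 2, and Task 3 under the synchronous sequential model and under the asynchronous model. Time spent waiting for I/O is marked, and the asynchronous model shows a speedup.]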
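
The latency-hiding effect can be reproduced in a few lines. The following is a minimal sketch, not taken from the talk: it assumes Python's `asyncio`, uses `asyncio.sleep` as a stand-in for an I/O wait, and the task names and delays are made up for illustration.

```python
import asyncio
import time


async def task(name: str, io_seconds: float) -> str:
    # asyncio.sleep stands in for an I/O wait (network, disk, sensor).
    await asyncio.sleep(io_seconds)
    return name


async def run_sequential(delays):
    # Synchronous-style: each task finishes its I/O before the next starts.
    return [await task(f"Task {i + 1}", d) for i, d in enumerate(delays)]


async def run_asynchronous(delays):
    # Asynchronous: all I/O waits overlap on a single thread.
    return await asyncio.gather(
        *(task(f"Task {i + 1}", d) for i, d in enumerate(delays))
    )


def timed(coro_fn, delays):
    # Run one of the two variants and measure its wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(delays))
    return list(result), time.perf_counter() - start


DELAYS = [0.05, 0.05, 0.05]  # hypothetical I/O waits, in seconds
seq_result, seq_time = timed(run_sequential, DELAYS)
async_result, async_time = timed(run_asynchronous, DELAYS)
```

Both variants return the same results; the asynchronous one finishes in roughly the time of the longest wait rather than the sum of all waits.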

Page 4

Asynchronous programming is well-suited to handle a wide array of asynchronous inputs

• Computation is driven by events
• The Hollywood Principle (“Don’t call us, we’ll call you”)

Page 5

Illustration: Asynchronous Programming Model

[Figure: an Event Queue holding events such as onClick, getLocation, and onImageLoad (source labeled Web). The Looper Thread waits on events and pops an event for execution.]
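
The illustrated model can be sketched as a queue plus a single looper thread. This is a toy version under stated assumptions: the handler names mirror the slide, while `STOP`, `executed`, and the use of Python's `queue` and `threading` modules are choices made only for this example.

```python
import queue
import threading

event_queue = queue.Queue()   # the Event Queue from the illustration
executed = []                 # order in which handlers ran
STOP = object()               # hypothetical sentinel used to end the sketch


def on_click():
    executed.append("onClick")


def get_location():
    executed.append("getLocation")


def on_image_load():
    executed.append("onImageLoad")


def looper():
    # The looper thread blocks on the queue, pops one event, runs its
    # handler to completion, then waits for the next one.
    while True:
        handler = event_queue.get()
        if handler is STOP:
            break
        handler()


thread = threading.Thread(target=looper)
thread.start()
for handler in (on_click, get_location, on_image_load):
    event_queue.put(handler)
event_queue.put(STOP)
thread.join()
```

Because one thread drains the queue, handlers run one at a time in arrival order, which is the sequential-execution property the later slides rely on.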

Page 6

Conventional architecture is not optimized for asynchronous programs

Processor view of the asynchronous model:

• Short events execute varied tasks
• Large instruction footprint
• Destroys cache locality
• Little hot code causes poor branch prediction

[Figure: events from the Event Queue as seen by the processor]

Page 7

Large performance improvement potential in asynchronous programs

[Figure: bar charts; legend: PARSEC, SPECint 2006, Web Apps]

L1-I mpki: 24, 2.3, 0.7
L1-D miss rate: 4.4, 5.5, 1.3
Branch mis-prediction rate: 9.8, 8.4, 6.3

Maximum Performance Improvement (%), Web Apps: 52, 69, 79

Page 8

Execute asynchronous program on a specialized Event Sneak Peek (ESP) core

[Figure: a heterogeneous multi-core processor containing a CPU]

Page 9

Execute asynchronous program on a specialized Event Sneak Peek (ESP) core

[Figure: a heterogeneous multi-core processor containing a CPU, an ESP core, and WebCore (Zhu & Reddi, ISCA ’14). Labels: Browser Engine (Parse, CSS, Layout, Render); Asynchronous JavaScript Events.]

Page 10

How to customize a core for asynchronous programs?

Page 11

HTML5 asynchronous programming model guarantees sequential execution of events

[Figure: Looper Thread and Event Queue]

Page 12

Opportunity: Event-Level Parallelism (ELP)

• Advance knowledge of future events (Event Queue)
• Events are functionally independent

How to exploit this ELP?

Page 13

#1: Parallel Execution

Event Queue

Not provably independent

Page 14

#2: Optimistic Concurrency

Event Queue

Speculative parallelization (e.g., transactions)

• >99% of event pairs conflict
• Primarily, low-level memory dependencies
  – Maintenance code
  – Memory pool recycling
  – …
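
A toy example helps show why low-level dependencies defeat transactional speculation even when events are functionally independent. The sketch below uses the standard read/write-set conflict test; the `EventTrace` type, the address names, and the `pool_head` location (standing in for memory-pool bookkeeping) are all hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class EventTrace:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflicts(a: EventTrace, b: EventTrace) -> bool:
    # Standard read/write-set check: a pair conflicts if either event
    # writes a location that the other reads or writes.
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


# Hypothetical addresses. "pool_head" stands in for an allocator's
# free-list pointer that almost every event touches.
click = EventTrace("onClick",
                   reads={"button", "pool_head"},
                   writes={"pool_head", "counter"})
image = EventTrace("onImageLoad",
                   reads={"img", "pool_head"},
                   writes={"pool_head", "canvas"})

# The same events with the allocator bookkeeping ignored.
click_app = EventTrace("onClick", reads={"button"}, writes={"counter"})
image_app = EventTrace("onImageLoad", reads={"img"}, writes={"canvas"})

with_pool = conflicts(click, image)             # shared pool pointer collides
without_pool = conflicts(click_app, image_app)  # application state is disjoint
```

In this toy setup the two handlers touch disjoint application state, yet the shared pool pointer alone is enough to make the pair conflict.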

Page 15

Observation

98% of events “match” with 99% accuracy
– Control flow paths
– Addresses

[Figure: speculative pre-execution of an event from the Event Queue, marked as a good match]

Page 16

How to customize a core for asynchronous programs?

Exploit ELP using speculative pre-execution

Page 17

ESP Design: Expose event-queue to hardware

[Figure: the software Event Queue is exposed across the ISA boundary to a H/W Event Queue in hardware]

Page 18

ESP Design: Speculatively pre-execute future events on stalls

[Figure: timeline with the H/W Event Queue and LLC misses. Labels: Isolate, Warm-Up, Memoize, Trigger; a gap of millions of instructions; speedup.]

Page 19

Realizing ESP design

Isolation | Memoization | Triggering

Isolation:
• Correctness
  – Isolate speculative updates
• Performance
  – Avoid destructive interference between execution contexts

Page 20

Isolation of multiple execution contexts

Register State | Memory State | Branch Predictor

[Figure: register state. Labels: RRAT, PC, ESP PC, Fetch Unit, L1-I cache, Core Pipeline.]

Page 21

Isolation of multiple execution contexts

Register State | Memory State | Branch Predictor

Cachelets isolate speculative updates

Performance:
– Avoid L1 pollution
– Capture 95% of reuse

[Figure: memory state. ESP adds an I-Cachelet next to the L1-I Cache and a D-Cachelet next to the L1-D Cache.]
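
The cachelet idea can be sketched as a small side structure that absorbs speculative fills so the main L1 contents are untouched. This is a simplified model under stated assumptions: the caches are fully associative with LRU replacement, speculative accesses may hit in L1 without updating its replacement state, and the names `TinyCache` and `IsolatedL1` and all sizes are invented for the example.

```python
from collections import OrderedDict


class TinyCache:
    """Fully associative LRU cache of line addresses (a simplification)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()

    def contains(self, line: int) -> bool:
        return line in self.lines

    def touch(self, line: int) -> None:
        self.lines.move_to_end(line)

    def fill(self, line: int) -> None:
        self.lines[line] = True
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used


class IsolatedL1:
    def __init__(self, l1_lines: int, cachelet_lines: int):
        self.l1 = TinyCache(l1_lines)
        self.cachelet = TinyCache(cachelet_lines)

    def access(self, line: int, speculative: bool) -> bool:
        """Return True on a hit. Speculative misses fill only the cachelet."""
        if self.l1.contains(line):
            if not speculative:
                self.l1.touch(line)  # speculative hits leave L1 LRU state alone
            return True
        if speculative and self.cachelet.contains(line):
            self.cachelet.touch(line)
            return True
        (self.cachelet if speculative else self.l1).fill(line)
        return False


cache = IsolatedL1(l1_lines=4, cachelet_lines=2)
for line in (0, 1, 2, 3):
    cache.access(line, speculative=False)   # current event's working set
for line in (100, 101, 102):
    cache.access(line, speculative=True)    # pre-executed future event
l1_contents = sorted(cache.l1.lines)
cachelet_contents = sorted(cache.cachelet.lines)
reuse_hit = cache.access(101, speculative=True)
```

After the speculative accesses, the L1 still holds the current event's lines, while the cachelet keeps only the most recent speculative lines and serves their reuse.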

Page 22

Isolation of multiple execution contexts

Register State | Memory State | Branch Predictor

• PIR tracks path history
• Isolating PIR is adequate

[Figure: branch predictor. The Predictor Tables are shared; ESP adds a second PIR.]

Page 23

Realizing ESP design

Isolation | Memoization | Triggering

Memoization:
• Warm-up during speculative pre-execution is ineffective
• Future events might execute millions of instructions later

Page 24

Memoization of architectural bottlenecks

Addresses | Branches

Record instruction and data addresses, along with instruction count

[Figure: ESP adds an I-List fed from the I-Cachelet (next to the L1-I Cache) and a D-List fed from the D-Cachelet (next to the L1-D Cache)]
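
The recording step can be sketched as two ordered lists of (instruction count, cache-line address) pairs. This is only an illustration: the 64-byte line size, the deduplication of back-to-back accesses to the same line, the `MemoList` type, and the trace format are assumptions, since the slides state only that addresses are recorded along with the instruction count.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cache-line size


@dataclass
class MemoList:
    """Ordered record of (instruction count, cache-line address) pairs."""
    entries: list = field(default_factory=list)

    def record(self, instr_count: int, address: int) -> None:
        line = address // LINE_BYTES
        # Skip back-to-back accesses to the same line to keep the list short
        # (an assumed compaction, not something the slides specify).
        if self.entries and self.entries[-1][1] == line:
            return
        self.entries.append((instr_count, line))


def pre_execute(trace, i_list: MemoList, d_list: MemoList) -> None:
    # trace: iterable of (pc, data_address or None), one item per instruction.
    for instr_count, (pc, data_addr) in enumerate(trace):
        i_list.record(instr_count, pc)
        if data_addr is not None:
            d_list.record(instr_count, data_addr)


# Hypothetical trace of a speculatively pre-executed event.
trace = [(0x1000, None), (0x1004, 0x8000), (0x1040, 0x8008), (0x2000, 0x9000)]
i_list, d_list = MemoList(), MemoList()
pre_execute(trace, i_list, d_list)
```

Keeping the instruction count next to each address is what later allows prefetches to be timed relative to the event's progress rather than issued all at once.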

Page 25

Memoization of architectural bottlenecks

Addresses | Branches

Record branch outcomes
– Branch address, directions and targets, instruction count

[Figure: branch predictor with shared Predictor Tables and two PIRs; ESP adds a B-List]

Page 26

Realizing ESP design

Isolation | Memoization | Triggering

Triggering:
• Use memoized lists
• Launch timely prefetches
• Warm-up branch predictor ahead of branches

Page 27

Triggering timely prefetches using memoized information

[Figure: ESP list entries (Address, Instr. Count) are compared (>) with the Current Instr. Count; prefetches start ~100 instr. ahead.]
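
One way to read the triggering step is as a comparison between the running instruction count and the count stored with each memoized address. The sketch below assumes that a prefetch is issued once the current count plus a lookahead of about 100 instructions exceeds the recorded count; the exact comparison, the `PrefetchTrigger` class, and the sample entries are assumptions made for illustration.

```python
from collections import deque

LOOKAHEAD = 100  # "~100 instr." from the slide; the exact policy is assumed


class PrefetchTrigger:
    def __init__(self, memo_entries, lookahead: int = LOOKAHEAD):
        # memo_entries: (instruction count, line address), sorted by count.
        self.pending = deque(memo_entries)
        self.lookahead = lookahead

    def advance(self, current_instr_count: int) -> list:
        """Return the lines to prefetch now that execution reached this count."""
        issued = []
        while (self.pending
               and current_instr_count + self.lookahead > self.pending[0][0]):
            issued.append(self.pending.popleft()[1])
        return issued


# Hypothetical memoized list for one event.
trigger = PrefetchTrigger([(50, 0xA), (120, 0xB), (400, 0xC)])
at_start = trigger.advance(0)    # 0 + 100 > 50, so 0xA is issued
at_30 = trigger.advance(30)      # 30 + 100 > 120, so 0xB is issued
at_200 = trigger.advance(200)    # 200 + 100 <= 400, nothing yet
at_301 = trigger.advance(301)    # 301 + 100 > 400, so 0xC is issued
```

Issuing each prefetch shortly before its expected use, instead of at event start, limits how long a prefetched line has to survive in the cache.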

Page 28

Baseline Architecture

[Figure: baseline core. Components: Core Pipeline, Fetch Unit, PC, RRAT, Branch Predictor (Predictor, PIR), L1-I Cache, L1-D Cache, NL-I, NL-D,S, L2 cache.]

Page 29

ESP Architecture

[Figure: the baseline core (Core Pipeline, Fetch Unit, PC, RRAT, Branch Predictor with Predictor and PIR, L1-I Cache, L1-D Cache, NL-I, NL-D,S, L2 cache) extended with an Event Queue, an ESP Mode indicator, and ESP state.]

Page 30

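ESP Architecture

[Figure: the same core with the isolation state added for ESP: a second PC, an I-Cachelet next to the L1-I Cache, a D-Cachelet next to the L1-D Cache, and a second PIR. Other components: Event Queue, ESP Mode, Core Pipeline, Fetch Unit, RRAT, Branch Predictor (Predictor), NL-I, NL-D,S, L2 cache.]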

Page 31

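ESP Architecture

[Figure: the same core with the memoization lists added: I-List, D-List, and B-List. Other components: Event Queue, ESP Mode, Core Pipeline, Fetch Unit, PC and ESP PC, RRAT, Branch Predictor (Predictor, two PIRs), L1-I Cache, I-Cachelet, L1-D Cache, D-Cachelet, NL-I, NL-D,S, L2 cache.]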

Page 32

ESP Architecture

[Figure: the full design with two ESP contexts, ESP-1 and ESP-2. Components: Event Queue, ESP Mode, Core Pipeline, Fetch Unit, three PCs, RRAT, Branch Predictor (Predictor, three PIRs, B-List), L1-I Cache, I-Cachelet, I-List, L1-D Cache, D-Cachelet, D-List, NL-I, NL-D,S, L2 cache.]

Page 33

Methodology

Timing: Trace-driven simulator, Sniper Sim
– Instrumented Chromium
– Collected and simulated traces of JavaScript events

Energy: McPAT and CACTI

Page 34

Architectural Model

Core: 4-wide issue, OoO, 1.66 GHz

L1-(I,D) Cache: 32 KB, 2-way

L2 Cache: 2 MB, 16-way

Energy Modeling: Vdd = 1.2 V, 32 nm

Page 35

Limitations of Runahead

[Figure: on a data cache miss, Runahead speculatively pre-executes within the current event; the Event Queue is not consulted]

• Reduces data cache misses
  – Not a significant problem in web applications
• Cannot mitigate I-cache misses
• Does not exploit ELP
  – No notion of events
  – Future events are a rich source of independent instructions

[Dundas et al. ’97, Mutlu et al. ’03]

Page 36

Events are short

Action           # Events   # Instructions   Event Size (instr)   Web App
Buy headphones   7,787      433 million      55k                  amazon
                                             53k                  bing
                                             91k                  cnn
                                             232k                 facebook
                                             372k                 gdocs
                                             472k                 gmaps
                                             56k                  pixlr
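
As a quick consistency check on the one fully specified row (amazon), the average event size can be recomputed from the event and instruction counts. The sketch assumes that "Event Size" means instructions divided by events, which the slide does not state explicitly.

```python
# Values from the amazon row of the table.
num_events = 7_787
num_instructions = 433_000_000

# Assumed definition: average event size = instructions / events.
avg_event_size = num_instructions / num_events
avg_event_size_k = avg_event_size / 1_000
```

The result is a little under 56 thousand instructions per event, in line with the 55k listed for amazon.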

Short events execute varied tasks

Large instruction footprint

Destroys cache locality

Little hot code causes poor branch prediction

Page 37

ESP outperforms other designs

Performance improvement w.r.t. no prefetching (%)

ESP        21.8
Runahead   12.5
Baseline   14.0

Baseline: Next-line (NL) + Stride

Page 38

ESP outperforms other designs

Performance improvement w.r.t. no prefetching (%)

ESP + NL        32.1
Runahead + NL   21.3
Baseline        14.0

Baseline: Next-line (NL) + Stride
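
Since all three numbers are measured against the same no-prefetching reference, the gain of one design over another can be derived by dividing the speedups. The helper below is a small worked calculation; `relative_gain` is a name introduced here, and reading the summary's 16% figure as "ESP + NL relative to the baseline" is an interpretation rather than something the slides state.

```python
def relative_gain(new_pct: float, old_pct: float) -> float:
    """Gain of one design over another, both given as % over a common reference."""
    return ((1 + new_pct / 100) / (1 + old_pct / 100) - 1) * 100


# Values from the slide: improvement over no prefetching, in percent.
esp_nl_vs_baseline = relative_gain(32.1, 14.0)
runahead_nl_vs_baseline = relative_gain(21.3, 14.0)
```

Under this reading, ESP + NL comes out at roughly 16% over the NL + Stride baseline, and Runahead + NL at roughly 6%.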

Page 39

Largest performance improvement comes from improved I-cache performance

Performance Improvement (%)

        I-Cache   Branch Predictor   D-Cache
Max     52        69                 79
ESP     21        28                 32

Page 40

ESP consumes less static energy, but expends more dynamic energy

ESP executes 21% more instructions, but consumes only 8% more energy

[Figure: static and dynamic energy for NL and ESP; energy consumed w.r.t. no prefetching, axis from 0 to 1.2]

Page 41

Hardware area overhead

[Figure: storage overhead broken down into Cachelets, Lists, and Registers for ESP-1 and ESP-2; values shown: 12.6 KB and 1.2 KB]

Page 42

Summary

Accelerators for asynchronous programs

ESP exploits Event-Level Parallelism (ELP)
– Expose event queue to hardware
– Speculatively pre-execute future events

Performance: 16%


Page 44

Jumping ahead two events is sufficient

[Figure: # cache lines (log scale, 1 to 10000) for Normal and ESP1 through ESP8; series: Max, 95%, 85%]

Page 45

Impact of JS execution on response time

[Figure: response time broken down into JavaScript, DOM, CSS, Network, and Server]

Chow et al. ’14

Page 46

Client delay

Chow et al. ’14