
Page 1: Reconfigurable Caches and their Application to Media Processing Parthasarathy (Partha) Ranganathan Dept. of Electrical and Computer Engineering Rice University

Reconfigurable Caches and their Application to Media Processing

Parthasarathy (Partha) Ranganathan
Dept. of Electrical and Computer Engineering, Rice University, Houston, Texas

Sarita Adve
Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois

Norman P. Jouppi
Western Research Laboratory, Compaq Computer Corporation, Palo Alto, California


Motivation (1 of 2)

Different workloads on general-purpose processors

Scientific/engineering, databases, media processing, …

Widely different characteristics

Challenge for future general-purpose systems

Use most transistors effectively for all workloads


Motivation (2 of 2)

Challenge for future general-purpose systems

Use most transistors effectively for all workloads

50% to 80% of processor transistors devoted to cache

Very effective for engineering and database workloads

BUT large caches often ineffective for media workloads

Streaming data and large working sets [ISCA 1999]

Can we reuse cache transistors for other useful work?



Contributions

Reconfigurable Caches

Flexibility to reuse cache SRAM for other activities

Several applications possible

Simple organization and design changes

Small impact on cache access time

Application for media processing

e.g., instruction reuse – reuse memory for computation

1.04X to 1.20X performance improvement


Outline for Talk

Motivation

Reconfigurable caches

Key idea

Organization

Implementation and timing analysis

Application for media processing

Summary and future work


Reconfigurable Caches: Key Idea

Dynamically divide SRAM into multiple partitions

Use partitions for other useful activities

[Figure: current use of on-chip SRAM (a single cache) versus proposed use (Partition A as cache, Partition B as lookup)]

Cache SRAM useful for both conventional and media workloads

Key idea: reuse cache transistors!


Reconfigurable Cache Uses

A number of different uses for reconfigurable caches

Optimizations using lookup tables to store patterns

Instruction reuse, value prediction, address prediction, …

Hardware and software prefetching

Caching of prefetched lines

Software-controlled memory

QoS guarantees, scratch memory area

Cache SRAM useful for both conventional and media workloads


Key Challenges

How to partition SRAM?

How to address the different partitions as they change?

Minimize impact on cache access (clock cycle) time

[Figure: current use of on-chip SRAM (a single cache) versus proposed use (Partition A as cache, Partition B as lookup)]

Associativity-based partitioning


Conventional Cache Organization

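[Figure: two-way set-associative cache. The address is split into tag, index, and block fields; each way (Way 1, Way 2) holds state, tag, and data; a compare stage produces hit/miss and a select stage produces data out.]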


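Associativity-Based Partitioning

[Figure: the conventional two-way organization (address split into tag, index, and block; state, tag, and data per way; compare and select producing data out and hit/miss), with the ways divided into Partition 1 and Partition 2, a second tag/index/block address input, and a "Choose" stage]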

Partition at granularity of “ways”

Multiple data paths and additional state/logic
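
The idea of partitioning at the granularity of ways can be sketched in a few lines of Python. This is a toy functional model, not the hardware design from the talk: the class name `WayPartitionedCache`, the partition names, the replacement policy (first invalid way, else the partition's first way), and the choice to invalidate all lines on repartitioning are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: object = None


class WayPartitionedCache:
    """Toy set-associative SRAM whose ways are assigned to named partitions."""

    def __init__(self, num_sets, num_ways, block_bytes):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.block_bytes = block_bytes
        self.lines = [[Line() for _ in range(num_ways)] for _ in range(num_sets)]
        # Initially every way belongs to the ordinary cache partition.
        self.ways_of = {"cache": list(range(num_ways))}

    def repartition(self, assignment):
        """assignment maps a partition name to a list of way numbers."""
        used = [w for ways in assignment.values() for w in ways]
        if len(used) != len(set(used)) or not set(used) <= set(range(self.num_ways)):
            raise ValueError("ways must be disjoint and in range")
        self.ways_of = {name: list(ways) for name, ways in assignment.items()}
        # Assumption: simplest possible consistency policy, drop all contents.
        for cache_set in self.lines:
            for way in range(self.num_ways):
                cache_set[way] = Line()

    def _split(self, addr):
        block = addr // self.block_bytes
        return block // self.num_sets, block % self.num_sets  # (tag, index)

    def lookup(self, partition, addr):
        tag, index = self._split(addr)
        # Only the ways owned by this partition take part in the tag compare.
        for way in self.ways_of[partition]:
            line = self.lines[index][way]
            if line.valid and line.tag == tag:
                return line.data
        return None

    def fill(self, partition, addr, data):
        tag, index = self._split(addr)
        ways = self.ways_of[partition]
        # Prefer an invalid way; otherwise evict the partition's first way.
        victim = next((w for w in ways if not self.lines[index][w].valid), ways[0])
        self.lines[index][victim] = Line(True, tag, data)


sram = WayPartitionedCache(num_sets=4, num_ways=4, block_bytes=16)
sram.repartition({"cache": [0, 1], "lookup": [2, 3]})
sram.fill("cache", 0x40, "A")
sram.fill("lookup", 0x40, "B")
```

Because both partitions share the same index function, the same address can be resident in each partition independently; only the set of ways searched differs.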


Reconfigurable Cache Organization

Associativity-based partitioning

Simple - small changes to conventional caches

But number and granularity of partitions depend on associativity

Alternate approach: Overlapped-wide-tag partitioning

More general, but slightly more complex

Details in paper


Other Organizational Choices (1 of 2)

Ensuring consistency of data at repartitioning

Cache scrubbing: flush data at repartitioning intervals

Lazy transitioning: Augment state with partition information

Addressing of partitions - software (ISA) vs. hardware

[Figure: current use of on-chip SRAM (a single cache) versus proposed use (Partition A and Partition B)]
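
The two consistency options named on this slide can be contrasted with a small sketch. The slides give only the names and one-line descriptions, so everything below is one possible reading: the `owner` field stands in for "partition information" added to the line state, and the functions `scrub` and `lazy_hit` are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    owner: str = ""  # assumption: partition that filled the line (lazy scheme)


def scrub(lines):
    """Cache scrubbing: invalidate every line when the partitioning changes."""
    for line in lines:
        # A real design would write dirty data back before invalidating.
        line.valid = False


def lazy_hit(line, tag, current_owner):
    """Lazy transitioning: check the recorded owner at access time instead."""
    if line.valid and line.owner != current_owner:
        # Stale line left over from an older configuration: drop it on first touch.
        line.valid = False
    return line.valid and line.tag == tag


lines = [Line(True, 7, "cache"), Line(True, 9, "cache")]
```

Scrubbing pays the whole cost at the repartitioning point, while the lazy scheme spreads it over later accesses at the price of extra state per line.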


Other Organizational Choices (2 of 2)

Method of partitioning - hardware vs. software control

Frequency of partitioning - frequent vs. infrequent

Level of partitioning - L1, L2, or lower levels

Tradeoffs based on application requirements

[Figure: current use of on-chip SRAM (a single cache) versus proposed use (Partition A and Partition B)]



Conventional Cache Implementation

Tag and data arrays split into multiple sub-arrays to reduce/balance length of word lines and bit lines

[Figure: cache structure. Inputs and outputs: address, data, valid output. Components: decoders, word lines, bit lines, tag array, data array, column muxes, sense amps, comparators, mux drivers, output drivers.]


Changes for Reconfigurable Cache

Associate sub-arrays with partitions

Constraint on minimum number of sub-arrays

Additional multiplexors, drivers, and wiring

[Figure: the conventional cache structure (address, data, valid output, decoders, word lines, bit lines, tag array, data array, column muxes, sense amps, comparators, mux drivers, output drivers), with five labels marked [1:NP]]


Impact on Cache Access Time

Sub-array-based partitioning

Multiple simultaneous accesses to SRAM array

No additional data ports

Timing analysis methodology

CACTI analytical timing model for cache time (Compaq WRL)

Extended to model reconfigurable caches

Experiments varying cache sizes, partitions, technology, …


Impact on Cache Access Time

Cache access time

Comparable to base (within 1-4%) for few partitions (2)

Higher for more partitions, especially with small caches

But still within 6% for large caches

Impact on clock frequency likely to be even lower


Outline for Talk

Motivation

Reconfigurable caches

Application for media processing

Instruction reuse with media processing

Simulation results

Summary and future work


Application for Media Processing

Instruction reuse/memoization [Sodani and Sohi, ISCA 1997]

Exploits value redundancy in programs

Store instruction operands and result in reuse buffer

If later instruction and operands match in reuse buffer, skip execution; read answer from reuse buffer

[Figure: three panels, each showing a cache and a partition]

Few changes for implementation with reconfigurable caches
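
The reuse-buffer behaviour described above (store operands and result, then skip execution on a later match) can be sketched as follows. This is a minimal software model under stated assumptions: a direct-mapped table indexed by the instruction address, with the class name `ReuseBuffer` and the `execute` interface invented for illustration. It does not model how the buffer is mapped onto a cache partition.

```python
class ReuseBuffer:
    """Toy direct-mapped reuse buffer keyed by instruction address (PC)."""

    def __init__(self, entries):
        self.entries = entries
        self.table = [None] * entries  # each slot holds (pc, operands, result)

    def execute(self, pc, operands, compute):
        """Return (result, reused). compute is called only on a buffer miss."""
        slot = pc % self.entries
        entry = self.table[slot]
        if entry is not None and entry[0] == pc and entry[1] == operands:
            return entry[2], True  # same instruction, same operands: skip execution
        result = compute(*operands)
        self.table[slot] = (pc, operands, result)
        return result, False


def multiply(a, b):
    return a * b


rb = ReuseBuffer(entries=8)
first = rb.execute(0x100, (3, 4), multiply)   # computed and stored
second = rb.execute(0x100, (3, 4), multiply)  # served from the buffer
third = rb.execute(0x100, (3, 5), multiply)   # operands differ: recomputed
```

The second call returns the stored result without invoking `multiply`, which is the "use memory to reduce compute" trade the talk relies on.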


Simulation Methodology

Detailed simulation using RSIM (Rice)

User-level execution-driven simulator

Media processing benchmarks

JPEG image encoding/decoding

MPEG video encoding/decoding

GSM speech decoding and MPEG audio decoding

Speech recognition and synthesis


System Parameters

Modern general-purpose processor with ILP+media extensions

1 GHz, 8-way issue, OOO, VIS, prefetching

Multi-level memory hierarchy

128 KB 4-way associative 2-cycle L1 data cache

1 MB 4-way associative 20-cycle L2 cache

Simple reconfigurable cache organization

2 partitions at L1 data cache

64 KB data cache, 64 KB instruction reuse buffer

Partitioning at start of application in software
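
A short calculation shows what this configuration means for the cache geometry. Two things here are assumptions, not values from the slides: the line size (`LINE_BYTES = 64`) and that the 64 KB / 64 KB split is made at way granularity, giving two of the four ways to each partition.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, index_bits, offset_bits); all sizes must be powers of two."""
    sets = size_bytes // (ways * line_bytes)
    return sets, sets.bit_length() - 1, line_bytes.bit_length() - 1


LINE_BYTES = 64  # assumption: the slides do not state the L1 line size

full_l1 = cache_geometry(128 * 1024, 4, LINE_BYTES)    # unpartitioned L1
data_part = cache_geometry(64 * 1024, 2, LINE_BYTES)   # two ways kept as cache
print(full_l1, data_part)
```

Under these assumptions the number of sets and the index width are the same before and after partitioning, so the address decoding for the remaining data cache does not change; only the associativity drops from four to two.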


Impact of Instruction Reuse

Performance improvements for all applications (1.04X to 1.20X)

Use memory to reduce compute bottleneck

Greater potential with aggressive design [details in paper]

[Figure: bar chart of normalized execution time (axis 0 to 120), split into Memory and CPU components, comparing the state-of-art system with the same system with IR]

Benchmark          State-of-art   With IR
JPEG decode        100            84
MPEG decode        100            89
Speech synthesis   100            92
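
The normalized execution times in the chart can be converted to speedups with a one-line calculation. The pairing of values with benchmarks (84 for JPEG decode, 89 for MPEG decode, 92 for speech synthesis) is read from the order of the chart labels and should be treated as an assumption.

```python
# Normalized execution time with instruction reuse (baseline = 100).
normalized_time = {"JPEG decode": 84, "MPEG decode": 89, "Speech synthesis": 92}

# Speedup is baseline time divided by the new time.
speedup = {name: 100 / t for name, t in normalized_time.items()}

for name, s in speedup.items():
    print(f"{name}: {s:.2f}X")
```

For these three benchmarks the result is roughly 1.19X, 1.12X, and 1.09X, which falls inside the 1.04X to 1.20X range quoted for the full benchmark set.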


Summary

Goal: Use cache transistors effectively for all workloads

Reconfigurable Caches: Flexibility to reuse cache SRAM

Simple organization and design changes

Small impact on cache access time

Several applications possible

Instruction reuse - reuse memory for computation

1.04X to 1.20X performance improvement

More aggressive reconfiguration currently under investigation


More information available at

http://www.ece.rice.edu/~parthas

[email protected]