TRANSCRIPT
Reconfigurable Caches and their Application to Media
Processing
Parthasarathy (Partha) Ranganathan
Dept. of Electrical and Computer Engineering, Rice University, Houston, Texas
Sarita Adve
Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
Norman P. Jouppi
Western Research Laboratory, Compaq Computer Corporation, Palo Alto, California
Reconfigurable Caches -2- Partha Ranganathan
Motivation (1 of 2)
Different workloads on general-purpose processors
Scientific/engineering, databases, media processing, …
Widely different characteristics
Challenge for future general-purpose systems
Use most transistors effectively for all workloads
Motivation (2 of 2)
Challenge for future general-purpose systems
Use most transistors effectively for all workloads
50% to 80% of processor transistors devoted to cache
Very effective for engineering and database workloads
BUT large caches often ineffective for media workloads
Streaming data and large working sets [ISCA 1999]
Can we reuse cache transistors for other useful work?
Contributions
Reconfigurable Caches
Flexibility to reuse cache SRAM for other activities
Several applications possible
Simple organization and design changes
Small impact on cache access time
Application for media processing
e.g., instruction reuse: reuse memory for computation
1.04X to 1.20X performance improvement
Outline for Talk
Motivation
Reconfigurable caches
Key idea
Organization
Implementation and timing analysis
Application for media processing
Summary and future work
Reconfigurable Caches: Key Idea
Dynamically divide SRAM into multiple partitions
Use partitions for other useful activities
[Figure: on-chip SRAM. Current use: one monolithic cache. Proposed use: Partition A (cache) and Partition B (lookup).]
Cache SRAM useful for both conventional and media workloads
Key idea: reuse cache transistors!
Reconfigurable Cache Uses
Number of different uses for reconfigurable caches
Optimizations using lookup tables to store patterns
Instruction reuse, value prediction, address prediction, …
Hardware and software prefetching
Caching of prefetched lines
Software-controlled memory
QoS guarantees, scratch memory area
Cache SRAM useful for both conventional and media workloads
Key Challenges
How to partition SRAM?
How to address the different partitions as they change?
Minimize impact on cache access (clock cycle) time
Associativity-based partitioning
Conventional Cache Organization
[Figure: two-way set-associative cache. The address splits into tag, index, and block offset; the index selects a set in Way 1 and Way 2, stored tags are compared against the address tag, and the compare result drives hit/miss and selects the data out.]
Associativity-Based Partitioning
[Figure: the same two-way cache with each way assigned to a separate partition (Way 1 = Partition 1, Way 2 = Partition 2); each partition gets its own tag/index/block address split, and a chooser steers each access to its partition's way.]
Partition at granularity of “ways”
Multiple data paths and additional state/logic
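The way-granularity split can be illustrated with a toy software model: a two-way cache whose second way can be claimed as a separate, directly addressed lookup partition. This is an illustrative sketch only, not the paper's hardware design; all names and the placement policy are hypothetical.

```python
# Toy model of associativity-based partitioning: a 2-way set-associative
# SRAM where each way serves either as a cache way or as a directly
# addressed lookup partition. Illustrative sketch, not the real hardware.

NUM_SETS = 4          # sets per way (tiny, for illustration)
BLOCK_BITS = 2        # block-offset bits

class ReconfigurableCache:
    def __init__(self):
        # Each way holds one (tag, data) entry per set, or None.
        self.ways = [[None] * NUM_SETS for _ in range(2)]
        self.cache_ways = [0, 1]   # ways currently used as cache

    def repartition(self, lookup_way):
        """Claim one way as a lookup partition. Here we scrub (flush)
        its entries so no stale cache data can ever be matched."""
        self.ways[lookup_way] = [None] * NUM_SETS
        self.cache_ways = [w for w in (0, 1) if w != lookup_way]

    def _split(self, addr):
        index = (addr >> BLOCK_BITS) % NUM_SETS
        tag = addr >> BLOCK_BITS   # keep all upper bits as the tag
        return tag, index

    def cache_lookup(self, addr):
        tag, index = self._split(addr)
        for w in self.cache_ways:          # probe only the cache ways
            entry = self.ways[w][index]
            if entry is not None and entry[0] == tag:
                return entry[1]            # hit
        return None                        # miss

    def cache_fill(self, addr, data):
        tag, index = self._split(addr)
        w = self.cache_ways[0]             # trivial placement policy
        self.ways[w][index] = (tag, data)

    def table_write(self, way, slot, value):
        self.ways[way][slot] = value       # partition addressed directly

    def table_read(self, way, slot):
        return self.ways[way][slot]
```

After `repartition(lookup_way=1)` the cache shrinks from two-way to direct-mapped while way 1 answers direct table reads and writes; this mirrors the slide's point that the number and granularity of partitions are tied to the associativity.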
Reconfigurable Cache Organization
Associativity-based partitioning
Simple - small changes to conventional caches
But the number and granularity of partitions depend on associativity
Alternate approach: Overlapped-wide-tag partitioning
More general, but slightly more complex
Details in paper
Other Organizational Choices (1 of 2)
Ensuring consistency of data at repartitioning
Cache scrubbing: flush data at repartitioning intervals
Lazy transitioning: Augment state with partition information
Addressing of partitions - software (ISA) vs. hardware
Other Organizational Choices (2 of 2)
Method of partitioning - hardware vs. software control
Frequency of partitioning - frequent vs. infrequent
Level of partitioning - L1, L2, or lower levels
Tradeoffs based on application requirements
Outline for Talk
Motivation
Reconfigurable caches
Key idea
Organization
Implementation and timing analysis
Application for media processing
Summary and future work
Conventional Cache Implementation
Tag and data arrays split into multiple sub-arrays to reduce/balance the length of word lines and bit lines
[Figure: conventional cache implementation. Address decoders drive word lines across tag and data sub-arrays; bit lines feed column muxes and sense amps; tag comparators produce the valid output; mux drivers and output drivers deliver the data.]
Changes for Reconfigurable Cache
Associate sub-arrays with partitions
Constraint on minimum number of sub-arrays
Additional multiplexors, drivers, and wiring
[Figure: the same cache datapath with the address inputs, decoders, column muxes, sense amps, and output drivers replicated per partition, annotated [1:NP] for NP partitions.]
Impact on Cache Access Time
Sub-array-based partitioning
Multiple simultaneous accesses to SRAM array
No additional data ports
Timing analysis methodology
CACTI analytical timing model for cache access time (Compaq WRL)
Extended to model reconfigurable caches
Experiments varying cache sizes, partitions, technology, …
Impact on Cache Access Time
Cache access time
Comparable to base (within 1-4%) for a small number of partitions (two)
Higher for more partitions, especially with small caches
But still within 6% for large caches
Impact on clock frequency likely to be even lower
Outline for Talk
Motivation
Reconfigurable caches
Application for media processing
Instruction reuse with media processing
Simulation results
Summary and future work
Application for Media Processing
Instruction reuse/memoization [Sodani and Sohi, ISCA 1997]
Exploits value redundancy in programs
Store instruction operands and result in reuse buffer
If a later instruction and its operands match in the reuse buffer, skip execution; read the answer from the reuse buffer
[Figure: reuse buffer held in a cache partition]
Few changes for implementation with reconfigurable caches
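The reuse mechanism described above can be sketched as a memo table keyed by (PC, operand values); in the proposal, that table would live in the reconfigured cache partition. The class and method names here are illustrative, not the hardware interface.

```python
# Sketch of an instruction reuse (memoization) buffer: before executing
# an instruction, look up (pc, operands); on a hit, skip execution and
# return the stored result. Illustrative software model of a hardware
# table that would occupy a cache partition.

class ReuseBuffer:
    def __init__(self):
        self.table = {}     # (pc, operands) -> result
        self.hits = 0       # count of skipped executions

    def execute(self, pc, op, operands):
        key = (pc, operands)
        if key in self.table:           # reuse: skip the functional unit
            self.hits += 1
            return self.table[key]
        result = op(*operands)          # normal execution
        self.table[key] = result        # record for later reuse
        return result

buf = ReuseBuffer()
mul = lambda a, b: a * b
buf.execute(0x100, mul, (3, 4))   # computed: 12
buf.execute(0x100, mul, (3, 4))   # same pc and operands: reused, 12
buf.execute(0x100, mul, (3, 5))   # operands differ: computed, 15
```

Media codecs repeat the same computations on recurring pixel and sample values, which is why a table like this (held in otherwise underused cache SRAM) pays off for these workloads.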
Simulation Methodology
Detailed simulation using RSIM (Rice)
User-level execution-driven simulator
Media processing benchmarks
JPEG image encoding/decoding
MPEG video encoding/decoding
GSM speech decoding and MPEG audio decoding
Speech recognition and synthesis
System Parameters
Modern general-purpose processor with ILP+media extensions
1 GHz, 8-way issue, OOO, VIS, prefetching
Multi-level memory hierarchy
128 KB 4-way associative 2-cycle L1 data cache
1 MB 4-way associative 20-cycle L2 cache
Simple reconfigurable cache organization
2 partitions at L1 data cache
64 KB data cache, 64 KB instruction reuse buffer
Partitioning at start of application in software
Impact of Instruction Reuse
Performance improvements for all applications (1.04X to 1.20X)
Use memory to reduce compute bottleneck
Greater potential with aggressive design [details in paper]
[Figure: normalized execution time (state-of-the-art baseline = 100), split into CPU and memory components. With instruction reuse: JPEG decode 84, MPEG decode 89, speech synthesis 92.]
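The speedups implied by the three normalized execution times shown on this slide (baseline = 100, lower is better) can be checked with a few lines:

```python
# Speedup implied by the slide's normalized execution times
# (state-of-the-art baseline = 100; lower is better).
normalized = {"JPEG decode": 84, "MPEG decode": 89, "Speech synthesis": 92}

for app, t in normalized.items():
    speedup = 100 / t
    print(f"{app}: {speedup:.2f}X")
# JPEG decode: 1.19X
# MPEG decode: 1.12X
# Speech synthesis: 1.09X
```

These three applications fall inside the 1.04X to 1.20X range quoted across the full benchmark suite.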
Summary
Goal: Use cache transistors effectively for all workloads
Reconfigurable Caches: Flexibility to reuse cache SRAM
Simple organization and design changes
Small impact on cache access time
Several applications possible
Instruction reuse: reuse memory for computation
1.04X to 1.20X performance improvement
More aggressive reconfiguration currently under investigation
More information available at
http://www.ece.rice.edu/~parthas