Loop Instruction Caching for Energy-Efficient Embedded Processors

Ji Gu
Department of Communications & Computer Engineering
Graduate School of Informatics
Kyoto University


Outline

1. Background
2. Research overview
3. DLIC: a single-task based approach
4. PLIC: a multi-task based approach
5. Conclusions

Background

Processors in data centers consume 1.5% of global energy

Where does the processor energy go?
• Caches are energy-consuming due to instruction/data supply

[Figure: two charts, "Processor power" and "Instruction supply power"]

Research Problem (1/2)

Observed behavior of embedded applications [1]
• 77% of execution time spent in loops
• 47% of execution time spent in loops of size 64 or less
• 46% of execution time spent in loops that iterate 5 times or more

Loop behavior can be exploited for low-energy design

[1] J. Villarreal et al. A Study on the Loop Behavior of Embedded Programs. University of California, Riverside. Technical Report UCR-CSE-01-03, 2001.

Research Problem (2/2)

Cache decoded instructions for most loops, including large, complicated, and nested loops
• to avoid repeated instruction fetch and decode operations as much as possible

[Figure: control-flow graph with basic blocks A to I and loops L1 to L5]

Design Overview

[Figure: processor pipeline with stages IF, ID, EXE, MEM, WB and the DLIC; signal labels: IF/ID stall, load, EXE stall, EXE src]

DLIC: Decoded Instruction Loop Cache

Hardware/software co-design
• Using customized hardware design
• Using software to control the operation of the DLIC

Software Design

Four special instructions: slp, brb, brf, elp
• Inserted into the program code at design time (statically)
• Control DLIC operations at run time (dynamically)

[Figure: code with two loops (Loop 1, Loop 2) over basic blocks H, E, C, F, shown without and with the inserted slp, brb, brf, and elp instructions]
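The run-time effect of software-controlled loop caching can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' implementation: the slides do not define the four special instructions, so the sketch assumes that `slp` and `elp` delimit a cacheable loop region and it does not model `brb` or `brf`. The loop `body` stands for the region between the two markers, and the names `Stats`, `decode`, and `run_loop` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    fetch_decode: int = 0   # instructions fetched from the I-cache and decoded
    dlic_hits: int = 0      # instructions supplied already decoded by the DLIC


def decode(instr: str) -> tuple:
    """Stand-in for the decoder: split an instruction into opcode and operands."""
    opcode, *operands = instr.replace(",", " ").split()
    return (opcode, tuple(operands))


def run_loop(body: list[str], iterations: int, use_dlic: bool) -> Stats:
    """Execute a loop body `iterations` times and count instruction supply events.

    Assumed model: the region between `slp` and `elp` is written into the DLIC
    during the first iteration and read from it in every later iteration.
    """
    stats = Stats()
    dlic: dict[int, tuple] = {}          # loop-relative index -> decoded word
    for _ in range(iterations):
        for index, instr in enumerate(body):
            if use_dlic and index in dlic:
                stats.dlic_hits += 1     # served decoded: no fetch, no decode
                continue
            decoded = decode(instr)      # normal IF + ID path
            stats.fetch_decode += 1
            if use_dlic:
                dlic[index] = decoded    # fill the DLIC on first encounter
    return stats


body = ["add r1, r1, r2", "lw r3, 0(r1)", "bne r3, r0, -2"]
baseline = run_loop(body, iterations=10, use_dlic=False)
cached = run_loop(body, iterations=10, use_dlic=True)
reduction = 1 - cached.fetch_decode / baseline.fetch_decode
```

For this three-instruction body and 10 iterations, the baseline performs 30 fetch/decode operations and the DLIC model performs 3. The number only describes this toy loop and is unrelated to the benchmark results reported in the slides.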


Hardware Design: Hierarchical Cache Table

[Figure: hierarchical cache table made up of the DLIC Index Table, the Control Word Dictionary Table, and the Branch Cache Target Table. Field labels in the figure: flag, c_index; opcode, control word; dlic_index, branch cache target address. Instruction Format: opcode, operand, operand. Decoded Instruction Word Format: control word, branch memory target address.]
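One plausible reading of the hierarchical table is dictionary compression of decoded control words. The sketch below is an illustration under assumptions, not the design from the slides: it assumes that each DLIC Index Table entry holds a `flag` (here: "is a branch") and a `c_index` pointing into the Control Word Dictionary Table, and that branch targets live in a separate table keyed by `dlic_index`. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalTable:
    index_table: list = field(default_factory=list)     # entries: (flag, c_index)
    dictionary: list = field(default_factory=list)      # unique control words
    branch_targets: dict = field(default_factory=dict)  # dlic_index -> target dlic_index
    _positions: dict = field(default_factory=dict)      # control word -> c_index

    def store(self, control_word: str, branch_target=None) -> int:
        """Append one decoded instruction and return its dlic_index."""
        if control_word not in self._positions:
            # First time this control word is seen: add it to the dictionary.
            self._positions[control_word] = len(self.dictionary)
            self.dictionary.append(control_word)
        dlic_index = len(self.index_table)
        is_branch = branch_target is not None
        self.index_table.append((is_branch, self._positions[control_word]))
        if is_branch:
            self.branch_targets[dlic_index] = branch_target
        return dlic_index

    def load(self, dlic_index: int):
        """Return (control word, branch target or None) for a cached instruction."""
        is_branch, c_index = self.index_table[dlic_index]
        target = self.branch_targets[dlic_index] if is_branch else None
        return self.dictionary[c_index], target


table = HierarchicalTable()
for word, target in [("ADD", None), ("LOAD", None), ("ADD", None), ("BNE", 0)]:
    table.store(word, target)
```

In this example the two `ADD` instructions share one dictionary entry, so the wide control word is stored once while each index entry stays narrow. Whether the real design shares entries in this way is an assumption of the sketch.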

Results

[Figure: bar charts of normalized energy consumption (axis from 0 to 1.2) for DIB and DLIC over the benchmarks adpcm, bcnt, blowfish, crc32, des, jpeg, qsort, rawcaudio, rawdaudio, rc4, rijndael, salsa, sha, stringsearch, and their average (AVG)]

• 77% reduction in instruction fetch and decode
• 66% energy saving
• 1.4% performance overhead

Ji Gu, Hui Guo and Tohru Ishihara. DLIC: Decoded Loop Instructions Caching for Energy-Aware Embedded Processors. To appear in ACM TECS, accepted March 2012.

Loop Caching for Multitasking

Processors increasingly used in multitasking systems
• Several tasks running on a single processor
• Tasks executed in time-interleaved fashion
• Inter-task interference in cache memories
• High energy consumption

Loop caching: reduce the inter-task interference in the I-cache by reducing the I-cache accesses

Hardware Design for Context Switch

Partitioned Loop Instruction Cache (PLIC):
• Tasks allocated to different partitions: no interference
• Task State Table for context switch

[Figure: PLIC with partitions P0, ..., Pi, ..., Pn-1 and a Task State Table with task ID, partition ID, and L-PC fields; the task ID comes from the OS/task scheduler. Other labels: instruction, tagless I-cache. Annotations: "Conventional context switch by OS" and "Updating task state table during context switch".]
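The role of the Task State Table during a context switch can be sketched as follows. This is a toy model under assumptions rather than the hardware described in the slides: it assumes one row per task holding a partition ID and a saved L-PC, reads L-PC as a loop program counter (the slides do not expand the abbreviation), and assigns partitions in registration order. All class and method names are hypothetical.

```python
class TaskStateTable:
    """Toy model: one row per task holding its partition ID and saved L-PC."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self.rows = {}            # task ID -> {"partition": int, "l_pc": int}
        self.current_task = None
        self.l_pc = 0             # live L-PC of the running task

    def register(self, task_id: int) -> int:
        """Give a new task its own partition; fail if none is free."""
        if len(self.rows) >= self.num_partitions:
            raise RuntimeError("no free PLIC partition")
        partition = len(self.rows)
        self.rows[task_id] = {"partition": partition, "l_pc": 0}
        return partition

    def context_switch(self, next_task: int) -> int:
        """Save the outgoing L-PC, restore the incoming one, return its partition."""
        if self.current_task is not None:
            self.rows[self.current_task]["l_pc"] = self.l_pc
        self.current_task = next_task
        self.l_pc = self.rows[next_task]["l_pc"]
        return self.rows[next_task]["partition"]


tst = TaskStateTable(num_partitions=2)
tst.register(task_id=7)
tst.register(task_id=9)

p_a = tst.context_switch(7)
tst.l_pc = 12                    # task 7 advances inside its cached loop
p_b = tst.context_switch(9)      # task 9 runs in a different partition
resumed = tst.context_switch(7)  # task 7 returns to its partition and L-PC
```

Because each task keeps its own partition, switching to task 9 leaves the contents cached for task 7 untouched, which is the "no interference" property in the bullet list; only the small per-task row is saved and restored.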

Case Study

A case study of a multitasking application with 5 tasks:
• adpcm, jpeg, rawdaudio, sha, stringsearch

Processor specified at RTL for simulation (ISS)

1KB PLIC, 8KB I-cache
• CACTI and Design Compiler used for energy/area evaluation

Round-robin task scheduling, with switching intervals of 5K, 10K, and 20K cycles

Results

Reductions: 50% in I-cache accesses, 6% to 18% in I-cache misses, 36% in I-cache energy

Conclusions

Loops are common in the applications of most embedded systems

DLIC: reduces instruction fetch/decode power
• Software-controlled, scratchpad-memory-like (SPM-like) structure for decoded instructions
• 66% (up to 87%) energy saved with a performance overhead of 1.4%

PLIC: reduces I-cache accesses/misses for multitasking systems
• A low-cost Task State Table for context switch at the hardware level
• Reductions: 50% in I-cache accesses, 6% to 18% in I-cache misses, 36% in I-cache energy

Thank you!


DLIC Overall Architecture

PLIC Overall Architecture

Experimental Setup

[Figure: experimental flow. Labels: Application; HW/SW co-design; DLIC; ISA (PISA); ASIPmeister; SimpleScalar; GCC; object code; VHDL (Syn.); VHDL (Sim.); Synopsys Design Compiler; ModelSim; CACTI; I-cache, memory; profiling; execution trace; HW evaluation (area, energy, delay); SW evaluation (performance)]