Energy efficient and reliable embedded computing systems

Bashir M. Al-Hashimi
University of Southampton

ICSA, School of Informatics, University of Edinburgh, 15 Nov. 2007

Outline

• Overview of research activities focusing on
  - Automated low-power system-level design flows
  - Fault tolerant and low-power embedded systems
• Outline of other current projects
• Concluding remarks and future problems

Power Minimization (background)

Design levels:
• System
• Behavioural (transformations, multiple Vt)
• Architecture (clock gating, power gating)
• Logic (pass-transistor)
• Gate (min-size transistors, topology)
• Technology (strained silicon, dual Vt process)

Dynamic power: $P_{dyn} = 0.5\, V_{dd}^{2}\, F_{clk}\, C_{L}\, E_{sw}$
Leakage power (sub-threshold, gate oxide)

[Figure: the design levels annotated with two ranges, "dynamic power reduction" and "static power reduction"]
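The dynamic power expression on this slide can be evaluated numerically. The following is a minimal sketch; the function name and the numeric values (switched capacitance, switching activity) are illustrative assumptions, not figures from the talk.

```python
def dynamic_power(v_dd, f_clk, c_load, e_sw):
    """Dynamic power in watts: 0.5 * Vdd^2 * Fclk * CL * Esw."""
    return 0.5 * v_dd ** 2 * f_clk * c_load * e_sw


# Hypothetical operating point: 1 nF switched capacitance, 20% switching activity.
p_nominal = dynamic_power(v_dd=1.6, f_clk=800e6, c_load=1e-9, e_sw=0.2)

# Halving the supply voltage at the same clock quarters the dynamic power.
p_half_vdd = dynamic_power(v_dd=0.8, f_clk=800e6, c_load=1e-9, e_sw=0.2)

print(f"{p_nominal * 1e3:.1f} mW -> {p_half_vdd * 1e3:.1f} mW")
```

The quadratic dependence on supply voltage is the reason voltage scaling is attractive for reducing dynamic power.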

Software Power Optimisation (background)

• Individual instructions: the fundamental unit of SW, similar to gates in HW
• Program transformation techniques aim to reduce
  - use of instructions (MOV, JUMP, ...) that draw a large supply current
  - memory accesses and unnecessary instruction fetches (instructions that involve memory consume more energy than instructions that involve only registers)
• Operating system techniques
  - Identify program regions where the processor can be slowed down to reduce dynamic power through V/F scaling
  - Shut down unused parts of the system to reduce leakage power

Embedded Computing Systems, SoC, Multicore, ...

    ...
    int main(int argc, char **argv) {
        Build_Interface();
        EventTrigger();
        while (!end_bs(&bs)) {
            ...
            decode_info(&bs, &fr_ps);
        }
    }

    sync = seek_sync(&bs, SYNC_WORD);
    ...
    void EventTrigger() {
        InitTriggerControl();
        while (Event) ReadEvent();
        switch (Event) {
            case 0: ...
            case 1: ...
        }
    }
    ...

System design requires optimisation in both hardware (computation units, memory, communication) and software (application and system)

[Figure: example system with CPU, ASIC, DSP, FPGA and two memories (MEM), each attached through an interface to a CAN bus; two DVS controllers (DVS-Cont); communication links C1 to C5; I/O through DAC and ADC to speaker, microphone, display and keyboard]

System-Level Design Flow*

[Figure: design flow from Specification through Co-Synthesis to HW Synthesis and SW Synthesis]

*M. Schmitz, B. Al-Hashimi, P. Eles, "System-Level Design Techniques for Energy-Efficient Embedded Systems", Kluwer Academic Publishers, 2004

Design Flow: Specification

Capturing system functionality using conceptual models

• Abstract graph representation, e.g.:
  - Finite State Machine
  - Petri Net
  - Task Graph
• High-level languages, e.g.:
  - SystemC
  - VHDL / Verilog
  - C / C++ / Java

    for (i = 0; i < x; i++) {
        a[i] += b[i];
    }

Design Flow: Co-synthesis

[Figure: design flow (Specification, Co-Synthesis, HW Synthesis / SW Synthesis) with the system-level co-synthesis step expanded into Architecture Allocation, Task Mapping, Scheduling, Energy Management and Evaluation, iterated in a user-driven outer loop and an inner loop]

Design Flow: Co-Synthesis

Identify the "most" suitable system architecture: sufficient performance to satisfy the timing constraints while cost and energy are reduced to a minimum

• Architecture allocation
  - Appropriate components
  - HW offers higher performance and lower energy than SW
• Application mapping
  - Which tasks are done in HW or SW
  - Size of task (coarse, fine)
• Activity scheduling
  - Task execution order to meet the timing constraints
  - Not trivial!
• Energy management

[Figure: allocation and mapping sketched as two CPUs and an ASIC on a bus; scheduling and energy management sketched as timelines for CPU1, ASIC and CPU2]

Tasks Profiling

• Partitioning and scheduling require the execution time and energy cost of each task
  - In HW (HDL model of task)
    - Execution time through simulation
    - Energy estimation through power analysis on synthesised designs
  - In SW (coding of task)
    - Execution time through instruction-set simulator
    - Energy estimation = average power (pre-characterised) × no. of clock cycles / frequency

Commercial and academic timing and power estimation tools exist
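The software energy estimate can be illustrated with a short calculation: energy is average power multiplied by execution time, and execution time is the cycle count divided by the clock frequency. This is a minimal sketch; the function name and the task figures are hypothetical and do not come from any particular estimation tool.

```python
def sw_task_energy(avg_power_w, clock_cycles, frequency_hz):
    """Energy in joules = average power * execution time (cycles / frequency)."""
    execution_time_s = clock_cycles / frequency_hz
    return avg_power_w * execution_time_s


# Hypothetical task: 120,000 cycles on a 150 MHz core drawing 60 mW on average.
energy_j = sw_task_energy(avg_power_w=0.060, clock_cycles=120_000, frequency_hz=150e6)

print(f"{energy_j * 1e6:.1f} uJ")
```

In practice the cycle count would come from the instruction-set simulator and the average power from pre-characterisation of the processor.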

Design Flow: HW/SW Synthesis

Transformation of system architecture into physical implementation

• HW synthesis (ASICs & FPGAs):
  Behavioural spec. → high-level synthesis → RTL model → logic synthesis → gate-level model → layout synthesis
• SW synthesis (GPPs & ASIPs):
  High-level language → compiler → assembler code → assembler → machine code
• Co-simulation and co-verification link the two paths

[Figure: design flow from Specification through Co-Synthesis to HW Synthesis and SW Synthesis]

Energy Management

• Dynamic Power Management
  - Shut down idle components
• Dynamic Voltage and Frequency Scaling
  - Workloads are non-uniform (MPEG is an order of magnitude more demanding than MP3)
  - This introduces slack time (deadline minus finish time), which can be used to lower processor performance and save energy
  - Adapt processor performance through V/F scaling

[Figure: Task1, Task2 and Task3 executed at three operating points: 0.7 V, 60 mW, 150 MHz; 1.3 V, 450 mW, 600 MHz; 1.6 V, 900 mW, 800 MHz]
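The three operating points listed on this slide can be compared by energy per clock cycle (power divided by frequency). The sketch below picks the cheapest point that still meets a deadline. The helper names and the example cycle count are hypothetical, and the model assumes the cycle count does not depend on the operating point and ignores voltage-switching overheads.

```python
# (voltage in V, power in mW, frequency in MHz) as listed on the slide.
OPERATING_POINTS = [(0.7, 60, 150), (1.3, 450, 600), (1.6, 900, 800)]


def energy_per_cycle_nj(power_mw, freq_mhz):
    """mW divided by MHz gives nanojoules per clock cycle."""
    return power_mw / freq_mhz


def pick_operating_point(cycles, deadline_ms):
    """Return the lowest-energy point that finishes `cycles` by the deadline, or None."""
    feasible = [
        (v, p, f)
        for v, p, f in OPERATING_POINTS
        if cycles / (f * 1e3) <= deadline_ms  # f MHz equals f * 1e3 cycles per ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda pt: energy_per_cycle_nj(pt[1], pt[2]))


# Hypothetical task of 300,000 cycles: a tight deadline forces a faster point.
print(pick_operating_point(300_000, deadline_ms=1.0))
print(pick_operating_point(300_000, deadline_ms=2.5))
```

With these figures the slowest point costs 0.4 nJ per cycle against 1.125 nJ at the fastest, which is why available slack translates into energy savings.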

Energy Management: DVS-Processor

[Figure: DVS processor block diagram: battery, DC/DC converter, VCO and frequency register (f. Reg.) around a µP core with I/O and SRAM on a bus; signals df, Vdd and cf]

Examples:
• ARM (IEM)
• AMD (PowerNow)
• Intel (XScale)
• Transmeta

Intel XScale operating points: 1.1 V / 150 MHz, 1.6 V / 600 MHz, 2.0 V / 800 MHz

Design Flow: Co-synthesis

[Figure: design flow with the system-level co-synthesis step expanded into Architecture Allocation, Task Mapping, Scheduling (offline/online), Energy Management and Evaluation, iterated in a user-driven outer loop and an inner loop]

- Mapping and scheduling are NP-hard problems
- Heuristic methods are employed

DVS Example (Scheduling)

[Figure: target architecture and mapping: DVS-PE0 (τ4, τ5), PE1 (τ0, τ1, τ2) and DVS-PE2 (τ3, τ6), each with memory (MEM) and an interface to a shared bus]

[Figure: task graph of τ0 to τ6 with period 2.0 ms and deadlines of 1.4 ms and 1.6 ms]

Task properties (communication cost = 0):

Task   Exe. time (ms)   Power (mW)   Mapped to
τ0     0.3              10           PE1
τ1     0.3              20           PE1
τ2     0.4              15           PE1
τ3     0.1              40           PE2
τ4     0.4              70           PE0
τ5     0.2              90           PE0
τ6     0.3              20           PE2

DVS Example (cont..)

[Figure: off-line schedule of τ0 to τ6 on PE0, PE1 and PE2, executed at nominal voltage (3.3 V) and frequency: E = 71 µJ, slack = 0.4 ms]

Apply DVS: scaled execution of tasks τ3 and τ6 with V3 = 2.1 V and V6 = 2.34 V

[Figure: schedule after voltage scaling: E = 65.6 µJ, a saving of 8%]
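The energy figures of this example can be checked with a short script. It is a sketch under a simple assumption that is not stated on the slide: each task's energy scales with the square of its supply voltage relative to the 3.3 V nominal supply. The task names `t0` to `t6` stand for τ0 to τ6.

```python
V_NOMINAL = 3.3  # volts, nominal supply from the slide

# task: (execution time in ms, power in mW) at nominal voltage, from the task table
TASKS = {
    "t0": (0.3, 10), "t1": (0.3, 20), "t2": (0.4, 15), "t3": (0.1, 40),
    "t4": (0.4, 70), "t5": (0.2, 90), "t6": (0.3, 20),
}


def total_energy_uj(scaled_voltages=None):
    """Sum task energies (ms * mW = uJ), scaled by (V / V_NOMINAL)^2 where given."""
    scaled_voltages = scaled_voltages or {}
    total = 0.0
    for name, (time_ms, power_mw) in TASKS.items():
        v = scaled_voltages.get(name, V_NOMINAL)
        total += time_ms * power_mw * (v / V_NOMINAL) ** 2
    return total


e_nominal = total_energy_uj()
e_dvs = total_energy_uj({"t3": 2.1, "t6": 2.34})

print(f"nominal {e_nominal:.1f} uJ, with DVS {e_dvs:.1f} uJ, "
      f"saving {100 * (1 - e_dvs / e_nominal):.0f}%")
```

Under this assumption the script reproduces the slide's 71 µJ at nominal voltage and 65.6 µJ after scaling τ3 and τ6, a saving of roughly 8%.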

DVS-Schedule Optimisation*

• To increase the possible energy savings we can further optimise the execution order

[Figure: re-ordered schedule on PE0, PE1 and PE2: E = 71 µJ at nominal voltage with slack = 0.3 ms; after applying DVS, E = 53.9 µJ]

*Schmitz, M. T., Al-Hashimi, B. M., Eles, P., "Iterative schedule optimisation for voltage scalable distributed embedded systems", ACM Transactions on Embedded Computing Systems 3(1), pp. 1-36, Feb. 2004


Multi-Mode Embedded Systems*

• Emerging embedded systems work across a set of different interacting applications
• Smart phone consisting of three applications:
  - GSM phone, MP3 player, digital camera (JPEG compression/decompression)
• Specification model captures both mode interaction and functionality

[Figure: specification model of the smart phone as a state machine of operational modes grouped by application (GSM, MP3, DigiCam): Search Network, Radio Link Control (RLC), GSM codec + RLC, MP3 play + RLC, MP3 play + Network Search, decode Photo + RLC, decode Photo + Network Search, Take Photo. Transitions from INIT and between modes: Network found, Network lost, Incoming Call / User request, Terminate Call, Play audio, Terminate audio, Show Photo, Terminate Photo, Take Photo, Photo taken. Mode execution probabilities: 0.74, 0.09, 0.01, 0.01, 0.1, 0.01, 0.02, 0.02]

*Schmitz, M. T., Al-Hashimi, B. M., Eles, P., "Co-Synthesis of Energy-Efficient Multi-Mode Embedded Systems with Consideration of Mode Execution Probabilities", IEEE TCAD, 24(2), pp. 153-170, Feb. 2005

Mode Execution Probabilities

Typical mode activation profile of a mobile phone over 24 h:

Mode              Search   RLC   Calling
Execution prob.   2%       88%   10%

Depending on the application, the time spent in a certain operational mode is user-typical!

Multi-Mode Example

Application specified by two interacting modes, O1 (Ψ1 = 0.1) and O2 (Ψ2 = 0.9)

[Figure: task graphs of the two modes with tasks τ1 to τ6 of types η1 = A, η2 = B, η3 = C, η4 = D, η5 = E, η6 = F]

Task type   PE1 (SW)                            PE2 (HW)
            exe. time (ms)   dyn. energy (mJ)   exe. time (ms)   dyn. energy (mJ)   area (mm2)
A           20               10                 2                0.010              2.40
B           28               14                 2.2              0.012              3.00
C           32               16                 1.6              0.023              2.75
D           26               13                 3.1              0.047              2.45
E           30               15                 1.8              0.015              2.10
F           24               14                 2.2              0.032              2.80

Multi-Mode Example (cont..)

Application specified by two interacting modes, O1 (Ψ1 = 0.1) and O2 (Ψ2 = 0.9)

Architecture: PE1 (SW) and PE2 (HW, area = 6) connected by link CL1

• Optimised without mode consideration
  - Mapping string (PE) for τ1 to τ6: 1 1 2 1 2 1
  - Energy: 26.72 mJ
• Optimised with mode consideration
  - Mapping string (PE) for τ1 to τ6: 1 1 1 1 2 2
  - Energy: 15.74 mJ, a reduction of 41%

[Figure: the two mappings of task types A to F onto PE1 and PE2, shown per mode]
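The two energy figures can be reproduced by weighting each mode's energy with its execution probability. The sketch below makes assumptions about details the transcript does not preserve: mode O1 is taken to contain τ1 to τ3 (types A to C) and mode O2 to contain τ4 to τ6 (types D to F), and the pairing of mapping strings with results is the one that matches the slide's numbers. Only dynamic energy from the task table is counted.

```python
# Dynamic energy per task type in mJ: (software on PE1, hardware on PE2).
ENERGY_MJ = {
    "A": (10, 0.010), "B": (14, 0.012), "C": (16, 0.023),
    "D": (13, 0.047), "E": (15, 0.015), "F": (14, 0.032),
}

# Assumed mode membership and the execution probabilities from the slide.
MODES = {"O1": ("ABC", 0.1), "O2": ("DEF", 0.9)}


def weighted_energy_mj(mapping):
    """mapping: task type -> 1 (PE1, SW) or 2 (PE2, HW)."""
    total = 0.0
    for types, probability in MODES.values():
        mode_energy = sum(ENERGY_MJ[t][mapping[t] - 1] for t in types)
        total += probability * mode_energy
    return total


without_modes = weighted_energy_mj(dict(zip("ABCDEF", (1, 1, 2, 1, 2, 1))))
with_modes = weighted_energy_mj(dict(zip("ABCDEF", (1, 1, 1, 1, 2, 2))))

print(f"{without_modes:.2f} mJ vs {with_modes:.2f} mJ, "
      f"{100 * (1 - with_modes / without_modes):.0f}% lower")
```

Moving both hardware cores into the mode that is active 90% of the time is what yields the lower average, even though the unweighted sum of mode energies is slightly higher for that mapping.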

Leakage power

• Reduced using adaptive body biasing (ABB) by increasing the processor threshold voltage and decreasing its frequency
• Simultaneous reduction of dynamic and leakage power in MPSoC*
• Considering the overheads (energy, time) imposed by changing voltage levels

*Andrei, Eles, Peng, Schmitz, Al-Hashimi, "Energy optimization of multiprocessor systems on chip by voltage selection", IEEE TVLSI, March 2007

LOPOCOS (Low-Power Co-Synthesis)

• Explore different system architectures
  - Optimisation targets
    - Energy-efficiency
    - Cost

[Figure: LOPOCOS flow in four steps (Step 1 to Step 4).
 Input: system specification (OMSM, task graphs), technology library (CPUs, DVS-CPUs, DSPs, ASICs, FPGAs, MEMs, busses) and performance requirements, followed by parsing and data structures.
 Nested optimisation loops: component allocation (designer-knowledge-based initial allocation of components; user-driven architectural changes), task mapping and core allocation (task mapping using GA with improvement strategies), communication mapping and priority optimisation using GA, and the PV-DVS algorithm based on list scheduling. Each loop evaluates its candidate (allocation, task mapping, communication mapping and schedule) and repeats until convergence or an acceptable solution.
 Quality metrics: area violation. Output for each operational mode: energy dissipation and system cost.
 Example result: allocation and mapping of τ0 to τ3, with schedule and scaling voltages of 1.6 V, 2.5 V, 2.7 V and 3.3 V]

Smart Phone: Case Study

• Smart phone specification
  - GSM phone
  - MP3 player
  - Digital camera
• System price: < $120
• Power consumption: < 1.6 mW

Smart Phone: Case Study (cont..)

[Figure: area penalty (0 to 10) against average power (mW, 0 to 6) for candidate solutions, with two architectures highlighted, Architecture 2 and Architecture 3 (prices $56 and $79). Components shown: MEM, ASIC (55.2 mm2), ASIC (150 mm2), DVS CPU (ARM7DVS) at 25 MHz, DVS CPU (6052) at 6.6 MHz, each attached through an interface to an I2C bus]

Multi-Mode Embedded Systems (cont..)

[Figure: smart phone operational mode state machine annotated with mode execution probabilities Ψ0 = 0.09, Ψ1 = 0.74, Ψ2 = 0.01, Ψ3 = 0.02, Ψ4 = 0.02, Ψ5 = 0.1, Ψ6 = 0.01, Ψ7 = 0.01. Each mode is refined into a task graph, for example decode Photo (HD, deQ, IDCT, Tr. color, with 256 coeff. on each edge) and Radio Link Control (RLC), with timing annotations of 0.015 s and 0.025 s]

Schmitz, M. T., Al-Hashimi, B. M., Eles, P., "Co-Synthesis of Energy-Efficient Multi-Mode Embedded Systems with Consideration of Mode Execution Probabilities", IEEE TCAD, 24(2), pp. 153-170, Feb. 2005

Smart Phone: Case Study (cont..)

[Figure: area penalty (0 to 10) against average power (mW, 0 to 6), with and without DVS, highlighting Architecture 4 (price $112): DVS CPU (ARM7DVS) at 10 MHz, MEM, ASIC (55.2 mm2) and ASIC (150 mm2), each attached through an interface to an I2C bus]

LOPOCOS Synthesis Results

• Influence of the user behaviour

[Figure: average power (mW, 2 to 14) against real-usage activation times of Mode 1 (0 to 1) and Mode 2 (1 to 0). User A (0.9:0.1), User B (0.1:0.9) and the 0.5:0.5 point are marked; curves are labelled I to VII]

Smart Phone: Case Study (cont..)

• Specification consists of
  - 3 applications and 8 operational modes; without probabilities, each mode is weighted equally (100/8 %)
  - 5 to 88 tasks and 0 to 137 communications
• Finding the "best" solution took less than 8 hours

[Figure: bar chart of average power p (mW, 0 to 2.5) for four cases: no DVS without probabilities, no DVS with probabilities, DVS without probabilities, DVS with probabilities]

[Figure: smart phone operational mode state machine (GSM, MP3, DigiCam)]

Reliable and Energy Efficient Embedded Computing Systems

Ejlali, Schmitz, Al-Hashimi, Miremadi, Rosinger, "Combined Time and Information Redundancy for SEU-Tolerance in Energy-Efficient Real-Time Systems", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(4), pp. 323-335, April 2006

Background

• How to design embedded computing systems that can tolerate faults (transient, soft, SEU, bit-flip) is becoming important
  - Demand from industry (high reliability even in commodity microprocessors)
  - Technology scaling and power management are making designs more sensitive to faults (SEU sensitivity increases by 1-2 orders of magnitude as Vdd is reduced by 1 V)
• Fault tolerance adds a dimension to an already complex design problem and needs to be handled carefully

Fault Tolerance

• Numerous techniques exist in fault tolerant computing
  - Can they be applied directly to embedded computing systems?
  - Which FT technique or combination of techniques best matches specific requirements?
• Same principles but different constraints
  - Consume as little power as possible
  - Cost (silicon area, execution time) as little as possible
  - Little or no performance degradation

Fault-Tolerance: Time Redundancy

• Rollback recovery (re-execution of faulty tasks)
• Time reserved for rollback recoveries affects fault tolerance and energy saving

[Figure: timeline up to the deadline showing the original execution, recovery execution 1 and recovery execution 2. SEUs flip bits in registers (Reg. 1 to Reg. 3), fault detection triggers each rollback, and a non-faulty run leaves the reserved time as slack]
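How much fault tolerance a given slack buys can be estimated by counting the re-executions that fit before the deadline. This is a minimal sketch, assuming every recovery repeats the whole task at the same speed; the function name and the optional detection overhead parameter are hypothetical.

```python
import math


def recoveries_that_fit(deadline_ms, exec_time_ms, detect_overhead_ms=0.0):
    """Number of full re-executions that fit in the slack before the deadline."""
    per_run = exec_time_ms + detect_overhead_ms
    slack = deadline_ms - per_run
    if slack < 0:
        raise ValueError("even the fault-free run misses the deadline")
    return math.floor(slack / per_run)


# Hypothetical task: 3 ms of work with a 10 ms deadline leaves room for two recoveries.
print(recoveries_that_fit(deadline_ms=10.0, exec_time_ms=3.0))
```

Every recovery reserved this way consumes slack that could otherwise be used to slow the processor down.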

Dynamic Voltage Scaling (DVS)

• Energy depends quadratically on voltage, which is scaled together with frequency

Task executions can be extended

[Figure: power (P) over time for the original execution and the two recovery executions, stretched towards the deadline so that the non-faulty run uses the slack]

Fault tolerance/Energy Trade-off

[Figure: power (P) over time up to the deadline for the original execution, recovery execution 1 and recovery execution 2 under SEUs, compared with the non-faulty case]

Time Redundancy FT & Energy Management Conflict

• Competing for the same resource
  - Fault tolerance requires slack time
  - Dynamic voltage scaling requires slack time

Carefully trading off between fault tolerance and energy management is necessary

• Our approach for reliable and energy-efficient systems
  - Single SEU (common): solve with information redundancy (error correction codes) and save slack time
  - Multiple SEUs (infrequent): solve with time redundancy (error detection + retransmission)
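The competition for slack can be made concrete with a toy model. It is a sketch under simplifying assumptions that are not taken from the talk: the original run and every reserved recovery execute at the same normalised speed, voltage scales linearly with speed (so energy per run is proportional to speed squared), and detection and voltage-switching overheads are ignored.

```python
def min_speed_and_energy(exec_time_ms, deadline_ms, recoveries):
    """Lowest normalised speed at which the original run plus `recoveries`
    re-executions fit before the deadline, and the fault-free energy
    relative to full speed (assumed proportional to speed squared)."""
    speed = (recoveries + 1) * exec_time_ms / deadline_ms
    if speed > 1.0:
        return None  # not schedulable even at full speed
    return speed, speed ** 2


# Hypothetical task: 2 ms at full speed with an 8 ms deadline.
for k in range(5):
    print(k, min_speed_and_energy(2.0, 8.0, k))
```

In this model, tolerating one fault by re-execution raises the fault-free energy from 6.25% to 25% of the full-speed value, which illustrates why handling the common single-SEU case without spending slack is attractive.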

Information Redundancy

[Figure: information redundancy hardware (IR-HW) around the register file: data in, parity generator, original registers extended with parity bits, single-bit error correction, data out]

Horizontal and vertical parity bits:

    Registers   h-parity
    0 1 0 0     1
    1 1 1 0     1
    1 1 0 0     0
    1 0 0 0     1

    1 1 1 0     v-parity

After an SEU flips the third bit of the third register:

    Registers   h-parity
    0 1 0 0     1
    1 1 1 0     1
    1 1 1 0     0
    1 0 0 0     1

    1 1 1 0     v-parity

Identified fault can be corrected during execution
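Locating a single flipped bit with horizontal and vertical parity can be sketched in a few lines. The 4x4 bit values below are our reading of the slide's example; the function names are hypothetical, even parity is assumed, and the sketch models the principle only, not the IR-HW implementation.

```python
def parity_bits(rows):
    """Even parity per row (horizontal) and per column (vertical)."""
    h = [sum(r) % 2 for r in rows]
    v = [sum(col) % 2 for col in zip(*rows)]
    return h, v


def correct_single_flip(rows, stored_h, stored_v):
    """Locate and fix one flipped data bit in place.
    Returns its (row, col), or None if the data matches the stored parity.
    Raises ValueError when the mismatch is not a single data-bit flip."""
    h, v = parity_bits(rows)
    bad_rows = [i for i, (a, b) in enumerate(zip(h, stored_h)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(v, stored_v)) if a != b]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("not a single data-bit error")
    i, j = bad_rows[0], bad_cols[0]
    rows[i][j] ^= 1
    return i, j


original = [[0, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
stored_h, stored_v = parity_bits(original)

faulty = [row[:] for row in original]
faulty[2][2] ^= 1  # inject the upset into the third register, third bit
location = correct_single_flip(faulty, stored_h, stored_v)

print(stored_h, stored_v, location, faulty == original)
```

A mismatch in exactly one row parity and one column parity pinpoints the bit, so a single SEU is repaired without a rollback; other mismatch patterns are only detected and need time redundancy.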

Energy efficient FT

• FT through rollback recoveries and information redundancy (depending on fault occurrence)

[Figure: timeline up to the deadline: single SEUs are corrected by the IR-HW on the registers during execution, while multiple SEUs are detected and handled by one recovery execution; a non-faulty run leaves slack]

[Figure: with only one recovery execution reserved, more slack remains: higher energy savings through DVS, to be weighed against the energy overhead of the IR-HW (?, <, >)]

Key observations

• Various ITC benchmarks were synthesised and analysed (hardware, energy) using EDA tools under different fault rates
  - It is possible to improve the reliability of embedded computing systems against transient faults without compromising the energy saving from DVS
• Employ information redundancy for single SEUs and time redundancy for multiple SEUs
• Adaptive body biasing, used to reduce leakage power, also increases the SEU rate by up to 36%

Network-on-Chip

• Future embedded computing systems will contain hundreds of processors and memory blocks
• Bus communication may prove to be the system bottleneck because of
  - shared bandwidth, which is not compatible with the required Gbit/s bandwidths
  - limited opportunities for parallelism, which is not compatible with highly parallel system architectures
  - poor scalability
• Network-on-chip attempts to solve the above issues

Network-on-Chip
HW/SW Co-Design with NoC Platform

• Scheduling and mapping optimisation must include not only computation cost but also variable communication cost (power, latency, reliability, ...), as there are a number of possible routing options in a NoC
• NIRGAM: a SystemC cycle-accurate simulator for analysing NoCs in terms of routing algorithms and applications on various topologies
  http://nirgam.ecs.soton.ac.uk/
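Why communication cost becomes variable on a NoC can be illustrated with a hop-count model. The sketch below uses dimension-ordered (XY) routing on a 2D mesh, a common deterministic scheme chosen here purely for illustration; the slides do not name a routing algorithm. The energy-per-bit-per-hop value is hypothetical, and none of this is the NIRGAM API.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh: move along x first, then y.
    Returns the list of router coordinates visited, including src and dst."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def comm_energy_pj(src, dst, bits, energy_per_bit_hop_pj=1.0):
    """Communication energy as hops * bits * (assumed) energy per bit per hop."""
    hops = len(xy_route(src, dst)) - 1
    return hops * bits * energy_per_bit_hop_pj


# The same 64-bit message costs more when the two tasks are mapped further apart.
print(comm_energy_pj((0, 0), (1, 0), bits=64))
print(comm_energy_pj((0, 0), (2, 1), bits=64))
```

A mapping and scheduling heuristic for a NoC platform would need a cost of this kind for every pair of communicating tasks, and it changes with both the mapping and the routing choice.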

Final Thoughts

• How to develop effective energy-efficient system-level automated design flows is reasonably well understood, as demonstrated by the recent availability of SL-EDA tools
  - Is extending SL flows to NoC platforms the next step?
• Low power and reliability are two key objectives when designing future embedded computing systems
  - Selecting the FT techniques that fit the application is important but not trivial
  - Developing application-specific and *light weight* FT (employed only where needed) may be necessary to gain acceptance