Towards a Contract-based Fault-tolerant Scheduling Framework for Distributed Real-time Systems

Abhilash Thekkilakattil, Huseyin Aysan and Sasikumar Punnekkat

Mälardalen Real-Time Research Centre, Mälardalen University, Sweden

CONTESSE ARIES

Introduction

[Figure: axes labelled "Complexity of real-time systems" and "Pervasiveness of real-time systems"; annotations "Component Based Software Engineering" and "Reliability Requirements"]

Contracts for real-time components
• Enable correct composition of components
• Ensure correctness by construction

Improving Reliability of Real-time Systems

• Zonal and functional hazard analyses
  • Checks if the redundancies indeed exist
  • Ensures that independent components are not affected by common causes
  • Provides input to the design, e.g., separation and segregation of components

• Zonal analysis for software systems
  • Improves the reliability of software components
  • Removes failures on independent components due to common causes
  • Inputs to the design, e.g., allocation

• Transient errors: most widespread cause of failure
  • Solution: re-execute the failed component

[Image taken from toonpool.com]
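The re-execution idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: `TransientError` and `run_with_reexecution` are hypothetical names, and the sketch ignores timing and node placement, which the rest of the talk addresses.

```python
class TransientError(Exception):
    """Hypothetical signal that one execution attempt hit a transient fault."""


def run_with_reexecution(job, max_reexecutions):
    """Run `job`, re-executing it after a transient error.

    Returns (result, attempts). The error is re-raised once the primary
    execution and all `max_reexecutions` re-executions have failed.
    """
    total_attempts = max_reexecutions + 1  # primary run plus re-executions
    for attempt in range(1, total_attempts + 1):
        try:
            return job(), attempt
        except TransientError:
            if attempt == total_attempts:
                raise
```

A component that requires two re-executions would be called with `max_reexecutions=2`, so it tolerates up to two transient faults per invocation.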

Problem

Allocation and scheduling of real-time components on a distributed platform

• Satisfy the re-execution requirements of critical components
• Satisfy the distribution requirements of critical components
• Maximize service to the non-critical components
• Fulfill real-time requirements

Component   Ti   Ci   Ri   mi   Criticality
A           10    2    2    1   C
B            5    2    1    1   C
C            5    1    0    0   N
D           10    6    0    0   N

Ti: time period. Ci: worst-case execution time. Ri: re-executions required. mi: number of re-executions required on a different node. Criticality: C = critical, N = non-critical.
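To make the example task set concrete, here is a small Python sketch that encodes the table and computes processor utilization. The `Component` type and both helper functions are illustrative names, not from the talk. Reading "C" and "N" as critical and non-critical, and charging each re-execution a full worst-case execution time, are assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Component:
    name: str
    period: int               # Ti
    wcet: int                 # Ci
    reexecutions: int         # Ri
    remote_reexecutions: int  # mi: re-executions required on a different node
    critical: bool            # assumed reading of the C / N column


TASK_SET = [
    Component("A", 10, 2, 2, 1, True),
    Component("B", 5, 2, 1, 1, True),
    Component("C", 5, 1, 0, 0, False),
    Component("D", 10, 6, 0, 0, False),
]


def fault_free_utilization(components):
    """Total utilization when no re-execution is needed."""
    return sum(Fraction(c.wcet, c.period) for c in components)


def worst_case_utilization(components):
    """Total utilization if every required re-execution actually runs
    (assumption: each re-execution costs one full WCET)."""
    return sum(Fraction(c.wcet * (1 + c.reexecutions), c.period)
               for c in components)
```

Under these assumptions the fault-free utilization is 7/5 and the worst-case utilization is 11/5. The set therefore cannot fit on one processor, and with every fault occurring it exceeds the capacity of two. This is consistent with giving hard guarantees only to the critical components.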

• Task allocation problem: an NP-hard problem
  • We use known optimization methods to achieve an efficient allocation

• Satisfying the reliability requirements: an NP-hard problem
  • We simplify by introducing Feasibility Windows

• Feasibility Windows: temporal intervals for task executions
  • Fault Tolerant Feasibility Windows for critical components
  • Fault Aware Feasibility Windows for non-critical components

• Contracts for fault-tolerance
  • Contract: task parameters which provide the required guarantees
  • Offline contracts: offline guarantees for critical components
  • Online contracts: maximize service to non-critical components
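One possible way to picture a contract derived from a feasibility window is sketched below. The slides do not spell out the contractual parameters. The release time, deadline and budget fields, and every name in the code, are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeasibilityWindow:
    kind: str   # "FT" (fault tolerant) or "FA" (fault aware)
    start: int
    end: int


@dataclass(frozen=True)
class Contract:
    component: str
    release: int    # assumed parameter: earliest start, the window start
    deadline: int   # assumed parameter: latest finish, the window end
    budget: int     # assumed parameter: the component's WCET
    offline: bool   # True for critical components (FT windows)


def derive_contract(component, wcet, window):
    """Turn a window into task parameters that keep execution inside it."""
    if window.end - window.start < wcet:
        raise ValueError("window is shorter than the WCET")
    return Contract(component, window.start, window.end, wcet,
                    offline=(window.kind == "FT"))


def within_contract(contract, start, finish):
    """Check that one execution stayed inside the contracted interval."""
    return contract.release <= start and finish <= contract.deadline
```

In this reading, an offline contract is fixed before run time from a Fault Tolerant Feasibility Window, while a contract built from a Fault Aware Feasibility Window would be granted online when spare capacity is available.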

Overview of the Solution

Method

• Allocate the components on the minimum number of processors

• Derive Fault Tolerant Feasibility Windows for critical components

• Derive Fault Aware Feasibility Windows for non-critical components

• Derive contractual parameters to ensure that the executions are within the derived windows
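The first step can be illustrated with a first-fit decreasing bin-packing heuristic in Python. This is a stand-in, not the optimization formulation used in the talk. It assumes a per-processor utilization bound of 1, ignores re-executions and the different-node requirement, and the function name is hypothetical.

```python
from fractions import Fraction


def first_fit_decreasing(tasks, capacity=Fraction(1)):
    """Pack tasks onto processors by decreasing utilization.

    tasks: dict mapping component name -> (period, wcet).
    Returns a list of processors, each a list of component names.
    """
    def utilization(name):
        period, wcet = tasks[name]
        return Fraction(wcet, period)

    order = sorted(tasks, key=utilization, reverse=True)
    processors, loads = [], []
    for name in order:
        u = utilization(name)
        for i, load in enumerate(loads):
            if load + u <= capacity:      # fits on an already open processor
                processors[i].append(name)
                loads[i] += u
                break
        else:                             # no processor had room: open a new one
            processors.append([name])
            loads.append(u)
    return processors


# Fault-free parameters (Ti, Ci) of the example task set.
EXAMPLE = {"A": (10, 2), "B": (5, 2), "C": (5, 1), "D": (10, 6)}
```

For the example set this heuristic uses two processors. The resulting placement need not match the allocation shown in the talk's example, which also has to account for re-executions.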

• Minimum size of a window of a component = WCET of the component
  • Guarantees feasible execution of the component

• Feasibility windows of the same component are disjoint in time
  • Ensures timely execution in order to enable the re-execution
  • Preserves the order of execution of the component and its re-executions

• During allocation, the processor demand in any interval should not exceed the length of the interval, to avoid overloads

• New method to deal with offsets
  • Derived from the classical feasibility analysis by Baruah et al.
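The window properties and the overload condition above can be sketched in Python as follows. The demand check is the classical processor-demand test for synchronous periodic tasks with deadlines equal to periods. It does not handle offsets, so it is not the new method the slides refer to. All function names and the window values in the usage note are illustrative assumptions.

```python
from functools import reduce
from math import gcd


def windows_valid(windows, wcet):
    """windows: (start, end) pairs for a component and its re-executions,
    listed in execution order. Each must fit one WCET, and consecutive
    windows must not overlap."""
    long_enough = all(end - start >= wcet for start, end in windows)
    ordered_disjoint = all(windows[i][1] <= windows[i + 1][0]
                           for i in range(len(windows) - 1))
    return long_enough and ordered_disjoint


def hyperperiod(periods):
    """Least common multiple of all periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def demand(tasks, t):
    """Processor demand in [0, t] for tasks given as (period, wcet),
    assuming synchronous release and deadline == period."""
    return sum((t // period) * wcet for period, wcet in tasks)


def demand_feasible(tasks):
    """True if the demand never exceeds the interval length at any
    deadline up to the hyperperiod."""
    horizon = hyperperiod([period for period, _ in tasks])
    checkpoints = sorted({k * period
                          for period, _ in tasks
                          for k in range(1, horizon // period + 1)})
    return all(demand(tasks, t) <= t for t in checkpoints)
```

For instance, `demand_feasible([(5, 2), (5, 1)])` holds for B and C sharing a processor, while placing all four example components on one processor fails because the demand at time 10 is 14.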

Optimization Formulation

Example

[Figure: example schedules on Node 1 and Node 2 in two panels, "Worst case: maximum fault occurrence" and "Better than worst case: less than maximum fault occurrence". Time axis ticks at 3, 5, 6, 8 and 10. Job labels: A, A1, A2, B, B1, C and D. Windows shown: FT_FW(A), FT_FW(A1), FT_FW(A2), FT_FW(B), FT_FW(B1) and FA_FW(C).]

Conclusions

We have proposed a methodology for the allocation and scheduling of components with mixed criticalities which:

• Guarantees the re-execution requirements of the critical components: offline contracts
• Maximizes the service to non-critical components: online contracts
• Is scheduler independent
• Allocates components on the minimum number of processors

Future work includes:
• Feasibility of real-time components with offsets: complexity reduction
• Optimality

Thank you!

Questions?