Distributed FSM Modeling and Verification Using Maude


Post on 19-Feb-2022


Page 1: Distributed FSM Modeling and Verification Using Maude

Online Transition

• Lecture/Presentation

– Zoom

• Summary/Homework/Mini Projects

– No change

• Term Project

– Demo video + Report

Page 2

Mini Project: RT-Gang

• One (parallel) real-time task at a time

• Equivalent to single-core fixed-priority RM scheduling

• Schedule best-effort tasks during slack, with throttling
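A minimal sketch of what the single-core equivalence allows: if each gang is treated as one task on one virtual core, the classic fixed-priority response-time recurrence R_i = C_i + sum over higher-priority j of ceil(R_i / T_j) * C_j can be applied. The task set, the function name `response_time`, and the deadline-equals-period assumption are illustrative and not taken from the RT-Gang paper; throttled best-effort interference is ignored here.

```python
import math

def response_time(tasks, index):
    """Worst-case response time of tasks[index] under fixed-priority scheduling.

    tasks: list of (wcet, period) tuples sorted from highest to lowest
    priority (rate-monotonic order). Deadline is assumed equal to period.
    Returns None if the recurrence exceeds the period (deadline miss).
    """
    wcet, period = tasks[index]
    r = wcet
    while True:
        # Interference from every higher-priority task released within r.
        interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:index])
        nxt = wcet + interference
        if nxt > period:
            return None
        if nxt == r:
            return r
        r = nxt

# Hypothetical gangs, each modeled as one task: (WCET, period).
gangs = [(1, 4), (2, 6), (3, 12)]
responses = [response_time(gangs, i) for i in range(len(gangs))]
print(responses)
```

Each gang is schedulable in this sketch when its computed response time is not None.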

[Figure: RT-Gang schedule of real-time tasks t1, t2, t3 on Cores 1 to 4 over time, with release and completion marks; priority: t1 < t2 < t3. Legend: real-time, idle or best-effort.]

[RTAS’19] Waqar Ali and Heechul Yun. “RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems.” In RTAS, 2019

Page 3

Real-Time Multi/Many-Core Architecture

Heechul Yun

Page 4

Real-Time Multi/Many-Core Architecture

• Projects on Real-Time CPU Architectures

• Assigned Papers

– Deterministic Memory Abstraction and Supporting Multicore System Architecture. ECRTS, 2018

– BRU: Bandwidth Regulation Unit for Real-Time Multicore Processors, RTAS, 2020

Page 5

Trends in Automotive E/E Systems


A. Hamann (Bosch). “Industrial Challenge: Moving from Classical to High-Performance Real-Time Systems.” WATER, 2018.

Source: Bosch

Centralization & High-Performance HW

Page 6

Modern System-on-a-Chip (SoC)

• Integrate multiple cores, GPU, accelerators

• Good performance, size, weight, power

• Challenges: time predictability

[Figure: SoC block diagram with Core1, Core2, GPU, NPU, …, a shared cache, a memory controller (MC), and DRAM.]

Page 7

Worst-Case Execution Time (WCET)

• Real-time scheduling theory is based on the assumption of known WCETs of real-time tasks


Image source: [Wilhelm et al., 2008]

Page 8

Computing WCET

• Static analysis

– Input: program code, architecture model

– Output: WCET

– Problem: an accurate architecture model is hard to obtain, and the resulting WCET is pessimistic

• Measurement

– No guarantee on true worst-case

– But, widely used in practice

Page 9

Memory Hierarchies, Pipelines, and Buses for Future Architectures in Time-Critical Embedded Systems

IEEE TCAD, 2009

Page 10

“Problematic” CPU Features

• Architectures are optimized to improve average-case performance

• WCET estimation is hard because of

– Pipelining

– TLBs/Caches

– Super-scalar

– Out-of-order scheduling

– Branch predictors

– Hardware prefetchers

– Basically anything that affects processor state

Page 11

Static Timing Analysis


968 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 28, NO. 7, JULY 2009

Fig. 1. Main components of a timing-analysis framework and their interaction.

A. Timing-Analysis Framework

Over the last several years, a more or less standard architecture for timing-analysis tools has emerged [11]–[13]. Fig. 1 shows a general view on this architecture. First, one can distinguish three major building blocks:

1) control-flow reconstruction and static analyses for control and data flow;

2) microarchitectural analysis, which computes upper and lower bounds on execution times of basic blocks;

3) global bound analysis, which computes upper and lower bounds for the whole program.

The following list presents the individual phases and describes their objectives and problems. Note that the first four phases are part of the first building block.

1) Control-flow reconstruction [14] takes a binary executable to be analyzed, reconstructs the program’s control flow, and transforms the program into a suitable intermediate representation. Problems encountered are dynamically computed control-flow successors, e.g., stemming from switch statements, function pointers, etc.

2) Value analysis [15], [16] computes an overapproximation of the set of possible values in registers and memory locations by an interval analysis and/or congruence analysis. This information is, among others, used for a precise data-cache analysis.

3) Loop bound analysis [17], [18] identifies loops in the program and tries to determine bounds on the number of loop iterations, information which is indispensable to bound the execution time. Problems are the analysis of arithmetic on loop counters and loop-exit conditions, as well as dependencies in nested loops.

4) Control-flow analysis [17], [19] narrows down the set of possible paths through the program by eliminating infeasible paths or to determine correlations between the number of executions of different blocks using the results of value-analysis results. These constraints will tighten the obtained timing bounds.

5) Microarchitectural analysis [10], [20], [21] determines bounds on the execution time of basic blocks by performing an abstract interpretation of the program, taking into account the processor’s pipeline, caches, and speculation concepts. Static cache analyses determine safe approximations to the contents of caches at each program point. Pipeline analysis analyzes how instructions pass through the pipeline accounting for occupancy of shared resources like queues, functional units, etc. Ignoring these average-case-enhancing features would result in imprecise bounds.

6) Global bound analysis [22], [23] finally determines bounds on execution time for the whole program. Information about the execution time of basic blocks is combined to compute the shortest and the longest paths through the program. This phase takes into account information provided by the loop bound and control-flow analyses.

The commercially available tool aiT by AbsInt, cf. http://www.absint.de/wcet.htm, implements this architecture. It is used in the aeronautics and automotive industries and has been successfully used to determine precise bounds on execution times of real-time programs [6], [7], [10], [24].

III. PIPELINES

For nonpipelined architectures, one can simply add up the execution times of individual instructions to obtain a bound on the execution time of a basic block. Pipelines increase performance by overlapping the executions of different instructions. Hence, a timing analysis cannot consider individual instructions in isolation. Instead, they have to be considered collectively—together with their mutual interactions—to obtain tight timing bounds.

The analysis of a given program for its pipeline behavior is based on an abstract model of the pipeline. All components that contribute to the timing of instructions have to be modeled conservatively. Depending on the employed pipeline features, the number of states the analysis has to consider varies greatly.

A. Contributions to Complexity

Since most parts of the pipeline state influence timing, the abstract model needs to closely resemble the concrete hardware. The more performance-enhancing features a pipeline has, the larger is the search space. Superscalar and out-of-order executions increase the number of possible interleavings. The larger the buffers (e.g., fetch buffers, retirement queues, etc.), the longer the influence of past events lasts. Dynamic branch prediction, cachelike structures, and branch history tables increase history dependence even more.

All these features influence execution time. To compute a precise bound on the execution time of a basic block, the analysis needs to exclude as many timing accidents as possible. Such

Page 12

Control Flow Graph (CFG)

• Analyze code

• Split basic blocks

• Compute per-block WCET

– use abstract CPU model
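The CFG steps above can be sketched as a longest-path computation once per-block WCETs are known. This is a minimal illustration, assuming an acyclic CFG (loops already bounded and unrolled) and made-up block names and cycle counts; real tools obtain the per-block numbers from an abstract CPU model and typically solve the path problem with ILP rather than plain recursion.

```python
from functools import lru_cache

def program_wcet(cfg, block_wcet, entry):
    """Longest path (in cycles) from `entry` through an acyclic CFG.

    cfg:        dict mapping each basic block to its successor blocks
    block_wcet: dict mapping each basic block to its per-block WCET
    """
    @lru_cache(maxsize=None)
    def longest_from(block):
        successors = cfg[block]
        if not successors:          # exit block: nothing follows
            return block_wcet[block]
        return block_wcet[block] + max(longest_from(s) for s in successors)

    return longest_from(entry)

# Hypothetical if/else diamond with illustrative per-block WCETs.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
block_wcet = {"entry": 5, "then": 20, "else": 8, "exit": 3}

wcet = program_wcet(cfg, block_wcet, "entry")
print(wcet)  # worst path is entry -> then -> exit
```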

Page 13

Timing Anomalies

• Locally faster != globally faster

Image source: [Wilhelm et al., 2008]

Page 14

WCET and Caches

• Take cache hits/misses into account?

– To reduce pessimism in WCET estimation

• How to know cache hits/misses of a given job?

– If we assume

• the path (instruction stream) is given

• the job is not interrupted

• a known “good” cache replacement policy is used

– Then we can statically determine hits/misses

• But less so when “bad” replacement policies are used
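A small sketch of the idea under the three assumptions above: with a fixed access path, no interruption, and a known policy, every access can be classified ahead of time. The example assumes LRU as the "good" policy, a fully associative cache that starts empty, and a made-up trace of memory blocks; it replays one concrete path and is not the abstract interpretation used by real static cache analyses.

```python
from collections import OrderedDict

def classify_accesses(trace, num_lines):
    """Replay `trace` on an initially empty, fully associative LRU cache.

    Returns a list of "hit"/"miss" strings, one per access.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    outcome = []
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # refresh recency on a hit
            outcome.append("hit")
        else:
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
            outcome.append("miss")
    return outcome

# Hypothetical path through memory blocks A, B, C with a 2-line cache.
trace = ["A", "B", "A", "C", "B", "A"]
outcome = classify_accesses(trace, num_lines=2)
print(list(zip(trace, outcome)))
```

If the job could be preempted between accesses, the cache contents at each point would no longer be known and this classification would not hold.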

Page 15

Timing Anomalies

• Locally faster != globally faster

Image source: [Wilhelm et al., 2008]

Page 16

Multicore and Shared Memory


• Memory performance varies widely due to interference

• Task WCET can be extremely pessimistic

[Figure: Task 1 to Task 4 run on Core1 to Core4 (I and D per core); the cores share a cache, the memory controller (MC), and DRAM.]

Page 17

Effect of Memory Interference

• DNN control task suffers >10X slowdown

– When co-scheduling different tasks on idle cores

[Figure: bar chart of normalized execution time (y-axis, 0 to 12) for DNN (Core 0,1) and BwWrite (Core 2,3), Solo vs. Corun. Inset: Core1 to Core4 share the LLC and DRAM, running DNN and BwWrite.]

Waqar Ali and Heechul Yun. “RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems.” RTAS, 2019 (to appear)

Page 18

Cache Denial-of-Service Attacks

• Observed worst-case: >300X slowdown

– On simple in-order multicores (Raspberry Pi3, Odroid C2)

Difficult to guarantee predictable timing

[Figure: Core1 to Core4 share the LLC; cores are labeled "victim" and "attackers".]

Michael G. Bechtel and Heechul Yun. “Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention.” In RTAS, 2019 (to appear, Outstanding Paper Award)

Page 19

Real-Time CPU Architectures

• PRET

– UC Berkeley

• MERASA/parMERASA project

– EU

• ACROSS

– EU

• ARAMIS

– Germany

• EMC2

– EU

Page 20

FlexPRET: A Processor Platform for Mixed-Criticality Systems

RTAS, 2014

Page 21

Page 22

PRET Pipeline

[Figure: PRET thread-interleaved pipeline over time t. Rows THREAD#1 to THREAD#6 pass through the stages FETCH, DECODE, REGACC, MEM, EXECUTE, EXCEPT. Labels: "1 clock", "Thread 1, Instruction 1", "Thread 1, Instruction 2".]

Page 23

FlexPRET Pipeline

Page 24

Hardware Support for WCET Analysis of Hard Real-Time Multicore Systems

ISCA 2009

Page 25

Analyzable Multicore Architecture

• Idea 1: Bound interference on shared resources

– On-chip shared bus

– (shared) L2 cache

• Idea 2: WCET computation mode

Page 26

Architecture

Page 27

Round-Robin Bus Arbitration

• $UBD = (N_{HRT} - 1) \times L_{bus}$
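A worked instance of the bound above, as a hedged sketch. It reads UBD as the upper bound on the delay one bus request can suffer, N_HRT as the number of hard real-time requesters sharing the bus, and L_bus as the worst-case latency of one bus transaction; this reading and the function name `upper_bound_delay` are assumptions, since the slide gives only the formula.

```python
def upper_bound_delay(n_hrt, l_bus):
    """Worst-case wait of one request under round-robin arbitration.

    In the worst case every other requester is served once before us,
    each occupying the bus for at most l_bus cycles.
    """
    if n_hrt < 1:
        raise ValueError("need at least one requester")
    return (n_hrt - 1) * l_bus

# Hypothetical numbers: 4 requesters, 2-cycle bus transactions.
ubd = upper_bound_delay(n_hrt=4, l_bus=2)
print(ubd)  # (4 - 1) * 2 cycles
```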

Page 28

Atomic vs. Split-Transaction Bus

• …

J. P. Shen and M. H. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press, 2013.

Page 29

Request vs. Job-level WCET Analysis

• Request-level analysis

– Assume worst-case interference for each access of the task under analysis

– Pessimistic as not all accesses will get interference

• Job-level analysis

– Assume the total number of competing memory accesses is known

– Can reduce pessimism
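The difference between the two analyses can be sketched numerically. The model below is a simplification assumed for illustration and is not taken from the slides: under round-robin arbitration each access of the task under analysis waits at most a fixed per-access delay, and each competing access can delay the task by at most one bus transaction of `l_bus` cycles. All numbers and function names are hypothetical.

```python
def request_level_bound(own_accesses, per_access_delay):
    """Charge the worst-case delay to every access of the analyzed task."""
    return own_accesses * per_access_delay

def job_level_bound(own_accesses, per_access_delay, competing_accesses, l_bus):
    """Cap the interference by what the competitors can actually generate."""
    return min(own_accesses * per_access_delay, competing_accesses * l_bus)

l_bus = 2                            # cycles per bus transaction (assumed)
per_access_delay = (4 - 1) * l_bus   # round-robin bound with 4 requesters

pessimistic = request_level_bound(1000, per_access_delay)
tighter = job_level_bound(1000, per_access_delay,
                          competing_accesses=500, l_bus=l_bus)
print(pessimistic, tighter)
```

In this sketch the job-level bound is never larger than the request-level one, and it is strictly smaller whenever the competitors issue few accesses.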

Page 30

Summary

• Timing anomalies

– Locally fast != globally fast on non-timing compositional architectures (i.e., most architectures)

• Timing compositional architecture

– Free of timing anomalies

Page 31

Discussion

• Why is this interesting?

• Are assumptions realistic?

– Task model

– Cache model

– Memory model

– CPU (pipeline) model

– Bus model

Page 32

Acknowledgement

• Some slides are from:

– Prof. Rodolfo Pellizzoni, University of Waterloo

– Prof. Edward A. Lee, University of California, Berkeley
