ieee transactions on computers, vol. xxx, no. xxx,...

14
IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 1 Multi-Threaded Reactive Programming— The Kiel Esterel Processor Xin Li and Reinhard von Hanxleden, Member, IEEE Abstract—The Kiel Esterel Processor (KEP) is a multi-threaded reactive processor designed for the execution of programs written in the synchronous language Esterel. Design goals were timing predictability, minimal resource usage, and compliance to full Esterel V5. The KEP directly supports Esterel’s reactive control flow operators, notably concurrency and various types of preemption, through dedicated control units. Esterel allows arbitrary combinations and nestings of these operators, which poses particular implementation challenges that are addressed here. Other notable features of the KEP are a refined instruction set architecture, which allows to trade off generality against resource usage, and a Tick Manager that minimizes reaction time jitter and can detect timing overruns. Index Terms—Reactive systems, concurrency, multi-threading, synchronous languages, Esterel, low-power design, predictability 1 I NTRODUCTION M ANY embedded systems belong to the class of reactive systems, which continuously react to in- puts from the environment by generating corresponding outputs. The programming of reactive systems typically requires the use of non-standard control flow constructs, such as concurrency and exception handling. Most pro- gramming languages, including languages such as C and Java that are commonly used in embedded systems, either do not support these constructs at all, or their use induces non-deterministic program behavior, regarding both functionality and timing. To address this difficulty, the synchronous language Esterel [2] has been developed to express reactive control flow in a concise, deterministic manner. This is valuable for the designer, but also poses implementation chal- lenges. As Esterel is a domain-independent program- ming (specification) language, there are a number of implementation alternatives, each with its advantages and drawbacks, see also Table 1. An Esterel program is typically validated via a simulation-based tool set, and then synthesized to an intermediate language, e. g., C or VHDL [1]. To build the real system, one typically uses a commercial off-the-shelf (COTS) processor for a software implementation, or a circuit is generated for a hardware implementation. HW/SW co-design strategies have also been investigated, for example in POLIS [5]. Reactive programs are often characterized by very fre- quent context switches; as it turns out, a context switch R. von Hanxleden is with the Department of Computer Science, Christian-Albrechts-Universit¨ at zu Kiel, 24098 Kiel, Germany. E-Mail: [email protected]. X. Li is with the Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities, Minneapolis, Minnesota, USA. E- Mail: [email protected]. Manuscript received June 8, 2007; revised October 30, 2009. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TCSI-2007-06-0223. Digital Object Identifier no. XXX. . after every three or four instructions is not uncom- mon [8]. This adds significant overhead to the traditional compilation approaches, as the restriction to a single program counter requires the program to manually keep track of thread control counters using state variables. Traditional OS context switching mechanisms would be even more expensive. Furthermore, the handling of pre- emptions requires a rather clumsy sequential checking of conditionals whenever control flow may be affected by a preemption. To address these difficulties, the recently emerging reactive processing approach strives for a direct imple- mentation of Esterel’s control flow and signal handling constructs. This provides hardware support for han- dling reactive control flow, alleviating the need for a compiler that sequentializes the code or for an OS that emulates concurrency. In this paper, we present the Kiel Esterel Processor (KEP) reactive architecture. The development of the KEP was driven by the desire to achieve predictable, competitive execution speeds at minimal resource usage, considering processor size and power usage as well as instruction and data memory. A key to achieve this goal is the instruction set archi- tecture (ISA) of the KEP, which allows the mapping of Esterel programs into compact machine code while keeping the processor compact. To keep the KEP simple and light-weight, it currently does not employ classical acceleration mechanism such as pipelining, other forms of instruction level parallelism, or caching. Such mecha- nism can be combined with reactive processing [9], but typically there is a trade-off between average-case per- formance and predictability. Still, the worst case reaction time of the KEP is typically improved by 4x compared to the MicroBlaze, a COTS RISC processor core, and energy consumption is also typically reduced to a quarter; see also Sec. 6.4. This paper presents a comprehensive overview of the KEP architecture and how it meets the challenge to accurately and efficiently implement the rich, strictly 0000–0000/00$00.00 c 200X IEEE

Upload: others

Post on 09-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 1

Multi-Threaded Reactive Programming—The Kiel Esterel Processor

Xin Li and Reinhard von Hanxleden, Member, IEEE

Abstract—The Kiel Esterel Processor (KEP) is a multi-threaded reactive processor designed for the execution of programs writtenin the synchronous language Esterel. Design goals were timing predictability, minimal resource usage, and compliance to full EsterelV5. The KEP directly supports Esterel’s reactive control flow operators, notably concurrency and various types of preemption, throughdedicated control units. Esterel allows arbitrary combinations and nestings of these operators, which poses particular implementationchallenges that are addressed here. Other notable features of the KEP are a refined instruction set architecture, which allows to tradeoff generality against resource usage, and a Tick Manager that minimizes reaction time jitter and can detect timing overruns.

Index Terms—Reactive systems, concurrency, multi-threading, synchronous languages, Esterel, low-power design, predictability

F

1 INTRODUCTION

MANY embedded systems belong to the class ofreactive systems, which continuously react to in-

puts from the environment by generating correspondingoutputs. The programming of reactive systems typicallyrequires the use of non-standard control flow constructs,such as concurrency and exception handling. Most pro-gramming languages, including languages such as Cand Java that are commonly used in embedded systems,either do not support these constructs at all, or their useinduces non-deterministic program behavior, regardingboth functionality and timing.

To address this difficulty, the synchronous languageEsterel [2] has been developed to express reactive controlflow in a concise, deterministic manner. This is valuablefor the designer, but also poses implementation chal-lenges. As Esterel is a domain-independent program-ming (specification) language, there are a number ofimplementation alternatives, each with its advantagesand drawbacks, see also Table 1. An Esterel program istypically validated via a simulation-based tool set, andthen synthesized to an intermediate language, e. g., C orVHDL [1]. To build the real system, one typically uses acommercial off-the-shelf (COTS) processor for a softwareimplementation, or a circuit is generated for a hardwareimplementation. HW/SW co-design strategies have alsobeen investigated, for example in POLIS [5].

Reactive programs are often characterized by very fre-quent context switches; as it turns out, a context switch

• R. von Hanxleden is with the Department of Computer Science,Christian-Albrechts-Universitat zu Kiel, 24098 Kiel, Germany. E-Mail:[email protected].

• X. Li is with the Department of Electrical and Computer Engineering,University of Minnesota, Twin Cities, Minneapolis, Minnesota, USA. E-Mail: [email protected].

Manuscript received June 8, 2007; revised October 30, 2009.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TCSI-2007-06-0223.Digital Object Identifier no. XXX. .

after every three or four instructions is not uncom-mon [8]. This adds significant overhead to the traditionalcompilation approaches, as the restriction to a singleprogram counter requires the program to manually keeptrack of thread control counters using state variables.Traditional OS context switching mechanisms would beeven more expensive. Furthermore, the handling of pre-emptions requires a rather clumsy sequential checkingof conditionals whenever control flow may be affectedby a preemption.

To address these difficulties, the recently emergingreactive processing approach strives for a direct imple-mentation of Esterel’s control flow and signal handlingconstructs. This provides hardware support for han-dling reactive control flow, alleviating the need for acompiler that sequentializes the code or for an OSthat emulates concurrency. In this paper, we presentthe Kiel Esterel Processor (KEP) reactive architecture.The development of the KEP was driven by the desireto achieve predictable, competitive execution speeds atminimal resource usage, considering processor size andpower usage as well as instruction and data memory.A key to achieve this goal is the instruction set archi-tecture (ISA) of the KEP, which allows the mappingof Esterel programs into compact machine code whilekeeping the processor compact. To keep the KEP simpleand light-weight, it currently does not employ classicalacceleration mechanism such as pipelining, other formsof instruction level parallelism, or caching. Such mecha-nism can be combined with reactive processing [9], buttypically there is a trade-off between average-case per-formance and predictability. Still, the worst case reactiontime of the KEP is typically improved by 4x compared tothe MicroBlaze, a COTS RISC processor core, and energyconsumption is also typically reduced to a quarter; seealso Sec. 6.4.

This paper presents a comprehensive overview of theKEP architecture and how it meets the challenge toaccurately and efficiently implement the rich, strictly

0000–0000/00$00.00 c© 200X IEEE

Page 2: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 2

Hardware Software Co-design Patched CustomProcessor Processor

ArchitectureSpeed ++ – + + +

Selected Berry [1], Berry et al. [2], Balarin et al. Roop et al. Li et al.References Edwards [3] Edwards et al. [4] [5] [6] [7]Flexibility – – ++ – +/– +

Esterel Compliance ++ ++ +/– – ++Logic Area ++/– + + – – +/–

Cost Memory ++ – – – + +Power Usage ++ – – – – +

Appl. Design Cycle – – ++ +/– ++ ++

TABLE 1Comparison of Esterel implementation alternatives. ++ represents best; – – means worst.

synchronous semantics of the Esterel language. Beyondearlier publications [7], [8], [10] that covered differentaspects of the KEP as realized in earlier generations(see Sec. 3), this paper also presents a detailed treat-ment of the interaction of concurrency and preemption(Sec. 5/5.2), and a fairly extensive review of the devel-opment of reactive processing so far. A full presenta-tion of the concrete design of the KEP, its validationand the experimental evaluation, in more detail than ispossible here, can be found in the dissertation of thefirst author [11]. Note also that this paper focuses onthe architecture and the ISA of the KEP. Closely relatedare the issues of code generation for the KEP and WorstCase Reaction Time (WCRT) analysis, both of which arecovered elsewhere in detail [12], [13].

The rest of this paper is organized as follows. The nextsection provides some basics on the Esterel language.Sec. 3 discusses related work. Sections 4 and 5 presentthe instruction set and the architecture of the KEP.The validation platform and experimental results arepresented in Sec. 6. The paper concludes in Sec. 7.

2 THE ESTEREL LANGUAGE

THE execution of an Esterel program is divided intological instants, or ticks, and communication within

or across threads occurs via signals. At each tick, asignal is either present (emitted) or absent (not emitted).The test for the presence of a signal is per defaultnon-immediate; for example, an await S first pauses forone tick, and only from the next tick on tests for thepresence of S. There are also immediate variants, forexample, await immediate S tests for S from the sameinstant onwards that the statement is entered. Esterel

statements are either transient, in which case they do notconsume logical time, or delayed, in which case executionis finished for the current tick. Per default statementsare transient, and these include for example emit, loop,present, or the preemption operators. Delayed statementsinclude pause, (non-immediate) await, and every.

2.1 Reactive control flow

Esterel’s parallel operator || groups statements in concur-rently executed threads. The parallel terminates when allits branches have terminated.

Esterel offers two types of preemption constructs,abortion and suspension. An abortion kills its body whena delay elapses. We distinguish strong abortion, whichkills its body immediately (at the beginning of a tick),and weak abortion, which lets its body receive control fora last time (abortion at the end of the tick). A suspensionfreezes the state of a body in the tick when the triggerevent occurs.

Esterel also offers an exception handling mechanismvia the trap/exit statements. An exception is declared witha trap scope, and is thrown with an exit statement. An exitT statement causes control flow to move to the end of thescope of the corresponding trap T declaration. This is sim-ilar to a goto statement, however, there are specific ruleswhen traps are nested or when the trap scope includesconcurrent threads. The following rules apply: if onethread raises an exception and the corresponding trapscope includes concurrent threads, then the concurrentthreads will be weakly aborted; if concurrent threadsexecute multiple exit instructions in the same tick, theoutermost trap will take priority.

Page 3: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 3

1 module EXAMPLE:2 input S, I , H;3 output O1, O2;4 signal A,R in5 % −−−−−−− Thread 06 every S do7 trap T1 in8 trap T2 in9 % −−−−−−− Thread 1

10 [ await I ;11 weak abort12 suspend13 sustain R;14 when H;15 when immediate A;16 emit O1;17 exit T1;18 ||19 % −−−−−−− Thread 220 await 2 tick ;21 present R then22 emit A;23 end present;24 exit T2; ];25 % −−−−−−− Thread 026 end trap;27 emit O2;28 end trap;29 end every;30 end signal;31 end module

(a) Esterel program

1 INPUT S, I, H2 OUTPUT O1, O23 SIGNAL A, R4 % −−−−−−−−−− Thread 05 EMIT TICKLEN, #206 AWAIT S7 A0: ABORT S, A18 T1S: T2S: PAR 3, P19 PAR 2, P2

10 PARE P311 % −−−−−−−−−− Thread 112 P1: AWAIT I13 WABORTI A, A214 SUSPEND H, A415 A3: EMIT R16 PRIO 117 PRIO 318 PAUSE19 GOTO A320 A4: A2: EMIT O121 EXIT T1E,T1S22 % −−−−−−−−−− Thread 223 P2: LOAD COUNT, #224 AWAIT TICK25 PRESENT R, A526 EMIT A27 A5: EXIT T2E,T2S28 % −−−−−−−−−− Thread 029 P3: JOIN 030 T2E: EMIT O231 T1E: HALT32 A1: GOTO A0

(b) KEP assembler

-S

O2

S I

R,A,O1

S I

RR, A, O1

S I

R

H

O2

(c) Sample execution trace, with inputs above and outputsbelow logical tick time line

Fig. 1. EXAMPLE: an Esterel module illustrating Esterelparallel, preemption, and exception statements.

2.2 An ExampleLet us consider the EXAMPLE module in Fig. 1a. After theinput/output/local signal declarations, an every blockrestarts its body whenever the input signal S is present—except for the initial tick, when S is ignored, as the everyis not “immediate.” Inside the every block are two nestedtrap scopes, an outer trap triggered by T1 (via an exit T1exception) and an inner T2 trap. The body of the innertrap contains two parallel threads. Thread 1 initiallywaits for the input signal I. Once I has become present(non-immediately), the sustain R13

1 statement continu-ously emits the local signal R. However, that sustain isweakly aborted by A (immediately), and suspended byH (non-immediately). Once A has triggered the abortion,

1. To aid readability, we here use the convention of subscriptinginstructions with the line number where they occur.

O1 is emitted and the exception T1 is thrown. Thread 2initially idles for two ticks, then emits A if R is present,and exits the T2 trap.

A possible execution trace is shown in Fig. 1c. Allsignals are absent at the initial tick. At the second tick,the presence of input signal S triggers the start of theevery body, which spawns Threads 1 and 2. Thread 1stays at the await I, since this is non-immediate, andThread 2 stays at the await 2 tick. At the third tick, Thread1 again cannot proceed, since I is absent, and Thread 2stays the second tick at the await 2 tick. At the fourth tick,Thread 1 again does not get an input I; Thread 2 canproceed, detects R as absent, throws exception T2, andterminates. This exception aborts Thread 1 and transferscontrol to the end of the trap T2 scope, hence O2 getsemitted and control moves back to the every S loop. Theother possible behaviors that follow the next occurrencesof S are shown in the remainder of the time line. For adetailed understanding, the reader might also consultthe Esterel primer [14].

3 RELATED WORK

THERE is a by now extensive body of research onthe efficient compilation of Esterel in the vari-

ous domains, a full discussion of which is beyondthe scope of this paper. See for example Potop-Butucaru/Edwards [15] for a good overview of softwaresynthesis approaches. We here focus on existing work inthe field of reactive processing, which is a rather recentdevelopment. An earlier overview was presented in [16].

3.1 Sequential reactive processingThe first reactive processor, called REFLIX [6], was pre-sented by Salcic, Roop et al. in 2002. The REFLIX isa patched processor, combining a traditional soft microcontroller core (FLIX) with a custom hardware block thatextends the instruction set of the FLIX by certain new,Esterel-like instructions; see also Table 1. Although theEsterel-style statements (instructions) supported by theREFLIX were very limited, it performed better than itscompetitors, i. e., the FLIX and other micro controllers.The REPIC [17] replaced the FLIX by the PIC processor,which is popular in the industry control domain. Both ofthese patched processors are limited by the control pathof the traditional processor. This prevents for examplethe proper handling of nested traps, as the control paththere depends on address ranges and parallel relationsof the traps.

In 2004, the authors presented the first prototypeof the KEP [18], now referred to as the “KEP1,” toour knowledge the first custom reactive processor fullydesigned from scratch. The KEP1 was also the first tocorrectly handle weak and strong abortion. It includedWatcher units that handle such abortions concurrently tothe regular control flow, i. e., without the need to executeextra instructions to check for the triggering of abortions(see also Sec. 5.2. However, the KEP1 did not provide full

Page 4: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 4

(a) Multi-processing (b) Multi-threading

Fig. 2. Comparison of concurrent reactive architectures.

concurrency, and logic and arithmetic expression werealso not supported.

In 2005, Z. Salcic et al. presented the REMIC, anothercustom processor [19]. In the same year, the KEP2 im-proved over the KEP1 in that it includes an interfaceblock that supports the PRE-operator, and can handlefurther Esterel-constructs such as variables and localsignals [10]. Furthermore, it contains an ALU and sup-ports some classical logic and arithmetic expression. TheKEP2 also includes a Tick Manager, which can provide aconstant logical tick length and detects timing overruns;see also Sec. 5.5.

3.2 Concurrent reactive processingPerhaps the most distinguishing feature for reactiveprocessors is whether and how they handle concurrency.The first generations of reactive processors did not sup-port Esterel’s concurrency operator directly. Executingconcurrent Esterel programs thus required to transformthem into an equivalent program with a flattened statespace, i. e., was sequentialized by constructing a “productautomaton.”

The EMPEROR [20], [21] introduced the multi-processing approach for handling concurrency directly.Here, every Esterel thread is mapped onto an inde-pendent processor to be executed, and a thread controlunit handles the synchronization and communicationbetween processors; see also Fig. 2a. The EMPERORuses a cyclic executive to implement concurrency, andallows the arbitrary mapping of threads onto processingnodes. This approach has the potential to speed upexecution relative to single-processor implementations.However, this execution model potentially requires toreplicate parts of the control logic at each processor, andis thus relatively hardware-intensive. Furthermore, it isdifficult to support the arbitrary nesting of concurrencyand preemption. A more efficient concurrency imple-mentation approach for reactive processing appears tobe multi-threading, which was first employed by theKEP3, presented in 2006 [7]; see also Fig. 2b and furtherexplanations in Sec. 5.1. As illustrated in Sec. 6, thisapproach scales well to high degrees of concurrency with

minimal resource overhead. By now the multi-threadingapproach has also been adopted by the STARPro [9],which further improves performance by pipelining.

Later in 2006, the KEP3a and its compiler were pre-sented [8]. The KEP3a has improved over the KEP3in that it supports exception handling and providescontext-dependent preemption handling instructions.This paper presents version 4 of the KEP (KEP4), whichcompared to earlier generations has an enriched controlpath for handling advanced mixed Esterel control struc-tures, and supports numerous options for generatingprocessor configurations. The KEP4 and the strl2kasmcompiler fully support Esterel V5. The subsequentlyevolved Esterel V7 has numerous extensions, mainly tosupport hardware design, but has the same underlyingreactive control flow operations; hence, extending theKEP approach for Esterel V7 appears to be mostly a com-piler issue and should not require fundamental changesof the architecture described here.

3.3 Compilation, WCRT analysis, co-design

Since multi-processing and multi-threaded reactive pro-cessors employ different strategies to handle Esterelconcurrency, their compilers, which synthesize an Esterelprogram to the target reactive processor codes, also usedifferent approaches to implement the communicationbetween threads.

For multi-processing, the EMPEROR Esterel Compiler2 (EEC2) [21] is based on a variant of the graph code(GRC) format [15], and appears to be competitive evenfor sequential executions on a traditional processor.However, their synchronization mechanism, which isbased on a three-valued signal logic, does not to takecompile-time scheduling knowledge into account, butreplaces it by repeating cycles through all threads un-til all signal values have been determined. Hence thecompiler needs to generate sync instructions, to ensurethat signals are not tested before they are emitted [20].In comparison, the multi-threaded implementation ap-proach implements interleaving by inserting prioritysetting instructions at the context switch point.

The compiler for the multi-threaded KEP employsa priority assignment approach that makes use of anovel concurrent control flow graph, the ConcurrentKEP Assembler Graph (CKAG). The worst-case size ofthe CKAG is quadratic in the size of the correspondingEsterel program; in practice, namely for a bounded abortnesting depth, it is linear [12]. Unlike earlier Esterelcompilation schemes, this approach avoids unnecessarycontext switches by considering each thread’s actualexecution state at run time. Furthermore, it avoids codereplication present in other approaches. The compilationfor the KEP is further summarized by Li et al. [8] andpresented in detail by Boldt [12].

Since one key characteristic of the KEP is its timingpredictability, it is feasible to perform a conservative,yet fairly accurate analysis of its WCRT. A first WCRT

Page 5: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 5

technique was presented in 2005 [10], for the (sequential)KEP2. An extension to concurrent WCRT analysis waslater presented by Boldt et al. [13]. The analysis ofthe WCRT is influenced by the KEP in two ways: theexact number of instructions for each statement and theway parallelism is handled. The analysis is performedon the CKAG. In a first step one computes whetherconcurrent threads terminate instantaneously, thereafterone computes for each statement how many instructionare maximally executable from it in one logical tick. Themaximal value over all nodes gives us the WCRT of theprogram.

The KEP has also been employed as a platform forHW/SW co-design. Gadtke et al. [22] present an ap-proach to accelerate reactive processing via an externallogic block that handles complex signal expressions. AnEsterel program is synthesized into a software compo-nent, running on the KEP, and a hardware component,consisting of simple combinational logic. The transfor-mation process involves a two-step procedure, whichfirst partitions the program at the source level andsubsequently performs the synthesis. An intermediatelogic minimization, at the source code level, facilitatesthe synthesis of compact logic blocks.

3.4 Further related work

The KEP series of processors focuses on reactive controlflow. There are also other architectures, such as the KielLustre Processor (KLP) [23], that focus on data-flow andits efficient execution via parallelization.

Timing predictability is also the focus of the PrecisionTimed Architecture [24] developed at UC Berkeley andColumbia University. This architecture is also multi-threaded, with true parallelism, and employs physicaltime for thread synchronization. The PRET-C proposal[25] extends a traditional core with an external threadscheduling unit, and uses a synchronous programmingmodel.

Related to the KEP ISA, there also have been recentproposals for expressing synchronous concurrency by asmall set of operators embedded in a host language, suchas LuSteraL [26] or SyncCharts in C [27]; see also theConclusions.

4 THE KEP INSTRUCTION SET ARCHITEC-TURE

A T the Esterel level, one distinguishes kernel state-ments and derived statements; the derived statements

are basically syntactic sugar, built up from the kernelstatements. Any set of Esterel statements from whichthe remaining statements can be constructed could beconsidered a valid set of kernel statements. The acceptedset of Esterel kernel statements has indeed evolvedover time; for example, the halt statement used to beconsidered a kernel statement, but is now consideredto be derived from loop and pause. We here adopt the

definition of kernel statements from the v5 standard [14].The process of expanding derived statements into equiv-alent, more primitive statements—which may or may notbe kernel statements—is also called dismantling. Whendesigning an instruction set architecture to implementEsterel-like programs, it would in principle suffice to justimplement the kernel statements—plus some additionsthat go beyond “pure” Esterel, such as valued signals,local registers, and support for complex signal and dataexpressions. However, we decided against that, in favorof an approach that includes some redundancy amongthe instructions to allow more compact and efficientobject code.

The resulting KEP ISA is summarized in Table 2,which also illustrates the relationship between Esterelstatements and the KEP instructions. The KEP uses a36-bit wide instruction word and a 32-bit data bus. Thecorresponding instruction encoding is described else-where [11]. The KEP ISA has the following character-istics:

• All the kernel Esterel statements, and some fre-quently used derived statements, can be mapped toKEP instructions directly. For the remaining Esterel(V5) statements there exist dismantling rules thatallow the compiler to generate KEP code, includinggeneral signal expressions (see [12]).

• The control statements are fully orthogonal, their be-havior matches the Esterel semantics in all executioncontexts.

• Common Esterel expressions, in particular all of thedelay expressions (i. e., standard, immediate, andcount delays), can be represented directly. Valuedsignals and other signal expressions, e. g., the previ-ous value of a signal and the previous status of asignal, are also directly supported.

• All instructions fit into one instruction word and canbe executed in a single instruction cycle, except forinstructions that contain count delay expressions,which need an extra instruction word and takeanother instruction cycle to execute.

The KEP also handles schizophrenic programscorrectly—if an Esterel statement must be executed mul-tiple times within a tick, the KEP simply does so [8].

4.1 General Code Generation

An Esterel program is compiled into a KEP assem-bler program (.kasm) by the KEP Esterel compiler(strl2kasm) [12], which uses the front-end of the CECfor parsing and module expansion. In a second step,the KEP assembler compiler (kasm2ko) [28] compiles theassembler program into binary machine code. This in-cludes, for example, the mapping of input/output/localsignals to signal registers, and the selection of appro-priate Watchers (see Sec. 5.2). The compiler also detectswhether the targeted KEP version does not have enoughresources available, e. g., not enough watchers or signals.

Page 6: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 6

Mnemonic, Operands Esterel Syntax Notes

INPUT[V] S input S [:integer] Input declaration.OUTPUT[V] S output S [:integer] Output declaration.SETV S, #data|reg Set the initial value of S; similar to EMIT, but does not affect presence.

PAR Prio, startAddr [, ID] [ Fork and join, specifying the address range and the priority for eachforked thread; see also Sec. 5.1.Optionally, one can specify the ID of the created thread.

PARE endAddr p1 || . . . || pn

JOIN Prio ]PRIO Prio Set the priority of the current thread.

[L|T][W]ABORT [n,] Sexp, endAddr [weak] abort . . . when n Sexp If Sexp is present, strongly/weakly abort the block ranging up toendAddr. The prefix [L|T] denotes the type of watcher to use, see alsoSec. 5.2. L: Local Watcher; T: Thread Watcher; none: general Watcher.[L|T][W]ABORTI Sexp, endAddr

[weak] abort . . . whenimmediate Sexp

SUSPEND[I] Sexp, endAddrsuspend . . . when[immediate] Sexp If Sexp is present, suspend the block ranging up to endAddr.

EXIT endAddr, startAddr trap T in . . . exit T . . . end trap Exit from a trap of specified scope. Unlike GOTO, check for concurrentEXITs and terminate enclosing ||.

PAUSE pauseWait for a signal. AWAIT TICK is equivalent to PAUSE.AWAIT [n,] Sexp await [n] Sexp

AWAIT[I] Sexp await [immediate] Sexp

CAWAITS awaitWait for several signals in parallel. A compound statement, bracketedby CAWAITS and CAWAITE, with one CAWAIT[I] instruction per signal.CAWAIT[I] S, addr case [immediate] Sexp do

CAWAITE end

SIGNAL S signal S in . . . end Initialize a local signal S.EMIT S [, {#data|reg}] emit S [(val)] Emit (valued) signal S.SUSTAIN S [, {#data|reg}] sustain S [(val)] Sustain (valued) signal S.PRESENT S, elseAddr present S then . . . end Jump to elseAddr if S is absent.

NOTHING nothing Do nothing.HALT halt Halt the program.GOTO addr loop . . . end loop Jump to addr.

EMIT TICKLEN, #data|reg Set the tick length.LOAD COUNT, n Load data for count delays.LOAD UINT32REG, #data32 Load a 32-bit immediate data to an intermediate register.

CLRC/SETC Clear/set carry bit.LOAD reg, n reg := n Load/store register.{SR[C]|SL[C]|NOTR} reg Shift (with carry)/negate.{ADD[C]|SUB[C]|MUL} reg, n +, -, * Add, subtract (with carry), multiply. Division must be emulated.{ANDR|ORR|NOTR|XORR} reg, n and, or, not, xor Logical operations.CMP[S] reg, n

>, <, <=, >=, <>, = Compare (with sign), branch conditionally.JW cond, addr

TABLE 2Overview of the KEP Esterel-type instruction set architecture. Esterel kernel statements are shown in bold. A signalexpression Sexp can be one of the following: 1. S: signal status (present/absent); 2. PRE(S): previous status of signal;3. TICK: always present. A numeral n can be one of the following: 1. #data: immediate data; 2. reg: register contents;

3. ?S: value of a signal; 4. PRE(?S): previous value of a signal.

The KEP assembler code for the EXAMPLE is shown inFig. 1b. Similar to the Esterel module, a KEP programalways starts at the input/output definition. Lines 1 to3 define input signals S and I, output signal O, andlocal signals A and R. The following EMIT TICKLEN, #20instruction assigns the tick length of this program asfixed 20 instruction cycles, as determined by the WCRTanalysis [13]; see also Sec. 5.5.

For the program body, the generation of the KEPassembler for the EXAMPLE module is in general straight-forward. As mentioned before, most common Esterelstatements can almost literally be translated into corre-sponding KEP instructions, and there are direct equiv-

alence rules for the remaining statements. In EXAMPLE,two dismantling rules, i. e., the every translation rule andthe sustain translation rule are employed. Note that dueto the signal dependencies involving R, we cannot usethe KEP’s atomic SUSTAIN instruction [12].

4.2 ConcurrencyA concurrent Esterel statement with n concurrent threadsjoined by the ||-operator is translated into KEP assembleras follows. First, threads are forked by a sequence of nPAR instructions and one PARE instruction. Each PARinstruction creates one thread and assigns it a non-negative priority Prio and a start address startAddr. The

Page 7: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 7

end address of the thread is either given implicitly by thestart address specified in a subsequent PAR instruction,or, if there is no more thread to be created, it is specifiedas endAddr in a PARE instruction. A thread address rangeranges from the start address to the end address. Thecode block for the last thread is followed by a JOINinstruction, which waits for the termination of all forkedthreads and concludes the concurrent statement. Theexample in Fig. 1b illustrates this: lines 12–21 constituteThread 1, Thread 2 spans lines 23–27, and the remaininginstructions belong to the main thread, Thread 0.

The main thread always has priority 1, which is thelowest possible priority. The priority of the other threadsis assigned when the thread is created (with the afore-mentioned PAR instruction), and can be changed subse-quently by executing a priority setting instruction (PRIO).A thread keeps its priority across delay instructions; thatis, at the start of a tick it resumes execution with thepriority it had at the end of the previous tick.

When a concurrent statement terminates, through reg-ular termination of all concurrent threads or via anexception/abort, the priorities associated with the ter-minated threads also disappear, and the priority of themain thread is restored to the priority upon entering theconcurrent statement.

The priority assigned during the creation of a threadand by a particular PRIO instruction is fixed. However,due to the non-linear control flow, it is still possible that agiven statement may be executed with varying priorities.In principle, the architecture would therefore allow afully dynamic scheduling. However, we here assumethat the given Esterel program can be executed with astatically determined schedule, which requires that thereare no cyclic signal dependencies. This is a commonrestriction, imposed for example by the Columbia Es-terel Compiler (CEC) [4]; see Li et al. [8] for furtherelaborations on causality/constructiveness and static vs.dynamic scheduling, and Lukoschus et al. [29] for anapproach to expand cyclic, yet constructive programsinto equivalent acyclic programs.

4.3 Handling Signal Dependencies

The signal-based communication employed by Estereldemands that signals have a unique presence/absencestatus throughout a tick, which is also why this strictlysynchronous semantics is sometimes referred to as fixed-point semantics. This implies that a signal status shouldonly be tested/read (e. g., with PRESENT) once all poten-tial emitters/writers (e. g., EMIT) have executed. Assuringthis also allows a proper reaction to signal absence. Wealso say that there is a dependency between the statementsthat (may) emit some signal (the dependency sources) andthe statements where that signal is tested (the dependencysinks) [30].

In a concurrent setting, this implies that threads mustbe executed in an order that obeys these dependencies.Hence, a non-trivial aspect of code generation is the

Fig. 3. Overview of the KEP architecture.

assignment of thread ids and priorities, as these governthe thread scheduling (see also Sec. 5.1). To understandhow these priorities are assigned, we consider the threadscheduling constraints that must be obeyed to run theEXAMPLE module faithful to Esterel’s semantics. The twothreads enclosed in the every block can communicateback and forth via the R and A signals, within a logicaltick, which makes thread scheduling non-trivial.

First, let us consider the dependency involving R. Itis clear which instruction is the dependency source: theEMIT R15 instruction. It is also obvious that the PRESENTR25 instruction is the dependency sink. This allows us toformulate the first dependency present in the EXAMPLEmodule: whenever the EMIT R15 and the PRESENT R25instructions are executed within the same logical tick,the execution of the EMIT must precede the execution ofthe PRESENT.

As for the dependency involving A, its dependencysource is the EMIT A26 in Thread 2. However, it is lessobvious which is the dependency sink, which we havedefined above as the “statements where these signals aretested.” Thread 1 reacts to A when it has entered theweak abort block, in that case A triggers the abort. Hence,whenever we execute a statement in that block rangingfrom WABORTI13 to the label A220, we should also watchfor the presence of A. However, closer inspection yieldsthat as this is a weak abort, it suffices to check at the endof each logical tick whether the block is aborted, thatis, whenever we execute a delayed instruction. In thiscase, the only delayed instruction in the abort block isthe PAUSE18, which therefore constitutes the dependencysink for A.

In the EXAMPLE, the first dependency is met by start-ing Thread 1 with a higher priority (3) than Thread 2(priority 2). The second dependency is met by the PRIO116 and PRIO 317 instructions, which hand control fromThread 1 to Thread 2 and back.

Dependency analysis at the Esterel level is decidable(unlike at the KASM level), but a proper analysis thatcovers all aspects of the Esterel language is rather in-tricate. A detailed treatment of this is found in thedocumentation of the strl2kasm compiler [12].

Page 8: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 8

5 THE KEP PROCESSOR ARCHITECTURE

NATURALLY, a given application has specific require-ments regarding the computational resources. In

the KEP, these resources include for example threadmanagement and preemption capabilities. The KEP de-sign is freely configurable, and scalable to arbitrary de-grees of concurrency, preemption nestings, signal counts,etc. A KEP can thus be configured specifically for a givenapplication. However, just as one may use a classicalprocessor with some fixed, conservatively large memoryresources for a range of applications, one may also usea “typical” KEP configuration without detailed priorknowledge about the application. For example, all thebenchmarks used here have been done with a fixed con-figuration, except when assessing hardware scalability.

The architecture of the KEP, shown in Fig. 3, isinspired by the three layers that constitute a reactiveprogram [2], i. e., the interface layer, the reactive kernel, andthe data handling layer. An Interface Block handles inputreception and output production. The classical computa-tions are performed by the Data Handling Block, consistingof the Register File, ALU and related components. Theimplementation of Esterel’s reactive control statementsrelies on the cooperation of the KEP’s Decoder & Controller,Reactive Block and Thread Block, which together form theReactive Core (RC). The RC contains dedicated hardwareunits to handle concurrency, preemption, exceptions, anddelays. In the following, we will briefly discuss each ofthese in turn.

5.1 Handling ConcurrencyTo implement concurrency, the KEP employs a multi-threaded architecture. Threads are scheduled in an inter-leaved fashion according to their statuses and dynam-ically changing priorities. The context of each threadis very light weight, it mainly consists of a programcounter (PC), its priority, and two status flags (seebelow). All data are shared, consistent with Esterel’sbroadcast semantics. The scheduler is very light-weight.Scheduling and context switching do not cost extrainstruction cycles, only changing a thread’s priority costsan instruction. The priority-based execution scheme al-lows on the one hand to enforce an ordering amongthreads that obeys the constraints given by Esterel’ssemantics (see Sec. 4.3), but on the other hand avoidsunnecessary context switches. If a thread lowers its pri-ority during execution but still has the highest priority,it simply keeps executing.

The Thread Block is responsible for managing threads,as illustrated in Fig. 4a. The SyncChart formalism [31]used here2 consists of hierarchical finite state machines,bold state borders denote initial state at each hierarchylevel. Upon program start, the main thread is enabled

2. Diagrams synthesized by the SyncChart editor of Kiel IntegratedEnvironment for Layout Eclipse Rich Client (KIELER), available at http://www.informatik.uni-kiel.de/rtsys/kieler/

(a) Status of the whole program, as managed by the ThreadBlock

(b) Execution status of a single thread

Fig. 4. Execution model of the KEP.

(forked), and the program is considered running. Subse-quently, for each instruction cycle (instrClock), the ThreadBlock decides which thread ought to be scheduled forexecution in this instruction cycle. It schedules the threadwith the highest priority among all active threads; ifthere are multiple threads that have highest priority,the thread with the highest id is scheduled. If there arestill enabled threads, but none is active anymore (seebelow), the next tick is started. If no threads are enabledanymore, the whole program is terminated.

The execution status of a single thread is illustratedin Fig. 4b. Two flags are needed to describe the statusof a thread. One flag indicates whether the thread isdisabled or enabled. Initially, only the main thread (Thread0) is enabled. Other threads become enabled wheneverthey are forked, and they become disabled again whenthey are joined after finishing all statements in theirbody, or when the preemption control tries to set itsprogram counter to a value which is out of the thread ad-dress range. The other flag indicates whether the threadshould still be scheduled within the current logical tick(the thread is active) or not (inactive). Active threads areinitially preempted, and become executing when they arescheduled.

The control of a thread can never exceed its addressrange, and if a thread still tries to do so, it will be ter-minated immediately. This mechanism allows a simplesolution for handling arbitrary preemption and concur-rency nests. For example, if a thread nest is aborted, eachthread will try to jump to the end of the abortion block,which will be beyond its address range, and hence thethread will be terminated.

5.2 Handling PreemptionAccording to the Esterel semantics, a preemption (abor-tion or suspension) is enabled when control is in its body,

Page 9: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 9

and disabled when control is outside of its body. When apreemption is enabled, the corresponding trigger signal iswatched and the module can react to the presence of it(is active). Otherwise, the signal does not cause preemp-tion. We call this scheme Inside/Outside Preemption RangeWatching (IOPRW).

The RC provides a configurable number of Watcherunits, which detect whether a signal triggering a pre-emption is present and whether the program counter(PC) is in the corresponding preemption body [7]. Ifduring execution of the program the PC is within thewatched range and the trigger signal is present, theWatcher triggers the corresponding changes in the controlflow. Note that Esterel allows the arbitrary nesting ofpreemption blocks of different types, for example astrong abortion may be nested within another weakabortion, which may be nested in a suspension. TheReactive Block is responsible for coordinating the Watcherblocks in a way that reflects the Esterel semantics. EachWatcher in the Reactive Block is assigned an index number,which also defines its priority. A Watcher can be overrid-den by another Watcher with higher priority. Consideringthe preemption nest structure, it becomes clear that thehigher priority preemption has a wider address rangewhich covers all the lower priority ones. Therefore, theearlier preemption instruction in a preemption nest willbe assigned to the higher priority Watcher. Note thatwith this approach, it is not necessary to continuouslyexecute special Checkabort instructions [17] that check onthe status of each watcher, meaning that the programdoes not slow down when entering a (nested) abortionblock. The Watcher modules operate autonomously, thusalso offering a certain type of concurrency (with true par-allelism) beyond the concurrency operator (implementedby interleaved multi-threading).

The KEP Watchers are designed to permit arbitrarynesting of preemptions, and also the combination withthe concurrency operator. However, in practice this of-ten turns out to be more general than necessary, andhence wasteful of hardware resources. Therefore, theKEP includes several versions of the Watcher, with acorrespondingly refined ISA. The least powerful, but alsocheapest variant is the Thread Watcher, which belongsto a thread directly, and can neither include concur-rent threads nor other preemptions. An intermediatevariant is the Local Watcher, which may include concur-rent threads and also preemptions handled by a ThreadWatcher, but cannot include another Local Watcher. Thedefault Watcher is the most general, which can handleall execution contexts.

5.3 Handling Exceptions

The KEP does not need an explicit equivalent to thetrap statement, but it provides an EXIT statement thatspecifies a trap scope. If a thread executes an EXITinstruction, it tries to perform a jump to the end of thetrap scope. If that address is beyond the range of the

1 module Trap1:2 output O;3 % −−−−−−− Thread 04 trap T1 in [5 % −−−−−−− Thread 16 nothing;7 ||8 % −−−−−−− Thread 29 trap T2 in [

10 % −−−−−−− Thread 311 exit T2;12 ||13 % −−−−−−− Thread 414 exit T1;15 % −−−−−−− Thread 216 ] end trap;17 emit O18 % −−−−−−− Thread 019 ] end trap;20 halt ;21 end module

(a) Trap1: Thread 2does not emit O, asexception T1 is thrownwithin Thread 2

1 module Trap2:2 output O;3 % −−−−−−− Thread 04 trap T1 in [5 % −−−−−−− Thread 16 exit T1;7 ||8 % −−−−−−− Thread 29 trap T2 in [

10 % −−−−−−− Thread 311 exit T2;12 ||13 % −−−−−−− Thread 414 nothing;15 % −−−−−−− Thread 216 ] end trap;17 emit O18 % −−−−−−− Thread 019 ] end trap;20 halt ;21 end module

(b) Trap2: Thread 2emits O, as exceptionT1 is thrown by con-current Thread 1

Fig. 5. Interaction of concurrency and exception handling.

current thread, control is not transferred directly to theend of the trap scope, but instead to the JOIN instructionat the end of the current thread. If other threads thatmerge at this JOIN are still active, they will still beallowed to execute within the current logical tick. This isdictated by the Esterel semantics, which specifies themas a variant of weak aborts.

It is possible that some concurrent threads were exe-cuted before executing the EXIT in the current tick andhave become inactive. Hence, they cannot directly re-spond to the exception because they will not be awokenin that tick. Such a situation will be detected at the joinpoint. If there is an active exception, all branch threadsthat wait at this join point will be set to disabled.

As for trap nests, the question is how to determinewhich one ought to take priority. The Esterel semanticsspecifies that outer traps take priority over inner ones.Hence, a simple idea is that the outer trap, which hasthe larger address range, will override the inner one.Unfortunately, this strategy is too simplistic to satisfy allcases. Compare the Trap1 and Trap2 examples in Fig. 5. Asillustrated there, one must not only consider trap nestingstructures, but also the concurrency relationships amongthe threads involved.

To handle this issue, a thread also keeps track ofits parent thread. If the PC resides inside of the scopeof a trap, the corresponding exception will be active.When several exceptions are all active, the KEP willcompare their parent thread to determine whether theybelong to a group of concurrent threads. If they havethe same parent thread, the outer trap will cancel theinner one, as in the Trap1 example. Otherwise the Esterel

Page 10: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 10

semantics requires to respond to the inner one first, asin Trap2. Furthermore, an inheritance strategy is usedwhen a trap crosses several threads. At the join point,if a thread finds there is an active exception which isemitted by its child thread, it will inherit this exceptionby adopting the parent thread of this exception as itsparent thread, i. e., it will propagate this exception asbeing emitted by itself. Once all joining threads havecompleted within the current tick, control is transferredto the end of the trap scope—unless there is anotherintermediate JOIN instruction. This process continuesuntil control has reached the thread that has declaredthe trap.

5.4 Handling DelaysDelay expressions are used in temporal statements, e. g.,await or abort. There are three forms of delay expressions:standard delays, immediate delays, and count delays. Adelay may elapse in some later tick, or in the same tickin the case of immediate delays. In the KEP, the awaitstatement is implemented via the AWAIT component.Every thread has its own await-component to store theparameters of the await-type statement, e. g., the valueof count delays. For the preemption statements, everyWatcher (including its trimmed-down derivatives) alsohas an independent counter to handle the delays.

5.5 The Tick Manager and Energy SavingOne of the distinguishing features of the KEP is theTick Manager. It can autonomously ensure that logicalticks, which for a deployed KEP correspond to one readinputs/compute outputs reaction cycle, are computed ata fixed rate, given a fixed clock frequency. Furthermore,the Tick Manager internally monitors timing violations.The strl2kasm compiler performs a conservative WCRTanalysis to determine the tick frequency, and timingviolations should never occur [13]. Hence, when usingthis compiler, the timing violation monitoring can beconsidered a redundant self-checking run-time mecha-nism to enhance robustness.

The Tick Manager is activated by setting the pre-definedvalued signal TICKLEN to a certain value. This is typi-cally at the beginning of the program, but may also bedone later at run time. The activation of the Tick Manageris optional; if TICKLEN is not set to any value, the KEPis in “free running” mode and starts computing the nexttick as soon as the current one is finished. This willtypically be faster on average; however, it will result ina jitter of the reaction time, which is often undesirablefrom a control perspective.

As the KEP instruction cycles require a fixed numberof clock cycles, providing a value for TICKLEN can alsoalleviate the need for the environment to provide a timerthat starts the ticks in regular intervals. An external timeris only needed if the clock rate is not stable enoughfor the application, e. g., due to energy-saving frequencyscaling.

1 INPUT D;2 OUTPUT3 A,B,C;4 EMIT5 TICKLEN, #3;6 EMIT A;7 EMIT B;8 PAUSE;9 EMIT A;

10 EMIT B;11 EMIT C;12 AWAIT D;

(a) KEPAssembler

(b) Timing diagram, for input D absent;TickWarn indicates a timing violation inthe second tick, fourth instruction

Fig. 6. The Tick Manager detects a timing overrun.

If a tick is finished in less than TICKLEN instructioncycles, the KEP idles for the remaining cycles beforestarting the next tick. If, on the other hand, a tick isnot finished within TICKLEN cycles, this is considereda tick length timing violation. Such timing violations aresignaled to the environment via a special signal, Tick-Warn, with a dedicated output pin; this signal remainspresent until the next reset of the processor. Furthermore,the self-monitoring makes it easy for the environmentto detect any timing violations. The WCRT analysisaims to determine a conservative, yet tight value for

TICKLEN. How a given value for TICKLEN translates toconcrete bounds on physical reaction times also dependson the interface with the environment, as describedelsewhere [10].

Fig. 6 illustrates the KEP timing behavior for a smallexample. In OVERRUN, the first EMIT statement sets

TICKLEN to three; in other words, the module claimsthat at most three KEP instructions suffice to computeone logical tick. For the input scenario of signal D alwaysabsent, the KEP produces the timing shown in Fig. 6b.In this example, the program is running on a KEP imple-mented on a Memec V2MB1000 Development Board at arate of Tosc = 41.67ns, the waveform was recorded by anAgilent 1683A Logic Analyzer. In the example, the firstlogical tick lasts three instruction cycles. In the secondtick, the controller has to execute five instructions untilthe AWAIT statement is executed. Hence, the Tick Managerwill set the TickWarn processor pin high when the fourthinstruction cycle is executed to indicate the tick lengthtiming violation.

For controller programming, the main goal of Esterel,the control signals tend to be more often absent thanpresent [14]. The condition of all signals being absentis called a blank event. Even though Esterel does allowto specify reactions for signal absence, typically veryfew instruction cycles are required for executing a blankevent (see also Sec. 6.5). To make the KEP benefit fromthis when less than TICKLEN instructions have beenexecuted and there are no instructions needed for thecurrent tick, i. e., all threads are inactive, an IDLE signal

Page 11: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 11

will be broadcast to gate the clock of other elements forpower reduction [32].

6 VALIDATION AND EXPERIMENTAL RESULTS

To validate the correctness of the KEP and its compilerand to evaluate its performance, we employ an evalua-tion platform whose structure is shown in Fig. 7. Theuser interacts via a host work station with an FPGAboard, which contains the KEP as well as some testinginfrastructure. First, an Esterel program is compiled intoan KEP object file (.ko) which is uploaded to the FPGAboard. Then, the host provides input events to the KEPand reads out the generated output events. This alsoyields the number of instructions per tick, from whichwe can deduce the WCRT for the given trace. Thismeasures performance, and allows to validate strl2kasm’sWCRT analysis with respect to its safety (never un-derestimates) and accuracy (as little overestimates aspossible). The input events can be either provided bythe user interactively, or they can be supplied via a .esifile. The host can also compare the output results to anexecution trace (.eso). We use EsterelStudio V5.0 to com-pute input/output trace files with state and transitioncoverage, except for the eight but benchmark, for whichthe generation of the transition coverage trace tookunacceptably long and we restricted ourselves to statecoverage. This comparison to a reference implementationproved a very valuable aid in validating the correctnessof both the KEP and its compiler. The regression testsuite currently includes well over 400 Esterel programs.

FPGA BoardUserstrl2kasm

.log

.eso

.kokasm2ko

.strl

Output

.esi InputTickGen

ProtocolGen

Host.kasm

EStudio

Tes

t Dri

ver

KEP Assembler

ProcessorKiel Esterel

Environment

Fig. 7. The KEP evaluation platform.

6.1 Concurrency AnalysisTo evaluate the performance of the KEP, we selectedeleven standard test cases, from the Estbench3 suite andother sources [5], [33]. These benchmarks are typicalEsterel applications, which not only contain reactivestatements, but also include arithmetic and logical datahandling. However, we leave out programs that makeuse of the pre operator, since the CEC currently does notsupport it [4].

To characterize each benchmark with respect to its useof concurrency constructs, Table 3 lists the counts and

3. http://www1.cs.columbia.edu/∼sedwards/software/estbench-1.0.tar.gz

Esterel source KEPModule LOC Threads Code CKAGName Count Max Max size Dep. Max PRIO

depth conc. (words) count priority instr’sabcd 160 4 2 4 164 36 3 30

abcdef 236 6 2 6 244 90 3 48eight but 312 8 2 8 324 168 3 66chan prot 42 5 3 4 62 4 2 10

reactor ctrl 27 3 2 3 34 5 1 0runner 31 2 2 2 27 0 1 0

example 20 2 2 2 28 2 3 6ww button 76 13 3 4 95 0 1 0greycounter 143 17 3 13 343 53 6 58

tcint 355 39 5 17 379 65 3 20mca200 3090 59 5 49 8650 129 11 190

TABLE 3Concurrency analysis of benchmarks.

Sheet1

Page 1

KEP (Peak) KEP (Blank)69 13 874 13 7

eight_but 74 13 770 28 12

reactor_ctrl 76 20 13runner 78 14 2example 77 25 9

81 13 478 44 33

Frequency [MHz] HW [Slices]Unrefined Refined Unrefined Refined

88,04 105,8 376 33284,33 103,7 506 370

eight_but 83,06 100,87 eight_but 623 418111,73 111,73 254 270

reactor_ctrl 107,68 111,37 reactor_ctrl 276 259runner 99,44 97,44 runner 354 320example 111,73 109,49 example 249 283

82,71 100,87 629 38387,64 106,17 470 41290,49 99,89 433 422

mca200 66,49 90,2 mca200 1253 798

MicroBlazeabcdabcdef

chan_prot

ww_buttongreycounter

abcd abcdabcdef abcdef

chan_prot chan_prot

ww_button ww_buttongreycounter greycountertcint tcint

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

0 10 20 30 40 50 60 70 80 90

MicroBlaze

KEP (Peak)

KEP (Blank)

Energy Consumption [mW]

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

tcint

mca200

0 20 40 60 80 100 120

Unrefined

Refined

Frequency [MHz]

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

tcint

mca200

0 200 400 600 800 1000 1200 1400

Unrefined

Refined

HW [Slices]

Sheet1

Page 1

KEP (Peak) KEP (Blank)69 13 874 13 7

eight_but 74 13 770 28 12

reactor_ctrl 76 20 13runner 78 14 2example 77 25 9

81 13 478 44 33

Frequency [MHz] HW [Slices]Unrefined Refined Unrefined Refined

88,04 105,8 376 33284,33 103,7 506 370

eight_but 83,06 100,87 eight_but 623 418111,73 111,73 254 270

reactor_ctrl 107,68 111,37 reactor_ctrl 276 259runner 99,44 97,44 runner 354 320example 111,73 109,49 example 249 283

82,71 100,87 629 38387,64 106,17 470 41290,49 99,89 433 422

mca200 66,49 90,2 mca200 1253 798

MicroBlazeabcdabcdef

chan_prot

ww_buttongreycounter

abcd abcdabcdef abcdef

chan_prot chan_prot

ww_button ww_buttongreycounter greycountertcint tcint

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

0 10 20 30 40 50 60 70 80 90

MicroBlaze

KEP (Peak)

KEP (Blank)

Energy Consumption [mW]

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

tcint

mca200

0 20 40 60 80 100 120

Unrefined

Refined

Frequency [MHz]

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

tcint

mca200

0 200 400 600 800 1000 1200 1400

Unrefined

Refined

HW [Slices]

Fig. 8. Clock frequencies and hardware costs with andwithout ISA refinement.

depths of them. For the KEP, the table shows the numberof dependencies found, the used number of prioritylevels (the KEP provides up to 255), and the numberof used PRIO instructions. In most cases, the maximumpriority used is three or less, indicating relatively fewpriority changes per tick. For example, eight buttons has168 dependencies, but the maximum priority used is 3.On the other hand, greycounter, with 53 dependencies,requires a maximum priority of 6.

6.2 Preemption Analysis and Watcher ComparisonTypically, the preemption constructs tend to be sequen-tial or concurrent rather than being nested. For example,the mca200 employs 64 preemption statements, however,the maximum depth of the preemption nest is just 4. Asit turns out, most of the preemptions can be handled bythe cheapest Watcher type, the Thread Watcher.

To assess the savings of watcher refinement, we syn-thesized different variants of the Reactive Core for eachbenchmark, with and without watcher refinement, seeFig. 8. In most cases, having refined watcher typesavailable uses less resources and allows to increase thefrequency, and its benefits increase with the scale ofthe modules. For the industry size mca200 benchmark,refined watchers reduce hardware usage by 36%, andraise the maximum frequency also by 36%. Anotherbenefit of the refined preemption handling architecture

Page 12: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 12

Esterel MicroBlaze KEPModule LOC Code+Data (b) Code (w) Code+Data (b)Name V5 V7 CEC abs. rel. abs. rel.

[1] [2] (best) [3] [3]/[1] [4] [4]/[2]abcd 160 6680 7928 7212 164 1.03 738 0.11

abcdef 236 9352 9624 9220 244 1.03 1098 0.12eight but 312 12016 11276 11948 324 1.04 1458 0.13chan prot 42 3808 6204 3364 62 1.48 279 0.08

reactor ctrl 27 2668 5504 2460 34 1.26 153 0.06runner 31 3140 5940 2824 27 0.87 121 0.04

example 20 2480 5196 2344 28 1.4 126 0.05ww button 76 6112 7384 5980 95 1.25 427 0.07greycounter 143 7612 7936 8688 343 2.4 1549 0.2

tcint 355 14860 11376 15340 379 1.07 1707 0.15mca200 3090 104536 77112 52998 8650 2.8 39717 0.75

TABLE 4Memory usage comparison between KEP and

MicroBlaze implementations. “(b)” refers tomeasurements in bytes, “(w)” to words.

is that it keeps the performance stable. If there are norefined watcher types, there is a 40% gap between thehighest (112MHz) and the lowest (66MHz) frequency.With refined watchers available, the frequency only de-grades by about 19% (from 112MHz to 90MHz).

6.3 Memory UsageTable 4 compares executable code size and RAM usagebetween the KEP and the MicroBlaze implementations.For the MicroBlaze, we used three different Esterel com-pilers (V5, V7, and CEC), and compared ourselves to thebest of these. To assess the size of the KEP code relatedto the Esterel source, we compare the code size in wordswith the Esterel Lines of Code (LOC, before dismantling,without comments). We notice that the KEP code is verycompact, with a word count close to the Esterel source.For comparison with the MicroBlaze, we compare thesize of Code + Data, in bytes, and notice that the KEPcode is typically an order of magnitude smaller thanthe MicroBlaze code. Furthermore, the more compactstate encoding reduces the data memory requirements.The KEP implementation results on average in an 83%reduction of memory usage (codes and RAM size) whencompared with the best result of the MicroBlaze imple-mentation. As for the mca200, the memory reduction ofthe KEP implementation is not so dramatic as that ofother cases. The reason is that the mca200 contains lotsof data handling—which is not a very strong point ofthe KEP.

6.4 Execution TimesThe improvement in execution time of the KEP im-plementation is shown in Fig. 9. Compared with thebest result of the MicroBlaze implementations, the KEPtypically obtains more than 4x speedup for the WCRT,and more than 5x for the Average Case Reation Time

ab

cd

ab

cdef

eig

ht_

bu

t

chan

_pro

t

react

or_

ctrl

run

ner

exam

ple

ww

_bu

tton

gre

yco

un

ter

tcin

t

mca

20

0

1

10

100

1000

10000

100000

WCRT-V5

ACRT-V5

WCRT-V7

ACRT-V7

WCRT-CEC

ACRT-CEC

WCRT-KEP

ACRT-KEP

React

ion T

imes

(Clo

ck C

ycl

es)

Fig. 9. The worst-/average-case reaction times (in clockcycles) for the KEP and MicroBlaze implementations.Note the logarithmic scale.

Sheet1

Page 1

KEP (Peak) KEP (Blank)69 13 874 13 7

eight_but 74 13 770 28 12

reactor_ctrl 76 20 13runner 78 14 2example 77 25 9

81 13 478 44 33

Frequency [MHz] HW [Slices]Unrefined Refined Unrefined Refined

88,04 105,8 376 33284,33 103,7 506 370

eight_but 83,06 100,87 eight_but 623 418111,73 111,73 254 270

reactor_ctrl 107,68 111,37 reactor_ctrl 276 259runner 99,44 97,44 runner 354 320example 111,73 109,49 example 249 283

82,71 100,87 629 38387,64 106,17 470 41290,49 99,89 433 422

mca200 66,49 90,2 mca200 1253 798

MicroBlazeabcdabcdef

chan_prot

ww_buttongreycounter

abcd abcdabcdef abcdef

chan_prot chan_prot

ww_button ww_buttongreycounter greycountertcint tcint

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

0 10 20 30 40 50 60 70 80 90

MicroBlaze

KEP (Peak)

KEP (Blank)

Energy Consumption [mW]

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

tcint

mca200

0 20 40 60 80 100 120

Unrefined

Refined

Frequency [MHz]

abcd

abcdef

eight_but

chan_prot

reactor_ctrl

runner

example

ww_button

greycounter

tcint

mca200

0 200 400 600 800 1000 1200 1400

Unrefined

Refined

HW [Slices]

Fig. 10. Energy consumption of KEP and MicroBlaze.

(ACRT). For a fair comparison, the time is measuredbased on the system clock. If the comparison is based onthe instruction cycles, the KEP will achieve 12x speedupfor the WCRT and more than 15x for the ACRT.

The MicroBlaze uses several levels of memory. Herewe employed an FPGA chip with a large on-chip mem-ory to implement the MicroBlaze system. Hence, allof the MicroBlaze programs could be loaded into theon-chip memory to make sure that the memory accesstime is minimal. The MicroBlaze implementation bene-fits from this, because if the implementation is based onan FPGA which has smaller scale on-chip memory, theKEP program is still likely to fit into the on-chip memory.

6.5 Power UsageTo compare the energy consumptions, we choose the Xil-inx 3S200-4ft256 as FPGA platform. This requires 37mWas quiescent power for the chip itself. The MicroBlaze isassumed to run at 50MHz, and the peak power of the Mi-croBlaze is calculated by the frequency and the hardwareresources of the MicroBlaze system via Xilinx WebPowerVersion 8.1.01. Based on the findings presented in Fig. 9,we calculate the minimal clock frequencies of the KEPto achieve the same WCRT of corresponding MicroBlazefor each benchmark, then calculate the peak power ofthe KEP implementation.

For most blank events, the action of an Esterel moduleis very simple—it tests the presence of awaited sig-nals, and then finishes this tick because those awaitstatements are not terminated. For the KEP, since the

Page 13: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 13

elapsed instruction cycle count for those actions is farfrom the assigned tick length, the system will turn tothe idle state for saving power. Although the MicroBlazehas no low-power operating mode that can be used toconserve processor energy (e. g., like the wait-state of thePowerPC405), we still assume it can use some additionalcircuit to manage its power usage by blocking its clockto satisfy the fixed tick length feature. Note that the realtick length for a blank event depends on the state ofthe program of the previous tick. The average powerusage of blank events is also estimated by an extendedesi file, which inserts a blank event between every twooriginal ticks. Fig. 10 shows that the KEP reduces energyusage on average by 75%. The reduction becomes evenmore significant for blank events. Energy consumptionsof the MicroBlaze system are similar for different events.However, the reactive architecture makes the powerusage of the KEP 52% lower than its peak power. Hence,in this case, the KEP achieves 86% power savings.

7 CONCLUSIONS & OUTLOOK

W E have presented the KEP, a multi-threaded pro-cessor, which allows the efficient, predictable

execution of concurrent Esterel programs. The multi-threaded approach poses specific compilation chal-lenges, in particular in terms of scheduling, and we havepresented an analysis of the task at hand as well as animplemented solution. As the experimental comparisonwith a 32-bit commercial RISC processor indicates, theapproach presented here has advantages in terms ofmemory use, execution speed, and energy consumption.An Esterel description of the KEP is available as opensource4.

To accurately capture the Esterel semantics in a re-active processing setting is not trivial, as has becomeevident from earlier (failed) attempts. So far, we arerelying on extensive experimental validation to ensurecorrectness, as described in Sec. 6; a formal treatment ofthe reactive processing approach is underway [34].

It would be interesting to implement a virtual machinethat has an instruction set similar to the KEP; see also therecent proposal by Plummer et al. [35]. Furthermore, theunderlying model of computation, with threads keepingtheir individual program counters and a priority basedscheduling, can also be emulated by classical processors.In fact, the recent SyncCharts in C (SC) proposal [27]defines a set of operators for reactive control flow thatfollow the KEP’s execution model quite closely. Themain differences are that SC does not support traps (asimplification of SyncCharts compared to Esterel) andthat the thread id/priority mechanism is reduced toonly ids, which serve as priorities as well. Interestingly,the current reference implementation for SC5 can takeadvantage of machine instructions for arithmetic opera-tions that are typically not available at the program level.

4. http://www.informatik.uni-kiel.de/rtsys/kep/5. http://www.informatik.uni-kiel.de/rtsys/sc/

Specifically, on the x86, the scheduler uses a bit vectorto represent active threads and the bsr (Bit Scan Reverse)assembler instruction to determine the active thread withthe highest id.

The SC operators are directly embedded in regular C,no prior compilation or OS support is necessary; hence,in a way, SC uses the ISA of a traditional processor ina reactive processing manner. At least in terms of codecompactness, this approach could be competitive withthe custom reactive processing approach; to what extentit can match reactive processing in terms of performance,power consumption and predictability is still open forinvestigation.

ACKNOWLEDGMENTS

This work has benefited from discussions with manypeople, in particular Michael Mendler and Stephen Ed-wards. We would also like to thank Marian Boldt fordeveloping the strl2kasm compiler, Ozgun Bayramogluand Hauke Fuhrmann for the KIELER visualization, andthe reviewers for providing very valuable feedback onthis manuscript.

REFERENCES

[1] G. Berry, “Esterel on Hardware,” Philosophical Transactions of theRoyal Society of London, vol. 339, pp. 87–104, 1992.

[2] G. Berry and G. Gonthier, “The Esterel synchronous program-ming language: Design, semantics, implementation,” Science ofComputer Programming, vol. 19, no. 2, pp. 87–152, 1992.

[3] S. A. Edwards, “High-Level Synthesis from the SynchronousLanguage Esterel,” Proceedings of the International Workshop of Logicand Synthesis (IWLS), Jun. 2002.

[4] S. A. Edwards and J. Zeng, “Code generation in the ColumbiaEsterel Compiler,” EURASIP Journal on Embedded Systems, vol.Article ID 52651, 31 pages, 2007.

[5] F. Balarin, P. Giusto, A. Jurecska, C. Passerone, E. M. Sentovich,B. Tabbara, M. Chiodo, H. Hsieh, L. Lavagno, A. Sangiovanni-Vincentelli, and K. Suzuki, Hardware-Software Co-Design of Embed-ded Systems, The POLIS Approach. Kluwer Academic Publishers,Apr. 1997.

[6] Z. A. Salcic, P. S. Roop, M. Biglari-Abhari, and A. Bigdeli, “RE-FLIX: A processor core for reactive embedded applications,” inProceedings of the 12th International Conference on Filed ProgrammableLogic and Applications (FPL-02), ser. Lecture Notes in ComputerScience, M. Glesner, P. Zipf, and M. Renovell, Eds., vol. 2438.Montpellier, France: Springer, Sep. 2002, pp. 945–945.

[7] X. Li and R. von Hanxleden, “A concurrent reactive Esterelprocessor based on multi-threading,” in Proceedings of the 21stACM Symposium on Applied Computing (SAC’06), Special TrackEmbedded Systems: Applications, Solutions, and Techniques, Dijon,France, April 23–27 2006.

[8] X. Li, M. Boldt, and R. von Hanxleden, “Mapping Esterel ontoa multi-threaded embedded processor,” in Proceedings of the 12thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS’06), San Jose, CA, Oc-tober 21–25 2006.

[9] S. Yuan, S. Andalam, L. H. Yoong, P. S. Roop, and Z. Salcic,“STARPro—a new multithreaded direct execution platform forEsterel,” in Proceedings of Model Driven High-Level Programming ofEmbedded Systems (SLA++P’08), Budapest, Hungary, Apr. 2008.

[10] X. Li, J. Lukoschus, M. Boldt, M. Harder, and R. von Hanxleden,“An Esterel Processor with Full Preemption Support and its WorstCase Reaction Time Analysis,” in Proceedings of the InternationalConference on Compilers, Architecture, and Synthesis for EmbeddedSystems (CASES’05). San Francisco, CA, USA: ACM Press, Sep.2005, pp. 225–236.

Page 14: IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, …rtsys.informatik.uni-kiel.de/~biblio/downloads/papers/tc07.pdf · Multi-Threaded Reactive Programming— The Kiel Esterel Processor

IEEE TRANSACTIONS ON COMPUTERS, VOL. XXX, NO. XXX, MONTH YEAR 14

[11] X. Li, “The Kiel Esterel Processor: A multi-threaded reactiveprocessor,” Ph.D. dissertation, Christian-Albrechts-Universitat zuKiel, Faculty of Engineering, Jul. 2007, http://eldiss.uni-kiel.de/macau/receive/dissertation diss 00002198.

[12] M. Boldt, “A compiler for the Kiel Esterel Processor,” Diplomathesis, Christian-Albrechts-Universitat zu Kiel, Department ofComputer Science, Dec. 2007, http://rtsys.informatik.uni-kiel.de/∼biblio/downloads/theses/mabo-dt.pdf.

[13] M. Boldt, C. Traulsen, and R. von Hanxleden, “Worst case reactiontime analysis of concurrent reactive programs,” Electronic Notes inTheoretical Computer Science, vol. 203, no. 4, pp. 65–79, Jun. 2008,proceedings of the International Workshop on Model-DrivenHigh-Level Programming of Embedded Systems (SLA++P’07),March 2007, Braga, Portugal.

[14] G. Berry, The Esterel v5 Language Primer, Version v5 91, Centrede Mathematiques Appliquees Ecole des Mines and INRIA,06565 Sophia-Antipolis, 2000, ftp://ftp-sop.inria.fr/esterel/pub/papers/primer.pdf.

[15] D. Potop-Butucaru, S. A. Edwards, and G. Berry, Compiling Esterel.Springer, May 2007.

[16] R. von Hanxleden, X. Li, P. Roop, Z. Salcic, and L. H. Yoong,“Reactive processing for reactive systems,” ERCIM News, vol. 67,pp. 28–29, Oct. 2006.

[17] P. S. Roop, Z. Salcic, and M. W. S. Dayaratne, “Towards DirectExecution of Esterel Programs on Reactive Processors,” in 4thACM International Conference on Embedded Software (EMSOFT’04),Pisa, Italy, Sep. 2004.

[18] X. Li and R. von Hanxleden, “The Kiel Esterel Processor - a semi-custom, configurable reactive processor,” in Synchronous Program-ming - SYNCHRON’04, ser. Dagstuhl Seminar Proceedings, S. A.Edwards, N. Halbwachs, R. v. Hanxleden, and T. Stauner, Eds.,no. 04491. Internationales Begegnungs- und Forschungszentrum(IBFI), Schloss Dagstuhl, Germany, 2005, http://drops.dagstuhl.de/opus/volltexte/2005/159.

[19] Z. A. Salcic, D. Hui, P. S. Roop, and M. Biglari-Abhari, “REMIC:Design of a reactive embedded microprocessor core.” in Proceed-ings of the 10th Asia and South Pacific Design Automation Conference(ASP-DAC), Shanghai, China, 2005, pp. 977–981.

[20] M. W. S. Dayaratne, P. S. Roop, and Z. Salcic, “Direct Executionof Esterel Using Reactive Microprocessors,” in Proceedings of Syn-chronous Languages, Applications, and Programming (SLAP’05), Apr.2005.

[21] L. H. Yoong, P. Roop, and Z. Salcic, “Compiling Esterel fordistributed execution,” in Proceedings of Synchronous Languages,Applications, and Programming (SLAP’06), Vienna, Austria, Apr.2006.

[22] S. Gadtke, C. Traulsen, and R. von Hanxleden, “HW/SW Co-Design for Esterel Processing,” in Proceedings of the InternationalConference on Hardware-Software Codesign and System Synthesis(CODES+ISSS’07), Salzburg, Austria, Sep. 2007.

[23] C. Traulsen and R. von Hanxleden, “Reactive parallel processingfor synchronous dataflow,” in Proceedings of the 25th SymposiumOn Applied Computing (SAC’10), Special Track Embedded Systems:Applications, Solutions, and Techniques, Sierre, Switzerland, Mar.2010.

[24] S. A. Edwards and E. A. Lee, “The case for the Precision Timed(PRET) machine,” in Proceedings of the 44th Design AutomationConference, San Diego, CA, USA, Jun. 2007, pp. 264–265.

[25] P. S. Roop, S. Andalam, R. von Hanxleden, S. Yuan, andC. Traulsen, “Tight WCRT analysis for synchronous C programs,”in Proceedings of the International Conference on Compilers, Archi-tecture, and Synthesis for Embedded Systems (CASES’09), Grenoble,France, Oct. 2009.

[26] M. Mendler and M. Pouzet, “Uniform and modular compositionof data-flow & control-flow in the lazy λ-calculus,” Presentation atthe International Open Workshop on Synchronous Programming(SYNCHRON’08), Aussois, France, Dec. 2008.

[27] R. von Hanxleden, “SyncCharts in C—A Proposal for Light-Weight, Deterministic Concurrency,” in Proceedings of the Inter-national Conference on Embedded Software (EMSOFT’09), Grenoble,France, Oct. 2009.

[28] X. Li and R. von Hanxleden, “KEP2 (Kiel Esterel Processor2): The Esterel Processor,” Christian-Albrechts-Universitat Kiel,Department of Computer Science, Technical Report 0506, Apr.2005.

[29] J. Lukoschus and R. von Hanxleden, “Removing cycles in Esterelprograms,” EURASIP Journal on Embedded Systems, Special Issue on

Synchronous Paradigms in Embedded Systems, 2007, http://www.hindawi.com/getarticle.aspx?doi=10.1155/2007/48979.

[30] J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The programdependence graph and its use in optimization,” ACM Transactionson Programming Languages and Systems, vol. 9, no. 3, pp. 319–349,1987, http://doi.acm.org/10.1145/24039.24041.

[31] C. Andre, “SyncCharts: A visual representation of reactive be-haviors,” I3S, Sophia-Antipolis, France, Tech. Rep. RR 95–52,rev. RR 96–56, Rev. April 1996, http://www.i3s.unice.fr/∼andre/CAPublis/SYNCCHARTS/SyncCharts.pdf.

[32] L. Benini, A. Bogliolo, and G. D. Micheli, “A survey of designtechniques for system-level dynamic power management,” inReadings in hardware/software co-design. Norwell, MA, USA:Kluwer Academic Publishers, 2002, pp. 231–248.

[33] G. Berry, The Esterel v5 Language Primer, 1999, ftp://ftp-sop.inria.fr/meije/esterel/papers/primer.ps.

[34] C. Traulsen, “The Kiel Reactive Processor (KReP),” Ph.D. disser-tation, Christian-Albrechts-Universitat zu Kiel, Faculty of Engi-neering, in preparation.

[35] B. Plummer, M. Khajanchi, and S. A. Edwards, “An Esterel virtualmachine for embedded systems,” in International Workshop onSynchronous Languages, Applications, and Programming (SLAP’06),Vienna, Austria, Mar. 2006.

Xin Li has obtained his B. S. in MeasurementTechnology and Instrument Design in 1996 andhis M. Sc. in Mechanical and Electronic Engi-neering in 2000, both from Wuhan University ofTechnology, P. R. China. Until 2003 he was lec-turer in the School of Mechanical and ElectronicEngineering at Wuhan. He completed his Ph. D.at the Dept. of Computer Science of Christian-Albrechts-Universita Kiel, in 2007. He now iswith the Laboratory for Advanced Research inComputing Technology and Compilers in the

Dept. of Electrical and Computer Engineering at the University ofMinnesota. His current research interests focus on embedded systemdesign and reactive processing. He holds two patents.

Reinhard von Hanxleden studied ComputerScience and Physics at the Christian-Albrechts-Universitat (CAU) Kiel from 1985 to 1988 andcompleted his M. Sc. in CS at The Pennsylva-nia State University in 1989. He performed hisdoctoral studies at the CS Dept./Center for Re-search on Parallel Computation at Rice Univer-sity in the field of data-parallel compilation until1994, with the Ph. D. conferred in 1995. He sub-sequently joined Daimler Chrysler research, un-til 2000 with the Responsive Systems in Berlin,

afterwards with Airbus in Hamburg and Toulouse. He joined the CAU CSfaculty in 2001 as head of the real-time/embedded systems group. Hisinterests include model-based design, concurrency and synchronousprogramming. He is currently involved in the IEEE standardization ofthe Esterel language and is a member of ACM, IEEE, and GI.