cross-abstraction functional veriﬁcation and performance ...gabe/papers/mpda_ieee_tii_2009.pdf ·...

1

Cross-abstraction Functional Verification andPerformance Analysis of

Chip Multiprocessor DesignsGabor Madl, Student Member, IEEE, Sudeep Pasricha, Member, IEEE,

Nikil Dutt, Fellow, IEEE, and Sherif Abdelwahed, Senior Member, IEEE

Abstract—This paper introduces the Cross-abstractionReal-time Analysis (CARTA) framework for the model-based functional verification and performance estimationof chip multiprocessors (CMP) utilizing bus matrix (cross-bar switch) interconnection networks. We argue that theinherent complexity in CMP designs requires the syner-gistic use of various models of computation to efficientlymanage the trade-offs between accuracy and complexity.Our approach builds on domain-specific modeling lan-guages (DSMLs) driving an open-source tool-chain thatprovides a cross-abstraction bridge between the finite statemachine (FSM), discrete event (DE) and timed automata(TA) models of computation, and utilizes multiple modelcheckers to analyze formal properties at the cycle-accurateand transaction-level abstractions. The cross-abstractionanalysis exploits accuracy for functional verification, andachieves significant speedups for performance estimationwith marginal accuracy loss. We demonstrate results onan industrial strength networking CMP design utilizing abus matrix interconnection network. To the best of ourknowledge, the CARTA framework is the first model-basedtool-chain that utilizes multiple abstractions and modelcheckers for the comprehensive and formal functional veri-fication, performance estimation, and real-time verificationof bus matrix based CMP designs.

Index Terms—Real-time, performance analysis, modelchecking, chip multiprocessor, bus matrix interconnect.

I. INTRODUCTION

Modern chip multiprocessors (CMPs) consist ofseveral heterogeneous components such as pro-grammable processors, custom logic blocks, memo-ries, and peripherals, all of which are connected to-gether via an interconnection network. These CMPdesigns must satisfy increasingly complex perfor-mance constraints for emerging applications. This is

This work was partially supported by the NSF grants CCR-0225610, ACI-0204028, CNS-0615438, CNS-0613971, NSF SODgrant CNS-0804230, a CPCC Fellowship, and a grant by FujitsuLaboratories of America.

becoming more and more challenging for system de-signers because of the large number of componentson a chip that have multifaceted dependencies andinteractions with each other. Model-based designis an emerging paradigm that aims to manage thiscomplexity by systematically capturing key proper-ties of CMPs, such as their structure, parameters ofindividual components, and their interactions.

This paper introduces the model-based Cross-abstraction Real-time Analysis (CARTA) frameworkfor the cross-abstraction analysis of CMP designs,that combines the concepts of component-baseddesign, domain-specific modeling, simulations, andmodel checking to provide a unified framework forthe functional and performance analysis of CMPswith bus matrix interconnection networks. The de-sign flow aims to address three major challenges inthe formal analysis of CMP designs: (1) functionalverification - to ensure that the system will notbe trapped in a deadlock or livelock state, (2)performance estimation - in order to obtain tightbounds on the worst case performance of the CMPdesign, and (3) verification of real-time properties -to prove whether individual deadlines for tasks andperformance estimates hold for the CMP design.

The proposed approach builds on several meth-ods originally developed for the real-time veri-fication and performance estimation of software-intensive distributed real-time embedded (DRE) sys-tems. While CMP designs themselves can be viewedas DRE systems, the communication subsystem inCMP designs has a major impact on both design andanalysis. Unlike software-intensive DRE systemsthat communicate over packet-switched networks,CMP designs often utilize complex bus matrix ar-chitectures, where access to the bus is managed byan arbiter (or several arbiters). Bus protocols andarbitration policies have a major impact on key de-

2

sign parameters such as throughput and delays, andpresent new challenges for functional verification. Inparticular, deadlock-freedom and livelock-freedomis not guaranteed by bus protocols, but is a keyrequirement for designers. A key contribution ofthis paper is to show how methods for the analysisof DRE systems can be adapted to CMP designsutilizing fully connected bus matrix interconnects,and how point arbitration policies can be expressedby the non-preemptive scheduling of task graphs.The contributions of this paper are focused on thefollowing areas:

• We describe the CARTA framework for thecross-abstraction analysis of CMP designs, anddefine a model-based design flow for the anal-ysis of CMP designs in Section II.

• We utilize the ALDERIS domain-specific mod-eling language (DSML) introduced in [1] forthe formal modeling of CMPs with bus ma-trix interconnection networks. ALDERIS wasproposed as a DSML for the modeling ofsoftware-intensive DRE systems. In this paperwe extend the use of ALDERIS to complex busmatrix architectures commonly used in modernCMPs. The novelty in this paper is to showhow complex bus designs can be abstractedout as transaction-level models, and how wetranslate resource allocation to the ALDERIStask graph model. The ALDERIS DSML isused as a high-level specification of the CMP,and directly drives the functional verification,performance estimation, and performance ver-ification methods. This is described in moredetail in Section IV.

• We describe an approach for the functionalverification of CMPs using the ARM Ad-vanced Microcontroller Bus Architecture Ad-vanced High-speed Bus (AMBA AHB) bus ma-trix interconnection networks (also referred toas Multi-layer AHB and AHB-Lite). We extendearlier work on the verification of simple busdesigns [2] to complex bus matrix structures.We use FSM models of the AMBA AHB pro-tocol with cycle-accurate timing information toformally verify deadlock-freedom. We describethis method in Section V.

• We utilize a discrete event simulation (DES)-based formal performance estimation methoddescribed in [3] to estimate the real-time per-

formance of a CMP design. By switching tomore abstract representation of the design, weachieve significant speedups and scalability in-crease with negligible accuracy loss. Section VIdescribes results for this work.

• We build on our earlier work on the real-timeverification of distributed real-time embedded(DRE) systems [4], [5], [6], [7] to propose amethod to verify estimates for real-time per-formance using timed automata model check-ers. By incorporating timed automata modelcheckers in the design flow, we can provethat the model satisfies the performance es-timates achieved by the DES-based method.Section VII presents this timed automata-basedreal-time verification method.

• Finally, the major contribution of this paper isthat it tightly integrates all of the above stepsin the CARTA model-driven design analysisframework. This approach provides a way touse time-accurate models for functional veri-fication, and more abstract representations forscalable performance estimation. By adoptingthe right abstractions to different steps of theanalysis, we can significantly increase scalabil-ity when needed, while also retaining accuracyfor steps where it is needed. We compare thescalability of the proposed methods in Sec-tion VIII.

II. THE CARTA FRAMEWORK

Fig. 2 shows our proposed cross-abstraction real-time analysis framework that provides a way toutilize the right level of abstraction for each analysismethod. The three challenges addressed by our pro-posed analysis framework – functional verification,performance estimation, real-time verification – re-quire different approaches, models of computation,abstractions, and tools for formal analysis. We pickthe model of computation and abstraction levelfor each analysis method that provides the mostefficient analysis. Finding the right abstraction is akey challenge for the model-based analysis of CMPdesigns.

For functional verification, it is important toaccurately capture the signals in the bus matrixinterconnection network. If the analysis model is tooabstract, certain problems can remain undetected inthe design phase. In our framework, we make use

3

Fig. 1. The CARTA Model-based Analysis Framework

of a cycle-accurate finite state machine (FSM) modelof computation (MoC) to capture the bus protocol,and arbitration algorithms. Using this model, we canefficiently check for all combinations of commu-nication signals that satisfy the protocol to verifythat no deadlocks and starvations occur in the CMPdesign.

The effectiveness of performance estimation andreal-time verification, on the other hand, is primarilylimited by scalability issues. Cycle-accurate FSMmodels quickly lead to the state space explosionproblem, when used for performance estimation, orreal-time verification. Therefore, we need to raisethe abstraction to transaction-level formal models.Transaction-level abstractions are well-establishedin the domain of simulation-based design explo-ration [8]. Transaction-level formal models in ourcontext are event-driven, and communicate via asyn-chronous message passing. Timing information inthe models is captured as time intervals associatedwith events. We apply the transaction-level mod-eling concept to increase the scalability of formalmethods in the CARTA framework.

The CARTA framework builds on various mod-eling and analysis tools created by the researchcommunity and the authors. We utilize the GenericModeling Environment (GME) [9] as a model-ing tool for designing ALDERIS [1] models (http://alderis.ics.uci.edu), as described in Section IV.Domain-specific modeling languages (DSMLs) inour approach are defined by the concept of model-integrated computing [10], that promotes the useof meta-modeling to create custom modeling lan-

guages which are a good fit for a specific problemdomain. ALDERIS captures key properties of a CMPdesign, such as computation units, inter-componentdependencies, the mapping of tasks to HW or SW,execution times and delays, and key constraints thatthe design has to satisfy. CMP designs are specifiedusing the ALDERIS DSML, and drive the CARTAmodel-based analysis framework.

ALDERIS models of CMP designs are exe-cutable discrete event (DE) models with formalsemantics. These models can be transformed intofinite state machine (FSM) models with cycle-accurate timing accuracy. We utilize the open-sourceDistributed Real-time Embedded Analysis Method(DREAM) tool (http://dre.sourceforge.net) for theperformance estimation of ALDERIS models, andthe NuSMV [11] model checker for the functionalanalysis of the FSM models. The DREAM tool alsogenerates a direct timed automata (TA) represen-tation of ALDERIS models, that can be analyzedby the UPPAAL [12] and Verimag IF [13] timedautomata model checkers.

A. Model-based Design Flow using CARTA

Fig. 2 shows the proposed model-based designflow using CARTA. The proposed approach is amulti-step process, in which the domain-specificmodel continually evolves until all required proper-ties are satisfied. This evolution is performed by thedesigner based on feedback from the analysis tools.The design flow starts with the domain-specificmodel, capturing key properties of the design, such

http://alderis.ics.uci.edu


http://dre.sourceforge.net

4

Fig. 2. Model-based Design-flow using CARTA

as its structure, behavior and environment. CARTAutilizes the ALDERIS DSML to specify CMP de-signs for real-time analysis. Analysis methods uti-lized by CARTA include (1) functional verification,(2) simulations, (3) performance estimation, and (4)real-time verification. While functional correctnessand simulation results are required for performanceestimation and real-time verification, some of thesteps can be performed concurrently. This approachreduces the overall design time compared to asequential design process.

The goal of functional verification is to provethat the CMP design is safe and works accordingto specification. CARTA proposes the use of cycle-accurate FSM models for functional verification bythe NuSMV tool, and models masters and slavesconnected to the interconnect as “black boxes”,that communicate non-deterministically. The modelchecker explores the resulting state space to proverequired properties in the design. Interconnect pro-tocols have to be captured by a formal specificationfor model checking, therefore designers need to

create models manually whenever a new protocol isconsidered for use in CMP designs. When workingwith a new protocol, if a property is not satisfied,designers need to check whether the problem is inthe design itself, or in the manually created FSMmodels, and update models accordingly. A compo-sitional approach for modeling is desirable, as itallows to reuse the formal specification for variousCMP designs. In this paper we describe cycle-accurate FSM models for CMP components builton the AMBA AHB [14] protocol, and use thesemodels for compositional functional verification.

Simulations form an integral part of the proposeddesign flow, and provide accurate task executiontimes and delays to the ALDERIS models. For thepurposes of simulation, we capture CMP designsin the SystemC [15] modeling language at theCycle Count Accurate At Transaction Boundaries(CCATB) modeling abstraction [16]. SystemC is aC++ library that provides a rich set of primitivesfor modeling communication and synchronization.The CCATB modeling abstraction is a form of

5

transaction-based bus cycle accurate model thatenables fast and accurate performance estimation forCMP designs. CCATB captures transactions in thedesign using function calls which allow a significantspeedup in simulation. For instance, in the AMBAAHB on-chip communication architecture, hundredsof signals can transition during a data read or writeissued from a processor to a memory. In the CCATBmodel, a read() or write() function call cap-tures the functionality of the hundreds of signals,while still maintaining cycle accuracy required formeaningful exploration. This leads to a reductionin modeling time, and improves simulation speedby several orders of magnitude over signal-accurateC++ or RTL models. CCATB also performs addi-tional optimizations, such as effectively clusteringstatic CMP delays and incrementing simulation timein chunks, to further improve simulation speed.

The CARTA framework facilitates a novel discreteevent simulation (DES)-based simulation-guidedperformance estimation method introduced in [3].Models in the DRE MoC capture dependenciesbetween timers, tasks, channels in a formal setting,as well as timing information and the schedulingalgorithm. Therefore, the dependencies essentiallyimpose a partial ordering between events in themodel. It was shown in [3] that two execution tracesof the DRE MoC, that were obtained by DES areequivalent, if the total (untimed) ordering of eventsis the same. If the order of events is fixed, theDE model is deterministic, and one simulation issufficient to verify real-time properties. When theorder of events can change due to non-deterministicexecution times, all the different traces need to beenumerated in order to verify that no deadlinesare missed in the model. As long as the orderof events is preserved, one only has to considerthe largest possible timestamps of events. Thus,the DES-based performance estimation method isbased on repeated simulations, that explore the non-deterministic ordering of events in the CMP models.We refer to this approach as “simulation-guidedmodel checking”.

The simulation-guided model checking methodis built on the assumption that discrete event sim-ulations using limited horizon are sufficient, afterwhich the system returns to an initial idle state. Thisis often the case if computation is driven by periodictimers. If this assumption can be guaranteed, thenthe analysis is exhaustive, and proves the schedu-

lability of the CMP design. If not, or the analysisdoes not terminate due to scalability issues, then anadditional step is needed for real-time verificationby timed automata.

We use results of the performance estimation toformalize properties on the end-to-end performanceof the design. Model checkers are traditionally op-timized to answer yes/no type questions, and there-fore require the performance estimation results asinput parameters (i.e., is the end-to-end computationtime less than the estimate?). This provides a wayto prove the schedulability and performance of theCMP design.

B. Relationship between Functional and Real-timeAnalysis

CARTA proposes the use of cycle-accurate FSMmodels for functional verification, and transaction-level DE/TA models for real-time analysis. Whilethe transaction-level models are more abstract, theFSM models are not a direct refinement of eitherthe DE or TA models. Therefore, it is important toclarify how the functional analysis relates to real-time analysis in the proposed approach.

The DES-based performance estimation and thetimed automata-based real-time verification arebased on an event-driven communication paradigm,where it is assumed that messages are passedthrough the interconnect as atomic transactions. Inmost CMP interconnect protocols – including theAMBA AHB on-chip communication architecture– hundreds of signals can transition during a dataread or write issued from a processor to a mem-ory. The DE and timed automata models abstractout this complexity to reduce modeling time andimprove simulation speed, while still maintainingcycle accuracy required for meaningful exploration.Communication on the bus is represented as simpleevent passing between masters and slaves.

For functional verification, it is important toaccurately capture the signals in the bus matrixinterconnection network. If the analysis model is tooabstract, certain problems can remain undetected inthe design phase. Therefore, it is necessary to con-sider the actual signals that pass on the interconnectfor functional verification.

Creating a cycle-accurate representation of thewhole CMP application, however, is not necessaryfor practical analysis. The question that functional

6

verification aims to address is to show that therecannot be a combination or sequence of signals onthe CMP interconnect that leads to a deadlock orlivelock. We also show that masters get access tothe interconnect whenever they request access, andcan safely perform read and write operations. Oncethese conditions are shown, there is no need to con-sider cycle-accurate simulations of the actual CMPapplication for functional verification purposes.

The functional analysis has to capture the com-munication architecture of the CMP design, ar-bitration policies, masters/slaves connected to thebus, and focus on the various combinations andsequences of signals that are allowed by the pro-tocol. This analysis does not consider the behaviorof individual components for the functional verifica-tion, but rather focuses on problems that may arisewhen designers integrate components by industrystandard interconnect protocols. If there cannot beany sequence or combination of bus signals thatleads to deadlocks or livelocks, we show that theCMP design is functionally correct, and can buildon this result to utilize transaction-level models forreal-time analysis.

The DES-based performance estimation methodand the timed automata-based analysis builds onthe results of functional verification, as we cansafely assume that masters and slaves can reliablycommunicate over the interconnect, following anevent-based message passing paradigm. Thus we getthe best of both worlds; we can prove functionalcorrectness, and also benefit from improved analysisperformance in transaction-level models.

III. NETWORKING ROUTER CMP DESIGN

To demonstrate the effectiveness of our designflow, we use CARTA to explore a networking routercase study. This design is a high-level abstractionof a router design by Conexant Systems, that wasfirst described in [17]. This system is used for datapacket forwarding, processing and encoding. TheCMP design implementation for this case studyconsists of multiple processors, memories and net-work interfaces, and a bus matrix interconnectionnetwork. The different application tasks are mappedonto the CMP components shown in Fig. 3. Fig. 3shows a simplified version of the hardware platform,without peripherals (e.g. timer, interrupt controller,UART, etc.) and without the DMA engine, for

Fig. 3. Networking Router CMP HW Design

clarity. The major components in the CMP are thetwo embedded processors (CPU_1 and CPU_2),a post processing dedicated HW (PostProc), anencrypt engine dedicated HW (Enc), on-chip mem-ories (MEM 1 and MEM 2), network interfaces tocommunicate with external components(IF1 toIF6), and an AMBA AHB bus matrix interconnec-tion network (Multi-layer AHB) to facilitate inter-component communication on the CMP.

Several terms are used in the research communitywhen referring to interconnect architectures. Termssuch as bus matrix, crossbar switch, and multi-layerall refer to interconnect designs where masters andslaves are connected by more than one bus. Busesin bus matrix interconnects are also referred to as“layers”, and buses containing a single master andslave are also referred to as “links”.

The AMBA AHB protocol allows the creationof buses where only a single master and slave isconnected. In this paper we use the term “bus” inthis sense, to emphasize the fact that all mastersand slaves communicate through standard AMBAAHB interfaces instead of being directly linked toeach other by wires. We use the term “bus matrix”to refer to the terms Multi-layer AHB and AHB-Lite by ARM. AHB-Lite specifically considers thespecial case where only a single bus master is usedfor each bus (layer).

There are several advantages to using an industrystandard bus between a single master and slave.Most importantly, it allows the use of any compo-nent in the design that supports the bus protocolstandard. In the case of AMBA AHB, designers

7

can pick any AMBA AHB master and slave fromthird parties, and know that they can be reliablyconnected by AMBA AHB. Other advantages arethat it provides a way to request retransmission ofdata from the master in the case of errors (i.e.RETRY response), it can be used to slow downthe master if the slave needs more time to processrequests (i.e. HREADY), and more importantly, itis much easier to extend in the future (by addingmore masters/slaves) than a direct wire between amaster and slave. On the slave side, generally apoint arbiter (slave multiplexor) is used. This pointarbiter can control the HREADY signal on eachbus connected to the slave. Therefore, it can pickwhich master gets access to the slave by settingHREADY for that bus, and disabling for others.AHB-Lite is marketed by ARM for bus matrixinterconnects where only a single bus masters areused. AHB-Lite removes the request/grant protocoland does not support RETRY/SPLIT from slaves.However, it also does not support slaves that mayissue RETRY/SPLIT responses.

Fig. 4 shows a high level overview of the soft-ware design modeled as a task graph, as well asthe mapping of the different tasks onto the CMPcomponents. Computation tasks (T) mapped to thesame processing unit communicate directly witheach other. The blocks marked as (B) representaccesses to the bus matrix, either to read/write oneof the memory units (M), or to communicate withthe external environment through network interfaces(IF). Processor CPU_1 is used to execute tasksassociated with the intrusion detection functionality,while processor CPU_2 executes tasks associatedwith the simple protocol translation and packetforwarding functionalities. The encrypt engine dedi-cated HW block is optimized for data encoding, andexecutes tasks that perform different encodings onpacket data. A post-processing dedicated HW blockis used to speed up the back-end of the protocoltranslation, as well as the encoding functionalities.

We now describe the networking router applica-tion case study in more detail. The system receivesdata packets from the network interface (IF) com-ponents. The receiver (PktRx) tasks are responsi-ble for data packet pre-processing and preliminarydecoding. Subsequently, the system performs multi-ple functions - intrusion detection, simple protocoltranslation, forwarding, and packet encoding. Theintrusion detection functionality consists of a Chk

Fig. 4. Networking Router CMP SW Design

task that is used to check if the packets have notbeen subjected to some suspicious activity, as wouldbe the case if there is an intrusion. The HdrCal taskis used to perform data packet processing to detectintrusions, based on user-defined intrusion signa-tures. If an intrusion is detected, then reports of sus-picious activities are generated. Finally the packetsare stored in physical memory Mem_1 by the Mem1task from where another subsystem either blocks theflow, or passes the packets on to the next router.In the simple protocol translation functionality, theProc1 task is used to strip the source protocol andstore the data packets into physical memory Mem_1by task Mem2. The memory also consists of a set oftranslational templates. These templates are used bythe Cnfg task to strip the source protocol header,and then append the new protocol information to thedata. The Recalc_1 task finally reorders the datapayload and new header, and sends out the packetsto an outgoing network interface. The Assm taskis used instead of the Cnfg and Recalc_1 tasksif the protocol translation is fairly lightweight (forinstance if the source and destination protocols aresimilar). The forwarding functionality consists of atask Pkt_fwd which receives a packet and updatesthe header data (e.g., updating time stamps andstripping source routing fields), and then forwardsthe data to the output interface. Enc implementsthe packet data encoding functionality, and storesthe results into physical memory Mem_2 by taskMem3. Subsequently, task Recalc_2 is responsi-

8

ble for reordering the data payload and creating aheader. Task Proc2 is used to finally perform post-processing the packet data headers before forward-ing them to the output network interface.

IV. MODELING BUS MATRIX-BASED CMPDESIGNS

A. The ALDERIS Domain-Specific Modeling Lan-guage

There are three basic constructs in ALDERIS;timers, tasks, and channels. Tasks represent compu-tations or resource utilization in CMP systems, andare characterized by an execution interval. Chan-nels are simple FIFO buffers that can also expresscommunication delays. Timers are special tasks thatperiodically broadcast events, triggering the execu-tion of its dependents. These three constructs allowmodeling of CMP designs as a set of interconnectedtask graphs.

The tasks and timers in ALDERIS are mapped tothreads. Threads represent non-preemptive sched-ulers – tasks and timers mapped to a threadare scheduled by a fixed-priority non-preemptivescheduler. Threads are in turn mapped to CPUs.ALDERIS makes use of components to express hier-archy. Components may contain tasks and channels,and communicate through input/output ports.TheALDERIS DSML is freely available for downloadat http://alderis.ics.uci.edu.

We refer to the semantic domain of the ALDERISDSML as the DRE model of computation (MoC).The semantics of the DRE MoC were formallyspecified as a discrete event system [3], and asnetwork of timed automata [4], [5], [1], [6], [7].

A major advantage of having both a discreteevent (DE) and timed automata (TA) representationis that we can choose which one we prefer touse depending on the problem that needs to beaddressed and the required abstraction level. In thispaper, we use the DE formalism for performanceestimation, and the TA representation for real-timeverification.

B. Modeling the Router CMP using ALDERIS

The networking router design shown in Fig. 3uses an AMBA AHB bus matrix interconnectionnetwork. This architecture simplifies the arbitrationpolicy – simple point arbitration is used at theslaves (either memories or network interfaces) to

determine which master gets access to the slave.If the slave is not already busy serving a master,any master can get access to it immediately, andtransmit data through its dedicated bus to the slave.If the slave is already busy serving a request, thenthe masters trying to access the slave are forced towait. When the slave is free again, the master withthe highest priority gets access to the slave. Thissimple point arbitration policy provides a great fitfor simple task scheduling problems – the memoryis modeled as a task, and scheduling this task forexecution represents the access of the memory byother tasks. This abstraction provides a simple, butaccurate model to capture the event flow betweentasks, as well as memory and interface accessesthrough the bus matrix.

In the following we describe how we usedthe ALDERIS domain-specific modeling language(DSML) to model the case study described inSection III. Tasks and interfaces are modeled asALDERIS tasks. We introduce FIFO buffers betweentasks in order to (1) capture communication delayson the bus, and (2) buffer events, in case the taskis already executing, and not able to receive theevent yet. Dependencies in the ALDERIS task graphfollow the SW design dependencies. However, thereare some differences, as the ALDERIS models cap-ture concurrency, real-time properties, and schedul-ing as well. We add timers to the model that rep-resent the sampling rates over the input interfaces.Timers periodically push events, that represent datareceived from the environment. The task graph isevent-driven and asynchronous, therefore furtherevent propagation is not synchronized, timers aresolely used as a triggering mechanism.

Fig. 4 shows the dependencies in the SW design.It does not, however, capture the semantics of theevent flow. There are four branches in the model(from IF1, IF2, IF3, and Mem2). The branchesfrom IF1, IF2, and IF3 represent a choice whereonly one path is taken. The branch from Mem2to Assm and Cnfg represent broadcast – bothdependents receive the event, and process it whenthey are scheduled for execution.

One physical HW unit may be represented as notjust one, but several CPU constructs in ALDERIS.When a choice is made, a simulation trace willchoose either one of the paths depending on certainconditions. However, to calculate the worst-caseperformance, the formal analysis has to capture both


9

Fig. 5. ALDERIS Model of the Router CMP in the GME Tool

options, so we build a model where both paths aresimultaneously taken. The CPU_2 HW componentshown in Fig. 4 is modeled as two independent(logical) CPUs in the ALDERIS model shown inFig. 5 (CPU_2_a and CPU_2_b). When runninga simulation, only one path is active, and thispath is running on the physical CPU_2 HW. Whenwe perform the formal exploration, we duplicatethe CPU_2 physical processor to the CPU_2_aand CPU_2_b logical processors, and execute bothpaths simultaneously on these independent CPUmodels. This approach is feasible as the executionof a task on one path does not interfere withthe execution of another task from the other path.Therefore, the formal model concurrently exploresboth possible traces simultaneously.IF2 is the interface for the protocol translation

functionality. Depending on whether the data isforwarded or stored in the memory for furtherprocessing, one of two distinct event paths is takenfrom IF2. Although both paths execute on the sameprocessing unit CPU_2, these two paths cannot beactive at the same time; either the PktRx3, Proc1path is taken (and then Mem2 and so on), or thePktRx4, Pkt_fwd, IF5 path. Even though thesepaths are mapped to the same processing unit, theyare not concurrent. Therefore, we execute both paths

in parallel for performance estimation. By introduc-ing a new logical processing unit, CPU_2_b, wecan simultaneously explore both paths at the sametime.

We encounter the same situation at the branchfrom IF1; either PktRx1, or PktRx2 is chosen toprocess the packages arriving through the interface.Therefore, we have the option to introduce a newlogical processing unit, to capture the fact thatPktRx1 and PktRx2 do not execute at the sametime. Multiple logical masters are generally requiredanytime a choice is made. However, if the choice isperformed within a single thread, and both pathsmerge before any task communicates with tasksmapped to remote threads, the different paths maybe grouped into a single task. This is a manualoptimization technique that may improve analysisperformance.

We utilize the manual optimization in thiscase, as both PktRx1 and PktRx2 are mappedto the same thread, and both paths merge be-fore the bus access is made. We substitute anew logical task (PktRx1/2), with a bestcase execution time (bcet) of min[bcet(PktRx1),bcet(PktRx2)], and a worst case execution time(wcet) of max[wcet(PktRx1), wcet(PktRx2)].Regardless of which execution path is taken

10

Fig. 6. Finite State Machine Model of an AMBA AHB Master

(PktRx1 or PktRx2), task PktRx1/2 capturesthe execution intervals of both tasks, and thereforecan be used for worst case performance estimation.This method increases scalability with minimal lossof accuracy, and is therefore preferable in the earlystages of the design flow. We apply the same idea forthe branch from IF3, and substitute tasks PktRx5and PktRx6 with the PktRx5/6 logical task.Finally, the branch from Mem2 follows broadcastsemantics to dependents, and therefore requires nochanges in the ALDERIS representation. The result-ing ALDERIS model shown in Fig. 5 captures depen-dencies between tasks, the mapping of tasks to thetarget platform, as well as timing information of thenetworking router CMP design, and allows formalanalysis as described in the following sections.

V. FUNCTIONAL VERIFICATION OF BUS MATRIXCMP DESIGNS

Functional verification is a key challenge in thedesign of complex CMP systems. Modern bus proto-cols are increasingly complex, and do not guaranteethe correctness of the overall CMP design. More-over, they are often specified by natural languages,and therefore there may remain ambiguities in thespecification. In [2], we presented a method forthe modeling and functional verification of CMPdesigns utilizing an AMBA AHB bus. In this paper,we apply this work to CMP designs using fully-connected AMBA AHB bus matrix interconnects.

Fig. 7. Finite State Machine Model of an AMBA AHB Slave

We have considered two key problems in ourformal analysis. A deadlock can be observed inthe finite state machine model as a state where notransitions are enabled. A livelock can be observedas a state from which only a subset of states isreachable. A livelock can express situations likestarvation, where a master is not granted access tothe requested slave.

In a fully connected bus matrix, there is a ded-icated AMBA AHB bus between each master andslave. Point arbitration is used to manage access toslaves (such as memories). The bus matrix designincreases throughput and concurrency, as multipletransactions can take place simultaneously betweendifferent masters/slaves.

In bus matrix designs, the functional verificationis simplified. Each bus in the bus matrix con-nects a single master to a single slave. Therefore,congestions on the bus are greatly reduced, andcan only appear when the slave is busy servinganother master. There are altogether 11 buses in thenetworking router case study, as shown in Fig. 3.Point arbitration is managed by a simple fixed-priority arbiter. Priorities for masters are definedas follows (in decreasing order): CPU 1, CPU 2,PostProc, Enc.

We have created a cycle-accurate model of theAMBA AHB bus protocol in order to model trans-actions over the bus accurately. We capture multiple

11

signals over the bus, such as BUSREQ, HREADY,HGRANT, HADDR, HTRANS, and HRESP. We con-sider RETRY responses from the slave (HRESP =RETRY), as well as split transfers, as we analyzeMulti-layer AHB. AHB-Lite does not support splittransfers, RETRY responses, or the request/grantprotocol, and is therefore even simpler to analyzethan Multi-layer AHB.

We did not model HLOCK signals for the analysis.HLOCK signals are set by a master that needs unin-terrupted access to the bus during a transaction. Ifthe master asserts HLOCK, the arbiter simply holdsits state. A master that does not deassert HLOCKcould cause a livelock by disallowing the slave toserve other masters. However, it is the responsibilityof the master to manage the HLOCK signal, and amaster that does not deassert HLOCK would be con-sidered faulty. When the master deasserts HLOCK,the arbiter is free to continue where it left off, sono livelocks can occur.

AMBA AHB masters and slaves are modeledas abstract state machines as shown in Fig. 6 andFig. 7. The models were manually created in ac-cordance with the AMBA AHB specification. Mas-ters are modeled using 6 states (idle, busreq,haddr, read, write, error), and slaves aremodeled using 4 states (idle, write, read,error). We have modeled arbitration delays, busyslaves, and two-cycle response times for RETRYand SPLIT responses as defined in the AMBAAHB specification. SPLIT responses are initiatedby the slave in the proposed FSM models non-deterministically.

We have used the NuSMV [11] model checker toverify CTL [18] properties on the networking routerCMP design. We specified AMBA AHB masters,slaves, and the arbiter in NuSMV code. Algorithm 1shows the NuSMV specification for an AMBA AHBarbiter managing a single master and slave. Thismodel is sufficient to verify the correctness of thearbiter, as in a fully connected bus matrix each busconnects a single master to a single slave. Pointarbitration is managed at the slave side; wheneverthe slave is busy serving a transaction, it signalsthis fact by setting the HREADY signal to low(other implementations are also possible). NuSMVmodels for AMBA AHB buses containing multiplemasters using a round-robin arbiter are availableat http://alderis.ics.uci.edu/amba2.html.

In [2] we have described an ambiguity in the final

Algorithm 1 NuSMV Specification of an AMBAAHB Arbiter Managing a Single Master and SlaveMODULE bus_matrix_arbiter (HTRANS, HREADY, HRESP,BUSREQ, HGRANT, HMASTER, HSPLIT)

VARmaster_state : idle, mask, grant, transmit;master_prev_state : idle, mask, grant, transmit;lasterror : boolean;preferred : master, default;

ASSIGNinit (master_state) := idle;init (master_prev_state) := master_state;init (lasterror) := 0;init (preferred) := default;next (master_prev_state) := master_state;next (preferred) :=case-- Master starts the transmission!HGRANT & master_state != mask & HREADY &

HRESP = OK & BUSREQ : master;-- Split masterHGRANT & HRESP = SPLIT : default;-- Master finishes the transactionHGRANT & master_state = transmit & HTRANS =

IDLE & HREADY & HRESP = OK & !BUSREQ :default;

-- Master gives up its BUSREQHGRANT & master_state = idle & HTRANS = IDLE

& HREADY & HRESP = OK & !BUSREQ : default;-- Unmasking masters!HGRANT & master_state = mask & HSPLIT =

master : master;1 : preferred;

esac;next (master_state) :=case-- Roll back for retrysHRESP = RETRY & !lasterror :

master_prev_state;master_state = idle & HREADY & HRESP = OK &

HGRANT : grant;master_state = grant & HTRANS != IDLE &

HREADY & HRESP = OK & HGRANT : transmit;master_state = transmit & HTRANS = IDLE &

HREADY & HRESP = OK & HGRANT : idle;HREADY & HRESP = SPLIT & HGRANT & !lasterror

: mask;master_state = mask & HSPLIT = master : idle;lasterror : master_state;!HREADY : master_state;1 : master_state;

esac;next (lasterror) :=caseHRESP = RETRY : 1;HRESP = SPLIT : 1;1 : 0;

esac;HMASTER :=casepreferred = master : master;preferred = default : default;

esac;HGRANT :=casepreferred = master : 1;1 : 0;

esac;

version of the AMBA AHB specification, that mayarise when a slave splits a a master, and requestsretransmission by setting the HRESP = RETRYresponse in the same bus cycle.

In a fully connected bus matrix, each AMBAAHB bus connects a single master and slave.Therefore, the slave has little reason to split themaster. AHB-Lite does not support split transfersand RETRY responses, therefore the problem doesnot occur in AHB-Lite. The ambiguity, however, itapplicable to Multi-layer AHB, as it supports bothsplit transfers and RETRY responses. Therefore, theambiguity is applicable to the networking router

http://alderis.ics.uci.edu/amba2.html

12

case study as well. We use the simple fix of disal-lowing the slave to split a master and set the HRESPsignal to RETRY in the same cycle.

To show that no deadlocks can occur in themodel, we have shown that the error state isunreachable in masters and slaves by checking thefollowing CTL formulas (where x refers to the indexof masters, y is the index of slaves):

AG (MASTERx.state != error),AG (SLAVEy.state != error).

We have specified rules to enforce that whenever amaster is split, it will be eventually unsplit in orderto avoid livelocks. We have verified this propertyusing the following CTL formulas:

AG ((MASK_MASTERx) ->AF (!MASK_MASTERx)).

Finally, we checked whether starvations are possibleby checking the following formulas:

AG (MASTERx.state = busreq ->AF HGRANTx),

AG (MASTERx.state = busreq ->AF MASTERx.state = write).

A. ExperimentsWe have run experiments using the NuSMV tool

on an Intel Core i7 processor running at 4GHz with6GB triple-channel DDR3 RAM. For the arbiterconnecting a single master and slave, the verificationtime took less than a second, with 6700KB memoryconsumption. For 2 masters, the analysis took lessthan a second with 11700K memory consumption.For 3 masters the analysis took 28 seconds with92080KB memory consumption. Since in this paperwe consider fully connected bus matrices only,scalability issues do not arise, as each master isconnected to each slave by a dedicated AMBA AHBbus.

We have found that starvations are possible ingeneral when high-priority masters continually re-quest access to slaves, and therefore lower-prioritymasters do not get served. Therefore, we needto consider the actual dependencies in the model,and perform real-time analysis to ensure that thiscondition does not occur. We need to consider theactual communication between tasks in the CMPdesign, and consider whether the starvation mayoccur in the actual CMP design.

This problem, however, cannot be adequatelyaddressed at the cycle accurate abstraction due tothe long computation times, that would lead to statespace explosion. There is no theoretical limitationon performing real-time verification at the cycle ac-curate abstraction; the limitiation is present simplydue to the state space explosion present at low-levelabstractions.

In the next section we describe how we obtainedworst case performance estimates on the networkingrouter case study, and how we used the results inthe final stage for real-time verification using timedautomata in Section VII. We address the problemof starvation in Section VII.

VI. FORMAL PERFORMANCE ESTIMATION BYDISCRETE EVENT SIMULATIONS

This section describes how we utilized the open-source DREAM tool – that implements the DES-based simulation-guided performance estimationmethod – for the performance estimation of thenetworking router CMP design. We build on thefact that the design is based on a fully connectedAMBA AHB bus matrix. When applying the pro-posed method to bus matrix designs that are notfully connected, accuracy loss is expected, as theDES-based performance estimation does not capturecongestions on the buses due to arbitration policies.The CMP design shown in Fig. 5, however, utilizesa fully connected bus matrix as shown in Fig. 3,therefore arbitration policies can be captured aspoint arbitration at the slave side.

Table I shows the parameters used for the modelshown in Fig. 5. We have obtained the deadlinesfrom application requirements, and the executiontime estimates by using fast and accurate system-level simulation at the Cycle Count Accurate AtTransaction Boundaries (CCATB) abstraction [16].

Fig. 8 illustrates the DES-based performance es-timation method. The dependencies imply a partialordering on the execution order of tasks, as well asthe events during the discrete event simulation. Wedistinguish three major types of events in Fig. 8;input events, denoted as “i”, output events denotedas “o” and start events denoted as “s”. Each eventis followed by a number to indicate which task isresponsible for the event. This is simply a syntacticshort-hand to keep the nodes in the tree small.Numberings for tasks are given in the last columnof Table I.

13

Fig. 8. A Partial View of the Event Order Tree for the Example Shown in Fig. 5 using the Parameters in Table I

Task Priority WCET BCET DL NoIF1 1 200 - 201 1IF2 2 100 - 101 2IF3 3 400 - 401 3IF4 4 200 - 201 22IF5 5 200 - 201 14IF6 6 400 - 401 23Mem1 7 100 - 201 16Mem2 8 100 - 201 13Mem3 9 100 - 101 15

PktRx1/2 10 1442 770 1443 4Chk 11 450 220 451 8

HdrCal 12 2400 1340 2401 12PktRx3 13 1530 1400 1531 5PktRx4 14 670 590 671 6Pkt_fwd 15 600 200 601 10Proc1 16 2800 2400 2801 9Cnfg 17 1600 1300 1601 18Assm 18 800 600 2401 17

Recalc_1 19 3000 2000 3801 20PktRx5/6 20 2100 1100 2101 7

Enc 21 4900 3800 4901 11Recalc_2 22 1750 1500 6381 19Proc2 23 5600 4800 5601 21

TABLE IPARAMETERS FOR THE NETWORKING ROUTER CMP DESIGN

SHOWN IN FIG. 5

As seen from Fig. 8, the application starts withtasks IF1, IF2, and IF3 receiving input events.The second node shows that the three tasks arescheduled (started) by the scheduler for execution.Since IF2 has the smallest execution time betweenIF1, IF2, and IF3, therefore IF2 will be thefirst to finish its execution. When IF2 finishesits execution, it generates an output event, that isbroadcasted to both PktRx3 and PktRx4.

The first non-deterministic branch occurs afterPkt_fwd (numbered 10) is scheduled for execu-tion. The earliest time when Pkt_fwd may finishits execution is at time 890, as can be computedby adding the best case execution times of itselfand its sources IF2, PktRx4 (100+590+200=890).We can also compute that Pkt_fwd will finishits execution by time 1370 (100+670+600=1370).For task PktRx1/2 we can similarly obtain thatit will finish its execution between 970 and 1642.Therefore, the execution intervals of Pkt_fwd andPktRx1/2 overlap, and we need to consider twocases; when Pkt_fwd finishes first, and whenPktRx1/2 finishes first. It is also possible thatthe two tasks finish simultaneously, in which caserace conditions may be considered, resulting in thesame two options. Total orderings become importantwhen multiple tasks are mapped to the same thread

14

Algorithm 2 Obtaining and Enumerating the EventOrder Tree by Discrete Event Simulations

1: create the (empty) superset of race conditions R2: set the execution time for all tasks tk ∈ T to their wcetk

time, and the next execution time for all tasks to theirbcetk time, respectively (∀tk ∈ T exec timek = wcetk,next timek = bcetk)

3: // enumerate all branching intervals4: for all permutations of exec timek assignments, obtained

using the next timek variables do5: clear the superset R6: call discrete event simulation () described in Algo-

rithm 37: // enumerate all race conditions with the current

exec timek assignments8: for all permutations of events in superset R do9: call discrete event simulation () described in Algo-

rithm 310: end for11: end for

or CPU. For example, both Mem1 and Mem2 denotememory accesses in the same memory module.Likewise, tasks Cnfg, Recalc_1, Recalc_2 andProc2 are all mapped to the same thread, andtherefore it is important to consider the order inwhich they may be scheduled for execution. Fig. 8only illustrates the top of the tree, and the greyrectangles refer to subtrees of the correspondingnodes.

The size of the event order tree grows veryfast even for moderate-size examples, and pre-computing the tree is not possible due to resourceconstraints. Rather, the proposed method builds theevent order tree on-the-fly, that captures all thepossible total orderings of events. Branches arediscovered during simulation-time, and then subse-quently enumerated. Algorithm 2 describes how theevent order tree is constructed on-the-fly.

There may be potentially exponential numbers ofpaths in the analysis. This is not the characteristicof the analysis method, rather that of the analyzedmodel. When non-determinism is present in themodels, all possible paths have to be consideredto guarantee that all scenarios that were consideredfor the performance analysis. CARTA can expressdeterministic models, or time-triggered models aswell, in which case the analysis is polynomial.A key advantage of the proposed DES-based per-formance estimation approach is that it does notrestrict designers’ hands, the bound on analysis

Algorithm 3 function discrete event simulation ()

1: run directed discrete event simulation, during which eachtask stores its start time as startk

2: during the simulation all tasks tk observe events ei thatare raised in the [startk + bcetk, startk + exec timek]interval

3: if startk + next timek < time(ei) then4: // we have encountered a branching point in the (bcetk,

wcetk) interval5: store the value of time(ei) - startk in the next timek

variable6: else7: do nothing, event will be considered in subsequent

simulations8: end if9: // find all race conditions with the current exec timek

assignments10: for all race conditions detected between events

ei, ej , . . . ek during the simulation do11: search for the set containing events ei, ej , . . . ek in

superset R (from Algorithm 2)12: if the set is found then13: do nothing14: else15: add the set S = {ei, ej , . . . ek} to the superset R16: end if17: end for

performance strictly depends on computation power,not arbitrary restrictions on the models.

By enumerating the event order tree, one canobtain the worst case bound on the overall perfor-mance of the model. To enumerate the tree, thediscrete event simulation step described in 3 areperformed repeatedly, where the execution time oftasks is continuously updated to capture all possiblepermutations of execution times. Since the analysisis based on repeated simulations, we refer to this ap-proach as “simulation-guided model checking”. Theevent order tree captures all permutations of events,and is therefore exhaustive; it does not produce falsepositives. On the other hand, the method is builton the assumption that discrete event simulationsusing limited horizon are sufficient, after which thesystem returns to an initial idle state, which maynot be the case for all types of real-time systems.For a description of the formal performance esti-mation method and proofs on the validity of theperformance estimation method please see [3].

The DES-based method is more accurate for per-formance estimation than static analysis methods, as

15

it captures dynamic effects such as congestions onthe bus, and race conditions. The advantage of theDES-based method compared to ad-hoc simulationsis the increased state space coverage. Compared topure model checking methods, the main advantageis that it does not run out of memory on large-scaleexamples, as it is based on fast iterative simulations,and is therefore CPU-bound. Moreover, most modelcheckers are tailored to answer yes/no questions, butthe DES-based method can directly obtain the worstcase bound on the end-to-end computations.

A. ExperimentsThe open-source DREAM tool computed the

worst case end-to-end performance of the network-ing CMP design modeled as shown in Fig. 5 in 590seconds on an Intel Core i7 920 processor running at4GHz using 6GB triple channel memory, with only776KB memory consumption. The implementationof the DES-based method was not optimized totake advantage of the multi-core architecture, andtherefore executed on a single thread. The DES-based performance estimation method is easy toparallelize as it consists of repetitive simulations,and we are considering a multi-core implementationin the future.

The overall end-to-end performance estimate is17780 cycles. We use this information as a boundon the end-to-end performance of the network-ing router CMP design. Each cycle in the modelrepresents 5ns. The lowest period of the timersthat still does not violate the end-to-end boundis therefore 17780×5

103 = 88.9µs. This means thatthe highest possible frequency for the sampling is∼11.248KHz. Note that with each sampling thenetworking router CMP processes several packets,and therefore provides reasonable performance. Theanalysis is exhaustive, and therefore the bound istight. In the next section we perform real-time veri-fication on the model to prove that the performanceestimates hold, and that no starvation occurs in theCMP design.

VII. REAL-TIME VERIFICATION USING TIMEDAUTOMATA

This section describes the proposed approachfor real-time verification by timed automata. Thismethod also assumes a fully connected bus ma-trix for the analysis, as arbitration policies are not

captured in the formal models. When applying theproposed method to bus matrix designs that arenot fully connected, accuracy loss is expected. TheCMP design shown in Fig. 5, however, utilizes afully connected bus matrix as shown in Fig. 3,therefore arbitration policies can be captured aspoint arbitration at the slave side.

Real-time verification is optional in some cases,as the performance estimation method is based onan exhaustive state space search, and is therefore amodel checking method itself. There are three caseswhen the use of the extra verification step is justi-fied. First, the discrete event simulation (DES)-basedperformance estimation method does not capturetimed states to improve scalability, but rather utilizesa limited horizon for the simulations. Althoughthis approach is sufficient in most cases where thesimulation periodically returns to the initial state,it cannot be applied to all models in general. Forexample, obtaining the required time horizon forsimulating a pipeline architecture may be error-prone. In other words, while the DES-based methoddoes not produce false positives, it relies on theassumption that a limited horizon is sufficient forthe analysis. In cases where this condition cannotbe proven, the use of the additional verification stepis required.

Second, for some complex CMP designs theperformance estimation method might not terminatedue to the state space explosion problem. TA-basedmodel checkers in some cases (but not always)achieve better scalability. In this case, the use of thereal-time verification method is justified, as it mightprove the validity of the worst case performanceestimate.

Third, the TA-based model checker can be uti-lized to prove properties on the design that couldnot be carried out at the cycle-accurate level dueto the state space explosion problem. Our finitestate machine (FSM)-based analysis described inSection V has shown that starvations may be presentin the design due to the fixed-priority point arbitra-tion algorithm utilized in the slaves. By capturingthe CMP design shown in Fig. 5 as TA, we canprove that no livelocks and starvations are presentin the model. Note that real-time properties suchas execution times may influence starvations, andtherefore have to be considered for the analysis.

In this paper we utilize the timed automata modelchecking due to the third condition; it is hard to

16

prove that no deadlocks and livelocks are presentusing the DES-based method. During DES, we caneasily identify situations where a task did not exe-cute at all, but it is hard to identify whether a block-ing occurs simply due to waiting for resources, oran actual deadlock or livelock. The timed automata-based method removes any doubt, and there is noreason not to use it given that the models can beautomatically generated from DREAM.

Fig. 9 shows the partial TA representation inthe UPPAAL tool for the networking router CMPdesign shown in Fig. 5. The locations denoted withU are urgent locations, and C denotes committedlocations [12], both of which imply that time cannotpass in that location, and the outgoing transitionsneeds to be taken immediately upon entering thelocation. Tasks have two clocks, ce and cd. Tasks,channels, timers, and schedulers compose togetheras a network of timed automata, providing an ab-stract model for scheduling. The translation fromALDERIS models to the TA representation is de-scribed in detail in [4], [5], [6], [7]. The translationprocess itself is based on refinement, DREAM gen-erates a timed automata model for each task, FIFObuffer, and timer in the ALDERIS models using atemplate. Scheduling policies are specified as au-tomata, where transitions may trigger the executionof tasks.

The translation is implemented in the open-sourceDREAM tool and is fully automated. DREAM gener-ates timed automata representations for the UPPAALand Verimag IF model checkers. In this sectionwe arbitrarily focus on UPPAAL, as both tools aretimed automata model checkers. We have verifiedthat no task can violate its deadline by checking thefollowing UPPAAL macro:

A[] not deadlock (1)

Since the timed automata are created in such away to deadlock when a deadline is violated, thisanalysis proves the real-time schedulability of thesystem. Note that this result does not contradictthe FSM-based analysis. The FSM-based analysisshows that deadlocks may be present in generalin CMP designs utilizing AMBA AHB with fixed-priority point arbitration. When we consider theactual communication in the CMP design by consid-ering dependencies and when components requestaccess to resources in the timed automata model, we

can prove that no deadlock are present in the actualCMP design. In short, deadlocks may be present ingeneral, but are not present in the router case studyused in this paper.

A. ExperimentsWe have run experiments using the UPPAAL

model checker on an Intel Core i7 920 processorrunning at 4GHz using 6GB triple channel memory.The verification time took less than a second for thedesign illustrated in Fig. 9, with 9140KB memoryconsumption.

We have rounded up the period of the timerto 18000 cycles for simplicity, and we were ableto prove the end-to-end execution time of 17780cycles. The real-time verification shows that anyfrequency that is lower than the corresponding11.1KHz is guaranteed not to violate the end-to-enddeadline, or the individual deadlines of tasks.

VIII. COMPARING THE RESULTS OF THEANALYSIS METHODS

Fig. 10 illustrates the analysis time and memoryconsumption used for the analysis of the networkingrouter case study. During the functional verificationphase, we considered a cycle-accurate model ofthe AMBA AHB bus, and found that deadlocksmay be present due to the fixed-priority arbitration.The analysis time took less than one second forbuses containing a single master and slave, howeverwe observed increased analysis time and memoryconsumption as more and more masters were addedto the bus. This was mainly the result of the morecomplex arbitration policy needed. Since in fullyconnected bus matrices each master is connected toeach slave, analysis scalability was not an issue.

For the performance estimation, we relied onthe DES-based simulation-guided model checkingmethod. Analysis time was higher, while memoryconsumption was lower. Generally, we find thatmemory-bound model checkers have much betterperformance than the CPU-bound method if theexample is actually small enough to fit in the mainmemory. Once the model size increases beyond themain memory size, memory-bound model checkersare not useful in practice, as performance degradessignificantly, and no partial results are given. Forthis example, we find that the model is small enoughto fit in memory, and the various optimizations

17

Fig. 9. Partial Timed Automata Model of the Networking Router CMP Design Shown in Fig. 5 in UPPAAL

result in much improved performance compared tothe DES-based method. That said, 590 seconds iscompletely acceptable to obtain the worst case end-to-end bound on the networking router design.

For real-time verification, the UPPAAL tool showsimpressive performance by proving deadlock-freedom, as well as proving the real-time schedula-bility of the router design. Results are similar to theNuSMV results, even though the problem analyzedis different: UPPAAL considers the interactions be-tween tasks – similarly to the DES-based method– on a transaction-level abstraction. Here we seethe advantage of using multiple abstractions for theanalysis; performing the same analysis using theNuSMV tool at the cycle-accurate level would resultin serious performance penalty, and possibly evenstate space explosion. Our results show the practicalapplicability of the CARTA framework for the cross-

abstraction analysis of CMP designs.

IX. RELATED WORK

Model-based Design: Ptolemy II [19] is a mod-eling framework that composes heterogeneous mod-els of computation, and performs simulation forsymbolic performance estimation. Platform-baseddesign was proposed for the system-level design ofembedded systems [20], including wireless sensornetworks [21]. A model-driven design frameworkfor the development of embedded systems on GMEis described in [10]. CARTA also utilizes the GMEtool, but the focus is on formal analysis ratherthan development. CARTA also integrates multiplemodel checkers, and CCATB simulations for thecomprehensive functional verification, performanceestimation, and real-time verification of bus matrixCMP designs.

18

Fig. 10. Analysis Time and Memory Consumption for the Networking Router Case Study

We have introduced the ALDERIS DSML in [1].ALDERIS models embedded systems as task graphsmapped to logical threads and processing units, andcaptures key execution times, delays, and deadlinesof the design for formal analysis. We utilize theALDERIS DSML in this paper as the high-levelspecification of the design, that drives the overallmodel-based design flow. The novelty in this paperis to show how the ALDERIS DSML can be appliedfor the analysis of CMPs utilizing complex busmatrix interconnects.

Functional verification of bus architectures:An early work on applying model checking methodsto the AMBA protocol was presented in [22]. Averification platform for AMBA-ARM7 is presentedin [23]. A verification platform for AMBA using acombination of model checking and theorem prov-ing is described in [24].

All the works described above utilize formalmethods to prove the functional correctness of busprotocols. In contrast, CARTA focuses on the analy-sis of CMP designs and aims to prove that no dead-locks and livelocks can occur in in a CMP design– a property not guaranteed by the AMBA AHBprotocol itself. The novelty in this paper is to extendour earlier work on the functional verification andperformance analysis of AMBA-based CMP designsdescribed in [2] to complex AMBA AHB bus matrixarchitectures.

Performance analysis: The SymTA/S [25] ap-proach proposed methods from scheduling theoryfor the performance analysis of embedded systems.A component-based framework for schedulabilityanalysis and performance evaluation was proposedin [26]. Modular Performance Analysis [27] is anapproach based on real-time calculus, that modelsdependencies as arrival curves.

Simulations are widely used methods for perfor-mance estimation by the industry. Disadvantages ofthese approaches include immeasurable state spacecoverage, and long development times resultingin performance issues being detected too late inthe design cycle, implying higher costs to addresschanges. A semi-formal simulation-based perfor-mance estimation technique was proposed in [28].

We have presented a method for the formal per-formance estimation of distributed real-time embed-ded (DRE) systems using discrete event simulationsin [3]. This approach is a model checking methodbased on iterative simulations, and is more accuratethan static performance estimation methods, andimproves on the coverage of ad-hoc simulation-based methods. Moreover, it is able to providecounter-examples when real-time properties are vi-olated. In this paper we extend this method for theperformance estimation of CMP designs, and showa novel method to abstract out bus matrices into theDES model used by the estimation algorithm.

Real-time verification: Model checking is analternative approach for symbolic performance ver-ification. Timed automata were proposed as modelof computation for real-time verification of taskgraphs [29]. A timed automata-based approach forthe thread-level analysis of DRE systems is pre-sented in [30]. CADENA [31] is an analysis frame-work for building and analyzing CORBA Com-ponent Model (CCM)-based systems. This paperextends our earlier work on the real-time verificationof DRE systems based on timed automata describedin [4], [5], [6], [7]. We extend these works forthe real-time verification of CMP designs utilizingbus matrices. Moreover, we show how the timedautomata model checkers are integrated into ourmodel-based CARTA design flow.

19

X. CONCLUSION

We have presented the Cross-abstraction Real-time Analysis (CARTA) framework for the model-based functional verification and performance esti-mation of chip multiprocessors (CMP) utilizing busmatrix (crossbar switch) interconnection networks.The CARTA design flow aims to address threemajor challenges in the formal analysis of CMPdesigns: (1) functional verification, (2) performanceestimation, and (3) real-time verification, and pro-vides a cross-abstraction bridge between the finitestate machine (FSM), discrete event (DE) and timedautomata (TA) models of computation. CARTA uti-lizes multiple model checkers to analyze formalproperties at the cycle-accurate and transaction-levelabstractions. We have demonstrated the applica-bility, as well as the scalability of the prototypeanalysis methods implemented in the CARTA frame-work on a networking router CMP design. TheALDERIS modeling language (http://alderis.ics.uci.edu) and the open-source DREAM tool (http://dre.sourceforge.net) are free to use for any purpose.

REFERENCES

[1] G. Madl and N. Dutt, “Domain-specific Modeling of PowerAware Distributed Real-time Embedded Systems,” in Proceed-ings of the 6th Workshop on Embedded Computer Systems:Architectures, Modeling, and Simulation (SAMOS), 2006.

[2] G. Madl, S. Pasricha, Q. Zhu, L. A. D. Bathen, and N. Dutt,“Formal Performance Evaluation of AMBA-based System-on-Chip Designs,” in Proceedings of EMSOFT, 2006, pp. 311–320.

[3] G. Madl, N. Dutt, and S. Abdelwahed, “Performance Estimationof Distributed Real-time Embedded Systems by Discrete EventSimulations,” in Proceedings of EMSOFT, 2007.

[4] ——, “A Conservative Approximation Method for the Ver-ification of Preemptive Scheduling using Timed Automata,”in Proceedings of the 15th IEEE Real-Time and EmbeddedTechnology and Applications Symposium (RTAS), 2009, pp.255–264.

[5] G. Madl, S. Abdelwahed, and D. C. Schmidt, “Verifying Dis-tributed Real-time Properties of Embedded Systems via GraphTransformations and Model Checking,” Real-Time Systems,vol. 33, pp. 77–100, Jul 2006.

[6] G. Madl and S. Abdelwahed, “Model-based Analysis of Dis-tributed Real-time Embedded System Composition,” in Pro-ceedings of EMSOFT, 2005.

[7] G. Madl, S. Abdelwahed, and G. Karsai, “Automatic Verifica-tion of Component-Based Real-Time CORBA Applications,” inProceedings of the 25th IEEE International Real-Time SystemsSymposium (RTSS), 2004, pp. 231–240.

[8] S. Pasricha, “Transaction Level Modeling of SoC with SystemC2.0,” in Synopsys User Group Conference (SNUG), May 2002.

[9] A. Ledeczi, A. Bakay, M. Maroti, P. Volgyesi, G. Nordstrom,and J. Sprinkle, “Composing Domain-Specific Design Environ-ments,” IEEE Computer, pp. 44–51, Nov 2001.

[10] K. Balasubramanian, A. Gokhale, G. Karsai, J. Sztipanovits,and S. Neema, “Developing Applications using Model-drivenDesign Environments,” IEEE Computer, vol. 39, pp. 33–40,2006.

[11] A. Cimatti and E. Clarke and E. Giunchiglia and F. Giunchigliaand M. Pistore and M. Roveri and R. Sebastiani and A.Tacchella, “NuSMV 2: An OpenSource Tool for SymbolicModel Checking,” in Proceedings of the 14th InternationalConference on Computer-Aided Verification (CAV), 2002.

[12] P. Pettersson and K. G. Larsen., “UPPAAL2k,” Bulletin ofthe European Association for Theoretical Computer Science,vol. 70, pp. 40–44, feb 2000.

[13] M. Bozga, S. Graf, I. Ober, I. Ober, and J. Sifakis, “The IFToolset,” Formal Methods for the Design of Real-Time Systems,LNCS 3185, pp. 237–267, 2004.

[14] ARM, “AMBA Specification rev 2.0, IHI-0011A,” 1999.[15] J. R. W. Muller and W. Rosenstiel, SystemC Methodologies and

Applications. Kluwer Academic Publishers, 2003.[16] S. Pasricha, N. Dutt, and M. Ben-Romdhane, “Fast Exploration

of Bus-based Communication Architectures at the CCATB Ab-straction,” ACM Transactions on Embedded Computing Systems(TECS), vol. 7, no. 2, pp. 1–32, 2008.

[17] ——, “BMSYN: Bus Matrix Communication Architecture Syn-thesis for MPSoC,” IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems, (TCAD), vol. 26,no. 8, pp. 1454–1464, 2007.

[18] E. Clarke and E. Emerson, “Design and synthesis of synchro-nisation skeletons using branching time temporal logic,” Logicof Programs, Lecture Notes in Computer Science, vol. 131, pp.52–71, 1981.

[19] E. A. Lee, C. Hylands, J. Janneck, J. D. II, J. Liu, X. Liu,S. Neuendorffer, S. S. M. Stewart, K. Vissers, and P. Whitaker,“Overview of the Ptolemy Project,” EECS Department, Uni-versity of California, Berkeley, Tech. Rep. UCB/ERL M01/11,2001.

[20] Alberto Sangiovanni-Vincentelli, “Defining Platform-based De-sign,” EEDesign of EETimes, February 2002.

[21] A. Bonivento, C. Fischione, L. Necchi, F. Pianegiani, andA. Sangiovanni-Vincentelli, “System Level Design for Clus-tered Wireless Sensor Networks,” IEEE Transactions on Indus-trial Informatics, vol. 3, no. 3, pp. 202–214, August 2007.

[22] A. Roychoudhury, T. Mitra, and S. R. Karri, “Using FormalTechniques to Debug the AMBA System-on-Chip Bus Proto-col,” in Design, Automation and Test in Europe (DATE), 2003,pp. 828–833.

[23] K. W. Susanto and T. F. Melham, “An AMBA-ARM7 FormalVerification Platform,” in International Conference of FormalEngineering Methods (ICFEM), 2003, pp. 48–67.

[24] H. Amjad, “Verification of AMBA Using a Combination ofModel Checking and Theorem Proving,” Electronic Notes inTheoretical Computer Science, Proceedings of the 5th Interna-tional Workshop on Automated Verification of Critical Systems(AVoCS 2005), vol. 145, pp. 45–61, 2006.

[25] Rafik Henia and Arne Hamann and Marek Jersak and RazvanRacu and Kai Richter and Rolf Ernst, “System Level Perfor-mance Analysis - the SymTA/S Approach,” IEE Proceedingson Computers and Digital Techniques, vol. 152, pp. 148–166,2005.

[26] K. Richter, M. Jersak, and R. Ernst, “A Formal Approach toMpSoC Performance Verification,” IEEE Computer, vol. 36, pp.60–67, April 2003.

[27] Ernesto Wandeler and Lothar Thiele and Marcel Verhoef andPaul Lieverse, “System architecture evaluation using modularperformance analysis - a case study,” Software Tools for Tech-nology Transfer (STTT), vol. 8, no. 6, pp. 649–667, Oct. 2006.





20

[28] K. Lahiri, A. Raghunathan, and S. Dey, “System-Level Per-formance Analysis for Designing On-Chip CommunicationArchitectures,” IEEE Transactions on Computer Aided-Designof Integrated Circuits and Systems, vol. 20, pp. 768–783, 2001.

[29] C. Ericsson, A. Wall, and W. Yi, “Timed Automata as TaskModels for Event-Driven Systems,” in Proceedings of Real-Time Computing Systems and Applications (RTSCA), 1999, pp.182–189.

[30] Venkita Subramonian and Christopher Gill and Cesar Sanchezand Henny B. Sipma, “Reusable Models for Timing andLiveness Analysis of Middleware for Distributed Real-TimeEmbedded Systems,” in Proceedings of EMSOFT, 2006, pp.252–261.

[31] J. Hatcliff, X. Deng, M. B. Dwyer, G. Jung, and V. P. Ran-ganath, “Cadena: An Integrated Development, Analysis, andVerification Environment for Component-based Systems,” inProceedings of International Conference on Software Engineer-ing, 2003.

Gabor Madl received his Ph.D. degree inComputer Science from the University of Cal-ifornia, Irvine in 2009. His research is focusedon the combination of formal methods andsimulations for the model-based analysis ofdistributed real-time embedded systems. Hisresearch was recognized by the 2007 FrankAnger Memorial Award, which he receivedfor promoting the crossover of ideas between

the embedded software and software engineering communities. Ga-bor has worked as an engineer for Google, Fujitsu Laborato-ries of America, ARM, and has created the open-source DREAM(http://dre.sourceforge.net) and ALDERIS (http://alderis.ics.uci.edu)projects.

Sudeep Pasricha (M’02) received the B.E.degree in electronics and communication en-gineering from Delhi Institute of Technology,Delhi, India, in 2000, and the M.S. and Ph.D.degrees in computer science from the Univer-sity of California, Irvine, in 2005 and 2008,respectively.

He has several years of work experience insemiconductor companies, including STMicro-

electronics and Conexant. He is currently an Assistant Professor withthe Department of Electrical and Computer Engineering, ColoradoState University, Fort Collins. He has published more than 40 researcharticles in peer-reviewed conferences and journals and presented sev-eral tutorials in the area of on-chip communication architecture designat leading conferences. He recently coauthored a book titled On-chipCommunication Architectures (Morgan Kauffman/Elsevier 2008). Hisresearch interests are in the areas of multiprocessor embedded systemdesign, networks-on-chip and emerging interconnect technologies,system-level modeling languages and design methodologies, andcomputer architecture. Dr. Pasricha has received a Best Paper Awardat ASPDAC 2006, a Best Paper Award nomination at DAC 2005, andseveral fellowships and awards for excellence in research.

Nikil D. Dutt (F) received a Ph.D. in Com-puter Science from the University of Illinois atUrbana-Champaign in 1989, and is currentlya Chancellor’s Professor at the University ofCalifornia, Irvine, with academic appointmentsin the CS and EECS departments. His researchinterests are in embedded systems, electronicdesign automation, computer architecture, op-timizing compilers, system specification tech-

niques, distributed embedded systems, and formal methods. Hereceived best paper awards at CHDL89, CHDL91, VLSIDesign2003,CODES+ISSS 2003, CNCC 2006, ASPDAC-2006, and IJCNN 2009.Professor Dutt currently serves as Associate Editor of ACM Trans-actions on Embedded Computer Systems (TECS), and of IEEETransactions on VLSI Systems (IEEE TVLSI). He served as Editor-in-Chief of ACM Transactions on Design Automation of ElectronicSystems (TODAES) between 2004-2008. He was an ACM SIGDADistinguished Lecturer during 2001-2002, and an IEEE ComputerSociety Distinguished Visitor for 2003-2005. He has served on thesteering, organizing, and program committees of several premierCAD and Embedded System Design conferences and workshops,including ASPDAC, DATE, ESWEEK, CASES, CODES+ISSS, IC-CAD, ISLPED and LCTES. He serves or has served on the advisoryboards of ACM SIGBED and ACM SIGDA, the ACM PublicationsBoard, and previously served as Vice-Chair of ACM SIGDA and ofIFIP WG 10.5. Professor Dutt is a Fellow of the IEEE, an ACMDistinguished Scientist, and an IFIP Silver Core Awardee.

Sherif Abdelwahed is an Assistant Profes-sor with Electrical and Computer EngineeringDepartment at Mississippi State University.He received his Ph.D. degree in Electricaland Computer Engineering from the Universityof Toronto, Canada, in 2002. From 2000 to2001, he was a research scientist at RockwellScientific Company. From 2002 to 2007 heworked as a research assistant professor with

the Institute for Software Integrated Systems at Vanderbilt University.His main research interests include model-based design and analysisof self-managing computation systems, modeling and analysis ofdistributed real-time systems, automated verification, fault diagnosistechniques, and model-integrated computing. He has published over70 publications. He is a senior member of the IEEE and member ofSigma Xi.

cross-abstraction functional veriﬁcation and performance ...gabe/papers/mpda_ieee_tii_2009.pdf ·...

Documents