automatic parallelization using pipelining for equation-based

Linköping Studies in Science and Technology

Thesis No. 1381

Automatic Parallelization using Pipelining for Equation-Based Simulation Languages

by

Håkan Lundvall

Submitted to Linköping Institute of Technology at Linköping University in partial fulfilment of the requirements for the degree of Licentiate of Engineering

Department of Computer and Information Science Linköpings universitet

SE-581 83 Linköping, Sweden

Linköping 2008

Automatic Parallelization using Pipelining for Equation-Based Simulation Languages

by

Håkan Lundvall

September 2008 ISBN 978-91-7393-799-3

Thesis No. 1381 ISSN 0280-7971

ABSTRACT

During the most recent decades modern equation-based object-oriented modeling and simulation languages, such as Modelica, have become available. This has made it easier to build complex and more detailed models for use in simulation. To be able to simulate such large and complex systems it is sometimes not enough to rely on the ability of a compiler to optimize the simulation code and reduce the size of the underlying set of equations to speed up the simulation on a single processor. Instead we must look for ways to utilize the increasing number of processing units available in modern computers. However to gain any increased performance from a parallel computer the simulation program must be expressed in a way that exposes the potential parallelism to the computer. Doing this manually is not a simple task and most modelers are not experts in parallel computing. Therefore it is very appealing to let the compiler parallelize the simulation code automatically. This thesis investigates techniques of using automatic translation of models in typical equation based languages, such as Modelica, into parallel simulation code that enable high utilization of available processors in a parallel computer.

The two main ideas investigated here are the following: first, to apply parallelization simultaneously to both the system equations and the numerical solver, and secondly. to use software pipelining to further reduce the time processors are kept waiting for the results of other processors. Prototype implementations of the investigated techniques have been developed as a part of the OpenModelica open source compiler for Modelica. The prototype has been used to evaluate the parallelization techniques by measuring the execution time of test models on a few parallel archtectures and to compare the results to sequential code as well as to the results achieved in earlier work. A measured speedup of 6.1 on eight processors on a shared memory machine has been reached. It still remains to evaluate the methods for a wider range of test models and parallel architectures.

This work has been supported by the Swedish Foundation for Strategic Research (SSF) in the ECSEL graduate school, by Vinnova in the Safe and Secure Modeling and Simulation Project, and by the

Swedish Research Council (VR).

Acknowledgements

I have learned a great deal during these years working towards this thesis and I have many people to thank for this. First, I would like to express my gratitude towards my supervisor Peter Fritzson for giving me the opportunity to work on this research and for the guidance and encouragement he has provided. I also want to thank my co-supervisors Christoph Kessler and Dag Fritzson for their support.

Even though I live two hours drive from the colleagues at PELAB it has always been a pleasant drive knowing that interesting discussion and good friends await when I arrive. Many thanks to all the employees at PELAB for that. And I would especially like to thank Bodil for all the administrative help which has made the life easier for all the rest of us.

Those days when I have not drove all the way to Linköping, Sogeti Sverige AB has provided a nice working atmosphere here in Örebro, as well as a computer to work with.

My time has been divided evenly between the PhD studies and the work as a consultant at Sogeti and I wish to thank Atlas Copco Rock Drills AB for keeping me busy with assignments through out these years knowing that they can only get half of my attention.

Finally I would like to thank my lovely wife Jenny and my wonderful daughters Emma and Lina for always being there and believing in me.

Örebro, August 13, 2008

Håkan Lundvall

i

Table of Contents Part 1 Introduction 5

Chapter 1 Background and Introduction 6

1.1 Modeling and Simulation 6 1.2 Related Work on Task Scheduling and Clustering 7 1.3 Research Problem 8 1.4 Contributions 9 1.5 Thesis Structure 9 1.6 List of Papers 9

Chapter 2 Implementation – The OpenModelica Parallelizing Backend 11

2.1 Background 11 2.2 Compiler Phases 12 2.3 Modules of the OpenModelica Compiler 13 2.3.1 DAELow 14 2.3.2 TaskGraph 17 2.3.3 TaskGraphExt 18 2.3.4 ModPar/Schedule 18 2.3.5 ModPar/Codegen 20 2.3.6 ModPar/TaskGraph 21 2.4 Simulation Runtime 21 2.4.1 MPI-Based Runtime System 22 2.4.2 Pthreads-Based Runtime System 22 2.5 Additional Load Balancing and Memory Layout Ideas 23

Bibliography 25

Part II Papers 27

Paper A Event Handling in the OpenModelica Compiler and Runtime System 29

1 Introduction 31 2 Conversion of a Modelica Model to a Hybrid DAE 32 3 Simulation of Hybrid Models Based on Solving Hybrid DAEs 32 4 The Form of the Hybrid DAE System 33 4.1 Summary of Notation 33 4.2 Continuous-Time Behavior 34 4.3 Discrete-Time Behavior 35 4.4 The Complete Hybrid DAE 36 5 Hybrid DAE Solution Algorithm 36 6 Varying Structure of the Active Part of the Hybrid DAE 38 7 Finding Consistent Initial Values at Start or Restart 39 8 Detecting Events during Continuous-time Simulation 40 9 Crossing Functions 40

ii

10 Integrating the Numerical Solver with Event-Handling 42 11 Code Generation 44 12 Example 45 13 Measurements and Evaluation 47 13.1 Bouncing Ball 47 13.2 Full Wave Rectifier 48 14 Conclusions 50 15 Acknowledgements 50 16 References 50

Paper B Automatic Parallelization of Object Oriented Models Across Method and System 53

1 Background – Introduction to Mathematical Modeling and Modelica 55

2 The OpenModelica Open Source Implementation 57 3 Approaches to Integrate Parallelism and Mathematical Models 57 3.1 Automatic Parallelization of Mathematical Models 58 3.2 Coarse-Grained Explicit Parallelization Using Computational

Components 59 3.3 Explicit Parallel Programming 59 4 Combining Parallelization at Several Levels 59 5 Pipelining the Task Graph 62 6 Sorting Equations for Short Access Distance 62 7 Scheduling 63 8 Measurements 64 9 Conclusion 65 10 Future work 65 11 Acknowledgements 66 12 References 66

Paper C Automatic Parallelization of Models using Pipeline Extraction from Combined RHS and Inlined Solvers 69

1 Introduction 71 2 Background on Mathematical Modeling and Modelica 72 2.1 The OpenModelica Open Source Environment 74 3 Approaches to Exploit Parallelism in Mathematical Models 75 3.1 Automatic Fine-Grained Parallelization of Mathematical Models 75 3.2 Coarse-Grained Explicit Parallelization Using Computational

Components 75 3.3 Explicit Parallel Programming 76 4 Combining Parallelization from Several Levels of Abstraction 76 5 Pipelining the Task Graph 77 6 Sorting Equations for Short Access Distance 77 7 Scheduling 79 8 Measurements 81

iii

9 Conclusions 85 10 Future work 85 11 Acknowledgements 85 12 References 86

Part 1 Introduction

Chapter 1 Background and Introduction

In this chapter we first give a short background and introduction to the research area investigated. We also state the research problem and list the contributions of this work. At the end of the chapter we present the structure of this thesis and list the papers presenting the results of the research.

1.1 Modeling and Simulation

In many areas modeling and simulation becomes more and more important. In product development, for example, modeling and simulation is used to allow for shorter development time and more robust systems. Instead of embarking on a time consuming and expensive prototype construction, a model of the system is built and experiments are performed on this model. The model can be a physical model where some details of the real system has been left out, but usually the model is a mathematical one stored in a computer and the experiments carried out on the model are computer simulations. Running the simulations on a model instead of building the system can help us to answer about the system which would not otherwise be possible due to reasons such as:

• The experiments would be too expensive to perform on a real system. • The investigated variable may not be accessible in the real system. • Performing some experiments on the real system may be too dangerous. • It may take a too long time to perform the experiment on the real system.

In order to perform a simulation of a model on a computer we must have an executable program that performs all the calculations needed. When computer simulations first saw the light of day these programs were hand written in relatively low level languages like C or Fortran. Although hand coded simulation programs are still used and developed, more and more models are written in high level equation-based modeling languages such as Modelica[4], gPROMS[9], and VHDL-AMS[8]. One of the great benefits of these equation-based languages is that the mathematical models behind the simulations are usually given in equations and a lot of manual work of transforming the equations into assignment statements can be

Related Work on Task Scheduling and Clustering 7

avoided. The same submodel can also be used in several different contexts where the dependent variables, i.e., the variables to solve for, are not necessarily the same. This greatly reduces development time and improves reusability. The information about which variables are dependent variables and which are independent variables in the equations is called the causality of the system.

As the models grow larger and more accurate simulation results are desired, more computational power is needed. Although computers have become faster, recently most of the increased computational power of new computers can only be utilized with extensive usage of parallelism. This puts demands on the modeling languages used to allow the simulation programs to utilize parallelism either by introducing language constructs to allow the modeler to express parallelism explicitly in the models, but even better by automatically parallelizing the same models previously used in simulations on single CPU machines.

1.2 Related Work on Task Scheduling and Clustering

When a mathematical model is translated into executable simulation code a task graph is generated which describes all dependencies between the tasks which puts restrictions on the order in which the tasks can be executed. Therefore it is important to have efficient algorithms that can schedule the tasks on a parallel computer. Much research has been done in this area. In this section we summarize some of the previous work on task scheduling and clustering.

A classification of scheduling algorithms can be found in [13], which classifies the algorithms in a hierarchical manner. At the top level the algorithms are divided into local and global algorithms, where the local algorithms schedule the tasks on one processor whereas the global algorithms consider the case of multiple processors. The global algorithms are further divided into static and dynamic algorithms. In this work we explore static scheduling which means the scheduling takes place at compile time.

Static scheduling can be done optimally or sub-optimally and sub-optimal scheduling is further divided into heuristic and approximate scheduling. Both the optimal and the approximate sub-optimal scheduling are divided into four different kinds: enumerative, graph-theory, mathematical programming, and queuing theory. The work in this thesis falls under the approximate scheduling category.

There exist algorithms for automatically scheduling entire taskgraphs on a fixed number of processors, such as the Modified Critical Path (MCP) scheduling algorithm [14] and the Earliest Ready Task (ERT) algorithm [15].

The problem with scheduling algorithms which solves the entire scheduling problem, including assigning each task to a processor, is the computational complexity of such algorithms. Therefore, a clustering algorithm is used to make the job easier for the scheduler. A cluster is a set of tasks that run on the same processor. After the clustering face, a smaller number of clusters can be assigned to the processors and the scheduler still need to determine the execution order within the processor, i.e., local scheduling.

8 Background and Introduction

Existing clustering algorithms include the Internalization algorithm [16] and the Dominant Sequence Clustering (DSC) algorithm [17]. The Internalization algorithm works by traversing all edges of the task graph and check whether assigning the two tasks to the same processor would lead to an increased total execution time. If this is not the case the edge is internalized and the two tasks are put in the same cluster.

1.3 Research Problem

In this section we discuss the research problem of this work. In the research leading to this thesis the aim has been to investigate if other

techniques to automatically translate models stated in equation-based languages into parallel code can lead to more efficient code compared to previously tested approaches. A second objective of this work has been to reduce the complexity of the translation algorithm in order to speed up the translation process.

The research methodology used in this work is the traditional system oriented computer science research method. In order to validate our hypotheses, a prototype implementation is built. By running simulations of test models on parallel computers we measure the amount of time needed to complete the simulations. We can compare these measurements to test runs on a single processor and thereby calculate the gained speed-up. In order to compare the results to earlier work the test model used is chosen such that test data are available from the earlier work to which we compare our results. This work shares essentially the same objectives as the work presented in [7] by Peter Aronsson, but we continue a bit further and develop some new methods.

Parallelizing the simulation of a mathematical model can be done at several levels, see Paper B, page 59. The calculation of the simulation results involves a numerical solver which iterates over the simulation problem in order to compute the result. Parallelization can be done at the numerical method level by using a parallel version of the solver or at the system level by using a sequential solver where each call to the right hand side of the system of equations is parallelized.

One hypothesis explored in this work is that more speedup can be gained if parallelization is done at both of these levels at the same time.

The methods developed in Aronssons thesis use a graph rewriting system to automatically merge tasks into larger units to increase the granularity of the task graph. This is done on the global full system task graph. A second hypothesis explored in this work is that compilation time can be reduced by assigning the tasks to processors early, guided by information about locality present in the model, to avoid expensive work on a large task graph, and instead later merge the tasks locally in each processor, to increase granularity.

Contributions 9

1.4 Contributions

The following are the main contributions of this work:

• Contributions to the OpenModelica compiler (specifically the event handling mechanism) which is used in several research projects. This is described in Paper A, pages 29-50.

• A new variant of topological sorting of equations to allow for simple and fast scheduling with little communication. This is described in Paper B, pages 53-66.

• Task clustering after processor assignment to enable pipelining of simulations steps without unnecessary waiting times. This is described in Paper C, pages 69-85.

1.5 Thesis Structure

This thesis has the following structure. Part I is divided into two chapters where the first gives an introduction including

the research objective, some background on the Modelica Language and a summary of the contributions of this thesis. The second chapter describes the implementation of the prototype system developed as a part of this thesis work.

Part II contains the papers which presents the research results of this work.

1.6 List of Papers

The research results are presented in three papers in Part II of the thesis. The papers are;

Paper A: Event Handling in the OpenModelica Compiler and Runtime System

Håkan Lundvall, Peter Fritzson, and Bernhard Bachmann

Edited version of a technical report published in Technical Reports in Computer and Information Science No. 2 Linköping Electronic Press. 2008. An earlier shorter version of this paper was published in the Proceedings of the 46th Conference on Simulation and Modelling of the Scandinavian Simulation Society (SIMS2005), Trondheim, Norway, October 13-14, 2005.

10 Background and Introduction

Paper B Automatic Parallelization of Object Oriented Models Across Method and System

Håkan Lundvall and Peter Fritzson

Edited version of a paper originally published in Proceedings of 6th Eurosim Congress, Ljubljana, Slovenia, 2007.

Paper C Automatic Parallelization of Models using Pipeline Extraction from Combined RHS and Inlined Solvers

Håkan Lundvall and Peter Fritzson

Edited version of a paper submitted to the 14th Workshop on Compilers for Parallel Computing, IBM Research Center, Zurich, Switzerland, January 7-9, 2009

Chapter 2 Implementation – The OpenModelica Parallelizing Backend

In order to test and evaluate the methods and theories developed in this thesis work, which are described in Papers A to C, a prototype implementations of these methods have been developed as part of the OpenModelica compiler. This chapter describes the implementation of the prototype parallelizing backend for OpenModelica[6].

2.1 Background

OpenModelica is an open source project aiming at creating a complete Modelica modeling, compilation and simulation environment. The current version includes the following parts:

• A Modelica compiler (omc). • An interactive shell (OMShell) which gives a command interface to the

compiler. • The OpenModelica Notebook (OMNotebook), a Mathematica style

electronic notebook for Modelica where Modelica models can be written and simulated within the document.

• A Modelica Eclipse plugin (MDT – Modelica Development Tooling) • The OpenModelica Development Environment (OMDev).

The OpenModelica project was started at PELAB in 1998 and is now run within The Open Source Modelica Consortium [11].

In earlier work on automatic parallelization of Modelica code, the ModPar[7] system was developed and integrated into the OpenModelica compiler.

12 Implementation – The OpenModelica Parallelizing Backend

Some code from the ModPar system has been reused also in this implementation to save implementation time. The ModPar system is divided into a number of different modules, some written in MetaModelica and some in C++. A description of the modules is presented in Section 2.3.

2.2 Compiler Phases

In this section we provide a short overview of the different compiler phases involved when compiling a Modelica model, see also Figure 2-1.

First a Modelica parser reads the Modelica source code and produces an abstract syntax tree (AST) that represents the Modelica model. In the next step a translator reorganizes the model into a flat list of equations and variables. This involves, among other things, handling inheritance, import statements and redeclarations.

After the equations have been flattened the equations are topologically sorted according to the causal dependencies between the equations. This step involves symbolic manipulations such as algebraic simplifications, symbolic index reduction, and, removal of trivial equations. Once the causality has been established, the dependent variable in each equation, not part of a strong component, is solved for and the equations are converted to assignment statements. Equations that are part of a strong component must be solved simultaneously. Such systems of simultaneous equations are either solved symbolically at this stage, if possible, or left to a numerical solver. We now have an execution order in which the right hand side of the equations can be evaluated.

In the last step C code is generated for all the assignment statements and a suitable C code representation (e.g. an RHS function) of the strongly connected equations is generated, which is then linked to a numerical solver.

Modules of the OpenModelica Compiler 13

Modelica Source Code

Translator

Analyzer

Optimizer

Code Generator

C Compiler

Simulation

Modelica models

Flat model

Sorted equations

Optimized sorted equations

C Code

Executable

Figure 2-1. OpenModelica compiler translation phases.

The newly developed version of ModPar only influences the last few steps; equation sorting and code generation. An extra step has been added to the topological sorting to keep equations with dependencies closer together, and the code generation step is exchanged for a set of modules which generates a task graph and performs scheduling and generation of parallel code. In section 2.3 we give a more detailed explanation of the modules involved.

2.3 Modules of the OpenModelica Compiler

The OpenModelica compiler is divided into a number of modules, see Figure 2-2. A description of the entire compiler and its modules can be found in the OpenModelica system documentation [10]. Most of the modules are implemented in MetaModelica [12]. The ModPar system adds a number of modules to the compiler. A list of the modules implementing the ModPar system is shown in table 2-1. For each module we have a section “Added Functionality” which summarizes the contributions to the implementation compared to the old ModPar version.


SCode /explode

Lookup

Parse DAELow Inst

Ceval

Static

Absyn SCode DAE

(Env, name) SCode.Class

Exp.Exp

Values.Value

SCode.Exp (Exp.Exp,

Types.Type)

(Env, name)

Main

CodeGen SimCodeGen

C code

Dump DAE Flat Modelica

Figure 2-2. The most important modules of the OpenModelica Compiler (omc) with module connections and data flow (from the OpenModelica System Documentation). There are more than 40 modules in the compiler.

When using the ModPar system the SimCodeGen module is exchanged for the ModPar system which takes over the task of generating simulation code. Some changes have also been made to the DAELow module which handles the topological sorting of the equations

Module name Implementation language

Specific to ModPar

DAELow MetaModelica No

TaskGraph MetaModelica Yes

TaskGraphExt MetaModelica/C++ Yes

ModPar/Schedule C++ Yes

ModPar/Codegen C++ Yes

ModPar/TaskGraph C++ Yes

Table 2-1. List of modules implementing the ModPar system.

2.3.1 DAELow

This module takes care of sorting of the equations according to data dependency and building the block lower triangular form of the system. Input to the module is


the flattened set of equations. The module provides functions for matching of variables to solve for, to access equations and finding out the causality between the equations.

2.3.1.1 Main Functions

Below follows explanations of the most important functions of the DAELow module:

public function lower

input DAE.DAElist lst;

input Boolean add_dummy;

output DAELow outDAELow;

end lower;

This function takes a list of discrete algebraic equations (DAE), which is the flattened model, and translates it into a different form, called DAELow defined in this module. The DAELow representation splits the DAE into equations and variables and further divides variables into known and unknown variables and the equations into simple and non-simple equations.

The input variable add_dummy adds a state variable and an associated equation which is independent of all other variables. This is used if the model contains no state variables since the numeric solver requires at least one state variable.

public function incidenceMatrix

input DAELow inDAELow;

output IncidenceMatrix outIncidenceMatrix;

end incidenceMatrix;

This function calculates the incidence matrix, i.e., which variables are present in each equation. This matrix has rows representing each equation and columns representing each unknown variable. A non-zero entry means that the (column) variable is referenced in the (row) equation. See also Paper C, page 77 and the following page.


public function matchingAlgorithm

input DAELow inDAELow;

input IncidenceMatrix inIncidenceMatrix;

input IncidenceMatrixT inIncidenceMatrixT;

input MatchingOptions inMatchingOptions;

output Integer[:] outAssignments1;

output Integer[:] outAssignments2;

output DAELow outDAELow3;

output IncidenceMatrix outIncidenceMatrix4;

output IncidenceMatrixT outIncidenceMatrixT5;

end matchingAlgorithm;

This function performs the matching algorithm, which is the first part of sorting the equations into BLT (Block Lower Triangular) form. The BLT form is discussed in Paper C on page 77. The matching algorithm finds a variable that is solved in each equation. But to also find out which equations form a block of equations, the second algorithm of the BLT sorting is run, finding strong components. This function returns the updated DAE which, in case of index reduction, has added equations and variables, and the incidence matrix. The variable outAssignments is returned as a vector of variable indices, as well as its inverse, i.e., which equation a variable is solved in as a vector of equation indices.

The variable inMatchingOptions contain options given to the algorithm.

• Whether index reduction should be used or not. • Whether the equation system is allowed to be underconstrained or not,

which is used when generating code for initial equations.

public function strongComponents

input IncidenceMatrix inIncidenceMatrix1;

input IncidenceMatrixT inIncidenceMatrixT2;

input Integer[:] inAssignments1;


output list<list<Integer>> components;

end strongComponents;

This is the second part of the BLT sorting. It takes the variable assignments and the incidence matrix as input and identifies strong components, i.e., subsystems of equations. The result is stored in the output result variable, a list of integer lists called components. The integers stored in the innermost list represent the equation numbers and are references to specific equations of the model. The result is a topological sorting of the dependencies between the equations, i.e., the equations can be solved in the order given by the outermost list. If the innermost list contains more than one equation, it constitutes a strongly connected component and the involved equations must be solved simultaneously.


public function sortBlocksForShortAccessDistance

input list<list<Integer>> comps;

input Integer[:] matching;

input IncidenceMatrix incidenceMatrix;

output list<list<Integer>> outComps;

end sortBlocksForShortAccessDistance;

This function takes an already topologically sorted list of equations and rearranges the equations to bring equations with direct dependencies closer together. Pseudo code for this function can be found in Paper B, after page 62.

2.3.1.2 Added Functionality

The main difference compared to the sequential version of the compiler is the sorting step introduced in the function sortBlocksForShortAccessDistance.

This can greatly reduce the need for communication between the processors when we schedule the tasks based on the order of the equations given by this sorting step.

2.3.2 TaskGraph

This module replaces the SimCodegen module when generating parallel code. In the standard OpenModelica compiler the SimCodegen module traverses the sorted equations as they come from the DAELow module and generates C code for the equations. When generating parallel code the TaskGraph module traverses the equations in the same way with the difference that instead of C code, a task graph is generated and extra tasks implementing a Runge-Kutta solver are interleaved in the generated task graph, i.e., a kind of inlining of the solver.


The buildTaskgraph function is the main function and only public function of the TaskGraph module. Its job is to generate the task graph.

public function buildTaskgraph

input DAELow.DAELow inDAELow1;



input list<list<Integer>> inComponets;

end buildTaskgraph;

It takes the sorted list of equations and the matching of equations to unknown variables as input. For each equation the variable matched to the equation by the matching algorithm is solved for so that the equation can be reorganized into an assignment statement. Then tasks are added for the right hand side of the assignment and one task for the result being the root of this generated subtree.


When generating tasks for the right hand side expression, references to unknown variables are looked up in the already generated part of the task graph. Tasks for calculating all such references are already in the generated part of the task graph since the equations are traversed in data dependency order. All tasks are marked with the position in the topological sorting of the equation from which they were generated.

The function does not have a return value. Instead the task graph is stored in the internal state of the C++ modules. Normally MetaModelica functions cannot have internal state, but calling C++ functions makes this possible.


This module now also generates tasks for the solver itself and not just for computing the right hand side of the simulated system. This means some extra variables are introduced to store intermediate results used by the solver tasks and the tasks calculating the right hand side are duplicated for each solver stage.

2.3.3 TaskGraphExt

This module constitutes the interface between the MetaModelica part and the C++ part of the backend. It exposes functions for building the task graph and generating simulation code. Since this module and the rest of the ModPar system is implemented as a C++ library it can unlike MetaModelica modules, hold an internal state. The generated task graph is stored in this internal state of the C++ library and is accessible from all the C++ modules described below.

2.3.4 ModPar/Schedule

This module implements the scheduling algorithm. Input to the module is the generated task graph and the number of processors. The result is one task list for each processor. The first thing the scheduler does is to split all the tasks into one set of tasks per processor to generate code for. The position of the equation in the topological sorting is used for this.

In the next stage a topological sorting of the tasks within a processor must be established. This is what is discussed in Paper C. The order in which the tasks are used when generating the C code is established using a priority queue in which all tasks associated to a processor is stored. The queue is sorted on two properties: 1) task set index and 2) level. The task set index refers to which task set within a given processor the task has been assigned to and the level refers to the length of the path from the task to the ‘end task’. The ‘end task’ being a task with inbound edges from all the tasks representing the unknown variables, thus being the last task in any topological sorting. The task sets are discussed in Paper C. In short all tasks within one processor are divided into three sets executed in order and where all inbound


communication to a processor is gathered to the boundary between the first set (Pa,i) and the second set (Pb,i) and where all outgoing communication is gathered to the boundary between the second set and the third set (Pc,i).

To calculate these properties the task graph is traversed with three depth first traversals. The first one has a visitor function which sets the level attribute of each visited node to the maximum level of all child nodes incremented by one. The second one is a reverse depth first search which divides all nodes into three categories;

1. The node has at least one incoming edge from a task scheduled on a higher ranked processor or at least one parent that was already assigned to this category. This means that this node is dependent on results from the previous simulation step.

2. The node has at least one incoming edge from a task scheduled on a lower ranked processor or at least one parent was already put in this category. This means that some inbound communication from another processor must have preceded this task.

3. The node does not fit into either of the two categories above. In the third traversal the tasks are clustered into the three sets (Pa,i , Pb,i , and

Pb,i). The visitor function of the third traversal works in the following way:

• If the currently visited node is in category 3 and there is at least one child in category 1 or 2, then the node is inserted in Pa,i.

• If the currently visited node is in category 1 or 2 and there are at least one child assigned to different processor, then the node is inserted in Pb,i.

• Otherwise the currently visited node is inserted in Pc,i.


The following are the main functions of the ModePar/Schedule module.

Schedule(TaskGraph*,VertexID start, VertexID end,

int nproc);

Initializes the scheduler and performs the scheduling algorithm described above.

TaskList * get_tasklist(int proc);

Returns the sorted task list for a given processor.

set<int>* getAssignedProcessors(VertexID v);

Returns the set of tasks assigned to a given processor .

bool isAssignedTo(VertexID v, int proc);

Checks whether a given node in the task graph is assigned to a processor.



This module is almost completely rewritten since a totally different scheduling approach is used compared to the original ModPar system. The task merging code is replaced by a new scheduler that performs processor assignment by column index in the BLT matrix as given by the DAELow module and local scheduling of the tasks for each processor. The job of increasing the granularity of the code is moved from the global perspective into the local scheduling for each processor.

2.3.5 ModPar/Codegen

This module generates parallel C code from the task graph along with the schedule provided by the modpar/Schedule module.

The schedule provides one task queue for each processor to generate code for. One function is generated for each task queue and put in separate files so that each function is put in a separate object file together with the data array for the simulation variables. Before generating the simulation functions the tasks are traversed to collect data about all the communication needs. For each pair of processors between which messages need to be communicated, a set is kept where all variables communicated are stored. Before beginning to generate the tasks in Pc,i for processor a given processor i these communication sets are used to generate message receive code for all messages with processor i as destination. All messages between Pa,j and Pb,i, for two processors i and j, are merged into one single message. The same happens with the message send code after the tasks of Pb,i are generated.


The following are the main functions in the ModPar/Codegen module.

void initialize(TaskGraph*, Schedule *, ...);

This function initializes the code generator. It takes the global task graph and the schedule as input parameters. Some arguments to the function have been omitted for brevity.

void generateCode(char * init_code);

This function uses the schedule and the task graph given in the initialization and generates C code.


In the version of ModPar that existed before this work started, there was only a runtime for MPI available. In that version the tasks reaching the code generator was merged into larger but fewer tasks. In a merged task all result variables of one large task was collected into one message, but there could be more than one message

Simulation Runtime 21

between one pair of processors. In the new version the original small tasks are kept, instead they are clustered into three sets. This means that the message passing code has been rewritten.

The simulation code for the different processors is now also generated into separate functions in order to be usable by the pthreads runtime.

2.3.6 ModPar/TaskGraph

This module implements the task graph and provides primitives for accessing properties of the task graph nodes. The task graph is implemented using the Boost graph Library [3].


The following are the main functions in the ModPar/TaskGraph module.

EdgeID add_edge(VertexID parent, VertexID child,

TaskGraph *tg, string *result=0,

int prio=0);

This function adds a vertex between two nodes in the task graph. The name of the result variable communicated from the parent is given in the result argument.

In case of more than one incoming edge to a node there is a priority given in order to sort out which order to use the incoming data in the task.

Other graph manipulating functions are available through the Boost graph library.


The nodes of the task graph have been equipped with a new attribute which stores information about which variable the task is involved in evaluating the result for. This information is used by the scheduler to assign the task to the appropriate processor. Otherwise this module has been kept largely the same.

2.4 Simulation Runtime

When the simulation code has been generated it is linked together with a simulation runtime. The runtime is written in C++. Two different runtime systems have been developed. One for distributed memory using MPI [1] and one for shared memory using pthreads [2].


2.4.1 MPI-Based Runtime System

In this section we give a short overview of the MPI-based runtime system. When running a parallel program using MPI all nodes run the same executable

and the MPI system takes care of starting up the program on each node that should be used. Since the same code is run on all processors a switch statement takes care of calling the correct simulation code for each processor.

Each task of the task graph generates a result, but if a task only represents a sub-expression of one equation then there is no simulation variable from the model to store the result in, so a temporary variable is used. In the MPI-based runtime these temporary variables are allocated as a global array on the heap. This is due to the fact that in the previous version of ModPar it was possible that such a variable would be used in a message to another processor. In the new version, however, all tasks originating from the same equation always get assigned to the same processor.

Messages to other processors are implemented using non-blocking MPI send and receive. Before sending all the variables involved in one message are copied to a send buffer, since they are not necessarily placed consecutively in memory.

On the receiving end there is a corresponding receive buffer from which the values are copied to their final destination.

The function containing the simulation code is called from within a loop which first updates the simulation time variable and after the simulation code is called stores the new values of all simulation variables in a result file. One result file is generated for each processor. In this way the time it takes to store the simulation results are also distributed over all the processors. The files must be concatenated afterwards to get the complete simulation result.

2.4.2 Pthreads-Based Runtime System

In the pthreads runtime the entire simulation is running in the same process but it is divided into several threads. Since all threads share the same memory it is possible to just keep one array of simulation variables that all threads reference. For the communication we just have to make sure that the task that produces the value for a variable that is used in a task on another thread has completed before the receiving task starts. To accomplish we use a map indexed by the communication id consisting of the sending and receiving thread number. In this map we store how many simulation steps the sending thread has completed. The receiving thread performs a busy waiting loop until the sending thread has passed the point were the needed values are available and has increased the counter in the communication map.

However, it turns out that the simulation goes a lot faster if we separate the simulation variables into one array per thread, see Paper C. Thus, in the latest version of the prototype each thread has its own copy of the simulation variable array and the communication mechanism is extended to copy the relevant variables between the copies.

Additional Load Balancing and Memory Layout Ideas 23

The temporary variables used to store the results of subexpressions are generated as local variables. In this way the C compiler is able to eliminate them during optimization. The resulting executable looks the same as if the entire equation was generated as one single C expression instead of a series of assignment statements using only one operator per expression.

2.5 Additional Load Balancing and Memory Layout Ideas

Some ideas have not yet found their way into the current ModPar prototype.

In Paper C we mention the possibility of load balancing when the execution cost of all tasks are not known. The idea is to make an initial guess of assigning the tasks evenly over two processors and then run a small test run, adjust the initial guess based on the test run and then iterate until the tasks are balanced. Then we split each half in two and do the same thing for four processors. If this is done inside the compiler a large part of the compilation work can be reused for each test compilation so the compilation time should not suffer that much. This function is not implemented, but we made some small test manually, which indicates that a better load balancing can be achieved compared to just splitting the tasks evenly over the processors.

In the current version of the prototype the simulation variables are laid out in memory in the order they appear in the model. It would be better if we made the memory layout of the variables according to the equation sorting. This would keep all variables used by one processor close in memory. We should also make sure that variables that are communicated between processes should be laid out adjacent to each other so that they need not be copied to a communication buffer before transmission to other processors.

Bibliography

[1] Message Passing Interface Forum. MPI a message-passing interface standard. Technical Report UT-CS-94-230. 1994.

[2] B. Nicols, D. Buttlar, and J. P. Farrel. Pthreads Programming. O’Reilly & Associates, 1996.

[3] J Siek, LQ Lee, A Lumsdaine. The Boost Graph Library: User Guide and Reference

Manual, Addison-Wesley, 2002. [4] Peter Fritzson. Principles of Object-Oriented Modeling and Simulation with Modelica 2.1,

940 pp., ISBN 0-471-471631, Wiley-IEEE Press, 2004. See also www.openmodelica.org and book web page: www.mathcore.com/drModelica

[5] The Modelica Association. The Modelica Language Specification Version 3.0, September 2007. http://www.modelica.org.

[6] Peter Fritzson, et al. The OpenModelica Modeling, Simulation, and Software Development

Environment. In Simulation News Europe, 44/45, December 2005 See also: http://www.ida.liu.se/projects/OpenModelica.

[7] Peter Aronsson. Automatic Parallelization of Equation-Based Simulation Programs. PhD thesis, Dissertation No. 1022, Dept, Computer and Information Science, Linköping University, Linköping, Sweden.

[8] Ernst Christen and Kenneth Bakalar. VHDL-AMS – A Hardware Descriptions Language for

Analog and Mixed-Signal Applications. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 46(10):1263-1272, 1999.

[9] Paul Inigo Barton. The Modelling and Simulation of Combined Discrete/Continuous

Processes. PhD thesis, Department of Chemical Engineering, Imperial Collage of Science, Technology and Medicine, London, UK, 1992.

[10] The OpenModelica System Documentation, , February 2008. www.ida.liu.se/projects/OpenModelica

[11] The Open Source Modelica Consortium (OSMC) http://www.ida.liu.se/labs/pelab/modelica/OpenSourceModelicaConsortium.html

[12] Peter Fritzson. Modelica Meta-Programming and Symbolic Transformations –

MetaModelica Programming Guide. Draft version, www.openmodelica.org, June 2007. [13] Thomas L. Casavant and Jon G. Kuhl. A taxonomy of scheduling in general-purpos

distributed computing systems. Software Engineering, 14(2):141-154, 1988. [14] C.S. Park, S.B. Choi. Multiprocessor Scheduling Algorithm Utilizing Linear Clustering of

Directed Acyclic Graphs. In Parallel and Distributed Systems, Proceedings, 1997. [15] C.Y. Lee, J.J. Hwang, Y.C. Chow, F.D Anger. Multiprocessor Scheduling with

Interprocessor Communication Delays. Operating Research Letters, vol. 7(no. 3), 1988. [16] V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press,

Cambridge, MA, 1989. [17] T. Yang, A. Gerasoulis, DSC: Scheduling Parallel Tasks on Parallel and Distributed

Systems, vol. 5(no. 9), 1994.

LINKÖPINGS UNIVERSITETLINKÖPINGS UNIVERSITETLINKÖPINGS UNIVERSITETLINKÖPINGS UNIVERSITET

Rapporttyp Report category

Licentiatavhandling

Examensarbete

C-uppsats

D-uppsats

Övrig rapport

Språk Language

Svenska/Swedish

Engelska/English

Titel Automatic Parallelization using Pipelining for Equation-Based Simulation Languages

Författare Håkan Lundvall

Sammanfattning Abstract

ISBN

ISRN

Serietitel och serienummer ISSN Title of series, numbering

Linköping Studies in Science and Technology

Thesis No. 1381

Nyckelord Automatic parallelization, Equation-Based Languages, Modelica

Datum Date

URL för elektronisk version

X

X

2008-09-26

Avdelning, institution Division, department

Institutionen för datavetenskap

Department of Computer and Information Science

During the most recent decades modern equation-based object-oriented modeling and simulation languages, such as Modelica, have become available. This has made it easier to build complex and more detailed models for use in simulation. To be able to simulate such large and complex systems it is sometimes not enough to rely on the ability of a compiler to optimize the simulation code and reduce the size of the underlying set of equations to speed up the simulation on a single processor. Instead we must look for ways to utilize the increasing number of processing units available in modern computers. However to gain any increased performance from a parallel computer the simulation program must be expressed in a way that exposes the potential parallelism to the computer. Doing this manually is not a simple task and most modelers are not experts in parallel computing. Therefore it is very appealing to let the compiler parallelize the simulation code automatically. This thesis investigates techniques of using automatic translation of models in typical equation based languages, such as Modelica, into parallel simulation code that enable high utilization of available processors in a parallel computer. The two main ideas investigated here are the following: first, to apply parallelization simultaneously to both the system equations and the numerical solver, and secondly. to use software pipelining to further reduce the time processors are kept waiting for the results of other processors. Prototype implementations of the investigated techniques have been developed as a part of the OpenModelica open source compiler for Modelica. The prototype has been used to evaluate the parallelization techniques by measuring the execution time of test models on a few parallel archtectures and to compare the results to sequential code as well as to the results achieved in earlier work. A measured speedup of 6.1 on eight processors on a shared memory machine has been reached. It still remains to evaluate the methods for a wider range of test models and parallel architectures.

978-91-7393-799-3

0280-7971

LiU-Tek-Lic-2008:39

Department of Computer and Information ScienceLinköpings universitet

Linköping Studies in Science and TechnologyFaculty of Arts and Sciences - Licentiate Theses

No 17 Vojin Plavsic: Interleaved Processing of Non-Numerical Data Stored on a Cyclic Memory. (Available at:FOA, Box 1165, S-581 11 Linköping, Sweden. FOA Report B30062E)

No 28 Arne Jönsson, Mikael Patel: An Interactive Flowcharting Technique for Communicating and Realizing Al-gorithms, 1984.

No 29 Johnny Eckerland: Retargeting of an Incremental Code Generator, 1984.No 48 Henrik Nordin: On the Use of Typical Cases for Knowledge-Based Consultation and Teaching, 1985.No 52 Zebo Peng: Steps Towards the Formalization of Designing VLSI Systems, 1985.No 60 Johan Fagerström: Simulation and Evaluation of Architecture based on Asynchronous Processes, 1985.No 71 Jalal Maleki: ICONStraint, A Dependency Directed Constraint Maintenance System, 1987.No 72 Tony Larsson: On the Specification and Verification of VLSI Systems, 1986.No 73 Ola Strömfors: A Structure Editor for Documents and Programs, 1986.No 74 Christos Levcopoulos: New Results about the Approximation Behavior of the Greedy Triangulation, 1986.No 104 Shamsul I. Chowdhury: Statistical Expert Systems - a Special Application Area for Knowledge-Based Com-

puter Methodology, 1987.No 108 Rober Bilos: Incremental Scanning and Token-Based Editing, 1987.No 111 Hans Block: SPORT-SORT Sorting Algorithms and Sport Tournaments, 1987.No 113 Ralph Rönnquist: Network and Lattice Based Approaches to the Representation of Knowledge, 1987.No 118 Mariam Kamkar, Nahid Shahmehri: Affect-Chaining in Program Flow Analysis Applied to Queries of Pro-

grams, 1987.No 126 Dan Strömberg: Transfer and Distribution of Application Programs, 1987.No 127 Kristian Sandahl: Case Studies in Knowledge Acquisition, Migration and User Acceptance of Expert Sys-

tems, 1987. No 139 Christer Bäckström: Reasoning about Interdependent Actions, 1988.No 140 Mats Wirén: On Control Strategies and Incrementality in Unification-Based Chart Parsing, 1988.No 146 Johan Hultman: A Software System for Defining and Controlling Actions in a Mechanical System, 1988.No 150 Tim Hansen: Diagnosing Faults using Knowledge about Malfunctioning Behavior, 1988.No 165 Jonas Löwgren: Supporting Design and Management of Expert System User Interfaces, 1989.No 166 Ola Petersson: On Adaptive Sorting in Sequential and Parallel Models, 1989.No 174 Yngve Larsson: Dynamic Configuration in a Distributed Environment, 1989.No 177 Peter Åberg: Design of a Multiple View Presentation and Interaction Manager, 1989.No 181 Henrik Eriksson: A Study in Domain-Oriented Tool Support for Knowledge Acquisition, 1989.No 184 Ivan Rankin: The Deep Generation of Text in Expert Critiquing Systems, 1989.No 187 Simin Nadjm-Tehrani: Contributions to the Declarative Approach to Debugging Prolog Programs, 1989.No 189 Magnus Merkel: Temporal Information in Natural Language, 1989.No 196 Ulf Nilsson: A Systematic Approach to Abstract Interpretation of Logic Programs, 1989.No 197 Staffan Bonnier: Horn Clause Logic with External Procedures: Towards a Theoretical Framework, 1989.No 203 Christer Hansson: A Prototype System for Logical Reasoning about Time and Action, 1990.No 212 Björn Fjellborg: An Approach to Extraction of Pipeline Structures for VLSI High-Level Synthesis, 1990.No 230 Patrick Doherty: A Three-Valued Approach to Non-Monotonic Reasoning, 1990.No 237 Tomas Sokolnicki: Coaching Partial Plans: An Approach to Knowledge-Based Tutoring, 1990.No 250 Lars Strömberg: Postmortem Debugging of Distributed Systems, 1990.No 253 Torbjörn Näslund: SLDFA-Resolution - Computing Answers for Negative Queries, 1990.No 260 Peter D. Holmes: Using Connectivity Graphs to Support Map-Related Reasoning, 1991.No 283 Olof Johansson: Improving Implementation of Graphical User Interfaces for Object-Oriented Knowledge-

Bases, 1991.No 298 Rolf G Larsson: Aktivitetsbaserad kalkylering i ett nytt ekonomisystem, 1991.No 318 Lena Srömbäck: Studies in Extended Unification-Based Formalism for Linguistic Description: An Algorithm

for Feature Structures with Disjunction and a Proposal for Flexible Systems, 1992.No 319 Mikael Pettersson: DML-A Language and System for the Generation of Efficient Compilers from Denotatio-

nal Specification, 1992.No 326 Andreas Kågedal: Logic Programming with External Procedures: an Implementation, 1992.No 328 Patrick Lambrix: Aspects of Version Management of Composite Objects, 1992.No 333 Xinli Gu: Testability Analysis and Improvement in High-Level Synthesis Systems, 1992.No 335 Torbjörn Näslund: On the Role of Evaluations in Iterative Development of Managerial Support Sytems,

1992.No 348 Ulf Cederling: Industrial Software Development - a Case Study, 1992.No 352 Magnus Morin: Predictable Cyclic Computations in Autonomous Systems: A Computational Model and Im-

plementation, 1992.No 371 Mehran Noghabai: Evaluation of Strategic Investments in Information Technology, 1993.No 378 Mats Larsson: A Transformational Approach to Formal Digital System Design, 1993.No 380 Johan Ringström: Compiler Generation for Parallel Languages from Denotational Specifications, 1993.No 381 Michael Jansson: Propagation of Change in an Intelligent Information System, 1993.No 383 Jonni Harrius: An Architecture and a Knowledge Representation Model for Expert Critiquing Systems, 1993.No 386 Per Österling: Symbolic Modelling of the Dynamic Environments of Autonomous Agents, 1993.No 398 Johan Boye: Dependency-based Groudness Analysis of Functional Logic Programs, 1993.

No 402 Lars Degerstedt: Tabulated Resolution for Well Founded Semantics, 1993.No 406 Anna Moberg: Satellitkontor - en studie av kommunikationsmönster vid arbete på distans, 1993.No 414 Peter Carlsson: Separation av företagsledning och finansiering - fallstudier av företagsledarutköp ur ett agent-

teoretiskt perspektiv, 1994.No 417 Camilla Sjöström: Revision och lagreglering - ett historiskt perspektiv, 1994.No 436 Cecilia Sjöberg: Voices in Design: Argumentation in Participatory Development, 1994.No 437 Lars Viklund: Contributions to a High-level Programming Environment for a Scientific Computing, 1994.No 440 Peter Loborg: Error Recovery Support in Manufacturing Control Systems, 1994.FHS 3/94 Owen Eriksson: Informationssystem med verksamhetskvalitet - utvärdering baserat på ett verksamhetsinrik-

tat och samskapande perspektiv, 1994.FHS 4/94 Karin Pettersson: Informationssystemstrukturering, ansvarsfördelning och användarinflytande - En kompa-

rativ studie med utgångspunkt i två informationssystemstrategier, 1994.No 441 Lars Poignant: Informationsteknologi och företagsetablering - Effekter på produktivitet och region, 1994.No 446 Gustav Fahl: Object Views of Relational Data in Multidatabase Systems, 1994.No 450 Henrik Nilsson: A Declarative Approach to Debugging for Lazy Functional Languages, 1994.No 451 Jonas Lind: Creditor - Firm Relations: an Interdisciplinary Analysis, 1994.No 452 Martin Sköld: Active Rules based on Object Relational Queries - Efficient Change Monitoring Techniques,

1994.No 455 Pär Carlshamre: A Collaborative Approach to Usability Engineering: Technical Communicators and System

Developers in Usability-Oriented Systems Development, 1994.FHS 5/94 Stefan Cronholm: Varför CASE-verktyg i systemutveckling? - En motiv- och konsekvensstudie avseende ar-

betssätt och arbetsformer, 1994.No 462 Mikael Lindvall: A Study of Traceability in Object-Oriented Systems Development, 1994.No 463 Fredrik Nilsson: Strategi och ekonomisk styrning - En studie av Sandviks förvärv av Bahco Verktyg, 1994.No 464 Hans Olsén: Collage Induction: Proving Properties of Logic Programs by Program Synthesis, 1994.No 469 Lars Karlsson: Specification and Synthesis of Plans Using the Features and Fluents Framework, 1995.No 473 Ulf Söderman: On Conceptual Modelling of Mode Switching Systems, 1995.No 475 Choong-ho Yi: Reasoning about Concurrent Actions in the Trajectory Semantics, 1995.No 476 Bo Lagerström: Successiv resultatavräkning av pågående arbeten. - Fallstudier i tre byggföretag, 1995.No 478 Peter Jonsson: Complexity of State-Variable Planning under Structural Restrictions, 1995.FHS 7/95 Anders Avdic: Arbetsintegrerad systemutveckling med kalkylkprogram, 1995.No 482 Eva L Ragnemalm: Towards Student Modelling through Collaborative Dialogue with a Learning Compani-

on, 1995.No 488 Eva Toller: Contributions to Parallel Multiparadigm Languages: Combining Object-Oriented and Rule-Based

Programming, 1995.No 489 Erik Stoy: A Petri Net Based Unified Representation for Hardware/Software Co-Design, 1995.No 497 Johan Herber: Environment Support for Building Structured Mathematical Models, 1995.No 498 Stefan Svenberg: Structure-Driven Derivation of Inter-Lingual Functor-Argument Trees for Multi-Lingual

Generation, 1995.No 503 Hee-Cheol Kim: Prediction and Postdiction under Uncertainty, 1995.FHS 8/95 Dan Fristedt: Metoder i användning - mot förbättring av systemutveckling genom situationell metodkunskap

och metodanalys, 1995.FHS 9/95 Malin Bergvall: Systemförvaltning i praktiken - en kvalitativ studie avseende centrala begrepp, aktiviteter och

ansvarsroller, 1995.No 513 Joachim Karlsson: Towards a Strategy for Software Requirements Selection, 1995.No 517 Jakob Axelsson: Schedulability-Driven Partitioning of Heterogeneous Real-Time Systems, 1995.No 518 Göran Forslund: Toward Cooperative Advice-Giving Systems: The Expert Systems Experience, 1995.No 522 Jörgen Andersson: Bilder av småföretagares ekonomistyrning, 1995.No 538 Staffan Flodin: Efficient Management of Object-Oriented Queries with Late Binding, 1996.No 545 Vadim Engelson: An Approach to Automatic Construction of Graphical User Interfaces for Applications in

Scientific Computing, 1996.No 546 Magnus Werner : Multidatabase Integration using Polymorphic Queries and Views, 1996.FiF-a 1/96 Mikael Lind: Affärsprocessinriktad förändringsanalys - utveckling och tillämpning av synsätt och metod,

1996.No 549 Jonas Hallberg: High-Level Synthesis under Local Timing Constraints, 1996.No 550 Kristina Larsen: Förutsättningar och begränsningar för arbete på distans - erfarenheter från fyra svenska fö-

retag. 1996.No 557 Mikael Johansson: Quality Functions for Requirements Engineering Methods, 1996.No 558 Patrik Nordling: The Simulation of Rolling Bearing Dynamics on Parallel Computers, 1996.No 561 Anders Ekman: Exploration of Polygonal Environments, 1996.No 563 Niclas Andersson: Compilation of Mathematical Models to Parallel Code, 1996.No 567 Johan Jenvald: Simulation and Data Collection in Battle Training, 1996.No 575 Niclas Ohlsson: Software Quality Engineering by Early Identification of Fault-Prone Modules, 1996.No 576 Mikael Ericsson: Commenting Systems as Design Support—A Wizard-of-Oz Study, 1996.No 587 Jörgen Lindström: Chefers användning av kommunikationsteknik, 1996.No 589 Esa Falkenroth: Data Management in Control Applications - A Proposal Based on Active Database Systems,

1996.No 591 Niclas Wahllöf: A Default Extension to Description Logics and its Applications, 1996.No 595 Annika Larsson: Ekonomisk Styrning och Organisatorisk Passion - ett interaktivt perspektiv, 1997.No 597 Ling Lin: A Value-based Indexing Technique for Time Sequences, 1997.

No 598 Rego Granlund: C3Fire - A Microworld Supporting Emergency Management Training, 1997.No 599 Peter Ingels: A Robust Text Processing Technique Applied to Lexical Error Recovery, 1997.No 607 Per-Arne Persson: Toward a Grounded Theory for Support of Command and Control in Military Coalitions,

1997.No 609 Jonas S Karlsson: A Scalable Data Structure for a Parallel Data Server, 1997.FiF-a 4 Carita Åbom: Videomötesteknik i olika affärssituationer - möjligheter och hinder, 1997.FiF-a 6 Tommy Wedlund: Att skapa en företagsanpassad systemutvecklingsmodell - genom rekonstruktion, värde-

ring och vidareutveckling i T50-bolag inom ABB, 1997.No 615 Silvia Coradeschi: A Decision-Mechanism for Reactive and Coordinated Agents, 1997.No 623 Jan Ollinen: Det flexibla kontorets utveckling på Digital - Ett stöd för multiflex? 1997.No 626 David Byers: Towards Estimating Software Testability Using Static Analysis, 1997.No 627 Fredrik Eklund: Declarative Error Diagnosis of GAPLog Programs, 1997.No 629 Gunilla Ivefors: Krigsspel coh Informationsteknik inför en oförutsägbar framtid, 1997.No 631 Jens-Olof Lindh: Analysing Traffic Safety from a Case-Based Reasoning Perspective, 1997No 639 Jukka Mäki-Turja:. Smalltalk - a suitable Real-Time Language, 1997.No 640 Juha Takkinen: CAFE: Towards a Conceptual Model for Information Management in Electronic Mail, 1997.No 643 Man Lin: Formal Analysis of Reactive Rule-based Programs, 1997.No 653 Mats Gustafsson: Bringing Role-Based Access Control to Distributed Systems, 1997.FiF-a 13 Boris Karlsson: Metodanalys för förståelse och utveckling av systemutvecklingsverksamhet. Analys och vär-

dering av systemutvecklingsmodeller och dess användning, 1997.No 674 Marcus Bjäreland: Two Aspects of Automating Logics of Action and Change - Regression and Tractability,

1998.No 676 Jan Håkegård: Hiera rchical Test Architecture and Board-Level Test Controller Synthesis, 1998.No 668 Per-Ove Zetterlund: Normering av svensk redovisning - En studie av tillkomsten av Redovisningsrådets re-

kommendation om koncernredovisning (RR01:91), 1998.No 675 Jimmy Tjäder: Projektledaren & planen - en studie av projektledning i tre installations- och systemutveck-

lingsprojekt, 1998.FiF-a 14 Ulf Melin: Informationssystem vid ökad affärs- och processorientering - egenskaper, strategier och utveck-

ling, 1998.No 695 Tim Heyer: COMPASS: Introduction of Formal Methods in Code Development and Inspection, 1998.No 700 Patrik Hägglund: Programming Languages for Computer Algebra, 1998.FiF-a 16 Marie-Therese Christiansson: Inter-organistorisk verksamhetsutveckling - metoder som stöd vid utveckling

av partnerskap och informationssystem, 1998.No 712 Christina Wennestam: Information om immateriella resurser. Investeringar i forskning och utveckling samt

i personal inom skogsindustrin, 1998.No 719 Joakim Gustafsson: Extending Temporal Action Logic for Ramification and Concurrency, 1998.No 723 Henrik André-Jönsson: Indexing time-series data using text indexing methods, 1999.No 725 Erik Larsson: High-Level Testability Analysis and Enhancement Techniques, 1998.No 730 Carl-Johan Westin: Informationsförsörjning: en fråga om ansvar - aktiviteter och uppdrag i fem stora svenska

organisationers operativa informationsförsörjning, 1998.No 731 Åse Jansson: Miljöhänsyn - en del i företags styrning, 1998.No 733 Thomas Padron-McCarthy: Performance-Polymorphic Declarative Queries, 1998.No 734 Anders Bäckström: Värdeskapande kreditgivning - Kreditriskhantering ur ett agentteoretiskt perspektiv,

1998.FiF-a 21 Ulf Seigerroth: Integration av förändringsmetoder - en modell för välgrundad metodintegration, 1999.FiF-a 22 Fredrik Öberg: Object-Oriented Frameworks - A New Strategy for Case Tool Development, 1998.No 737 Jonas Mellin: Predictable Event Monitoring, 1998.No 738 Joakim Eriksson: Specifying and Managing Rules in an Active Real-Time Database System, 1998.FiF-a 25 Bengt E W Andersson: Samverkande informationssystem mellan aktörer i offentliga åtaganden - En teori om

aktörsarenor i samverkan om utbyte av information, 1998.No 742 Pawel Pietrzak: Static Incorrectness Diagnosis of CLP (FD), 1999.No 748 Tobias Ritzau: Real-Time Reference Counting in RT-Java, 1999.No 751 Anders Ferntoft: Elektronisk affärskommunikation - kontaktkostnader och kontaktprocesser mellan kunder

och leverantörer på producentmarknader,1999.No 752 Jo Skåmedal: Arbete på distans och arbetsformens påverkan på resor och resmönster, 1999.No 753 Johan Alvehus: Mötets metaforer. En studie av berättelser om möten, 1999.No 754 Magnus Lindahl: Bankens villkor i låneavtal vid kreditgivning till högt belånade företagsförvärv: En studie

ur ett agentteoretiskt perspektiv, 2000.No 766 Martin V. Howard: Designing dynamic visualizations of temporal data, 1999.No 769 Jesper Andersson: Towards Reactive Software Architectures, 1999.No 775 Anders Henriksson: Unique kernel diagnosis, 1999.FiF-a 30 Pär J. Ågerfalk: Pragmatization of Information Systems - A Theoretical and Methodological Outline, 1999. No 787 Charlotte Björkegren: Learning for the next project - Bearers and barriers in knowledge transfer within an

organisation, 1999.No 788 Håkan Nilsson: Informationsteknik som drivkraft i granskningsprocessen - En studie av fyra revisionsbyråer,

2000.No 790 Erik Berglund: Use-Oriented Documentation in Software Development, 1999.No 791 Klas Gäre: Verksamhetsförändringar i samband med IS-införande, 1999.No 800 Anders Subotic: Software Quality Inspection, 1999.No 807 Svein Bergum: Managerial communication in telework, 2000.

No 809 Flavius Gruian: Energy-Aware Design of Digital Systems, 2000.FiF-a 32 Karin Hedström: Kunskapsanvändning och kunskapsutveckling hos verksamhetskonsulter - Erfarenheter

från ett FOU-samarbete, 2000.No 808 Linda Askenäs: Affärssystemet - En studie om teknikens aktiva och passiva roll i en organisation, 2000.No 820 Jean Paul Meynard: Control of industrial robots through high-level task programming, 2000.No 823 Lars Hult: Publika Gränsytor - ett designexempel, 2000.No 832 Paul Pop: Scheduling and Communication Synthesis for Distributed Real-Time Systems, 2000.FiF-a 34 Göran Hultgren: Nätverksinriktad Förändringsanalys - perspektiv och metoder som stöd för förståelse och

utveckling av affärsrelationer och informationssystem, 2000.No 842 Magnus Kald: The role of management control systems in strategic business units, 2000.No 844 Mikael Cäker: Vad kostar kunden? Modeller för intern redovisning, 2000.FiF-a 37 Ewa Braf: Organisationers kunskapsverksamheter - en kritisk studie av ”knowledge management”, 2000.FiF-a 40 Henrik Lindberg: Webbaserade affärsprocesser - Möjligheter och begränsningar, 2000.FiF-a 41 Benneth Christiansson: Att komponentbasera informationssystem - Vad säger teori och praktik?, 2000.No. 854 Ola Pettersson: Deliberation in a Mobile Robot, 2000.No 863 Dan Lawesson: Towards Behavioral Model Fault Isolation for Object Oriented Control Systems, 2000.No 881 Johan Moe: Execution Tracing of Large Distributed Systems, 2001.No 882 Yuxiao Zhao: XML-based Frameworks for Internet Commerce and an Implementation of B2B

e-procurement, 2001.No 890 Annika Flycht-Eriksson: Domain Knowledge Management inInformation-providing Dialogue systems,

2001.FiF-a 47 Per-Arne Segerkvist: Webbaserade imaginära organisationers samverkansformer: Informationssystemarki-

tektur och aktörssamverkan som förutsättningar för affärsprocesser, 2001.No 894 Stefan Svarén: Styrning av investeringar i divisionaliserade företag - Ett koncernperspektiv, 2001.No 906 Lin Han: Secure and Scalable E-Service Software Delivery, 2001.No 917 Emma Hansson: Optionsprogram för anställda - en studie av svenska börsföretag, 2001.No 916 Susanne Odar: IT som stöd för strategiska beslut, en studie av datorimplementerade modeller av verksamhet

som stöd för beslut om anskaffning av JAS 1982, 2002.FiF-a-49 Stefan Holgersson: IT-system och filtrering av verksamhetskunskap - kvalitetsproblem vid analyser och be-

slutsfattande som bygger på uppgifter hämtade från polisens IT-system, 2001.FiF-a-51 Per Oscarsson:Informationssäkerhet i verksamheter - begrepp och modeller som stöd för förståelse av infor-

mationssäkerhet och dess hantering, 2001.No 919 Luis Alejandro Cortes: A Petri Net Based Modeling and Verification Technique for Real-Time Embedded

Systems, 2001.No 915 Niklas Sandell: Redovisning i skuggan av en bankkris - Värdering av fastigheter. 2001.No 931 Fredrik Elg: Ett dynamiskt perspektiv på individuella skillnader av heuristisk kompetens, intelligens, mentala

modeller, mål och konfidens i kontroll av mikrovärlden Moro, 2002.No 933 Peter Aronsson: Automatic Parallelization of Simulation Code from Equation Based Simulation Languages,

2002.No 938 Bourhane Kadmiry: Fuzzy Control of Unmanned Helicopter, 2002.No 942 Patrik Haslum: Prediction as a Knowledge Representation Problem: A Case Study in Model Design, 2002.No 956 Robert Sevenius: On the instruments of governance - A law & economics study of capital instruments in li-

mited liability companies, 2002.FiF-a 58 Johan Petersson: Lokala elektroniska marknadsplatser - informationssystem för platsbundna affärer, 2002.No 964 Peter Bunus: Debugging and Structural Analysis of Declarative Equation-Based Languages, 2002.No 973 Gert Jervan: High-Level Test Generation and Built-In Self-Test Techniques for Digital Systems, 2002.No 958 Fredrika Berglund: Management Control and Strategy - a Case Study of Pharmaceutical Drug Development,

2002.FiF-a 61 Fredrik Karlsson: Meta-Method for Method Configuration - A Rational Unified Process Case, 2002.No 985 Sorin Manolache: Schedulability Analysis of Real-Time Systems with Stochastic Task Execution Times,

2002.No 982 Diana Szentiványi: Performance and Availability Trade-offs in Fault-Tolerant Middleware, 2002.No 989 Iakov Nakhimovski: Modeling and Simulation of Contacting Flexible Bodies in Multibody Systems, 2002.No 990 Levon Saldamli: PDEModelica - Towards a High-Level Language for Modeling with Partial Differential

Equations, 2002.No 991 Almut Herzog: Secure Execution Environment for Java Electronic Services, 2002.No 999 Jon Edvardsson: Contributions to Program- and Specification-based Test Data Generation, 2002No 1000 Anders Arpteg: Adaptive Semi-structured Information Extraction, 2002.No 1001 Andrzej Bednarski: A Dynamic Programming Approach to Optimal Retargetable Code Generation for

Irregular Architectures, 2002.No 988 Mattias Arvola: Good to use! : Use quality of multi-user applications in the home, 2003.FiF-a 62 Lennart Ljung: Utveckling av en projektivitetsmodell - om organisationers förmåga att tillämpa

projektarbetsformen, 2003.No 1003 Pernilla Qvarfordt: User experience of spoken feedback in multimodal interaction, 2003.No 1005 Alexander Siemers: Visualization of Dynamic Multibody Simulation With Special Reference to Contacts,

2003.No 1008 Jens Gustavsson: Towards Unanticipated Runtime Software Evolution, 2003.No 1010 Calin Curescu: Adaptive QoS-aware Resource Allocation for Wireless Networks, 2003.No 1015 Anna Andersson: Management Information Systems in Process-oriented Healthcare Organisations, 2003.No 1018 Björn Johansson: Feedforward Control in Dynamic Situations, 2003.No 1022 Traian Pop: Scheduling and Optimisation of Heterogeneous Time/Event-Triggered Distributed Embedded

Systems, 2003.FiF-a 65 Britt-Marie Johansson: Kundkommunikation på distans - en studie om kommunikationsmediets betydelse i

affärstransaktioner, 2003.

No 1024 Aleksandra Tešanovic: Towards Aspectual Component-Based Real-Time System Development, 2003.No 1034 Arja Vainio-Larsson: Designing for Use in a Future Context - Five Case Studies in Retrospect, 2003.No 1033 Peter Nilsson: Svenska bankers redovisningsval vid reservering för befarade kreditförluster - En studie vid

införandet av nya redovisningsregler, 2003.FiF-a 69 Fredrik Ericsson: Information Technology for Learning and Acquiring of Work Knowledge, 2003.No 1049 Marcus Comstedt: Towards Fine-Grained Binary Composition through Link Time Weaving, 2003.No 1052 Åsa Hedenskog: Increasing the Automation of Radio Network Control, 2003.No 1054 Claudiu Duma: Security and Efficiency Tradeoffs in Multicast Group Key Management, 2003.FiF-a 71 Emma Eliason: Effektanalys av IT-systems handlingsutrymme, 2003.No 1055 Carl Cederberg: Experiments in Indirect Fault Injection with Open Source and Industrial Software, 2003.No 1058 Daniel Karlsson: Towards Formal Verification in a Component-based Reuse Methodology, 2003.FiF-a 73 Anders Hjalmarsson: Att etablera och vidmakthålla förbättringsverksamhet - behovet av koordination och

interaktion vid förändring av systemutvecklingsverksamheter, 2004.No 1079 Pontus Johansson: Design and Development of Recommender Dialogue Systems, 2004.No 1084 Charlotte Stoltz: Calling for Call Centres - A Study of Call Centre Locations in a Swedish Rural Region,

2004.FiF-a 74 Björn Johansson: Deciding on Using Application Service Provision in SMEs, 2004.No 1094 Genevieve Gorrell: Language Modelling and Error Handling in Spoken Dialogue Systems, 2004.No 1095 Ulf Johansson: Rule Extraction - the Key to Accurate and Comprehensible Data Mining Models, 2004.No 1099 Sonia Sangari: Computational Models of Some Communicative Head Movements, 2004.No 1110 Hans Nässla: Intra-Family Information Flow and Prospects for Communication Systems, 2004.No 1116 Henrik Sällberg: On the value of customer loyalty programs - A study of point programs and switching costs,

2004.FiF-a 77 Ulf Larsson: Designarbete i dialog - karaktärisering av interaktionen mellan användare och utvecklare i en

systemutvecklingsprocess, 2004.No 1126 Andreas Borg: Contribution to Management and Validation of Non-Functional Requirements, 2004.No 1127 Per-Ola Kristensson: Large Vocabulary Shorthand Writing on Stylus Keyboard, 2004.No 1132 Pär-Anders Albinsson: Interacting with Command and Control Systems: Tools for Operators and Designers,

2004.No 1130 Ioan Chisalita: Safety-Oriented Communication in Mobile Networks for Vehicles, 2004.No 1138 Thomas Gustafsson: Maintaining Data Consistency im Embedded Databases for Vehicular Systems, 2004.No 1149 Vaida Jakoniené: A Study in Integrating Multiple Biological Data Sources, 2005.No 1156 Abdil Rashid Mohamed: High-Level Techniques for Built-In Self-Test Resources Optimization, 2005.No 1162 Adrian Pop: Contributions to Meta-Modeling Tools and Methods, 2005.No 1165 Fidel Vascós Palacios: On the information exchange between physicians and social insurance officers in the

sick leave process: an Activity Theoretical perspective, 2005.FiF-a 84 Jenny Lagsten: Verksamhetsutvecklande utvärdering i informationssystemprojekt, 2005.No 1166 Emma Larsdotter Nilsson: Modeling, Simulation, and Visualization of Metabolic Pathways Using Modelica,

2005.No 1167 Christina Keller: Virtual Learning Environments in higher education. A study of students’ acceptance of edu-

cational technology, 2005.No 1168 Cécile Åberg: Integration of organizational workflows and the Semantic Web, 2005.FiF-a 85 Anders Forsman: Standardisering som grund för informationssamverkan och IT-tjänster - En fallstudie

baserad på trafikinformationstjänsten RDS-TMC, 2005.No 1171 Yu-Hsing Huang: A systemic traffic accident model, 2005.FiF-a 86 Jan Olausson: Att modellera uppdrag - grunder för förståelse av processinriktade informationssystem i trans-

aktionsintensiva verksamheter, 2005.No 1172 Petter Ahlström: Affärsstrategier för seniorbostadsmarknaden, 2005.No 1183 Mathias Cöster: Beyond IT and Productivity - How Digitization Transformed the Graphic Industry, 2005.No 1184 Åsa Horzella: Beyond IT and Productivity - Effects of Digitized Information Flows in Grocery Distribution,

2005.No 1185 Maria Kollberg: Beyond IT and Productivity - Effects of Digitized Information Flows in the Logging

Industry, 2005.No 1190 David Dinka: Role and Identity - Experience of technology in professional settings, 2005.No 1191 Andreas Hansson: Increasing the Storage Capacity of Recursive Auto-associative Memory by Segmenting

Data, 2005.No 1192 Nicklas Bergfeldt: Towards Detached Communication for Robot Cooperation, 2005.No 1194 Dennis Maciuszek: Towards Dependable Virtual Companions for Later Life, 2005.No 1204 Beatrice Alenljung: Decision-making in the Requirements Engineering Process: A Human-centered

Approach, 2005No 1206 Anders Larsson: System-on-Chip Test Scheduling and Test Infrastructure Design, 2005.No 1207 John Wilander: Policy and Implementation Assurance for Software Security, 2005.No 1209 Andreas Käll: Översättningar av en managementmodell - En studie av införandet av Balanced Scorecard i ett

landsting, 2005.No 1225 He Tan: Aligning and Merging Biomedical Ontologies, 2006.No 1228 Artur Wilk: Descriptive Types for XML Query Language Xcerpt, 2006.No 1229 Per Olof Pettersson: Sampling-based Path Planning for an Autonomous Helicopter, 2006.No 1231 Kalle Burbeck: Adaptive Real-time Anomaly Detection for Safeguarding Critical Networks, 2006.No 1233 Daniela Mihailescu: Implementation Methodology in Action: A Study of an Enterprise Systems Implemen-

tation Methodology, 2006.No 1244 Jörgen Skågeby: Public and Non-public gifting on the Internet, 2006.No 1248 Karolina Eliasson: The Use of Case-Based Reasoning in a Human-Robot Dialog System, 2006.No 1263 Misook Park-Westman: Managing Competence Development Programs in a Cross-Cultural Organisation-

What are the Barriers and Enablers, 2006. FiF-a 90 Amra Halilovic: Ett praktikperspektiv på hantering av mjukvarukomponenter, 2006.No 1272 Raquel Flodström: A Framework for the Strategic Management of Information Technology, 2006.

No 1277 Viacheslav Izosimov: Scheduling and Optimization of Fault-Tolerant Embedded Systems, 2006.No 1283 Håkan Hasewinkel: A Blueprint for Using Commercial Games off the Shelf in Defence Training, Education

and Research Simulations, 2006.FiF-a 91 Hanna Broberg: Verksamhetsanpassade IT-stöd - Designteori och metod, 2006.No 1286 Robert Kaminski: Towards an XML Document Restructuring Framework, 2006No 1293 Jiri Trnka: Prerequisites for data sharing in emergency management, 2007.No 1302 Björn Hägglund: A Framework for Designing Constraint Stores, 2007.No 1303 Daniel Andreasson: Slack-Time Aware Dynamic Routing Schemes for On-Chip Networks, 2007.No 1305 Magnus Ingmarsson: Modelling User Tasks and Intentions for Service Discovery in Ubiquitous Computing,

2007.No 1306 Gustaf Svedjemo: Ontology as Conceptual Schema when Modelling Historical Maps for Database Storage,

2007.No 1307 Gianpaolo Conte: Navigation Functionalities for an Autonomous UAV Helicopter, 2007.No 1309 Ola Leifler: User-Centric Critiquing in Command and Control: The DKExpert and ComPlan Approaches,

2007.No 1312 Henrik Svensson: Embodied simulation as off-line representation, 2007.No 1313 Zhiyuan He: System-on-Chip Test Scheduling with Defect-Probability and Temperature Considerations,

2007.No 1317 Jonas Elmqvist: Components, Safety Interfaces and Compositional Analysis, 2007.No 1320 Håkan Sundblad: Question Classification in Question Answering Systems, 2007.No 1323 Magnus Lundqvist: Information Demand and Use: Improving Information Flow within Small-scale Business

Contexts, 2007.No 1329 Martin Magnusson: Deductive Planning and Composite Actions in Temporal Action Logic, 2007.No 1331 Mikael Asplund: Restoring Consistency after Network Partitions, 2007.No 1332 Martin Fransson: Towards Individualized Drug Dosage - General Methods and Case Studies, 2007.No 1333 Karin Camara: A Visual Query Language Served by a Multi-sensor Environment, 2007.No 1337 David Broman: Safety, Security, and Semantic Aspects of Equation-Based Object-Oriented Languages and

Environments, 2007.No 1339 Mikhail Chalabine: Invasive Interactive Parallelization, 2007.No 1351 Susanna Nilsson: A Holistic Approach to Usability Evaluations of Mixed Reality Systems, 2008.No 1353 Shanai Ardi: A Model and Implementation of a Security Plug-in for the Software Life Cycle, 2008.No 1356 Erik Kuiper: Mobility and Routing in a Delay-tolerant Network of Unmanned Aerial Vehicles, 2008.No 1359 Jana Rambusch: Situated Play, 2008.No 1361 Martin Karresand: Completing the Picture - Fragments and Back Again, 2008.No 1363 Per Nyblom: Dynamic Abstraction for Interleaved Task Planning and Execution, 2008.No 1371 Fredrik Lantz:Terrain Object Recognition and Context Fusion for Decision Support, 2008.No 1373 Martin Östlund: Assistance Plus: 3D-mediated Advice-giving on Pharmaceutical Products, 2008.No 1381 Håkan Lundvall: Automatic Parallelization using Pipelining for Equation-Based Simulation Languages,

2008.