
Challenges and Applications of Assembly-Level Software Model Checking

Dissertation for obtaining the degree of

Doctor of Natural Sciences at the Universität Dortmund, Fachbereich Informatik (Department of Computer Science)

by

Tilman Mehler

Dortmund

2005

Date of the oral examination:

Dean: Prof. Bernhard Steffen

Reviewers: Dr. Stefan Edelkamp, Prof. Katharina Morik.

Grade: excellent / very good / good / sufficient


For my parents.


Acknowledgements

Hereby, I would like to thank my advisor Dr. Stefan Edelkamp for his excellent guidance. Without his various original ideas, his broad theoretical and practical knowledge, his reviews and moral support, this thesis would not exist. Moreover, I thank Prof. Dr. Katharina Morik and Dr. Doris Schmedding, who agreed to review my thesis. Also, I want to thank Shahid Jabbar for his support and his reviews. Further acknowledgements must go to the German Research Foundation (DFG), which financially supported my research through the project Directed Model Checking with Exploration Algorithms of Artificial Intelligence. Last, but not least, I would like to thank my girlfriend Frida as well as my friends and family for their moral support.


Summary

This thesis addresses the application of a formal method called Model Checking to the domain of software verification. Here, exploration algorithms are used to search for errors in a program. In contrast to the majority of other approaches, we claim that the search should be applied to the actual source code of the program, rather than to some formal model.

There are several challenges that need to be overcome to build such a model checker. First, the tool must be capable of handling the full semantics of the underlying programming language. This implies a considerable amount of additional work unless the interpretation of the program is done by some existing infrastructure. The second challenge lies in the increased memory requirements needed to memorize entire program configurations. This additionally aggravates the problem of large state spaces that every model checker faces anyway. As a remedy to the first problem, the thesis proposes to use an existing virtual machine to interpret the program. This takes the burden off the developer, who can fully concentrate on the model checking algorithms. To address the problem of large program states, we call attention to the fact that most transitions in a program only change small fractions of the entire program state. Based on this observation, we devise an incremental storing of states which considerably lowers the memory requirements of program exploration. To further alleviate the per-state memory requirement, we apply state reconstruction, where states are no longer memorized explicitly but through their generating path. Another problem that results from the large state description of a program lies in the computational effort of hashing, which is exceptionally high for the used approach. Based on the same observation as used for the incremental storing of states, we devise an incremental hash function which only needs to process the changed parts of the program's state. Due to the dynamic nature of computer programs, this is not a trivial task and constitutes a considerable part of the overall thesis.
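To illustrate the basic idea behind such an incremental hash function (a minimal sketch of our own, not the scheme developed in Chapter 6), assume the state is a vector of components and the hash is a sum of per-component contributions modulo a prime; a transition that changes a single component then only has to exchange that component's contribution instead of rehashing the whole state. The base b and modulus q below are arbitrary choices.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch of incremental hashing (not StEAM's actual scheme):
// the hash of a state vector v is (sum_i v[i] * b^i) mod q.  When a single
// component changes, the hash is updated in O(1) instead of O(|v|).

static const uint64_t q = 1000000007ULL;  // prime modulus (assumption)
static const uint64_t b = 31;             // base (assumption)

static uint64_t pow_mod(uint64_t base, uint64_t exp) {  // base^exp mod q
    uint64_t r = 1;
    base %= q;
    for (; exp; exp >>= 1) {
        if (exp & 1) r = r * base % q;
        base = base * base % q;
    }
    return r;
}

// Full hash: O(n), computed once for the initial state.
uint64_t full_hash(const std::vector<uint64_t>& v) {
    uint64_t h = 0;
    for (size_t i = 0; i < v.size(); ++i)
        h = (h + (v[i] % q) * pow_mod(b, i)) % q;
    return h;
}

// Incremental update: component i changed from old_val to new_val.
// Only the contribution of that single component is replaced.
uint64_t update_hash(uint64_t h, size_t i, uint64_t old_val, uint64_t new_val) {
    uint64_t bi = pow_mod(b, i);
    h = (h + q - (old_val % q) * bi % q) % q;  // subtract old contribution
    h = (h + (new_val % q) * bi % q) % q;      // add new contribution
    return h;
}
```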

Moreover, the thesis addresses a more general problem of model checking - the state explosion, which says that the number of reachable states grows exponentially in the number of state components. To minimize the number of states to be memorized, the thesis concentrates on the use of heuristic search. It turns out that only a fraction of all reachable states needs to be visited to find a specific error in the program. Heuristics can greatly help to direct the search towards the error state. As another effective way to reduce the number of memorized states, the thesis proposes a technique that skips intermediate states that do not affect shared resources of the program. By merging several consecutive state transitions into a single transition, the technique may considerably truncate the search tree.


The proposed approach is realized in StEAM, a model checker for concurrent C++ programs, which was developed in the course of the thesis. Building on an existing virtual machine, the tool provides a set of blind and directed search algorithms for the detection of errors in the actual C++ implementation of a program. StEAM implements all of the aforesaid techniques, whose effectiveness is experimentally evaluated at the end of the thesis.

Moreover, we exploit the relation between model checking and planning. The claim is that the two fields of research have great similarities and that technical advances in one field can easily carry over to the other. The claim is supported by a case study where StEAM is used as a planner for concurrent multi-agent systems.

The thesis also contains a user manual for StEAM and technical details that facilitate understanding the engineering process of the tool.


Contents

1 Introduction
  1.1 Motivation
    1.1.1 Deadly Software Errors
    1.1.2 Expensive Software Errors
    1.1.3 Software Engineering Process
    1.1.4 Call for Formal Methods
  1.2 History and Goals
  1.3 Structure

2 Model Checking
  2.1 Classical Model Checking
    2.1.1 Graph-Based Formalisms
    2.1.2 Models, Systems, Programs, Protocols
    2.1.3 Definition of Model Checking
    2.1.4 Example
    2.1.5 Temporal Logics
    2.1.6 Types of Properties
  2.2 Towards a Fault-Free Protocol
    2.2.1 Need for Model Checking
  2.3 Classical Model Checkers
    2.3.1 SPIN
    2.3.2 SMV
  2.4 Model Checking Software
    2.4.1 The Classical Approach
    2.4.2 The Modern Approach
    2.4.3 Some Software Model Checkers

3 State Space Search
  3.1 General State Expansion Algorithm
  3.2 Undirected Search
    3.2.1 Breadth-First Search
    3.2.2 Depth-First Search
    3.2.3 Depth-First Iterative Deepening
  3.3 Heuristic Search
    3.3.1 Best-First Search
    3.3.2 A*
    3.3.3 IDA*

4 StEAM
  4.1 The Model Checker StEAM
  4.2 Architecture of the Internet C Virtual Machine
  4.3 Enhancements to IVM
    4.3.1 States in StEAM
    4.3.2 The State Description
    4.3.3 Command Patterns
    4.3.4 Multi-Threading
    4.3.5 Exploration
  4.4 Summary of StEAM Features
    4.4.1 Expressiveness
    4.4.2 Multi-Threading
    4.4.3 Special-Purpose Statements of StEAM
    4.4.4 Applicability
    4.4.5 Detecting Deadlocks
    4.4.6 Detecting Illegal Memory Accesses
  4.5 Heuristics
    4.5.1 Error-Specific Heuristics
    4.5.2 Structural Heuristics

5 Planning vs. Model Checking
  5.1 Similarities and Differences
  5.2 Related Work
  5.3 Multi-Agent Systems
  5.4 Concurrent Multi-Agent Systems in StEAM
    5.4.1 Interleaving Planning and Simulation
    5.4.2 Combined Algorithm
  5.5 Case Study: Multi-Agent Manufacturing Problem
    5.5.1 Example Instance
    5.5.2 Implementation
  5.6 Search Enhancements
    5.6.1 State Space Pruning
    5.6.2 Evaluation Functions

6 Hashing
  6.1 Hash Functions
  6.2 Explicit Hashing
    6.2.1 Bit-State Hashing
    6.2.2 Sequential Hashing
    6.2.3 Hash Compaction
  6.3 Partial Search
  6.4 Incremental Hashing
  6.5 Static Framework
    6.5.1 Rabin and Karp Hashing
    6.5.2 Recursive Hashing
    6.5.3 Static Incremental Hashing
    6.5.4 Abstraction
  6.6 Examples for Static Incremental Hashing
    6.6.1 Incremental Hashing in the (n² − 1)-Puzzle
    6.6.2 Incremental Hashing in Atomix
    6.6.3 Incremental Hashing in Propositional Planning
  6.7 Hashing Dynamic State Vectors
    6.7.1 Stacks and Queues
    6.7.2 Component Insertion and Removal
    6.7.3 Combined Operations
  6.8 Hashing Structured State Vectors
    6.8.1 The Linear Method
    6.8.2 Balanced Tree Method
  6.9 Incremental Hashing in StEAM
  6.10 Related Work
    6.10.1 Zobrist Keys
    6.10.2 Incremental Heap Canonicalization

7 State Reconstruction
  7.1 Memory Limited Search
  7.2 State Reconstruction in StEAM
  7.3 Key States
  7.4 Summary

8 Experiments
  8.1 Example Programs
    8.1.1 Choosing the Right Examples
    8.1.2 Scalability
    8.1.3 Dining Philosophers
    8.1.4 Optical Telegraph
    8.1.5 Leader Election
    8.1.6 CashIT
  8.2 Experiments on Heuristic Search
  8.3 Experiments on State Reconstruction
  8.4 Experiments on Hashing
  8.5 Experiments on Multi-Agent Planning

9 Conclusion
  9.1 Contribution
    9.1.1 Challenges and Remedies
    9.1.2 Application
    9.1.3 Theoretical Contributions
  9.2 Future Work
    9.2.1 Heuristics
    9.2.2 Search Algorithms
  9.3 Hash Compaction and Bitstate-Hashing
  9.4 State Memorization
    9.4.1 Graphical User Interface
  9.5 Final Words

A Technical Information for Developers
  A.1 IVM Source Files
  A.2 Additional Source Files
  A.3 Further Notes

B StEAM User Manual
  B.1 Introduction
  B.2 Installation
    B.2.1 Installing the Compiler
    B.2.2 Installation of StEAM
  B.3 A First Example
  B.4 Options
    B.4.1 Algorithms
    B.4.2 Heuristics
    B.4.3 Memory Reduction
    B.4.4 Hashing
    B.4.5 Debug Options
    B.4.6 Illegal Memory Accesses
    B.4.7 Other Options
  B.5 Verifying Your Own Programs
    B.5.1 Motivation
    B.5.2 The Snack Vendor
    B.5.3 Finding, Interpreting and Fixing Errors
  B.6 Special-Purpose Statements
  B.7 Status Quo and Future of StEAM

C Paper Contributions


Chapter 1

Introduction

1.1 Motivation

The increasing role of software in safety-critical areas imposes new challenges on computer science. Software systems often contain subtle errors which remain undetected by manual code inspection or pure testing. Errors that eventually arise after deployment may cause heavy financial damage or even the loss of human lives. We name a few examples.

1.1.1 Deadly Software Errors

The Lufthansa Warsaw Accident In 1993, the Lufthansa flight DLH 2904 from Frankfurt to Warsaw ended in a disaster when the aircraft tried to land at its destination airport. At that time, the software which controls the thrust reversers would not allow them to activate unless the aircraft's gears were down and the aircraft's weight was pressing on the wheels. As an unfortunate coincidence, heavy wind lifted up the aircraft, resulting in a lack of compression on the left gear leg. As a result, the thrust reversers were started with a delay of 9 seconds, which caused the aircraft to leave the runway and crash into the surrounding field. One crew member and one of the passengers died.

Therac-25 Accident The Therac-25 was a medical linear accelerator that could be used to destroy tumors while inflicting relatively little damage to the surrounding tissue. A complex bug in the control software caused six cases of heavy radiation overdoses between 1985 and 1987, causing death and severe injuries to patients.

1.1.2 Expensive Software Errors

Ariane 5 [1]

"On June 4, 1996 an unmanned Ariane 5 rocket launched by the European Space Agency (ESA) exploded just forty seconds after its lift-off from Kourou, French Guyana at an altitude of 3,700 meters. The rocket was on its first voyage, after a decade of development costing $7 billion. The destroyed rocket and its cargo were valued at $500 million ... It turned out that the cause of the failure was a software error in the inertial reference system. Specifically, a 64 bit floating point number relating to the horizontal velocity of the rocket with respect to the platform was converted to a 16 bit signed integer. The number was larger than 32,768, the largest integer storable in a 16 bit signed integer, and thus the conversion failed. ..."

[1] From Scientific Digital Visions, Inc - http://www.dataenabled.com/

Mars Climate Orbiter [2]

"The Mars Climate Orbiter is part of a series of missions in a long-term program of Mars exploration. ... the spacecraft was lost on September 23, 1999 while entering the orbit around Mars. After an internal peer review at JPL [Jet Propulsion Laboratory], several problems were discovered. The review indicated a failure to recognize and correct an error in a transfer of data between the Mars Climate Orbiter spacecraft team in Colorado and the mission navigation team in California. In particular, there was a failed translation of English units into metric units in a segment of ground-based, navigation-related mission software. As a result, the spacecraft was lost. In order to prevent future problems, all data and engineering processes related to the Mars Polar Lander were scrutinized."

Note: $150M lost. Another popular example is the software bug of the Mars Pathfinder, explained in Section 2.4.

[2] From Scientific Digital Visions, Inc - http://www.dataenabled.com/

1.1.3 Software Engineering Process

Many industrial software projects obey the so-called waterfall model, which distinguishes six phases: requirements analysis, design, implementation, testing, integration and maintenance. These are illustrated in Figure 1.1.

The model suggests an iterative process of software development, which allows the project to switch between neighbouring phases. For instance, the software engineers may fall back from the testing phase to the implementation phase if testing exposes an error, and they may further fall back to the design phase if during source investigation it turns out that the cause is a general flaw in the system's design rather than a simple implementation error. An obvious observation is that correcting an error gets more expensive if it occurs at a later phase of the development process. In the design phase of the project, the required functionalities are usually specified with specialized modeling languages, such as Z. These languages provide a standardized notation for the behavior of program functions, such that their correctness can be verified through manual mathematical proofs. Then, the formal specification is implemented in the targeted programming language. Before the software is integrated, the correct functionality is tested in a simulated environment.


Figure 1.1: The waterfall model.

1.1.4 Call for Formal Methods

Although industrial software must undergo rigorous testing phases before being shipped, the software errors mentioned in Sections 1.1.1 and 1.1.2 passed unnoticed, simply because their occurrence was unlikely. A simulation of the system will instead repeatedly visit the more likely configurations, while some less likely ones are never encountered.

As a reaction, over the last years research has spent considerable effort on formal methods, which allow software to be verified in a more systematic way than testing. The method of model checking has proven to be one of the most promising verification techniques so far. It is a highly generic verification scheme which has successfully been applied to other areas such as process engineering and the verification of hardware. Based on a formal model of the investigated system, model checking uses exploration algorithms to systematically enumerate all reachable states. For each state, it is checked whether it fulfills certain properties.

There are several advantages of model checking software compared to testing. First of all, the exploration treats state transitions independently of their probability in real program execution. Thus, model checking can find subtle errors in programs which will most probably not arise during testing. Second, testing can merely provide information about the presence of errors, but not about their cause. In contrast to testing, model checking builds a search tree of visited program states. Upon discovery of an error, the user is provided with the sequence of states (trail) that leads from the initial configuration to the error state. That information can be used to track down the error in the program much faster than given only the actual error state. As another advantage over testing, model checkers allow correctness properties to be formulated in temporal logics such as LTL. That way, the user can precisely specify the expected behavior of the program.

Most current approaches for model checking programs rely on abstract models. These must either be constructed manually by the user or they are (semi-)automatically extracted from the source code of the inspected program. The use of abstract models can considerably reduce the memory requirements for model checking a program, but it also yields some drawbacks. Manual construction of the model is time-consuming and error-prone, and the model may not correctly reflect the actual program. Also, the abstraction may hide exactly those aspects of the program which lead to errors in practice. Furthermore, the trail leading to some error also refers to the abstract model and must be mapped to the original program in order to find the source of the error.

As a recent trend, tools such as the Java model checker JPF [HP00] are capable of exploring the compiled program rather than an abstract model, using a virtual machine. This resolves the drawbacks discussed above: First of all, the construction of a model is no longer required, which saves time. Furthermore, errors can no longer be abstracted away, since the explored states represent the actual program configuration encoded in the processor registers and the computer's main memory. Also, the returned trail directly corresponds to positions in the program's source code and can be interpreted much more easily than a sequence of states in the abstract model.

The single drawback of this approach lies in the high memory requirements of assembly-level program model checking. Due to the dynamic nature of actual programs, their state description can grow to an arbitrary size, and storing the expanded states may quickly exceed the available memory of the underlying system. Thus, research on unabstracted software model checking must mainly focus on reducing memory requirements.

1.2 History and Goals

The goal of this thesis is to give a clear vision of assembly-level software model checking. It introduces an experimental C++ model checker called StEAM (State Exploring Assembly Model Checker), which was developed in the course of the author's research.

StEAM can be used to detect errors in different phases of the software development process (cf. Section 1.1.3). In the design and implementation phase, the engineers may build a prototype of the software which reflects the system specification through a set of rudimentary C++ functions. This prototype can then be used to search for specification errors at the source code level. In later stages of development, more details are added to the prototype, while the system should be checked for errors at regular intervals. Along with the implementation progress, the type of errors that are searched for gradually shifts from specification to implementation errors.

As one advantage, this approach does not impose an additional overhead for constructing a formal model from the system specification in order to check properties in the design phase. Moreover, there is no need for the engineer to map error information from a formal model back to the source code of the program - this is discussed in more detail in Sections 2.4.1 and 2.4.2.

The tool will be used to exemplify the problems and challenges of unabstracted software model checking and possible solutions to them. As a clear focus, we concentrate on approaches that help to reduce the memory requirements. We will discuss methods for compact storing of states - such as collapse compression and bitstate hashing - and how they apply in the context of C++ model checking. As another important issue, we deal with methods that reduce the number of explored states, including the atomic execution of code blocks and the use of heuristics. Furthermore, we discuss time/space tradeoffs like state reconstruction, which accept a higher computational effort in exchange for lower memory requirements.

Each approach is theoretically discussed, implemented and empirically tested. In the end, the sum of the most effective techniques should allow us to apply unabstracted model checking to some simple yet non-trivial programs.

Moreover, we discuss other tools used for model checking of software and point out the differences to our approach.

1.3 Structure

The thesis is structured as follows: In Chapter 2, we give an introduction to model checking and its application to software. Then, Chapter 3 introduces general state space search and heuristics, which are essential for model checking. In Chapter 4, we give an introduction to the new assembly-level model checker StEAM, its capabilities and the supported heuristics. We also comment on how StEAM differs from current state-of-the-art software model checkers. Then, in Chapter 5, we point out the similarities between model checking and action planning. To emphasize the relation between the two fields, we use StEAM as a planner for concurrent multi-agent systems. Chapter 6 is dedicated to finding an efficient hashing scheme for StEAM, which turns out to be one of the core challenges in assembly-level software model checking. In Chapter 7, we discuss the method of state reconstruction as a way to alleviate the high memory requirements faced by tools that model check actual programs. In Chapter 8, we conduct experiments on heuristic search, hashing and state reconstruction. Finally, in Chapter 9, we summarize the contribution of the thesis and discuss future work.

The appendix contains the user manual of StEAM as well as technical details about the tool's implementation.


Chapter 2

Model Checking

2.1 Classical Model Checking

Model checking is a formal verification method for state-based systems which has been successfully applied in various fields, including process engineering, hardware design and protocol verification. In this section, we first clarify some terms that will be used throughout the thesis, some of which are used interchangeably as synonyms. Section 2.1.4 gives an example of an elevator system and checks it against some formal properties. The section finishes by motivating the use of model checking and introducing two of the most important classical model checkers.

2.1.1 Graph-Based Formalisms

Model checking generally refers to search in directed graphs. However, the literature does not adhere to a standard formalism to denote the state space that is explored by a model checker. Some authors refer to Kripke structures [LMS02], others to (deterministic) finite automata [AY98]. In some practically oriented papers, no formalism is given at all and the authors simply refer to states that are expanded [VHBP00b]. In fact, most graph-based formalisms used in the model checking literature are equal in their expressiveness. We will exemplify this by giving a conversion between Kripke structures and deterministic finite automata.

Deterministic Finite Automata (DFA) are quadruples (S, I, A, T), where S is a set of states, I ∈ S is the initial state, A a set of actions, and T : S × A → S is a partial transition function mapping state/action pairs to successor states. In the theory of formal languages, a DFA additionally holds a set E ⊆ S of accepting states. A DFA is said to accept a word w ∈ A* if, starting at I, the corresponding sequence of actions leads to an accepting state. Since in model checking automata are used to model the configuration space of a system rather than to define formal languages, accepting states are usually not mentioned in definitions.
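As a concrete illustration of this definition (a sketch of our own, not taken from the thesis), such a DFA can be written down directly as a data structure in which the partial transition function is a map that is simply undefined for disallowed state/action pairs:

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// A DFA (S, I, A, T) as used here: states S, initial state I, actions A,
// and a partial transition function T : S x A -> S.  Accepting states are
// omitted, since the automaton models a configuration space rather than a
// formal language.
struct DFA {
    std::set<std::string> states;                                     // S
    std::string initial;                                              // I
    std::set<std::string> actions;                                    // A
    std::map<std::pair<std::string, std::string>, std::string> delta; // T (partial)

    // Returns the successor of state s under action a, or nothing if
    // T is undefined for (s, a).
    std::optional<std::string> step(const std::string& s, const std::string& a) const {
        auto it = delta.find({s, a});
        if (it == delta.end()) return std::nullopt;
        return it->second;
    }
};
```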


Kripke Structures (KS) are quintuples (S, I, Σ, T, L), where S is a set of states, I the initial state, T ⊆ S × S is a transition relation, Σ a set of propositions, and L : S → 2^Σ is a labeling function which assigns proposition sets to states.

KS ⇒ DFA Given a KS (S, I, Σ, T, L), we can construct an equivalent DFA (S′, I′, A, T′) as follows: Let S′ = 2^Σ, I′ = L(I) and A = {a_1, .., a_n}, where

n = max { m | ∃ s, s_1, .., s_m : (s, s_i) ∈ T, 1 ≤ i ≤ m, s_i ≠ s_j ⇒ i ≠ j },

i.e., n is the maximal degree of outgoing transitions of a state in S. Furthermore, we define T′ = ⋃_{(s,s′)∈T} {(s, a, s′)}, such that for (s, a, s′), (s′′, a′, s′′′) ∈ T′, s ≠ s′′, s′ ≠ s′′′ ⇒ a ≠ a′.

The constructed DFA has a unique state for each set of labels which occurs in the KS.

DFA ⇒ KS Analogously, given a DFA (S, I, A, T), an equivalent KS (S′, I′, Σ, T′, L′) can be constructed. Let Σ = A and I′ = I ∈ S′. We regard a transition (s, a, s′) as a directed edge from s to s′ labeled with a. For each state s ∈ S and each incoming edge with label a, we define a state s_a ∈ S′. Given S′, we define T′ = {(s_x, s′_y) | ∃ (s, x, t), (s, y, s′) ∈ T}.

Now, given a sequence of actions w, the DFA will end up in state s if and only if there is a path I′, s_1, .., s_n in the KS such that (L(s_1), .., L(s_n)) = w.

In the rest of the thesis, we may vary the formalism through which a state space is described, depending on what fits best in the current context.

2.1.2 Models, Systems, Programs, Protocols

Model checking is generally described as investigating the formal model of a system. This definition is somewhat misleading for new approaches like unabstracted software model checking (cf. Section 2.4), where the actual software is checked instead of a model.

Moreover, protocols control the behavior of a system. They restrict the state transitions that are allowed in the system (or in the corresponding formal model).

In practice, protocols are implemented as programs in a specific programming language. Just like protocols, these implementations can be model checked to detect errors.

In the rest of the thesis, the terms will be used as follows:

• Here, everything that can undergo a set of configurations is called a system. This includes physical systems like an elevator as well as logical systems such as software.

• A protocol or program is a declarative description of legal transitions in a state space. For software, the states are implicitly defined by the values of the variables and the current position within the program. In other areas, like hardware verification, the state space is obtained by building a formal model of the underlying system.


2.1.3 Definition of Model Checking

Given a (state-based) formal model M and a property φ, model checking uses exploration algorithms to explicitly enumerate each state of M. For each visited state s, it is checked whether φ holds for s. A state s violating φ is called an error state. If during the exploration an error state is encountered, the model checker returns the sequence of states, the error trail, which leads from the initial state to the error state. If the model M contains no error states, we say that M fulfills (models) φ.
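The following sketch (a generic formulation of our own, not the algorithm of any particular tool) phrases this definition as an explicit-state search: reachable states are enumerated, φ is checked on each one, and the error trail is reconstructed from stored predecessor links. The placeholder type State is assumed to be copyable and comparable with < and ==.

```cpp
#include <map>
#include <queue>
#include <vector>

// Generic explicit-state exploration sketch.  successors(s) enumerates the
// transitions of the model, holds(s) checks the property phi on a single
// state.  If an error state is found, the trail from the initial state to it
// is reconstructed from predecessor links; otherwise an empty trail returns.
template <typename State, typename Successors, typename Holds>
std::vector<State> explore(const State& initial, Successors successors, Holds holds) {
    std::map<State, State> predecessor;   // doubles as the set of visited states
    std::queue<State> open;
    predecessor.emplace(initial, initial);
    open.push(initial);

    while (!open.empty()) {
        State s = open.front();
        open.pop();
        if (!holds(s)) {                  // s violates phi: build the error trail
            std::vector<State> trail{s};
            while (!(trail.back() == initial))
                trail.push_back(predecessor.at(trail.back()));
            return std::vector<State>(trail.rbegin(), trail.rend());
        }
        for (const State& t : successors(s)) {
            if (predecessor.count(t) == 0) {
                predecessor.emplace(t, s);
                open.push(t);
            }
        }
    }
    return {};                            // every reachable state fulfills phi
}
```

With the FIFO queue shown here the exploration is breadth-first, so the returned trail is also a shortest one; a directed variant would replace the queue by a priority queue ordered by a heuristic estimate, as discussed in Chapter 3.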

Concurrency Many model checkers aim at the detection of errors in systems involving several concurrent processes. The false interleaving of process executions is a source of numerous and hard-to-find errors. Consider n processes, each yielding a set S_i of states. Then, the state space of the investigated system is formed by the cross product S_1 × .. × S_n of all processes' state spaces. Note that, depending on the protocol or program which restricts the allowed state transitions, the cross product forms a superset of all reachable states.
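In the worst case the component sizes multiply: |S_1 × .. × S_n| = |S_1| · .. · |S_n|, so n processes with k local states each can induce up to k^n global states - for example, ten processes with ten local states already yield 10^10 combinations.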

Explicit vs Symbolic Model Checking Model checking approaches form two classes through the way they represent their explored states: In explicit model checking, each explored state is explicitly present in memory, while symbolic model checking uses declarative descriptions, e.g., logic formulae, to represent sets of states. A symbolic model checker can, for example, use binary decision diagrams [McM92] for a compact representation of the logic formula that describes the set of explored states. Thus, it is possible to enumerate significantly larger state sets than with an explicit state representation [BCM+90]. Alternatives are bounded model checkers as presented in [BCCZ99]. As a clear disadvantage, symbolic model checking does not explicitly build a tree of visited states. Among other problems, this forbids the use of heuristics which rely on the structure of the search tree. Despite the potential of symbolic state representations, the work at hand will thus focus on explicit model checking, as the new software model checker presented here heavily relies on the use of heuristics to be evaluated on individual states.

2.1.4 Example

We use a simple example to illustrate the upcoming formal descriptions. Assume that we want to design control software for an elevator. The model involves three types of components (or processes). The first is the elevator, which can either be moving or at one of two floors. The second component is the elevator's door, which can either be open or closed. The third component type is a button, of which the model has two instances (one for each floor). A button can either be idle or pressed. Figure 2.1 illustrates the three component types as deterministic finite automata.
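To make the global state of this model concrete (a sketch of our own; the thesis describes the components as automata, not as code), it can be written as a small C++ structure with one field per component, together with a predicate for the safety requirement discussed below (the door must be closed whenever the elevator moves):

```cpp
#include <iostream>

// One field per component of the elevator model: the elevator itself,
// the door, and one button per floor.
enum class Elevator { At0, At1, Moving };
enum class Door     { Open, Closed };
enum class Button   { Idle, Pressed };

struct ElevatorState {
    Elevator elevator;
    Door     door;
    Button   button0;   // button at floor 0
    Button   button1;   // button at floor 1
};

// The safety property "the door is closed whenever the elevator moves",
// evaluated on a single global state.
bool safe(const ElevatorState& s) {
    return s.elevator != Elevator::Moving || s.door == Door::Closed;
}

int main() {
    // The initial state (at0, closed, idle, idle) from the example.
    ElevatorState init{Elevator::At0, Door::Closed, Button::Idle, Button::Idle};
    std::cout << std::boolalpha << safe(init) << '\n';   // prints "true"
}
```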

The protocol which controls the elevator's behavior should fulfill several properties that guarantee the safety and fair treatment of the passengers. These include, for instance, that the door is always closed when the elevator is moving and that each request for the elevator is eventually served.


Figure 2.1: Three components of the elevator example.

Judging from Figure 2.1, the example may seem overly simplistic. We may start with a trivial protocol that does not restrict the transitions of the system. Figure 2.2 shows the cross product of the automata - note the two instances of the button automaton. For clarity, we introduced an additional edge labeled with the action start. The edge has no predecessor and leads to the model's initial state (at0, closed, idle, idle). The size of the product automaton illustrates what is known as the state explosion problem. This combinatorial problem refers to the fact that a system's state space grows exponentially in the number of its components.
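Concretely, the product automaton already has 3 · 2 · 2 · 2 = 24 states - three positions of the elevator, two states of the door, and two states for each of the two buttons - even though every single component is tiny.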

2.1.5 Temporal Logics

Expressions in temporal logics allow logical formulae defined over the propositions of a labeled transition system S to be quantified over the paths of S. Temporal properties can often be integrated via a cross product of the automaton for the model and the automaton for the negated property. Examples of temporal logics are CTL (Computation Tree Logic) [CES86] and LTL (Linear Temporal Logic) [Pun77, BK00, ELLL01]. Temporal logics allow the basic logical operators ∨, ∧, ¬ to describe state properties through basic terms over atomic propositions. By applying path quantifiers to basic terms, we arrive at temporal formulae, like:

AG(p ⇒ F¬p)

The above formula (always globally p implies future not p) asserts that along each path of the search tree, if p holds at some state, then there must be a future state at which p does not hold. We omit a detailed description of temporal logics, as they are currently not used in the new software model checker presented in Chapter 4, and listing their formal syntax and semantics would be beyond the thesis' scope. As temporal logics are an important aspect of classical model checking, the term should be mentioned nonetheless. Moreover, the remaining part of the elevator example given in this chapter uses temporal logic formulae to describe properties.

2.1.6 Types of Properties

Protocol verification distinguishes several classes of properties the system can be checked against. These properties are formulated either by assertions or by temporal logic formulae. Assertions describe locally checked properties formulated as logical expressions over the associated values of a state. Depending on the underlying state model, these values may correspond to the associated atomic propositions or to the contents of system variables.
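For instance (using a plain C++ assertion purely for illustration; concrete model checkers provide their own assertion statements), the elevator's safety requirement can be phrased as a local assertion over the values of a single state:

```cpp
#include <cassert>

// A locally checked property: evaluated on the values of one state.
// Here: "the door must be closed whenever the elevator is moving".
void check_state(bool moving, bool door_open) {
    assert(!(moving && door_open));
}
```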


Figure 2.2: The state space of the elevator protocol.

Expressions in temporal logics quantify the aforesaid properties over paths in the state space of the system. Hence, temporal logics are more expressive than assertions, as the described properties can be checked independently of the position within the investigated protocol.

In the following, we discuss three important property classes in model checking.

Safety-Properties Safety properties describe states that should never be reached in the protocol. For our example, we may require that the elevator should never be moving while the door is open, i.e., AG(¬moving ∨ closed).


Liveness-Properties Liveness properties demand that a certain property will eventually hold in a state. For the elevator example, such a property might be that when a button was pressed at a floor, the elevator will eventually arrive at that floor in a future state: A(pressed_i ⇒ F at_i).

Absence of Deadlocks Deadlocks describe a system state in which no progress is possible. This often occurs in systems involving concurrent processes which request exclusive access to shared resources. There, a deadlock occurs if all processes wait for a resource to be released by another process. The elevator example is obviously deadlock-free. An example of a deadlock is given in Section 8.1.

2.2 Towards a Fault-Free Protocol

The design of a fault-free protocol constitutes an evolutionary process. Starting with a prototype, it is checked whether the protocol violates a desired property. If so, the error source is analysed and eliminated, leading to a new protocol version. The cycle is repeated until no further violations are found.

It is not difficult to detect a property violation in the initial protocol of the elevator example. The action sequence (or error trail) start, open, up leads to an error state given by (moving, open, idle, idle), which violates our safety property. An attempt to fix the error might be to inhibit the action up in states where open holds. The smart system designer may also realize the analogous case and forbid the action down when the door is open. The resulting state space is depicted in Figure 2.3.

It differs from the initial state space insofar as some transitions were removed. However, the new version still violates our safety property: the door may also open while the elevator is moving, i.e., (start, up, open) is an error trail. As a consequence, we may forbid the open transition while the elevator is moving. The resulting state space is depicted in Figure 2.4. For clarity, we have removed all unreachable states along with their in- and outgoing edges. As can be seen, there are no more states where both moving and open hold.

The system now fulfills the safety property; however, the new version of the protocol still violates our liveness property, since there exists an infinite sequence of actions start, press_1, release_1, press_1, release_1, ... for which the system cycles between two states at floor 0. Also, the elevator may simply open and close the door infinitely often. We try to fix this problem by adding two more restrictions. First, when a button was pressed, the elevator may no longer open the door. Second, a button at floor i must immediately be released if and only if the elevator is at floor i. The resulting state space is depicted in Figure 2.5.

2.2.1 Need for Model Checking

In Section 2.2, we have checked a simple example against two properties. According to the violations we found, the protocol underwent several changes. In the example, we were able to detect the violating action sequences manually, for three reasons. First, the described system is very simple. Second, we started with the trivial protocol that does not restrict the transitions and hence contains a lot of errors. Last, but not least, the example was designed for illustration purposes, and we knew the errors a priori.

Figure 2.3: State space for the second version of the elevator protocol.

Limits of Code Inspection If we look closely at Figure 2.5, we realize that the system still contains errors. For instance, the elevator can be forced to remain at one floor if the corresponding button is repeatedly pressed. Such errors are already harder to detect by manual code inspection even in a system as simple as the elevator example, while for realistically sized examples the chance of finding an error converges towards zero.


Figure 2.4: State space of the third version of the elevator protocol, fulfilling the safety property.

Limits of Testing The term testing refers to running a protocol or program in a simulated environment while checking if any unwanted behavior is exposed. Industrial standards demand quality criteria like a coverage of at least 99% of the code before the program is deployed. However, such criteria only roughly correlate with the visited fraction of all reachable states, as states with equal code positions can differ in their variable configuration. Furthermore, state transitions occur with different probabilities. As a consequence, testing will visit certain states frequently, while other states are never reached. In contrast, model checking systematically enumerates all possible system configurations regardless of their probability. Thus, model checking is able to detect subtle errors which pass unnoticed during testing. As a further advantage over testing, model checking builds a search tree of visited states, which makes it possible to return an error trail. This makes it easier for the user to detect the source of the error.

Figure 2.5: State space of the fourth version of the elevator example.

2.3 Classical Model Checkers

2.3.1 SPIN

SPIN [Hol97a, Hol03] is one of the oldest and most advanced model checkers. The tool uses the C-like description language Promela to describe a system with concurrent processes, whose state space can be explicitly enumerated. Due to its popularity, there are numerous derivations of SPIN, like dSPIN [DIS99], which allows dynamic process creation, or HSF-SPIN [ELL01], which uses heuristic search to speed up the search for errors. Also, some tools use SPIN as a back end to check Promela code generated through the conversion of program source code [HP00]. SPIN is also the eponym of the annual workshop on model checking of software.

2.3.2 SMV

The tool SMV [McM92], developed at Carnegie Mellon University, is a BDD-based symbolic model checker. Using a strongly declarative description language, models are described as networks of communicating automata. The properties under investigation are formulated as CTL formulae over the model. SMV owes its popularity to the fact that it was the first model checker to use BDDs for symbolic exploration, enabling it to fully explore very large systems. A recent development based on SMV is NuSMV, a reimplementation of the tool which offers an open architecture that can be used as a core for custom verification tools [CCGR99, CEMCG+02].

2.4 Model Checking Software

Recent applications of model checking technology deal with the verification of software implementations (rather than checking a formal specification). The advantage of this approach is manifold when compared to the modus operandi in the established software development cycle. For safety-critical software, the designers would normally write the system specification in a formal language like Z [Spi92] and (manually) prove formal properties over that specification. When the development process gradually shifts towards the implementation phase, the specification must be completely rewritten in the actual programming language (usually C++). On the one hand, this implies an additional overhead, as the same program logic is merely re-formulated in a different language. On the other hand, the re-writing is prone to errors and may falsify properties that held in the formal specification.

In contrast, when software model checking is used, the system specification can be formulated as a framework written directly in the targeted programming language. At that point, the framework can be checked for conceptual errors that violate formal properties. In the implementation phase, the framework is gradually augmented with implementation details, while the implementation is regularly checked against formal properties in the same fashion as done in the specification phase. Violations can now be attributed to implementation errors rather than conceptual errors in the specification - given that we check against the same properties.

Moreover, the systematic verification of model checking is superior to any manual investigation of the formal specification: it is less likely for undetected conceptual errors to carry over to the implementation phase, where they are much more difficult to fix. Model checking is also superior to manual inspection or testing of the implementation, as the systematic exploration will not miss states that are unlikely to occur in the actual execution of the program, which means that fewer errors are carried over from the implementation phase to the integration phase.


2.4.1 The Classical Approach

In the classical approach of software model checking, an abstract model of the program is investigated. The model must either be constructed manually or (semi-)automatically generated from the program's source code. The model can then be investigated by established model checkers like SPIN or SMV.

As an advantage of the verification of abstract models, the user can use the full range of features of highly developed model checkers. The main disadvantage lies in the discrepancy between the actual program and its abstract model. First of all, the model might abstract away details which lead to an error in practice. Second, the model can lead to false positives - i.e., errors that are present in the model but not in the actual program. This may just invalidate the advantages of model checking over testing, as we cannot rely on the states - though systematically enumerated - to be representative of the actual program.

The automatic translation of source code by custom-built parsers also yields problems: First, modern programming languages such as C++ have complex formal semantics, which implies that devising a parser for the automatic conversion is tedious and time-consuming. The more time is spent on that parser, the less time remains for the design and implementation of sophisticated model checking technology. As a workaround, many tools only support a subset of the programming language's formal semantics.

As another disadvantage, the returned error trail refers to the model and not to the source code of the inspected program, which makes it harder to track down and fix the error in the actual program source code.

Moreover, as a very severe problem, the structure and dynamics of computer programs - such as memory references, dynamic memory allocation, recursive functions, function pointers and garbage collection - can often not be satisfactorily modeled by the input languages of classical model checkers.

2.4.2 The Modern Approach

The modern approach to software model checking relies on the extension or implementation of architectures capable of interpreting machine code. These architectures include virtual machines [VHBP00a, ML03] and debuggers [MJ05].

Such unabstracted software model checking does not suffer from any of the problems of the classic approach. Neither is the user burdened with the task of building an error-prone model of the program, nor is there a need to develop a parser that translates (subsets of) the targeted programming language into the language of the model checker. Instead, any established compiler for the respective programming language can be used (e.g., GCC for C++).

Given that the underlying infrastructure (virtual machine) works correctly, we can assume that the model checker is capable of detecting all errors and that it will only report real errors. Also, the model checker can provide the user with an error trail on the source level. Not only does this make it easier to locate the error in the actual program, the user is also not required to be familiar with specialized modeling languages such as Promela.


As its main disadvantage, unabstracted software model checking may expose a large state description, since a state must memorize the contents of the stack and all allocated memory regions. As a consequence, the generated states may quickly exceed the computer's available memory. Also, as will be seen later, a larger state description will slow down the exploration. Therefore, one important topic in the development of an unabstracted software model checker is to devise techniques that can handle the potentially large states.

2.4.3 Some Software Model Checkers

dSPIN

The tool dSPIN is not primarily a software model checker, but rather a dynamic extension to SPIN. The motivation for the tool came from previous experiences regarding attempts to convert Java programs to Promela [IDS98]. The extensions relate to dynamic features of programs, which should allow the modeling of object-oriented programs in a natural manner. The extensions include pointers, left-value operators, new/delete statements, function definition/call, function pointers and local scopes [DIS99].

The tool alleviates the problem of modeling actual computer programs. However, it still imposes many restrictions on what language constructs can be modeled. For instance, the ampersand operator cannot be applied to pointer variables, which implies that multi-level pointers cannot be fully handled in the language of dSPIN. Also, the generated counterexamples are based on the modeling language and must be manually re-interpreted with respect to the actual program source.

Bandera

The Bandera [HT99] project of Kansas State University is a multi-functional tool for model checking Java programs. It is capable of extracting models from Java source code and converting them to the input language of several well-known model checkers such as SPIN or SMV. Also, Bandera serves as a front end for JPF2 (cf. Section 2.4.3). The approach is based on a custom-built intermediate language called BIR (Bandera Intermediate Language). The advantage is that Bandera does not need to implement the model checking technology itself, but rather inherits it from the back-end tools. However, in consequence it also suffers from the same drawbacks as the other abstraction-based model checkers. Even if the intermediate language allows an adequate modeling of Java programs, it still needs to be converted to the input language of the back-end model checker, which brings up the problems of handling the dynamic issues of real programs. Moreover, since Bandera supports a couple of back-end tools, it should be difficult to maintain, as it needs to catch up with the features of new versions of these tools, such as new search algorithms and heuristics.


SPIN 4.0

In versions 4.0 and above, SPIN allows the embedding of C code in Promela models. Yet, this does not enable SPIN to model check programs at the source code level. The embedded code is blindly copied into the source of the generated verifier and cannot be checked [Hol03]. The idea is to provide a way to define guards, states, transitions and data types in an external language within the model.

CMC

CMC [MPC+02], the "C Model Checker", checks C and C++ implementations directly by generating the state space of the analyzed system during execution. CMC is mainly used to check correctness properties of network protocols. The checked correctness properties are assertion violations, global invariant checks avoiding routing loops, sanity checks on table entries and messages, as well as memory leaks. The approach of CMC is based on extending the checked program with a testing environment which executes single steps of the (multi-threaded) program while checking for correctness properties in between. This implies that CMC does not possess a meta-level infrastructure (e.g., a virtual machine) which has full control over the program execution. In particular, the resulting state of a transition is only available after the corresponding machine code was physically executed on the computer on which the model checker runs. This makes it impossible to detect fatal program errors, such as illegal memory accesses, since these lead to a crash of the executed program and thus of the model checker. In contrast, tools based on a machine-code interpreting infrastructure can check the validity of each machine instruction before it is executed.

VeriSoft

The tool VeriSoft [God97] is a program model checker which can be applied to check any software, independently of the underlying programming language. VeriSoft offers two modes: interactive simulation and automatic verification. In the latter, the tool systematically enumerates the state space of the investigated program. The supported error types are deadlocks, livelocks, divergences, and assertion violations. A livelock means that a process has no enabled transitions for a user-specified amount of time. Divergences occur when a process does not execute any visible operation during a user-specified time. When an error is found, the respective counterexample is presented to the user in an interactive graphical interface. Like CMC, the approach is based on a supervised program execution by augmenting the source code of the verified program with additional code. This a priori excludes the possibility to find low-level errors such as illegal memory accesses. Moreover, VeriSoft explores programs based on their input/output behavior rather than on the source level of the investigated program.

Zing

The Zing [AQR+04a, AQR+04b] model checker developed by Microsoft Research aims at finding bugs in software at various levels, such as high-level protocol descriptions, workflow specifications, web services, device drivers, and protocols in the core of the operating system. Zing provides its own modeling language and an infrastructure for generating models directly from software source code. In a recent case study1, it was possible to find a concurrency error in a Windows device driver. This indicates that Zing may have practical relevance for checking properties of actual programs. Still, the approach faces the same problems as all software model checkers based on formal models, i.e. the model may abstract away details of the program which lead to an error in practice, or it may introduce false positives.

SLAM

The SLAM project by Microsoft Research is a model checker for sequential C programs, based on static analysis. It uses the specification language SLIC to define safety properties over the program. Given a program P and a SLIC specification, an instrumented program P′ is created, such that a label ERROR in P′ is reachable if and only if P does not satisfy the specification [BR02]. The process in SLAM generates a sound Boolean abstraction B′ from the instrumented program P′, where sound means that if ERROR is reachable in P′, then it is also reachable in B′. However, the reverse cannot be assumed, i.e. the inspection of B′ may yield false positives. To deal with the latter, the tool uses counterexample-driven refinement, which generates a more detailed abstraction B′′ of P′, containing fewer spurious paths than B′. The approach of SLAM is very interesting, as the tool is based on abstract models while it addresses one of their main problems, i.e. false positives. Yet, the tool works on a function call/return level and is mainly targeted at programs that implement an underlying protocol's state machine. Thus, it is presumably not possible to use this approach to find low-level errors such as illegal memory accesses.

BLAST

The Berkeley Lazy Abstraction Software verification Tool (BLAST) [HJMS03] is a tool for the verification of safety properties in C programs. The verification scheme of BLAST uses control flow automata to represent C programs and works in two phases: in the first phase it constructs a reachability graph of the reachable abstract states of the program. If an abstract error state is found, the second phase begins. Here, it is checked through symbolic execution of the program whether the error is real or results from a too coarse abstraction of the program. Similar to SLAM, BLAST considers error states found in the abstract state space to be potential false positives and performs an additional analysis to test whether they are genuine.

As a limitation, the current version of BLAST cannot accurately interpret C programs that involve function pointers or recursive function calls [Ber04].

1 http://research.microsoft.com/zing/illustration.htm


Java PathFinder 1+2

The Java PathFinder [Hav99] (JPF) developed by NASA is a model checker for concurrent Java programs. The name of the tool originates from the Mars Pathfinder robot, which shut down due to an error in the control software [HP00]. The error cost NASA a considerable amount of money - money that could have been saved if the software had undergone a more thorough investigation. This famous event raised awareness of the need for automated software verification. In its first version (JPF1), the tool was merely a converter from Java source code to Promela. Since JPF1 is no longer under development, the term 'JPF' here always refers to the second version of the model checker: JPF2 [VHBP00a] uses a custom-made Java Virtual Machine to perform model checking on the compiled Java byte code. With this approach, JPF constitutes the first software model checker which does not rely on an abstract model of the checked program. JPF also provides a set of heuristics which can significantly accelerate the search for errors [GV02], and an interface allows the implementation of user-defined heuristics (see e.g. [EM03]). Its powerful features and its public availability2 have earned JPF a lot of attention in the model checking community. The unabstracted model checking approach, which eliminates various problems of previous technologies, has also inspired other projects [ML03, MJ05] to use machine-code interpreting infrastructure to model check actual programs.

Yet, JPF is limited to the verification of Java programs, while most software products - including those of NASA - are written in the industrial standard programming language C++. In fact, the developers of JPF state that, for verification with JPF, some software was manually converted from C++ to Java [VHBP00b]. Not only does this imply an additional overhead, it may also invalidate some advantages of the JPF approach, since the manual conversion may introduce errors, resulting in false positives and hiding actual software faults.

Estes

The Estes tool implements a quite recent [MJ05] approach to program model checking, which builds on the GNU debugger (GDB) to verify software on the assembly level. The approach differs from assembly-level model checkers such as JPF or StEAM insofar as GDB supports multiple processor models. Hence the tool is not limited to the verification of high-level program properties. Instead, a stronger focus is laid on checking time-critical properties of the compiled code of embedded software. In [MJ05] an example of an error is given that is not visible on the source level, but occurs due to incorrect timing in the m68hc11 machine code. Here, the program reads two sensor values from registers that are updated by an interrupt which is invoked every x CPU cycles. A data inconsistency can occur when the interrupt fires after the program has read the first register and before comparing it with the second.

Building on a debugger is an alternative approach which yields the same advantages as enhancing a virtual machine. Also, the possibility to search for errors that occur only in the machine code of a specific architecture makes this a promising tool. Unfortunately, the currently available literature gives little technical detail about the implementation (e.g., how states are encoded), nor does it provide enough experimental data to really judge the capabilities and limits of this approach.

2 http://javapathfinder.sourceforge.net/

Summary

Besides state explosion, an inherent problem that most existing program model checkers face is the complexity of the formal semantics (FS) of modern programming languages such as C++. The more effort tools invest to account for details of the underlying FS, the less time remains for the development of sophisticated model checking techniques, such as exploration algorithms, compact state memorization and heuristics.

As one workaround, tools such as Bandera or BLAST support only a subset of the respective programming language's FS. Obviously, this burdens the user with the task of rewriting the application to a version which is checkable by the respective tool. Not only does this impose an additional overhead on the software development process, the rewriting may also introduce or hide errors of the actual program.

In a second workaround, taken e.g. by CMC, the investigated program's source code is augmented with a model checking unit and compiled to an executable, which performs a stepwise, supervised execution of the program while checking for errors between each step. This method is mainly suitable for checking high-level properties such as safety violations or deadlocks, rather than machine-level errors such as illegal memory writes. The problem is that the machine instructions which correspond to a state transition of the program are executed on the actual hardware of the machine that runs the model checker. Since the result of a transition is only known after the respective machine instructions were executed, fatal errors such as illegal memory writes cannot be caught, as they terminate the execution of the model checker.

The second version of the Java PathFinder remedies the above problems by using a virtual machine to execute the byte code of the investigated program. This ensures that the model checker accounts for the complete FS of the underlying programming language, while there is no need to develop a translator from the program source code to some formal model - instead one can use any third-party Java compiler that produces valid byte code. Obviously, the development of a virtual machine also imposes a considerable one-time overhead. In particular, it must be assured that the virtual machine works correctly, as errors in the machine may again lead to false positives even if the underlying formal model (the machine code) is correct. Model checking the virtual machine's implementation may help here, but that requires another virtual machine that works correctly.

Alternatively, a tool may use an existing virtual machine that was designed to run applications rather than exclusively for use in a model checker. Such an infrastructure can be assumed to be thoroughly tested for correct functioning. However, it is always difficult to integrate external components into one's own tool. As previously mentioned, the developers of JPF assumed the design of a custom-made virtual machine to be more feasible than tailoring a model checking infrastructure to an existing machine (such as the official Java interpreter).

As will be seen in Chapter 4, the C++ model checker StEAM takes up the challenge of using a third-party virtual machine to execute the machine instructions corresponding to state transitions in the investigated program. It will turn out that the amount of work needed to do this is no greater than that of developing a custom virtual machine - probably less. As the core contribution of the thesis, StEAM is the first model checker capable of checking concurrent C++ programs without the need to build formal models. This approach allows the investigation of programs written in the industrial standard programming language, while

• There is no need for manual conversion or heavy rewriting of the original program.

• The semantics of the program are accurately interpreted.

• The designers of the model checker are not required to write a compiler or virtual machine themselves.

• The errors that can be searched for are not restricted to violations of high-level properties, such as safety violations or deadlocks. Additionally, the model checker can detect low-level errors, like illegal memory accesses.

Furthermore, the model checker StEAM constitutes a versatile tool whose field of application is not restricted to the verification of formal properties. Chapter 5 presents a technique to use the model checker as a planner for concurrent multi-agent systems.


Chapter 3

State Space Search

Explicit model checking essentially relies on the exploration of state spaces with search algorithms. Here, we give an introduction to state space search and discuss the most important search algorithms along with the advantages and drawbacks they bring to the field of (software) model checking.

3.1 General State Expansion Algorithm

A general search algorithm systematically enumerates the nodes of a directed graph. A directed graph is a tuple (V, E), where V is a set of nodes and E ⊆ V × V is a set of edges. Graph nodes are often used to model the states of a system, hence the terms node and state may be used interchangeably. During the enumeration of the nodes, a search algorithm maintains two lists: open and closed. The closed list contains all fully expanded nodes. A node n is declared to be fully expanded if all immediate successor states {n′ | (n, n′) ∈ E} were visited. The open list contains all nodes of the explored graph that have been visited but not yet fully expanded. Initially, the closed list is empty and open only contains some designated start node s ∈ V. In each step, the algorithm removes a node n from open according to the used search strategy. Then, n is expanded - i.e. each immediate successor node n′ with (n, n′) ∈ E is visited and added to the open list if n′ is not already in the closed or open list. After node n has been expanded, it is added to the closed list. The algorithm terminates when the open list is empty, which implies that all reachable nodes were visited. Algorithm 3.1 shows the pseudo code for the general state expansion algorithm. Here, the function expand(n) returns all immediate neighbors of node n, i.e. expand(n) = {n′ | (n, n′) ∈ E}.

Common search algorithms are basically implementations of the abstract description in Algorithm 3.1 and differ mainly in the criterion upon which the next open state is selected for expansion.

The working method of different search algorithms can very naturally be exemplified for geometric search. Here, the cost of a path from a start location to the goal corresponds to the sum of covered distances between the intermediate locations on that path. In the graph depicted in Figure 3.1, we have a start node i, a goal node g and seven other nodes s1, . . . , s7. For simplicity, we assume that the spatial arrangement of the nodes in Figure 3.1 is such that the geometric distance between two locations is proportional to the Euclidean distance between their corresponding nodes. The optimal path from i to g obviously leads through s2 and s5.


Procedure GSEA(V, E, i, γ)

Input: A directed graph (V, E), a start node i and a goal description γ.
Output: Goal state g or null.

open ← {i}; closed ← ∅
while (open ≠ ∅)
    select n ∈ open
    if (n ⊨ γ) return n
    open ← open \ {n}
    Γ ← expand(n)
    closed ← closed ∪ {n}
    open ← open ∪ (Γ \ closed)
return null

Algorithm 3.1: General State Expansion Algorithm


Figure 3.1: Example Network
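To make the abstract description concrete, the following is a minimal C++ sketch of Algorithm 3.1. The Node type, the expand function, the goal test and the selection strategy are placeholders that a concrete search problem would have to supply; they are not part of any tool described in this thesis.

#include <deque>
#include <set>
#include <vector>
#include <optional>
#include <functional>

// Placeholder node type; a concrete problem supplies its own state representation.
using Node = int;

// General state expansion algorithm (cf. Algorithm 3.1). The 'select' strategy
// removes and returns one open node; it is the only place where the individual
// search algorithms (BFS, DFS, best-first search, A*) differ.
std::optional<Node> gsea(Node start,
                         const std::function<std::vector<Node>(Node)>& expand,
                         const std::function<bool(Node)>& isGoal,
                         const std::function<Node(std::deque<Node>&)>& select)
{
    std::deque<Node> open = { start };   // open <- {i}
    std::set<Node> closed;               // closed <- empty set

    while (!open.empty()) {
        Node n = select(open);           // select n from open (and remove it)
        if (isGoal(n)) return n;         // n satisfies the goal description
        std::vector<Node> gamma = expand(n);
        closed.insert(n);                // n is now fully expanded
        for (Node succ : gamma)          // open <- open u (Gamma \ closed)
            if (!closed.count(succ))
                open.push_back(succ);
    }
    return std::nullopt;                 // no reachable goal state; return "null"
}

The selection strategy is deliberately left open here; the following sections show how the concrete algorithms instantiate it.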

3.2 Undirected Search

Undirected (aka blind) search algorithms have no information about the distance of a state to a goal state in the search space and instead select the open state to be expanded next according to its position in the search tree. The two most important undirected search algorithms are breadth-first and depth-first search.

3.2.1 Breadth-First Search

BFS expands in each step a node n with minimal depth (number of nodes on the path from i to n). This can be achieved by implementing the open list as a FIFO queue. BFS yields optimal paths in uniformly weighted graphs, i.e. in graphs where all transitions from some node to an immediate successor imply the same cost. Obviously, this is not the case for the example network in Figure 3.1, since the costs correspond to the Euclidean distance between the nodes, which may differ for each pair of adjacent nodes. Hence, in geometric search, paths obtained by BFS may be suboptimal, as shown in Figure 3.2, which depicts the search tree for the example network. Here we assume that of two nodes si ≠ sj with the same depth and i < j, si is expanded first. The Roman numerals indicate the order in which nodes are expanded. BFS requires five expansions and returns the path i, s1, s4, g.

BFS in Software Model Checking

Optimality of BFS in model checking depends on the subjective definition of the "costs" of counterexamples (trails). These costs should be proportional to how difficult it is to track down the trail and use the information for fixing the error in the protocol or program. The assumption in this thesis is that shorter trails can always be interpreted more easily by the user than longer trails. This implies that we assign uniform costs to all state transitions, which makes BFS an optimal search algorithm in the domain of software model checking. However, since the size of the open list grows exponentially with the search depth, the space required to memorize the open states will often exceed the available memory before an error is found.

Figure 3.2: Path found by BFS

Note that there is more recent work on helping the user to diagnose the cause of an error [GV03]. This may yield even better criteria concerning the quality of counterexamples, but it also imposes a harder problem.


3.2.2 Depth-First Search

As the dual of BFS, depth-first search (DFS) selects the open node with maximal depth to be expanded next, which is achieved by implementing the open list as a stack. Obviously, DFS is not optimal, as it tends to return the longest path from the start node to a goal node, which is most likely suboptimal even in non-uniformly weighted graphs.
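The duality of BFS and DFS becomes obvious when both are expressed as selection strategies for the GSEA sketch given above (again only an illustration, not tool code): BFS treats the open list as a FIFO queue, DFS treats it as a stack.

// BFS: remove the oldest open node (FIFO queue) - the search tree is expanded level by level.
Node selectBFS(std::deque<Node>& open) {
    Node n = open.front();
    open.pop_front();
    return n;
}

// DFS: remove the most recently added open node (stack) - one branch is followed as deep as possible.
Node selectDFS(std::deque<Node>& open) {
    Node n = open.back();
    open.pop_back();
    return n;
}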

Figure 3.3 shows the search tree for the example network as generated by DFS. Here, the path from the start node to the goal node is of length five - as opposed to four in BFS. On the other hand, DFS only requires four node expansions - in contrast to five expansions for BFS.

DFS in Software Model Checking

Depth-first search is an option in software model checking if optimal algorithms fail to find an error. The advantage of DFS over BFS is that the number of open states grows only linearly with the encountered search depth. As its main drawback, the trails returned by DFS may be very long, which makes it difficult or impossible for the user to track down the error with the help of that trail. Moreover, DFS is incomplete in the infinite search spaces that programs often expose.

Figure 3.3: Path found by DFS

3.2.3 Depth-First Iterative Deepening

Given a depth limit d, depth-first iterative deepening search (DFID) explores the state space in the same order as DFS, ignoring states with a depth greater than d. If no goal state is found, the search is re-initiated with an increased depth limit. DFID combines the low memory requirements of DFS with the optimality of BFS in uniformly weighted graphs. Moreover, DFID remedies the incompleteness of DFS in infinite state spaces. The drawback of DFID lies in its high time requirements, as each iteration must repeat all node expansions of the previous one.


DFID in Software Model Checking

DFID is very interesting for software model checking, as it constitutes an optimality-preserving time-space tradeoff. It is a known fact that, due to the state-explosion problem, the failure of search algorithms can generally be attributed to a lack of space rather than to a timeout. For the memory saving to take effect, however, closed states must not be memorized explicitly. This requires the search to be integrated with compacting hash functions such as bitstate hashing (cf. 6.2.1).

3.3 Heuristic Search

3.3.1 Best-first Search

Best-first search belongs to the category of heuristic search algorithms. The state to be expanded next is chosen as the state s which minimizes the estimated distance h(s) to the goal (h is called the estimator function or simply the heuristic). An obvious estimator function for geometric searches is the Euclidean distance to the goal point g. Assuming an adequate heuristic, best-first search will in general find the goal state faster than blind search algorithms such as BFS. However, the obtained path is not guaranteed to be optimal. When applied to our example network using the Euclidean distance as the heuristic estimate, the generated search tree is the same as for DFS (Figure 3.3). The preferability of heuristic search over undirected search critically depends on the quality of the used heuristic. A bad heuristic will not only return suboptimal paths, it will also lead to an increased number of node expansions compared to e.g. DFS. Even though the Euclidean distance is generally a good heuristic in geometric search, it may be misleading, as the example shows. In general, however, we may expect it to perform well in geometric networks.

Best-First Search in Software Model Checking

Best-first search is an important option in software model checking when optimal search algorithms fail due to time or space restrictions. With a properly chosen heuristic it is often possible to find an error. Experimental results (see e.g. Chapter 8) show that with an adequate heuristic, errors can be found with low time and memory requirements while the generated error trails are close to optimal.

3.3.2 A*

The heuristic search algorithm A* is a refinement of best-first search. Here, the state s is expanded next which minimizes f(s) = h(s) + g(s), where h is a heuristic estimate and g the path cost from the initial state to s. In geometric search, the path costs correspond to the accumulated point distances, i.e. if s1, . . . , sn is the path leading from initial state i = s1 to state s = sn, and d(s′, s′′) denotes the Euclidean distance between points s′ and s′′, then

g(s) = ∑_{i=1}^{n-1} d(s_i, s_{i+1}).


Admissibility and Consistency

A heuristic h is admissible if the estimated goal distance h(s) of some state s is always less than or equal to the actual distance from s to g. A* provably returns the optimal path if the used heuristic is admissible. The term admissible is also used to describe strategies that produce optimal results for a given class of problems - such as BFS in uniformly weighted graphs.

As a stronger property, a heuristic is consistent if for each state s and each immediate successor s′ of s it holds that h(s) ≤ h(s′) + c(s, s′). Here, c(s, s′) denotes the path cost from s to s′. When A* is used with a consistent heuristic, the f-value along each path is non-decreasing.

If we use A* combined with the Euclidean distance on a network of waypoints, we can expect that the returned path is optimal. Figure 3.4 shows the search tree we get by applying A* to the example network. Here, only three expansions are needed to obtain the optimal path i, s2, s5, g.

Figure 3.4: Path found by A*
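The A* selection rule can be sketched under the same assumptions as the earlier GSEA example: the open list is kept as a priority queue ordered by f(s) = g(s) + h(s). The heuristic h and the edge costs are problem-specific placeholders.

#include <queue>
#include <vector>
#include <functional>

// An open-list entry for A*: a node together with its accumulated path cost g
// and the value f = g + h(node) that A* minimizes.
struct Entry {
    Node   node;
    double g;
    double f;
};

// Order the priority queue by ascending f-value. Best-first search is obtained
// as the special case g = 0, i.e. ordering by h alone.
struct ByF {
    bool operator()(const Entry& a, const Entry& b) const { return a.f > b.f; }
};

using OpenList = std::priority_queue<Entry, std::vector<Entry>, ByF>;

// Insert a successor reached from 'parent' via an edge of cost 'edgeCost'.
void pushSuccessor(OpenList& open, const Entry& parent, Node succ,
                   double edgeCost, const std::function<double(Node)>& h)
{
    double g = parent.g + edgeCost;             // accumulated path cost
    open.push(Entry{ succ, g, g + h(succ) });   // f = g + h
}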

A* in Software Model Checking

A severe problem of using A* in software model checking is that most of the heuristics used are not admissible - one exception is the most-blocked heuristic used to detect deadlocks (cf. 4.5.1). Non-admissible heuristics are generally known to perform poorly under A*, and the experimental results in Chapter 8 support this claim. Still, A* is useful in software model checking, e.g. if a suboptimal trail is to be improved. In this case, an admissible heuristic can be devised by abstracting the state description to a subset of its components.

3.3.3 IDA*

Like DFID (cf. 3.2.3), Iterative Deepening A* (IDA*) [Kor85] performs a series of depth-first traversals of the search tree. In contrast to DFID, IDA* limits the f-value f(s) = h(s) + g(s) of the generated states, rather than their depth. Starting with an initial cost bound c = f(i), where i is the initial state, IDA* performs a depth-first traversal during which generated states with an f-value greater than c are not inserted into the open list. When no more open states remain, the new value of c is the minimal f-value of all encountered states that exceeded the old value of c. Like A*, IDA* returns the optimal path from i to a goal state if an admissible heuristic is used.
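The following sketch shows the core of one IDA* iteration, assuming the same placeholder interfaces as before: states whose f-value exceeds the current cost bound are pruned, and the smallest pruned f-value becomes the bound of the next iteration.

#include <algorithm>
#include <limits>
#include <vector>
#include <functional>

// One depth-first IDA* iteration with cost bound 'bound'. Sets 'found' when a
// goal state is reached; otherwise returns the minimal f-value that exceeded
// the bound, which the caller uses as the next bound.
double idaIteration(Node n, double g, double bound,
                    const std::function<std::vector<Node>(Node)>& expand,
                    const std::function<bool(Node)>& isGoal,
                    const std::function<double(Node)>& h,
                    const std::function<double(Node, Node)>& cost,
                    bool& found)
{
    double f = g + h(n);
    if (f > bound) return f;                    // prune and report the overshoot
    if (isGoal(n)) { found = true; return f; }

    double nextBound = std::numeric_limits<double>::infinity();
    for (Node succ : expand(n)) {
        double t = idaIteration(succ, g + cost(n, succ), bound,
                                expand, isGoal, h, cost, found);
        if (found) return t;
        nextBound = std::min(nextBound, t);     // minimal f-value above the old bound
    }
    return nextBound;
}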

IDA* in Software Model Checking

Like DFID, IDA* is interesting for software model checking at points where other optimal algorithms fail due to lack of memory. Like A*, its applicability depends on whether an admissible heuristic is available.

Recent Development

There is a vast body of recent development of search algorithms not covered by this thesis. The algorithms presented in this chapter are the fundamental and most important ones. Moreover, with the exception of IDA*, the presented algorithms are implemented in the new model checker StEAM presented in Chapter 4.


Chapter 4

StEAM

4.1 The Model Checker StEAM

StEAM is an experimental model checker for concurrent C++ programs. Like JPF, the tool builds on a virtual machine, but it also faces some additional challenges. To the best of the author's knowledge, StEAM is the first C++ model checker which does not rely on abstract models. An increased difficulty of C++ lies in the handling of untyped memory, as opposed to Java, where all allocated memory can be attributed to instances of certain object types. Also, multi-threading in Java is covered by the class Thread from the standard SDK. Such a standardized interface for multi-threading does not exist in C++.

Moreover, StEAM builds on an existing virtual machine, IVM. This is novel, since tailoring a model checking engine to an existing virtual machine imposes a task that was thought to be infeasible by the creators of JPF [VHBP00b]. IVM is further explained in Section 4.2.

4.2 Architecture of the Internet C Virtual Machine

The Internet Virtual Machine (IVM) by Bob Daley aims at creating a programming language that provides platform independence without the need to rewrite applications in proprietary languages like C# or Java. The main purpose of the project is to be able to receive pre-compiled programs through the Internet and run them on an arbitrary platform without re-compilation. Furthermore, the virtual machine was designed to run games, so simulation speed was crucial.

The Virtual Machine The virtual machine simulates a 32-bit CISC CPU with a set of approximately 64,000 instructions. The current version is already capable of running complex programs at decent speed, including the commercial game Doom1. This is strong empirical evidence that the virtual machine correctly reflects the formal semantics of C++. IVM is publicly available as open source2.

1 www.doomworld.com/classicdoom/ports
2 ivm.sourceforge.net


The Compiler The compiler takes conventional C++ code and translates it into the machine code of the virtual machine. IVM uses a modified version of the GNU C compiler gcc to compile its programs. The compiled code is stored in ELF (Executable and Linking Format), the common object file format for Linux binaries. The three types of files representable are object files, shared libraries and executables, but we will consider mostly executables.

ELF Binaries An ELF binary is partitioned into sections describing different aspects of the object's properties. The number of sections varies depending on the respective file. Important here are the Data- and BSS-sections. Together, the two sections represent the set of global variables of the program.

The BSS-section describes the set of non-initialized variables, while the Data-section represents the set of variables that have an initial value assigned to them. When the program is executed, the system first loads the ELF file into memory. For the BSS-section, additional memory must be allocated, since non-initialized variables do not occupy space in the ELF file.

Space for initialized variables, however, is reserved in the Data-section of the object file, so accesses to these variables directly affect the memory image of the ELF binary. Other sections represent executable code, the symbol table etc., and need not be considered for memorizing the state description.

State of IVM At any time, IVM is in a state corresponding to the program it executes. The structure of such a state is illustrated in Figure 4.1.

The memory containing an IVM state is divided into two hierarchical layers. One is the program memory, containing the memory allocated by the simulated program. Program memory forms a subset of the other layer, the physical memory of the computer the virtual machine is running on. The physical memory contains the contents of the CPU registers (machine) and the stack of the running program. The CPU registers hold the stack and frame pointer (SP, FP), the program counter (PC) as well as several registers for integer and floating point arithmetic (Ri, Fi). The physical memory also holds the object file image (MI), which is essentially a copy of the compiled ELF binary. The MI stores, among other program information, the machine code and the Data- and BSS-sections. Note that the space for storing the values of initialized global variables resides directly in the MI, while for uninitialized variables an additional block of memory is allocated at program startup.

In its original implementation, IVM does not provide a data structure which encapsulates its state, as it is implicitly described by the contents of the memory areas depicted in Figure 4.1 and changes in a deterministic fashion. A first goal in the development of StEAM is to formulate a state description as a data structure which captures all relevant information about a program's state.


Figure 4.1: State of IVM.

4.3 Enhancements to IVM

In the following, we give a technical description of the enhancements used to tailor a model checking functionality to IVM. These include a state description, a multi-threading capability and a set of search algorithms that enumerate the program's states based on our description.

4.3.1 States in StEAM

As stated in Section 4.2, the first step is to encapsulate the state description in a data structure which contains all IVM components from Figure 4.1 as well as additional information for the model checker. The latter includes:

Threads: As model checking is particularly interesting for the verification of concurrent programs, the description should include all relevant information about an arbitrary number of running processes (threads).

Locks: Many concurrency errors involve deadlocks or the violation of access rights. Threads may claim and release exclusive access to a resource by locking and unlocking it. Resources are usually single memory cells (variables) or whole blocks of memory. Deadlocks occur if all running threads wait for a resource to be released by another thread, while privilege violations are caused by the attempt to read, write or unlock a resource that is currently locked by another thread.

Figure 4.2: System state of a StEAM program.

Memory: The state should include information about the location and size of dynamically allocated memory blocks, as well as the allocating thread. Figure 4.2 shows the components that form the state of a concurrent program for StEAM.

Enhancing the state of IVM to the state of StEAM involves various changes:

Memory Layers: First of all, the memory is now divided into three layers: the outermost layer is the physical memory, which is visible only to the model checker. The subset VM-memory is also visible to the virtual machine and contains information about the main thread, i.e., the thread containing the main method of the program to check. The program memory forms a subset of the VM-memory and contains regions that are dynamically allocated by the program.

Stacks and Machines: For n threads, we have stacks s1, . . . , sn and machines m1, . . . , mn, where s1 and m1 correspond to the main thread that is created when the verification process starts. Therefore, they reside in VM-memory. The machines contain the hardware registers of the virtual machine, such as the program counter (PC) and the stack and frame pointers (SP, FP). Before the next step of a thread can be executed, the contents of the machine registers and stacks must refer to the state immediately after the last execution of the same thread or, if the thread is new, directly after its initialization.

Figure 4.3: Class diagram of StEAM's state description.

Memory- and Lock-Pool: The memory pool is used by StEAM to manage dynamically allocated memory. It consists of an AVL tree of entries (memory nodes), one for each memory region. Each entry contains a pointer to the address space, which also serves as the search key, as well as some additional information such as the identity of the thread which allocated the region.

The lock pool stores information about locked resources. Again, an AVL tree stores the lock information.

4.3.2 The State Description

Figure 4.3 depicts the diagram of classes which encapsulate the system state in StEAM.

For the memory-efficient storing of states, we rely on the concept of stored components. That is, for components that will remain unchanged by many state transitions, we define an additional container class. During successor generation we do not store copies of unchanged components in the system state. Instead, we store a pointer to the container of the respective component in the predecessor state and increment the user counter for the component. The user counter is needed when states are deleted. In this case the user counter of each component is decreased, and its content is deleted only if the counter reaches zero.

On the top level of our state description, we have the class System State, which memorizes the number of running threads and a pointer to its predecessor state in the search tree.


A system state relates to a set of Thread States - one for each running thread. Each thread state contains the unique ID of the thread it corresponds to, a user counter and an instance of ICVMMachine - the structure which encapsulates the state of the CPU registers in IVM. Note that we do not use a container class for the latter, as the machine registers constitute the only system component that is assured to change with each state transition.

Furthermore, a thread state relates to a Stored Stack - the container class for storing stack contents. During initialization, IVM allocates 8 MB of RAM for the stack. However, we only need to explicitly store that portion of the stack in our state which lies below the stack pointer of the corresponding thread.

A thread state also relates to two objects of type Stored Section. This class defines the container for the contents of the Data- and BSS-section.

Finally, we have two relations to objects of type Stored Pool - the container class for the lock- and memory-pool.
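The following is a small sketch - not the actual StEAM code - of how such a reference-counted container class could look; the member layout is an assumption for illustration only.

#include <cstddef>

// Sketch of a stored component, e.g. a stack image or a Data-/BSS-section copy.
// The user counter records how many system states currently reference the container.
struct StoredComponent {
    unsigned    users = 1;        // set to 1 by the state that creates the container
    std::size_t size  = 0;
    char*       data  = nullptr;  // the memorized memory contents
};

// Share an unchanged component with a successor state instead of copying it.
StoredComponent* share(StoredComponent* c) {
    ++c->users;
    return c;
}

// Release a component when a state is deleted; free it once no state uses it anymore.
void releaseComponent(StoredComponent* c) {
    if (--c->users == 0) {
        delete[] c->data;
        delete c;
    }
}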

4.3.3 Command Patterns

Command patterns form the base technology that enables us to enhance and control IVM. A command pattern is basically a sequence of machine instructions in the compiled code of the program. Rather than being executed, such a pattern is meant to provide the model checker with information about the program or the investigated properties. A command pattern begins with a sequence of otherwise senseless instructions that will not appear in a real program. The C++ code

int i;
i++;
--i;
i++;
--i;
i=17;
i=42;

declares a local variable i, increments and decrements it twice, and finally assigns the values 17 and 42. The code compiles to the following sequence of machine instructions:

a9de ecfd       inc1l  (-532,%fp)
75de ecfd       dec1l  (-532,%fp)
a9de ecfd       inc1l  (-532,%fp)
75de ecfd       dec1l  (-532,%fp)
ebe4 1100 ecfd  movewl #0x11,(-532,%fp)
ebe4 2a00 ecfd  movewl #0x2a,(-532,%fp)


The numbers before each instruction's mnemonic denote the opcode and parameters in little-endian notation. The number −532, or fdec in hexadecimal, is an offset into the stack frame used to address the local variable. This offset varies depending on the state of the program.

As a first enhancement, IVM was taught to understand command patterns. Concretely, this means that between the execution of two instructions, IVM checks if the values of the next four opcodes indicate a command pattern. In this case, the next four instructions are skipped, which is equivalent to increasing the PC by 16 bytes. For the following two instructions, IVM reads the first parameter (i.e. the 2-byte value at offset 2 from the increased PC) but also skips the execution of the actual opcodes. As a result, the values 17 and 42 have been determined as the parameters of the command pattern. The first parameter indicates the type of information transmitted by the pattern. The second parameter gives the value of this information. For example, the type may indicate information about the source code line which corresponds to the current position in the machine code, and the value may give the concrete line number.
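As discussed below, StEAM generates these patterns from C++ macros. The actual macro definitions are not reproduced here; the following hypothetical sketch merely illustrates the idea of a macro that expands to the recognizable instruction prefix followed by the two parameter assignments. The volatile qualifier serves only to keep an optimizing compiler from removing the marker statements.

// Hypothetical command-pattern macro (the real StEAM macros may differ).
// The four increment/decrement statements form the prefix that never occurs
// in ordinary compiled code; the two assignments carry the type and the value.
#define COMMAND_PATTERN(type, value)        \
    {                                       \
        volatile int __cp;                  \
        __cp++; --__cp; __cp++; --__cp;     \
        __cp = (type);                      \
        __cp = (value);                     \
    }

// Example: announce the current source line to the model checker; the pattern
// type 17 is taken from the example above and assumed to denote line information.
#define LINE_INFO(line) COMMAND_PATTERN(17, (line))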

In StEAM, command patterns are generated through parameterized C++ macros, which compile to the respective sequence of machine instructions. The macros are then used to annotate the source code of the respective program. Depending on the pattern type, this is either done by the user or automatically by a script which is executed prior to the model checking. Command patterns serve various purposes, including:

File Information This information tells the model checker the name of the source file which a block of machine code was compiled from.

Line Information The information about the source code line which the PC of the executed thread's state corresponds to. This is - among other reasons - needed to construct the trail for a detected error.

Capturing the Main Thread The main thread refers to the part of the program that instantiates and starts instances of user-defined thread classes. The beginning of the main() method in the main thread is recognized by IVM through the first line information pattern which occurs in the compiled code.

Locking and Unlocking Exclusive access to a memory region is claimed and released by lock and unlock macros. These compile to command patterns which tell IVM the address and size of the respective region.

Atomic Regions The user can use macros to mark atomic regions. These are executed as a whole without switching to another thread. The user can declare sections of code as atomic regions if they are known to be correct, which can significantly truncate the state space of the program.

Errors IVM can be notified about a program error through a respective command pattern. This pattern is only reached if a preceding logical expression over the program variables evaluates to false (see also Section 4.4).


Non-Determinism Model checking requires that the investigated program is not purely deterministic, but that it can take several paths while being executed. For multi-threaded programs, non-determinism arises from the possible orders of thread executions. In StEAM, an additional degree of non-determinism is given by the use of statements which tell IVM to nondeterministically choose a variable value from a discrete range of the variable's domain. The user can induce non-determinism into the program by annotating the source code using macros. These compile to command patterns which are recognized by IVM. Formally, a nondeterministic statement σ affecting variable x in state s transforms s into m states s[x = σ(1)], . . . , s[x = σ(m)], by replacing the content of variable x with the values σ(1), . . . , σ(m).

4.3.4 Multi-Threading

As previously mentioned, standard C++ does not support multi-threading. It was therefore decided to implement a custom multi-threading capability in IVM. On the programming level, this is done through a base class ICVMThread, from which all thread classes must be derived. A class derived from ICVMThread must implement the methods start, run and die. After creating an instance of the derived thread class, a call to start will initiate the thread execution. The run method is called from the start method and must contain the actual thread code. From the running threads, new threads can be created dynamically. Program counters (PCs) indicate the byte offset of the next machine instruction to be executed by the respective thread, i.e., they point to some position within the code section of the object file's memory image (MI).

IVM recognizes new threads through a command pattern in the run method of a thread class. The pattern is generated by an automated annotation of the program source code.
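From the programmer's point of view, the thread interface can be pictured as in the following hypothetical sketch; only the methods start, run and die are prescribed above, while the remaining members (ID, id_counter) are assumptions based on the example program in Figure 4.4 below.

// Hypothetical sketch of the thread base class; the real ICVMThread may differ.
class ICVMThread {
public:
    virtual ~ICVMThread() {}
    virtual void start() = 0;   // initiates execution; typically calls run() and die()
    virtual void run()   = 0;   // contains the actual thread code
    virtual void die()   = 0;   // executed when the thread terminates
protected:
    int ID;                     // unique identifier assigned on creation
    static int id_counter;      // source of fresh identifiers (cf. MyThread::id_counter)
};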

4.3.5 Exploration

In the following we describe how StEAM explores the state space of a program. We first illustrate the steps needed to generate a successor state. Then we discuss the supported search algorithms and the hashing in StEAM. We close with an example of how a simple concurrent program is model checked by StEAM and an illustration of its search tree.

State Expansion

According to section 4.3.1, a state in StEAM is described by a vector

(m1, . . . , mn, s1, . . . , sn, BS, DS, MP, LP),

where mi and si denote the machine registers and stack contents of thread i, the components BS and DS are the states of the BSS- and Data-sections, and MP and LP are the states of the memory- and lock-pool.

State expansion is performed in several steps (a code sketch follows the list):

1. Initialize the set of successor states Γ ← ∅.


2. Choose one of the running threads.

3. Restore the contents of the CPU registers, the stack, and the variable sections to the physical addresses.

4. Execute a state transition - i.e. execute a single instruction or an atomic block of instructions starting at the PC of the thread.

5. Read the contents of the physical addresses to generate a single-element set of successor states Γ′ ← {t}.

6. If a nondeterministic statement σ with m possible choices was executed, set Γ′ ← (Γ′ \ {t}) ∪ {t[x = σ(1)], ..., t[x = σ(m)]}.

7. Set Γ ← Γ ∪ Γ′.

8. Repeat steps 2-7 for all running threads.
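A compact sketch of the expansion procedure described in steps 1-8; all types and functions are placeholders for StEAM's internal interfaces and are only declared here for illustration.

#include <vector>

// Placeholder types standing in for StEAM's real data structures (assumed names).
struct Thread;
struct State  { std::vector<Thread*> threads; };
struct Choice { bool nondet = false; int min = 0, max = 0; };

// Placeholder hooks into the virtual machine (declared only, for illustration).
void   restoreContext(State* s, Thread* t);         // step 3: write registers, stack, sections back
Choice executeTransition(Thread* t);                // step 4: run one instruction or atomic block
State* captureState(const State* pred, Thread* t);  // step 5: read the physical memory back
State* withValue(State* t, int value);              // step 6: clone t with one chosen value

// State expansion as described in steps 1-8.
std::vector<State*> expandState(State* s)
{
    std::vector<State*> gamma;                      // 1. Gamma <- empty set
    for (Thread* th : s->threads) {                 // 2./8. once for every running thread
        restoreContext(s, th);                      // 3.
        Choice c = executeTransition(th);           // 4.
        State* t = captureState(s, th);             // 5.
        if (c.nondet)                               // 6. expand a nondeterministic statement
            for (int v = c.min; v <= c.max; ++v)
                gamma.push_back(withValue(t, v));
        else
            gamma.push_back(t);                     // 7. Gamma <- Gamma u {t}
    }
    return gamma;
}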

Algorithms

StEAM provides several state exploration algorithms, which determine the order in which the states of a program are enumerated. Besides the uninformed algorithms depth-first search (DFS) and breadth-first search (BFS), the heuristic search algorithms best-first search and A* are supported (cf. Chapter 3).

Hashing

StEAM uses a hash table to store already visited states. When expanding a state, only those successor states not in the hash table are added to the search tree. If the expansion of a state S yields no new states, then S forms a leaf in the search tree. To improve memory efficiency, we fully store only those components of a state which differ from those of the predecessor state. If a transition leaves a certain component unchanged - which is often the case for e.g. the lock pool - only the reference to that component is copied to the new state. This has proved to significantly reduce the memory requirements of a model checking run. The method is similar to the collapse mode used in SPIN [Hol97c]. However, instead of component indices, StEAM directly stores the pointers to the structures describing the respective state components. Also, only components of the immediate predecessor state are compared to those of the successor state. A redundant storage of two identical components is therefore possible. Additional savings may be gained through reduction techniques like heap symmetry [Ios01], which are subject to further development of StEAM.

Glob

Figure 4.4 shows a simple program Glob, which generates two threads from a derived thread class MyThread that access a shared variable glob. Note that, including the main program, this results in three running threads.


01. #include "IVMThread.h"
02. #include "MyThread.h"
03.
04. extern int glob;
05.
06. class IVMThread;
07. MyThread::MyThread()
08. :IVMThread::IVMThread()
09. {}
10. void MyThread::start() {
11.   run();
12.   die();
13. }
14.
15. void MyThread::run() {
16.   glob=(glob+1)*ID;
17. }
18.
19. void MyThread::die() {
20. }
21.
22. int MyThread::id_counter;

01. #include <assert.h>
02. #include "MyThread.h"
03. #define N 2
04.
05. class MyThread;
06. MyThread * t[N];
07. int i,glob=0;
08.
09. void initThreads () {
10.   BEGINATOMIC
11.   for(i=0;i<N;i++) {
12.     t[i]=new MyThread();
13.     t[i]->start();
14.   }
15.   ENDATOMIC
16. }
17.
18. void main() {
19.   initThreads();
20.   VASSERT(glob!=8);
21. }

Figure 4.4: The source of the program glob.

The main program calls an atomic block of code to create the threads. Such a block is defined by a pair of BEGINATOMIC and ENDATOMIC statements. Upon creation, each thread is assigned a unique identifier ID by the constructor of the super class. An instance of MyThread uses ID to apply the statement glob=(glob+1)*ID.

The main method contains a VASSERT statement. This statement takes a Boolean expression as its parameter and acts like an assertion in established model checkers such as SPIN [Hol97b]. If StEAM finds a sequence of program instructions (the trail) which leads to the line of the VASSERT statement, and the corresponding system state violates the Boolean expression, the model checker prints the trail and terminates.

In the example, we check the program against the expression glob!=8. Figure 4.5 shows the error trail of StEAM when applied to glob. Thread 1 denotes the main thread; Thread 2 and Thread 3 are two instances of MyThread. The returned error trail is easy to trace. First, the instances of MyThread are generated and started in one atomic step. Then the one-line run method of Thread 3 is executed, followed by the run method of Thread 2. We can easily calculate why the assertion is violated: after step 3, we have glob=(0+1)*3=3, and after step 5 we have glob=(3+1)*2=8. After this, the line containing the VASSERT statement is reached.

Step 1: Thread 1 - 28: ENDATOMIC
Step 2: Thread 3 - 29: glob=(glob+1)*ID;
Step 3: Thread 3 - 30:
Step 4: Thread 2 - 29: glob=(glob+1)*ID;
Step 5: Thread 2 - 30:
Step 6: Thread 1 - 29:
Step 7: Thread 1 - 33: VASSERT(glob!=8);

Figure 4.5: The error trail for the glob program.

The assertion is only violated if the run method of Thread 3 is executed before that of Thread 2. Otherwise, glob would take the values 0, 2, and 9. By default, StEAM uses depth-first search (DFS) for program exploration. In general, DFS finds an error quickly while having low memory requirements. As a drawback, error trails found with DFS can become very long, in some cases even too long to be traceable by the user.

The Search Tree

Figure 4.6 shows a search tree up to level 2, as produced for the glob example. Here, the components of each state are depicted as symbols of different shapes. Ovals symbolize whole states, DATA is depicted as a diamond, and circles denote BSS. A thread state is symbolized by a square, a stack by a semi-circle, and a cross denotes the memory pool. We disregard the lock pool here, because no locks occur. A dot in place of a symbol means that the corresponding state component is the same as in the predecessor state.

The initial state S0 corresponds to the program state with only the main thread running. Apparently, S0 has only one successor state - S1,1 - which results from iterating the main thread. As shown in Figure 4.4, this calls and executes the initThreads method in one atomic step. As a result, S1,1 has three running threads. DATA is modified, because the main thread writes to the un-initialized class pointers t[i]. BSS remains unchanged. The stack of the main thread is changed by using the local variable i. The memory pool is changed, because the call of the new-constructor during thread generation allocates memory for the instances of the MyThread class. The CPU registers of the main thread change, because they include the program counter, which was altered by the state transition.

From S1,1 we have three potential successor states, S2,1, S2,2 and S2,3, that correspond to executing one of the running threads. Iterating the main thread executes the VASSERT statement, leading to state S2,1. This only affects the program counter of the main thread and thus changes the main thread's state. All other components, including the main thread's stack, remain unchanged. The states S2,2 and S2,3 correspond to iterating one of the dynamically generated threads. This executes the statement glob=(glob+1)*ID, altering the zero-initialized global variable glob and thus BSS. All other components - with the exception of the respective thread state - remain unchanged.

Figure 4.6: The search tree of the glob example.

4.4 Summary of StEAM Features

In this section, we will summarize the features of StEAM.


4.4.1 Expressiveness

Due to the underlying concept, which uses a virtual machine to interpret machine code compiled by a real C/C++ compiler3, there are in principle no syntactic or semantic restrictions on the programs that can be checked by StEAM, as long as they are valid C/C++ programs.

4.4.2 Multi-Threading

As opposed to Java, there is currently no real standard for multi-threading in C++. We decided to implement a simple multi-threading capability of our own. To create a new thread class, the user must derive it from the base class ICVMThread and implement the methods start(), run() and die(), as was done e.g. in Figure 4.4. If in the near future a standard for multi-threading arises, StEAM may be enhanced to support it.

4.4.3 Special-Purpose Statements of StEAM

The special-purpose statements in StEAM are used to define properties and to guide the search. In general, a special-purpose statement is allowed at any place in the code where a normal C/C++ statement would be valid4.

BEGINATOMIC
This statement marks the beginning of an atomic region. When such a statement is reached, the execution of the current thread will be continued until a consecutive ENDATOMIC. A BEGINATOMIC statement within an atomic block has no effect.

3 IVM uses a modified version of GCC to compile its code.
4 A following semicolon is optional.


ENDATOMIC
This statement marks the end of an atomic region. An ENDATOMIC outside an atomic region has no effect.

RANGE(<varname>, int min, int max)
This statement defines a nondeterministic choice over a discrete range of numeric variable values. The parameter <varname> must denote a variable name that is valid in the current scope. The parameters min and max describe the lower and upper borders of the value range. Internally, the presence of a RANGE statement corresponds to expanding a state to have max − min successors. A RANGE statement must not appear within an atomic region. Also, the statement currently only works for int variables.

VASSERT(bool e)
This statement defines a local property. When, during program execution, VASSERT(e) is encountered, StEAM checks if the corresponding system state satisfies expression e. If e is violated, the model checker provides the user with the trail leading to the error and terminates.

VLOCK(void * r)
A thread can request exclusive access to a resource with a VLOCK. This statement takes as its parameter a pointer to an arbitrary base type or structure. If the resource is already locked, the thread must interrupt its execution until the lock is released. If a locked resource is requested within an atomic region, the state of the executing thread is reset to the beginning of the region.

VUNLOCK(void * r)
Unlocks a resource, making it accessible to other threads. The executing thread must be the holder of the lock. Otherwise this is reported as a privilege violation, and the error trail is returned.
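A small usage sketch combining the statements above; the surrounding program fragment is invented for illustration and is not taken from an actual StEAM example.

// Hypothetical fragment showing the special-purpose statements in context.
int balance = 0;

void update() {
    int amount = 0;
    RANGE(amount, 1, 3);         // nondeterministic choice of amount (outside any atomic region)
    VLOCK(&balance);             // claim exclusive access to the shared variable
    BEGINATOMIC                  // treat the update as one indivisible transition
    balance = balance + amount;
    ENDATOMIC
    VUNLOCK(&balance);           // release the lock again
    VASSERT(balance <= 100);     // local safety property checked by StEAM
}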

4.4.4 Applicability

StEAM offers new possibilities for model checking software in several phases of the development process. In the design phase we can check if our specification satisfies the required properties. Rather than using a model written in the input language of a model checker like e.g. SPIN, the developers can provide a test implementation written in the same programming language as the end product, namely C/C++. On the one hand, this eliminates the danger of missing errors that do not occur in the model but in the actual program. On the other hand, it helps to save the time used to search for so-called false positives, i.e. errors that occur due to an inconsistent model of the actual program. More importantly, the tool is applicable in the testing phase of an actual implementation. Model checking has two major advantages over pure testing. First, it will not miss subtle errors, since the entire state space of the program is explored. Second, by providing an error trail, model checking gives an important hint for finding the source of the error instead of just claiming that an error exists.


4.4.5 Detecting Deadlocks

StEAM automatically checks for deadlocks during a program exploration. A thread can gain and release exclusive access to a resource using the statements VLOCK and VUNLOCK, which take as their parameter a pointer to a base type or structure. When a thread attempts to lock an already locked resource, it must wait until the lock is released. A deadlock describes a state where all running threads wait for a lock to be released. A detailed example is given in [ML03].
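A minimal (hypothetical) scenario in which StEAM reports a deadlock: two threads acquire the same two resources in opposite order. The thread classes are assumed to be derived from ICVMThread as in Figure 4.4.

// Two shared resources.
int a = 0, b = 0;

// The first thread locks a, then b ...
void ThreadA::run() {
    VLOCK(&a);
    VLOCK(&b);       // blocks if the second thread already holds b
    a = b = 1;
    VUNLOCK(&b);
    VUNLOCK(&a);
}

// ... while the second thread locks b, then a. If each thread obtains its first
// lock before the other requests it, both wait forever - a deadlock StEAM detects.
void ThreadB::run() {
    VLOCK(&b);
    VLOCK(&a);       // blocks if the first thread already holds a
    b = a = 2;
    VUNLOCK(&a);
    VUNLOCK(&b);
}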

4.4.6 Detecting Illegal Memory Accesses

An illegal memory access (IMA) constitutes a read or write operation to a memory cell that neither lies within the current stack frame, nor within one of the dynamically allocated memory regions, nor in the static variable sections. On modern operating systems, IMAs are usually caught by the memory protection, which reports the error (e.g., segmentation fault on Unix) and terminates the program.

IMAs are not only one of the most common implementation errors in software, they are also among the most time-consuming factors in software development, as these errors tend to be hard to detect.

On Wikipedia5, the free online encyclopedia, the term segmentation fault is exemplified with the following small C program:

int main(void)
{
    char *p = NULL;
    *p = 'x';
    return 0;
}

When the program is compiled with gcc on Linux and then run, the operating system reports a segmentation fault. Despite the minimalistic program, the error is not immediately obvious (at least to less experienced programmers), as it results from a wrong view of the indirection level of variable p.

Invoking StEAM on the (IVM-)compiled code of the program returns the output:

...

Illegal memory write, printing trail!
depth: 2
Step 1: Thread 1 - 19: char * p = 0;
Step 2: Thread 1 - 20: * p = 'x';

...

5 http://en.wikipedia.org/

This is already more informative than the terse "segmentation fault" returned by the operating system when the program is merely executed. If we insert a line as follows:

int main(void)
{
    char *p = NULL;
    p = new int[1];
    *p = 'x';
    return 0;
}

and re-run StEAM on the compiled code, we get

...

(unique) States generated : 5
Maximum search depth      : 5

...

Verified!

...

This implies that StEAM could enumerate all (five) states of the program without finding an error.

Though nice for illustration purposes, the example is not overly impressive, and one may claim that no model checker was needed to find the error. StEAM, however, aims at finding less trivial errors that do not occur immediately at the beginning of the program.

A common practice of programmers is to manually annotate the source code with a large number of printf statements and monitor the output of the program to track down the error. This is a tedious practice, as it requires the user to run the program several times while gradually adding more statements in a binary-search fashion until the exact line of the error is found. In contrast, StEAM only needs a single run on the inspected program to provide the user with that information. Moreover, in the first case the printf's become redundant after the error is found and have to be removed in another tedious pass.

More advanced users may rather use a debugger like gdb to detect illegal memory accesses. Although preferable to the printf method, a debugger is still limited in the amount of information it can give to the user. That information merely consists of the stack trace of the error state s - i.e. the memory contents in s and the local variable contents of s and its calling functions up to the top level (usually the main function). In contrast, StEAM memorizes the complete path from the initial state of the program to s. This also includes the CPU registers and memory contents of all preceding states, which enables the user to precisely analyse the history of the error.

Moreover, StEAM is able to detect illegal memory accesses that may pass unnoticed by both the operating system's memory protection and debuggers. Consider the following program:

void main(int argc, char ** argv)
{
    int i[100];
    i[500] = 42;
}

Here, it is attempted to access an element outside of a locally declared array. This error is exclusively reported by StEAM, since it checks that all write accesses to the stack reside within the current stack frame, while the system's memory protection merely checks whether the respective memory cell lies inside an allocated memory region.

Checking for illegal memory accesses is a time-consuming task, since before the execution of any machine instruction it must be checked whether it accesses a memory cell and whether this cell resides within the current stack frame, a dynamically allocated memory region, or the static variable sections. Hence, StEAM must be explicitly advised to search for illegal memory accesses through a parameter.
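The following is a minimal sketch of the legality test just described; the names, types and the state layout are assumptions for illustration, not StEAM's actual data structures.

#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t start, size; } Region;

typedef struct {
  Region   frame;        /* current stack frame of the executing thread */
  Region  *heap;         /* dynamically allocated memory regions        */
  int      num_heap;
  Region   data, bss;    /* static variable sections                    */
} MemLayout;

static bool in_region(Region r, uint32_t addr, uint32_t len) {
  return addr >= r.start && addr + len <= r.start + r.size;
}

/* Returns true iff a read/write of len bytes at addr is legal. */
bool legal_access(const MemLayout *m, uint32_t addr, uint32_t len) {
  if (in_region(m->frame, addr, len)) return true;
  if (in_region(m->data, addr, len) || in_region(m->bss, addr, len)) return true;
  for (int i = 0; i < m->num_heap; i++)
    if (in_region(m->heap[i], addr, len)) return true;
  return false;          /* otherwise an illegal memory access is reported */
}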

4.5 Heuristics

The use of heuristics constitutes one of the most effective ways, if not the most effective way, to improve the performance of a model checker. Despite the state explosion problem, empirical evaluations show that the minimal depth of an error state in the search tree often grows only linearly in the number of processes. Thus, the model checker only needs to explore a small fraction of the entire state space to find the error. Heuristics help to explore those paths first that are more likely to lead to an error state.

Heuristic search algorithms, such as best-first search or A* [HNR68] (cf. Chapter 3), can be used in model checking to accelerate the search for errors, to reduce memory consumption, and to generate short and comprehensible counterexamples (trails). Note that heuristics are suitable only for finding errors, and not for proving correctness, since the latter requires an exhaustive exploration of the program.

Heuristics for model checking generally differ from heuristics for other state space searches, as we usually do not have a complete description of an error state. Exceptions are trail-directed heuristics, like the Hamming- or FSM-distance [EM03], which use a two-stage search for shortening suboptimal trails. The lack of information about the searched state makes it hard to devise estimates that are admissible or even consistent.

In the following, we discuss the heuristics currently supported by StEAM. Some of them are derived from the heuristic explicit-state model checker HSF-Spin [ELL01] and from the Java byte code model checker JPF [GV02]; others are new.

4.5.1 Error-Specific Heuristics

Here, the term "error-specific" refers to the class of heuristics which are targeted at finding a certain type of error.

Most Blocked A typical example of an error-specific heuristic is most-blocked [GV02, ELL01, LL03], which was designed to speed up the search for deadlocks. Most-blocked takes the number of non-blocked processes as the estimated distance to the error state. A process is blocked if it waits for access to a resource.
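A minimal sketch of this estimator is given below; the state representation is a placeholder for illustration, not StEAM's internal one.

/* Placeholder state representation; only what the estimate needs is shown. */
typedef struct {
  int  num_threads;
  int *blocked;          /* blocked[i] != 0 iff thread i waits for a lock */
} SearchState;

/* most-blocked: estimated error distance = number of non-blocked threads. */
int h_most_blocked(const SearchState *s) {
  int non_blocked = 0;
  for (int i = 0; i < s->num_threads; i++)
    if (!s->blocked[i])
      non_blocked++;
  return non_blocked;
}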

The most-blocked heuristic is consistent if a state transition can block at most one process. In StEAM, this is assured if the investigated program does not contain atomic regions, as a LOCK statement (cf. Section 4.3.1) can merely block the executing thread and only LOCK statements can block threads.

If atomic regions are present, the most-blocked heuristic may become inadmissible: Consider a state u with threads t1, ..., tn (n > 2), none of which is blocked; hence h(u) = n. For thread t1, the next two actions are to lock a resource r1 and then a resource r2. All other threads will lock r2 first, then r1. In a transition u → v, thread t1 locks r1. In another transition v → w, thread t2 locks r2. Thus, w is a deadlock state, as all threads are mutually waiting for either r1 or r2 to be released. This means the real distance of u to an error state is 2, while h(u) = n > 2.

Lock and Block For StEAM, an improved version of most-blocked was devised, namely lock-and-block. This heuristic uses the number of non-blocked processes plus the number of non-locked resources as the estimated distance to the error. Here, 'locked' means that a process has requested or holds exclusive access to a resource. Clearly, a deadlock occurs if all processes wait for a lock to be released by another process. Hence, locks are an obvious precondition for threads to get into a blocked state. As the experimental results in Chapter 8 will show, lock-and-block constitutes a significant improvement over the most-blocked heuristic.

Obviously, lock-and-block is inadmissible, as deadlocks do not require all available resources to be locked. Nevertheless, the heuristic is more informative than most-blocked and hence preferable - the experimental results in Chapter 8 will support this claim.

Formula-Based Heuristic This heuristic aims to speed up the search for violations of properties given by a logical formula f. The value h(s) estimates the minimal number of transitions needed to make f true (or to make ¬f true, respectively). The heuristic was devised and implemented for the SPIN-based heuristic model checker HSF-Spin [ELL01]. An integration into StEAM or other assembly-level model checkers would be difficult, as the heuristic requires full information about how a logical expression (e.g., in an assertion) relates to a variable's name and its value, while the compiled machine code operates on stack offsets and memory cells in the DATA- and BSS-sections.

4.5.2 Structural Heuristics

In contrast to error-specific heuristics, structural heuristics [GV02] are not designed to find a certain type of error. Instead, they exploit structural properties of the underlying programming language to speed up the finding of errors in general. Two examples of structural heuristics are the interleaving and branch-coverage heuristics, which were devised for the Java PathFinder.

Structural heuristics differ from other heuristics insofar as they prefer states that maximize a function m, rather than states that minimize the estimated distance to an error. The function m may, e.g., correlate with the degree of thread interleaving. To fit into the concept of heuristic search algorithms such as best-first, which demand an estimated error distance for some state u, we need to subtract m(u) from a constant c, which is hopefully always greater than m(u) for all states u. Obviously, this makes most structural heuristics inadmissible.

Interleaving With interleaving, those paths are explored first which maximize the number of thread interleavings. Many program errors are caused by illegal interleavings of threads. For instance, in a block of instructions, a thread may read the content (e.g., the value 1000) of a memory cell c (an account balance) into a processor register, perform some arithmetic operation on the register (e.g., add 100) and write the result back to c. If between the reading and the writing another process is executed which changes c (subtracts 50), the contents of c become wrong (c now contains 1100 although it should be 1050). Thus, maximizing the interleaving of threads may speed up the finding of such concurrency errors.
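The following sketch reproduces this lost-update scenario with standard POSIX threads rather than StEAM's thread model; the program and its names are illustrative only.

/* Lost-update race: the read-modify-write on balance is not atomic, so an
 * unfortunate interleaving loses one of the two updates. */
#include <pthread.h>
#include <stdio.h>

static long balance = 1000;          /* shared memory cell c */

static void *deposit(void *arg) {
  (void)arg;
  long tmp = balance;                /* read c into a "register"   */
  tmp += 100;                        /* arithmetic on the register */
  balance = tmp;                     /* write back - may overwrite a
                                        concurrent withdrawal       */
  return NULL;
}

static void *withdraw(void *arg) {
  (void)arg;
  long tmp = balance;
  tmp -= 50;
  balance = tmp;
  return NULL;
}

int main(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, deposit,  NULL);
  pthread_create(&t2, NULL, withdraw, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  /* Expected 1050; depending on the interleaving, 1100 or 950 is possible. */
  printf("balance = %ld\n", balance);
  return 0;
}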

Branch-Coverage With the branch-coverage heuristic, those paths are favored that maximize the number of branch instructions which were executed at least once. A greater branch coverage also implies a greater source coverage of the program, including those parts that may cause an error.

StEAM supports a subset of JPF's structural heuristics as well as some new ones. Besides lock-and-block, the new heuristics of StEAM regard the access to memory cells and the activeness of threads.

Read-Write The read-write heuristic simply counts the number of reads and writes to any memory cells. States that maximize the number of reads and writes are preferred in the exploration. The intuition behind the heuristic is obvious: Accesses to memory cells imply changes to program variables. As safety and reachability properties (cf. Section 2.1.6) in programs usually relate to variable values, more changes increase the probability of an error.


Alternating Access The alternating access heuristic prefers paths that maximize the number of subsequent read and write accesses to the same memory cell in the global variable sections. The intuition is that changes to a global variable x by one thread may influence the behavior of another thread that relies on x. For example, a thread may read the contents of x into a CPU register r, perform some algebraic operations on r and write the result back to x. If in between another thread changes the content of x, a data inconsistency arises.

Threads Alive The number of threads alive in a program state is a factor that can safely be combined with any other heuristic estimator. Obviously, only threads that are still alive can cause an error.


Chapter 5

Planning vs. Model Checking

Automated planning has established itself as an important field of research in computer science. Indicators for this are events such as the annual International Conference on Automated Planning and Scheduling (ICAPS) 1 and the biennial International Planning Competition 2.

Research on planning has created sophisticated techniques that aim to alleviate the search for (preferably optimal) plans. This is all the more interesting as the fields of planning and model checking are highly related, and achievements such as heuristics and pruning techniques can easily be carried over from one field to the other. In fact, model checking tasks can straightforwardly be represented as planning problems and vice versa. For example, [Ede03] proposes a conversion from the input language of the model checker SPIN to an action planning description language, which makes protocols accessible to action planners.

Consequently, this chapter makes a digression to the domain of planning and its relation to model checking. We will point out similarities and differences between planning and model checking and discuss how the two fields of research can benefit from each other. Finally, to prove the claimed relation between the two fields, we show an application of the model checker StEAM for solving instances of the (n^2-1)-puzzle and multi-agent planning problems.

Planning generally refers to the finding of a plan to reach a goal - i.e., the finding of an operator sequence which transforms an initial state into a goal state. The simplest class of planning problems is called propositional planning. A propositional planning problem (in STRIPS notation) is a finite state space problem P = <S, O, I, G>, where S ⊆ 2^AP is the set of states, I ∈ S is the initial state, G ⊆ S is the set of goal states, and O is the set of operators that transform states into states. Operators o = (P, A, D) ∈ O have propositional preconditions P and propositional effects (A, D), where P ⊆ AP is the precondition list, A ⊆ AP is the add list and D ⊆ AP is the delete list. Given a state S with P ⊆ S, its successor S' = o(S) is defined as S' = (S \ D) ∪ A.

For an operator o = (P, A, D), let pre(o) = P, add(o) = A and del(o) = D. A plan for the propositional planning problem P is a sequence of operators o1, ..., om, implying a sequence of states s1, ..., sm+1 with s1 = I and sm+1 ∈ G, such that for i = 1, ..., m, pre(oi) ⊆ si and si+1 = (si \ del(oi)) ∪ add(oi).

1 http://www.icaps-conference.org/
2 http://ls5-www.cs.uni-dortmund.de/~edelkamp/ipc-4/

Extensions to propositional planning include temporal planning, which allows parallel plan execution. In temporal planning, each operator o has a duration dur(o). In a parallel plan execution, several operators o1, ..., ol may be executed in parallel in state s if all preconditions hold in s and if the effect of one operator does not interfere with the preconditions or effects of another operator - i.e., for all i = 1, ..., l: pre(oi) ⊆ s, and for all i ≠ j: del(oi) ∩ pre(oj) = ∅ ∧ del(oi) ∩ add(oj) = ∅. A common goal in temporal planning is to minimize the makespan of a plan. In a parallel plan, each operator o has a time τ(o) at which it starts. The makespan of a parallel plan o1, ..., om is defined as the point max_{o ∈ {o1,...,om}} (τ(o) + dur(o)) when all operators are completed. The case study in Section 5.5 refers to planning in a multi-agent environment, where we want to minimize the makespan of a plan that gets a set of jobs done.

5.1 Similarities and Differences

Search An obvious similarity of planning and model checking is that both methods rely on the exploration of state spaces. In model checking, the state space describes the set of configurations of a system model or program, while in planning the states constitute sets of propositions which are true in a state. Like transitions in model checking lead from system configurations to system configurations, operators in planning lead from sets of propositions to sets of propositions.

Path Both planning and model checking return a sequence of state transitions (operators) to the user. In model checking, a path from the initial configuration to an error state serves as a counterexample (or error trail), which alleviates the task of finding the source of the error in the program or system specification. In planning, the returned sequence of operators describes a plan which leads from the initial state to the designated target state.

Quality In both technologies the returned path can be of different quality. In model checking, the quality of a counterexample is usually negatively correlated with its length, since shorter trails are in general easier to track by the user. The quality of a plan depends on the respective domain and the criterion that is to be optimized. For instance, if the criterion is the cost of the plan and each operator implies the same cost, a plan with fewer operators is always better. Alternatively, we may want to minimize the makespan of a parallel plan. In this case, a plan with more operators can be better.

Plan/error finding vs. verification The goal of planning is always to find a plan - desirably the optimal one. An optimal plan can be obtained by applying an admissible search strategy to the respective problem description. For instance, if we want to minimize the number of operators, a plan returned by breadth-first search is always optimal. When such a plan is found, the remaining state space of the planning problem is of no further interest.


Figure 5.1: The Eight-, Fifteen-, and Twenty-Four-Puzzle.

The same applies when we search for errors in programs. For instance, in the implementation phase of a program, the presence of errors is very probable. Hence, the goal is to find one error at a time and correct it with the help of the returned error trail. Here, directed search with appropriate heuristics can help to reduce the number of states visited until an error is found. At a later point, e.g., before the software is shipped, we may be interested in the verification of a program. This means we need to do an exhaustive enumeration of all possible system states, while each visited state is checked for errors. At this point, the use of directed search is no longer appropriate, as it merely prioritizes the order in which the states are visited. Also, on a per-state basis, heuristic search is usually computationally more expensive than undirected search methods, such as depth-first or breadth-first search.

Planning via Model Checking The goal in the (n^2-1)-puzzle (a.k.a. Sliding Tile Puzzle) is to arrange a set of n^2-1 numbered tiles from left to right and top to bottom in ascending order. As the only operation, the blank may switch positions with one of its neighbouring tiles. Figure 5.1 illustrates examples for n = 3, 4, 5.

The paper [HBG05] exploits a PDDL encoding of the (n^2-1)-puzzle which can be solved by established planners. Likewise, it is easy to formulate the puzzle as a C++ program such that instances can be solved by StEAM. The main program is depicted in Figure 5.2. The program accepts as its parameters the scale n, followed by the initial arrangement of the tiles from left to right and top to bottom. The parsing and initialization is done in one atomic step. Afterwards, an infinite loop is executed which first tests through an assertion if the tiles are arranged in their goal configuration. More precisely, the assertion is violated if and only if the goal state of the puzzle instance is reached. If the assertion holds, i.e., if at least one tile is not at its final position, the program uses a non-deterministic statement to choose one of the functions ShiftUp, ShiftDown, ShiftLeft and ShiftRight, which exchange the blank with one of its neighboring tiles. Figure 5.3 lists the function ShiftUp - the other three functions are implemented analogously.

We can invoke StEAM with breadth-first search on the instance of the 8-puzzle depicted in Figure 5.1 by:

steam -srctrl -BFS npuzzle 3 1 4 2 3 7 5 6 8 0 | grep Shift

The output of the model checker is piped to the Unix command grep such that only the calls to the Shift methods in the returned error trail are displayed. The resulting output is:

Step  8: Thread 1 - 96: ShiftLeft();
Step 17: Thread 1 - 90: ShiftUp();
Step 26: Thread 1 - 90: ShiftUp();
Step 35: Thread 1 - 96: ShiftLeft();

As you can easily verify, this is the shortest "plan" to solve the puzzle instance.

As a second example, we present a method which uses StEAM as a planner in concurrent multi-agent systems. In contrast to classical planning, we avoid the generation of an abstract model. The planner operates on the same compiled code that controls the actual system.


int blank;
int n;
short tmp;
short * tiles;

void main(int argc, char ** argv)
{
  int i=-1;

  BEGINATOMIC;
  if(argc<6) {
    fprintf(stderr, "Illegal number of Arguments!\n");
    exit(1);
  }
  n=atoi(argv[1]);
  if(argc!=n*n+2) {
    fprintf(stderr, "Illegal Number of Arguments\n");
    exit(1);
  }
  tiles=(short *) malloc(n*n*sizeof(short));
  s=(Shifter **) malloc(4*sizeof(Shifter *));
  for(i=0;i<n*n;i++) {
    tiles[i]=atoi(argv[i+2]);
    if(tiles[i]==0) blank=i;
  }
  ENDATOMIC;

  while(1) {
    for(i=0;i<n*n && tiles[i]==i;i++);
    VASSERT(i<n*n);
    RANGE(i,0,3);
    switch(i) {
      case 0: ShiftUp();    break;
      case 1: ShiftDown();  break;
      case 2: ShiftLeft();  break;
      case 3: ShiftRight(); break;
    }
  }
}

Figure 5.2: Main Program of the C++ Encoding of the (n^2-1)-Puzzle.


void ShiftUp()
{
  BEGINATOMIC;
  if(blank>=n) {
    tmp=tiles[blank-n];
    tiles[blank-n]=0;
    tiles[blank]=tmp;
    blank-=n;
  }
  ENDATOMIC;
}

Figure 5.3: The method ShiftUp of the C++ Encoding of the (n^2-1)-Puzzle.

5.2 Related Work

Software model checking is a powerful method whose capabilities are not limited to the detection of errors in a program, but extend to general problem solving. For instance, [EM03] successfully uses JPF to solve instances of the sliding-tile puzzle. The adaptation is simple: action selection is incorporated into the system as non-deterministic choice points.

Symbolic model checkers have been used for advanced AI planning; e.g., the Model-Based Planner (MBP) by Cimatti et al. 3 has been applied for solving non-deterministic and conformant planning problems, including partially observable state variables and temporally extended goals.

An architecture based on MBP to interleave plan generation and plan execution is presented in [BCT03], where a planner generates conditional plans that branch over observations, and a controller executes actions in the plan and monitors the current state of the domain.

The power of decision diagrams to represent sets of planning states more efficiently has also been recognized in probabilistic planners, as the SPUDD system [HSAHB99] and its real-time heuristic programming variant based on the LAO* algorithm [HZ01] show. These planners solve factored Markov decision process problems and encode probabilities and reward functions with algebraic decision diagrams.

TL-Plan [BK00] and the TAL planning system [KDH00] apply control pruning rules specified in first-order temporal logic. The hand-coded rules are associated with the planning problem description to accelerate plan finding. When expanding planning states, the control rules for the successors are derived by progression.

Planning technology has been integrated into existing model checkers, mainly by providing the option to accelerate error detection by heuristic search [ELLL04]. These efforts are referred to as directed model checking. First model checking problems have been automatically converted to serve as benchmarks for international planning competitions 4.

3 http://sra.itc.it/tools/mbp

The work described in [AM02] reduces job-shop problems to finding the shortest path in a stopwatch automaton and provides efficient algorithms for this task. It shows that model checking is capable of solving hard combinatorial scheduling problems.

The paper [DBL02] describes a translation from Level 3 PDDL2.1 to timed automata. This makes it possible to solve planning problems with the real-time model checker UPPAAL. The paper also describes a case study on an implementation of the translation procedure, which is applied to the PDDL description of a classical planning problem.

Some work on multiagent systems shares similarities with our proposal. In [dWTW01], a framework called ARPF is proposed. It allows agents to exchange resources in an environment which fully integrates planning and cooperation. This implies that there must also be inter-agent communication.

The work [BC01] proposes model checking for multi-agent systems, with a framework that is applied to the analysis of a relevant class of multi-agent protocols, namely security protocols. In a case study, the work considers a belief-based exploration of the Andrew Authentication Protocol with the model checker nuSMV, which is defined by a set of propositions and evolutions of message and freshness variables.

Multi-agent planning is an AI research area of growing interest. A forward search algorithm [Bre03] based on the single-agent planner FF solves multiagent problems, synthesizes partially ordered temporal plans and is described in its own formal framework. The work also presents a general distributed algorithm for solving these problems with several coordinating planners.

The paper [PvV+02] is closely related to the work at hand. It describes ExPlanTech, an enhanced implementation of the ProPlanT multiagent system used in an actual industrial environment. ExPlanTech introduces the concept of meta-agents, which do not directly participate in the production process, but observe how agents interact and how they carry out distributed decision making. The meta-agent in ExPlanTech serves a similar purpose as the model checker in our system, as it is able to induce efficiency considerations from observations of the community workflow. Still, the meta-agent uses its own abstract model of the system.

The approach we consider here differs from all the above in that it uses program model checking and in that it is applied to multiagent systems in order to interleave planning and execution and to learn from an existing executable of a given implementation.

5.3 Multi Agent Systems

Modern AI is often viewed as a research area based on the concept of intelligent agents [RN95]. The task is to describe and build agents that receive signals from the environment. In fact, each such agent implements a mapping from percepts to actions. Thus, distributed multiagent systems have become an important sub-field of AI, and several classical AI topics are now broadly studied in a concurrent environment. Planning for multiagent systems extends classical AI planning to domains where several agents act together. Application areas include multirobot environments, cooperating Internet agents, logistics, manufacturing, etc. Approaches differ, for example, in their emphasis on either the distributed plan generation or the distributed plan execution process, and in the ways communication and perception are used.

4 http://ipc.icaps-conference.org

The largest discrepancy between generating a plan and executing it is that multiagent planners usually cannot directly work on the existing programs that are executed. Here, we show how StEAM can be utilized to serve as a planning and execution unit to improve the performance of concurrent multiagent system implementations.

Directing the exploration towards the planning goal or specification error turns out to be one of the chances to tackle the state explosion problem that arises due to the combinatorial growth of system states. As StEAM works with different object-code distance metrics to measure the effort needed to encounter the error, we can also address the evaluation problem that multiagent systems or multirobot teams have. By having access to object code, we can attribute execution time to a set of source code instructions.

The goal is to develop a planning system which interacts with the software units to improve the quality of the results. Since the performance of the system under test increases over time, our approach considers incremental learning. As a special feature, the planner should not build an abstract model of the concurrent system. Instead, planning should be feasible directly on the implementation of the software units. This introduces planning on the source-code and assembly-level.

5.4 Concurrent Multiagent Systems in StEAM

We regard a concurrent multiagent system as a set of homogeneous or heterogeneous autonomous software units operating in parallel. Some components of the system - such as tasks or resources - are shared among all units. The units cooperate to reach a certain goal, and the result of this cooperation depends on what decisions are made by each agent at different times.

Concurrent multiagent systems can be implemented in StEAM in a straightforward fashion. Software units are represented as threads. Each unit is implemented by an instance of a corresponding C++ class. Each such class is derived from StEAM's thread class IVMThread. After creating a class instance, a call to the start method adds it to the list of concurrent processes running in the system. The shared components of the system are represented by global variables.
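As a rough sketch of this setup, the fragment below declares a unit class and adds instances to the running system; IVMThread and start are named in the text, while the class name, its members and the helper function are assumptions.

// Hypothetical software unit; only IVMThread and start() are taken from the text.
class TransportAgent : public IVMThread {
public:
  int id;                         // per-unit data
  void behavior();                // code executed by this unit (defined elsewhere)
};

// Shared components (e.g. job and resource tables) are global variables.

void create_units(int n) {
  for (int i = 0; i < n; i++) {
    TransportAgent * a = new TransportAgent();
    a->id = i;
    a->start();                   // adds the unit to the list of concurrent processes
  }
}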

Model checking in StEAM is done by performing an exploration on the set of system states, which allows it to store and retrieve memorized configurations in the execution of the program. In the simulation mode - e.g., controlled by the user, by random choices, or by an existing trace - the model checker executes the program along one sequence of source code instructions.

Here, StEAM serves two purposes: first, as a platform to simulate the system; second, as a planner which interacts with the running system to get better results. We assume an online scenario, where planning has to be done in parallel to the execution of the software units. In particular, it is not possible to find a complete solution in advance.


We assume that the planner performs a search on system states which are of the same type as those of the actual concurrent system. As a result, the planner finds a desired target state t, which maximizes the expected solution quality. State t will not necessarily fulfill the ultimate goal, because an exhaustive exploration is in general not possible due to time and space restrictions. As the next step, the planner must carry over the search results to the environment. However - in contrast to the plan generation - the planner does not have full control over the actual system, due to non-deterministic factors in the environment, such as the execution order of the agents. Instead, the planner must communicate with the software units to influence their behavior, so that the actual system reaches state t or a state that has the same or better properties than t w.r.t. the solution quality.

5.4.1 Interleaving Planning and Simulation

We want to integrate the required planning ability into the model checker with minimal changes to the original code. First, a model checker operates on sequential process executions, rather than on parallel plans. To address this issue, the concept of parallel steps is added to StEAM. A parallel step means that all active processes are iterated one atomic step in random order. With parallel steps, we can more faithfully simulate the parallel execution of the software units.

Communication is realized by a concept we call suggestion. A suggestion is essentially a component of the system state (usually a shared variable). Each process is allowed to pass suggestions to the planner (model checker) using a special-purpose statement SUGGEST. The SUGGEST statement takes a memory pointer as its parameter, which must be the physical address of a 32-bit component of the system state. The model checker collects all suggestions in a set SUG. When the model checker has finished the plan generation, it overwrites the values of all suggestions in the current state of the actual environment with the corresponding values in the target state t. The software units recognize these changes and adapt their behavior accordingly. Formally, the role of suggestions can be described as follows: Let V be the set of variables that form a state in our system. A system state is defined as a function s : V → IR, which maps variables to their domains. Let s be the root state of a planning phase and t the designated target state. Then, the subsequent simulation phase starts at state:

s′ = ( s \ ⋃_{v ∈ SUG} {v ↦ s(v)} ) ∪ ⋃_{v ∈ SUG} {v ↦ t(v)}.

A concrete example will be given in the case study in Section 5.5.

5.4.2 Combined Algorithm

The proposed system iterates two different phases: planning and simulation. In the planning phase, all units are frozen in their current action. Starting from a given system state, the model checker performs a breadth-first exploration of the state space until a time limit is exceeded or no more states can be expanded. For each generated state u, an


Procedure Interleave
Input: The initial state s of the system, time limit θ, evaluation function h
 1.  loop
 2.    besth ← h(s); t ← s; open ← {s}
 3.    while (time < θ ∧ open ≠ ∅)            /* start of planning phase */
 4.      u ← getNext(open)
 5.      Γ ← expand(u)                        /* expand next state in horizon list */
 6.      for each v ∈ Γ
 7.        if (h(v) > besth) t ← v; besth ← h(v)
 8.    for each σ ∈ SUG
 9.      s.σ ← t.σ                            /* write suggestions */
10.    c ← 0
11.    while (¬c)                             /* begin of simulation phase */
12.      for each agent a: s ← iterate(s, a)  /* do a parallel step */
13.      c ← evaluateCriterion()

Figure 5.4: The Implementation of the Interleaved Planning and Simulation System.

evaluation function value h(u) is calculated, which measures the solution quality in u. When the time limit θ is reached, the state with the best h-value is chosen as the target state t and the simulation phase starts. At the beginning of the simulation phase, for each element in the set of suggestions, the model checker overwrites the value of the corresponding component in s with its value in t. Then, parallel steps are executed in s until the criterion for a new planning phase is met.

Algorithm Interleave in Figure 5.4 illustrates the two phases of the system and is built on top of a general state-expanding exploration algorithm. Besides the initial state the exploration starts from, it takes the preference state evaluation function as an additional parameter. A time threshold stops planning in case the model checker does not encounter the optimal depth. For ease of presentation and due to the online scenario of interleaved execution and planning, we chose an endless loop as a wrapper and have not included a termination criterion in case the goal is encountered.

The function evaluateCriterion determines whether a new planning phase should be initiated or not. The function is not defined here, because this criterion varies depending on the kind of system we investigate. The function iterate(s, a) executes one atomic step of an agent (thread, process) a in state s and returns the resulting state.

Figure 5.5 shows how the generated system states switch from a tree structure in the planning phase to a linear sequence in the simulation phase. We see that the simulation (path on the right side of the figure) is accompanied by intermediate planning phases (tree structures, top left and bottom right of the figure). The search tree contains the (intermediate) target state t - i.e., the generated state with the highest h-value. As we will see, state t is potentially - but not necessarily - traversed in the simulation phase.

Ideally, the two phases run in parallel, that is, the running multiagent system is not interrupted. Since we use the same executable in our model checker to simulate and plan,


Figure 5.5: State generation in the two different exploration phases.

we store the system states that are reached in the simulation and suspend its execution. The next planning phase always starts with the last state of the simulation, while the simulation continues with the root node of the planning process. The information that is learned by the planning algorithm to improve the execution is stored in main memory, so it can be accessed by the simulation.

5.5 Case Study: Multiagent Manufacturing Problem

In the following, we formalize a special kind of concurrent multiagent system, which is used to evaluate our approach. A multiagent manufacturing problem, MAMP for short, is a six-tuple (A, J, R, req, cap, dur), where A = {a1, ..., an} denotes a set of agents, J = {j1, ..., jm} is a set of jobs including the empty job ε, and R = {r1, ..., rl} is a set of resources. The mappings req, cap, and dur are defined as follows:

• req : J → 2^R defines the resource requirements of a job,

• cap : A → 2^R denotes the capabilities of an agent, and

• dur : J → IN is the duration of a job in terms of a discrete time measure, with dur(ε) = 0.

A solution to a MAMP m = (A, J, R, req, cap, dur) is a function sol : A → 2^(IN×J). For (t, ι) ∈ IN × J, let job((t, ι)) = ι and time((t, ι)) = t. Furthermore, let alljobs_sol : A → 2^J be defined as alljobs_sol(a) = { job(s) | s ∈ sol(a) }. We require sol to have the following properties:

i. For each sol(a) = {(t0, ι0), ..., (tq, ιq)} we have that i ≠ j implies either (ιi = ιj = ε) or ιi ≠ ιj, and for each i ≥ 0 we have t_{i+1} ≥ t_i + dur(ι_i).

ii. For each s ∈ sol(a): req(job(s)) ⊆ cap(a).


iii. For all a ≠ a′ and all s ∈ sol(a), s′ ∈ sol(a′) with req(job(s)) ∩ req(job(s′)) ≠ ∅, we have either time(s) + dur(job(s)) ≤ time(s′) or time(s) ≥ time(s′) + dur(job(s′)).

iv. For all a ≠ a′ we have (alljobs_sol(a) \ {ε}) ∩ alljobs_sol(a′) = ∅.

v. Last but not least, we have ⋃_{a ∈ A} alljobs_sol(a) = J \ {ε}.

Property i. demands that each agent can do at most one job at a time. Property ii. says that an agent can only do those jobs it is qualified for. Property iii. means that each resource can only be used by one agent at a time, and properties iv. and v. demand that each job is done exactly once.

An optimal solution sol minimizes the makespan τ until all jobs are done:

τ(sol) = max_{a ∈ A, s ∈ sol(a)} (time(s) + dur(job(s))).

MAMPs are extensions of job-shop scheduling problems as described, e.g., in [AM02]. As the core difference, MAMPs also have a limited set of agents with individual capabilities. While in a job-shop problem one is only concerned with the distribution of resources to jobs, MAMPs additionally require that for each job an agent is available that is capable of using the required resources.

5.5.1 Example Instance

Apparently, the solution quality of a MAMP relates to the degree of parallelism in the manufacturing process. A good solution takes care that each agent works around the clock - if possible. Consider a small MAMP m = (A, J, R, req, cap, dur) with A = {a1, a2}, J = {j1, j2, j3}, R = {r1, r2, r3, r4, r5}, req(j1) = {r1, r2}, req(j2) = {r3, r4}, and req(j3) = {r3, r5}. Furthermore, cap(a1) = R, cap(a2) = {r3, r4, r5}, and dur(j1) = 2, dur(j2) = dur(j3) = 1.

Figure 5.6 illustrates a best, an average and a worst case solution for m. Here, we have a time line extending from left to right. For each solution, the labeled white boxes indicate the job which is performed by the respective agent at a given time. Black boxes indicate that the agent is idle. In the optimal solution, a1 starts doing j1 at time = 0. Parallel to that, a2 performs j2 and afterwards j3. The total time required by the optimal solution is 2. Note that the solution does not contain any black boxes. In the average solution, which is in the middle, a1 executes j2 at time = 0 and j1 afterwards. This implies that a2 has to be idle up to time = 1, because it cannot use r1 and r2 needed for j1, and r3 (needed for j3) is occupied by j2. Then, a2 can at least commence doing j3 at time = 1, when r3 is available again. The time needed by the average solution is 3. Finally, in the worst solution, which is the bottom-most in Figure 5.6, a1 first performs j2, then j3 and j1, while a2 is compelled to be idle at all times. The worst solution needs 4 time units to get all jobs done.

5.5.2 Implementation

For the chosen concurrent multiagent manufacturing system, our goal is to develop a planning procedure for an instance that interacts with the multiagent system to maximize the level of parallelism and thus to minimize the idle times of agents, which is


Figure 5.6: Three different solutions for the same MAMP m.

expected to improve the quality of the resulting MAMP solution. Additionally, as an immediate result of our proposal, we avoid the generation of an abstract model of the system. Instead, we plan directly on the actual implementation.

Information about the state of the jobs and the available resources is stored in global variables. When running, each agent a automatically searches for available jobs it is qualified for and executes them. If a finds a free job j such that req(j) ⊆ cap(a) and all resources in req(j) are available, the agent requests the resources and starts executing the job. After finishing the job - i.e., when dur(j) time units have passed - a releases the resources and searches for the next job. If a cannot find a job, it waits one time unit and searches again. In the case of our multiagent system, each agent passes as a suggestion the local variable assigned, which stores the next job to be executed by the agent. Before an agent autonomously seeks a job, it checks if the default value of its assigned variable has changed. This implies that - as a result of the previous planning phase - the model checker has assigned a job to this agent. If this is the case, the agent skips the procedure which looks for a feasible job and starts allocating the required resources.
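A rough sketch of this agent behavior is given below; the SUGGEST statement and the assigned variable follow the text, while NO_JOB and all helper functions are assumptions for illustration only.

// Hypothetical agent loop for the MAMP case study (helpers are assumed).
const int NO_JOB = -1;

int  find_feasible_job();          // assumed: free job j with req(j) ⊆ cap(a), resources available
void allocate_resources(int job);  // assumed: request req(job), announce busy
void execute_job(int job);         // assumed: takes dur(job) time units
void release_resources(int job);   // assumed: release req(job), announce idle
void wait_one_time_unit();         // assumed
bool jobs_remain();                // assumed

int assigned = NO_JOB;             // this agent's next job, passed as a suggestion

void agent_behavior() {
  SUGGEST(&assigned);              // expose the choice to the planner
  while (jobs_remain()) {
    if (assigned == NO_JOB)        // nothing written by the planner:
      assigned = find_feasible_job();   // seek a job autonomously
    if (assigned == NO_JOB) { wait_one_time_unit(); continue; }
    allocate_resources(assigned);
    execute_job(assigned);
    release_resources(assigned);
    assigned = NO_JOB;
  }
}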

5.6 Search Enhancements

When using breadth-first search, for n active processes (agents), the number of expanded states in the planning phase is bounded by O(n^d), where d is the depth of the search tree. Without search enhancements, it is hard to reach a search depth which is deep enough to gain information for the simulation phase. As a result, the solutions obtained through the combination of planning and simulation would not be any better than those of a purely autonomous system without planning. Several pruning and refinement techniques help our planning system to reach a larger search depth.


5.6.1 State Space Pruning

First of all, the model checker StEAM uses a hash table to store the already visited states. When a state is expanded, only those successor states are added to the Open list which are not already in the hash table.

Second, for the multiagent manufacturing system, it does not make sense to generate those states of the search tree that correspond to an execution step of a job. Such an execution only changes local variables and thus does not influence the behavior of the other agents. To realize this pruning, threads can announce themselves busy or idle to the model checker using special-purpose statements. In the case of the MAMP, an agent declares itself busy when it starts working on a job and idle when the job is finished. When expanding a state, only those successors are added to the Open list that correspond to the execution of an idle thread.

Third, an idle agent looks for a job in one atomic step. Therefore, the execution of an agent that does not result in a job assignment yields no new information compared to the predecessor state. For the given multiagent system, this implies that we only need to store those successor states that result in the change of a suggestion (i.e., in a job assignment).

5.6.2 Evaluation Functions

As shown in Figure 5.5, the planner cannot always enforce the target state t to be reached, due to clashes in the simulation phase. A clash occurs if an agent a is iterated before another agent b and autonomously picks a job assigned to b. Our approach considers the value of suggestions to reflect the individual choice of each agent - in this case the job it will do next. The value of this choice is determined by the agent or the planner before the job itself is taken. Since an agent has no information about the internal values of other agents, clashes are inevitable. This raises the question what the planner can rely on as a result of its interaction with the system. The answer lies in the choice of the evaluation function that is used to determine the intermediate target state during the planning phase. In our case, we use the number of allocated jobs as the evaluation function h, where allocated means that an agent is working on that job. It holds that h(s) ≤ h(s′) for any successor state s′ of s, since the counter of assigned jobs never decreases - even if a job is finished. In other words, h describes the number of currently allocated jobs plus the number of finished jobs.

Theorem 5.1. Let t be the intermediate target state in the search tree. Furthermore, let si describe the i-th state of the simulation phase and pi denote the i-th state on the path from the initial state to t in the search tree, where i ∈ {0, ..., m} and pm = t. Then for each 0 ≤ i ≤ m we have that h(si) ≥ h(pi).

Proof. In fact, we have h(s0) = h(p0) and h(s1) ≥ h(t). The equality h(s0) = h(p0) is trivial, since the two states are equal - except for the values of the suggestions. This implies that, in particular, the variable which counts the number of assigned jobs has the same value in s0 and p0. Furthermore, all suggestions in s0 are initialized with the values from t. If we now consider the first simulation step, there are two possibilities:


In the first case, we have no clashes during the parallel step performed between s0 and s1. This implies that each agent picks the job assigned by the model checker (if any). So we have at least h(s1) = h(t). Additionally, agents that have no job in t may pick an unassigned job, so that possibly h(s1) > h(t).

In the second case, we assume we have clashes. For each such clash, we have one agent that cannot pick its assigned job, which decreases h(s1) by one. However, we also have another agent that picks a job although it has no job in t, which increases h(s1) by one. So, a clash does not influence the value of h(s1), and thus we have h(s1) ≥ h(t).

Altogether, we have h(s0) = h(p0) and h(s1) ≥ h(t), and since the number of taken jobs never decreases, it holds for all 1 ≤ i ≤ m that h(si) ≥ h(si−1) and h(pi) ≥ h(pi−1), and in particular h(t) ≥ h(pi), which implies h(si) ≥ h(pi) for all 0 ≤ i ≤ m.

Experimental results for multi-agent planning can be found in Section 8.5.

A main insight of the case study is that assembly-level model checking allows us to develop a planning methodology which does not require the construction of an abstract model of the environment. Instead, planning is performed directly on the compiled code, which is the same as that used by the simulation. Also, it shows that the capabilities of StEAM are not limited to the verification of software.

In the future, we would like to try the approach on actual multiagent systems - be it pure software environments or physical systems like a group of robots. Certainly, many multiagent systems are more complex than the MAMP architecture presented here, as they, e.g., also involve inter-agent communication. We kept our framework simple to derive a multiagent prototype for testing our planning approach.

In the long term, both areas may highly benefit from the assembly-level model checking approach, because it eliminates the task of building an abstract model of the actual system or program. Not only does this save a considerable amount of resources usually spent on constructing the models, but performing model checking and planning on the actual implementation also avoids possible inconsistencies between the model and the real system.


Chapter 6

Hashing

Hashing is essential in software verification, since concurrent programs exhibit a large number of duplicate states. In fact, it turns out that ignoring duplicates in StEAM bars the model checker from finding errors even in very simple programs, due to memory restrictions.

Hashing is not heavily discussed in other fields related to state space search - simply because there the portion of hashing in the overall computational effort of the search is negligible. In this context, program model checking takes a special position in the universe of state space problems, since hashing turns out to be the computationally most expensive operation when expanding a state. This can be explained by the exceptionally large state description of a program (cf. Section 4.3.1). Most state transitions of a program relate to a single statement or a short sequence of statements that only change a small fraction of the state, e.g., the memory cell of a variable. Since only the changed parts need to be stored, generating a successor state is computationally inexpensive if we consider the size of the underlying state description. In contrast, conventional hash functions process the entire state, and as a consequence computing the hash code for each newly generated state dramatically slows down the exploration.

This makes the design of efficient hash functions a central issue of program model checking. An obvious solution to overcome the complexity stated above is to devise an incremental hash function, i.e., one that computes the hash code of a state relative to that of its immediate predecessor by looking at the changes caused by the corresponding transition. Incremental hashing is used in other areas, such as text search. However, existing approaches rely on linear structures of fixed size. In contrast, a program state is inherently dynamic, as new memory regions may be allocated or freed. Also, it is structured, in the sense that the state description is subdivided into several parts, such as the stacks and the global variables.

The following chapter is dedicated to the design of an incremental hash function on dynamic, structured state descriptions that can be implemented in StEAM. We first discuss hashing in general. We then proceed by devising an incremental hash function for static linear state vectors. The approach is exemplified for AI puzzles and action planning. Finally, we enhance our approach to be usable for dynamic and structured state descriptions. The contribution of incremental hashing to the field of program model checking is experimentally evaluated using an implementation in the StEAM model checker in Chapter 8.

6.1 Hash Functions

A hash function h : S → K maps states from a universe S to a finite set of keys K (or codes). A good hash function is one that minimizes the number of collisions. A hash collision occurs if for two states s ≠ s′ ∈ S, h(s) is equal to h(s′). Since the number of possible state configurations is usually greater than |K| (sometimes infinite), hash collisions cannot be avoided. In state space search, hashing is used to compute a table address at which to store a generated state or to look up a previously stored one. As a necessary criterion to minimize the number of hash collisions, the function must be parameterized with all state components.

Distribution To achieve a uniform distribution of the keys mapped to the set of addresses, a random experiment can be performed. Computers, however, can only simulate random experiments, aiming at number sequences whose statistical properties deviate as little as possible from the uniform distribution. The Lehmer generator is a pseudo-random number generator on the basis of linear congruences, and is one of the most common methods for generating random numbers. A sequence of pseudo-random numbers xi is generated according to x0 = b and x_{i+1} = (a·xi + c) mod m for i ≥ 0. A good choice is the minimal standard generator with a = 7^5 = 16,807 and m = 2^31 − 1. To avoid encountering overflows, one uses a factorization of m.
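A minimal sketch of the minimal standard generator using Schrage's factorization of m to avoid 32-bit overflow is shown below; the constants follow the text, the code itself is illustrative.

#include <stdio.h>
#include <stdint.h>

/* Minimal standard (multiplicative) Lehmer generator: x_{i+1} = (a * x_i) mod m
 * with a = 7^5 = 16807 and m = 2^31 - 1. Schrage's factorization m = a*q + r
 * (q = m / a, r = m % a) keeps all intermediate values below 2^31. */
#define LEHMER_A 16807
#define LEHMER_M 2147483647
#define LEHMER_Q (LEHMER_M / LEHMER_A)   /* 127773 */
#define LEHMER_R (LEHMER_M % LEHMER_A)   /*   2836 */

static int32_t seed = 1;                 /* x_0 = b, must lie in [1, m-1] */

int32_t lehmer_next(void) {
  int32_t hi = seed / LEHMER_Q;
  int32_t lo = seed % LEHMER_Q;
  int32_t x  = LEHMER_A * lo - LEHMER_R * hi;
  seed = (x > 0) ? x : x + LEHMER_M;
  return seed;
}

int main(void) {
  for (int i = 0; i < 5; i++)
    printf("%d\n", lehmer_next());
  return 0;
}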

Remainder Method If one can extend S to ZZ, then T = ZZ/mZZ is the quotient space with equivalence classes [0], ..., [m − 1] induced by the relation

z ∼ w iff z mod m = w mod m.

Therefore, a mapping h : S → {0, 1, ..., m − 1} with h(x) = x mod m distributes S on T. For the uniformity, the choice of m is important: for example, if m is even then h(x) is even if and only if x is. The choice m = r^w, for some w ∈ IN, is also not appropriate, since for x = Σ_{i=0}^{l} a_i r^i we have

x mod m = ( Σ_{i=w}^{l} a_i r^i + Σ_{i=0}^{w−1} a_i r^i ) mod m = ( Σ_{i=0}^{w−1} a_i r^i ) mod m.

This means that the distribution only takes the last w digits into account.

A good choice for m is a prime which does not divide a number r^i ± j for small j, because m | r^i ± j is equivalent to r^i mod m = ∓j, so that (case +)

x mod m = j · Σ_{i=0}^{l} a_i mod m,

i.e., keys with the same (alternating) sum of digits are mapped to the same address.
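As an illustration of the remainder method for a key given by its digits a_0, ..., a_l in base r, the sketch below computes (Σ a_i r^i) mod m with Horner's scheme; the base r = 256 and the prime m used in the example are arbitrary choices, not values from the text.

#include <stdint.h>
#include <stdio.h>

/* Remainder-method hash of a key given by its digits (here: bytes, base r = 256).
 * Horner's scheme keeps intermediate values small; m should be a prime. */
uint32_t remainder_hash(const uint8_t *digits, int len, uint32_t m) {
  const uint32_t r = 256;
  uint64_t h = 0;
  for (int i = len - 1; i >= 0; i--)       /* most significant digit first */
    h = (h * r + digits[i]) % m;
  return (uint32_t) h;
}

int main(void) {
  uint8_t key[] = { 42, 17, 3, 99 };       /* digits a_0, ..., a_3 */
  printf("%u\n", remainder_hash(key, 4, 1000003u));  /* 1000003 is prime */
  return 0;
}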


6.2 Explicit Hashing

Explicit hashing schemes store the entire state description without loss of information. In practice, this is usually realized using a pointer table H of size m. For a state s with h(s) = c, it is first tried to store s at H[c]. If H[c] already holds a state s′, it is determined whether s′ is identical to s. In the latter case, s is not stored. Otherwise, we have a hash collision that has to be resolved. Resolving can be done by either probing or successor chaining.

Probing For probing, a function π is used to traverse a sequence of successor positions H[π(c)], ..., H[π(π(c))], ... in the hash table. This is done until a position H[c′] is encountered such that one of the following holds:

• If H[c′] is unoccupied, it is used to store s.

• If H[c′] holds a duplicate of s, the state is not stored.

• If c = c′, the hash table is full. In this case, it must be decided to either not store s,replace another stored state by s or resize the hash table.

Successor Chaining For successor chaining, each hash table entry stores a linked list of different states with the same hash value. As an advantage over probing, only states with the same hash value must be compared.
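A minimal sketch of insertion with successor chaining follows; the state comparison via memcmp, the table size and the externally supplied hash function are placeholders, not StEAM's implementation.

#include <stdlib.h>
#include <string.h>

/* Hypothetical chained hash table for fixed-size state descriptors. */
#define TABLE_SIZE 1024

typedef struct Node {
  void        *state;     /* stored state descriptor */
  struct Node *next;      /* successor pointer of the chain */
} Node;

static Node *table[TABLE_SIZE];

/* Returns 1 if the state was newly inserted, 0 if a duplicate was found. */
int chain_insert(void *state, size_t state_size, unsigned (*hash)(const void *)) {
  unsigned c = hash(state) % TABLE_SIZE;
  for (Node *n = table[c]; n != NULL; n = n->next)   /* only same-code states compared */
    if (memcmp(n->state, state, state_size) == 0)
      return 0;                                      /* duplicate */
  Node *n = malloc(sizeof(Node));
  n->state = state;
  n->next  = table[c];                               /* prepend to the chain */
  table[c] = n;
  return 1;
}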

Figure 6.1 illustrates an example of four states s1, s2, s3, s4 to be inserted into a hash table of size 6 using either probing or successor chaining. Let h(s1) = h(s3) = 2 and h(s2) = h(s4) = 3. Furthermore, linear probing is used, i.e., π(c) = c + 1. Obviously, for probing the hashing scheme needs four state comparisons: s3 ↔ s1, s3 ↔ s2, s4 ↔ s2 and s4 ↔ s3. In contrast, successor chaining only needs two state comparisons: s3 ↔ s1 and s4 ↔ s2. Alternatively, to prevent unnecessary comparisons for probing, the hash code can be integrated into the state description. However, this implies an additional memory overhead. As another advantage of successor chaining, the linked list of a hash table entry can grow to an arbitrary size. Therefore, the table never becomes full.

A clear disadvantage of successor chaining is the need for a successor pointer that is stored with each hashed state. If that pointer is integrated into the state description, it raises the memory requirements per state by the size of a memory pointer (4 bytes on a 32-bit system). This implies wasted memory in settings where states are not explicitly hashed (cf. Section 6.2.1), as the pointers are never used. Alternatively, one can define a container structure like a hash node, which holds a pointer to the stored state and to the successor hash node. However, this raises the memory requirements for a stored state by another 4 bytes.

Dynamic Arrays As an alternative, which is also used in StEAM, a hash table entry may contain a pointer to an array of states whose size may dynamically grow when needed. This constitutes a compromise between the memory efficiency of probing and the speed of successor chaining.
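The following sketch illustrates such a table entry with capacity doubling, as analyzed in Theorem 6.1; the structure and function names are assumptions for illustration, not StEAM's data structures.

#include <stdlib.h>

/* Hypothetical hash table entry holding a dynamically growing array of states;
 * when the array is full, its capacity is doubled. */
typedef struct {
  void   **states;    /* pointers to the stored state descriptors */
  unsigned count;     /* number of states stored in this entry    */
  unsigned capacity;  /* current capacity of the array            */
} Entry;

void entry_insert(Entry *e, void *state) {
  if (e->count == e->capacity) {               /* array full: double its size */
    unsigned new_cap = (e->capacity == 0) ? 1 : 2 * e->capacity;
    e->states   = realloc(e->states, new_cap * sizeof(void *));
    e->capacity = new_cap;
  }
  e->states[e->count++] = state;
}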


Figure 6.1: Example for explicit hashing with probing (top) and successor chaining (bot-tom).

Here, when an array in the hash table gets full, its size is doubled. Furthermore - without loss of generality - we assume that a memory pointer takes 4 bytes. Let n be the number of stored states.

Theorem 6.1. Using dynamic arrays, insertion of one element into the hash table is possible in amortized constant time, while the memory requirements are less than or equal to those of successor chaining.

Proof. Assume we want to insert n elements into the hash table. Let A1, ..., Ar denote the set of arrays which hold at least one element. Obviously, we have r ≤ n.

We start with the proof of the memory requirements. For successor chaining, we always need 4n bytes to store the pointers to the elements and an additional 4n bytes for the successor pointers, which results in a total memory requirement of 8n bytes in all cases.

Moreover, let ni be the number of elements stored in Ai. By definition, the maximal capacity of Ai is 2^⌈log(ni)⌉ elements, which implies a total memory requirement of Σ_{i=1}^{r} 4 · 2^⌈log(ni)⌉ bytes for dynamic arrays.

In the worst case, all arrays are only half full, i.e., 2^⌈log(ni)⌉ = 2ni, which yields a memory requirement of Σ_{i=1}^{r} 8ni = 8n bytes.

In the best case, we have r = 1 and there is only one unused element in A1, i.e., we have log(n + 1) = ⌈log(n)⌉, which implies a memory requirement of 4n + 4 bytes. The general observation is that the memory requirement lies in the interval [4n + 4, 8n].

Next, we prove the amortized constant time. This can be done by an induction over the number of used arrays r.

r = 1: The overall time for inserting n elements into a dynamic array consists of the time needed to store the elements and the time needed to resize the array when it is full. To resize an array from a capacity m to 2m, the new memory needs to be allocated, the elements must be copied from the old memory, and the old memory must be freed. Let c, k, l be constants, where c is the time needed to store an element in the array (without resizing). The constant k shall denote the time required to read an entry from a memory cell and write it to another. Moreover, l constitutes the time needed to allocate memory for the new capacity of an array and to free the old memory. Thus, the time needed to


insert n elements is bounded by

n · c + Σ_{i=1}^{log(n)} (2^i · k + l) ≤ n · c + 2n · k + log(n) · l ≤ n · (c + 2k + l),

using Σ_{i=1}^{log(n)} 2^i ≤ 2n and log(n) ≤ n.

This means that the insertion of one element into the hash table is possible in amortized constant time, bounded by c + 2k + l.

r → r + 1: For r + 1 arrays, we have one more array, into which n′ ≤ n elements are inserted. For each of the n′ elements that fall into the new array, we know from the case r = 1 that insertion is possible in amortized constant time t. Moreover, it is a trivial observation that the remaining n − n′ insertions into the r old arrays cause at most as many array resizings as the insertion of n elements into r arrays. For the latter, element insertion is possible in amortized constant time t′ according to the induction hypothesis. Thus, for r + 1 arrays, the insertion of n elements is possible in time n′ · t + (n − n′) · t′, which implies that each insertion takes amortized time t · n′/n + t′ · (n − n′)/n ≤ t + t′.

6.2.1 Bit-State Hashing

Bit-state hashing [Hol98] belongs to the class of compacting or approximate hash functions. Instead of storing generated states explicitly, each state is mapped to one or more positions within a bit vector v: Starting with all bits in v set to 0, for each generated state s a set of p independent hash functions is used to compute bit positions h1(s), . . . , hp(s) in v. If all bits at the corresponding positions in v are set to 1, s is considered a duplicate. Otherwise, s is inserted into the open list and the corresponding bits in v are set to 1. The approach allows a compact memorization of the set of closed states and is hence particularly efficient in combination with variations of depth-first search, where the set of open states is generally small. In consequence, larger portions of the state space can be explored. As a drawback, the used algorithm becomes incomplete, as hash collisions lead to false prunings in the search tree which may hide the error state.

Bit-state hashing is also the approach of the widely used model checker SPIN [Hol03], which applies it in the so-called Supertrace algorithm.
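
The following sketch illustrates the check-and-set discipline of bit-state hashing with p hash functions; the mixing function is an illustrative placeholder, not the hash function used by SPIN or StEAM.

#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of bit-state hashing: a state is represented only by p bit positions.
class BitStateTable {
public:
    BitStateTable(size_t bits, size_t p) : v_((bits + 7) / 8, 0), m_(bits), p_(p) {}

    // Returns true if the state was new (and marks it); false if all p bits
    // were already set, i.e. the state is treated as a duplicate.
    bool insertIfNew(const std::vector<uint8_t>& state) {
        bool duplicate = true;
        for (size_t j = 0; j < p_; ++j)
            if (!testBit(hash(state, j) % m_)) duplicate = false;
        if (duplicate) return false;
        for (size_t j = 0; j < p_; ++j) setBit(hash(state, j) % m_);
        return true;
    }

private:
    static uint64_t hash(const std::vector<uint8_t>& s, uint64_t seed) {
        uint64_t h = 0x9E3779B97F4A7C15ULL * (seed + 1);  // placeholder mixing
        for (uint8_t byte : s) {
            h ^= byte;
            h *= 0xFF51AFD7ED558CCDULL;
            h ^= h >> 33;
        }
        return h;
    }
    bool testBit(size_t i) const { return v_[i / 8] & (1u << (i % 8)); }
    void setBit(size_t i) { v_[i / 8] |= (1u << (i % 8)); }

    std::vector<uint8_t> v_;  // the bit vector, initially all zeros
    size_t m_;                // number of bits
    size_t p_;                // number of hash functions
};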

Let n be the number of reachable states and m be the maximal number of bits available. As a coarse approximation for single bit-state hashing with n < m, the average probability P_1 of a hash collision during the course of the search is bounded by
\[
P_1 \le \frac{1}{n} \sum_{i=0}^{n-1} \frac{i}{m} \le \frac{n}{2m},
\]
since the i-th element collides with one of the i − 1 already inserted elements with a probability of at most (i − 1)/m, 1 ≤ i ≤ n. For multi-bit hashing using h (independent) hash functions with the assumption hn < m, the average probability of collision P_h is reduced to
\[
P_h \le \frac{1}{n} \sum_{i=0}^{n-1} \left( \frac{h \cdot i}{m} \right)^h,
\]
since i elements occupy at most h · i addresses, 0 ≤ i ≤ n − 1. In the special case of double bit-state hashing, this simplifies to
\[
P_2 \le \frac{1}{n} \left( \frac{2}{m} \right)^2 \sum_{i=0}^{n-1} i^2 = \frac{2(n-1)(2n-1)}{3m^2} \le \frac{4n^2}{3m^2}.
\]

6.2.2 Sequential Hashing

An attempt to remedy the incompleteness of partial search is to re-invoke the algorithm several times with different hash functions to improve the coverage of the search tree. This technique, called sequential hashing, successively examines various beams in the search tree (up to a certain threshold depth). In considerably large protocol verification problems, Supertrace with sequential hashing succeeds in finding bugs but still returns long error trails. As a rough estimate of the error probability we take the following. If in sequential hashing the exploration with the first hash function covers c/n of the search space, the probability that a state x is not generated in d independent runs is (1 − c/n)^d, such that x is reached with probability 1 − (1 − c/n)^d.

6.2.3 Hash Compaction

Like bit-state hashing, the hash compaction method [SD96] aims at reducing the memory requirements for the state table. However, it stores a compressed state descriptor in a conventional hash table instead of setting bits corresponding to hash values of the state descriptor. The compression function c maps a state to a b-bit number in {0, . . . , 2^b − 1}. Since this function is not injective, i.e. different states can have the same compression, false positive errors can arise. Note, however, that if the probe sequence and the compression are calculated independently from the state, the same compressed state can occur at different locations in the table.
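
A minimal sketch of the idea follows; the names are illustrative and the compression function is a placeholder, not the one used in [SD96].

#include <cstdint>
#include <unordered_set>
#include <vector>

// Sketch of hash compaction: only a b-bit compression of the state descriptor
// is stored in a conventional hash table. Two different states that share the
// same compressed value are (wrongly) treated as duplicates.
class HashCompactionTable {
public:
    explicit HashCompactionTable(unsigned b)
        : mask_((b >= 64) ? ~0ULL : ((1ULL << b) - 1)) {}

    // Returns true if the state's compression was not seen before.
    bool insertIfNew(const std::vector<uint8_t>& state) {
        return seen_.insert(compress(state)).second;
    }

private:
    uint64_t compress(const std::vector<uint8_t>& s) const {
        uint64_t h = 0xCBF29CE484222325ULL;        // FNV-1a style placeholder
        for (uint8_t byte : s) { h ^= byte; h *= 0x100000001B3ULL; }
        return h & mask_;                           // keep only b bits
    }

    uint64_t mask_;
    std::unordered_set<uint64_t> seen_;             // conventional hash table
};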

In the analysis, we assume that breadth-first search with ordered hashing using open addressing is applied. Let the goal state s_d be located at depth d, and s_0, s_1, . . . , s_d be a shortest path to it.

It can be shown that the probability p_k that no false positive error occurs when inserting a state into a table that already contains k elements is approximately
\[
p_k = 1 - \frac{2}{2^b}\left(H_{m+1} - H_{m-k}\right) + \frac{2m + k(m-k)}{m \cdot 2^b \cdot (m-k+1)}, \qquad (6.1)
\]
where \(H_n = \sum_{i=1}^{n} \frac{1}{i} = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + O\!\left(\frac{1}{n^4}\right)\) denotes the n-th harmonic number.

Let k_i be the number of states stored in the hash table after the algorithm has completely explored the nodes in level i. Then there were at most k_i − 1 states in the hash table when we tried to insert s_i. Hence, the probability that no state on the solution path was omitted is at least the product of the p_{k_i−1}, so the omission probability P_miss is bounded by
\[
P_{miss} \le 1 - \prod_{i=0}^{d} p_{k_i - 1}.
\]


If the algorithm is run up to a maximum depth d, it can record the k_i values online and report this bound on the omission probability after termination.

To obtain an a priori estimate, knowledge of the depth of the search space and the distribution of the k_i is required. For a coarse approximation, we assume that the table fills up completely (m = n) and that half the states in the solution path experience an empty table during insertion, while the other half experiences the table with only one empty slot. This models (crudely) the typically bell-shaped state distribution over the levels 0, . . . , d. Assuming further that the individual values in Equation 6.1 are close enough to one to approximate the product by a sum, we obtain the approximation

\[
P_{miss} \approx \frac{1}{2^b}\,(\ln n - 1.2).
\]

Assuming, more conservatively, only one empty slot for all states on the solution path would increase this estimate by a factor of two.

6.3 Partial Search

During the study of approximate hash functions, such as bit-state hashing, double bit-state hashing, and hash compaction, we have seen that the sizes of the hash tables can be increased considerably. This is paid for by a small but affordable loss of search accuracy, so that some synonyms of states can no longer be disambiguated. As we have seen, partial search is a compromise to the space requirements that full state storage algorithms have and can be cast as a non-admissible simplification of traditional heuristic search algorithms. In the extreme case, partial search algorithms are not even complete, since they can miss an existing goal state due to wrong pruning. The probability can be reduced either by enlarging the number of bits in the remaining fingerprint vector or by re-invoking the algorithm with a different hash function.

6.4 Incremental Hashing

If the size of the state description becomes large, a full hash function, i.e. one that takes into account every component of the state, can considerably slow down the exploration. As a compromise, the hash function may only consider a subset of all state components and risk an increased number of hash collisions. This may be acceptable if the full state is stored in the hash table. In the worst case, there can be a large set of generated states with the same hash value, and the comparison with newly generated states can make the exploration even slower than using a full hash function.

Due to the large state spaces in program model checking, we want bit-state hashing (cf. Section 6.2.1) to be an option for the exploration. Its use requires that the underlying function produces a minimum of hash collisions, since each collision can result in an unexplored subtree that may contain the error state. To minimize the number of collisions, the hash function must consider all components of the state description.

Incremental hash functions provide both fast computation over large state descriptions and a good distribution over the set of possible hash values [EM04]. However, existing applications of incremental hashing, such as Rabin-Karp hashing (cf. Section 6.5.1), merely rely on statically sized structures to be hashed. In program model checking, however, the size of the state description is inherently dynamic. Also, to the best of the author's knowledge, there has been no previous application of incremental hashing to state space search.

We therefore devise an incremental hashing scheme for StEAM in two steps. First, we provide a static framework to apply incremental hashing to arbitrary state exploration problems that involve a state vector of static size. We exemplify these considerations on some classical AI puzzle problems.

After this, we extend the framework to the incremental hashing of dynamic and structured state vectors as they appear in StEAM. An implementation of the dynamic framework is evaluated in Chapter 8.

6.5 Static Framework

6.5.1 Rabin and Karp Hashing

The hash computation proposed here is based on an extended version of the algorithm of Rabin and Karp.

For this case, the states in S are interpreted as bit strings and divided into blocks of bits. For example, blocks of byte size yield 256 different characters.

The idea originates in matching a text T[1..n] against a pattern M[1..m]. In the algorithm of Rabin and Karp [KR87], a pattern M is mapped to a number h(M), which fits into a single memory cell and can be processed in constant time. For 1 ≤ j ≤ n − m + 1 we check whether h(M) = h(T[j..j + m − 1]). Due to possible collisions, this is not a sufficient but a necessary criterion for a match of M and T[j..j + m − 1]. A character-by-character comparison is performed only if h(M) = h(T[j..j + m − 1]). To compute h(T[j + 1..j + m]) incrementally in constant time, one takes the value h(T[j..j + m − 1]) into account, according to Horner's rule for evaluating polynomials. This works as follows. Let q be a sufficiently large prime with q > m. We assume that numbers of size q · |Σ| fit into a memory cell, so that all operations can be performed with single-precision arithmetic. To ease notation, we identify characters in Σ with their order. The algorithm of Rabin and Karp as presented in Figure 6.2 performs the matching process.

The algorithm is correct by the following observation.

Theorem 6.2. At the start of the j-th iteration we have

\[
t_j = \left( \sum_{i=j}^{m+j-1} T[i] \cdot |\Sigma|^{m-i+j-1} \right) \bmod q.
\]


Procedure Rabin-Karp
Input: String T, pattern M
Output: Occurrence of M in T

p ← t ← 0; u ← |Σ|^{m−1} mod q
for each i ∈ {1, . . . , m}: p ← (|Σ| · p + M[i]) mod q
for each i ∈ {1, . . . , m}: t ← (|Σ| · t + T[i]) mod q
for each j ∈ {1, . . . , n − m + 1}
    if (p = t)
        if (check(M, T[j..j + m − 1])) return j
    if (j ≤ n − m) t ← ((t − T[j] · u) · |Σ| + T[j + m]) mod q

Figure 6.2: Algorithm of Rabin and Karp.

Figure 6.3: Example of hashing for string matching: for T = 2359023141526739921 and q = 13, the hash values of the 15 windows of length 5 are 8, 9, 3, 11, 0, 1, 7, 8, 4, 5, 10, 11, 7, 9, 11.

Proof. Certainly, t_1 = (∑_{i=1}^{m} T[i] · |Σ|^{m−i}) mod q, and inductively we have
\begin{align*}
t_j &= \left( (t_{j-1} - T[j-1] \cdot u) \cdot |\Sigma| + T[j+m-1] \right) \bmod q \\
    &= \left( \left( \sum_{i=j-1}^{m+j-2} T[i] \cdot |\Sigma|^{m-i+j-2} - T[j-1] \cdot u \right) \cdot |\Sigma| + T[j+m-1] \right) \bmod q \\
    &= \left( \sum_{i=j}^{m+j-1} T[i] \cdot |\Sigma|^{m-i+j-1} \right) \bmod q.
\end{align*}

As an example take Σ = {0, . . . , 9} and q = 13. Furthermore, let M = 31415 and T = 2359023141526739921. The application of the mapping h is illustrated in Figure 6.3. We see that h produces collisions. The incremental computation in the first case works as follows.

\begin{align*}
14{,}152 &\equiv (31{,}415 - 3 \cdot 10{,}000) \cdot 10 + 2 \pmod{13} \\
         &\equiv (7 - 3 \cdot 3) \cdot 10 + 2 \pmod{13} \\
         &\equiv 8 \pmod{13}
\end{align*}

The computation of all hash addresses has a resulting running time of O(n + m), which is also the best-case overall running time. In the worst case, the matching is still of order Ω(nm), as the example problem of searching M = 0^m in T = 0^n shows.

6.5.2 Recursive Hashing

Our approach relates to recursive hashing [Coh97]. Recursive hash functions consist of a recursive function H and an address function A so that h(S) = A(H(S)). The idea of linear recursion is that the contribution of one symbol to the hash value is independent of the contribution of the others. Let T be a mapping from Σ to R, where R is a ring, and r ∈ R. Similar to the algorithm of Rabin and Karp we compute the value H of the string S_i = (s_i, . . . , s_{i+k−1}) as H(S_0) = ∑_{j=0}^{k−1} r^{k−1−j} T(s_j) and
\[
H(S_i) = r \cdot H(S_{i-1}) + T(s_{i+k-1}) - r^k \cdot T(s_{i-1})
\]
for 1 ≤ i ≤ n. With this we have H(S_j) = ∑_{i=0}^{k−1} r^{k−1−i} T(s_{i+j}).

The prime-division method takes R = Z and assigns h(S_i) = H(S_i) mod q. Here q is chosen as a suitable prime to obtain a good distribution. For faster computation the two-power division method can be an option. The computation ignores higher-order bits, such that the hash address can be computed by word-level bit operations.

A compromise between the bad distribution of two-power division and the bad run-time behavior of the prime-division method is polynomial hashing. One can choose the ring R = GF(2)[x]/(x^w − 1), where GF(2) is the Galois field of characteristic 2. The finite field GF(2) consists of the elements 0 and 1, which satisfy the addition 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, and 1 + 1 = 0, and the multiplication 0 × 0 = 0, 0 × 1 = 0, 1 × 0 = 0, and 1 × 1 = 1. GF(2)[x] is the ring of polynomials in x with coefficients in GF(2). A polynomial is represented as a tuple of coefficients. A multiplication with x is a cyclic bit shift, since x^w − 1 is equal to the zero polynomial, so that for q(x) = ∑_{i=0}^{w−1} q_i x^i in R we compute
\[
x \cdot q(x) = \sum_{i=0}^{w-1} q_i x^{i+1} \equiv q_{w-1} + \sum_{i=0}^{w-2} q_i x^{i+1}.
\]

Instead of GF(2)[x]/(x^w + 1) one may choose the ring GF(2)[x]/p(x) with p(x) = x^w + ∑_{i=0}^{w−1} p_i x^i being an arbitrary polynomial. In GF(2)[x]/p(x) we have p(x) ≡ 0, and for q(x) = ∑_{i=0}^{w−1} q_i x^i we have
\[
x \cdot q(x) =
\begin{cases}
\sum_{i=1}^{w-1} q_{i-1} x^i, & \text{if } q_{w-1} = 0 \\
p_0 + \sum_{i=1}^{w-1} (q_{i-1} + p_i) x^i, & \text{if } q_{w-1} = 1
\end{cases}
\]

Although some model checkers like SPIN support recursive and polynomial hashing, they still process the entire state vector and are not truly incremental.

6.5.3 Static Incremental Hashing

We regard state vectors with component domains in Σ = {0, . . . , l − 1}. In other words, states are strings over the alphabet Σ. This notational simplification is not a restriction, since the approach generalizes to vectors v = (v_1, . . . , v_k) with v_i ∈ D_i, |D_i| < ∞, i ∈ {1, . . . , k}. Moreover, for practical software model checking we might expect the state vector to be present in the form of a byte array.

A static vector is a vector of fixed length. It may, for example, represent the data area where the values of the global program variables are stored. In the simplest case, only the value of one component is changed. This will be very common in software model checking, since a single statement like var = <expression> will only change the content of the memory cell associated to var. Let i ∈ {1, . . . , k} be the position of the changed component. For a state vector v = (v_1, . . . , v_k) and its successor state v′ = (v′_1, . . . , v′_k) with v′_j = v_j for j ≠ i, we calculate

\[
h(v') = \left( \sum_{j=1}^{k} v_j \cdot |\Sigma|^j - v_i \cdot |\Sigma|^i + v'_i \cdot |\Sigma|^i \right) \bmod q
      = \left( h(v) - v_i \cdot |\Sigma|^i + v'_i \cdot |\Sigma|^i \right) \bmod q.
\]

This implies that h(v′) can be computed in constant time. We can generalize the above to the case where more than one component is changed by the state transition. This occurs, for instance, if several instructions are declared to form an atomic block. These blocks are often used in concurrent programs to prevent a thread switch in critical sections of the code. Let I(v, v′) = {i | v_i ≠ v′_i} be the set of indices of all modified components; then

\[
h(v') = \left( \sum_{i=1}^{k} v_i \cdot |\Sigma|^i - \sum_{i \in I(v,v')} v_i \cdot |\Sigma|^i + \sum_{i \in I(v,v')} v'_i \cdot |\Sigma|^i \right) \bmod q
      = \left( h(v) - \sum_{i \in I(v,v')} v_i \cdot |\Sigma|^i + \sum_{i \in I(v,v')} v'_i \cdot |\Sigma|^i \right) \bmod q.
\]

Choosing a prime for q is known to provide a good distribution over the set of possible hash codes [Knu98]. This property is also exploited, e.g., by the Lehmer generator [Leh49] to generate pseudo-random numbers.
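
To make the update rule above concrete, here is a minimal C++ sketch; the class and parameter names are illustrative, and q is assumed to fit into 32 bits so that all products fit into 64-bit arithmetic.

#include <cstdint>
#include <vector>

// Sketch of incremental hashing for a static state vector over Sigma = {0,...,l-1},
// following h(v) = (sum_i v[i] * l^i) mod q (exponents shifted to 0-based indexing).
class IncrementalHash {
public:
    IncrementalHash(size_t k, uint64_t l, uint64_t q) : l_(l), q_(q), pow_(k) {
        uint64_t p = 1;                       // precompute |Sigma|^i mod q
        for (size_t i = 0; i < k; ++i) { pow_[i] = p; p = (p * l_) % q_; }
    }

    // Full hash, needed only once for the initial state.
    uint64_t full(const std::vector<uint64_t>& v) const {
        uint64_t h = 0;
        for (size_t i = 0; i < v.size(); ++i) h = (h + v[i] % q_ * pow_[i]) % q_;
        return h;
    }

    // Incremental update when component i changes from oldVal to newVal:
    // h' = (h - oldVal*|Sigma|^i + newVal*|Sigma|^i) mod q.
    uint64_t update(uint64_t h, size_t i, uint64_t oldVal, uint64_t newVal) const {
        uint64_t sub = oldVal % q_ * pow_[i] % q_;
        uint64_t add = newVal % q_ * pow_[i] % q_;
        return (h + q_ - sub + add) % q_;     // + q_ avoids an intermediate underflow
    }

private:
    uint64_t l_, q_;
    std::vector<uint64_t> pow_;
};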

Theorem 6.3. Computing the hash value h(v′) given h(v) is possible in time

• O(|I(v, v′)|), using O(k) extra space, where I(v, v′) is the set of indices that change;

• O(1), using O((k · |Σ|)^{I_max}) extra space, where I_max = max_{(v,v′)} |I(v, v′)|.

Proof. In the first case, we store |Σ|^i mod q for all 1 ≤ i ≤ k. In the second case, we precompute
\[
\sum_{j \in I(v,v')} \left( -v_j |\Sigma|^j + v'_j |\Sigma|^j \right) \bmod q
\]
for all possible transitions (v, v′), of which there are at most \(\binom{k}{I_{max}} \cdot |\Sigma|^{I_{max}} = O((k \cdot |\Sigma|)^{I_{max}})\) different ones.

A good hash function should minimize the number of address collisions. Given a hash table of size m and the sequence k_1, . . . , k_n of keys to be inserted, we define X_{ij} = 1 if h(k_i) = h(k_j), and X_{ij} = 0 otherwise. Then X = ∑_{i<j} X_{ij} is the number of collisions. Assuming a random hash function with uniform distribution, we have
\[
E(X) = E\left( \sum_{i<j} X_{ij} \right) = \sum_{i<j} E(X_{ij}) = \sum_{i<j} \frac{1}{m} = \binom{n}{2} \cdot \frac{1}{m}.
\]

We empirically tested the distribution of incremental hashing on a vector of size 100 and compared the results to randomly generated numbers, with m = 10,000,019 (the smallest prime greater than 10,000,000) and n = 1,000,000. For randomly generated numbers, we got 48,407 collisions, while incremental hashing gave 50,304, an insignificant increase.

6.5.4 Abstraction

Incremental hashing is of particular interest in state space searches which use abstractions to derive a heuristic estimate, because several hash functions are involved: one for the original state space and one for each abstraction.

State space abstractions φ map state vectors v = (v_1, . . . , v_k) to abstract state vectors φ(v) = (φ(v_1), . . . , φ(v_k)). This includes

• data abstraction [CGP99], which exploits the fact that specifications for software models usually consider fairly simple relationships among the data values in the system. In such cases, one can map the domain of the actual data values into a smaller domain of abstract data values. This induces a mapping of the system states, which in turn induces an abstract system.

• predicate abstraction [GS97], where the concrete states of a system are mapped to abstract states according to their evaluation under a finite set of predicates. Automatic predicate abstraction approaches have been designed and implemented for finite and infinite state systems.

Both data and predicate abstraction induce abstract systems that simulate the original one. Every behavior in the original system is also present in the abstract one.

Pattern databases [CS98, ELL04, QN04] are hash tables for fully explored abstract state spaces, storing with each abstract state the shortest path distance in the abstract space to the abstract goal. They are constructed in a complete traversal of the inverse abstract search space graph. Each distance value stored in the hash table is a lower bound on the solution cost in the original space and serves as a heuristic estimate. Different abstraction databases can be combined either by adding or maximizing the individual entries for a state. As abstraction databases are themselves hash tables, incremental hashing can also be applied.

Theorem 6.4. Let φ_i be a mapping of v = (v_1, . . . , v_k) to φ_i(v) = (φ_i(v_1), . . . , φ_i(v_k)), 1 ≤ i ≤ l. Combined incremental state and abstraction state vector hashing of state vector v′ with respect to its predecessor v is available in time

• O(|I(v, v′)| + ∑_{i=1}^{l} |I(φ_i(v), φ_i(v′))|), using O(kl) extra space, where I(v, v′) is the set of indices that change in the original space and I(φ_i(v), φ_i(v′)) denotes the set of affected indices in database i;

• O(l), using O(l · (k · |Σ|)^{I_max}) extra space.

Proof. Let h(v) = (∑_{i=1}^{k} v_i |Σ|^i) mod q be the hash function for vector v in the original space and h_i(φ_i(v)) = (∑_{j=1}^{k} φ_i(v_j) |φ_i(Σ)|^j) mod q be the i-th hash function for addressing the abstract state φ_i(v), 1 ≤ i ≤ l, with φ_i mapping v = (v_1, . . . , v_k) to φ_i(v) = (φ_i(v_1), . . . , φ_i(v_k)). In the first case, we store |φ_i(Σ)|^j mod q for 1 ≤ i ≤ l and 1 ≤ j ≤ k. In the second case, for all possible transitions (v, v′) we precompute
\[
\sum_{j \in I(v,v')} \left( -v_j |\Sigma|^j + v'_j |\Sigma|^j \right) \bmod q
\]
and
\[
\sum_{j \in I(\phi_i(v), \phi_i(v'))} \left( -\phi_i(v_j) |\phi_i(\Sigma)|^j + \phi_i(v'_j) |\phi_i(\Sigma)|^j \right) \bmod q, \quad \text{for } i \in \{1, \ldots, l\}.
\]

6.6 Examples for Static Incremental Hashing

For illustration purposes, we exemplify incremental hashing on static state vectors before proceeding to considerations about dynamic and distributed state vectors.

6.6.1 Incremental Hashing in the (n2 − 1)-Puzzle

Our first example is the (n² − 1)-Puzzle presented in Chapter 5. We extend a solver for (n² − 1)-Puzzles that uses IDA* [Kor85] with the Manhattan distance estimate. Let Σ = {0, . . . , 15}. The natural vector representation for a state u is (t_0, . . . , t_15) ∈ Σ^16, where t_i = l means that the tile labeled with l is located at position i, and l = 0 is the blank. The hash value of u is h(u) = (∑_{i=0}^{15} t_i · 16^i) mod q. Let state u′ with representation (t′_0, . . . , t′_15) be a successor of u. We know that there is only one transposition between the vectors t and t′. Let j be the position of the blank in u and k be the position of the blank in u′. We have t′_j = t_k, t′_k = 0, and for all 0 ≤ i ≤ 15 with i ≠ j, i ≠ k it holds that t′_i = t_i. Therefore,

\begin{align*}
h(u') &= \left( \left( \sum_{i=0}^{15} t_i \cdot 16^i \right) - t_j \cdot 16^j + t'_j \cdot 16^j - t_k \cdot 16^k + t'_k \cdot 16^k \right) \bmod q \\
      &= \left( \left( \left( \sum_{i=0}^{15} t_i \cdot 16^i \right) \bmod q \right) - 0 \cdot 16^j + t'_j \cdot 16^j - t_k \cdot 16^k + 0 \cdot 16^k \right) \bmod q \\
      &= \left( h(u) + (t'_j \cdot 16^j) \bmod q - (t_k \cdot 16^k) \bmod q \right) \bmod q.
\end{align*}

To save time, we precompute (k · 16^l) mod q for each k and l in {0, . . . , 15}. If we store ((k · 16^j) mod q − (k · 16^l) mod q) for each value of j, k, and l, we can save one more addition.


Figure 6.4: One level of Atomix with goal molecule H–O–H. The depicted problem can be solved with 13 moves, where the atoms are numbered left-to-right in the molecule: 1 down left, 3 left down right up right down left down right, 2 down, 1 right.

As h(u) ∈ [0, q − 1] and the precomputed values (k · 16^j) mod q and (k · 16^l) mod q both lie in [0, q − 1], the argument of the last mod q operation lies strictly between −q and 2q − 1, so we may substitute the last mod q operation by a conditional addition or subtraction.

The savings are larger when the state description vector grows. For the (n² − 1)-Puzzle, non-incremental hashing results in Ω(n²) time, while with incremental hashing the effort remains constant.
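
The following sketch applies the update formula above to the 15-Puzzle with a precomputed table of (t · 16^i) mod q values; the names are illustrative, not the solver's actual code, and the prime q is chosen arbitrarily.

#include <array>
#include <cstdint>

constexpr uint64_t q = 10000019;                     // an arbitrary prime modulus

// Sketch of the incremental (n^2-1)-Puzzle hash for n = 4. tab[t][i] caches
// (t * 16^i) mod q; a move only involves the blank position j and the moved
// tile's old position k, so the update needs two table lookups.
struct PuzzleHash {
    std::array<std::array<uint64_t, 16>, 16> tab{};  // tab[t][i] = t*16^i mod q

    PuzzleHash() {
        uint64_t pow = 1;
        for (int i = 0; i < 16; ++i) {
            for (uint64_t t = 0; t < 16; ++t) tab[t][i] = (t * pow) % q;
            pow = (pow * 16) % q;
        }
    }

    uint64_t full(const std::array<uint8_t, 16>& tiles) const {
        uint64_t h = 0;
        for (int i = 0; i < 16; ++i) h = (h + tab[tiles[i]][i]) % q;
        return h;
    }

    // The blank was at position j; the tile at position k slides onto j.
    uint64_t afterMove(uint64_t h, const std::array<uint8_t, 16>& tiles,
                       int j, int k) const {
        uint64_t add = tab[tiles[k]][j];             // tile now contributes at j
        uint64_t sub = tab[tiles[k]][k];             // and no longer at k
        return (h + add + q - sub) % q;              // the blank contributes 0
    }
};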

The (n² − 1)-Puzzle has undergone different designs of heuristic functions. One lower bound estimate is the Manhattan distance (MD). For every two states S and S′, it is defined as the sum of the vertical and horizontal distances of each tile between its positions in S and S′. MD is consistent, since the difference in heuristic values between two successive states S and S′ is ±1, such that |h(S′) − h(S)| = 1 and h(S) − h(S′) + 1 ≥ 0. The heuristic can be computed incrementally in O(1) time, given a two-dimensional table which is addressed by the label, the current direction and the position of the tile that is being moved.

6.6.2 Incremental Hashing in Atomix

The goal of Atomix [HEFN01] (cf. Figure 6.4) is to assemble a given molecule from atoms. The player can select one atom at a time and push it towards one of the four directions left, right, up, and down; it will keep on moving until it hits an obstacle or another atom. The problem is solved when the atoms form the same constellation (the "molecule") as depicted beside the board. Atomix has been proven to be PSPACE-complete [HS01]. Note that (n² − 1)-Puzzle solving is NP-complete, while Sokoban and STRIPS planning (cf. Section 6.6.3) are PSPACE-complete. A concrete Atomix problem, given by the original atom positions and the goal molecule, is called a level of Atomix.

For solving Atomix problems we use IDA* search, which compared to A* considerably reduces the memory requirements, while preserving optimality if the underlying heuristic is admissible. The number of duplicates in the search is larger than in regular domains like the (n² − 1)-Puzzle, so state storage will be essential. The design of appropriate state abstractions for abstraction database storage is not trivial. The problem is that, due to stopper atoms, sub-patterns of atoms can be unsolvable while the original problem is not.


For IDA* exploration, Atomix has a global state representation that contains both the board layout in the form of a two-dimensional table and the coordinates of the atoms in the form of a simple vector. This eases many computations during successor generation. After a recursive call, the changes due to a move operation are withdrawn in a backtrack fashion. Consequently - disregarding the computation time for computing the heuristic estimate, the hashing efforts (computation of the function, state comparisons and copyings), and finding the correct location for an atom in a given move direction - successor generation in Atomix is a constant-time operation. We will see that the three remaining aspects can all be implemented incrementally.

Incremental Heuristic A heuristic for Atomix can be devised by examining a model with relaxed restrictions. We drop the condition that an atom slides as far as possible: it may stop at any closer position. These moves are called generalized moves. Note that the variant of Atomix which uses generalized moves has an undirected search graph. Atomix with generalized moves on an n × n board is also NP-hard. In order to obtain an easily computable heuristic, we also allow an atom to slide through other atoms or share a place with another atom. The goal distance in this model can be summed up over all atoms to yield an admissible heuristic for the original problem. The heuristic is consistent, since the h-values of child states can differ from that of the parent state by 0, +1 or −1.

Since each atom contributes one number to an overall sum, it is easy to see that the heuristic estimate can be computed incrementally in constant time by subtracting the value of the currently moving atom for its start location and adding the value for its final destination. We only need to precompute a distance table for each atom.

Incremental Successor Generation For move execution, the implementation we started with looked at the sequence of adjacent squares in the direction of a move until an obstacle is hit. Due to the distance of a move between the start and the target location, this operation is not of constant time. Therefore, we looked for alternatives. The idea to maintain a doubly linked neighbor graph in x and y direction fails on the first attempt. The neighbor graph would include atoms as well as surrounding walls. When moving an atom, say vertically, the update of the link structure at the source location turns out to be simple, but when we place it at its target location, we have to determine the position in the linked list for the orthogonal direction. In both cases we are left with the problem of determining the maximal j in a list of numbers that is smaller than a given i. With an arbitrarily large set of numbers, the problem is not trivial.

However, we can take advantage of the fact that the number of atoms and walls in a given row or column is bounded by a small value (15 in our case). This yields the option to encode all possible layouts of a row or column as a bit string with integer values in {0, . . . , 2^15 − 1}. For each of the query positions i ∈ {0, . . . , 15} we store a table M_i of size 2^15, with entry M_i[b] denoting the highest bit position j with 0 ≤ j < i that is set in b. Each table M_i consumes 2^15 bytes or 32 KByte. These tables can be used to determine in constant time the stopping position of an atom that is pushed to the right or downwards starting in row or column i. Analogously, we can build a second set of tables for left and upward pushes. Together, the tables require 2^15 · 15 · 2 bytes (≈ 1 MB).


Incremental Hashing A move in Atomix merely changes the position of one atom on the board. Hence, a state for a given Atomix level is sufficiently described by the positions (p_1, . . . , p_m) of the atoms on the board. We exploit the fact that only one atom changes to calculate the hash value of a state incrementally. This constitutes a special case of the concept described in Section 6.4. Let s = (p_1, . . . , p_m) be a state for an Atomix level. We define its hash value as h(s) = (∑_{i=1}^{m} p_i · 15^{2i}) mod q. The given Atomix solver, Atomixer [HEFN01], uses a hash table of fixed size ts (40 MB by default). We adjust this value to the smallest prime that is greater than or equal to ts and use it as our q to get a better distribution for our hash function [Knu98]. The full calculation of the hash code is only needed for the initial state. Let s′ be an immediate successor state which differs from its predecessor s only in the position of atom i. Then we have
\[
h(s') = \left( h(s) - \left( (p_i \cdot 15^{2i}) \bmod q \right) + \left( (p'_i \cdot 15^{2i}) \bmod q \right) \right) \bmod q.
\]

We can use a pre-calculated table t with 15² · m entries which, for each i ∈ {1, . . . , 15²} and each j ∈ {1, . . . , m}, stores t_{i,j} = (i · 15^{2j}) mod q. We can use t to substitute (p_i · 15^{2i}) mod q with t_{p_i,i} and (p′_i · 15^{2i}) mod q with t_{p′_i,i}. Furthermore, we know that 0 ≤ h(s) < q and that 0 ≤ a mod q < q for any a. This implies that each remaining expression taken mod q holds a value between −q + 1 and 2q − 2. As a consequence, each sub-term a mod q can be substituted by a − q if a ≥ q, by q + a if a < 0, or by a if 0 ≤ a < q. Now we can calculate h(s′) incrementally by the following sequence of program statements:

1. if (t_{p_i,i} > h(s)) h(s′) ← q + h(s) − t_{p_i,i}

2. else h(s′) ← h(s) − t_{p_i,i}

3. h(s′) ← h(s′) + t_{p′_i,i}

4. if (h(s′) ≥ q) h(s′) ← h(s′) − q

This way, we avoid the computationally expensive modulo operators.
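
A compact C++ rendering of these four statements might look as follows; this is a sketch with illustrative names, and tab is assumed to hold the precomputed values t_{p,i} = (p · 15^{2i}) mod q.

#include <cstdint>
#include <vector>

// Branch-based incremental update that avoids modulo operations.
uint64_t atomixUpdate(uint64_t h, uint64_t q,
                      const std::vector<std::vector<uint64_t>>& tab,
                      int atom, int oldPos, int newPos) {
    uint64_t sub = tab[oldPos][atom];   // contribution of the atom's old position
    uint64_t add = tab[newPos][atom];   // contribution of the atom's new position
    uint64_t hNew = (sub > h) ? q + h - sub : h - sub;   // statements 1 and 2
    hNew += add;                                         // statement 3
    if (hNew >= q) hNew -= q;                            // statement 4
    return hNew;
}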

6.6.3 Incremental Hashing in Propositional Planning

It is not difficult to devise an incremental hashing scheme for STRIPS planning (cf. Chapter 5) that is based on the idea of the algorithm of Rabin and Karp. For S ⊆ AP we may start with h(S) = (∑_{p_i ∈ S} 2^i) mod q. The hash value of S′ = (S \ D) ∪ A is

\begin{align*}
h(S') &= \left( \sum_{p_i \in (S \setminus D) \cup A} 2^i \right) \bmod q \\
      &= \left( \sum_{p_i \in S} 2^i - \sum_{p_i \in D} 2^i + \sum_{p_i \in A} 2^i \right) \bmod q \\
      &= \left( \left( \sum_{p_i \in S} 2^i \right) \bmod q - \left( \sum_{p_i \in D} 2^i \right) \bmod q + \left( \sum_{p_i \in A} 2^i \right) \bmod q \right) \bmod q \\
      &= \left( h(S) - \sum_{p_i \in D} 2^i + \sum_{p_i \in A} 2^i \right) \bmod q.
\end{align*}


Since 2^i mod q can be pre-computed for all p_i ∈ AP, we have a running time of order O(|A| + |D|), which is constant for most STRIPS planning problems. It is also possible to achieve constant time complexity if we store the value inc(o) = (∑_{p_i ∈ A} 2^i) mod q − (∑_{p_i ∈ D} 2^i) mod q together with each operator. Either complexity is small compared to ordinary hashing of the planning state.
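
A minimal sketch of this scheme follows; the structure is illustrative, and a planner would compute inc(o) once per grounded operator and store it with the operator.

#include <cstdint>
#include <vector>

// Sketch of incremental hashing for STRIPS states. Propositions are indexed
// 0..|AP|-1; pow2[i] caches 2^i mod q. inc(o) is the reduced contribution of
// an operator's add and delete lists.
struct StripsHash {
    uint64_t q;
    std::vector<uint64_t> pow2;

    explicit StripsHash(size_t numProps, uint64_t prime) : q(prime), pow2(numProps) {
        uint64_t p = 1;
        for (size_t i = 0; i < numProps; ++i) { pow2[i] = p; p = (p * 2) % q; }
    }

    // inc(o) = (sum over add list - sum over delete list) mod q, stored per operator.
    uint64_t incOf(const std::vector<int>& add, const std::vector<int>& del) const {
        uint64_t v = 0;
        for (int i : add) v = (v + pow2[i]) % q;
        for (int i : del) v = (v + q - pow2[i]) % q;
        return v;
    }

    // Applying an operator to a state with hash h yields hash (h + inc(o)) mod q.
    uint64_t apply(uint64_t h, uint64_t incO) const { return (h + incO) % q; }
};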

Pattern Database Search

Abstraction functions φ map states S = (S_1, . . . , S_k) to patterns φ(S) = (φ(S_1), . . . , φ(S_k)). Pattern databases [CS98] are hash tables for fully explored abstract state spaces, storing with each abstract state the shortest path distance in the abstract space to the abstract goal. They are constructed in a complete traversal of the inverse abstract search space graph. Each distance value stored in the hash table is a lower bound on the solution cost in the original space and serves as a heuristic estimate. Different pattern databases can be combined either by adding or maximizing the individual entries for a state.

Pattern databases work if the abstraction function is a homomorphism, so that each path in the original state space has a corresponding one in the abstract state space. In contrast to the search in the original space, the entire abstract space has to be looked at. As pattern databases are themselves hash tables, we apply incremental hashing, too.

If we restrict the exploration in STRIPS planning to a certain subset of propositions R ⊆ AP, we generate a planning state space homomorphism φ and an abstract planning state space [Ede01] with states S_A ⊆ R. Abstractions of operators o = (P, A, D) are defined as φ(o) = (P ∩ R, A ∩ R, D ∩ R). Multiple pattern databases are composed based on a partition AP = R_1 ∪ . . . ∪ R_l and induce abstractions φ_1, . . . , φ_l as well as lookup hash tables PDB_1, . . . , PDB_l. Two pattern databases are additive if the sum of the retrieved values is admissible. One sufficient criterion is the following. For every pair of non-trivial operators o_1 and o_2 in the abstract spaces according to φ_1 and φ_2, we have that the preimage φ_1^{−1}(o_1) differs from φ_2^{−1}(o_2). For pattern database addressing we use a multivariate variable encoding, namely SAS+ [Hel04].

6.7 Hashing Dynamic State Vectors

In the previous section, we devised an incremental hashing scheme for static state vectors. This is not directly applicable to program model checkers, as they operate on dynamic and structured states. Dynamic means that the size of a vector may change. For example, a program can dynamically allocate new memory regions. Structured means that the state is separated into several subvectors rather than a single big vector. In StEAM, for example, the stacks, machines, variable sections and the lock/memory pools constitute subvectors which together form a global state vector. In the following, we extend the incremental hashing scheme from the last section to be applicable to dynamic and distributed states.

For dynamic vectors, components may be inserted at arbitrary positions.

We will regard dynamic vectors as the equivalent of strings over an alphabet Σ. In the following, for two vectors a and b, let a, b denote the concatenation of a and b. For example, for a = (0, 8) and b = (15), we define a, b = (0, 8, 15).

We define four general lemmas for the hash function h as used in Rabin-Karp hashing (cf. Section 6.5.1). Lemmas 1 and 2 relate to the insertion, Lemmas 3 and 4 to the deletion of components. Afterwards, we apply the lemmas to different types of data structures, such as stacks and queues. We use |a| to denote the size of a vector a.

Lemma 1. For all a, b, c ∈ Σ* we have h(a, b, c) = (h(a, c) − h(c) · |Σ|^{|a|} + h(b) · |Σ|^{|a|} + h(c) · |Σ|^{|a|+|b|}) mod q.

Proof:
\[
h(a, c) = \left( \sum_{i=1}^{|a|} a_i \cdot |\Sigma|^i + \sum_{i=1}^{|c|} c_i \cdot |\Sigma|^{i+|a|} \right) \bmod q = \left( h(a) + h(c) \cdot |\Sigma|^{|a|} \right) \bmod q.
\]
Applying the same decomposition to h(a, b, c) = h(a) + h(b) · |Σ|^{|a|} + h(c) · |Σ|^{|a|+|b|} and substituting h(a) = h(a, c) − h(c) · |Σ|^{|a|} yields the claim.

Analogously, we infer the following result.

Lemma 2. For all a, b, c ∈ Σ* we have h(a, b, c) = ((h(a, c) − h(a)) · |Σ|^{|b|} + h(a) + h(b) · |Σ|^{|a|}) mod q.

Next, we need to address the removal of components from the vector. Lemmas 3 and 4 show a way to incrementally compute the hash value of the resulting vector.

Lemma 3. For all a, b, c ∈ Σ* we have h(a, c) = (h(a, b, c) − h(b, c) · |Σ|^{|a|} + h(c) · |Σ|^{|a|+|b|} / |Σ|^{|b|}) mod q.

Lemma 4. For all a, b, c ∈ Σ* we have h(a, c) = ((h(a, b, c) − h(a, b)) / |Σ|^{|b|} + h(a)) mod q.

Lemmas 3 and 4 require the multiplicative inverse of |Σ|^{|b|} modulo q. As exploration algorithms often vary the size of their hash table and hence the value of q, the computation of the multiplicative inverse must be performed at run time in an efficient way. This can be achieved using an extended version of the Euclidean algorithm [Sch98] for calculating the greatest common divisor (gcd) of two natural numbers a and b with a > b. It computes (d, x, y), where d is the greatest common divisor of a and b, and d = ax + by.

It is a known fact [Sch98] that (Z*_q, · mod q) forms a group, where Z*_q = {a ∈ {1, . . . , q − 1} | gcd(a, q) = 1}. If q is a prime, we have Z*_q = {1, . . . , q − 1}. Hence, by calculating ExtendedEuclid(a, q), we get a triple (d, x, y) with d = gcd(a, q) = 1 = ax + qy. This implies ax mod q = 1, which means that x (taken modulo q) is the multiplicative inverse of a.
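
A small sketch of this computation follows, i.e. the standard extended Euclidean algorithm and the derived modular inverse; this is not code from StEAM.

#include <cstdint>
#include <tuple>

// Extended Euclidean algorithm: returns (d, x, y) with d = gcd(a, b) = a*x + b*y.
std::tuple<int64_t, int64_t, int64_t> extendedEuclid(int64_t a, int64_t b) {
    if (b == 0) return {a, 1, 0};
    auto [d, x1, y1] = extendedEuclid(b, a % b);
    return {d, y1, x1 - (a / b) * y1};
}

// Multiplicative inverse of a modulo a prime q, i.e. the x with (a*x) mod q = 1.
// Assumes a is not a multiple of q.
int64_t modInverse(int64_t a, int64_t q) {
    auto [d, x, y] = extendedEuclid(a % q, q);
    (void)d; (void)y;                 // d = 1 since q is prime
    return ((x % q) + q) % q;         // normalize x into {0, ..., q-1}
}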

6.7.1 Stacks and Queues

If the dynamic vector represents a stack or queue structure, components are added or removed only at the beginning or the end. For all possible cases, the resulting hash address can be computed incrementally in constant time:

\begin{align*}
h(v_1, \ldots, v_n, v_{n+1}) &= \left( h(v) + v_{n+1} \cdot |\Sigma|^{n+1} \right) \bmod q \\
h(v_1, \ldots, v_{n-1}) &= \left( h(v) - v_n \cdot |\Sigma|^n \right) \bmod q \\
h(v_0, v_1, \ldots, v_n) &= \left( v_0 \cdot |\Sigma| + h(v) \cdot |\Sigma| \right) \bmod q \\
h(v_2, \ldots, v_n) &= \frac{h(v) - v_1 \cdot |\Sigma|}{|\Sigma|} \bmod q = \left( \frac{h(v)}{|\Sigma|} - v_1 \right) \bmod q
\end{align*}


The first and second equations refer to insertions and deletions at the end of the vector. The third and fourth equations address insertions and deletions at the beginning.
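
The four equations translate into the following sketch, which maintains the hash of a double-ended queue of symbols; sigmaInv is assumed to be the multiplicative inverse of |Σ| modulo the prime q, e.g. obtained with the extended Euclidean algorithm above, and q is assumed to fit into 32 bits.

#include <cstdint>
#include <deque>

// Constant-time hash maintenance for a queue/stack of symbols over Sigma,
// with h = (sum_i v_i * sigma^i) mod q and v_1 at the front.
class DequeHash {
public:
    DequeHash(uint64_t sigma, uint64_t sigmaInv, uint64_t q)
        : sigma_(sigma), sigmaInv_(sigmaInv), q_(q) {}

    void pushBack(uint64_t v) {                 // h += v * sigma^(n+1)
        topPow_ = (topPow_ * sigma_) % q_;
        h_ = (h_ + v % q_ * topPow_) % q_;
        data_.push_back(v);
    }
    void popBack() {                            // h -= v_n * sigma^n
        h_ = (h_ + q_ - data_.back() % q_ * topPow_ % q_) % q_;
        topPow_ = (topPow_ * sigmaInv_) % q_;
        data_.pop_back();
    }
    void pushFront(uint64_t v) {                // h = v*sigma + h*sigma
        h_ = ((v + h_) % q_) * sigma_ % q_;
        topPow_ = (topPow_ * sigma_) % q_;
        data_.push_front(v);
    }
    void popFront() {                           // h = h/sigma - v_1
        h_ = (h_ * sigmaInv_) % q_;
        h_ = (h_ + q_ - data_.front() % q_) % q_;
        topPow_ = (topPow_ * sigmaInv_) % q_;
        data_.pop_front();
    }
    uint64_t hash() const { return h_; }

private:
    uint64_t sigma_, sigmaInv_, q_;
    uint64_t h_ = 0;
    uint64_t topPow_ = 1;                       // sigma^n mod q for current length n
    std::deque<uint64_t> data_;
};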

6.7.2 Component Insertion and Removal

If components can be inserted or deleted at an arbitrary position of the vector, a constant-time computation of the hash code is no longer possible. Let the i-th prefix p_i and the i-th suffix s_i of a hash code on a vector v = (v_1, . . . , v_n) be defined as p_i(v) = (∑_{j=1}^{i} v_j |Σ|^j) mod q and s_i(v) = (∑_{j=i}^{n} v_j |Σ|^j) mod q.

For the insertion of a (single) component, we have two alternatives, by either computing the prefix or the suffix up to the respective position in the vector (the lemma applied is given in brackets):

\begin{align*}
h(v_1, \ldots, v_i, w, v_{i+1}, \ldots, v_n) &\overset{(1)}{=} \left( h(v) - s_{i+1}(v) + w \cdot |\Sigma|^{i+1} + s_{i+1}(v) \cdot |\Sigma| \right) \bmod q \\
h(v_1, \ldots, v_i, w, v_{i+1}, \ldots, v_n) &\overset{(2)}{=} \left( (h(v) - p_i(v)) \cdot |\Sigma| + w \cdot |\Sigma|^{i+1} + p_i(v) \right) \bmod q
\end{align*}

Analogously, for the removal of a component we have two alternatives:

\begin{align*}
h(v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n) &\overset{(3)}{=} \left( h(v) - s_{i+1}(v) - v_i \cdot |\Sigma|^i + \frac{s_{i+1}(v)}{|\Sigma|} \right) \bmod q \\
h(v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n) &\overset{(4)}{=} \left( \frac{h(v) - p_{i-1}(v) - v_i \cdot |\Sigma|^i}{|\Sigma|} + p_{i-1}(v) \right) \bmod q
\end{align*}

Clearly, the proposed method does not change the asymptotic runtime of O(n) compared to non-incremental hashing. However, we need to access only min{i, n − i + 1} components, which means ⌊n/2⌋ components in the worst case.

6.7.3 Combined Operations

So far we have only discussed the cases where either an arbitrary number of components change their value or a single element is inserted or removed. For completeness, we also want to cover those cases where an arbitrary number of value changes as well as insertions/deletions are applied to the state vector. One solution is to apply the given incremental hash computations consecutively for each set of value changes and each insertion/removal of a component. However, we know that for the removal of one component in a dynamic vector, we must already access ⌊n/2⌋ components in the worst case. If we have to perform several such computations, the time needed for the incremental hash computation will quickly exceed that of a non-incremental hash function. A compromise is to unify all changes to the vector into a single sequence of components such that v′ = a, v′_1, . . . , v′_m, b, where a and b are either empty sequences or a = v_1, . . . , v_i is a prefix and b = v_k, . . . , v_n is a suffix of v. Then,

\begin{align*}
h(v') &= \left( h(v) - s_{i+1}(v) + s_k(v) \cdot |\Sigma|^m + \sum_{j=1}^{m} v'_j \cdot |\Sigma|^{i+j} \right) \bmod q \\
      &= \left( (h(v) - p_i(v)) \cdot |\Sigma|^m + p_i(v) + \sum_{j=1}^{m} v'_j \cdot |\Sigma|^{i+j} \right) \bmod q.
\end{align*}


This means that, for the general case, we need to access min{i + m, n − i + m} elements, which in the worst case equals the length of v′, i.e. m. We can expect an improvement if m is small and i or n − i is small, that is, if v′ differs from v only in a small sequence close to one side of the vector.

6.8 Hashing Structured State Vectors

In the preceding sections we have thoroughly addressed the application of incremental hashing in domains whose state can be described by a single vector of constant size. Based on the previous results, we now devise an incremental hashing scheme for dynamic structured state vectors like those in StEAM.

A structured state vector v can be seen as the concatenation v_1, . . . , v_m of subvectors v_i = (v_{i,1}, . . . , v_{i,n_i}) for i ∈ {1, . . . , m}. Such a partitioning of v can either be chosen freely or, more naturally, it originates from the underlying state space that is explored. For example, the state of a computer program consists of static components, such as the global variables, as well as dynamic components, like the program stack and the pool of dynamically allocated memory.

6.8.1 The Linear Method

Given an order on the subvectors, we can hash each subvector individually and integrate the results to return the same global hash code as if we hashed the global vector directly. A possible hash function on v = v_1, . . . , v_m is

\[
h(v) = \left( \sum_{i=1}^{m} \sum_{j=1}^{n_i} v_{i,j} \cdot |\Sigma|^{\left(\sum_{k=1}^{i-1} n_k\right) + j} \right) \bmod q.
\]

The runtime needed by a naive incremental algorithm to combine the hash codes of the subvectors into a global hash code varies depending on the changes that were made by the state transition.

A program step - be it a single line of code or some atomic block - usually changes values within a small subset of all allocated memory regions. Hence, we can expect the average runtime to be close to the best case (one affected component) rather than to the worst case (all components are affected). A weakness of the linear method lies in the time complexity when a subvector is inserted into or removed from the global state vector. In analogy to Section 6.7.2, the worst-case runtime is O(m), even for a single insertion or removal. The method is still effective if instructions that (de-)allocate memory regions are uncommon compared to instructions that merely change the contents of memory cells.

6.8.2 Balanced Tree Method

By introducing an additional data structure, we can reduce the asymptotic runtime to O(log m) in each case. For this purpose, we use a balanced tree, e.g. an AVL tree [AVL62], as a representation t with m inner nodes - one for each subvector v_i, i ∈ {1, . . . , m}.

Figure 6.5: Example tree for calculating the hash code of a structured state vector.

We define the hash function h′ for a node N in t as follows: If N is a leaf, then h′(N) = 0. Otherwise,

\[
h'(N) = \left( h'(N_l) + h(v(N)) \cdot |\Sigma|^{|N_l|} + h'(N_r) \cdot |\Sigma|^{|N_l| + |v(N)|} \right) \bmod q.
\]

Here |N| denotes the accumulated length of the vectors in the subtree with root N, and v(N) stands for the subvector associated with N, while N_l and N_r denote the left and right subtrees of N, respectively. If R is the root of t, then h′(R) gives the same hash value as h applied to the concatenation of all subvectors in order. Figure 6.5 shows an example for a set of four vectors over Σ = {1, 2, 3} and with q = 17. The nodes in the tree are annotated with their h′ values.

We obtain the same result (namely 11) for h(1, 2, 3, 1, 2, 2, 1, 2) with the linear method.

When the values in a subvector v_i are changed, we must first re-calculate h(v_i). Then the result must be propagated bottom-up to the root node. Because the depth of a balanced tree is bounded logarithmically in the number of nodes m, we traverse at most O(log m) nodes until we reach the root node of the tree. Furthermore, insertion and deletion of an element in/from a balanced tree can be performed in O(log m) time by restoring the balanced tree structure using rotations (either left or right). Updating the hash offsets for the two nodes on which a rotation is executed is available in constant time. We summarize the observations in the following result.
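
The following simplified sketch shows the recurrence and the bottom-up propagation; the AVL rebalancing and the tree construction are omitted, and all names as well as the chosen modulus are illustrative.

#include <cstdint>
#include <memory>
#include <vector>

constexpr uint64_t SIGMA = 256;     // |Sigma|, e.g. byte-valued components
constexpr uint64_t Q = 10000019;    // an arbitrary prime modulus

uint64_t powMod(uint64_t b, uint64_t e) {
    uint64_t r = 1; b %= Q;
    for (; e; e >>= 1) { if (e & 1) r = r * b % Q; b = b * b % Q; }
    return r;
}

// Each inner node owns one subvector and caches h'(N) plus the accumulated
// length |N| of its subtree.
struct Node {
    std::vector<uint64_t> vec;      // the subvector v(N)
    std::unique_ptr<Node> left, right;
    Node* parent = nullptr;
    uint64_t subHash = 0;           // h(v(N)) of this node's own subvector
    uint64_t treeHash = 0;          // h'(N) of the whole subtree
    uint64_t len = 0;               // |N|: accumulated length of the subtree

    uint64_t leftLen() const { return left ? left->len : 0; }

    void recomputeLocal() {         // h(v(N)) = sum_j v_j * |Sigma|^j mod q
        subHash = 0;
        for (size_t j = 0; j < vec.size(); ++j)
            subHash = (subHash + vec[j] % Q * powMod(SIGMA, j + 1)) % Q;
    }
    void recomputeNode() {          // h'(N) from the children and the own subvector
        uint64_t hl = left ? left->treeHash : 0;
        uint64_t hr = right ? right->treeHash : 0;
        len = leftLen() + vec.size() + (right ? right->len : 0);
        treeHash = (hl + subHash * powMod(SIGMA, leftLen())
                       + hr * powMod(SIGMA, leftLen() + vec.size())) % Q;
    }
};

// After changing component i of node n's subvector, only the path to the root
// needs to be updated: O(log m) nodes in a balanced tree.
void update(Node* n, size_t i, uint64_t newVal) {
    n->subHash = (n->subHash + Q - n->vec[i] % Q * powMod(SIGMA, i + 1) % Q
                             + newVal % Q * powMod(SIGMA, i + 1) % Q) % Q;
    n->vec[i] = newVal;
    for (Node* cur = n; cur != nullptr; cur = cur->parent) cur->recomputeNode();
}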

Theorem 6.5. Dynamic incremental hashing of a structured vector v′ = (v′_1, . . . , v′_m) with respect to its predecessor v = (v_1, . . . , v_m), assuming a modification in component i (update within the component, insertion or deletion of the component), i ∈ {1, . . . , m}, with I(v_i, v′_i) being the set of indices that change in component i and I^i_max = max_{(v,v′)} |I(v_i, v′_i)|, is available in time

• O(|I(v_i, v′_i)|) for update, O(m + log n_i) for insertion, and O(m) for deletion, using O(m + max_{i=1}^{m} n_i) extra space;


• O(1) for update, O(m + log n_i) for insertion, and O(m) for deletion, using O(m + ∑_{i=1}^{m} (n_i |Σ|)^{I^i_max}) extra space;

• O(|I(v_i, v′_i)| log m) for update, O(log m + log n_i) for insertion, and O(log m) for deletion, using O(m + max_{i=1}^{m} n_i) extra space;

• O(log m) for update, O(log m + log n_i) for insertion, and O(log m) for deletion, using O(m + ∑_{i=1}^{m} (n_i |Σ|)^{I^i_max}) extra space.

Proof: For the linear method (first case) we have
\begin{align*}
h(v) &= \left( \sum_{j=1}^{n_1} v_{1,j} \cdot |\Sigma|^j + \sum_{j=1}^{n_2} v_{2,j} \cdot |\Sigma|^{n_1 + j} + \ldots + \sum_{j=1}^{n_m} v_{m,j} \cdot |\Sigma|^{\left(\sum_{k=1}^{m-1} n_k\right) + j} \right) \bmod q \\
     &= \left( h(v_1) + h(v_2) \cdot |\Sigma|^{n_1} + \ldots + h(v_m) \cdot |\Sigma|^{\sum_{k=1}^{m-1} n_k} \right) \bmod q,
\end{align*}
so that we can reduce the update to the static setting and multiply the contribution of the affected component by |Σ|^{∑_{k=1}^{i−1} n_k} mod q. To update these additionally stored offsets, we maintain a list of |Σ|^{n_j} mod q, j ∈ {1, . . . , m}. For insertion, |Σ|^{n_i} mod q has to be newly computed. Using fast exponentiation, this value can be determined in time O(log n_i). Both tables consume O(m) space in total. Insertion and deletion of a vector component additionally require O(m) restructuring operations within the vector in the worst case.

For the balanced tree method (third case), insertion and deletion are available in time O(log m), while the update requires O(|I(v_i, v′_i)| log m) time. Once more, for insertion we additionally require O(log n_i) time for computing |Σ|^{n_i} mod q from scratch. As a binary tree consumes space linear in the number of leaves, the space requirements are bounded by O(m + ∑_{i=1}^{m} n_i).

As in the static case, improving the factor |I(v_i, v′_i)| to O(1) (second and fourth case) is possible by precomputing
\[
\sum_{j \in I(v_i, v'_i)} \left( -v_{i,j} |\Sigma|^j + v'_{i,j} |\Sigma|^j \right) \bmod q
\]
for all possible transitions from v_i to v′_i and i ∈ {1, . . . , m}.

6.9 Incremental Hashing in StEAM

With the insights gained about the hashing of static as well as dynamic structured vectors, we have enough information to devise an incremental hash function for model checking C++ programs. A first implementation is available as an option in StEAM. The user should be given the choice whether to use partial, full or incremental hashing, as each may be most appropriate depending on the checked program. If states are hashed explicitly, there is no risk that hash collisions render the search incomplete, and partial hashing may be preferable, as resolving hash collisions is often faster than hashing the entire state (even if this were done incrementally). However, when bit-state hashing or hash compaction is used, hashing the full state vector is mandatory. Generally, the incremental computation of the full hash function should be preferable, as we can expect it to be faster for most programs. In rare cases the state transitions of a program may affect a large part of the allocated memory. Due to the computational overhead of incremental hashing, it may then be faster to use the full non-incremental hash computation in these cases. Experimental results on partial, full and incremental hashing follow in Chapter 8.

6.10 Related Work

6.10.1 Zobrist Keys

An early and frequently cited method for hashing is Zobrist keys [Zob70]. Here, for a hash function h : D^n → E, |E| >> |D|, each domain value d ∈ D is associated with a random number r(d) ∈ E. The hash code of v = (v_1, . . . , v_n) ∈ D^n is computed through the XOR operator as ⊕_{i=1}^{n} r(v_i). Since the XOR operator is both commutative and its own inverse, hash codes can be computed incrementally: for v′ = (v′_1, . . . , v′_n) with v′_j = v_j (j ≠ i), it holds that h(v′) = h(v) ⊕ r(v_i) ⊕ r(v′_i). Zobrist keys are hence often cited in conjunction with board games, such as Chess [Mar91] or Go [Lan96]. As a clear advantage of the method, it is computationally inexpensive, as it relies completely on XOR - in contrast to, e.g., Rabin-Karp hashing, which requires multiplication and modulo operators. Through the commutativity of XOR, however, all permutations of a sequence of values yield the same hash code, which is not the case for Rabin-Karp hashing.

6.10.2 Incremental Heap Canonicalization

In [MD05] the authors address incremental heap canonicalization (HC) in explicit-state model checking. HC serves the purpose of obtaining a canonical representation (CR) for the heap of allocated memory regions, such that states which differ only in the addresses of the heap objects are detected as duplicates. R. Iosif [Ios01] previously proposed an algorithm for HC, which concatenates the heap objects according to their depth-first order in the heap. The authors of the paper indicate two shortcomings of Iosif's algorithm: first, for each new state, the HC requires a full traversal of the heap, yielding a runtime linear in the number of heap objects. Second, according to the authors, small changes in the heap lead to major changes in the CR, i.e. the positions of many heap objects change. This can lead to a significant loss of speed in the exploration if incremental hashing is used. In the paper, incremental hashing is described as maintaining separate hash values for each heap object, which get combined into a global hash value for the entire heap. After a state transition, the hash value is only recalculated for those objects that were either changed by the transition or whose positions in the CR have changed. Since most transitions only change a small fraction of all heap objects, a HC which maintains the positions of most objects in the CR is advantageous. The paper devises an incremental HC which solves the two shortcomings of Iosif's algorithm. First, the position of an object o in the CR is determined through the breadth-first access chain in the heap graph and the length of o. The breadth-first access chain is defined as the list of offset labels on the path edges. The authors devise a recursive relocation function, which computes an object's position in the CR according to its BFS access chain and length. The approach is implemented in the tools CMC and Zing. Experimental results for three example programs indicate a significant speedup of the exploration. In contrast to the work presented in this thesis, the paper does not discuss the option of hashing only parts of the state: when states are explicitly stored in the hash table, the exploration can be faster if the state is only partially hashed. Also, approximate hashing, such as bit-state hashing, is not mentioned, although these techniques motivate incremental hashing in the first place through the need for a maximum degree of distribution. Moreover, the approach does not consider that for a state transition it is not only likely that a small subset of all heap objects is changed, but also that there are only small changes within those objects. The work in [MD05] requires that the partial hash value of an object is fully recomputed if its contents have changed. This is surprising, since the hash function defined in the paper would also support incremental hashing within the single objects.


Chapter 7

State Reconstruction

The main problem that program model checkers face is the high memory requirement, since for a complete enumeration the entire system state description has to be stored to allow pruning in case a state is visited twice. The storage problem, as a consequence of the state explosion problem, is apparent in many enumeration attempts for model checking. The list of techniques is already long (cf. [CGP99]): partial-order reduction prunes the state space based on stuttering equivalence relations tracking commuting state transitions, symmetry detection exploits problem regularities on the state vector, binary state encodings allow to represent larger sets of states while performing a symbolic exploration with SAT formulae or BDDs, abstraction methods analyze smaller state spaces inferred by suitable approximation functions, and bit-state hashing and hash compaction compress the state vector down to a few bits, while failing to disambiguate some of the synonyms.

Here we add a new option to this list, which will cooperate positively with explicit-state heuristic search enumeration. In state reconstruction, for each state encountered either as a state to be expanded or as a synonym in the hash table, we reconstruct the corresponding state description via its generating path. In a first phase, predecessor links are used to reconstruct the path, while in the second phase the path is simulated to reconstruct the state starting from the initial one. This reduces the actual state description length down to a constant amount. The approach is feasible if the simulation mode that reconstructs a state based on its generating path is fast. For improved differentiation in duplicate detection we may include further information with each state.

7.1 Memory Limited Search

To alleviate the storage problem for exploration algorithms, different suggestions have been contributed in Artificial Intelligence. Iterative-deepening A* search (cf. Section 3.3.3) refers to a series of depth-first searches with increasing depth or cost threshold. Transposition (hash) tables [RM94] add duplicate detection to depth-first (iterative-deepening) search. Memory-limited algorithms generate one successor at a time. These algorithms assume a fixed upper limit on the number of allocated edges in open ∪ closed. When this limit is reached, space is reassigned by dynamically deleting one previously expanded node at a time, and if necessary moving its parent back to open such that it can be regenerated later. In acyclic state spaces, Sparse Memory Graph Search uses a compressed representation of the closed list that allows the removal of many, but not all, nodes. Frontier Search [KZ00] completely omits the closed list and applies a divide-and-conquer strategy to find the optimal path in weighted acyclic graphs. Partial Expansion A* is an algorithm that reduces the open list. It builds on the observation that classical A* search expands a node by always generating all of its successors; therefore, nodes with merit larger than the optimal solution, which will never be selected, can clutter the open list and waste space. In FSM pruning [TK93] the hash table is replaced by an automatically inferred structure based on regularities of the search space. Pattern databases (cf. Section 6.5.4) are memory-intense containers to improve the heuristic estimate for directed search and refer to an approximation that is a simulation. Delayed duplicate detection [Kor03] refers to the option of including secondary memory in the search.

For assembly-level model checking of software we observed that duplicate detection for the large state descriptions is essential, that for full verification lossless storage is required, that there is no general assumption on the regularity of the state space graph to be made, and that, considering compacted atomic code regions as in our case, the graph may be weighted. Therefore, the application of most of the above techniques is limited.

In [EPP05], the authors propose a memory-efficient representation of markings in colored Petri nets. The approach distinguishes between explicit markings and ∆-markings. Explicit markings, which only appear at levels i · n, n ∈ IN, i = 1, 2, . . ., store the full information about a marking, while ∆-markings are represented through a reference to one of their predecessors and a transition instance. We will outline the difference to our approach in Chapter 9.

7.2 State Reconstruction in StEAM

Our proposal of state reconstruction distributes the states by their hash value, but stores only limited information that is used to reconstruct the state. The general assumption is that, given a sequence of instructions, reconstruction is fast.

Mini-States For state reconstruction, we introduce the notion of mini-states, which constitute a constant-sized, compressed representation of actual states in StEAM. A mini-state is essentially composed of:

• The pointer to the immediate (mini-state) predecessor to reconstruct the generating path.

• The information which object code transition was chosen from the parent state to the current one.

• Additional hash information for the corresponding explicit state.

The object code transition is derived from the program counter of one thread in the explicit state which the mini-state corresponds to. Storing the additional hash information in a mini-state is not mandatory, as it can merely save the reconstruction of states in case of a hash conflict. However, since state comparisons are computationally expensive, it is most advisable to invest an additional constant amount of memory per state to store the additional information, as it may greatly speed up the exploration.
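
The following sketch illustrates a possible mini-state layout and the path-replay reconstruction; FullState and applyTransition are placeholders for StEAM's virtual machine, not its actual interface.

#include <cstdint>
#include <vector>

// Constant-sized mini-state: predecessor pointer, generating transition and
// additional hash information.
struct MiniState {
    const MiniState* pred;   // pointer to the predecessor mini-state
    uint32_t transition;     // which object-code transition generated this state
    uint32_t extraHash;      // additional hash information for comparisons
};

struct FullState { std::vector<uint8_t> memory; /* registers, stacks, pools, ... */ };

void applyTransition(FullState& s, uint32_t t) {
    // Placeholder: in StEAM the virtual machine would execute the object-code
    // instruction(s) identified by 't' on the full state here.
    (void)s; (void)t;
}

// Reconstruct a full state by collecting the transitions back to the initial
// mini-state and replaying them forwards from the explicit initial state.
FullState reconstruct(const MiniState* s, const FullState& initial,
                      const MiniState* initialMini) {
    std::vector<uint32_t> path;
    for (const MiniState* cur = s; cur != initialMini; cur = cur->pred)
        path.push_back(cur->transition);
    FullState x = initial;
    for (auto it = path.rbegin(); it != path.rend(); ++it)
        applyTransition(x, *it);
    return x;
}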

Figure 7.1 illustrates the method of state reconstruction. On the upper side of the figure, we see the first three levels of a search tree with a branching degree of three. Assuming a worst-case scenario, we need to store in each node the full description of the corresponding state, including stack contents, CPU registers, global variables and the lock and memory pools. The root node (i.e. the initial state) is labeled I. The other nodes are labeled 1, . . . , 12 according to their breadth-first generation order. The edges of the tree are labeled with o_i, which denotes the object code instruction (operator) generating state i when applied to i's immediate predecessor state. The edges are directed towards each state's predecessor to emphasize that the search tree in StEAM is built by placing a predecessor pointer in each state.

As there is theoretically no limit to the amount of dynamically allocated memory in a program, the memory required for explicitly storing the generated states in the search tree can grow to an arbitrary amount.

On the lower side of Figure 7.1 we see the same search tree encoded with mini-states. Besides the predecessor pointer and the operator, which are represented by the labeled edges, the only information stored in a mini-state is the additional hash code h_i of state i.

In addition to the mini-states, we also have to memorize I as the only explicit state. In Figure 7.1, state 5 and its corresponding mini-state are highlighted by a bold dashed border. To reconstruct state 5 from the mini-state encoding, we follow the predecessor pointers up to the (mini-) root node, which yields the operator sequence o5, o1. Applying the reversed sequence o1, o5 to I transforms I to state 1 and finally to state 5. Algorithm 7.1 illustrates the expansion of a mini-state s using reconstruction. Here, I describes the initial state in its full representation and Imin the corresponding mini-state. For a mini-state s, s.o denotes the operation (e.g. the sequence of machine instructions) which transforms the predecessor s.pred into s. Similarly, for a full state x, x.code(x′) denotes the operation which transforms x into its successor state x′. Note that StEAM operations have a constant-sized representation through their program counter, which is usually the program counter of a thread running in x. The notion x.execute(o) refers to applying operation o to the full state x. The hash table H contains the mini-state representatives of all previously generated states.

State reconstruction applies to both lists, open and closed. In case a duplicate candidate is found by a matching hash value and additional mini-state information, the corresponding state is reconstructed by simulation of the implicit trail information from a copy of the initial state. Since, as in our case, virtual machines are highly tuned for efficient linear program execution, reconstruction is fast. State reconstruction can be combined with almost all other state space reduction techniques.

A theoretical limit to state reconstruction is the lower bound for a lossless storage of states, which is about O(log C(m, n)) bits, the logarithm of the binomial coefficient of m over n, where n is the number of elements to be stored and m is the number of bits available for the hash table. Given the discrepancy between the number of generated states and the length of a state description, the practical savings are large.



Figure 7.1: Search tree in explicit representation (top) and using mini-states (bottom).



ExpandRec(mini-state s)
    x ← Reconstruct(s)
    Γ ← expand(x)
    for each x′ ∈ Γ
        c ← hash(x′)
        dup ← false
        for each s′ ∈ H[c]
            if (Reconstruct(s′) = x′)
                dup ← true
                break
        if (dup = false)
            <allocate mini-state s′′>
            s′′.pred ← s
            s′′.o ← x.code(x′)
            H[c].store(s′′)

Reconstruct(mini-state s)
    stack σ ← ∅
    s′ ← s
    while (s′ ≠ Imin)
        σ.push(s′.o)
        s′ ← s′.pred
    x ← I
    while (σ ≠ ∅)
        o ← σ.pop()
        x.execute(o)
    return x

Algorithm 7.1: General State Expansion Algorithm with State Reconstruction
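For illustration, a compact C++ rendering of the Reconstruct routine could look as follows; ExplicitState and its execute operation are hypothetical placeholders for the corresponding StEAM facilities, so this is only a sketch of the replay idea, not the actual implementation.

#include <stack>

// Placeholder for a full StEAM state (stacks, registers, memory pools, ...).
struct ExplicitState {
    void execute(unsigned long /*op_pc*/) { /* run one object code transition */ }
};

struct MiniState {
    MiniState*    pred;    // predecessor mini-state
    unsigned long op_pc;   // operation that generated this state
};

// Rebuild the full state of mini-state s by replaying the operator sequence
// collected on the path from s back to the mini-state of the initial state.
ExplicitState reconstruct(const MiniState* s, const MiniState* i_min,
                          const ExplicitState& initial) {
    std::stack<unsigned long> ops;
    for (const MiniState* cur = s; cur != i_min; cur = cur->pred)
        ops.push(cur->op_pc);          // collect operators backwards
    ExplicitState x = initial;         // start from a copy of the initial state
    while (!ops.empty()) {             // replay them in forward order
        x.execute(ops.top());
        ops.pop();
    }
    return x;
}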

As other areas related to state space search, such as puzzle solving or planning, have a low memory requirement of only a few bytes per state, regeneration would possibly not yield a gain there. The larger the state description, the greater the obtained savings.

Experimental results for state reconstruction can be found in Chapter 8.

7.3 Key States

As mentioned, due to the high speed of the virtual machine, state reconstruction is generally fast in StEAM. However, the approach may also considerably slow down the exploration. This happens in cases where large search depths are encountered or where there are a lot of hash collisions: for each stored state x′ with the same hash address as the newly generated state x, to check for a duplicate we must reconstruct x′ along the lengthy path from the root node to the mini-state of x′ (cf. Algorithm 7.1). In an advanced application of state reconstruction, we may introduce so-called key states. The latter are full descriptions of states that, for a fixed i ∈ IN, are associated with the mini-states at levels i·j, j = 1, 2, ... When we want to reconstruct the full description of some mini-state, we do not have to follow the predecessor pointers up to the root of the search tree, but merely to the next predecessor with an associated key state. This ensures that we need to execute at most i − 1 operations to reconstruct a state (a sketch is given below). Figure 7.2 illustrates a search tree for i = 3.
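Assuming each mini-state carries an optional pointer to an associated key state (a hypothetical extension of the structure sketched in Section 7.2), the shortened reconstruction can be sketched as:

#include <stack>

struct ExplicitState {
    void execute(unsigned long /*op_pc*/) { /* placeholder transition */ }
};

struct MiniState {
    MiniState*     pred;
    unsigned long  op_pc;
    ExplicitState* key;    // non-null at levels i, 2i, 3i, ...; the root carries
                           // the initial state as its key (hypothetical field)
};

// Walk back only to the nearest ancestor with a key state; at most i-1
// operations have to be replayed instead of the whole generating path.
ExplicitState reconstruct_with_keys(const MiniState* s) {
    std::stack<unsigned long> ops;
    const MiniState* cur = s;
    while (cur->key == nullptr) {
        ops.push(cur->op_pc);
        cur = cur->pred;
    }
    ExplicitState x = *cur->key;
    while (!ops.empty()) { x.execute(ops.top()); ops.pop(); }
    return x;
}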



Figure 7.2: Search tree with key states.

7.4 Summary

In summary, state reconstruction contributes a new approach for improving the time-space trade-off in program model checking by recomputing system states based on their generating paths. While it is lossless in general and combines with any state expanding exploration algorithm, bit-state hashing can be integrated for the closed list. The main aspect is that system states are no longer stored but reconstructed through predecessor information. The memory requirement per state shrinks to a constant and the time offset is affordable (cf. Chapter 8). In fact, larger models can be analyzed. In the current approach, each state in the search tree is reconstructed starting from the root node. Thus we have low memory requirements, since only the initial system state must be kept in memory in a full representation. As a drawback, the reconstruction of a state can be time consuming if its generating path is long. One issue for future research is the integration of key states into the existing implementation.


Chapter 8

Experiments

In the following, we experimentally evaluate the approaches presented in the previous chapters.

First, we compare heuristic search with undirected search in StEAM. Then, we evaluate to what extent state reconstruction (cf. Chapter 7) can help to alleviate the memory requirements of software model checking, and thus to find errors in larger programs. In Section 8.4, we investigate the speedup of the state exploration when incremental hashing is used. Finally, in Section 8.5, we evaluate the planning capabilities of StEAM for the multi-agent manufacturing problem (cf. Section 5.5).

8.1 Example Programs

Here, we discuss some programs used for benchmarking in StEAM. The programs include C++ implementations of classical toy protocols, as well as simple real-life applications. Moreover, the examples are either scalable or non-scalable programs and contain different types of errors, such as deadlocks, assertion- and privilege-violations.

8.1.1 Choosing the Right Examples

Classical model checking usually investigates models of concurrent systems described in specialized modeling languages such as Promela. In contrast, unabstracted program model checking has a broader spectrum of applications, as we are not limited to the search for fundamental errors, such as errors in protocol specifications. The new generation of program model checkers also allows us to find low-level programming errors, like illegal memory accesses. On the other hand, it is worth pointing out how program model checking is related to the classical applications. In particular, it should be evident that the used description languages (i.e. actual programming languages) subsume the specialized languages of classical model checkers with respect to expressiveness.

For the above reasons, it is reasonable to include in the set of test programs implementations of protocols that are frequently cited in classical model checking research. Additionally, we should exemplify how the tool can be used to find errors during the implementation phase of software. Hence, we implemented the popular protocols Dining Philosophers, Leader Election and Optical Telegraph as concurrent C++ programs. Additionally, we devised a bank automata scenario whose implementation was constantly checked for errors.

8.1.2 Scalability

A scalable model (or program) is one that is instantiated through a size parameter n, which usually indicates the number of involved processes (or threads). It is a known fact that the state space of a system grows exponentially with the number of involved processes. Hence, the quality of model checking techniques is often measured by the maximal scale parameter of a model for which an error is found. However, the difficulty of finding an error does not grow monotonically with the number of processes for all systems. As a simple example, consider a scalable version of the account balance example from Section 4.5.2: we have n processes which access a single shared variable x. A data inconsistency arises if another process accesses x before a process has written the results of its algebraic operations back to x. For this example, the error probability even increases with the number of processes.
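A minimal C++ sketch of such a racy update follows; it is a generic illustration with hypothetical names, not the actual test program from Section 4.5.2.

// Shared account balance accessed by n concurrent threads.
int x = 0;

// Body executed by each process: read, compute locally, write back.
// If another thread writes x between the read and the write-back, that
// update is lost - the data inconsistency the model checker should detect.
void deposit(int amount) {
    int local = x;     // read the shared variable
    local += amount;   // local computation
    x = local;         // unsynchronized write-back
}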

8.1.3 Dining Philosophers

The dining philosophers problem constitutes a popular scalable toy protocol. It is described as n philosophers sitting at a round table. Each philosopher takes turns in philosophizing and eating. For the latter, there is a plate of spaghetti in front of each philosopher. Between two philosophers lies a fork, i.e. n forks in total. In order to eat, a philosopher needs to hold a fork in each hand. So, when he feels hungry, he stops philosophizing and first takes the fork to his left and then the fork to his right. When finished eating, the philosopher puts down the forks and continues to philosophize. When a philosopher tries to take a fork that is already used by another philosopher, he waits until it is put down. In a faulty protocol, a deadlock occurs if all philosophers pick up their left fork at the same time, because then each one waits for the second fork to be released by his right neighbor. As a consequence, the philosophers will starve to death. Figure 8.1 illustrates the initial state for n = 4.
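The faulty acquisition order can be sketched in plain C++ as follows; the sketch uses std::mutex for a self-contained illustration, whereas the actual test program would use StEAM's own locking primitives.

#include <mutex>

constexpr int N = 4;          // number of philosophers (and forks)
std::mutex fork_mtx[N];       // fork i lies between philosopher i and i+1

void philosopher(int i) {
    for (;;) {
        /* philosophize */
        fork_mtx[i].lock();               // take the left fork first ...
        fork_mtx[(i + 1) % N].lock();     // ... then the right one: if every
                                          // philosopher holds his left fork at
                                          // this point, all of them block
        /* eat */
        fork_mtx[(i + 1) % N].unlock();
        fork_mtx[i].unlock();
    }
}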

Although often addressed as a toy example, the dining philosophers play an important role in protocol verification, and not only because it is a clear and funny way to illustrate deadlocks. Through its scalability, the protocol's state space can be increased to an arbitrary size. Furthermore, while the size of the state space increases exponentially with the number of processes, the shortest path from the initial state to a deadlock grows only linearly and can be extrapolated. This qualifies the protocol as a way to test the quality of heuristics.

8.1.4 Optical Telegraph

This protocol involves a set of n communicating optical telegraph stations [Hol91]. The stations form a chain to transmit messages over long distances. When a station relays a message, it first requests the attention of its neighbor station, then it sends the message.



Figure 8.1: The Dining Philosophers.

A deadlock occurs if all stations mutually request each other's attention. Though still a toy program, an implementation of the optical telegraph protocol gets closer to a real application than the dining philosophers. It is also frequently cited in model checking related literature [GV02, MJ05, LV01, Meh02, LL03].

8.1.5 Leader Election

A leader election problem involves a set of n processes connected through communication channels in a unidirectional ring. Figure 8.2 illustrates such a scenario.

The task is to elect one process as the leader by message passing. Hence, a necessary criterion for a valid leader election protocol is that at any time the number of elected leaders is at most one. Such a valid protocol is given, e.g., in [FGK97]. An error can easily be seeded into a valid implementation by assigning the same id to more than one process. As a result, more than one leader can be elected.

Observer Threads Obviously, the above criterion is a safety property that must be expressed as an invariant over the entire program execution. As stated in Section 4.4, StEAM currently supports the automatic detection of privilege violations, deadlocks and illegal memory accesses and allows the definition of local assertions. We can augment the checked properties to include invariants by introducing observer threads. These threads then constitute a part of the model checking environment rather than of the actual program. Concretely, for an invariant demanding that property φ must hold at any time, we define a thread which executes VASSERT(φ) in an infinite loop. As a result, for each state in the original program, we have a state in the new program which differs only in the thread-state of the observer. Furthermore, for each such state, there is a successor state in which the assertion is checked.
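For the leader election invariant, such an observer could be sketched as follows; num_leaders is a hypothetical global counter maintained by the protocol implementation, while VASSERT is the StEAM assertion mentioned above.

// Hypothetical global maintained by the protocol: number of elected leaders.
extern int num_leaders;

// Observer thread: part of the model checking environment, not of the
// protocol. It turns the invariant "at most one leader" into a local
// assertion that is checked in a successor of every program state.
void observer() {
    for (;;)
        VASSERT(num_leaders <= 1);
}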



Figure 8.2: Example for a leader election problem.

8.1.6 CashIT

CashIT describes a scenario where the controllers for n cash dispensers communicate with a global database of bank accounts through communication channels (cf. Figure 8.3). After logging in, the users of the automata may deposit, withdraw or transfer money from/to the accounts.

CashIT differs from the first three examples not only because it was devised specifically for use with StEAM. The example was also written from a different perspective: for toy models such as the dining philosophers, we a priori know their implementations and the errors they yield.

For CashIT, we tested to what extent StEAM can be used to detect errors during the implementation phase of software, rather than during maintenance. Hence, while writing the program, the partial implementation was regularly tested against several properties, including (besides the absence of deadlocks and privilege violations):

• For each withdrawal, the account balance decreases by the withdrawn amount.

• Analogously, for deposits the balance increases.

• For a transfer of n units from account a to account a′, the balance of a decreases by n, while the balance of a′ increases by n.

Although the above properties may sound trivial, their verification is not. In fact, model checking with StEAM revealed several errors that arose during the implementation of CashIT. The most subtle of these is a privilege violation that, according to the error trail, occurred during a withdrawal. As it turned out, the error was caused by a faulty lock on the account balance:



Figure 8.3: The CashIT program.

VLOCK(balances[logged_user])

which should have been:

VLOCK(&balances[logged_user])

As the CashIT example indicates, the current version of StEAM is already useful for detecting program errors during the implementation phase of a simple program.

8.2 Experiments on Heuristic Search

Here, we compare the results of uninformed search in StEAM with those obtained through directed search with the heuristics explained in Section 4.5. The machine used is a Linux PC with a 3 GHz CPU and 1 GB of RAM.

We conducted experiments with the four scalable programs from Section 8.1: the dining philosophers problem (philo), the leader election protocol (leader), the optical telegraph protocol (opttel), and the bank automata scenario (cashit).



Lock and Global Compaction Lock and global compaction (lgc) is a state space reduction technique that was devised for StEAM. It alleviates the state explosion problem by allowing thread switches only after a resource has been locked or unlocked, or after an access to the memory cell of a global variable has occurred. The intuition behind lgc is that only global variables and resource locks can influence the behavior of a thread: assume that at some state s, the program counter of thread t is at a conditional branch. Furthermore, by executing another thread t′ in s, we arrive at s′. Obviously, the state resulting from iterating t in s′ can only differ from the state resulting from iterating t in s if the iteration of t′ in s changes the value of some global variable which occurs in the branch condition of t.

Analogously, if the next action of t is to lock a resource r, the iteration of t′ can only influence t if it locks or unlocks r.
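In simplified pseudo-C++, the compaction rule amounts to a test of the instruction a thread has just executed; the classification methods below are hypothetical abstractions of what the virtual machine actually exposes.

// Hypothetical view of one object code instruction.
struct Instruction {
    bool locks_resource() const;          // e.g. a VLOCK on a lock-pool entry
    bool unlocks_resource() const;        // e.g. a VUNLOCK
    bool accesses_global_memory() const;  // read or write of a global variable
};

// Lock and global compaction: allow the exploration to switch to another
// thread only after one of the relevant instruction types was executed.
bool allow_thread_switch_after(const Instruction& insn) {
    return insn.locks_resource()
        || insn.unlocks_resource()
        || insn.accesses_global_memory();
}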

In the experiments on directed and undirected search, we determine to what extent lgc can help to solve larger program instances.

Undirected Search The undirected search algorithms considered are BFS and DFS. Table 8.1 shows the results. For the sake of brevity we only show the result for the maximum scale (s) that could be applied to a model with a specific combination of a search algorithm with or without state space compaction (c), for which an error could be found. We measure trail length (l), the search time (t) in seconds and the required memory (m, in KByte). In the following, n denotes the scale factor of a model and msc the maximal scale.

In all models, BFS already fails for small instances. For example, in the dining philosophers, BFS can only find a deadlock up to n = 6, while heuristic search can go up to n = 190 with only insignificantly longer trails. DFS only has an advantage in leader, since higher instances can be handled, but the produced trails are longer than those of heuristic search.

Heuristic Search Table 8.2 and Table 8.3 show the results for best-first search applied to the four programs. Analogously, Table 8.4 and Table 8.5 depict the results for A*.

The int heuristic, used with BF, shows some advantages compared to the undirected search methods, but is clearly outperformed by the heuristics which are specifically tailored to deadlocks and assertion violations. The rw heuristic performs very strongly in both cashit and leader, since the error in both protocols is based on process communication. In fact, rw produces shorter error trails than any other method (except BFS).

In contrast to our expectation, aa performs poorly on the model leader and is even outperformed by BFS with respect to scale and trail length. For cashit, however, aa leads to the shortest trail, and aa and rw are the only heuristics that find an error for n = 2, though only if the lgc reduction is turned off. Both of these phenomena of aa are subject to further investigation.

The lock heuristics are especially good in opttel and philo. With BF and lgc they can be used up to an msc of 190 (philo) and 60 (opttel). They outperform other heuristics with nolgc and the combination of A* and lgc. In the case of A* and nolgc, the results are also good for philo (n = 6 and n = 5; msc is 7).



                 cashit                        leader
        c   s    l        t    m       s    l        t    m
DFS     y   1    97       15   134     80   1,197    25   739
DFS     n   1    1,824    2    96      12   18,106   10   390
BFS     y   -    -        -    -       5    56       36   787
BFS     n   -    -        -    -       3    66       3    146

                 opttel                        philo
        c   s    l        t    m       s    l        t    m
DFS     y   9    8,264    21   618     90   1,706    17   835
DFS     n   6    10,455   22   470     20   14,799   46   824
BFS     y   2    21       2    97      6    13       16   453
BFS     n   -    -        -    -       4    26       6    164

Table 8.1: Results with Undirected Search.

                 cashit                        leader
SA      C   s    l        t    m       s    l        t    m
pl1     y   1    97       15   134     5    73       9    285
pl2     y   1    97       6    120     5    73       9    312
mb      y   1    97       15   134     80   1,197    25   737
lnb     y   1    97       15   134     5    73       9    263
int     y   1    98       9    128     5    57       39   789
aa      y   1    84       0    97      5    72       28   637
rw      y   1    61       0    97      60   661      19   671
pba     y   1    97       9    121     80   1,197    24   740
pbb     y   1    97       15   134     80   1,197    24   741
pl1     n   1    1,824    2    97      4    245      9    284
pl2     n   1    792      0    97      4    245      17   376
mb      n   1    1,824    2    97      12   18,106   10   428
lnb     n   1    1,824    2    97      4    245      9    284
int     n   1    1,824    2    97      3    68       4    139
aa      n   2    328      9    146     3    132      3    139
rw      n   2    663      15   463     40   1,447    10   470
pba     n   1    792      1    97      12   18,106   11   411
pbb     n   1    1,824    2    97      12   18,106   10   407

Table 8.2: Results with Best-First Search for cashIT and Leader Election.


According to the experimental results, the sum of locked variables in a continuous block of threads with locked variables is a good heuristic measure for finding deadlocks. Only the lnb heuristic can compare with pl1 and pl2, leading to a similar trail length. In the case of cashit, pl2 and rw outperform most heuristics with BF and A*: with lgc they obtain an error trail of equal length, but rw needs less time. In the case of A* and nolgc, pl2 is the only heuristic which leads to a result for cashit. In most cases both pl heuristics are among the fastest.

The heuristic pba is in general moderate, but sometimes outperforming, e.g. with A* and nolgc on opttel (msc of 2) and philo (msc of 7). The heuristic pbb is often among the best, e.g. leader with BF and lgc (n = 6, msc = 80), philo with BF and lgc (n = 150; msc is 190) and philo with A* and nolgc (n = 6, msc = 7). Overall, the heuristics pba and pbb are better suited to A*.

Experimental Summary Figures 8.4, 8.5, 8.6 and 8.7 give a graphical summary of the impact of heuristic search on the overall performance of the model checker. We measure the performance by extracting the fourth root of the product of trail length, processed states, time and memory (the geometric mean of these arguments). We use a logarithmic scale on both axes. Note that there is no figure for cashIT, simply because the available experimental data is insufficient for a graphical depiction. Also, except for Leader Election, we only give the diagram for lgc, since it generally performs better. For Leader Election, we additionally give a diagram for the absence of lgc, because here the difference to the setting with lgc is most significant. In fact, the maximum possible scale almost doubles if the compaction is used.

                 opttel                        philo
SA      C   s    l        t    m       s    l        t    m
pl1     y   60   602      21   770     190  386      17   622
pl2     y   60   601      21   680     190  385      49   859
mb      y   9    6,596    16   518     150  750      21   862
lnb     y   60   602      21   763     190  386      17   639
int     y   2    22       1    98      8    18       19   807
aa      y   -    -        -    -       6    14       21   449
rw      y   2    309      1    98      90   1,706    18   854
pba     y   9    6,596    16   513     90   450      4    234
pbb     y   9    6,596    16   510     150  750      21   860
pl1     n   40   1,203    20   705     150  909      34   859
pl2     n   40   1,202    19   751     150  908      33   857
mb      n   6    10,775   23   489     60   1,425    10   548
lnb     n   40   1,203    18   719     150  909      34   856
int     n   -    -        -    -       5    33       23   663
aa      n   -    -        -    -       4    27       7    197
rw      n   2    494      7    206     20   14,799   469  822
pba     n   6    10,775   23   470     40   945      3    239
pbb     n   6    10,775   24   471     60   1,425    10   571

Table 8.3: Results with Best-First Search for Optical Telegraph and Dining Philoso-phers.



                 cashit                        leader
        c   s    l        t      m     s    l        t    m
pl1     y   -    -        -      -     5    56       34   711
pl2     y   1    61       278    387   5    56       36   760
mb      y   -    -        -      -     5    56       40   769
lnb     y   -    -        -      -     5    56       34   719
int     y   -    -        -      -     5    57       40   785
aa      y   -    -        -      -     5    61       38   744
rw      y   1    61       196    310   7    78       16   589
pba     y   -    -        -      -     5    56       38   648
pbb     y   -    -        -      -     5    56       40   777
pl2     n   1    136      1,057  614   3    66       3    141
mb      n   -    -        -      -     3    66       3    151
rw      n   -    -        -      -     3    66       2    97
pba     n   -    -        -      -     3    66       3    138
pbb     n   -    -        -      -     3    66       3    139

Table 8.4: Results for A* on cashIT and Leader Election.

                 opttel                        philo
        c   s    l        t    m       s    l        t    m
pl1     y   3    32       4    215     190  386      17   631
pl2     y   2    21       0    98      190  385      48   860
mb      y   2    22       2    98      7    16       11   371
lnb     y   2    22       1    98      8    18       8    286
int     y   2    22       1    98      7    16       18   629
aa      y   2    22       4    171     6    14       21   449
rw      y   2    22       2    98      6    14       21   449
pba     y   3    32       13   424     60   122      19   628
pbb     y   3    32       19   573     60   122      19   634
pl2     n   -    -        -    -       5    38       17   485
mb      n   -    -        -    -       4    27       4    155
rw      n   -    -        -    -       4    27       7    199
pba     n   2    63       55   845     7    51       42   872
pbb     n   -    -        -    -       6    45       19   431

Table 8.5: Results for A* for Optical Telegraph and Dining Philosophers.



Figure 8.4: Geometric mean for Dining Philosophers with lgc.

Figure 8.5: Geometric mean for Optical Telegraph with lgc.



Figure 8.6: Geometric mean for Leader Election without lgc.

Figure 8.7: Geometric mean for Leader Election with lgc.




In Optical Telegraph, the heuristics lnb, pl1 and pl2 perform best. The pl2 heuristic only starts to perform well with n = 15; before that, the curve has a high peak. It seems that the preference for continuous blocks of alive or blocked threads only pays off beyond a certain scale, here 10. The pba and pbb heuristics perform similarly up to an msc of 9.

In the Dining Philosophers, the heuristics pl1, pl2, pba and pbb perform best. If only BF is considered, the heuristic lnb behaves similarly to pl1 and pl2. Again, pl2 has an initial peak. DFS performs well up to an msc of 90. In the experiments, the new heuristics show an improvement in many cases. In the case of deadlock search, the new lock and block heuristics are superior to most-blocked.

8.3 Experiments for State Reconstruction

To validate the memory savings of state reconstruction, we used a Linux system with a 1.8 GHz CPU and a memory limit of 512 MB and compare the runtime in seconds and the space in MB. The time limit was set to 30 minutes.

We ran StEAM with (rec) and without (¬rec) state reconstruction using the algorithms breadth-first search (BFS) and greedy best-first search (GBF) with the 'lock and block' [LME04] heuristic. The models analyzed are instances of the Dining Philosophers (phil) and the Optical Telegraph protocol (optel). The results are summarized in Tables 8.6 and 8.7. As we can see, the approach trades time for space. This tradeoff becomes more significant while scaling the model. When using 'rec' in heuristic search, we are even able to find errors in instances where ¬rec fails.

In fact, even when the search with GBF and 'rec' eventually exceeds the time limit in optel-47, the memory consumption is still low (110 MB). So it can be assumed that even higher instances can be handled if more time is available.

                    BFS                        GBF
             ¬rec         rec           ¬rec         rec
model      time  space  time  space   time  space  time   space
phil-2     0     97     0     97      0     97     0      97
phil-3     0     101    1     98      0     97     0      97
phil-4     4     133    27    108     0     97     0      97
phil-5     29    402    808   275     0     98     0      97
phil-10    o.m.  o.m.   o.m.  o.m.    0     98     56     97
phil-100   o.m.  o.m.   o.m.  o.m.    5     199    211    100
phil-190   o.m.  o.m.   o.m.  o.m.    444   492    663    107
phil-195   o.m.  o.m.   o.m.  o.m.    o.m.  o.m.   1,540  108
phil-196   o.m.  o.m.   o.m.  o.m.    o.m.  o.m.   1,557  108

Table 8.6: Results of State Reconstruction for the Dining Philosophers.



                    BFS                        GBF
             ¬rec         rec           ¬rec         rec
model      time  space  time  space   time  space  time   space
optel-2    36    271    996   185     0     98     0      97
optel-5    o.m.  o.m.   o.t.  o.t.    0     102    0      98
optel-10   o.m.  o.m.   o.t.  o.t.    1     116    3      98
optel-20   o.m.  o.m.   o.t.  o.t.    15    183    38     100
optel-40   o.m.  o.m.   o.t.  o.t.    137   511    850    106
optel-41   o.m.  o.m.   o.t.  o.t.    o.m.  o.m.   980    107
optel-46   o.m.  o.m.   o.t.  o.t.    o.m.  o.m.   1,714  110
optel-47   o.m.  o.m.   o.t.  o.t.    o.m.  o.m.   o.t.   o.t.

Table 8.7: Results of State Reconstruction for the Optical Telegraph.

8.4 Experiments on Hashing

Here, we experimentally measure the impact of incremental hashing on the exploration speed of StEAM.

Partial vs Full Hashing For the previous experiments, we used partial hashing, i.e. for efficiency considerations only a part of the state description is hashed. In this case, we only hash the machine registers of all running threads. This allows us to solve higher instances of each protocol within the given time limit. Table 8.8 shows the number of hash collisions which occur when partial hashing is used, compared to the number of collisions using full hashing, i.e. if the full state vector is hashed. Here, we use the dining philosophers with depth-first search.

model     states   full  partial
phil-2    58       0     6
phil-3    231      0     23
phil-4    969      0     135
phil-5    2,455    0     304
phil-10   18,525   4     1,656
phil-15   44,356   19    3,470
phil-20   80,006   47    5,746

Table 8.8: Number of hash collisions for partial and full hashing.

Although the machine registers are the state components most likely to be changed by a transition, partial hashing produces a much greater number of hash collisions than full hashing. For a scale of 20 and 80,006 generated states, we have 5,746 collisions as compared to 47. This is not a big problem for the small test programs used here: it turns out that in none of the experiments does a hash code occur more than twice, since the size of the hash table is relatively big compared to the number of generated states. Hence, a hash collision can quickly be resolved. However, with bitstate hashing in mind, the amount of collisions would be unacceptable: each collision leads to an unexplored part of the search tree, increasing the probability of missing an error. This implies that for bitstate hashing it is essential to have a minimum of hash collisions, which is only possible if the entire state description is hashed. This also includes the stacks and all allocated memory blocks. As a consequence, computing the hash address may considerably slow down the exploration.

Incremental Hashing We conducted some first experiments on incremental hashing of the stacks in StEAM. The rest of the state is hashed non-incrementally and the results are combined into a global hash value. For each running thread the program stack is given as a byte vector of 8 MByte, while only the used parts are stored in the state description. By concatenating the stacks of n running threads, we arrive at a vector size of 8 · 1024² · n bytes, which has to be hashed. As new threads may be generated, the vector can grow dynamically. We used the same machine, as well as the same time and memory limits, as in the preceding experiments. Figure 8.8 compares the run times of the exploration with non-incremental and incremental hash computation for the dining philosophers, the optical telegraph and the leader election protocol¹. For the leader election protocol with GBF search, we use the more appropriate 'read/write' [LME04] heuristic.

Note that even for the non-incremental computation, we only regard the portion of the stack which lies below the current stack pointer. All memory cells above the stack pointer are treated as zero. This is important, since hashing the unused portions of the stack in the non-incremental case would not yield a fair comparison, and a considerable speedup of the exploration could be taken for granted.

The results are encouraging, as we get a speedup by a factor of 10 and above compared to non-incremental hashing, and more errors can be found within the time limit. In the near future we aim at the verification of more complex programs that will not allow storing each state explicitly. This implies that compacting hash schemes such as bitstate hashing are required. As stated before, partial hashing is not appropriate for hash compaction due to the high number of hash collisions. Hence, even though partial hashing is faster for the small test programs, full hashing will be essential for checking larger programs, for which our incremental hashing scheme provides an efficient way.
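The following C++ sketch shows the idea of Rabin-Karp style incremental hashing over a single stack, in the spirit of the scheme from Chapter 6; the constants, the class layout and the fixed-size vector are illustrative simplifications, not the actual StEAM code (which also copes with dynamically growing vectors).

#include <cstddef>
#include <cstdint>
#include <vector>

// Hash of a byte vector v: sum over i of v[i] * B^i modulo Q. When a single
// cell changes, the hash is updated in O(1) instead of rehashing the vector.
class IncrementalStackHash {
    static constexpr uint64_t B = 257;            // base
    static constexpr uint64_t Q = 1000000007ULL;  // modulus
    std::vector<uint8_t>  cells;                  // used part of the stack
    std::vector<uint64_t> pow;                    // precomputed B^i mod Q
    uint64_t h = 0;

public:
    explicit IncrementalStackHash(std::size_t n) : cells(n, 0), pow(n, 1) {
        for (std::size_t i = 1; i < n; ++i)
            pow[i] = (pow[i - 1] * B) % Q;
    }

    // Overwrite cell i with a new value and update the hash incrementally.
    void write(std::size_t i, uint8_t value) {
        uint64_t old_term = (uint64_t(cells[i]) * pow[i]) % Q;
        uint64_t new_term = (uint64_t(value)    * pow[i]) % Q;
        h = (h + Q - old_term + new_term) % Q;    // remove old term, add new one
        cells[i] = value;
    }

    uint64_t hash() const { return h; }
};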

¹The leader election protocol makes use of the nondeterministic RANGE-statement, which is currently not supported for state reconstruction. Hence, it was not used in the preceding experiments.



                BFS             GBF
model       ¬inc    inc     ¬inc    inc
phil-2      1       0       0       0
phil-3      14      1       0       0
phil-4      257     25      0       0
phil-5      o.t.    o.t.    1       0
phil-6      o.t.    o.t.    1       0
phil-7      o.t.    o.t.    1       0
phil-8      o.t.    o.t.    2       0
phil-9      o.t.    o.t.    2       0
phil-10     o.t.    o.t.    3       0
phil-16     o.t.    o.t.    10      0
phil-17     o.t.    o.t.    13      1
phil-18     o.t.    o.t.    14      1
phil-19     o.t.    o.t.    17      1
phil-20     o.t.    o.t.    20      1
phil-25     o.t.    o.t.    37      3
phil-30     o.t.    o.t.    65      4
phil-50     o.t.    o.t.    279     19
phil-100    o.t.    o.t.    o.t.    239

                BFS             GBF
model       ¬inc    inc     ¬inc    inc
optel-2     o.t.    o.t.    1       0
optel-3     o.t.    o.t.    3       0
optel-4     o.t.    o.t.    6       0
optel-5     o.t.    o.t.    11      0
optel-10    o.t.    o.t.    79      6
optel-11    o.t.    o.t.    105     7
optel-12    o.t.    o.t.    135     9
optel-13    o.t.    o.t.    170     12
optel-14    o.t.    o.t.    214     15
optel-15    o.t.    o.t.    262     19
optel-16    o.t.    o.t.    316     23
optel-17    o.t.    o.t.    379     27
optel-18    o.t.    o.t.    448     32
optel-19    o.t.    o.t.    524     39
optel-20    o.t.    o.t.    612     46
optel-25    o.t.    o.t.    1,187   113
optel-30    o.t.    o.t.    o.t.    257
optel-40    o.t.    o.t.    o.t.    o.t.

                BFS             GBF
model       ¬inc    inc     ¬inc    inc
leader-2    6       0       0       0
leader-3    263     33      1       1
leader-4    o.t.    o.t.    5       0
leader-5    o.t.    o.t.    11      1
leader-6    o.t.    o.t.    27      3
leader-7    o.t.    o.t.    55      4
leader-8    o.t.    o.t.    112     9
leader-9    o.t.    o.t.    232     17
leader-10   o.t.    o.t.    476     35
leader-11   o.t.    o.t.    936     69
leader-12   o.t.    o.t.    o.t.    244
leader-13   o.t.    o.t.    o.t.    668
leader-14   o.t.    o.t.    o.t.    o.t.

Figure 8.8: Run times in seconds using non-incremental and incremental hashing for the dining philosophers (top-left), the optical telegraph (top-right) and the leader election protocol (bottom).



8.5 Experiments on Multi Agent Planning

In the experiments, we used the same system as for the experiments on state reconstruction. We applied our system to randomly generated MAMPs (cf. Section 5.5) with 5, 10, or 15 agents and 10, 20, 50, or 100 jobs. The number of resources was set to 100. Each job requires between 10 and 100 steps to be done and up to 10% of all resources. Our goal is to show that the combination of planning and simulation leads to better solutions than pure simulation. To do this, for each MAMP m, we first solve m ten times with pure simulation. Then we solve m ten times with the combination of planning and simulation. In both cases, we measure the number of parallel steps of the solution and the total time in seconds spent on planning. A planning phase may be at most 30 seconds long. If during planning a memory limit of 500 MB is exceeded or the open set becomes empty, the phase ends even if the time limit was not reached. A simulation phase executes at least 20 and at most 100 parallel steps. Additionally, a simulation phase ends if at least as many agents are idle as were at the beginning.

We expect that in the average case the combination of planning and simulation gives better results (in terms of the number of parallel steps) that justify the additional time spent on planning. The evaluation function that counts the number of taken jobs is expected to lead to a maximum of parallelism, since more taken jobs imply fewer idle threads. Table 8.9 shows the results. Each row represents ten runs on the same MAMP instance. The column type indicates the type of run, i.e. either pure simulation (sim) or the combination of planning and simulation (plan). The columns min, max and µ show the minimal, maximal and average number of parallel steps needed for a solution of the respective MAMP. Finally, pt indicates the average time in seconds used for planning.

In almost all cases, the average solution quality increases if planning is used. The only exception is the instance with 10 agents and 20 jobs, where the heuristic estimate seems to fail. Note that clashes may lead to worse solutions, since they cause an agent to waste one step realizing that the assigned job is already taken. For all other cases, however, we have improvements between 6 and 59 parallel steps on average. The total planning times range between 26 and 30 seconds, which seems odd, since each planning phase alone can be up to 30 seconds long. However, this can be explained by the fact that only in the first planning phase all agents are idle. If in the subsequent planning phases only a small fraction of all agents is idle, the phases may get very short, because the model checker can quickly decide whether new jobs can be assigned or not. If, for example, we assume that each parallel step corresponds to one minute in reality, then the time spent on planning is negligible compared to the time gained through the improved solution quality.



type   agents  jobs  min    max    µ      pt
sim    5       10    262    315    286    0
plan   5       10    241    315    255    29
sim    5       20    517    616    546    0
plan   5       20    483    527    510    27
sim    5       50    1081   1233   1149   0
plan   5       50    1084   1187   1143   29
sim    5       100   2426   2573   2478   0
plan   5       100   2310   2501   2426   26
sim    10      10    259    266    261    0
plan   10      10    136    286    246    28
sim    10      20    389    465    432    0
plan   10      20    412    467    460    30
sim    10      50    754    902    839    0
plan   10      50    771    856    814    30
sim    10      100   1732   1871   1809   0
plan   10      100   1761   1858   1783   30
sim    15      10    260    263    261    0
plan   15      10    187    236    216    30
sim    15      20    385    463    417    0
plan   15      20    382    410    387    30
sim    15      50    675    896    767    0
plan   15      50    711    747    725    30
sim    15      100   1528   1728   1607   0
plan   15      100   1497   1600   1548   30

Table 8.9: Experimental Results in Multiagent Manufacturing Problems.




Chapter 9

Conclusion

In the following, we summarize the contributions of the work at hand and give an outlook on possible further work based on these results.

9.1 Contribution

The thesis addresses the verification of C++ programs. In contrast to other approaches to software model checking, our intention is to avoid the construction of formal models from program source code. Instead, we intend to model check C++ implementations directly.

Analysis of the Status Quo The thesis gives a concise introduction to model checking. The elevator example from Chapter 2 illustrates the complexity of model checking even for small-scale systems. Moreover, the example illustrates interesting properties of the system and how they are formalized, checked and enforced in the model. In its succeeding course, the thesis identifies the verification of software implementations as a modern branch of model checking, while it points out the important differences to classical approaches that target the verification of system specifications rather than their implementation.

The need for software model checking is motivated by highlighting its advantages over traditional verification techniques such as testing. This claim is supported by real-life examples, such as the Mars Pathfinder bug, which passed unnoticed through all phases of the development process.

Also, we introduce the most popular software model checkers and discuss the advantages and drawbacks of their approaches. Our observation is that the majority of existing tools does not adequately handle the complex semantics of modern programming languages, such as Java or C++, and relies on an abstraction of the actual source code instead. This seems contradictory to the idea of software model checking as a technique that aims to find errors that result from details of the software's implementation.

StEAM As the work's main contribution, the C++ model checker StEAM was created. By extending a virtual machine, the tool constitutes the first model checker for C++ programs which does not rely on an abstract model of the program. The approach of using an underlying virtual machine for the exploration of a program's state space eliminates several problems of model-based program verification. Firstly, by bypassing a modeling process, there is no danger that the user may introduce new errors in the model which are not present in the actual program. Also, since we investigate real executables compiled from program source code, it is assured that the formal semantics of the program are correctly reflected. Thus, we know that each error found is also in the actual source code and that each error in the program can be found (assuming that a sufficient amount of time and memory is available). In its current version, the tool is already capable of verifying simple programs and of finding errors in larger ones. Note that to truly verify a program, its entire state space needs to be explored. In contrast to the majority of other software model checkers, StEAM is capable of finding low-level errors such as illegal memory accesses, because it can evaluate the result of each machine instruction before it is executed.

StEAM is not the only program model checker that keeps the internal representation of the inspected program transparent to the user. Tools like Bandera and JPF1 (cf. Section 2.4.3) use a custom-designed formal semantics to translate source code to some existing modeling language such as Promela (cf. Section 2.3.1). We have seen that this approach yields two major drawbacks: on the one hand, it implies a considerable overhead for writing a translator from source code to the modeling language. These translators often do not even cover the full semantics of the respective programming language. On the other hand, the used modeling languages are often not appropriate to represent programs, as they were mainly designed for the verification of high-level protocols rather than actual software implementations.

Moreover, StEAM is not the only model checker which uses the actual machine code of the checked program. Tools such as CMC and VeriSoft (cf. Section 2.4.3) augment the actual program with a model checking environment and compile it to a model checking executable. This bypasses the need to devise a custom compiler or a machine-code interpreting infrastructure (like a virtual machine). However, this approach is not capable of catching low-level errors such as illegal memory accesses, since the program's machine code is executed on the physical hardware of the executing computer.

More similar to StEAM are the approaches of JPF2 and Estes (cf. Section 2.4.3). JPF2 uses its own Java virtual machine, on top of which it is probably easy to build a model checking algorithm. Yet, the design and implementation of the virtual machine implies a large overhead for the developers of the model checker. Also, such a machine is not thoroughly field-tested for correct functioning, in contrast to, e.g., the official Java Virtual Machine, which is used by millions of people every day. Likewise, StEAM builds on the existing virtual machine IVM, which is known to reliably run complex applications such as Doom. Moreover, JPF2 is limited to the verification of Java programs, although most industrial applications, including those of NASA, are written in C++.

The tool Estes uses an approach most similar to that of StEAM, as it builds the model checking algorithm on top of an existing and widely used tool, namely the GNU debugger GDB. Since GDB supports multiple processor levels, it is also capable of searching for low-level errors that occur only in the machine code of a specific architecture. However, the currently available literature [MJ05] neither gives a detailed technical description of the architecture of Estes, nor does it conduct exhaustive experiments. This makes it, as yet, hard to judge the possibilities of Estes and how feasible the approach is.

9.1.1 Challenges and Remedies

State Explosion Verifying compiled source code yields two major challenges. The first is the state explosion problem, which also applies to classical model checking. This combinatorial problem refers to the fact that the state space of a system grows exponentially with the number of involved components. Like other model checkers, StEAM needs methods to alleviate this problem. As the most promising approach, we have used heuristic search. The benefit from using heuristics in model checking is threefold: firstly, since fewer states need to be memorized, the memory requirements are reduced. Second, because fewer states are visited, the error is found faster. Third, heuristic search yields shorter counter examples than, e.g., depth-first search. Shorter counter examples can be tracked down more easily by a user, which facilitates detecting the error in the program source code.

The use of heuristic search is not a novel idea. In fact, StEAM inherits some of its heuristics from HSF-Spin (cf. Section 2.3.1) and JPF2. However, we also devised new heuristics, such as lnb, which performs considerably better in finding deadlocks than the most-blocked heuristic (cf. Section 8.2). Moreover, we introduce heuristics such as rw and aa, which favor paths that maximize the access to global variables. We also propose the number of processes alive as a factor which can be combined with an arbitrary heuristic. The intuition is that only alive threads can cause errors.

The experimental results in Section 8.2 are encouraging and are to a large extent in line with our expectations.

State Size The second major challenge of verifying compiled code lies in the size of the state description. Since we need to memorize the full information about a program state, including dynamically allocated memory, the state description may grow to an arbitrary size. Firstly, this may aggravate the state explosion problem, since we have a higher per-state memory requirement. This problem can be alleviated by an incremental storing of states: since state transitions usually only change a small fraction of the state components, we only need to store these differences for the successor state. Also, we devised state space search based on state reconstruction, which identifies states through their generating path.

The latter approach is, in similar form, also used in other works such as [MD05], which builds the search tree for colored Petri nets by identifying intermediate nodes through the executed transition. However, for StEAM state reconstruction seems particularly appropriate, since the execution of the machine instructions along the path from the root to the reconstructed state is quite fast due to the IVM, which was mainly optimized for speed. This results in a particularly worthwhile time-space tradeoff. The experimental results in Section 8.3 support this.

Another problem of the large state description lies in the hashing effort for a state. Even if a transition makes only marginal changes to a state, a conventional hash function needs to look at the entire state description to compute its hash code. This can make hashing the most expensive part of state generation. To overcome this problem, we devised an incremental hashing scheme based on the algorithm of Rabin and Karp.

To the best of the author's knowledge, there is little research on reducing the hashing effort in model checking. The reason for this is presumably the fact that most other tools are based on abstraction, which yields relatively small state descriptions. Hashing is mentioned in [MD05], though. Here, the approach is to use a heap canonicalization algorithm which minimizes the number of existing heap objects that change their position in the canonical representation when new objects are inserted into the heap or existing ones are removed. This also minimizes the number of heap objects whose hash code has to be re-computed. The fundamental difference to our approach is that when the content of a heap object changes, its hash code must always be fully computed, even if the change was marginal. In contrast, the incremental hashing scheme devised in Chapter 6 can also hash single objects incrementally.

9.1.2 Application

The functionality of StEAM was shown by model checking several simple C++ programs. These range from implementations of classical toy models such as the dining philosophers to simple real-world applications like the control software for a cash dispenser. Note that there is an additional example given in the appendix, which deals with the control software of a snack vendor. We are able to find deadlocks, as well as assertion- and privilege-violations. Applying heuristic search allows us to find errors with lower memory requirements and shorter error trails. The use of state reconstruction considerably reduces the memory requirements per generated state, which allows us to explore larger state spaces with the same algorithm. The problem of large state descriptions is alleviated by incremental hashing, which can significantly speed up the exploration.

Additionally, we proposed a method to use StEAM as a planner for concurrent multi-agent systems, where simulation and exploration of a system are interleaved in a very natural way. The idea of applying model checking technology for planning appears to be a logical consequence, since the two fields are quite related (cf. Section 5.1). The approach also takes a unique position in planning, since by planning on compiled code, we avoid the generation of a formal model of the system: the planning is done on the same program which controls the actual agents.

The new model checker has raised some attention in the model checking community. We have already received requests from the NASA Ames Research Center and RWTH Aachen, who would like to use StEAM on their own test cases.

9.1.3 Theoretical Contributions

We aim at giving a very general view on model checking as a verification method for state-based systems. The conversion between the two graph formalisms in Section 2.1.1 emphasizes that model checking approaches rely on the same principles, independently of the underlying formalism or the field of application.

In Chapter 6, we give a formal framework for incremental hashing in state space search. Incremental hashing is an important issue in program model checking, but it can also be applied to any problem domain that is based on state space exploration. As an important part of the framework, we devise efficient methods for incremental hashing on dynamic and structured state vectors.

Moreover, we make general observations about the asymptotic run times of hash computation in several domains and analyse time/space tradeoffs which invest memory to pre-calculate tables that allow an asymptotic improvement of the hash computation, e.g., from O(n) to O(1).

For state reconstruction, we give a general algorithm that is applicable to any kind of state space search. Whether it makes sense to apply the algorithm depends on two factors: the size of the state descriptions and the time needed to compute a state transition.

For the planning approach, we devised the multi-agent manufacturing problem (MAMP) as an extension of job-shop problems. Furthermore, we give a novel technique which interleaves planning and simulation phases based on the compiled code which controls the environment. The resulting procedure interleave (cf. Algorithm 5.4) is applicable to all domains whose description allows exploration as well as simulation.

9.2 Future Work

StEAM continues to be a running project and the source code will be publicly available in the near future. We give an outlook on further development the tool may undergo in the future.

9.2.1 Heuristics

Heuristic search has proven to be one of the most effective methods to overcome the state explosion problem. With an appropriate heuristic, it is possible to quickly find an error even in programs that exhibit a very large state space. Hence, an important topic will be to devise new and improved heuristics.

Pattern Databases Of particular interest are pattern databases (PDBs). Here, the state space is abstracted to a degree that allows an exhaustive enumeration of the abstract states by an admissible search algorithm such as breadth-first search. The generated search tree is used to build a table which associates with each abstract state the distance to the abstract goal state (or error state). In a subsequent exploration of the original state space, the distance values from the table serve as heuristic estimates for the original states. PDBs were quite successfully applied in puzzle solving and planning. However, the application to program model checking imposes some challenges:

First, one must find an appropriate abstraction. An easy solution would be to select a subset of the state components. However, a state of StEAM is merely given by the stacks, CPU-registers, data-/bss-sections and the lock-/memory-pool, which allows us to define only 2⁶ = 64 distinct abstractions. A more informative heuristic can be obtained by abstractions at the source code level, e.g. by restricting the domain of variables. The implementation would be technically challenging, since during exploration the abstractions must be mapped from the source code level to the corresponding memory cells.



Second, in contrast to AI puzzles, not all program transitions are invertible. For instance, we cannot define the converse of a program instruction that writes a value to a variable x, since the previous value of x is not a priori known. The classical applications of pattern databases [CS96, CS94] rely on a backwards search in the abstract state space, starting at the abstract goal state. In domains where state transitions are not invertible, the search in the abstract state space must maintain a list of all predecessors with each generated state. Since duplicate states, and hence multiple predecessor states, are common in concurrent programs, this may considerably increase the memory requirements for the abstract state space enumeration.

Trail-Based Heuristics Prior to the existence of StEAM, the author did some research on trail-based heuristics. These exploit a complete error state description from a previous inadmissible search to devise consistent heuristics. These heuristics can then be used in combination with an admissible heuristic search algorithm such as A* or IDA* to obtain a shortest error trail.

One example of a trail-based heuristic is the Hamming distance, which counts the number of unequal state components in the current and the error state. Another example is the FSM-distance, which abstracts states to the program counters of the running threads. Hence, the FSM-distance is related to pattern databases. The latter, however, generally require a complete enumeration of the abstract state space to compute the table of heuristic values, while for the FSM-distance this information can be extracted from the compiled code of the program through static analysis.
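As an illustration, the Hamming distance boils down to a component-wise comparison between the current state and the recorded error state; representing a state as a flat vector of words is of course a simplification of the actual StEAM state.

#include <cstddef>
#include <vector>

// Trail-based Hamming distance: number of state components in which the
// current state differs from a previously recorded error state.
std::size_t hamming_distance(const std::vector<unsigned>& current,
                             const std::vector<unsigned>& error_state) {
    std::size_t n = current.size() < error_state.size() ? current.size()
                                                        : error_state.size();
    std::size_t d = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (current[i] != error_state[i])
            ++d;
    return d;
}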

The author implemented both heuristics for the Java program model checker JPF2. Some experimental results have been published [EM03]. We passed on an implementation for StEAM, as we wanted to concentrate on fundamentally new concepts in our research. However, the generation of short counter examples remains an important aspect in model checking; hence an implementation of the discussed trail-based heuristics may be an option for future enhancements of the model checker.

9.2.2 Search Algorithms

StEAM already supports the uninformed search algorithms depth-first and breadth-first search, as well as the heuristic search algorithms best-first and A*. The memory requirement of the state exploration remains an important topic in program model checking. Hence, StEAM should offer additional algorithms that have a stronger bias towards memory efficiency.

Iterative Deepening Iterative deepening algorithms such as iterative depth-first search (IDFS) or IDA* (cf. Chapter 3) are particularly attractive for model checking, as they are both optimal and memory-efficient. As an advantage, iterative deepening algorithms only need to maintain a small set of open states along the currently explored path. However, to be truly effective, the algorithms must be integrated with hash compaction methods which provide a compact representation of the closed states.

StEAM already provides a rudimentary implementation of IDFS. Currently, each state must be fully expanded, instead of generating just one successor state. For an outgoing degree of k, this implies that at a search depth of n, the size of the open list is n·k instead of n. The problem is that the value k for a state cannot be determined before the state is fully expanded, because k is a function of both the number of running threads and the non-deterministic statements.

External Search When a model checking run fails, this is usually because the space needed to store the list of open states exceeds the amount of available main memory (RAM). The amount of storage on external memory devices such as hard disks is usually a multiple of the machine's main memory. At present, the standard personal computer is equipped with 512 MB RAM and an 80 GB hard disk, i.e. the secondary memory is 160 times the amount of available RAM. An obvious approach is to make that memory usable for the state exploration.

Implicitly, this is already possible through the virtual memory architecture of modern operating systems. Unfortunately, the virtual memory management has no information about the locality of the states we store, i.e. it will be common that each consecutive access to a stored state causes a page fault and thus an access to the secondary memory. Obviously, this would slow down the exploration to an unacceptable degree. For an efficient use of external memory, the data exchange between primary and secondary memory must be managed by the search algorithm. One of the first approaches to external search applies BFS with delayed duplicate detection and uses secondary memory to efficiently solve large instances of the Towers of Hanoi puzzle [Kor03] and the (n²−1)-puzzle [KS05]. In a different line of research, an external version of A* is used to solve the 15-puzzle [EJS04]. The external A* was later integrated into SPIN [JE05], yielding the first model checker capable of using external search.

Since memory limitations impose an important challenge in software model checking, integrating external search algorithms into StEAM would be very interesting.

9.3 Hash Compaction and Bitstate-Hashing

Compacting hash functions allow a memory-efficient storing of the set of closed states. Here, we essentially distinguish between hash compaction and bitstate-hashing.

In hash compaction, a second hash function is used to compute a fingerprint of a state s, which is then stored in a hash table. A state is assumed to be a duplicate of s if its fingerprint is already stored at the corresponding position in the hash table.

In bitstate-hashing, the state is mapped to one or more positions in a bit-vector. A state is assumed to be a duplicate if all corresponding bits in the vector are already set.

With compacting hash functions, more states can be visited, although the completeness of the search is sacrificed: if two distinct states are falsely assumed to be duplicates, the search tree is wrongly pruned and an error state may be missed. However, statistical analyses have shown that the probability for this is very low [Hol96].

The use of compacting hash functions is of particular interest in combination with iterative deepening search (cf. Section 9.2.2), because here the list of open states is generally small compared to the number of closed states.



StEAM already supports a version of bitstate-hashing which maps the state to a single position in a bit vector using a single hard-coded hash function. In the future, the number of bit positions should be parameterizable, because mapping the state to more than one position will significantly reduce the risk of hash collisions and thus the danger of missing an error state. Using an arbitrary number of bit positions per state also implies that the corresponding hash functions must be automatically generated.
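A parameterizable bitstate table is essentially a Bloom filter; the sketch below uses an illustrative double-hashing scheme to derive the k positions from a state fingerprint, which is an assumption and not the hash function family StEAM would generate.

#include <cstddef>
#include <cstdint>
#include <vector>

// Bitstate table mapping each state fingerprint to k positions in a bit
// vector; a state counts as visited only if all k bits were already set.
class BitstateTable {
    std::vector<bool> bits;
    unsigned k;                          // number of bit positions per state

    // Illustrative double hashing: position j = (h1 + j*h2) mod m.
    std::size_t position(uint64_t fp, unsigned j) const {
        uint64_t h1 = fp * 0x9E3779B97F4A7C15ULL;
        uint64_t h2 = (fp ^ (fp >> 32)) | 1;        // odd stride
        return static_cast<std::size_t>((h1 + j * h2) % bits.size());
    }

public:
    BitstateTable(std::size_t m, unsigned k_) : bits(m, false), k(k_) {}

    // Returns true if the state was possibly visited before (risking a false
    // positive); otherwise marks its k positions and returns false.
    bool check_and_insert(uint64_t fingerprint) {
        bool seen = true;
        for (unsigned j = 0; j < k; ++j) {
            std::size_t p = position(fingerprint, j);
            if (!bits[p]) { seen = false; bits[p] = true; }
        }
        return seen;
    }
};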

9.4 State Memorization

To reduce the memory requirements of the exploration, states should be stored in an incremental fashion. In its current form, StEAM already uses backward pointers to components of immediate predecessor states. This can alleviate the redundant storing of states, but it cannot fully prevent it: First, two states may share identical components, although one is not an immediate predecessor of the other. Second, components get fully stored, even if they differ only slightly from the corresponding component of the predecessor state. This leads to particularly strong redundancies, e.g., if the respective component constitutes a large array in which the state transitions change only one memory cell at a time. In the future, one may consider a state storage scheme which implies fewer redundancies. It is important, however, that the reduced memory requirements do not come at the cost of a significant slowdown of the exploration.

Collapse-Compression

The model checker SPIN uses a sophisticated approach called collapse compression [Hol97c] to store states in an efficient way. Here, different state components are stored in separate hash tables. Each entry in one of the tables is given a unique index. A whole system state is identified by a vector of indices that indicate the corresponding components in the hash tables.

The use of collapse compression prevents the repeated storing of a component. However, two fully stored components may still expose minimal differences.
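The principle can be sketched as follows; this is a simplified illustration of the collapse compression idea with hypothetical types, not SPIN's actual data structures:

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Each state component (e.g. a stack or a memory region) is stored once in a
// component table and referred to by its index; a full state collapses to a
// small vector of indices.
struct ComponentTable {
  std::map<std::string, size_t> index_of;   // component value -> index
  std::vector<std::string> values;          // index -> component value
  size_t intern(const std::string& component) {
    auto it = index_of.find(component);
    if (it != index_of.end()) return it->second;
    size_t idx = values.size();
    values.push_back(component);
    index_of.emplace(component, idx);
    return idx;
  }
};

// A collapsed state: one index per component kind (stack, globals, heap, ...).
using CollapsedState = std::vector<size_t>;

CollapsedState collapse(const std::vector<std::string>& components,
                        std::vector<ComponentTable>& tables) {
  CollapsedState ids(components.size());
  for (size_t i = 0; i < components.size(); ++i)
    ids[i] = tables[i].intern(components[i]);
  return ids;
}

int main() {
  std::vector<ComponentTable> tables(3);           // e.g. stack, globals, heap
  CollapsedState a = collapse({"stackA", "glob0", "heap0"}, tables);
  CollapsedState b = collapse({"stackB", "glob0", "heap0"}, tables);
  return (a[1] == b[1]) ? 0 : 1;                   // shared component stored only once
}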

Memorizing Changes

Alternatively, states can be represented by their differences from the predecessor state. To some degree this is already implemented by the state reconstruction of StEAM (cf. Chapter 7), where states are identified through their generating path. However, executing the respective machine instructions is most probably slower than applying the resulting changes directly to the state.

As another disadvantage, not all machine instructions are invertible, because the converse of writing a new value to a memory cell c depends on c's old content. This makes it impossible to directly backtrack from a state s to its predecessor p. Alternatively, for each memory cell c whose value was changed by the transition from p to s, we can memorize the difference δc(s, p) = c[s] − c[p] of the new and old value. This allows us to backtrack from s to p by calculating c[p] = c[s] − δc(s, p) for each changed memory cell c.
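A minimal sketch of this delta-based scheme (hypothetical types; StEAM does not currently implement it this way):

#include <cstddef>
#include <cstdint>
#include <vector>

// One entry per memory cell changed by the transition from p to s:
// delta_c(s, p) = c[s] - c[p], stored together with the cell's address.
struct CellDelta {
  std::size_t addr;
  int32_t delta;
};

// Apply a transition's deltas forward (p -> s) ...
void apply(std::vector<int32_t>& mem, const std::vector<CellDelta>& deltas) {
  for (const CellDelta& d : deltas) mem[d.addr] += d.delta;
}

// ... and undo them to backtrack (s -> p): c[p] = c[s] - delta_c(s, p).
void backtrack(std::vector<int32_t>& mem, const std::vector<CellDelta>& deltas) {
  for (const CellDelta& d : deltas) mem[d.addr] -= d.delta;
}

int main() {
  std::vector<int32_t> mem = {7, 0, 3};
  std::vector<CellDelta> t = {{0, +1}, {2, -3}};   // transition p -> s
  apply(mem, t);                                   // now in state s
  backtrack(mem, t);                               // back in state p
  return mem[0] == 7 ? 0 : 1;
}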

9.4.1 Graphical User Interface

In the long term, research in model checking must yield useful tools which are also accessible for practitioners. StEAM already provides a certain degree of user-friendliness, since its use does not require knowledge about specialized modeling languages such as Promela. Moreover, the counterexamples returned by StEAM directly reflect the corresponding execution of the investigated program, rather than that of a formal model. The latter would make it much harder to track down the error in the actual source code.

To obtain a broad acceptance in the industry, it will be essential to devise an integrated graphical user interface which allows an easy and intuitive use of the tool.

9.5 Final Words

Beyond the comparisons of the various approaches, the work at hand hopefully gave a clear impression of how model checking may help to ease the software development process. In particular, it is desirable that the industry becomes aware of the potential that model checking-assisted software development offers as an alternative to the state-of-the-art engineering process, as in the latter, the software engineer invests a considerable amount of time on formal specifications and even more time on browsing code in the attempt to detect the cause of an error. Software model checking can provide the engineer with detailed information about the cause of an error, which makes it much easier to track down and correct the corresponding piece of code.

The approach of extending an existing virtual machine not only ensures a correct interpretation of the semantics of a program; it also keeps the internals of the model checking process transparent to the user, who in turn is more likely to accept the new technology. We hope that in the future more researchers become aware of this fact.


Appendix A

Technical Information for Developers

In the following, we give some detailed information about the implementation of StEAM. It is meant to make it easier for developers to become acquainted with the source package of the model checker.

A.1 IVM Source Files

The original package of ivm includes the virtual machine and a modified version of the gcc compiler, igcc, which produces the ELF executables to be interpreted by ivm. As stated before, the approach of StEAM does not require changes to igcc, so only the source files for the virtual machine are of relevance. Moreover, only the files located in the subfolder ivm/vm need to be considered. These include:

icvmvm.c

In the original package, this file contained the loop which reads and executes the program's opcodes. For StEAM, this was replaced by a loop which expands program states according to the chosen algorithm. The original program execution is only done until the main thread registers with the model checker. At this point, the contents of the stack, the machine, the global variables and the dynamic memory are memorized to build the initial state (i.e. the root state in the search tree).

The source file also holds the functions getNextState, expandState and iterateThread. The first function chooses the next open state to be expanded according to the used search algorithm. The second function expands a given state and adds the generated successors to the open list. The third function iterates a given thread one atomic step by executing the corresponding machine instructions.
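Conceptually, the interplay of these functions can be pictured as the following simplified exploration loop; the types, signatures and the toy successor generation are placeholders and differ from the actual code in icvmvm.c:

#include <cstdio>
#include <deque>

// Placeholder for a full program configuration (stack, machine, globals, heap).
struct State { int depth; };

static std::deque<State*> open_list;              // stand-in for StEAM's open list

// Choose the next open state according to the search algorithm (here: DFS).
static State* getNextState() {
  if (open_list.empty()) return nullptr;
  State* s = open_list.back();
  open_list.pop_back();
  return s;
}

// Iterate one thread of s for one atomic step (stub: bounded toy successor).
static State* iterateThread(State* s, int /*thread*/) {
  return (s->depth < 3) ? new State{s->depth + 1} : nullptr;
}

// Expand a state: run every thread once and put the successors on the open list.
static void expandState(State* s, int num_threads) {
  for (int t = 0; t < num_threads; ++t)
    if (State* succ = iterateThread(s, t)) open_list.push_back(succ);
}

int main() {
  open_list.push_back(new State{0});              // the initial (root) state
  while (State* s = getNextState()) {
    // ... a real model checker would test s for deadlocks and assertion
    //     violations here, and report the error trail if one is found ...
    expandState(s, /*num_threads=*/2);
    delete s;
  }
  std::printf("exploration finished\n");
  return 0;
}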


icvmsup.c

This file implements the instructions of the virtual machine in a huge collection of mini-functions. Due to its size of more than 140000 lines, it is compiled to 14 separate object files (icvmsup.o, icvmsup1.o, ..., icvmsup13.o) before linking. Compiling icvmsup takes quite a long time (several minutes on a 1.8GHz machine). This is quite inconvenient in cases where a make clean needs to be done (e.g. when constants in a header file were changed), as the original makefile will also delete the icvmsup object files, even if the machine instructions are not affected by the change. In this case it may be convenient to move the icvmsup object files to a temporary folder before cleaning up and to move them back before the subsequent make all. The source file icvmsup.c was not modified during the development of StEAM. However, the file is useful to find out about the meaning of a certain instruction. In icvmsup.c, an instruction can be located through its corresponding opcode. For example, if we disassemble the dining philosophers from Chapter 8 with iobjdump -d philosophers | less, we find a fragment of machine code:

...
20: 0200 7c2c 0000    bsrl 2c9c <___do_global_ctors>
26: a8eb 0800         movel (8,%r2),-(%sp)
2a: a8eb 0400         movel (4,%r2),-(%sp)
...

We may want to find out about the instruction bsrl <x> (branch to subroutine, long). Note that iobjdump displays opcodes according to the hi/low byte order of the underlying machine. In our case, we have an Intel processor with Little-Endian notation, which means that the corresponding opcode is 0x2 rather than 0x200. In icvmsup.c we find a line starting with:

_If1(_sCz2,2,s32) ...

The instructions for ivm are generated through a macro, whose first parameter is the opcode with a leading "_sCz". Note that the opcode is appended without any leading zeros. We can easily interpret the implementation of the instruction, even without knowing how the macro translates to compilable C code:

(R_SP-=8,WDs32(R_SP,R_PC+6,0));R_PC=(is32(2)+R_PC);

First, the stack pointer (R_SP) is decremented to reserve space for the return address of the called subroutine. Then, that address is stored at the new position of the stack pointer through the macro WDs32 (write signed 32-bit data). The return address corresponds to the current program counter plus 6, the length of the bsrl instruction. Afterwards, the program counter is set to the current position plus a 32-bit offset that follows the opcode. Apparently, the macro is32(y) reads a 32-bit parameter from the address of the program counter plus y.
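For illustration, the macro body can be read as the following annotated paraphrase; the register variables and the helpers write_s32/read_s32 are hypothetical stand-ins for the actual macros defined in cvm.h:

#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the VM's registers and memory-access macros.
static uint8_t memory[1 << 16];
static uint32_t R_SP = 0x8000, R_PC = 0x0020;

static void write_s32(uint32_t addr, int32_t v) { std::memcpy(memory + addr, &v, 4); }
static int32_t read_s32(uint32_t addr) { int32_t v; std::memcpy(&v, memory + addr, 4); return v; }

// Paraphrase of: (R_SP-=8,WDs32(R_SP,R_PC+6,0));R_PC=(is32(2)+R_PC);
static void bsrl_paraphrase() {
  R_SP -= 8;                         // reserve space for the return address
  write_s32(R_SP, R_PC + 6);         // push return address: PC + 6 (length of bsrl)
  R_PC += read_s32(R_PC + 2);        // jump by the 32-bit offset after the 2-byte opcode
}

int main() { bsrl_paraphrase(); return 0; }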

cvm.h

This header defines the basic data types needed for the virtual machine, such as the machine registers. The macros used by the mini-functions in icvmsup.c - such as WDs32 - are also defined here. As a big convenience, all memory read and write accesses are encapsulated through the macros of this header. This enabled us to record all such accesses by simply enhancing the macros. On the one hand, this allows us to determine changes made by a state transition without fully comparing the pre- and successor state. On the other hand, we can capture illegal memory accesses before they actually happen.

A.2 Additional Source Files

The following source files were added for StEAM.

include/*

The name is somewhat misleading, as this folder also contains ".c" files. More precisely, it contains the AVL-tree package used for the lock- and memory pool of states in StEAM. Using the AVL trees may not have been the best choice for storing the pools, as their handling is tedious and a simple hash table may be faster in many cases. Developers are hereby encouraged to replace the AVL trees with a more suitable data structure.

mysrc/*

This folder holds the source files exclusively written for StEAM - these include:

IVMThread.h|.cc

This is the class definition for threads in StEAM. All new thread classes must be derived from IVMThread. A thread registers with the model checker by executing a TREG command pattern in its start() method. A second TREG statement tells StEAM that it has reached the run() method of the thread.

mfgen.c

This is the source code of the tool which is parameterized with the names of all involved source files of the checked program. In turn, it generates a makefile for building the ivm executable that can be model checked by StEAM.


extcmdlines.cpp

This is the source code annotation tool. Before the investigated program is compiled, its source is annotated with line number information and information about the name of the source file through command patterns. The annotation is automatically done in the makefile generated by mfgen and, hence, remains transparent to the user. It must be ensured, though, that extcmdlines.cpp is compiled to an executable called extcml, which must lie in the system path.

The source annotation tool is actually a workaround, as StEAM currently does not use the debug information that can be compiled into ELF files (e.g. using the -g option of gcc). Writing a parser for this information would have cost too much time considering the manpower available for the project.

Developers are encouraged to add this functionality to StEAM, as it would make the annotation redundant and speed up the entire model checking process.

pool.h|.c

These files define the functions needed for the lock- and memory-pool of StEAM's state description.

icvm_verify.h|.c

These source files are the core of StEAM, as they define the command patterns, data structures and functions needed by the model checker. A command pattern is generated by the macro INCDECPATTERN, which translates into an otherwise meaningless sequence of increment and decrement instructions. The parameters of a pattern are realized by assigning their values to local variables. After parsing the command pattern, StEAM can obtain the values of the parameters directly from the stack.
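As a purely hypothetical illustration of the idea (the actual INCDECPATTERN macro in icvm_verify.h may look quite different), a command pattern could be written like this:

#include <cstdio>

/* Hypothetical sketch of a command pattern, NOT the actual StEAM macro.
   The idea: emit a sequence of increments and decrements that ordinary compiled
   code would never contain, so the model checker can recognize it while
   interpreting the machine instructions; the parameter is placed in a local
   variable, from which the model checker can read it off the stack. */
#define MY_COMMAND_PATTERN(arg)            \
  do {                                     \
    volatile int __pattern_arg = (arg);    \
    __pattern_arg++; __pattern_arg--;      \
    __pattern_arg++; __pattern_arg--;      \
    __pattern_arg++; __pattern_arg--;      \
  } while (0)

int main() {
  MY_COMMAND_PATTERN(42);   // would be spotted by a (hypothetical) model checker
  std::printf("pattern executed\n");
  return 0;
}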

Most functions defined in icvm_verify.c relate to program states. This includes cloning of a state, comparison of two states, deletion of states, etc.

Moreover, icvm_verify.c implements the functions of StEAM's internal memory management. The management is encapsulated through the functions gmalloc(int s) and gpmalloc(int s, char * p), which allocate memory of a given size or of a given size plus a given purpose. When StEAM is invoked with the parameter -memguard, all memory regions allocated with gmalloc or gpmalloc are stored in a private memory pool. NOTE: This refers to the memory pool of the model checker and must not be confused with that of the investigated program. It is possible to print the contents of the memory pool using the function dumpMemoryPool(). StEAM's internal memory management is useful to, e.g., detect memory leaks in the model checker. For instance, this was important during the implementation of state reconstruction (cf. Chapter 7). Here, states are frequently deleted after being replaced by a corresponding mini-state. If a few bytes of the state are not freed, the resulting leaks will quickly exceed the available memory.

The source file icvm_verify.c also implements the currently supported heuristics through the function getHeuristicEstimate. The latter function is called by getNextState() in icvmvm.c, which chooses the state to be expanded next according to its heuristic value.


Note that, for simplicity, undirected search is also implemented through heuristic values. For BFS, the heuristic estimate of a state corresponds to its depth in the search tree. For DFS, the estimate is determined by subtracting the depth from a constant which is greater than any search depth we may ever encounter.
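In code, this convention might look roughly as follows (a hypothetical fragment, not the literal getHeuristicEstimate implementation):

#include <cassert>
#include <cstdio>

// A constant guaranteed to be greater than any search depth we will encounter.
static const int MAX_DEPTH = 1 << 20;

enum Mode { MODE_BFS, MODE_DFS };

// Undirected search expressed through "heuristic" estimates for best-first
// search: BFS prefers shallow states, DFS prefers deep ones (smaller is better).
static int undirectedEstimate(Mode mode, int depth) {
  assert(depth >= 0 && depth < MAX_DEPTH);
  return (mode == MODE_BFS) ? depth : MAX_DEPTH - depth;
}

int main() {
  std::printf("BFS estimate at depth 5: %d\n", undirectedEstimate(MODE_BFS, 5));
  std::printf("DFS estimate at depth 5: %d\n", undirectedEstimate(MODE_DFS, 5));
  return 0;
}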

steam.c

This is the frontend of the model checker, which calls the executable of the modified virtual machine. Environment variables are used to pass the parameters.

hash.h|.c

These files implement the (non-)incremental hashing in StEAM. The hash function is parameterized with a bitmask, which determines the hashed components of the state description. The frontend of steam currently only gives two choices: either only the CPU registers are hashed (default) or the entire state (-fullhash). The CPU registers were chosen for the partial hash option, since they constitute the state component that is most likely to be changed by a state transition. Developers are encouraged to try other subsets of components.
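The principle of a bitmask-selected hash can be sketched like this; the component layout, mask names and hash function are hypothetical, while the real hash.c operates on StEAM's actual state description:

#include <cstdint>
#include <vector>

// Which state components to include in the hash value.
enum ComponentMask : unsigned {
  HASH_REGISTERS = 1u << 0,
  HASH_STACK     = 1u << 1,
  HASH_GLOBALS   = 1u << 2,
  HASH_HEAP      = 1u << 3,
  HASH_FULL      = HASH_REGISTERS | HASH_STACK | HASH_GLOBALS | HASH_HEAP
};

struct StateView {                       // raw views of the state's components
  std::vector<uint8_t> registers, stack, globals, heap;
};

static uint64_t hashState(const StateView& s, unsigned mask) {
  uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
  auto mix = [&h](const std::vector<uint8_t>& bytes) {
    for (uint8_t b : bytes) { h ^= b; h *= 1099511628211ull; }
  };
  if (mask & HASH_REGISTERS) mix(s.registers);   // default: registers only
  if (mask & HASH_STACK)     mix(s.stack);
  if (mask & HASH_GLOBALS)   mix(s.globals);
  if (mask & HASH_HEAP)      mix(s.heap);
  return h;
}

int main() {
  StateView s;
  s.registers = {1, 2, 3};
  s.globals   = {4, 5};
  return hashState(s, HASH_REGISTERS) == hashState(s, HASH_FULL) ? 1 : 0;
}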

A.3 Further Notes

The functions getNextState, expandState, iterateThread and printTrail actually do not belong in the file icvmvm.c, and it would be a good idea to move them to icvm_verify.c.

When StEAM is started with the -srctrl option, the error trail is printed as the actual source lines of the checked program (rather than just the line numbers). For this, StEAM makes use of the Unix commands cat and grep. Hence, the option will not work on systems where these commands are not available. It would be desirable to rewrite the printTrail function such that it no longer requires any external programs.

As a definite drawback, states must currently be fully expanded. The problem is that it is not known a priori whether a non-deterministic statement is encountered during thread iteration. In the latter case, the execution of a thread yields more than one successor. For this reason, state reconstruction is currently not available for programs involving non-deterministic statements. Also, the full expansion is disadvantageous for depth-first variants, such as DFS and IDA*, since at depth d the model checker needs to memorize d · o states instead of d, where o is the outgoing degree of a node in the search tree. Hence, a big improvement would be to add the possibility to directly generate the i-th successor of a state.


Appendix B

StEAM User Manual

B.1 Introduction

StEAM (State Exploring Assembly Model checker) is a model checker for native concurrent C++ programs. It extends a virtual machine - called ICVM - to perform model checking directly on the assembly level. This document gives you the basic information needed to install and use StEAM. The example programs included in the installation package are supposed to show the current capabilities of StEAM. However, we encourage you to try to verify your own programs. This manual will also explain the steps needed to prepare a piece of software for verification with StEAM. For detailed information about the internals of the model checker, we refer to [ML03].

B.2 Installation

The installation instructions assume that StEAM will be installed on a Linux system using gcc. StEAM is based on the "Internet Virtual Machine" (IVM). The IVM package comes with source code for the compiler and the virtual machine (vm) itself. In the course of the StEAM project, only the vm needed to be enhanced, while the compiler was left unchanged. The task of building igcc, the compiler of IVM, is hence left to the user. The rest of the model checker, including the modified virtual machine, can either be obtained as a binary package for Linux or as a source package. The source package can be compiled for use on Linux and Cygwin (a Linux-like environment for Windows).

B.2.1 Installing the Compiler

The original source package of IVM can be downloaded from the project's home page¹. The package was found to compile with older versions of gcc (e.g. 2.95.4). However, later versions such as 3.4.4 require considerable changes to the makefile in order to build igcc. You may obtain an adapted version of the source package, which allows building the compiler with more recent versions of gcc, from the StEAM page². Note that this applies only to igcc. If you want to compile the original virtual machine with modern versions of gcc, additional changes to the source package may be necessary.

¹ http://ivm.sourceforge.net/

Once you have built igcc, make sure that the directory containing the binaries (i.e. igcc, ig++, ...) is included in the PATH variable.

B.2.2 Installation of StEAM

In the following, we assume that you have downloaded the binary installation package of StEAM. If you prefer to compile the source package, please refer to the installation instructions given there. The binary installation package of StEAM contains the enhanced virtual machine, the frontend of the model checker, the source annotation tool, the makefile generator and some header files. Unpack the contents of the package to an arbitrary directory, e.g. "/home/foo/".

Next, set the following environment variables:

PATH=$PATH:/home/foo/steam_release/bin

STEAM_INCLUDE_PATH=/home/foo/steam_release/include

C_INCLUDE_PATH=$STEAM_INCLUDE_PATH

CPLUS_INCLUDE_PATH=$C_INCLUDE_PATH

B.3 A First Example

If you followed the instructions in Section B.2, you should now be able to check one of the example programs. Go to the directory /home/foo/steam_release/models/philo. Here lies a C++ implementation of the dining philosophers problem. If you list the directory, you will notice the following files:

Makefile
Philosopher.cc
Philosopher.h
philosophers.c

Besides the Makefile, there are the files Philosopher.h|.cc, which describe the C++ class of a philosopher thread. If you look at the content of Philosopher.cc (e.g. with less Philosopher.cc), you will notice that it basically locks and unlocks two global variables which were passed to the constructor function. The file philosophers.c contains the main() method which generates and starts the threads.

² http://ls5-www.cs.uni-dortmund.de/~mehler/uni-stuff/steam.html

Compile the program by typing:

make

This will first apply annotations to the .c and .cc source files. After this, the example program is compiled and linked with ig++ - the compiler of IVM. Do another directory listing and you will see the following:

Makefile
Philosopher.cc
Philosopher.cc_ext.c
Philosopher.h
Philosopher_ex.o
philo
philo_ex.o
philosophers.c
philosophers.c_ext.c

The source files containing "_ext" in their name were produced by the source annotation tool. The file philo is the IVM object file compiled from the annotated source files. Start the model checker on the example program by typing:

steam philo 2

This will start the model checker with the default parameters on the compiled program philo, parameterized with '2'. In the case of the dining philosophers, this means that 2 instances of the thread class Philosopher are created. The command will produce some standard information from the model checker, followed by the report of a detected deadlock, an error trail and some statistics - such as the number of generated states, memory requirements, etc. Each line of the error trail tells us the executed thread, the source file and the line number of the executed instruction. Here, Thread 1 refers to the main program, while Thread 2 and Thread 3 correspond to the first and second instance of Philosopher. However, the error trail may seem a bit lengthy for such a small program. Now type:

steam -BFS philo 2


This tells the model checker to use breadth-first search (BFS) instead of depth-first search, which is the default algorithm. The error trail produced by BFS will be significantly shorter than the first one. You may want to compare the source files with the information given in the trail to track down the deadlock. Alternatively, you can add the parameter -srctrl to the command line. This prints the error trail as actual source lines - rather than just as line numbers. Apparently, a deadlock occurs if both Philosophers lock their first variable and wait for the respective other variable to be released.

B.4 Options

StEAM is invoked by a call:

steam <[-optargi]> <progname> [<prog. arguments>], where

• <progname> is the name of the program to be checked.

• <prog. arguments> are the parameters passed to the program to be checked.

• <[-optargi]> are optional model checker parameters.

steam -h

Prints out the list of available parameters. The parameters come from one of the following categories:

B.4.1 Algorithms

In the current version, StEAM supports depth-first search (default), breadth-first search (-BFS), depth-first iterative deepening (-IDFS) and the heuristic search algorithms best-first (-BESTFIRST) and A* (-ASTAR). The latter two additionally require that a heuristic is given (see B.4.2).

Also note that IDFS is only effective in combination with bitstate-hashing (see below).

B.4.2 Heuristics

For detailed information about each heuristic, we refer to [LME04]. The heuristic to use is specified by the parameter -heuristic <hname>, where <hname> is from one of the following groups.

Deadlock heuristics

These heuristics are specifically tailored to accelerate the search for deadlocks, but will in general not be helpful for finding other errors, such as assertion violations. StEAM currently supports the deadlock heuristics mb, lnb, pl1, pl2, pb1 and pb2. For example:


steam -BESTFIRST -heuristic lnb philo 30

Uses best-first search and the lock and block heuristic to model check the dining philosophers with 30 instances of the Philosopher class. The BESTFIRST parameter could have been omitted in this case, because StEAM uses best-first as the default algorithm when a heuristic is given. You may compare the results of the directed search with those of undirected search (type 'steam philo 30' or 'steam -BFS philo 30').

Structural Heuristics

Rather than searching for a specific kind of error, structural heuristics [GV02] exploit properties of concurrent systems and the underlying programming language, such as the degree of thread interleaving and the number of global variable accesses. The currently supported structural heuristics are aa, int and rw.

B.4.3 Memory Reduction

State Reconstruction (experimental!)

When the option -rec is given, StEAM does not store generated states explicitly, but identifies each state by its generating path. This trades computation time for a reduced memory consumption. As of version 0.1 of StEAM, this option will not work properly with programs that make use of the non-deterministic range-statement (cf. Section B.6).

Lock-Global-Compaction

The parameter -lgc activates lock-global compaction, which allows thread switches only after either a global variable was changed or a resource was locked. In experiments, this reduction method has been shown to considerably reduce the memory requirements of a model checking run. However, it has not yet been formally proven that lock-global compaction preserves completeness.

B.4.4 Hashing

By default, StEAM only hashes the CPU registers of a program state. Other options are:

Full Hashing

The option -fullhash instructs StEAM to hash the entire program state.

Incremental Hashing

Given the option -inch, the entire state description is hashed in an incremental fashion, which is usually faster than -fullhash alone. Note that -inch implies full hashing.


Bitstate Hashing

When option -bitstate is given, closed states are not completely stored but mapped to a position in a bit-vector. Bitstate hashing should always be used with full hashing, as this minimizes the chance that StEAM will miss states.

B.4.5 Debug Options

These options are only interesting for developers who want to write their own version of StEAM.

GDB

The option -GDB starts StEAM in the GNU debugger gdb.

Memory Management

Passing the option -memguard activates StEAM's internal memory management. This is helpful, e.g., to find memory leaks in the model checker (but not in the checked program).

B.4.6 Illegal Memory Accesses

Given the option -illegal, StEAM will explicitly look for illegal memory accesses (segmentation faults). As this may significantly slow down the exploration, it is not a default option.

B.4.7 Other Options

Simulation

The option -sim tells StEAM to simulate (execute) the program instead of exploring its state space.

Time and Memory limit

You can set the maximum amount of memory in megabytes (-mem <mb>) and the maximum runtime in seconds (-time <s>) allowed for the model checking run. When one of these limits is exceeded, the model checker prints its final statistics and terminates.

Log File

If you pass the option -FSAVE to the model checker, the output will be written to a file instead of the standard output.


Source Trail

When option -srctrl is given, the error trail is printed as actual source lines, rather than as line numbers. StEAM currently uses external Unix commands to print single lines from source files. As a drawback, the printing of the trail cannot be interrupted using ctrl-c, as this terminates the external program rather than the model checker. This problem should be fixed in future versions.

No Trail

The option -nt suppresses the printing of the error trail - this is useful while doing experiments where the actual trail is of no interest.

B.5 Verifying your own Programs

In this section, we give a small tutorial on how your own code can be verified with StEAM.

B.5.1 Motivation

The desire to verify software may in principle arise for two reasons: either we already have some faulty implementation of a system and want to use model checking to detect the error; or we are just starting with our implementation and want to use model checking to avoid the introduction of errors while our software is being developed. In this tutorial, we want to concentrate on the latter case. Using assembly model checking in the implementation phase yields several advantages. On the one hand, it may reduce or even replace the efforts spent on verifying properties of a formal description of the program. Skipping the formal modeling will not only save a considerable amount of development time; there is also no need for the developer to learn the underlying description language (e.g. Promela). Furthermore, assembly model checking is able to find implementation errors which were not present in the software's specification.

The concurrency in StEAM can also be used in two ways. Either the investigated software itself describes a concurrent system; in this case, we define the processes in the software in terms of threads in StEAM. Or the software itself is not concurrent; in the latter case, we can use threads to simulate the environment of our system.

B.5.2 The Snack Vendor

As an example, let us assume we want to develop the control software of a snack vendor machine. As its basic functionality, the machine should accept that coins are inserted to raise the credit and that buttons are pressed to buy a certain snack. Figure B.1 shows the bare bones of the software, formed by two minimal functions: insertCoin() and pressButton().

When a coin is being inserted, the credit is raised by the appropriate value. When a button is pressed, the machine first checks if there is sufficient credit to buy the snack.


If this is the case, the price of the snack is subtracted from the credit. Two counters - got and sold - keep track of how much money was received and of the total value of the snacks sold. A first property we would like to check is that the total value of sold snacks is always less than or equal to the amount of money received, i.e. we want to check for the invariant (got>=sold). Currently, StEAM supports the detection of deadlocks and local assertions. However, we are also able to check for invariants by introducing an observer thread. The source code of this observer is shown in Figure B.2. As can be seen, a thread class in StEAM must be derived from the abstract class "IVMThread" and must implement the methods start(), run() and die(). The start() method makes preparations for the generated instance and must finally call the run() method. The call to the run() method tells StEAM to insert the new thread into the running system. The run() method itself must contain the actual loop of the thread. The die() method is reserved for future versions of StEAM and currently has no effect - it must be implemented, but currently the body can be left empty. The statement VASSERT(bool e) is used to define local assertions. If the program reaches the control point of the statement and the expression e evaluates to 'false', StEAM returns the path that leads to this state and reports an error.

Note that we do not have to put our assertion into an infinite loop. We can rely on searching only for those error states where the observer thread is executed right when the assertion does not hold. Thus, we avoid the generation of equivalent paths to the same error state.

Furthermore, we need to simulate the environment of our system - in this case, the users of the machine. One way to achieve this is to introduce two derived thread classes: one that inserts coins and one that presses buttons. The header and source files for these are shown in Figures B.3 and B.4. Both classes use the statement RANGE(varname n, int min, int max), which causes a value between min and max to be non-deterministically chosen as the value of the integer variable n.

Although, in its current form, the source code can already be compiled to an IVM executable, StEAM will not be able to interpret the program correctly. First, the .c and .cc files need to be annotated with additional information. Fortunately, this remains mostly transparent to the user. The StEAM package comes with the tool mfgen. The latter takes as its parameter a set of source files and generates a Makefile which can be used to build the model-checkable file. In the directory where the sources for our vendor machine are located, simply type:

mfgen *

followed by

make

This will create the binary file “vendor”.


B.5.3 Finding, interpreting and fixing errors

Now, start the model checker by typing:

steam -BFS vendor

After a (hopefully short) time, the model checker will print the error trail and report an assertion violation. Note that we have used breadth-first search (BFS) to detect the error, but the default algorithm is depth-first search (DFS). In most cases, DFS finds errors faster than BFS, but BFS is guaranteed to return the shortest trail. However, in some cases BFS also finds an error faster than DFS - this can happen if the search tree is deep but the error is located at a shallow level. You may try to run the model checker again without the -BFS option, which will probably exceed the memory limit of your machine. Take some time to interpret the error trail before you read on ...

It should be obvious what happened: the button was pressed immediately after the credit was increased, but before the coin value was added to got. We can fix the error by declaring the two variable increases as an atomic block, i.e.:

BEGINATOMIC;
credit += amount;
got += amount;
ENDATOMIC;

When you recompile the program (make) and start the checker anew, the error will no longer occur.

B.6 Special-Purpose Statements

The special-purpose statements in StEAM are used to define properties and to guide the search. In general, a special-purpose statement is allowed at any place in the code where a normal C/C++ statement would be valid³. In Section B.5 we already used some of these statements for verifying the vendor machine. Now, we summarize the statements that are currently supported by StEAM.

³ A following semicolon is optional.

BEGINATOMIC
This statement marks an atomic region. When such a statement is reached, the execution of the current thread is continued until a subsequent ENDATOMIC. A BEGINATOMIC statement within an atomic region has no effect.

ENDATOMIC
This statement marks the end of an atomic region. An ENDATOMIC outside an atomic region has no effect.

RANGE(<varname>, int min, int max)
This statement defines a nondeterministic choice over a discrete range of numeric variable values. The parameter <varname> must denote a variable name that is valid in the current scope. The parameters min and max describe the lower and upper bounds of the value range. Internally, the presence of a RANGE statement corresponds to expanding a state to have max − min + 1 successors. A RANGE statement must not appear within an atomic region. Also, the statement currently only works for int variables.

VASSERT(bool e)
This statement defines a local property. When, during program execution, VASSERT(e) is encountered, StEAM checks if the corresponding system state satisfies the expression e. If e is violated, the model checker provides the user with the trail leading to the error and terminates.

VLOCK(void * r)
A thread can request exclusive access to a resource with a VLOCK. This statement takes as its parameter a pointer to an arbitrary base type or structure. If the resource is already locked, the thread must interrupt its execution until the lock is released. If a locked resource is requested within an atomic region, the state of the executing thread is reset to the beginning of the region.

VUNLOCK(void * r)
Unlocks a resource, making it accessible to other threads. The executing thread must be the holder of the lock. Otherwise, this is reported as an access violation and the error trail is returned.
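As a small illustrative sketch in the style of the examples from Section B.5 (the function deposit, the variables balance and total, and the account parameter are hypothetical), the statements can be combined as follows:

#include "icvm_verify.h"

extern int balance;                   /* hypothetical shared counters, assumed */
extern int total;                     /* to be defined elsewhere in the model  */

void deposit(int * account)
{
  int amount = 0;
  RANGE(amount, 1, 3);                /* non-deterministically pick 1, 2 or 3  */
  VLOCK(account);                     /* request exclusive access              */
  BEGINATOMIC;                        /* the two updates form one atomic step  */
  balance += amount;
  total   += amount;
  ENDATOMIC;
  VASSERT(balance <= total);          /* local property at this control point  */
  VUNLOCK(account);                   /* release the resource                  */
}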

B.7 Status Quo and Future of StEAM

The model checker StEAM is an attempt to realize software model checking without abstraction and without the need to manually construct models in specialized languages such as Promela. In its current form, the tool is already able to find errors in small programs. The large search space induced by real programs remains the main challenge of software model checking, which may inhibit the finding of an error. Therefore, one of the main issues of future development lies in finding good heuristics. As shown in previous work [LME04], an appropriate heuristic is able to find errors fast, even if the underlying search space is huge.

Also, we will improve the memory efficiency of StEAM by optimizing its state representation. Future versions of StEAM will also support memory-efficient techniques such as bitstate hashing. In the long term, our vision is to create a user-friendly application which facilitates software development and maintenance without the need for in-depth knowledge about model checking technology.


/*************************************************
** vendor.c
**
** The bare bones of a snack vendor machine
**
** written by Tilman Mehler
*************************************************/

#include "vendor.h"
#include "icvm_verify.h"
#include "Inserter.h"
#include "Presser.h"
#include "Checker.h"

int credit = 0;
int coin_values[3] = {10, 50, 100};
int price[3] = {50, 100, 200};
int sold = 0;
int got = 0;

void main()
{
  BEGINATOMIC;
  (new Inserter())->start();
  (new Presser())->start();
  (new Checker())->start();
  ENDATOMIC;
}

void insertCoin(int amount)
{
  if (credit + amount <= MAX_CREDIT) {
    credit += amount;
    got += amount;
  }
}

void pressButton(int n)
{
  if (price[n] <= credit) {
    credit -= price[n];
    sold += price[n];
  }
}

Figure B.1: Source code of the snack vendor software


/****************************************
** Checker.h
**
** Declarations for the Checker-class
****************************************/

#ifndef CHECKER_H
#define CHECKER_H
#include "IVMThread.h"

class Checker : public IVMThread
{
public:
  Checker();
  virtual void start();
  virtual void run();
  virtual void die();
};
#endif

/****************************************
** Checker.cc
**
** Definitions for the Checker-class,
** which makes an invariant-check
****************************************/

#include "Checker.h"
#include "icvm_verify.h"

extern int credit;
extern int got;
extern int sold;

Checker::Checker() {}

void Checker::start() { run(); }

void Checker::run()
{
  VASSERT(got >= sold);
}

void Checker::die() {}

Figure B.2: Source code of our observer thread


/****************************************
** Inserter.h
**
** Declarations for the coin-inserter
** Thread-class
****************************************/

#ifndef INSERTER_H
#define INSERTER_H
#include "IVMThread.h"

class Inserter : public IVMThread
{
public:
  Inserter();
  virtual void start();
  virtual void run();
  virtual void die();
};
#endif

/****************************************
** Inserter.cc
**
** Definitions for the coin-inserter
** Thread-class
****************************************/

#include "vendor.h"
#include "Inserter.h"
#include "icvm_verify.h"

extern int coin_values[3];

Inserter::Inserter() {}

void Inserter::start() { run(); }

void Inserter::run()
{
  int i = 0;
  while (1) {
    RANGE(i, 0, 3);
    insertCoin(coin_values[i]);
  }
}

void Inserter::die() {}

Figure B.3: Source code of the coin-inserter thread


/****************************************
** Presser.h
**
** Declarations for the button-presser
** Thread-class
****************************************/

#ifndef PRESSER_H
#define PRESSER_H
#include "IVMThread.h"

class Presser : public IVMThread
{
public:
  Presser();
  virtual void start();
  virtual void run();
  virtual void die();
};
#endif

/****************************************
** Presser.cc
**
** Definitions for the button-presser
** Thread-class
****************************************/

#include "vendor.h"
#include "Presser.h"
#include "icvm_verify.h"

Presser::Presser() {}

void Presser::start() { run(); }

void Presser::run()
{
  int i = 0;
  while (1) {
    RANGE(i, 0, 3);
    pressButton(i);
  }
}

void Presser::die() {}

Figure B.4: Source code of the button-presser thread


Appendix C

Paper Contributions

In the following, I specify my contributions to the papers published in the course of my research.

Bytecode Distance Heuristics and Trail Direction for Model Checking Java Programs [EM03]. This paper is based on my master's thesis, which was supervised by Stefan Edelkamp.

Introduction to StEAM - An Assembly-Level Software Model Checker [ML03] This is the first paper that gives a technical description of the software model checker StEAM. It was my idea to enhance the virtual machine IVM into a model checker. Moreover, I devised and implemented:

• The enhancement of the virtual machine with multi-threading.

• The state description.

• The incremental storing of states.

• The two supported search algorithms (depth-first and breadth-first).

• The testing for deadlocks and assertion violations.

• The concept of ”Command Patterns”.

• The hashing.

• All models for the experiments.

Peter Leven helped in the initial installation of the IVM package. Moreover, he implemented the lock- and memory-pool based on an existing AVL-tree package.

I also conducted and interpreted the experiments and wrote the actual paper. Mr. Leven provided useful comments.


Directed Error Detection in C++ with the Assembly-Level Model Checker StEAM [LME04] In this paper, I enhanced StEAM with the heuristic search algorithms best-first and A*. Moreover, I implemented the heuristics mb, int, rw, lnb and aa, where rw, lnb and aa are new heuristics which I devised.

Peter Leven devised and implemented the heuristics pba, pbb, pl1 and pl2. Also, he proposed the lgc-compaction, which I implemented.

The new experiments were mainly conducted by me. Mr. Leven helped with the interpretation and contributed some scripts. The idea to illustrate the experimental results as the geometric mean of four factors was proposed by Mr. Leven, while I wrote the programs which generated the GNU-Plot diagrams from the raw experimental data.

Stefan Edelkamp invested considerable effort in making the paper sound and more readable.

I gave the presentation at the conference.

Planning in Concurrent Multiagent Systems with the Assembly-Level Model Checker StEAM [ME04] Stefan Edelkamp had the initial idea to make StEAM applicable to multi-agent systems.

The content of the paper, including the MAMP formalism and the execution and interpretation of the experiments, is mainly my contribution. Again, Stefan Edelkamp helped to improve the soundness and readability of the paper.

I presented the paper at the conference.

Incremental Hashing in State Space Search [EM04] I motivated the research on incremental hashing as an important factor of StEAM's further development. Through his expertise, Stefan Edelkamp contributed the larger part of the formal framework. I was again responsible for the execution and interpretation of the experiments. For this, I enhanced the Atomix solver Atomixer with an incremental hash function.

I gave the presentation for the paper.

Directed C++ Program Verification [Meh05] This paper gives an overview of my research. I am the only author.

Incremental Hashing for Pattern Databases [EM05a] Here, the concept of incremental hashing was transferred to the domain of action planning. Stefan Edelkamp contributed the major part of the formal framework. I enhanced the planner mips with an incremental hash function and executed and evaluated the experiments.

I gave the presentation for the paper.

Knowledge Acquisition and Knowledge Engineering in the ModPlan Workbench [EM05b] This paper describes the integrated planning environment ModPlan, which was developed in the project group 463, for which I was the co-supervisor. I was mainly responsible for helping the students with practical problems (mostly in programming). I also contributed comments for the paper. Moreover, I participated in the presentation of the tool at the Knowledge Engineering Competition at ICAPS05.

Tutorial on Directed Model Checking [EMJ05] These are the proceedings of a tutorial at ICAPS05, which was organized by Stefan Edelkamp. Shahid Jabbar and I contributed additional material and case studies. Moreover, I prepared and gave the talk about incremental hashing and state compaction, which constituted about one fifth of the overall tutorial.

Dynamic Incremental Hashing and State Reconstruction in Program Model Checking [ME05b] In this paper, incremental hashing was applied to program model checking for the first time. I enhanced the formal framework of the previous papers with its applicability to dynamic and structured state vectors, which makes up the larger part of the paper. Moreover, I implemented incremental hashing for the model checker StEAM. As its second contribution, the paper introduces state reconstruction, which was Stefan Edelkamp's idea in the first place. The elaboration of the concept is approximately 50% my work and 50% the work of Stefan. Again, I implemented the approach in StEAM and executed and evaluated the experiments.

The paper was meant to record our research on two novel approaches and was hence published as a technical report at the University of Dortmund.

Dynamic Incremental Hashing in Program Model Checking [ME05a] This refines the incremental hashing approach from [ME05b]. The formal framework and the overall quality of the paper were thoroughly improved by both authors. Moreover, I enhanced the introduction for a better motivation of the approach.


Bibliography

[AM02] Y. Abdeddaim and O. Maler. Preemptive job-shop scheduling using stopwatch automata. In Workshop on Planning via Model Checking, pages 7–13, 2002.

[AQR+04a] T. Andrews, S. Qadeer, S. K. Rajamani, J. Rehof, and Y. Xie. Exploiting program structure for model checking concurrent software. In International Conference on Concurrency Theory (CONCUR), 2004.

[AQR+04b] T. Andrews, S. Qadeer, S. K. Rajamani, J. Rehof, and Y. Xie. Zing: A model checker for concurrent software. Technical report, Microsoft Research, 2004.

[AVL62] G. M. Adelson-Velskii and Evgenii M. Landis. An algorithm for the organization of information. Doklady Akademii Nauk SSSR, 146:263–266, 1962.

[AY98] R. Alur and M. Yannakakis. Model checking of hierarchical state machines. In International Symposium on Foundations of Software Engineering (FSE), pages 175–188, 1998.

[BC01] M. Benerecetti and A. Cimatti. Symbolic model checking for multi-agent systems. In Workshop on Computational Logic in Multi-Agent Systems (CLIMA), pages 312–323, 2001.

[BCCZ99] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 193–207, 1999.

[BCM+90] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. In Symposium on Logic in Computer Science (LICS), 1990.

[BCT03] P. Bertoli, A. Cimatti, and P. Traverso. Interleaving execution and planning via symbolic model checking. In Workshop on Planning under Uncertainty and Incomplete Information, pages 1–7, 2003.

[Ber04] Berkeley Center of Electronic System Design. BLAST User's Manual, 2004.


[BK00] F. Bacchus and F. Kabanza. Using temporal logics to express search control knowledge for planning. Artificial Intelligence, 116:123–191, 2000.

[BR02] T. Ball and S. K. Rajamani. The SLAM Project: Debugging System Software via Static Analysis. In Symposium on Principles of Programming Languages, pages 1–3, 2002.

[Bre03] M. Brenner. Multiagent planning with partially ordered temporal plans. In International Joint Conference on Artificial Intelligence (IJCAI), pages 1513–1514, 2003.

[CCGR99] A. Cimatti, E. M. Clarke, F. Giunchiglia, and M. Roveri. NUSMV: a new Symbolic Model Verifier. In Conference on Computer-Aided Verification (CAV), pages 495–499, 1999.

[CEMCG+02] A. Cimatti, E. Giunchiglia, E. M. Clarke, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella. NUSMV 2: An open source tool for symbolic model checking. In Conference on Computer-Aided Verification (CAV), pages 27–31, 2002.

[CES86] E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. In Transactions on Programming Languages and Systems, volume 8, pages 244–263, 1986.

[CGP99] E. M. Clarke, O. Grumberg, and D. A. Peled. Model checking. MIT Press, 1999.

[Coh97] J. D. Cohen. Recursive hashing functions for n-grams. ACM Transactions on Information Systems, 15(3):291–320, 1997.

[CS94] J. C. Culberson and J. Schaeffer. Efficiently searching the 15-puzzle. Technical report, University of Alberta, 1994. TR94-08.

[CS96] J. C. Culberson and J. Schaeffer. Searching with Pattern Databases. In Canadian Conference on AI, pages 402–416, 1996.

[CS98] J. C. Culberson and J. Schaeffer. Pattern databases. Computational Intelligence, 14(4):318–334, 1998.

[DBL02] H. Dierks, G. Behrmann, and K. G. Larsen. Solving planning problems using real-time model checking. In Workshop on Planning via Model Checking, pages 30–39, 2002.

[DIS99] C. Demartini, R. Iosif, and R. Sisto. dSPIN: A dynamic extension of SPIN. In Model Checking Software (SPIN), pages 261–276, 1999.

[dWTW01] M. M. de Weerdt, J. F. M. Tonino, and Cees Witteveen. Cooperative heuristic multi-agent planning. In Belgium-Netherlands Artificial Intelligence Conference (BNAIC), pages 275–282, 2001.


[Ede01] S. Edelkamp. Planning with pattern databases. In European Conference on Planning (ECP), pages 13–24, 2001.

[Ede03] S. Edelkamp. Promela planning. In Workshop on Model Checking Software (SPIN), pages 197–212, 2003.

[EJS04] S. Edelkamp, S. Jabbar, and S. Schrödl. External A*. In German Conference on Artificial Intelligence (KI), pages 226–240, 2004.

[ELL01] S. Edelkamp and A. Lluch-Lafuente. HSF-Spin User Manual, 2001.

[ELL04] S. Edelkamp and A. Lluch-Lafuente. Abstraction in directed model checking. In Workshop on Connecting Planning Theory with Practice, pages 7–13, 2004.

[ELLL01] S. Edelkamp, A. Lluch-Lafuente, and S. Leue. Directed model-checking in HSF-SPIN. In Workshop on Model Checking Software (SPIN), pages 57–79, 2001.

[ELLL04] S. Edelkamp, S. Leue, and A. Lluch-Lafuente. Directed explicit-state model checking in the validation of communication protocols. International Journal on Software Tools for Technology, 5(2-3):247–267, 2004.

[EM03] S. Edelkamp and T. Mehler. Byte code distance heuristics and trail direction for model checking Java programs. In Workshop on Model Checking and Artificial Intelligence (MoChArt), pages 69–76, 2003.

[EM04] S. Edelkamp and T. Mehler. Incremental hashing in state space search. In Workshop on New Results in Planning, Scheduling and Design (PUK), pages 15–29, 2004.

[EM05a] S. Edelkamp and T. Mehler. Incremental hashing for pattern databases. In Poster Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pages 17–20, 2005.

[EM05b] S. Edelkamp and T. Mehler. Knowledge acquisition and knowledge engineering in the ModPlan workbench. In International Competition on Knowledge Engineering for Planning and Scheduling, pages 26–33, 2005.

[EMJ05] S. Edelkamp, T. Mehler, and S. Jabbar, editors. ICAPS Tutorial on Directed Model Checking, 2005.

[EPP05] S. Evangelista and J.-F. Pradat-Peyre. Memory efficient state space storage in explicit software model checking. In International Workshop on Model Checking Software (SPIN), pages 43–57, 2005.

[FGK97] L. Fredlund, J. F. Groote, and H. Korver. Formal Verification of a Leader Election Protocol in Process Algebra. Theoretical Computer Science, 177(2):459–486, 1997.

[God97] P. Godefroid. Model checking for programming languages using VeriSoft. In ACM Symposium on Principles of Programming Languages, pages 174–186, 1997.


[GS97] S. Graf and H. Saidi. Construction of abstract state graphs of infinite systems with PVS. In Conference on Computer-Aided Verification (CAV), pages 72–83, 1997.

[GV02] A. Groce and W. Visser. Model Checking Java Programs using Structural Heuristics. In International Symposium on Software Testing and Analysis (ISSTA), pages 12–21, 2002.

[GV03] A. Groce and W. Visser. What went wrong: Explaining counter examples. In Workshop on Model Checking of Software (SPIN), pages 121–135, 2003.

[Hav99] K. Havelund. Java PathFinder user guide. Technical report, NASA Ames Research Center, 1999.

[HBG05] P. Haslum, B. Bonet, and H. Geffner. New admissible heuristics for domain-independent planning. In National Conference on Artificial Intelligence (AAAI), pages 1163–1168, 2005.

[HEFN01] F. Hüffner, S. Edelkamp, H. Fernau, and R. Niedermeier. Finding optimal solutions to Atomix. In German Conference on Artificial Intelligence (KI), pages 229–243, 2001.

[Hel04] M. Helmert. A planning heuristic based on causal graph analysis. In International Conference on Automated Planning and Scheduling (ICAPS), pages 161–170, 2004.

[HJMS03] A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Software Verification with BLAST. In Workshop on Model Checking Software (SPIN), pages 235–239, 2003.

[HNR68] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for heuristic determination of minimum path cost. IEEE Transactions on Systems Science and Cybernetics, 4:100–107, 1968.

[Hol91] G. J. Holzmann. Design and Validation of Computer Protocols. Prentice-Hall, 1991.

[Hol96] G. J. Holzmann. An analysis of bitstate hashing. In International Symposium on Protocol Specification, Testing and Verification, pages 301–314. Chapman & Hall, Ltd., 1996.

[Hol97a] G. J. Holzmann. Designing bug-free protocols with SPIN. Computer Communications Journal, 20(2):97–105, 1997.

[Hol97b] G. J. Holzmann. The model checker SPIN. Software Engineering, 23(5):279–295, 1997.

[Hol97c] G. J. Holzmann. State compression in SPIN. In Third Spin Workshop, Twente University, The Netherlands, 1997.

[Hol98] G. J. Holzmann. An analysis of bitstate hashing. Formal Methods in System Design, 13(3):287–305, 1998.


[Hol03] G. J. Holzmann. The Spin Model Checker, Primer and Reference Manual. Addison-Wesley, 2003.

[HP00] K. Havelund and T. Pressburger. Model checking Java programs using Java PathFinder. International Journal on Software Tools for Technology Transfer, 2(4):366–381, 2000.

[HS01] M. Holzer and S. Schwoon. Assembling molecules in Atomix is hard. Technical Report 101, Institut für Informatik, Technische Universität München, 2001.

[HSAHB99] J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier. SPUDD: Stochastic planning using decision diagrams. In Conference on Uncertainty in Artificial Intelligence (UAI), pages 279–288, 1999.

[HT99] J. Hatcliff and O. Tkachuck. The Bandera Tools for Model-checking Java Source Code: A User's Manual. Technical report, Kansas State University, 1999.

[HZ01] E. A. Hansen and S. Zilberstein. LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence, 129(1-2):35–62, 2001.

[IDS98] R. Iosif, C. Demartini, and R. Sisto. Modeling and validation of Java multithreading applications using SPIN. In Workshop on Automata Theoretic Verification with SPIN, pages 5–19, 1998.

[Ios01] R. Iosif. Exploiting heap symmetries in explicit-state model checking of software. In International Conference on Automated Software Engineering (ICSE), pages 26–29, 2001.

[JE05] S. Jabbar and S. Edelkamp. I/O efficient directed model checking. In Verification, Model Checking and Abstract Interpretation (VMCAI), pages 313–329, 2005.

[KDH00] J. Kvarnström, P. Doherty, and P. Haslum. Extending TALplanner with concurrency and resources. In European Conference on Artificial Intelligence (ECAI), pages 501–505, 2000.

[Knu98] D. Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching (2nd edition). Addison-Wesley, 1998.

[Kor85] R. E. Korf. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27(1):97–109, 1985.

[Kor03] R. E. Korf. Breadth-first frontier search with delayed duplicate detection. In Workshop on Model Checking and Artificial Intelligence (MoChArt), pages 87–92, 2003.

[KR87] R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.


[KS05] R. E. Korf and P. Schultze. Large-scale parallel breadth-first search. InNational Conference on Artificial Intelligence (AAAI), pages 1380–1386,2005.

[KZ00] R. E. Korf and W. Zhang. Divide-and-conquer frontier search applied tooptimal sequence allignment. In National Conference on Artificial Intel-ligence (AAAI), pages 910–916, 2000.

[Lan96] H. Landman. Games of No Chance, chapter Eyespace values in Go, pages227–257. Cambridge University Press, 1996.

[Leh49] D. H. Lehmer. Mathematical methods in large-scale computing units. In Symposium on Large-Scale Digital Calculating Machinery, pages 141–146. Harvard University Press, Cambridge, Massachusetts, 1949.

[LL03] A. Lluch-Lafuente. Symmetry reduction and heuristic search for error detection in model checking. In Workshop on Model Checking and Artificial Intelligence (MoChArt), pages 77–86, 2003.

[LME04] P. Leven, T. Mehler, and S. Edelkamp. Directed error detection in C++ with the assembly-level model checker StEAM. In Workshop on Model Checking Software (SPIN), pages 39–56, 2004.

[LMS02] F. Laroussinie, N. Markey, and Ph. Schnoebelen. On model checking durational Kripke structures. In International Conference on Foundations of Software Science and Computation Structures, pages 264–279, 2002.

[LV01] F. Lerda and W. Visser. Addressing dynamic issues of program model checking. In Workshop on Model Checking Software (SPIN), pages 80–94, 2001.

[Mar91] T. A. Marsland. Encyclopedia of Artificial Intelligence, chapter Computer Chess and Search, pages 224–241. J. Wiley & Sons, 1991.

[McM92] K. L. McMillan. The SMV system. Technical report, Carnegie-Mellon University, 1992.

[MD05] M. Madanlal and D. L. Dill. An incremental heap canonicalization algorithm. In Workshop on Model Checking Software (SPIN), pages 28–42, 2005.

[ME04] T. Mehler and S. Edelkamp. Planning in concurrent multiagent systems with the assembly model checker StEAM. In Poster Proceedings of the German Conference on Artificial Intelligence (KI), pages 16–30, 2004.

[ME05a] T. Mehler and S. Edelkamp. Dynamic incremental hashing in program model checking. Electronic Notes in Theoretical Computer Science (ENTCS), 149(2):51–69, 2005.

[ME05b] T. Mehler and S. Edelkamp. Dynamic Incremental Hashing and State Reconstruction in Program Model Checking. Technical report, University of Dortmund, 2005.

[Meh02] T. Mehler. Gerichtete Java-Programmvalidation. Master’s thesis, Institute for Computer Science, University of Freiburg, 2002.

[Meh05] T. Mehler. Directed C++ program verification. In International Conference on Automated Planning and Scheduling (ICAPS) - Doctoral Consortium, pages 58–61, 2005.

[MJ05] E. Mercer and M. Jones. Model checking machine code with the GNU debugger. In Workshop on Model Checking Software (SPIN), pages 251–265, 2005.

[ML03] T. Mehler and P. Leven. Introduction to StEAM - an assembly-level software model checker. Technical report, University of Freiburg, 2003.

[MPC+02] M. Musuvathi, D. Park, A. Chou, D. Engler, and D. Dill. CMC: a pragmatic approach to model checking real code. In Symposium on Operating Systems Design and Implementation (OSDI), 2002.

[Pun77] A. Pnueli. The temporal logic of programs. In Symposium on Foundations of Computer Science (FOCS), pages 46–57, 1977.

[PvV+02] M. Pechoucek, A. Ríha, J. Vokrínek, V. Marík, and V. Pražma. ExPlanTech: applying multi-agent systems in production planning. International Journal of Production Research, 40(15):3681–3692, 2002.

[QN04] K. Qian and A. Nymeyer. Guided invariant model checking based on abstraction and symbolic pattern databases. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 497–511, 2004.

[RM94] A. Reinefeld and T. A. Marsland. Enhanced iterative-deepening search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(7):701–710, 1994.

[RN95] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.

[Sch98] U. Schoening. Algorithmen - kurzgefasst, chapter Algebraische und Zahlentheoretische Algorithmen. Spektrum Verlag, 1998.

[SD96] U. Stern and D. L. Dill. Combining state space caching and hash compaction. In Methoden des Entwurfs und der Verifikation digitaler Systeme, 4. GI/ITG/GME Workshop, pages 81–90, 1996.

[Spi92] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 1992.

[TK93] L. A. Taylor and R. E. Korf. Pruning duplicate nodes in depth-first search. In National Conference on Artificial Intelligence (AAAI), pages 756–761, 1993.

[VHBP00a] W. Visser, K. Havelund, G. Brat, and S. Park. Java PathFinder - second generation of a Java model checker. In Workshop on Advances in Verification (WAVe), pages 28–34, 2000.

[VHBP00b] W. Visser, K. Havelund, G. Brat, and S. Park. Model Checking Programs. In International Conference on Automated Software Engineering (ASE), pages 3–12, 2000.

[Zob70] A. L. Zobrist. A new hashing method with application for game playing. Technical report, University of Wisconsin, Computer Science Department, March 1970.

Index

binary decision diagram, see BDD

A*, 43
    in software model checking, 44
    optimality, 44
abstract models
    drawbacks, 18
    loss of detail, 31
    of software, 31
abstraction, 94
    data, 94
    predicate, 94
admissibility, 44
    for software model checking, 62
agent, 73
    distributed systems, 73
    experiments, 128
    in StEAM, 74
    MAMP, 77
    planning, 73
approximate hashing, 87
Ariane-5, 15
assertion, 24, 156
atomic regions, see command patterns, 93, 155
Atomix, 96
AVL-tree, 102
    in StEAM, 143

Bandera, 32
BDD, 23, 72, 107
BEGINATOMIC, 58, 59, 155
best-first search, 43
    in software model checking, 43
BFS, 41
    optimality, 41
bitstate-hashing, 87, 107, 137
BLAST, 34
breadth-first search, see BFS

cash dispenser, 116
cashIT, 116
closed-list, 39
CMC, 33
code inspection, 27
collapse compression, 138
command patterns, 52, 53, 144
    atomic regions, 53
    capturing the main thread, 53
    line information, 53
    locks, 53
    non-determinism, 54
compiler
    of IVM, 48
conclusion, 131–139
concurrency, 23
consistency, 44
    for software model checking, 62
CPU registers, 48
cross product, 24
CTL, 24
cvm.h, 143

deadlock, 26, 60
debugger, 31, 61, 152
depth-first iterative deepening, see DFID
depth-first search, see DFS
deterministic finite automaton, see DFA
DFA, 21
DFID, 42
    in software model checking, 43
DFS, 42
    in software model checking, 42
dining philosophers, 114, 148
directed graph, 39
dSPIN, 32
dynamic arrays, 85
dynamic memory, 50

elevator example, 23
ELF format, 48
    memory image, 48
ENDATOMIC, 155
error trail, 17
    in the elevator example, 26
    on the source level, 31
    optimal, 41
    over abstract models, 31
    quality, 68
    shortening, 136
    suppressing in StEAM, 153
Estes, 35
estimator function, see heuristic
Euclidean distance, 43
expansion, see node expansion
experiments, 113–128
    directed search, 117
    on hashing, 125
    on multi-agent planning, 128
    on state reconstruction, 124
    used programs, 113–117
ExPlanTech, 73
exploration, see search
extcmdlines.cpp, 144
external search, 137

file information, see command patterns
formal semantics, 31, 36
    handling of, 36
frame pointer, 48
FSM-pruning, 108

gcc, 48
gdb, 35, 61
general state expansion algorithm, 40
Glob, 55–57
graph formalisms, 21–22

hash compaction, 88, 107, 137
hash.c, 145
hashing, 55, 83
    approximate, 88
    balanced tree method, 102
    bit-state, 137
    bitstate, 87
    collisions, 125
    compaction, 88, 137
    distribution, 84, 94, 125
    dynamic arrays, 85
    dynamic state vectors, 99–102
    experiments, 125
    explicit, 85
    full, 125
    hash functions, 84
    ignoring duplicates, 83
    in Atomix, 96
    in propositional planning, 98–99
    in StEAM, 104
    in the n²-1-puzzle, 95
    incremental, 83, 89, 92
        runtime of, 93, 94
        static examples, 95–99
    linear method, 102
    on static state vectors, 92
    partial, 89, 125
    probing, 85
    Rabin-Karp, 90
    recursive, 92
    remainder method, 84
    special case in software model checking, 83
    structured state vectors, 102–106
    successor chaining, 85
heap canonicalization, 105
heuristic, 43
    admissible, 44
    consistent, 44
    Euclidean distance, 43
    Manhattan distance, 96
    supported by StEAM, 62
    trail-based, 136
heuristics
    alternating access, 64
    branch-coverage, 64
    error-specific, 63
    formula-based, 63
    interleaving, 64
    read-write, 64
    structural, 64
    threads alive, 65
HSF-Spin, 30

icvm_verify.h, 144
icvmsup.c, 142
icvmvm.c, 141
IDA*, 44
    in software model checking, 45
illegal memory access, 36, 60, 152
incremental hashing, 89
interleave algorithm, 76
interleaving planning and execution, 72, 75
Internet C Virtual Machine, see IVM
iterative deepening A*, see IDA*
IVM, 47–48
    enhancements, 49–55
    source code, 141
    state of, 48
IVMThread, 74
IVMThread.cc, 143

Java Bytecode, 35
Java PathFinder, see JPF
job-shop scheduling, 78
JPF, 35
    first version, 35
    second version, 35

key-state, 111
Kripke Structure, 22

leader election, 115
Lehmer Generator, 93
lgc, 118
line information, see command patterns
liveness-property, 26
lnb, 63
lock, 156
lock and block, see lnb
lock and global compaction, 118
lock-pool, 51, 144
locks, 49, 53, 59
LTL, 24

makefile-generator, see mfgen
makespan, 68, 78
MAMP, 77
Manhattan distance, 96
Mars Climate Orbiter, 16
Mars Pathfinder, 35
memory-pool, 51
mfgen, 154
mfgen.c, 143
mini-state, 108
model, 22
model checking, 21, 23
    advantages over testing, 17
    classical, 21
    differences to planning, 68
    explicit, see symbolic
    for planning, 69, 72
    need for, 26
    of software, 30–37
        challenges, 133
        status quo, 131
        the classical approach, 31
        the modern approach, 31
        tools, 32–36
    symbolic, 23
    unabstracted, 18
        advantage of, 31
        drawback of, 32
most-blocked, 63
multi-agent systems, see agent
multi-threading, 47
    in StEAM, 49, 54
    observer-threads, 115
    see also concurrency, 23

n²-1-puzzle, 69
node expansion, 39
non-determinism, 54, 59, 156
nuSMV, 30

observer-threads, 115
open-list, 39
optical telegraph, 114
optimality
    of BFS, 41

parallel steps, 75
partial search, 89
partial-order reduction, 107
path costs, 39
path quantor, 24
pattern database, see PDB
PDB, 94, 135
physical memory, 48
planning, 67–72
    differences to model checking, 68
    on the assembly level, 74
    propositional, 67
    temporal, 68
    via model checking, 69, 72
polynomial ring, 92
pool.c, 144
printf-debugging, 61
probing, 85
program, 22
program counter, 48
program memory, 48
Promela, 29, 35
protocol, 22
    trivial, 24

Rabin-Karp, 90
RANGE, 59, 156

safety-property, 25
scalability, 114
search, 39
    directed, 43
        experiments, 117
    external, 137
    frontier search, 108
    general state expansion algorithm, 40
    geometric, 39, 43
    in StEAM, 54
    memory limited, 107
    undirected, 40
search tree, 57
sequential hashing, 88
simulation, 74, 152
SLAM, 34
sliding-tile puzzle, 69
SMV, 30
snack vendor, 153
software engineering process, 16
source annotation, 144
SPIN, 29
    version 4.0, 33
stack, 52
stack pointer, 48
state
    key-state, 111
    memorization, 138
    mini-state, 108
    of StEAM, 49
    size, 133
state enumeration, 17
state explosion problem, 24, 74, 133
state reconstruction, 107–112
    algorithm, 111
    experiments, 124
StEAM, 47
    applicability, 59, 134
    features, 57
    installation, 147–148
    memory management, 144, 152
    options, 150–153
    special-purpose statements, 155
    state description, 49
steam.c, 145
stored components, 51
successor chaining, 85
Suggestion, 75
Supertrace, 87
symmetry reduction, 107
system, 22

temporal logics, 24
testing
    drawbacks, 17
    limits of, 27
Therac-25 accident, 15
TL-Plan, 72

unlock, see locks
UPPAAL, 73

VASSERT, 59, 156
VeriSoft, 33
virtual machine, 18, 31
    extending an existing vm, 36
    IVM, 47
    ivm, see IVM
    JVM, 35
VLOCK, 59, 156
VUNLOCK, 59, 156

Warsaw accident, 15
waterfall model, 16

Z (specification language), 16
Zing, 33
Zobrist keys, 105