Design & Verification of UniPro Protocols for Mobile Phones

Despo Galataki

MSc Parallel & Distributed Computer Systems (PDCS)

Vrije Universiteit (VU), Amsterdam/The Netherlands

Supervisors: Andrei Radulescu, ST-Ericsson

Dr. Kees Verstoep, VU

Wan Fokkink, VU

July 2009

Contents

1 Introduction
    1.1 Thesis’ Aim
    1.2 Thesis’ Overview
    1.3 Thesis’ Structure

2 Background
    2.1 UniPro
    2.2 Formal Verification

3 Sliding Window Protocol for data transmission
    3.1 Introduction & Related Work
    3.2 UniPro Sliding Window protocol
    3.3 Modeling in Promela
    3.4 Formal Verification

4 Connection Management Protocol
    4.1 Introduction
    4.2 Related Work
    4.3 UniPro Requirements
    4.4 Solution to the Connection Management protocol
        4.4.1 Introduction
        4.4.2 Description
        4.4.3 Examples and Informal Proofs
        4.4.4 Considerations
    4.5 Modeling in Promela
    4.6 Formal Verification
        4.6.1 Correctness Properties & their Implementation
        4.6.2 Abstraction

5 Router Management Protocol
    5.1 Introduction
    5.2 Router Congestion Control Protocol
    5.3 Router Congestion Control protocol Optimization
    5.4 Modeling in Promela
    5.5 Formal Verification
        5.5.1 Correctness Properties & their Implementation
        5.5.2 Abstraction

6 Run Time
    6.1 USW protocol
        6.1.1 Spin simulation
        6.1.2 DiVinE
    6.2 Connection Management protocol
        6.2.1 Spin simulation
        6.2.2 DiVinE simulation
    6.3 Router Management protocol
        6.3.1 DiVinE simulation

7 Conclusions & Future Work

A Codes
    A.1 Script created to run on DAS

B Full Verification Outputs
    B.1 USW protocol
        B.1.1 1st never claim
        B.1.2 2nd never claim
        B.1.3 1st never claim of DiVinE - reachability algorithm - on 2 nodes
        B.1.4 1st never claim of DiVinE - owcty algorithm - on 2 nodes
    B.2 Connection Management protocol
        B.2.1 Channel capacity of 1 message in Spin
        B.2.2 Rendezvous communication in Spin
        B.2.3 Channel capacity of 2 messages in DiVinE - on 32 nodes
    B.3 Router Management protocol
        B.3.1 Channel capacity of 1 message running in Spin
        B.3.2 Channel capacity of 1 message running in DiVinE - reachability algorithm - on 32 nodes
        B.3.3 Channel capacity of 1 message running in DiVinE - owcty algorithm - on 32 nodes

C Further Elaboration on Run Time section
    C.1 Running 2nd never claim using DiVinE

Acknowledgments

This thesis was written during my six-month internship at ST-Ericsson (High Tech Campus, Eindhoven). Firstly, I would like to thank my supervisor at ST-Ericsson, Andrei Radulescu, for his active interaction, discussion and great cooperation. In addition, I am thankful to all my colleagues, who created a friendly working environment which I will definitely miss. I thank Kees Verstoep and Wan Fokkink from Vrije Universiteit for their ideas, help with DiVinE and corrections throughout the document. Finally, I would like to thank several friends, in particular Roxana Ionutiu for her care during difficult times and her comments about the document, and Stilianos Louca for his comments.

This thesis is dedicated to my family, friends and colleagues, whose support was precious throughout this research.

Many Thanks Indeed

Abstract

UniPro, as an abstract standardized interface, interconnects the variety of devices within a mobile phone and hides the complexity of a heterogeneous environment. UniPro is a combination of chip-to-chip connectivity and TCP-like protocols. Thus, new protocols have to be defined and verified.

This thesis contributes to the design and verification of two basic UniPro protocols. The first one, the UniPro Sliding Window (USW) protocol, provides flow control and reliability for UniPro in the data link layer. The second protocol is built on the transport layer and is responsible for setting up and closing a connection between two nodes. Furthermore, we extend the connection management protocol for congestion control: link reservations are required before a new connection is set up, and the connection bandwidth is reserved through the routers.

In this Master’s thesis, we implement three different models of our protocols in order to check their correctness properties. The models are described in the Promela specification language, which can be used as input to both model checkers: Spin (used on a single machine) and DiVinE (which distributes the workload over multiple nodes). All models are fully verified in Spin or DiVinE.

Keywords: UniPro; Sliding Window; Protocol Verification; Spin; DiVinE

1 Introduction

It has been more than three decades since the first mobile phone appeared (1973). In the early to mid 1980s, when the first automatic cellular network was introduced, nobody could imagine that within some years everybody would be able to own a private mobile phone, or even need one. Throughout its evolution, the mobile phone has been a rapidly developing technology and a profitable business. Needless to say, the population of mobile phone holders has increased enormously. New users, and existing ones who upgraded their phones, resulted in a sales peak of 1.15 billion units in 2007.

Over the last few years, mobile phone technologies (e.g., with respect to the processing power of microchips, battery weight and capacity) have improved notably. Thus, software teams could build more complicated and more varied applications for mobile devices. Apart from sending and receiving calls, mobile phones are now used for many other purposes (e.g., as cameras and media players, and for internet connectivity). Over the last decades, many companies have been involved in the development of mobile telephony, which created a competitive and challenging environment for even further improvement.

The diversity and complexity of mobile phone development created the need for a single standardization, which is addressed by the Mobile Industry Processor Interface (MIPI) Alliance [2]. MIPI is an alliance which consists of most mobile industry companies. MIPI defines the interface standards for mobile phone features, like camera, display and audio. Thus, the interface diversity across vendors is decreased. The members all aim at a single endeavor: creating a unified environment that connects all different interfaces and combines the hardware with the software.

In particular, there is a need for a general protocol that is responsible for the communication among the wide diversity of applications and devices, like the camera or the display. One of the main working groups of the MIPI Alliance is dedicated to the Unified Protocol (UniPro) [10]. UniPro is a serial high-speed interface which interconnects chips within a mobile device. It offers reliability, robustness and abstraction, and requires very low power to operate. Inspired by the new era of multitasking, UniPro is also ready for the upcoming innovations of parallel processing on mobile devices.

1.1 Thesis’ Aim

Our interest is in analyzing and verifying UniPro’s existing and future protocol features. The transport protocol is mainly the protocol that interconnects different integrated circuits within a mobile device. To achieve our goals, we formulate the protocol idea at an abstract level and focus on different basic actions, like sending data (a variation of sliding window protocols applied to the Data Link layer) and managing the set-up and termination of a connection.

The verification is done using the Spin verification tool, which uses Promela as its input protocol-description language. Like most verification tools, Spin has its own limitations. Even though Spin uses advanced search-tree compression and reduction algorithms, the full verification of a complex protocol still requires large processing and memory resources. Thus, we take advantage of another tool called DiVinE. DiVinE distributes the workload of a verification among multiple compute nodes of one or more clusters (DAS3 in our case). This way, the memory and processing requirements are shared among the nodes.

1.2 Thesis’ Overview

UniPro technology offers chip-to-chip connectivity within a mobile device (like PCI Express). The features that it offers (reliability, error control, etc.) are similar to those of the Internet protocol suite. The different structure of the network and UniPro’s specification create the need for defining new protocols, designed especially for this technology.

In this thesis we analyze, design and verify two basic protocols for UniPro by using three verification models. For each model we concentrate on specific correctness properties of the protocols. In this way, Spin not only verified the correctness properties but also served as a guide during the design phase of the protocols.

The first protocol, the USW protocol, is a variation of the sliding window protocol family and is used in the data link layer of the OSI model. The USW protocol offers reliability, error control and flow control to UniPro’s links. In addition, we study the connection management protocol, which is responsible for setting up and closing a connection. This second protocol is verified with two different verification models: the Connection Management (CM) protocol and the Router Management (RM) protocol. The CM protocol model checks the correctness of the message flow and node states, and the RM protocol model checks the correctness of the bandwidth reservation of a connection for congestion control.

Throughout this study, both model checkers, Spin and DiVinE, proved useful. DiVinE could verify bigger problems than Spin, while Spin’s error trails were used to find flaws in a particular design and correct them.

1.3 Thesis’ Structure

This thesis is organized as follows. In section 2 we begin with an elaboration of the UniPro interface and a description of formal verification using Spin and DiVinE. We proceed with the main part of the thesis, which consists of three sections. Section 3 describes the USW protocol and its implementation. Section 4 analyzes the connection management protocol and checks its flow correctness at the end nodes. With a different model, in section 5, we check the correctness of the bandwidth reservation. Section 6 shows the performance results of the executions and includes some comparisons of the two model checkers used. Section 7 concludes the thesis.

If the reader has an adequate background in the networking and verification areas, the background section and the related work in the main sections can be skimmed. The router management protocol, as an extension of the connection management protocol, can be followed only if the latter is read first. If one is not interested in Spin’s or DiVinE’s performance, section 6 can be skipped.

All figures were designed and exported with OmniGraffle Professional. The graphs presented in Section 6 were plotted with Gnuplot. The document was written in LaTeX using the TeXShop freeware.

2 Background

This section provides background information about the UniPro protocol, the formal verification process and its necessity. Subsequently, we briefly present the tools and frameworks we make use of.

2.1 UniPro

UniPro has been designed as a chip-level network for mobile devices. As a consequence, a UniPro network is relatively small, typically consisting of up to a dozen devices. However, it is designed to support multiple Gbit/s using serial bidirectional links.

Figure 1: Abstract representation of the network within the mobile device (legend: device, switch, port, UniPro bidirectional link)

The UniPro specification uses two closely related definitions of components: device and module.

• Device: A device is any component which is UniPro addressable. Examples of UniPro devices include application processors, media processors, camera processors, displays and basebands. This definition simplifies the design of UniPro through its low-level abstraction and generality.

• Module: A module can be considered a component at a higher level with one specific functionality, e.g., a camera. It consists of either one device (in which case the device is the module) or a combination of multiple devices and switches.

The network consists of devices connected with each other through switches, so that data can be conveyed from one device to another (see Figure 1). This network composes a mobile device, e.g., a mobile phone, with all its functionality. Currently, a UniPro network consists of up to 128 devices, which exceeds the requirements of the current mobile device projects. The reason for this exact number is that UniPro offers 7 bits for the identity of its devices, so at most 2^7 = 128 distinct identities exist.

UniPro offers high-speed communication and guarantees reliability, integrity and robustness for data exchange between two devices. Since a mobile device has a limited repository of energy (its battery), it is very important that UniPro consumes as little energy as possible for communication. Accordingly, communication protocols have to be optimized (i.e., consist of short sequences of instructions). Simplicity is one of the main features that a new technology needs in order to be adopted in the future. UniPro is a generic, hardware- and software-friendly technology which can support a diversity of applications. Finally, UniPro is scalable: it can support not only 128 devices, but even more for future requirements.

Despite the fact that the network can be considered small, with a limited number of devices and applications, its complexity is comparable to that of bigger networks. Failures might occur within a mobile device too, and they must be handled even if they seldom appear. Devices might fail during a conversation, messages can be lost on the way from one end point to another, and applications might crash. Moreover, UniPro addresses some additional issues such as low power consumption.

Figure 2: OSI and UniPro network layers

OSI Model (top to bottom): Application, Presentation, Session, Transport, Network, Data Link, Physical.

UniPro Model (top to bottom): Application-specific protocols, Transport, Network, Data Link, Phy Adapter, Physical.

In order to address UniPro’s complexity, a UniPro layer stack is defined which is based on the well-known Open Systems Interconnection (OSI) Reference Model. The OSI model is used by many network designs, since it reduces the complexity of a network by separating the issues that have to be handled into different layers. This way a network designer can solve the problems of each layer separately, while still accounting for the dependencies from one layer to the next. The OSI stack consists of 7 main layers. It starts from the lowest layer, which provides the physical specification for devices (signal and binary transmission), and goes up to the highest one, which is the application layer.

From Figure 2, one can observe many similarities but also some differences in detail between the two models. The UniPro model distinguishes two physical layers instead of one. The lowest one is in charge of successful signaling, line encoding, etc. (like the physical layer in the OSI model), while the intermediate layer (Phy Adapter) is responsible for abstracting the different technologies and binding them together in a heterogeneous environment. The Data Link layer ensures that there is a reliable link between two modules at one hop distance, and that frames can be arbitrated and multiplexed according to the specified priorities. Similarly to the OSI model, the Network layer deals with the routing and addressing of a packet, i.e., how the network transfers a packet from the source to the destination. The fourth layer is the Transport layer. It defines the quality of a connection and is responsible for the flow and congestion control of the network. Finally, the UniPro model combines the three upper layers of the OSI model (Session, Presentation and Application) into a single one, because UniPro is responsible for connecting the diversity of applications and modules together rather than for implementing applications. The interface of the Transport layer has to be simple, so that the applications can adapt to it easily.

Referring to its specification, UniPro offers the following features:

• Connections are reliable, meaning that if one node sends data, it is accepted at the destination intact and in the order in which it was sent;

• Links are bidirectional;

• There is always only one path that connects two different devices, so messages that are sent from device A to device B always remain in the order in which they were sent;

• Devices can be servers and/or clients: a device wishing to request a service from another device is called the client, and the callee is the server. A server or a client can be connected (via data exchange) with only one device at a time, but it responds to requests from other devices as well.

More facts are shown in the next sections where applicable.

Scientific Question

UniPro has significant scientific and practical value. The high expectations of the interface bring it to the top of its class: it is designed for use in current and future technologies for mobile devices. UniPro opens the horizons to innovations. Its practical value is based on global standardization and ease of use. No matter the brand, its aim is to provide an abstract interface that removes the barriers between different companies. UniPro is an interface with good prospects for wide future use.

2.2 Formal Verification

Over the last decades, many new ideas have emerged from the computer science spectrum. Usually, in order to accept a new idea (e.g., a software protocol or digital hardware), scientists need a proof of why and whether it is correct. The size of a system and the complexity of the network environment make it difficult to find and analyze all the possible cases of a protocol. Failures are very difficult to find, trace and handle, especially within large systems. Therefore, it is important that novel complex technologies are formally verified.

Formal verification can be achieved in several different ways: by a mathematical proof, or by examining all cases and showing that they are consistent with the system requirements. Some example formalisms are Petri nets, timed automata, finite state machines and process algebra. A mathematical proof, besides possibly being too complicated and time-consuming, is potentially error-prone and difficult for others to follow. On the other hand, flow charts and state machines in general help to gain a deeper understanding of the model, and they are easily readable by others. Unfortunately, flow charts are not enough to detect errors, because some possible paths which can lead to a failure are not obvious. Lastly, running by hand through all the possible paths that an algorithm can take is not always a practical option, because of the complex interaction of the system and its environment. Therefore, there exist simulators and model checkers which automatically explore all the possible paths of an algorithm and check that its properties are never violated.

For this project, apart from designing flow charts for our protocols, we make use of an automated model checker, Spin [1]. Spin has been used to prove the correctness properties of several protocols and even parts of operating systems [3] [4] [6]. Spin has proven to be a successful verification tool [11] [12], which is the major reason we chose it. Spin is an open source tool and has been used for similar projects and physical environments in the past, where it was able to find flaws in the algorithms. Using Spin, one can represent parallel processes that run on a fuzzy network or a distributed system. By examining all possible scenarios, it can automatically check whether the protocol still runs correctly even if connections are unreliable. Furthermore, the user can define as many servers or clients as desired and check whether one connection is corrupted by another. In addition, because of its wide use, Spin has good documentation and several examples that are useful for learning the tool. Last but not least, we chose Spin because it had already been tried on some UniPro protocols.

There are a number of steps that a Spin user has to follow. First of all, the algorithm has to be represented in a form that the tool recognizes. Model checkers usually explore all the possible states of a model. As a consequence, they need a lot of memory and time to fully verify the model. Thus, programmers represent their system at a high level of abstraction or check its critical parts separately. As a second step, the user has to express the protocol properties and specifications (by using assertions, for example) such that the tool will recognize an error in case of a violation. If the state search finishes completely without any violation, and the system has been represented correctly, then the model is fully verified.

To run a simulation of a protocol in Spin, one should model the design using the Promela language (default file extension .pml). Promela is a formal specification language. It allows us to build high-level models of distributed systems from three basic components: asynchronous processes, message channels and data objects. Although Promela’s syntax is similar to that of the C language, it has its own limitations and particularities, some of which are stated below.

• Process Calls: Spin runs the different processes in parallel. Only the code within a process is sequential. Spin therefore interleaves the statements of different processes, so the state space can be huge. For this reason, if there are sequential parts within a process that are independent of the other processes, they should be declared within one scope as ”d_step” or ”atomic” (”d_step” is stricter than ”atomic”).

• Channels: Channels are used in Spin models to represent a network and to let different processes communicate. Channels can be synchronous or asynchronous. One can define the maximum number of messages that a channel can hold; if this number would be exceeded, Spin gives an error or a timeout in case there is no way to prevent the overflow of the channel.

• Timeout: ”timeout” is a reserved word in Promela. A timeout can be placed at several points in the model; when all other possibilities are false or blocked, one of those timeouts will occur, provided it is an option at the blocking point.

• If statements and Loops: Within an if statement, any of the options whose condition is true can be chosen for execution; e.g., if the first and second conditions are both true, either of them can be chosen, not always the first one as in other languages. In addition, if none of the options is executable, the code will block until one of them becomes executable, unless there is an ”else” case that covers it. Loops behave exactly the same, with the only difference that they are repeated until the reserved word ”break” stops them.

• Never Claims: A never claim is a special process in Promela, usually placed at the end of the file, consisting of a number of states and statements. It is checked at every single step during verification and must never reach the end of its process; otherwise an error is reported. It is used to define the correctness properties of a system.
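
The remark under Process Calls, that interleaving makes the state space huge, can be illustrated with a simple counting argument. The following Python sketch is only an illustration and not Spin's algorithm; the function name `interleavings` is hypothetical. It counts the possible execution orders of sequential processes whose statements may be interleaved freely, and shows the effect of collapsing each process into a single indivisible step, which is the idea behind ”d_step” and ”atomic”.

```python
from math import factorial

def interleavings(steps_per_process):
    """Number of ways to interleave sequential processes.

    Each process executes its own steps in order, but steps of
    different processes may be interleaved arbitrarily, which
    gives a multinomial coefficient.
    """
    total = factorial(sum(steps_per_process))
    for steps in steps_per_process:
        total //= factorial(steps)
    return total

# Two processes with 5 independent statements each.
print(interleavings([5, 5]))   # 252
# The same processes, each wrapped as one indivisible step.
print(interleavings([1, 1]))   # 2
```

The count grows quickly with the number of processes and statements, which is why independent statement sequences are worth grouping.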

A model checker is used to find flaws and prove correctness properties of protocols that cannot be analyzed sufficiently with other methodologies. This means that the tool has to check whether the protocol meets a given specification. Spin checks general types of conditions, but the designer has to define the more complicated and specific ones. In Spin there are 3 ways to find flaws in a model.

1. Spin checks in general: As a verification tool, Spin will always check for and report errors like deadlock, live-lock or starvation (cycles), under-specification (unexpected reception of messages) or over-specification (unreachable or dead code);

2. Assertion: In addition, the designer can place assertions anywhere in the code to check variables and states;

3. Linear Temporal Logic (LTL) and Never Claim formulas: Last, Spin assists in verifying the correctness properties of a model. Protocol properties are usually described in modal temporal logic specification languages like LTL or Computation Tree Logic (CTL). Spin supports LTL formulae and translates them into ”never claim” code that can be placed in a Promela file. The tool will search future paths to determine whether, e.g., a condition can eventually become true. Spin will return an error if any of the ”never claim” statements is satisfied.
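
The first two kinds of check can be sketched outside Spin. The following Python fragment is a minimal explicit-state explorer, assuming the model is given as a hypothetical `successors` function and an `invariant` predicate (both names are ours). It does not reflect how Spin is implemented; it only illustrates an exhaustive search that reports an assertion violation or a deadlock together with the trail leading to it.

```python
from collections import deque

def check(initial, successors, invariant, is_final=lambda s: False):
    """Breadth-first exploration of all reachable states.

    Returns ("ok", None), ("assertion", trail) or ("deadlock", trail).
    """
    parent = {initial: None}          # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return "assertion", trail(parent, state)
        nexts = successors(state)
        if not nexts and not is_final(state):
            return "deadlock", trail(parent, state)
        for nxt in nexts:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return "ok", None

def trail(parent, state):
    """Rebuild the path from the initial state to `state`."""
    path = []
    while state is not None:
        path.append(state)
        state = parent[state]
    return path[::-1]

# Toy model (an assumption for illustration): a counter that stops at 3.
def step(s):
    return [s + 1] if s < 3 else []

print(check(0, step, lambda s: s < 3))                    # ('assertion', [0, 1, 2, 3])
print(check(0, step, lambda s: True))                     # ('deadlock', [0, 1, 2, 3])
print(check(0, step, lambda s: True, lambda s: s == 3))   # ('ok', None)
```

The returned trail plays the role of Spin's error trail: it lists the states visited on the way to the violation.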

Figure 3: Process followed (Design, Model in Promela, Verify in Spin; correct verification? yes: Implementation, no: back to Design)

We use Spin to design and get a better understanding of the protocol, but above all to formally verify its correctness. We take advantage of Spin’s default checks, assertions and never claims to examine whether the system design unambiguously meets its requirements. Besides, the parallel execution of the processes, the option of synchronous or asynchronous communication and the ability to express networks in general make Spin a useful tool. If an error occurs, Spin produces a trail for further investigation. A designer can use this trail to see the sequence of commands which were executed and induced the error.

The verification phase is indirectly part of the design phase (see Figure 3). We first make a representation of a protocol using flow charts. Then we model it in Promela and verify the model in Spin. If the verification finishes successfully, the representation is considered suitable for further implementation. Otherwise, if the verification shows an error, we check the error trail and try to understand the error. Then we redesign the protocol and repeat the same sequence of steps as before: the modeling and verification phases.

Some of the main problems when using Spin are the CPU time and memory usage due to the huge state space generated. Therefore, Spin offers the options to simulate a system and to verify it exhaustively or partially. It offers both depth-first and breadth-first search to explore the states, and many other options too. The state space increases notably with every small change of a parameter. The following formula gives an idea of how big the space can be even for a very small network.

$$R_a = \left( \sum_{i=0}^{s} m^{i} \right)^{q}$$

where q is the number of channels, s the maximum number of messages per channel and m the number of different values that a message of a specific data type can have [1]. Consequently, the designer has to model the protocol at the highest possible level of abstraction, without losing essential behavior of the real protocol, in order to reduce the state space.
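
To give a feeling for the growth, the formula can be evaluated directly. The short Python sketch below is only a numerical illustration; the function name `channel_states` and the parameter values are our own choices and are not taken from the models in this thesis.

```python
def channel_states(q, s, m):
    """Evaluate R_a = (sum_{i=0}^{s} m**i) ** q.

    q: number of channels
    s: maximum number of messages a channel can hold
    m: number of different values a message can take
    """
    per_channel = sum(m ** i for i in range(s + 1))
    return per_channel ** q

print(channel_states(q=2, s=2, m=3))   # (1 + 3 + 9)^2 = 169
print(channel_states(q=4, s=4, m=8))   # 4681^4, about 4.8e14
```

Moving from two short channels to four slightly longer ones with a richer message type already takes the count from 169 to roughly 4.8 x 10^14 channel contents, before any process state is considered.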

Spin’s developers took this limitation seriously into account and came up with alternative solutions to reduce the required state space. Spin thus provides its users with plenty of strategies and options for reducing memory usage, some of which are stated below:

• Partial order reduction: This aims to reduce the number of states that need to be stored by eliminating the interleaving of independent actions;

• State compression: As its name suggests, each state is compressed so that it takes less memory than without compression;

• Weak fairness enforcement: With this technique, Spin enforces the weak fairness constraint on all cycle analyses, i.e., if an activity is continuously enabled then it has to be executed infinitely often. Besides weak fairness, there exist other techniques whose aim is to concentrate on a specific search (e.g., search only for acceptance cycles);

• Bit-state hashing: By default, Spin holds the whole state description in memory. With the bit-state hashing technique, only a state’s hash code is stored in a bit-field instead, which saves a lot of memory. Clearly, hashing techniques are prone to errors when two different states are represented by the same hash value. Reference [1] mentions memory savings of 98% when one hash function is used (175 MB to 3 MB) and 92% when two hash functions are used (13 MB). In the first case, 97% of the full exploration was covered.
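
The idea behind bit-state hashing can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (the class name `BitStateSet`, SHA-256 as the hash function and the bit-field size are all hypothetical choices); Spin's actual implementation differs.

```python
import hashlib

class BitStateSet:
    """Visited-state set that stores only hash bits, not the states."""

    def __init__(self, n_bits, n_hashes=1):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, state):
        # Derive n_hashes bit positions by salting the hash input with k.
        for k in range(self.n_hashes):
            digest = hashlib.sha256(f"{k}:{state!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, state):
        """Mark `state` as visited; return True if it looked new."""
        new = False
        for pos in self._positions(state):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                new = True
                self.bits[byte] |= mask
        return new

visited = BitStateSet(n_bits=1 << 20, n_hashes=2)
print(visited.add(("tx", 0, "rx", 0)))   # True: first visit
print(visited.add(("tx", 0, "rx", 0)))   # False: already marked
```

The memory needed is fixed by `n_bits` regardless of the size of a state. The price is that a new state whose bits were all set by other states is treated as visited and is not explored, which is why coverage can fall below 100%.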

Parallel Formal Verification

Tools like Spin are unable to completely verify some protocols, the biggest limitation being memory space. An obvious solution is to increase the memory of the compute node. However, there are limits to that as well. The next option is to use multiple compute nodes to verify a protocol.

An application that allows us to simulate a model on a cluster of nodes is DiVinE [18]. The DiVinE model checker performs well in a high-bandwidth computational grid environment like DAS [8]. Several algorithms are introduced in DiVinE in order to split the workload and distribute it among different nodes. Lastly, DiVinE introduces two algorithms for cycle detection: the topological sort algorithm (OWCTY) and maximal accepting predecessors (MAP) [7].

DiVinE has a native modeling language and also supports models written in Promela. This is done through the NIPS module, which is a re-implementation of the real Spin tool. It was easy for us to run our simulations on the heterogeneous DAS3 cluster and compare them with simulations run in Spin on a single machine. DAS3 consists of 5 clusters with 272 dual-CPU AMD Opteron nodes. The nodes are connected through very fast networks: Myrinet-10G and Gigabit Ethernet. They all have similar specifications; for example, they all have 4 GB of memory and run at 2.2 to 2.6 GHz.
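
A common way to distribute an explicit state space, sketched below in Python, is to let a hash function decide which node owns each state, so that every node stores and expands only its own share. This is an illustration of the general technique under our own assumptions (the names `owner` and `partition` and the use of CRC32 are hypothetical); we do not claim that it reproduces DiVinE's internal algorithms.

```python
import zlib

def owner(state, n_nodes):
    """Index of the node responsible for storing and expanding `state`."""
    return zlib.crc32(repr(state).encode()) % n_nodes

def partition(states, n_nodes):
    """Group states by owning node."""
    buckets = [[] for _ in range(n_nodes)]
    for state in states:
        buckets[owner(state, n_nodes)].append(state)
    return buckets

# 1000 dummy states spread over 4 nodes.
buckets = partition(range(1000), n_nodes=4)
print([len(b) for b in buckets])
```

Because the owner of a state is computed from the state itself, any node that generates a successor knows where to send it without consulting a central table. The memory requirement per node shrinks accordingly, at the cost of network traffic for successors owned by other nodes.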


3 Sliding Window Protocol for data transmission

The family of sliding window protocols is well known in the computer network area. Tanenbaum [9] gives a thorough introduction to and analysis of them. In this section we introduce our sliding window protocol and explain how we model it and check specific properties in the Promela language.

3.1 Introduction & Related Work

We assume that the channel between two nodes is unreliable, i.e., errors occur in the links and routers often overflow, so data packets are lost. As a result, there should be continuous communication between the transmitter and the receiver (data packets, and acknowledgments providing feedback that they have been received). Sliding window protocols are popular because they offer reliable data transmission and they control the flow of messages when the links between the two ends are not of the same speed and the two ends do not have the same processing power. In addition, they are feasible for real compute nodes and the Internet, where buffer memory and processing power are finite resources. Sliding window variations are used both at the data link layer (HDLC) and at the transport layer (TCP) of the OSI model.

For this section we use the definitions of transmitter and receiver, where both are end nodes in a network.

• Transmitter (Tx): The transmitter is a node which wants to send a piece of data to the other party. This data is cut into smaller parts called packets, which have to be sent in a specific order;

• Receiver (Rx): The receiver, informed that the transmitter is sending data, waits for the packets to arrive, processes each packet and sends feedback to the transmitter that it has received that specific packet.

In this section we do not consider how the transmitter or the receiver implement connection management; we concentrate on the reliability of the connection.

The transmitter ensures that the other party receives all the packets correctly and in order. The receiver acknowledges every packet and sends an indication to the transmitter. The transmitter can send i ≤ N packets, where N is called the window size, and it waits for no more than N acknowledgements at the same time. Each packet has a unique identity that increases in sequence, so the first packets have identities in [0, ..., n] where n = N − 1. Then the transmitter remains blocked until it receives feedback from the receiver. The feedback is known as an acknowledgement (ACK) and is usually implemented as an integer i ∈ ℕ₀ that represents the identity of the packet received. When connections are flawless, the transmitter, on receiving m acknowledgements, simply sends the next m packets ([n + 1, ..., n + m]) and moves its window m positions further. In this way there is continuous communication between transmitter and receiver, in which the transmitter needs feedback in order to keep sending data. The transmitter never has more than N unacknowledged packets outstanding.

Within a network with flawless connections, there is no need for acknowledgements unless the transmitter wants to know that the receiver got the packets or whether it is still online. The main aim of sliding window protocols is to be reliable even under imperfect circumstances. That is why feedback is necessary: an acknowledgment, or a negative acknowledgment (NAC) in case of a failure. There are 2 generic sliding window protocols studied in the literature [9], along with many variations of them.

• Go-back-N: This is the simplest one. The receiver sends an acknowledgement after the arrival of the expected packet, and after an error it simply ignores all packets until it receives the correct one. The transmitter retransmits all the packets that have not been acknowledged after a timeout;

• Selective repeat: In this protocol only the failed packets are retransmitted. So, apart from sending acknowledgments, the receiver informs the transmitter if there is a failure and which packet it concerns. The transmitter retransmits the damaged or missing packets, and all unacknowledged ones after a timeout.

Obviously, the former protocol wastes bandwidth by retransmitting all packets after a failure. The latter wastes less bandwidth but is more complicated to implement, as the receiver needs to remember which packets have been received after a failure and generally keep track of at least the most recent packets. Needless to say, the first one is preferred when the connection has static behavior, i.e., few messages are lost during the transmission.
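The difference in retransmitted traffic can be illustrated with a small Python sketch. It is not part of the thesis model: the function names and the example window are hypothetical, and the sketch only counts which packets each generic protocol resends after a single loss.

```python
def go_back_n_resend(outstanding, lost):
    """Go-back-N: resend the lost packet and every outstanding packet sent after it."""
    start = outstanding.index(lost)
    return outstanding[start:]


def selective_repeat_resend(outstanding, lost):
    """Selective repeat: resend only the packet reported as lost."""
    return [lost] if lost in outstanding else []


window = [4, 5, 6, 7]  # hypothetical identities sent but not yet acknowledged
print(go_back_n_resend(window, 5))         # [5, 6, 7]
print(selective_repeat_resend(window, 5))  # [5]
```

With few losses the extra retransmissions of Go-back-N are rare, which is the case in which the simpler protocol is preferred.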

The time spent until the transmitter receives the feedback from the receiver and the extra messages sent are the major issues in sliding window protocols. An improved window protocol hides the latency of the transmission and increases the utilization of the line by using pipelining techniques. The sliding window variations used nowadays are enhanced with smarter techniques and approaches. An example is the use of the maximum packet lifetime or the round trip time in TCP. In this way they resolve confusion (an old message belongs to a previous round and can be ignored) and they improve the protocol [13] [15].

Sliding window size and Utilization

The simplest reliable protocol sends a single packet, waits for an acknowledgement and then sends the next one [9]. This protocol is called one-bit sliding window or Stop-and-Wait in the literature. Bezem and Groote proved the correctness of a one-bit sliding window protocol using µCRL in 1994 [14]. However, it is very slow because each packet transmission incurs extra overhead for the acknowledgment. The time that the transmitter waits from sending the last bit of a packet to the network until it receives the ACK is equal to R, the round trip propagation time. That reduces the utilization of a line considerably (to 50% when the packet consists of 1 bit).


[Message sequence chart between Tx and Rx: Tx sends D0 to D7, Rx answers with ACK0 and ACK1 as D0 and D1 arrive; S marks the transmission delay of one packet and R the round trip propagation time.]

Figure 4: Line utilization using pipelining

With the pipelining technique we can achieve better utilization of a line [9]. That means increasing the sliding window size to N > 1. Is that claim correct? In Figure 4 we check a scenario with N = 4. We assume that each packet consists of l bits and is transmitted at a speed of b bits/sec. It follows that the transmission delay is

$$S = \frac{l}{b}$$

and the line utilization,

$$U = \frac{S}{S + R} = \frac{l/b}{l/b + R} = \frac{l}{l + Rb}$$

So, indeed we can have better line utilization if the transmission delay is greater than the round trip propagation delay, S > R or l > bR. Variations of sliding window protocols with arbitrary window size have been studied and formally verified in different ways (see Related Work in [19]).
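The effect of the window size on utilization can be made concrete with a short Python sketch. The formula for N = 1 is the one given above; the extension to N > 1, U = min(1, N·S/(S + R)), is the usual textbook form and is an assumption here, as are the function name and the example numbers.

```python
def utilization(l_bits, b_bps, r_sec, n=1):
    """Line utilization for a window of n packets of l_bits each.

    S = l / b is the transmission delay of one packet and r_sec the round trip
    propagation time R. For n = 1 this is U = S / (S + R); for n > 1 the
    transmitter can keep up to n packets in flight (assumed textbook extension).
    """
    s = l_bits / b_bps
    return min(1.0, n * s / (s + r_sec))


# Hypothetical link: 1000-bit packets, 1 Mbit/s, 3 ms round trip time.
for n in (1, 2, 4):
    print(n, round(utilization(1000, 1e6, 0.003, n), 3))
```

In this hypothetical setting a single outstanding packet uses a quarter of the line, while a window of 4 is enough to keep it busy.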

Finite field of sequence numbers

Figure 5: Data packet emphasizing the sequence number in the header


Until now, we looked at sliding window protocols with an unbounded sequence number attached to the packets. This is unrealistic for a real packet, as the header has only a limited number of bits for the packet identity (see Figure 5). The sequence number of a packet lies in [0, ..., 2^n − 1] if n bits are available in the header. We define the maximum sequence number as I = 2^n − 1, where always I ≥ 1 as n > 0. Thus, we can assume an unbounded sequence number only for a message which consists of fewer than I packets. For generality we no longer make this assumption and proceed with finite packet sequence numbers.

[Number line: the sequence numbers run from 0 to I and then repeat from 0.]

Figure 6: The sequence numbers are cycled

To overcome this problem, we simply cycle the sequence numbers: after the packet with the maximum identity I, numbering restarts at 0 (see Figure 6). Therefore, we have to check whether the initial sliding window protocols remain correct without further limitation, or how they can be fixed.

It has been proven that cycling the identities of the packets adds an extra limitation which is directly related to the maximum size of the sliding window N. The exact bound on N depends on the protocol and its circumstances. The problem is caused by the mixture of ACKs and NACs and how they could be confused with previous messages with the same identity. Taking the two generic sliding window protocols as examples: in the Go-back-N protocol N is bounded by N ≤ I − 1, whereas in the selective repeat protocol N is bounded by N ≤ ½(I + 1).
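The cycling of identities can be illustrated with a few lines of Python. The helper names max_identity and next_id are hypothetical and not taken from the thesis; the sketch only uses I = 2^n − 1 and the wrap-around to 0 after I.

```python
def max_identity(n_bits):
    """Largest sequence number I that fits in a header field of n_bits bits."""
    return 2 ** n_bits - 1


def next_id(x, max_id):
    """Identity that follows x when identities are cycled over 0..max_id."""
    return (x + 1) % (max_id + 1)


I = max_identity(5)          # 5 header bits, as in the UniPro example
ids = [30]
for _ in range(4):
    ids.append(next_id(ids[-1], I))
print(I, ids)                # 31 [30, 31, 0, 1, 2]
```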

UniPro and Sliding Window protocol

It is believed that a variation of the sliding window family is beneficial for the transmission of data in UniPro. One of UniPro’s targets is good utilization of the physical connections. In addition, UniPro has wire-like communication channels, so messages are received in the same order in which they were sent. These facts lead us directly to pipelining techniques and accordingly to a sliding window protocol. We build our protocol inspired by the Go-back-N variations. The major reasons are:

• UniPro offers high-speed static connections among modules, so Go-back-N is preferable over selective repeat, as the round trip time is short and failures rarely happen;

• It is a low-power technology, so the main protocols have to be as simple as possible;

• UniPro keeps the header of a packet as small as possible, so it has a limited number of bits for the sequence number of a packet; 5 bits give a cycle of at most 32 different identities. That means that, in order to have good utilization, it really matters to reach at least the Go-back-N maximum window size, which would be roughly 31 (see Section 3.1).

UniPro uses the sliding window protocol for the data link layer. In the next subsections we introduce our variation of the Go-back-N sliding window protocol.


3.2 UniPro Sliding Window protocol

[Flow chart: INIT (current = 0); RECV DATA(sn); test sn = current; if yes, SND ACK(current) and current++; if no, SND NAC(current − 1).]

Figure 7: Flow chart for the receiver’s algorithm

For UniPro, the transmitter and receiver can be two different nodes connected with each other through specific ports. The UniPro Sliding Window (USW) protocol is very similar to the Go-back-N protocol. The main difference is that when the receiver notices a failure, it sends a NACi, where i is the identity of the last correct packet that it received. The transmitter then learns about the failure earlier than in Go-back-N.

Figure 7 shows a flow chart of the receiver’s algorithm. The receiver’s actions are very simple and the algorithm is really small. The protocol starts with the variable current = 0, which indicates the identity of the expected packet. After the receiver receives a packet, it checks whether it is the expected one and free of errors. If it is, it sends an acknowledgment ACKi to the transmitter and waits for the next packet by incrementing current. In case of failure, it sends a NACi−1 carrying the identity of the last correct packet which arrived in order.
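The receiver’s behaviour can be mirrored in a few lines of Python. This is an illustrative sketch rather than the Promela model: the class name, the corrupted flag and the tuple encoding of replies are hypothetical, and num_ids stands for the number of distinct identities I + 1.

```python
class Receiver:
    """Sketch of the USW receiver: ACK the expected packet, NAC everything else."""

    def __init__(self, num_ids):
        self.num_ids = num_ids   # number of distinct identities (I + 1)
        self.current = 0         # identity of the packet expected next

    def on_data(self, sn, corrupted=False):
        if sn == self.current and not corrupted:
            reply = ("ACK", self.current)
            self.current = (self.current + 1) % self.num_ids
            return reply
        # Failure: report the last packet that arrived correctly and in order.
        return ("NAC", (self.current - 1) % self.num_ids)


rx = Receiver(num_ids=3)
print(rx.on_data(0))   # ('ACK', 0)
print(rx.on_data(2))   # ('NAC', 0): packet 1 was expected
print(rx.on_data(1))   # ('ACK', 1)
```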

[Diagram: the cycled sequence numbers with the markers beginning and current; an oval encloses all packet numbers within the window, whose count is on_post.]

Figure 8: Further explanation for the variables in the program

The transmitter needs to store any two of the three variables beginning, on_post and current (see Figure 8). They are the basic variables defining the transmitter’s window. We call window the packets which have been sent but not acknowledged (enclosed by the oval). Using two of them, the transmitter can easily calculate the third one.

[Flow chart: INIT (on_post = 0, current = 0), followed by four branches (a) to (d). If on_post < N − 1: SND(current), on_post++, current++. On RECV ACK(sn): if sn is within the window, on_post = current − sn − 1; otherwise current = sn + 1, on_post = 0. On RECV NAC(sn): current = sn + 1, on_post = 0. On timeout: current = beginning, on_post = 0.]

Figure 9: Flow chart for the transmitter’s algorithm

• beginning: This is an indicator of the first packet that has been sent but has not been acknowledged;

• on_post: This is the number of packets waiting for an acknowledgement; it can be no more than the maximum window size (on_post ≤ N);

• current: This is an indicator of the next packet that will be sent.

Even though the transmitter seems complicated because it is the one in control, it has been simplified a lot (see Figure 9). In general, it sends N packets to the network and it can only send the next one when some of the packets that it sent have been acknowledged by the receiver. It resends a packet only if it receives a NAC or after a timeout.
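The transmitter’s bookkeeping can be sketched in Python as follows. This is an illustration under stated assumptions, not the verified model: the class and method names are hypothetical, num_ids stands for the number of distinct identities, and the send guard is written as on_post < window so that at most N packets are outstanding (the flow chart writes its bound as N − 1).

```python
class Transmitter:
    """Sketch of the USW transmitter state: current, on_post and derived beginning."""

    def __init__(self, num_ids, window):
        self.m = num_ids      # number of distinct identities (assumed I + 1)
        self.n = window       # maximum number of unacknowledged packets
        self.current = 0      # identity of the next packet to send
        self.on_post = 0      # packets sent but not yet acknowledged

    @property
    def beginning(self):
        # First unacknowledged packet, computed from the other two variables.
        return (self.current - self.on_post) % self.m

    def can_send(self):
        return self.on_post < self.n

    def send(self):
        sn = self.current
        self.current = (self.current + 1) % self.m
        self.on_post += 1
        return sn

    def on_ack(self, sn):
        beg, cur = self.beginning, self.current
        inside = beg <= sn < cur if beg <= cur else (sn < cur or sn >= beg)
        if inside:
            # Everything up to sn is acknowledged; the rest is still outstanding.
            self.on_post = (cur - sn - 1) % self.m
        else:
            # Unexpected acknowledgement: restart right after sn.
            self.current = (sn + 1) % self.m
            self.on_post = 0

    def on_nac(self, sn):
        self.current = (sn + 1) % self.m
        self.on_post = 0

    def on_timeout(self):
        self.current = self.beginning
        self.on_post = 0
```

For example, after three packets are sent with num_ids = 4 and window = 3, an ACK for packet 1 leaves one packet outstanding and moves beginning to 2.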

Better understanding of USW’s functionality

The only messages that the transmitter receives are ACKi and NACi. By the receiver’s behavior, exactly one acknowledgement ACKi is sent per packet, and any further feedback for the same ith packet is a NACi. Thus, in its answers ACKi or NACi the receiver always indicates the highest packet i that it has received correctly and in order. Over time, the sequence of its responses looks like the following:

ACK0, NAC0*, ACK1, NAC1*, ...

We can clearly observe that there is exactly one acknowledgement ACKi, after which there can be an arbitrary number of NACi messages, and this pattern repeats indefinitely. Consequently, even if some messages are lost before this series arrives at the transmitter, we conclude that the transmitter’s window always moves forward and never backward ¹. So, the transmitter can never be in doubt about an acknowledgement, or about whether to move the beginning of its window to the right (further) or to the left (behind). This is the reason that USW can have a maximum window size N of up to the greatest identity I.

Figure 10: An example of Transmitter’s and Receiver’s behavior

Figure 10 shows a sub-block of the data packet exchange between a transmitter and a receiver, with I and N equal to 2. We assume that, at the beginning of the conversation, all data packets and the corresponding acknowledgements have reached their destination. For this example we use the triple (beginning, current, on_post) for the transmitter side. The first data message D0 is lost but the second message D1 arrives, (0, 2, 2). The receiver observes the error and replies with NAC2, because the last correct packet was D2. Note that the data packet with number 2 lies outside the window, i.e., the transmitter does not expect this acknowledgment. So, it decides to resend D0 and D1. The two acknowledgements are lost, but had they arrived they would have been within the window. After a timeout, the transmitter resends D0, (0, 1, 1), but the receiver is waiting for D2, so it replies with NAC1. From the NAC1, the transmitter concludes that the receiver has already received D1 and it directly sends D2. Scenarios can be even worse when messages of the two ends overlap, but USW is capable of handling them.

¹ Remember that connections are one-hop and wire-like.

Attempt to increase the maximum window size

[Two message sequence charts (a) and (b) between Tx and Rx, exchanging D0, D1, D2 and ACK0, ACK1, ACK2 with lost messages and a timeout.]

Figure 11: Scenarios where the maximum window size N is equal to the maximum identity of a packet plus one (N = I + 1)

In the Go-back-N protocol, if the maximum window size N is equal to the number of distinct identities, N = I + 1, there is a scenario in which the transmitter is confused about the acknowledgements that it receives. The scenario is the following:

1. The transmitter sends the data messages with sequence numbers [0, ..., I]

2. The receiver sends the appropriate acknowledgements

3. Only the acknowledgement ACKI arrives at the transmitter

4. The transmitter sends the next I + 1 data packets and again receives only the ACKI

The transmitter has no way to find out whether the acknowledgement means that the last I + 1 data packets were all lost or all received. In the USW protocol we face a similar ambiguity, in a slightly different form.

In USW, the problem facing Go-back-N is not likely to occur, because each acknowledgment is unique and can be received at most once by the transmitter. However, the problem moves to the receiver’s side. The scenario of Figure 11(a), N = I + 1 = 3, is described below:

1. The transmitter sends the first 3 packets D0, D1 and D2

2. The receiver takes them successfully, thus it replies with acknowledgements for each one

3. All 3 acknowledgements fail to arrive at the transmitter

4. After a timeout, the transmitter will automatically resend the 3 packets

Obviously, the receiver is unaware of any retransmission and will assume that the second-round packets D0, D1 and D2 are the next ones (Figure 11(b)).

So, we have shown by counterexample that the maximum window size cannot be more than the maximum packet identity I.
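The scenario can be replayed with a small Python sketch. It is a simplification made for illustration: the function name is hypothetical, the window is assumed to start at identity 0, and every acknowledgement of the first round is assumed to be lost so that the transmitter resends the whole window once.

```python
def accepted_after_lost_acks(num_ids, window):
    """Count the packets the receiver accepts as new when a full window is sent,
    all acknowledgements are lost, and the same window is resent after a timeout."""
    expected = 0   # identity the receiver expects next
    accepted = 0
    for _round in range(2):           # first transmission, then retransmission
        for sn in range(window):      # the window starts at identity 0
            if sn == expected:
                accepted += 1
                expected = (expected + 1) % num_ids
    return accepted


print(accepted_after_lost_acks(num_ids=3, window=3))  # 6: duplicates taken as new
print(accepted_after_lost_acks(num_ids=3, window=2))  # 2: retransmissions rejected
```

With I = 2 (three identities), a window of 3 lets the receiver accept the retransmitted D0, D1 and D2 as new packets, whereas a window of 2 does not.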

Considerations

We believe that USW is better than the Go-back-N protocol because, after a failure, the receiver doesn’t silently ignore all following packets but immediately informs the transmitter. Right after a NAC, the transmitter resets its window and resends the data. USW fully utilizes the link when the product of the packet transmission time S and the maximum window size N minus one is bigger than the round trip time R, S(N − 1) > R. In UniPro we have short distances (wire) and high bandwidth capacity, so we assume that the propagation time ½R, and accordingly R, is very small. In the same way, the packet transmission time S is also small, but the size of the window (31 currently, which can be increased in future) increases the overall product S(N − 1). Furthermore, in the USW protocol the transmitter gets to know about a failure earlier than, or at the same time as, with Go-back-N. The only reason to doubt the advantages of USW is the bandwidth wasted on additional NAC messages. Their number can be found in theory from Li = R/S. Interesting research on how the basic parameters, like the window size, affect the utilization of the line and the delay of acknowledgment for the Go-back-N variation is presented in [16].
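As a worked illustration of the condition S(N − 1) > R, take purely hypothetical values (not measurements from UniPro): S = 1 µs and R = 20 µs.

```latex
\begin{aligned}
N = 31:&\quad S(N-1) = 1\,\mu\text{s} \times 30 = 30\,\mu\text{s} > 20\,\mu\text{s} = R \\
N = 16:&\quad S(N-1) = 1\,\mu\text{s} \times 15 = 15\,\mu\text{s} < 20\,\mu\text{s} = R
\end{aligned}
```

Under these assumed values a window of 31 keeps the link busy, while a window of 16 would leave the transmitter waiting for feedback.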


3.3 Modeling in Promela

The final version of USW is the result of various implementations in Promela and testing within the Spin environment. This protocol is the simplest one in this study, so we also used it to learn and experiment with Spin.

[Diagram: Tx and Rx connected through the network by the channels tx2rx and rx2tx, with noise acting on each channel.]

Figure 12: Sliding Window Network

We model our protocol using only one transmitter Tx and one receiver Rx (see Figure 12). The two end nodes are connected through two different channels: one from the transmitter to the receiver and the other from the receiver to the transmitter, tx2rx and rx2tx respectively ². Tx sends data to the channel tx2rx with tx2rx!t_type(sn), where t_type is the type of the message and sn the sequence number of the data message or acknowledgment. When Tx receives a message, it consumes it from the channel rx2tx with rx2tx?t_type(sn), where again t_type is the type of the message and sn its sequence number. Correspondingly, Rx sends packets to the channel rx2tx (rx2tx!t_type(sn)) and consumes packets from the channel tx2rx (tx2rx?t_type(sn)).

    active proctype noise(){
      mtype t_type;
      byte t_sn;

      do
      :: tx2rx?t_type(t_sn) -> /* randomly remove msg from tx2rx */
      :: rx2tx?t_type(t_sn) -> /* randomly remove msg from rx2tx */
      od
    }

Figure 13: Create channel algorithm

Another consideration in Promela is how to create unreliable connections in which messages can be lost. For this, we include an extra process called noise (see Figure 13). This process has only one task: it has access to both channels between the two end points and it randomly steals some messages from them. We chose this way of creating noise in the connections because we found it to be the simplest and the one with the least verification overhead for this problem. Bear in mind that, by disabling the noise process, the connections become flawless.

² Channels are unidirectional in Promela.

Receiver

    active proctype Rx(){
      mtype t_type;
      byte t_sn, current=0;

      do
      :: tx2rx?DATA(t_sn) ->          /* receive data message */

         if
         :: (t_sn==current) ->        /* correct */
            rx2tx!ACK(current);       /* send ACK */
            NEXT(current)
         :: else ->                   /* something was corrupted */
            rx2tx!NAC(((current-1+MAXMSG)%MAXMSG));  /* send NAC */
         fi
      od
    }

Figure 14: The receiver’s algorithm

In order to make our code as abstract as possible, we use only two end points, which is enough because the UniPro protocol is connection-oriented. Figures 14 and 15 show the receiver’s and the transmitter’s process respectively. For their communication we use the two channels rx2tx (receiver to transmitter) and tx2rx (transmitter to receiver). The algorithms themselves do not need to change to fit the Promela model; they are exactly the same as the flow charts of the previous section. For the transmitter we use the variables on_post and current, and we calculate the beginning with the function BEG(current, on_post) when it is needed. The function NEXT(x) finds the identity of the packet that has to be sent after the xth one. In addition, note that in line 32 of the transmitter’s algorithm we could use a Promela timeout instead. However, we use the condition shown because it covers more potential failures than timeout. A Promela timeout becomes executable only when all processes are blocked. This would mean that the channels are empty and that the receiver has consumed all existing packets before anything is resent, which does not fit reality. The condition on_post == MAX_MSG_MINUS_1, on the other hand, is executable even if there are still messages in the channels.
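The definitions of NEXT and BEG are not listed in the text. A plausible reading, consistent with the modular arithmetic used in Figures 14 and 15, is sketched below in Python; both bodies and the value of MAXMSG are assumptions, and in the Promela model NEXT(x) updates x in place rather than returning a value.

```python
MAXMSG = 4  # hypothetical number of distinct packet identities


def NEXT(x):
    """Identity that follows x in the cycle 0..MAXMSG-1 (assumed definition)."""
    return (x + 1) % MAXMSG


def BEG(current, on_post):
    """First unacknowledged identity, on_post positions behind current (assumed)."""
    return (current - on_post + MAXMSG) % MAXMSG


print(NEXT(3))     # 0
print(BEG(1, 3))   # 2
```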


Transmitter

     1  active proctype Tx(){
     2    byte t_sn, on_post=0, current=0, i=0;
     3
     4    /* Main loop of Tx */
     5    do
     6    /* when the buffer is not full can send a message */
     7    :: (on_post<(MAX_MSG_MINUS_1)) ->
     8       tx2rx!DATA(current);   /* send data package */
     9       NEXT(current);         /* move to the next position */
    10       on_post++;
    11
    12    :: rx2tx?ACK(t_sn) ->
    13       i = BEG(current, on_post);
    14       if
    15       /* within the window */
    16       :: (i<=current && t_sn>=i && t_sn<current)
    17          || (i>current && (t_sn<current || t_sn>=i)) ->
    18          /* count on_post */
    19          on_post=(current-t_sn-1+MAXMSG)%MAXMSG;
    20       /* out of the window */
    21       :: else ->
    22          /* initialize window */
    23          current=(t_sn+1)%MAXMSG;
    24          on_post=0;
    25       fi;
    26
    27    :: rx2tx?NAC,t_sn ->
    28       /* initialize window */
    29       current=(t_sn+1)%MAXMSG;
    30       on_post=0;
    31
    32    :: (on_post==MAX_MSG_MINUS_1) -> /* replacement of timeout */
    33       /* initialize window */
    34       current=BEG(current, on_post);
    35       on_post=0;
    36
    37    od
    38  }

Figure 15: The transmitter’s algorithm

3.4 Formal Verification


First we need to specify the properties that we verify, so that we can create the appropriate network and algorithm. For USW we assume that the connection has already been set up, and we concentrate on the reliability that it offers. USW offers the following:

1. Error control: Ensure at the receiver that the sent bits arrive and no other;

2. In order: The bits arrive in the same sequence as they have been sent;

3. No duplicated packet: Make sure that the receiver doesn’t accept a packet twice.

    active proctype Source(){
      do
      :: source!0          /* send 0 */
      :: source!1; break   /* send 1 */
      od;

      do
      :: source!0          /* send 0 */
      :: source!2; break   /* send 2 */
      od;

      do
      :: source!0          /* send 0 */
      od
    }

Figure 16: An extra process used only in verification

This is the most important part of the implementation in Promela and the reason for using Spin. To check USW’s properties, we use never claims. First, we need to create an extra process which controls a new channel called source (see Figure 16). This process sends the numbers 0, 1 and 2 to the channel. It can produce any sequence of the following form:

0∗ 1 0∗ 2 0∗

where 0∗ can be any number of zeros, including none. The verifications are based on Wolper’s data independence method, using three colored messages (balls) [5]. Reference [5] also shows the proof that 3 different balls are enough to check the correctness of a sliding window protocol.
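The shape of these sequences can be illustrated in Python. The sketch is not the Source process itself: that process ends with an unbounded run of zeros and Spin explores all choices, whereas here one finite sequence is drawn at random. The function names and the max_zeros bound are hypothetical.

```python
import random
import re

PATTERN = re.compile(r"0*10*20*")   # the form 0* 1 0* 2 0*


def source_sequence(rng, max_zeros=3):
    """Draw one finite sequence of the form 0* 1 0* 2 0*."""
    seq = [0] * rng.randint(0, max_zeros)
    seq.append(1)
    seq += [0] * rng.randint(0, max_zeros)
    seq.append(2)
    seq += [0] * rng.randint(0, max_zeros)
    return seq


def is_valid(seq):
    """Check that a sequence of colours has the form 0* 1 0* 2 0*."""
    return PATTERN.fullmatch("".join(str(c) for c in seq)) is not None


print(is_valid(source_sequence(random.Random(0))))  # True
print(is_valid([0, 2, 1]))                          # False
```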

During verification, Spin makes sure that it uses all the different combinations that create different states. The transmitter attaches one of those numbers to every packet, in the same order in which it receives them. The receiver checks the received message sequence to verify the link reliability. When we say that a packet ”arrives” we mean that the receiver has accepted it, sent an acknowledgement and is waiting for the next packet. Spin reports an error when a never claim reaches the accept_all label and thus terminates.

As the number of zeros is arbitrary, we concentrate on the packets with numbers 1 and 2. In the code, the variable rcv1 means that the last packet that the receiver accepted carried attachment 1, rcv2 attachment 2 and rcv0 attachment 0.


    never {
    T0_init:
      if
      :: rcv0 -> goto T0_init;     /* 0 is received */
      :: rcv1 -> goto T0_S2;       /* 1 is received */
      :: rcv2 -> goto accept_all;  /* 2 is received */
      fi;
    T0_S2:
      do
      :: 1 -> goto T0_S2
      od;
    accept_all:
      skip
    }

Figure 17: Checking the 1st and 2nd properties of USW

For the first property, error control (explained in Section 3.4), we write the never claim presented in Figure 17. Spin reports an error if any combination of states reaches accept_all. With this code we therefore rule out that the packet with attachment 2 is received before the packet with attachment 1 ³. In this way we ensure the arrival of the packet with attachment 1. Consequently, we can say that all packets are received correctly.

The 2nd property implies that messages arrive in order. The never claim produces an error if the packet with attachment 2 arrives before the packet with attachment 1, so the same code verifies the 2nd property too.

³ Note that the attachments are tried with every packet during the verification, simply because Spin tries all the different paths which can occur in the algorithm.


    never {
    T0_init:
      if
      :: rcv1 -> goto T0_S2        /* 1 is received */
      :: else -> goto T0_init
      fi;

    T0_S2:
      if
      :: !rcv1 -> goto T0_S5       /* other than 1 is received */
      :: else -> goto T0_S2
      fi;
    T0_S5:
      if
      :: rcv1 -> goto accept_all   /* 1 is received */
      :: else -> goto T0_S5
      fi;

    accept_all:
      skip
    }

Figure 18: Checking the 3rd property of USW

The 3rd property says that each packet should be received only once (see Figure 18). If the variable rcv1 becomes true more than once, this is considered an invalid duplication. We use an intermediate state, T0_S2, and move on to the next state once a packet with a different attachment is received. After that, if the receiver sees another packet with attachment 1, an error occurs. The drawback of this approach is that the never claim cannot tell whether the same packet arrives again right after the first one. We solve this by inserting an extra assertion at the receiver. The assertion ensures that, once a packet with attachment 1 or 2 has been accepted, the next packet has a different attachment. The rest of the cases are tested by the never claim.
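The logic of the two never claims can be mirrored by simple monitors over the list of attachments that the receiver accepts. This Python sketch is an abstraction made for illustration: the function names are hypothetical, and the rcv0, rcv1 and rcv2 variables of the model are replaced by a plain list of accepted colours.

```python
def violates_order(received):
    """Mirror of the claim in Figure 17: an error if 2 is accepted before 1."""
    for colour in received:
        if colour == 1:
            return False   # 1 came first: the claim can no longer accept
        if colour == 2:
            return True    # 2 before 1: accept_all is reached
    return False


def violates_no_duplicate(received):
    """Mirror of the claim in Figure 18: 1, then something else, then 1 again."""
    state = "init"
    for colour in received:
        if state == "init" and colour == 1:
            state = "seen_1"
        elif state == "seen_1" and colour != 1:
            state = "seen_other"
        elif state == "seen_other" and colour == 1:
            return True    # accept_all is reached
    return False


print(violates_order([0, 2, 1]))            # True
print(violates_no_duplicate([1, 0, 1, 2]))  # True
print(violates_no_duplicate([1, 1, 2]))     # False: needs the extra assertion
```

The last call shows the drawback mentioned above: an immediate duplicate is not caught by this monitor and has to be covered by the assertion at the receiver.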

Full verification of USW has finished successfully!


4 Connection Management Protocol

UniPro offers a connection-oriented transport protocol. The basis for a reliable transport protocol is ensuring that both end nodes know and agree on connection establishment before they start exchanging data. Throughout this section, we study connection establishment and termination.

After a brief introduction and related work, we show our final version of a connection management protocol. We start by describing the main protocol. Then we give a deeper understanding of it by explaining some alternative solutions and discussing why they haven’t been used. This is followed by a brief explanation of the code implemented in Promela and the correctness properties that we check.

4.1 Introduction

We describe our models through message flow charts for specific examples and a state machine for the full specification of a protocol. As in other similar protocols, the terms client and server are used.

• Client: This asks for a service or for the connection to be established;

• Server: This offers a service and ”listens” on a specific port to which clients can connect. It cannot initiate the connection; it has to accept a client’s request for a connection to be established.

Message type   Explanation
SYN            Request to open a new connection
ACK            Agreement on other party’s request
NAC            Disagreement on other party’s request
DATA           Data message exchange
FIN            Request to close the connection

Table 1: Messages during communication between two nodes

Table 1 describes all possible messages that a client or a server can send or receive. Apart from the SYN message, which is meant for the client’s request for connection establishment (a server never sends a SYN and a client never receives one), all the others can be sent and received by either of the two parties. The DATA message represents a packet of meaningful information after connection establishment. To terminate a connection, the message FIN is used. Finally, in contrast with a NAC message (negative answer), the acknowledgement ACK can be used as a positive answer by one of the parties to a SYN, FIN or DATA message. In this section we concentrate on connection establishment and correct message flow rather than on checking the reliability and integrity of data messages. Thus, we omit the ACK message after the reception of a DATA message.


Label        Explanation                                              Client  Server
Listen       In listening mode, ready to accept a new connection              √
Closed       Not participating in any connection                      √
WaitSynAck   Waiting for an acknowledgment during connection setup    √       √
Connected    In data exchange mode                                    √       √
WaitFinAck   Waiting for an acknowledgment during connection close    √       √
WaitFin      Waiting for the other party’s close                      √       √
Want2Close   Waiting for the other party’s close without informing            √

Table 2: Client and Server states

Table 2 shows the states in which a client or a server can be (the last 2 columns indicate whether the state belongs to the client and/or the server). When the server listens, i.e., is in ”Listen” mode, it is available to serve a new client. The client, on the other hand, is in ”Closed” mode when it is not participating in any conversation, but it can request a new connection by sending a SYN message. Then, depending on whether we have 2-way or 3-way handshaking, the client (which always has to go through it) or the server (only when the protocol uses 3-way handshaking) can be in the ”WaitSynAck” state. After the connection setup, the server and the client are in ”Connected” mode exchanging data, until the closing of the connection is initiated. The mode ”WaitFinAck” is entered when one of the parties sends a FIN message and waits for an acknowledgement, while ”WaitFin” is reached if one of the parties has already sent a FIN and received an ACK for it but still waits for the other party’s FIN. Lastly, there is a special case of the ”WaitFin” state called ”Want2Close”. Only the server can reach this state; it means that the server does not want to exchange more data with the client but avoids sending any FIN message.

4.2 Related Work

UniPro’s requirements are very similar to those of the TCP/IP [17] Internet protocol suite. Both offer reliability during data exchange after the connection is established. In addition, both environments, the Internet and UniPro, are composed of many clients and servers, where clients can connect to the servers many times. As TCP runs successfully on the Internet, we believe that a similar protocol can suit UniPro too. Thus, we are interested in studying TCP’s handshaking and closing process.

Figure 19(a) shows the connection establishment, known as the three-way handshake. The client is always the one that initiates the connection, by sending a synchronization request (SYN) to a specific server. The server, if available, acknowledges the request, and finally the client sends an acknowledgement back, meaning that it has received the server's message. The client resends the SYN message and the server resends the ACK message when a timeout occurs. After the server receives the client's acknowledgment, both end nodes are connected and ready to exchange actual data.


Figure 19: (a) TCP connection establishment (b) TCP connection close

While exchanging data, one of the nodes may want to leave. In this case, it has to inform the other party by sending a FIN message and wait for its acknowledgement. If the acknowledgement is delayed, it simply resends the FIN message. The other party has to go through exactly the same steps, but not necessarily immediately: after receiving a FIN message it can still send data until it is also ready to close the connection. Figure 19(b) depicts the scenario where the client is the first one that wants to close the connection and the server closes after it.

Figure 19 shows an ideal scenario in which no message is lost, no other clients or servers interfere with the connection, all messages arrive in order and messages do not cross each other. Once these assumptions are dropped, the scenarios become much more complex, and flaws become difficult to find.

4.3 UniPro Requirements

As UniPro is specifically meant for mobile devices, it has its own limitations and requirements. We now discuss some of the requirements that directly affect the connection management protocol.

• Low consumption and high speed: Because the energy source is limited, we have to work with as few messages as possible. Connection management, i.e., setup and closing, must be short both in the number of exchanged messages and in completion time.

• Memory: Each UniPro component has little memory, and thus we should not rely on any in-memory tables that hold information, history or states of other components. Such a solution simply would not scale.

• Sessions: We do not rely on any session identification for the connection, because we want to keep the algorithm as simple as possible and the header of each packet small.

• Timing variables: For the same reasons as above, UniPro has no timing variables or algorithms that use the delay of a packet to classify the message as obsolete or usable.


4.4 Solution to the Connection Management protocol

4.4.1 Introduction

Figure 20: (a) Final connection establishment (b) Final connection close

Figure 20(a) presents the connection establishment, while (b) shows the connection close. Connection establishment looks similar to TCP's, except for some small details that are discussed in the next subsection. We highlight our connection close because it is much simpler: it uses fewer states and fewer messages, and it significantly reduces the number of different messages that are possible in most of the clients' and servers' states.

The connection setup is done using 3-way handshaking. The client sends a SYN message and waits until it receives an ACK from the server. Then it replies with an ACK too, signaling the server to go to the next state, "Connected".

The closing of a connection can start in two different ways, but both follow the same sequence of messages.

1. In the first way, the client is the first that wants to close the connection, and it sends a FIN message to the server. By the time it receives the FIN, the server might still want to send more data before closing. In this case it finishes sending its data messages and only then replies with a FIN message.

2. In the second way, the server wants to close the connection earlier than the client. To implement this, it moves to the "Want2Close" state and waits for the client's FIN. Then it follows the same steps as in the first way.

Note that the FIN from the server also serves as an acknowledgement, because the server only sends it after it has received the client's request to close the connection. Furthermore, the server is always the first to close the connection, and the client only closes after the server. A detailed description is given in the next subsection.

4.4.2 Description

Figure 21: Server's state machine with its current client

The state machine of the server while it receives messages from its current client is displayed in Figure 21. Only the messages that can occur in each state are shown. For a better overview, the messages sent by other clients are described separately. The server moves between four states, starting in the initial state "Listen", where it is ready to accept a new connection.

• Listen: The server is in listen mode when it is free to accept a new connection and is not busy with any other client. On the first SYN message that it receives from a client A, it acknowledges the message and proceeds to the state "WaitSynAck". From this moment until it closes the connection, A is its current client.

• WaitSynAck: In this state the server can receive a variety of messages. It can receive an ACK or DATA message, which means that its client received the acknowledgement and is connected; the server then also moves to the state "Connected". If, for any reason, it receives a FIN message, it replies with a FIN and goes back to the "Listen" state. Such a FIN can be a negative response from the client, indicating that it does not want to use the connection anymore. The server stays in the same state if it receives another SYN message or after a timeout; in both cases it sends an ACK to its client.

• Connected: In the state "Connected" the server takes part in the data exchange. It stays in the same state when it receives an ACK, DATA or FIN message; none of these is answered. On reception of a FIN message, it sets the flag got_fin = true. The symbol (ε*) denotes the server's own decision to close the connection. If the server has already received a FIN from its client, it replies with a FIN and immediately moves to the "Listen" state. Otherwise, it just moves to the state "Want2Close".

• Want2Close: In the "Want2Close" state the server can still receive ACK messages, namely when a repeated, delayed ACK from the client arrives after the server has moved from "Connected" to "Want2Close" without receiving any data message from the client. In that case the server just stays in the same state. When the FIN message arrives, the server answers with a FIN and moves to the "Listen" state.

Figure 22: Server's state machine with clients other than its current client

Figure 22 shows the interactions between a server and clients other than its current client. None of them moves the server to another state, which is to be expected, as the states are defined with respect to the current client and are not influenced by other clients. If a previous client did not receive the answering FIN after it sent a FIN, it repeats its FIN. This FIN can arrive in any state of the server, while the server is with a different client, and it is answered with a FIN. A SYN message from another client can be received in all states except "Listen". This is the result of multiple clients requesting a connection with the same server at the same time. The server replies with a NAC to indicate that it is busy with some other client.

Figure 23: Client's state machine with its current server

Figure 23 shows a client's interaction with its current server. The client starts in the state "Closed" and can move through the three other states described below.

• Closed: While a client is in this state it chooses a server and tries to connect to it by sending a SYN message, immediately moving to the next state, "WaitSynAck".

• WaitSynAck: This is an intermediate state after the client sends a SYN and before it moves to the "Connected" state. While here, the client expects to receive an ACK, which it answers with another ACK. Alternatively, it receives a NAC message, which indicates that the server is busy. It is important that in this case the client replies with a FIN message and moves to the "WaitFinAck" state (see Section 4.4.3). The SYN message is resent after a timeout. In addition, it is still possible that the client receives a FIN message. This means that the server is replying to a FIN message of an old connection, and that at least two FIN messages were sent from the client to the server.

• Connected: Here the client can receive another ACK message, after which it sends an ACK back to its server. As this is the state where data exchange takes place, the client generally receives some DATA too. When it does not want to send more data, it informs the server with a FIN message and moves to the state "WaitFinAck". Note that the client should not receive any FIN from its server before it sends its own FIN.

• WaitFinAck: In this state the client normally receives a FIN, which releases it and lets it return to the "Closed" state. Apart from the regular path (through "Connected"), the client also reaches this state after it receives a NAC message. This means that it can receive multiple ACK and NAC messages before it gets the FIN from the server; how many depends on the number of SYN messages that the client sent before moving from "WaitSynAck" to "WaitFinAck".

Figure 24: Client's state machine with servers other than its current one

Figure 24 shows the interaction with other servers when they send a message. In every state a FIN message can appear as a delayed, repeated message from a server. The client does not reply to this message and remains in its current state.

4.4.3 Examples and Informal Proofs

Reducing messages for connection setup from 3-way handshaking to 2-way handshaking

Suppose we reduce the complexity of the connection setup by using 2-way handshaking instead of 3-way handshaking. In this way the server jumps directly from "Listen" mode to "Connected" and does not expect any ACK message. One might claim that 2-way handshaking is enough, as the client can send a SYN message over and over again after a timeout, until it gets an ACK or NAC answer from the server. Unfortunately, this algorithm does not work when the messages from the client cross the messages from the server and vice versa.

Figure 25: An example with two-way handshaking

Figure 25 shows a scenario in which 2-way handshaking fails. The client tries to connect to the server, so it sends a SYN message into the channel. The client resends the message because the server answers very late. The server replies with a NAC to the first SYN message because it is busy with another client. However, it replies with an ACK to the second SYN message because it has finished with the other client by then. After the NAC message, the client moves to the "WaitFinAck" state, waiting to close, while the server sends an acknowledgement, goes to "Connected" mode and starts sending data packets. This is wrong, as the network is flooded with useless data messages.

Using the same ACK for closing as for opening a connection

One might suggest having the server answer a FIN message with an ACK message. We argue against this idea by means of the scenario below (see Figure 26). Here, both the server and the client have finished the data exchange. The client sends a FIN request to the server and resends it after a timeout. The server answers each of them with an ACK, believing that the first ACK was lost. In the meantime the same client tries to reconnect to the same server. Its SYN message is lost, but the client receives the second ACK from the server and moves to the state "Connected". The server then receives data messages without having established any connection, which is considered a failure.

Need for a FIN after a server’s NAC

In this subsection we show why the client needs to send a FIN message after the server answers a SYN with a NAC. The alternative is that, once the client receives a NAC message, it simply stops trying to connect and moves to the "Closed" mode. The server is not affected, since at the moment it answered the SYN it was busy with another client and did not initiate any new connection. This would be a faster and simpler way to close the connection.

Figure 26: An example where the server answers a FIN message with an ACK message

Figure 27: An example where the client closes immediately after a server's NAC message without going through "WaitFinAck"

Unfortunately, this idea has a drawback (see Figure 27). A scenario in which it fails is the following. The client sends a SYN message twice to the server. When the first one arrives, the server is busy with another client and thus answers with a NAC. When the second one arrives, the server is ready to set up a new connection and replies with an ACK. It then moves to the "WaitSynAck" mode, where a timeout occurs, and as a result it resends the ACK message. In the meantime, the client connects to another server and requests a service from it. Thus, when the first acknowledgement arrives, it sends a NAC to the server. Consequently, the server moves to the "Listen" mode. The client then decides to reconnect to the server and immediately receives an acknowledgment from it. However, the server is not aware of this new connection because the SYN message has been lost. This scenario fails since the client incorrectly assumes that it has a connection and starts sending data packets to the server.

Finding a solution requires identifying the fundamental problem first. The problem stems from multiple messages crossing each other. A fourth message (4-way handshaking) therefore does not help: the problem remains, although it becomes less likely to occur. Note that the problem would be solved if we were able to distinguish between different sessions of SYN messages (resent SYNs are considered to belong to the same session). Once the server has sent a NAC, it would repeat the NAC for all SYNs that belong to that session. The server would then never accept a SYN of a previous session, and the failure would be prevented. A trivial solution is to keep track of the last session of all servers at the client side and of all clients at the server side (see footnote 4). This version is not acceptable by UniPro's standards because it does not scale. On the other hand, keeping track of all the different sessions with only one bit instead of a table is not possible. Servers and clients can connect to each other any number of times (odd or even), and thus a single extra bit is not enough to carry information about every connection.

Explanation why the FIN message works

By requiring the client to close the connection via the "WaitFinAck" state, we prevent it from connecting to another server until it receives the server's FIN. The main idea is that the client does not move to the next session until it has received all the answers (ACK, NAC) of the server from the current session. The server might send multiple replies (ACK, NAC) to the client. If it sends an ACK, it has moved to the state "WaitSynAck", and after that it can only send ACK messages, after a timeout, until it gets an answer from the client. Hence, over one session the server sends zero or more NAC messages followed by zero or more ACK messages, in one of the following sequences:

$\mathrm{NAC}^{+}\,\mathrm{ACK}^{*}$ or $\mathrm{NAC}^{*}\,\mathrm{ACK}^{+}$

That means that the client receives at least one ACK or NAC message before it goes to the next session. Once it receives an ACK, it can only receive ACKs until it moves to the next session. If none of the NAC messages arrives at the client, it keeps sending SYN messages until it gets an ACK, and it moves to the next mode, "Connected", once it has one. Until then, it remains in this session in "WaitSynAck" mode. If at least one of the NAC messages arrives at the client, then the client, before moving to the "Closed" state, makes sure that it takes in all the answers from the server that belong to the current session; only then is it free to issue another request. In this case the client stays in the "WaitFinAck" state and answers with a FIN message until it gets the FIN from the server. So, we make sure that the server gets all the messages until it gets the first FIN, and that the client gets all the server's answers until it gets its FIN for this session (see footnote 5). So both ends absorb all the SYN, ACK and NAC messages that are in the channels before the client goes to the next session. This is why any failure like the one mentioned before is prevented.

4 This version has been fully verified in Spin, even with only a 2-way handshaking connection setup. A bit table with a length equal to the number of servers was needed at the clients, and a bit table with a length equal to the number of clients was needed at the servers.

5 Note that the messages are always in order.
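Read as regular expressions over the server's replies, the two reply sequences NAC+ ACK* and NAC* ACK+ given earlier in this explanation can be combined into a single set. The restatement below is only a reformulation; the counters i and j (the numbers of NAC and ACK messages in one session) are introduced here for illustration.

```latex
\[
\mathrm{NAC}^{+}\,\mathrm{ACK}^{*} \;\cup\; \mathrm{NAC}^{*}\,\mathrm{ACK}^{+}
\;=\;
\{\, \mathrm{NAC}^{i}\,\mathrm{ACK}^{j} \mid i, j \ge 0,\ i + j \ge 1 \,\}
\]
```

In words: within one session the server sends at least one reply, and no NAC follows the first ACK.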

4.4.4 Considerations

In this section we argue that the connection setup cannot be simpler than 3-way handshaking, e.g., 2-way handshaking (see Figure 25). The reason is that the server, after receiving the SYN message, goes to the "Connected" state immediately and starts sending data messages while the client does not expect them. This happens when the client, after sending a couple of SYN messages to the server, receives at least one NAC message before the ACK message. In 3-way handshaking this is prevented by having the client go through the FIN exchange before starting a new session after a NAC (see Figure 27). The same method can be used in 2-way handshaking, but the server may then send some data messages before the FIN message arrives. Whether that is acceptable depends on the protocol's requirements. For our specifications 2-way handshaking is not enough, and so we use the 3-way variant.

We simplified the closing of the connection considerably by enforcing that the client is always the first one to send the FIN message to the server. That means that the server always closes before the client. In addition, it is always the client that requests a new connection. Therefore, there is no way to mix up one session with another. One can claim that having the client always send its FIN first is a limitation of the protocol. This limitation can be hidden, as the server has the separate state "Want2Close" for that purpose, and it can send a notification to the client or piggyback a flag indicating that it wants to close the connection. Apart from this, the server stops sending data messages while it is in this state. So, there are several ways in which the client is able to notice the server's desire to close the connection. Therefore, even though we keep the session closing simple, there are no further limitations to our model.

4.5 Modeling in Promela


Figure 28: Network for Connection management protocol

Figure 28 presents the network as modeled in the Promela language. Each node, client or server, has a unique channel attached to it, ch[0], ..., ch[N − 1], where N is the number of nodes. Each node uses its channel to receive messages. When a node sends a message, it does so by sending it directly to the channel of the node it wants to contact. Our network is simple and does not represent the real one. The reasons are, firstly, that full verification calls for abstraction and simplicity and, secondly, that the aim of this study is to check the functionality and correctness of the connection management protocol and not the routing of packets.

A message consists of two values: the type of the message and the sender. The receiver is not needed, as each node sends the message directly to the destination's channel. The type of the message is one of the following values: mtype = {CO_SYN, CO_FIN, CO_ACK, CO_NAC}. Note that we removed the data messages because they do not play a role in checking the protocol's functionality.
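The network and message format described above can be written down as a few declarations and tried out in isolation. The following is a minimal sketch, not part of the original model: it assumes N = 4 nodes and a channel capacity of 1, and the processes requester and responder, the parameter dest and the node ids 0 and 2 are hypothetical names and values chosen only for illustration.

```promela
#define N 4                                  /* assumed: 2 clients + 2 servers */

mtype = { CO_SYN, CO_FIN, CO_ACK, CO_NAC };
chan ch[N] = [1] of { mtype, byte };         /* one inbox per node: (type, sender) */

/* hypothetical process: sends a SYN to node dest and waits for the reply */
proctype requester(byte my_id; byte dest)
{
    mtype t_type;
    byte t_source;

    ch[dest]!CO_SYN(my_id);                  /* write into the destination's inbox */
    ch[my_id]?t_type(t_source);              /* read from the own inbox */
    assert(t_type == CO_ACK && t_source == dest)
}

/* hypothetical process: answers the first message it receives with an ACK */
proctype responder(byte my_id)
{
    mtype t_type;
    byte t_source;

    ch[my_id]?t_type(t_source);              /* the sender is part of the message */
    assert(t_type == CO_SYN);
    ch[t_source]!CO_ACK(my_id)               /* the reply goes to the sender's inbox */
}

init {
    atomic { run requester(0, 2); run responder(2) }
}
```

Because the sender identity travels inside the message, the responder needs no receiver field and no table of peers; it simply replies to ch[t_source].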

Server

The server process uses 4 local variables: my_client, t_source, t_type and got_fin. The first one holds the identity of the client that the server is currently connected with. The variables t_source and t_type are temporary variables that are used when the server receives a message. The former indicates the identity of the sender, while the latter indicates the type of the message. The variable got_fin shows whether or not the server has received a FIN message from its current client in the current session.

When a server sends a message to a client with identity A, it sends it to the channel ch[A]. The code in Promela is ch[A]!t_type(my_id), where t_type is the type of the message and my_id is the server's identity, so that the client knows the sender. When a server receives a message with ch[my_id]?t_type(A), it means that client A sent a t_type message to the server's channel.

The code is presented below and follows exactly the algorithm described in the previous subsection.


```promela
proctype server(byte my_id)
{
    byte my_client;
    byte t_source;
    mtype t_type;
    bit got_fin = false;

S_Listen:
    /* we only receive messages */
    ch[my_id]?t_type(t_source);

    /* check the type of the message and proceed */
    if
    :: t_type == CO_FIN ->              /* a FIN message from a previous session */
        ch[t_source]!CO_FIN(my_id);
        goto S_Listen;
    :: t_type == CO_SYN ->              /* a new client asks for service */
        ch[t_source]!CO_ACK(my_id);
        my_client = t_source;
        goto S_WaitSynAck;
    :: else ->
        assert(false);
    fi;

S_WaitSynAck:
    if
    :: ch[my_id]?t_type(t_source) ->    /* can receive a message */
        if
        :: t_source == my_client ->     /* the current session */
            if
            :: t_type == CO_ACK ->      /* expected ACK message */
                goto S_Connected;
            :: t_type == CO_FIN ->      /* the client wants to close */
                ch[my_client]!CO_FIN(my_id);
                goto S_Listen;
            :: t_type == CO_SYN ->      /* repetition of SYN message */
                ch[my_client]!CO_ACK(my_id);
                goto S_WaitSynAck;
            :: else ->
                assert(false);
            fi;
        :: else ->                      /* other sessions */
            if
            :: t_type == CO_SYN ->      /* another client's request */
                ch[t_source]!CO_NAC(my_id);
            :: t_type == CO_FIN ->      /* repetition of FIN message */
                ch[t_source]!CO_FIN(my_id);
            :: else ->
                assert(false);
            fi;
            goto S_WaitSynAck;
        fi;
    :: ch[my_client]!CO_ACK(my_id) ->   /* TIMEOUT IN REALITY - resend ACK */
        goto S_WaitSynAck;
    fi;

S_Connected:
    if
    :: ch[my_id]?t_type(t_source) ->    /* receiving a message */
        if
        :: t_source == my_client ->     /* current session */
            if
            :: t_type == CO_FIN ->      /* client decides to close the connection */
                got_fin = true;
                goto S_Connected;
            :: t_type == CO_ACK ->      /* repetition of ACK message */
                goto S_Connected;
            :: else ->
                assert(false);
            fi;
        :: else ->                      /* other sessions */
            if
            :: t_type == CO_SYN ->      /* another client's request */
                ch[t_source]!CO_NAC(my_id);
            :: t_type == CO_FIN ->      /* repetition of FIN message */
                ch[t_source]!CO_FIN(my_id);
            :: else ->
                assert(false);
            fi;
            goto S_Connected;
        fi;
    :: true ->                          /* server decides to close the connection */
        if
        :: got_fin == true ->
            ch[my_client]!CO_FIN(my_id);
            got_fin = false;
            goto S_Listen;
        :: else ->
            goto S_Want2Close;
        fi;
    fi;

S_Want2Close:
    ch[my_id]?t_type(t_source);         /* only receiving messages */
    if
    :: t_source == my_client ->         /* current session */
        if
        :: t_type == CO_FIN ->          /* the client wants to close */
            ch[my_client]!CO_FIN(my_id);
            goto S_Listen;
        :: t_type == CO_ACK ->          /* repetition of ACK message */
            goto S_Want2Close;
        :: else ->
            assert(false);
        fi;
    :: else ->                          /* other sessions */
        if
        :: t_type == CO_SYN ->          /* another client's request */
            ch[t_source]!CO_NAC(my_id);
            goto S_Want2Close;
        :: t_type == CO_FIN ->          /* repetition of FIN message */
            ch[t_source]!CO_FIN(my_id);
            goto S_Want2Close;
        :: else ->
            assert(false);
        fi;
    fi;

}
```

Client

The client process has 3 local variables: my_server, t_source and t_type. The first one holds the identity of the server of the current session, while the last two are temporary variables that hold the description of a message: t_source indicates the sender and t_type indicates the type of the message.

When a client sends a message to a server with identity A, it sends it directly to that server's channel ch[A]. So, the code in Promela has the form ch[A]!t_type(my_id), where t_type is the type of the message and my_id is the client's identity, so that the server knows the sender. Similarly, when it receives a message it takes it from its own channel with ch[my_id]?t_type(A), i.e., it receives a t_type message from server A.

The code of the client process is shown below; it is also a direct translation of the state machine explained in the previous subsection.

```promela
proctype client(byte my_id)
{
    byte my_server;
    byte t_source;
    mtype t_type;

C_Closed:
    /* random server selection (choose_server is not shown in this listing) */
    my_server = choose_server();

    if
    :: ch[my_server]!CO_SYN(my_id) ->   /* asks for service */
        goto C_WaitSynAck;
    :: ch[my_id]?t_type(t_source) ->
        assert(t_type == CO_FIN);       /* server finishing an old connection */
        goto C_Closed;
    fi;

C_WaitSynAck:
    if
    :: ch[my_id]?t_type(t_source) ->
        if
        :: t_source == my_server ->     /* current session */
            if
            :: t_type == CO_ACK ->      /* expected ACK message */
                ch[my_server]!CO_ACK(my_id);
                goto C_Connected;
            :: t_type == CO_NAC ->      /* the server is busy and not able to serve */
                ch[my_server]!CO_FIN(my_id);
                goto C_WaitFinAck;
            :: t_type == CO_FIN ->      /* server finishing an old connection */
                goto C_WaitSynAck;
            :: else ->
                assert(false);
            fi;
        :: else ->
            assert(t_type == CO_FIN);   /* another server repeats the FIN message */
            goto C_WaitSynAck;
        fi;
    :: ch[my_server]!CO_SYN(my_id) ->   /* TIMEOUT IN REALITY - resends SYN */
        goto C_WaitSynAck;
    fi;

C_Connected:
    if
    :: ch[my_id]?t_type(t_source) ->    /* receiving a message */
        if
        :: t_source == my_server ->     /* current session */
            assert(t_type == CO_ACK);
            ch[my_server]!CO_ACK(my_id);
        :: else ->                      /* old session */
            assert(t_type == CO_FIN);   /* another server repeats the FIN message */
        fi;
        goto C_Connected;
    :: ch[my_server]!CO_FIN(my_id) ->   /* the client wants to close */
        goto C_WaitFinAck;
    fi;

C_WaitFinAck:
    if
    :: ch[my_id]?t_type(t_source) ->    /* receiving messages */
        if
        :: my_server == t_source ->     /* current session */
            if
            :: t_type == CO_ACK ->      /* repetition of ACK message */
                ch[my_server]!CO_FIN(my_id);
                goto C_WaitFinAck;
            :: t_type == CO_NAC ->      /* repetition of NAC message */
                ch[my_server]!CO_FIN(my_id);
                goto C_WaitFinAck;
            :: t_type == CO_FIN ->      /* server is closed */
                goto C_Closed;
            :: else ->
                assert(false);
            fi;
        :: else ->
            assert(t_type == CO_FIN);   /* another server repeats the FIN message */
            goto C_WaitFinAck;
        fi;
    :: ch[my_server]!CO_FIN(my_id) ->   /* TIMEOUT IN REALITY - resends FIN */
        goto C_WaitFinAck;
    fi;
}
```
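The client listing calls choose_server() without showing its definition. One possible way to express a random server selection in Promela is a nondeterministic choice wrapped in an inline. The sketch below is an assumption, not the thesis' definition: the constants FIRST_SERVER and LAST_SERVER, the id layout and the process demo are hypothetical, and because an inline cannot return a value, the result variable is passed as an argument, which differs from the call form used in the listing.

```promela
/* assumed id layout: nodes 0..1 are clients, nodes 2..3 are servers */
#define FIRST_SERVER 2
#define LAST_SERVER  3

/* hypothetical helper: nondeterministically stores one server id in s */
inline choose_server(s)
{
    if
    :: s = FIRST_SERVER
    :: s = LAST_SERVER
    fi
}

/* small driver that only checks the range of the chosen id */
active proctype demo()
{
    byte my_server;

    choose_server(my_server);
    assert(my_server >= FIRST_SERVER && my_server <= LAST_SERVER)
}
```

During verification Spin explores both branches of the if, so every server is considered as a possible choice in every session.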

4.6 Formal Verification


The first question that arises in the verification phase is "what does one want to verify?". The answer is the set of properties that the protocol needs to possess in order to be considered correct. In this section, we first enumerate the correctness properties of the connection management protocol and show how they have been verified in Spin. After that, we discuss a number of techniques that have been used to achieve a high level of protocol abstraction.

4.6.1 Correctness Properties & their Implementation

The main function of the connection management protocol is to make sure that data is sent only after both client and server have agreed on and established a connection. It should also guarantee that no data is exchanged after a connection is closed. Furthermore, connection integrity must be ensured, in that no other client or server interferes with the connection. All these properties are directly related to the clarity of the protocol. We verify that, in a specific state, no message type arrives from a specific client or server other than the ones defined. That is, all states are clearly defined and the server or client only receives expected packets. Consequently, we use assertions throughout our client and server implementation, in all cases and states.

In addition, Spin automatically checks the algorithm for cycles, deadlocks or dead code.

4.6.2 Abstraction

Due to the number of states, channels, clients and servers involved, a high level of abstraction is needed for Spin to verify the system fully. Otherwise, Spin is not able to complete a full verification because of memory limitations. Cutting the problem into smaller parts is not an option, as both the client and the server must exist to make the conversation possible and to make connection establishment and closing form a continuous cycle. Below we discuss several ways in which we tried to reduce the state space generated by Spin without decreasing the validity of the verification.

1. Network design: We argue that the number of channels we need is equal to the number of nodes N (N = clients + servers). With fewer than N channels, an overhead is added in the form of a third field in each message indicating the identity of the receiver. Spin would nevertheless search all possible combinations in the channel, so the complexity would increase again even though the number of channels is smaller. Having one channel per node keeps the solution simple and clear.

An issue we had to face in all possible network designs was a situation that blocks the whole system. Imagine the following scenario: client A and server B are in a conversation. Both their channels are full and both have received a message that they need to answer. The server then cannot send its message to the client because the client's channel is full, and vice versa. Furthermore, neither of them can take a message from its channel because each waits to send a message to the other. So, the system is blocked and Spin reports an error as a result of this deadlock.

A possible solution to this issue is to compile the model with Spin's -m option. With this option, a message that is sent to a full channel is lost. In this way we not only prevent the system from getting blocked, but we also create an unreliable network that is closer to reality (erroneous connections). Another way of representing the same network without blocking is to give each node the choice of sending a message or not, depending on the free space in the channel.

2. Channel capacity: This variable indicates the number of messages that a channel can hold. Experiments showed that a value of 1 is enough to handle all possible cases: there is no dead code. A value of 0 implies rendezvous communication. With rendezvous communication, where the client and the server have to process every message immediately, Spin found an untested case (dead code). The cases that require sending a message before knowing what the other party has sent are then not covered. Thus the channel capacity has to be at least 1 message.
We argue that when the channel capacity is 1 message, all the actions and scenarios between clients and servers can still occur. Within the connection between a particular server and client, each end receives one message and replies with at most one message. So, the worst case is when the client and the server are in a state where they both reply with this one message. This case is handled by a channel capacity of 1 message. For the rest of the cases, Spin explores and checks all different combinations. Messages are still replayed, and even if a message may be lost because the channel is full, there is at least one execution in which the message arrives at its destination. The same holds for messages from clients or servers that are not connected. If a client or a server wants to send a message to another party, this message can be lost because a channel is full, but in some execution explored by Spin it arrives at its destination. It is worth mentioning that the algorithm was also tested with larger channel capacities on DiVinE and is still fully verified.

3. Number of nodes: An additional server or client may increase the state space dramatically (an extra channel is needed, and the new node adds extra combinations in the other channels too). It is therefore important to concentrate on the minimum number of nodes needed, so that we do not lose any case and still verify the full functionality of the protocol. At least 2 clients are needed, because only then do clients compete for a connection with a server. In addition we need a 2nd server, because we need more than one connection at the same time to make sure that one does not interfere with the other. Therefore, in total we need at least 4 nodes (2 clients + 2 servers).
We believe that 4 nodes are sufficient to fully verify the protocols in our case. Each node concentrates on its connected node, if any. To all other nodes it just answers with a FIN or NAC message and returns to the state it was in. For both types of end node, server and client, we create an environment in which the scenarios with the extra server or client exist. That is what the 2nd client and the 2nd server are used for. Each client repeatedly chooses one of the servers at random and asks for service. Spin explores all the cases where both clients contact the same server, or different ones at different times. There are no special variables that can be affected by more clients or servers. In addition, the fact that we have no dead code when we accept external clients and servers in all states means that all cases are tested.

4. Reduce the number of different message types: We reduce the state space by reducing the number of possible values for the message type by 1. We observed that the CO_DATA message does not create any new state and has the same effect as the CO_ACK message, so we excluded it from the verification. As a result, the state space is reduced and the correctness of the algorithm is not affected.

This version could be fully verified!
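The blocking scenario from the network design item, and the effect of losing messages sent to a full channel, can be illustrated outside Spin. The following is a minimal Python sketch, not part of the verified Promela model; the class `LossyChannel` and its fields are hypothetical names introduced only for this illustration, and the capacity of 1 mirrors the channel capacity chosen above.

```python
from collections import deque


class LossyChannel:
    """Bounded FIFO channel; a send to a full channel is dropped (assumed
    to mirror the lossy-send behaviour described for Spin's -m option)."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def send(self, msg):
        # Instead of blocking on a full channel, lose the message.
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(msg)
        return True

    def receive(self):
        # Return the oldest message, or None if the channel is empty.
        return self.queue.popleft() if self.queue else None


# Client A and server B each have one pending message in their own channel.
client_in = LossyChannel(capacity=1)
server_in = LossyChannel(capacity=1)
server_in.send("SYN")   # sent earlier by the client
client_in.send("FIN")   # sent earlier by the server

# Both now try to answer while the peer's channel is still full.
# With blocking sends neither could proceed; here the answers are lost.
sent_by_client = server_in.send("ACK")
sent_by_server = client_in.send("NAC")
```

Because the sends return instead of blocking, both parties can go on to read their own channel, which is the property that removes the deadlock in the scenario above.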


5 Router Management Protocol

We would like to prevent the network from being flooded because of congestion. Thus, we introduce a congestion control mechanism for our system. We only allow a connection to be set up if there is enough bandwidth left. So, each connection reserves the appropriate bandwidth on the routers during connection setup and returns it when the connection is closed. In this section we introduce two protocols (the latter being an optimization of the former) which define how the bandwidth is reserved and released.

5.1 Introduction

In this subsection we introduce the main symbols used in the diagrams. As in previous sections, we assume that the connections are bidirectional and static, i.e., the route that packets take from node A to node B always stays the same. So, all messages travel in order.

[Figure 29: Legend for diagrams, symbols (1) to (9)]

Figure 29 shows all the symbols that are used in our diagrams in this section. We describe them in further detail below, numbered as in the legend.

1. This is the start state of the components (client, server, router);

2. We use this symbol when one of the end nodes moves from one state to another and no error occurs;

3. This represents the end of the conversation;


4. This is only used for the routers and it indicates that with the arrival of this message the router will reduce the available bandwidth by the value that the new connection requires;

5. This is only used for routers and it indicates that with the arrival of this message the router will increase the available bandwidth by the value that the connection took;

6. This is only used for routers and it indicates that the available bandwidth is not enough to serve the connection;

7. This is only used for routers and it indicates that there is enough bandwidth available to serve the connection;

8. This is a general symbol which shows that a timer is set and the message can be replayed after a timeout;

9. Routers have to hold triples of (client, server, bandwidth) in memory for some time so that double reservation or double release of the bandwidth is prevented. The line covering the router's timeline indicates how long the router should remember this triple. The beginning of the line marks the creation of the triple and the end of the line marks its deletion.

5.2 Router Congestion Control Protocol

In this subsection we study the router's algorithm for congestion control as an additional function of the connection management protocol. We also discuss a simpler solution that needs fewer messages. The main connection management protocol does not need to change. The protocol is designed to work with more than one router in between, but in some of the diagrams we show only one of them for readability. Figure 30 describes the protocol's behavior when there is sufficient bandwidth along the path from client to server and back, whereas Figure 31 shows the protocol when there is not enough bandwidth in at least one of the routers. The description of the connection management protocol is not repeated, so we suggest reading Section 4 before going through this section (as the router congestion control is built on top of the connection management protocol).

Figure 30 shows the full connection procedure between a client and a server. The client starts by sending the SYN message to the server. Within the SYN message it also sends a specification of the bandwidth that needs to be reserved. Note that the routers do not make any reservation on receiving the SYN message, but just forward it to the next router or server. The server accepts the client and sends an ACK carrying the sum of the client's bandwidth value (bw1) and its own bandwidth value (bw2). When the routers receive an ACK message from the server to the client, they first check whether the triple (client, server, bandwidth) already exists in their memory. If it does, they take no action, because this is the second ACK message that they receive. If the triple does not exist, they immediately create it with a bandwidth value equal to bw1 + bw2. As we assume that all routers have sufficient bandwidth to serve the connection, they reduce their available bandwidth by bw1 + bw2. The connection is established by the next ACK message and then data can be exchanged. The routers do nothing else until they receive the FIN message from the server to the client. That indicates the end of the connection on both sides. Then the router checks whether the triple still exists in memory. If it does, this is the first FIN message that the router receives for the route from server to client, and thus it increases the available bandwidth by bw1 + bw2 and removes the triple from memory. Note that the client cannot issue a second SYN request before all the routers have received at least one FIN message: it cannot close the connection without receiving a FIN from the server, which implies that this message has passed successfully through all the routers. In addition, note that replays of the messages SYN, ACK and FIN do not interfere with our protocol, because the triple is known and controlled in the routers during the basic transactions.

[Figure 30: Message sequence when there is sufficient bandwidth. Client, internal routers and server exchange SYN(bw1), ACK(bw1+bw2), ACK, DATA and FIN; states shown: Closed, Listen, WaitAck, Connected, WaitFin.]

In Figure 31 we assume that there is not enough bandwidth in at least one of the routers. In contrast with the example where there is enough bandwidth in every router, here a router without sufficient bandwidth does not add the triple (client, server, bandwidth) to its memory. On an ACK message from the server to the client, each router first checks whether the triple is already stored. If not, and it has enough bandwidth to serve the connection, it stores the triple and reduces the available bandwidth. If it does not have enough bandwidth, it simply sends an ERR message in the client's direction without storing the triple. If the triple is already there, the reservation has already been made for this connection, so the ACK message is just forwarded. If the ERR message from the router to the client is lost, then the client replays the SYN message or the server replays the ACK message after a timeout. The routers after the one with insufficient bandwidth receive and forward the ERR message. When the client receives the ERR message, it immediately closes the connection exactly as it does when it receives a NAC message (by sending a FIN and waiting for the server's FIN to arrive). A second ACK message may arrive at a router which has sent a NAC or ERR message before. This time, the router may have adequate bandwidth to serve the connection. The connection can still be established if the ERR or NAC message did not arrive at the client. If the client has received one of them, the connection is closed, because the client then sends only FIN messages and never the ACK message to the server. So, even if a router stores the triple and forwards an ACK after the client has received an ERR message, the triple and the bandwidth are released right after the reception of the FIN message from the server to the client.

[Figure 31: Message sequence when there is NO sufficient bandwidth. Client, internal routers and server exchange SYN(bw1), ACK(bw1+bw2), ERR and FIN; states shown: Closed, Listen, WaitAck, WaitFin.]
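The router behaviour described for Figures 30 and 31 can be summarised in a short sketch. This is an illustrative Python rendering under our own assumptions, not the thesis' Promela model: the class `Router`, its dictionary of triples and the method names are hypothetical, and message loss and timers are left out.

```python
class Router:
    """Router of the first protocol: the triple is kept for the whole
    connection, which makes repeated ACK and FIN messages harmless."""

    def __init__(self, available_bw):
        self.available_bw = available_bw
        self.triples = {}  # (client, server) -> reserved bandwidth

    def on_server_ack(self, client, server, bw):
        """Handle ACK(bw1+bw2) travelling from server to client.
        Returns the message that is passed on towards the client."""
        key = (client, server)
        if key in self.triples:
            return "ACK"          # repeated ACK: already reserved, just forward
        if self.available_bw < bw:
            return "ERR"          # not enough bandwidth: no triple is stored
        self.triples[key] = bw    # first ACK: store triple and reserve
        self.available_bw -= bw
        return "ACK"

    def on_server_fin(self, client, server):
        """Handle FIN travelling from server to client."""
        bw = self.triples.pop((client, server), None)
        if bw is not None:        # first FIN: release the reservation once
            self.available_bw += bw
        return "FIN"


router = Router(available_bw=10)
first = router.on_server_ack("A", "B", 4)    # reserves 4 units
replay = router.on_server_ack("A", "B", 4)   # replay: no second reservation
router.on_server_fin("A", "B")               # releases 4 units
router.on_server_fin("A", "B")               # replay: nothing left to release
```

Keeping the triple between the first ACK and the first FIN is what makes both handlers idempotent; the cost of doing so is discussed next.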

Drawback

It is important that internal routers keep track of the triple (client, server, bandwidth) over time (represented by the thick line), because this prevents the bandwidth value from being reduced or increased more than once. Moreover, this solution is clear and simple, as the triple is there during the data exchange and is removed when the connection is closed. The drawback is that the routers need a lot of memory in order to serve all possible connections. This idea does not scale, because UniPro can have up to 128 devices. This can create thousands of connections, because one device can have multiple ports.

With the current connection management protocol, the triples have to stay in memory during the whole conversation. The basic reason is the closing of the connection, which uses only two messages. Setting up the connection uses 3 messages (SYN-ACK-ACK), which are sufficient to make sure that the reduction of the bandwidth occurs only once. With the 3 messages of the 3-way handshake we can ensure that the client will not send any other SYN message after the ACK (3rd message), because it has received the ACK from the server (2nd message). So, if the triple is saved in the routers after the SYN message and released after the ACK (3rd) message, then the bandwidth regulation is safe. Unfortunately, this is not the case with only 2 messages. A scenario is given below:

1. The connection between the server and the client is successfully set up and the triple is released from memory;

2. Server and client exchange data;

3. The client wants to close the connection and it sends a FIN message to the server;

4. All the routers put the triple in their memory with the first FIN message that they receive;

5. The FIN message arrives at the server and it replies with another FIN, which releases all the triples in the routers and increases their available bandwidth;

6. Before the server's FIN arrives at the client, the client's timer times out and it sends another FIN to the server;

7. The client receives the FIN and closes, but at the same time the client's second FIN message is still in the channel.

So, the routers have no information to tell whether the second FIN message belongs to a connection that has already been closed; they simply repeat the procedure and increase the available bandwidth again.

To conclude, with the current protocol triples need to be saved in the routers during the entire conversation. To avoid this limitation, both the opening and the closing of the connection need to use at least 3 messages each. This is studied in the next subsection.
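The double release in the scenario above can be reproduced with a few lines of code. The sketch below is a Python illustration under stated assumptions, not the thesis' model: `ForgetfulRouter` is a hypothetical router that holds the triple only while the connection is closing, and we assume the server answers the replayed FIN with another FIN.

```python
class ForgetfulRouter:
    """Hypothetical router that drops the triple after setup and stores it
    again only between the client's FIN and the server's FIN."""

    def __init__(self, available_bw):
        self.available_bw = available_bw
        self.closing = {}  # (client, server) -> bandwidth to give back

    def on_client_fin(self, client, server, bw):
        # Store the triple on the first FIN seen from the client.
        self.closing.setdefault((client, server), bw)

    def on_server_fin(self, client, server):
        # Release the bandwidth recorded in the triple, then forget it.
        bw = self.closing.pop((client, server), None)
        if bw is not None:
            self.available_bw += bw


CAPACITY = 10
# Steps 1-2: connection is up, 4 units reserved, triple already forgotten.
router = ForgetfulRouter(available_bw=CAPACITY - 4)

router.on_client_fin("A", "B", 4)   # steps 3-4: first FIN stores the triple
router.on_server_fin("A", "B")      # step 5: server FIN releases 4 units
router.on_client_fin("A", "B", 4)   # steps 6-7: replayed FIN stores it again
router.on_server_fin("A", "B")      # assumed reply to the replay: second release
```

After the last step the router believes it has 14 units available although its capacity is 10, which is exactly the error that keeping the triple for the whole conversation, or a third closing message, is meant to prevent.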

5.3 Router Congestion Control protocol Optimization

In this subsection we introduce an optimized protocol based on the considerations in the previous subsection. Our aim is to decrease the time that the triple (client, server, bandwidth) stays in the routers' memory. The triple is stored only while the connection is being set up or closed and not during the data exchange. This can be handled more easily by UniPro: the number of connections that are being opened or closed at the same time can be limited. The fundamental difference from the previous protocol is the addition of the extra message FIN_ACK at the end, which creates a 3-way closing of the connection. Figure 32 shows a scenario where there is enough bandwidth and Figure 33 shows a scenario where there is insufficient bandwidth in at least one of the routers.

[Figure 32: Message sequence when there is sufficient bandwidth. Client, internal routers and server exchange SYN(bw1), ACK(bw1+bw2), ACK, DATA, FIN(bw1+bw2), FIN and FIN_ACK; states shown: Closed, Listen, WaitAck, Connected, WaitFin.]

The triple is stored, deleted and modified in the following way:

• Messages from client to server:

– SYN(bw1): The triple (client, server, 0) is stored if it does not already exist in memory

– FIN(bw1+bw2): If the triple does not exist, it is created with bandwidth bw = bw1 + bw2. If it exists, its bandwidth can be bw = bw1 + bw2, or bw = 0 in case the bandwidth has already been returned

– ACK: If the triple exists, it is deleted (footnote 6)

– FIN_ACK: If the triple exists, it is deleted (footnote 7)

• Messages from server to client:

– ACK(bw1+bw2): Only if the triple exists in the form (client, server, bw = 0) and there is enough available bandwidth, the value bw in the triple becomes bw = bw1 + bw2 and the same value is reserved from the available bandwidth. If the triple exists with bw = 0 and there is not sufficient bandwidth, the triple stays as it is and an ERR message is sent to the next router or client. If the router does not have the triple stored, it simply forwards the message

– FIN: If the triple exists with bandwidth bw ≥ 0, the available bandwidth is increased by bw and the bandwidth in the triple becomes bw = 0. Otherwise, the message is just forwarded

– NAC: The message is just forwarded

– ERR: The message is just forwarded
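The rules above can be read as a small state machine per router. The following Python sketch is one possible rendering of them, assuming a single dictionary of triples per router and ignoring message loss and timers; `OptimizedRouter` and its method names are hypothetical and do not appear in the Promela model.

```python
class OptimizedRouter:
    """Router of the optimized protocol: the triple exists only while the
    connection is being opened or closed."""

    def __init__(self, available_bw):
        self.available_bw = available_bw
        self.triples = {}  # (client, server) -> bandwidth held in the triple

    def from_client(self, msg, client, server, bw=0):
        key = (client, server)
        if msg == "SYN":
            self.triples.setdefault(key, 0)    # store (client, server, 0)
        elif msg == "FIN":
            self.triples.setdefault(key, bw)   # create with bw1+bw2 if absent
        elif msg in ("ACK", "FIN_ACK"):
            self.triples.pop(key, None)        # delete the triple if present
        return msg                             # always forwarded to the server

    def from_server(self, msg, client, server, bw=0):
        key = (client, server)
        if msg == "ACK" and self.triples.get(key) == 0:
            if self.available_bw < bw:
                return "ERR"                   # triple stays (…, 0)
            self.triples[key] = bw             # reserve exactly once
            self.available_bw -= bw
        elif msg == "FIN" and key in self.triples:
            self.available_bw += self.triples[key]
            self.triples[key] = 0              # bandwidth has been returned
        return msg                             # NAC, ERR and the rest: forward


router = OptimizedRouter(available_bw=10)
router.from_client("SYN", "A", "B")
router.from_server("ACK", "A", "B", bw=4)   # reserve 4 units
router.from_client("ACK", "A", "B")         # triple deleted, data phase starts
bw_during_data = router.available_bw
triples_during_data = dict(router.triples)
router.from_client("FIN", "A", "B", bw=4)   # triple re-created for closing
router.from_client("FIN", "A", "B", bw=4)   # replayed FIN: no change
router.from_server("FIN", "A", "B")         # release 4 units
router.from_server("FIN", "A", "B")         # replayed FIN: triple holds 0
router.from_client("FIN_ACK", "A", "B")     # triple deleted
```

In this run no triple is held during the data phase, and the replayed FIN messages in both directions leave the available bandwidth at its original value.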

[Figure 33: Message sequence when there is NOT sufficient bandwidth. Client, Router #1, Router #2 and server exchange SYN(bw1), ACK(bw1+bw2), ERR, FIN(0) and FIN; states shown: Closed, Listen, WaitAck, WaitFin.]

Footnote 6: No other SYN message will be sent for this session
Footnote 7: No other FIN message will be sent from client to server in this session


In the variation that the figures represent, we add an extra timer at the router's side for the FIN_ACK message. In this way the previous connection management protocol (the algorithm at server and client) does not need to change at all. Another advantage is that the server is not blocked until it receives the FIN_ACK message and it can serve another client. However, in practice it might be easier to move this timeout to the server, because the timer is already implemented at the server's side and the routers then do not have to run a timer for each connection that is in the process of closing. Moreover, in a real implementation, if the timer is placed at the server, another issue has to be taken into consideration: the negative answer NAC of the server (footnote 8) to the SYN message.

As we showed, the connection establishment algorithm is correct only if the client goes through the closing of the connection after it receives a NAC message. In this case the 3-way closing of the connection is unnecessary and impractical: the server sets a timer for it and is blocked while it is busy with another client. The question arises whether the 3rd message, FIN_ACK, is necessary in case of an ERR or NAC. We observed that it is not needed if the message of such a closing differs from the usual FIN and there is a way to distinguish them. We call this extra message FIN_ERR (footnote 9).

• In case of a NAC: When a NAC is returned, none of the routers has subtracted any bandwidth. So, the triple (client, server, bandwidth) has the value bandwidth = 0. Clearly, multiple messages or multiple transactions do not affect the correct available bandwidth value. If an ACK message follows a NAC, then the bandwidth value in the triple changes and the algorithm is still correct for the same reason as in the case of an ERR.

• In case of an ERR: As with the ACK following a NAC, an ERR message can be followed by an ACK. But once the ACK message arrives at the client (meaning all the routers have reserved bandwidth for this connection), it cannot change to ERR. In this case the value of the available bandwidth and the bandwidth in the triple also change. If any of the ERR or NAC messages arrives at the client, the connection is finalized even if the client later receives an ACK. This does not affect the correctness of the protocol: on reception of the FIN_ERR message from the client to the server (footnote 10) the routers do nothing, and the server replies with another FIN_ERR and closes. The FIN_ERR message from the server to the client is what returns the bandwidth to the routers: the triple is deleted and the bandwidth recorded in the triple is released. If the triple is not held in memory, it has already been deleted; no transaction takes place, but the message is forwarded to the next router closer to the client. The ACK message cannot be followed by any error, so if no ERR or NAC arrives at the client, the connection will eventually be set up. We conclude that the 3rd message is not necessary in this case.

The issue with the server answering NAC while it serves another client is solved by the 2-way closing algorithm. Another consideration is whether it is necessary to introduce a new message FIN_ERR or whether we can reuse FIN. In the first case (NAC), when the server receives the FIN to close the connection, it can be in one of two states: (a) still busy with another client or listening, or (b) waiting for the client's second ACK. In state (a), the server and the routers are safe (none of them tried to set up the connection, as the server is the initiator and it is in a safe state). So, the server only sends a FIN and skips blocking and waiting for a FIN_ACK message. In state (b), the server has sent at least one ACK message, so bandwidth may be reserved in some of the routers. In this case the bandwidth in the triple is not zero (bw ≠ 0). In addition, the client is the one who knows that the connection failed, so it sends FIN(0) instead of FIN(bw1+bw2). Consequently, the routers that have subtracted bandwidth add back the correct value after the server's FIN message, and repetitions of the client's FIN(0) message do not affect it. Furthermore, the server, on receiving a FIN(0) message, knows that it may close without blocking. To sum up, the FIN message can be used for the 2-way closing too. For a successful connection setup the 3-way closing is needed, while for a connection failure the 2-way one is adequate.

Footnote 8: The server is connected to another client
Footnote 9: Note that the bandwidth value is not carried with the FIN_ERR message
Footnote 10: Notice that this message won't create any new triple



In the previous subsection we informally proved the correctness of the algorithm in the following cases:

1. The extra timer for the FIN_ACK message can be implemented at the server or at the router;

2. The connection can still be set up after a NAC or ERR message;

3. If the connection fails, i.e., an ERR or NAC message is received by the client, the connection may be closed using the 2-way closing method.

In the verification discussed in this section, we keep the extra timer at the server because it produces less code than implementing it at the router. Our model includes the second property, where the connection insists on getting established. So, if a router has sent an ERR message (instead of forwarding an ACK message) because of insufficient bandwidth and it receives another ACK when it is able to serve the connection, it forwards the ACK. In addition, for simplicity we use only the 3-way closing of the connection, also when the connection fails to set up. Otherwise, an extra message type would have to be introduced, or another variable would have to be included in the message indicating the bandwidth 0 or 1 (the second option works in reality; see previous subsection).

[Figure 34: Model of the network in Promela. Client, Router 1, Router 2, Router 3 and Server are connected in a line by the channels in[0] to in[3] (client to server) and out[0] to out[3] (server to client).]

Figure 34 presents the network as modeled in our Promela implementation. It consists of two end nodes, a client and a server, and three routers in the middle. We keep the lowest number of end nodes because in this section we concentrate on the router's protocol and not on the connection management protocol (which has been verified in the previous section). We use three routers because in this way we cover all kinds of routers that are possible between the client and the server: the one that connects the client to another router (router 1), the one that is only connected to other routers (router 2) and the one that connects the server to the rest of the network (router 3). A router which directly connects the client and the server can be obtained from this network by removing two of the routers.

Since network links are bidirectional, two channels are needed in Promela to create traffic in both directions. This does not affect the verification, as both channels go through exactly the same routers. We call the path from client to server in and the path from server to client out. So, each router forwards information from the client to the server (left to right) through the in path and information from the server to the client (right to left) through the out path. In this way, the routers always know the direction of the packets.


The server accepts a message from the in[3] channel (in[3]?t_type) and it sends messages to the out[3] channel (out[3]!t_type), where t_type is the type of the message. The client receives messages from the out[0] channel (out[0]?t_type) and sends messages to the in[0] channel (in[0]!t_type), respectively.
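The channel layout can be made concrete with a small simulation. This is a minimal Python sketch, assuming unbounded queues and routers that never lose or rewrite messages; `in_ch`, `out_ch` and `router_step` are hypothetical names that mirror the in and out channel arrays of the model.

```python
from collections import deque

NUM_ROUTERS = 3

# in_ch[i] carries client-to-server traffic, out_ch[i] server-to-client.
# Index 0 is the client side, index NUM_ROUTERS is the server side.
in_ch = [deque() for _ in range(NUM_ROUTERS + 1)]
out_ch = [deque() for _ in range(NUM_ROUTERS + 1)]


def router_step(i):
    """Router i forwards at most one pending message per direction."""
    if in_ch[i]:
        in_ch[i + 1].append(in_ch[i].popleft())      # left to right
    if out_ch[i + 1]:
        out_ch[i].append(out_ch[i + 1].popleft())    # right to left


# The client writes to in[0]; the routers pass the message on to in[3].
in_ch[0].append("CO_SYN")
for i in range(NUM_ROUTERS):
    router_step(i)
at_server = in_ch[NUM_ROUTERS].popleft()

# The server writes to out[3]; the routers pass the reply back to out[0].
out_ch[NUM_ROUTERS].append("CO_ACK")
for i in reversed(range(NUM_ROUTERS)):
    router_step(i)
at_client = out_ch[0].popleft()
```

Router i reads in[i] and out[i+1] and writes in[i+1] and out[i], so the channel a message arrives on tells the router its direction without any extra field in the message.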

As expected, two other message types are added. The message type now takes one of the following values: mtype = {CO_SYN, CO_FIN, CO_ACK, CO_NAC, ERR, CO_FIN_ACK}. The message ERR is added as a signal from the routers that they cannot serve the link, whereas the CO_FIN_ACK message is used as the 3rd message of closing the connection. In addition, note that the only parameter needed in the message is the type; the source and the destination can be omitted in this study as we only have one server and one client. Furthermore, we always keep the connection bandwidth requirement at one unit, and thus the bandwidth value bw1 + bw2 is not carried within the message.

Router

  1 proctype router(byte my_id){
  2   mtype t_type;
  3   bit zero_bw=true; /* if the bandwidth in the triple is zero */
  4
  5   /* main loop */
  6 NetworkLoop:
  7   do
  8   :: true ->
  9     /* check 2 cases separately (client to server and server to client) */
 10     if
 11     :: in[my_id]?t_type -> /* client to server */
 12       if
 13       :: (t_type==CO_SYN) ->
 14         /* store triple if not already stored */
 15         open_con[my_id]=true;
 16         close_con[my_id]=false;
 17       :: (t_type==CO_ACK) ->
 18         /* remove the triple */
 19         open_con[my_id]=false;
 20
 21       /* FIN because the connection was unable to establish */
 22       /* or ACK msg was lost & no real data msgs were sent */
 23       :: (t_type==CO_FIN && open_con[my_id]==true) ->
 24         skip;
 25       /* FIN after a successful connection setup */
 26       :: (t_type==CO_FIN && open_con[my_id]==false) ->
 27         if
 28         /* first FIN msg received: enter the
 29            triple with non-zero bandwidth */
 30         :: (close_con[my_id]==false) ->
 31           close_con[my_id]=true;
 32           zero_bw=false;
 33         :: else -> skip;
 34         fi;
 35
 36       :: (t_type==CO_FIN_ACK) ->
 37         /* init all variables - remove triple */
 38         /* next SYN will be from the same direction */
 39         open_con[my_id] = false;
 40         close_con[my_id] = false;
 41       :: else ->
 42         assert(false);
 43       fi;
 44
 45       /* forward message or lose it */
 46       if
 47       :: in[my_id+1]!t_type;
 48       :: skip
 49       fi;
 50
 51     :: out[my_id+1]?t_type -> /* server to client */
 52       if
 53       :: (t_type==CO_ACK && open_con[my_id]==true) ->
 54         /* check if the bw is already non-zero in the triple */
 55         if
 56         :: (zero_bw==false) -> skip; /* just forward the msg */
 57         :: else ->
 58           /* check if there is enough bandwidth */
 59           if
 60           :: true -> /* yes */
 61             /* reduce bandwidth and add the value in the triple */
 62             available_bw[my_id] = available_bw[my_id] - 1;
 63             zero_bw = false;
 64           :: true -> /* no */
 65             if
 66             :: out[my_id]!ERR /* send ERR msg instead */
 67             :: skip /* lose message */
 68             fi;
 69             goto NetworkLoop;
 70           fi;
 71         fi;
 72       :: (t_type==CO_ACK && open_con[my_id]==false) ->
 73         skip /* still has to send the msg */
 74       :: (t_type==CO_FIN) ->
 75         if
 76         /* the triple is stored */
 77         :: (open_con[my_id]==true || close_con[my_id]==true) ->
 78           if
 79           /* if the bandwidth in the triple is zero -> skip */
 80           :: zero_bw==true ->
 81             skip;
 82           :: else ->
 83             /* increase available bandwidth */
 84             available_bw[my_id] = available_bw[my_id] + 1;
 85             /* make bandwidth in the triple zero */
 86             zero_bw=true;
 87           fi;
 88         /* the triple is not stored */
 89         :: else -> skip;
 90         fi;
 91
 92       :: (t_type==CO_NAC) -> skip; /* just forward it */
 93       :: (t_type==ERR) -> skip; /* just forward it */
 94       :: else ->
 95         assert(false);
 96       fi;
 97
 98       /* forward the message or lose it */
 99       if
100       :: out[my_id]!t_type;
101       :: skip
102       fi;
103     fi;
104   od;
105 }

The code above shows the router's algorithm for the verification. We have two global arrays of type bit and one of type byte: open_con, which is true while the router has the triple (client, server, bandwidth) stored during connection establishment; close_con, which is true while the triple is stored for the connection closing; and available_bw, which holds the available bandwidth of each router. We keep those variables global in order to check the flow and the correctness of our algorithm. In addition, we use two local variables: t_type, which stores the type of the message the router has received, and zero_bw, which is true when the bandwidth in the triple is zero and false otherwise.

When a router receives a message, it determines whether the message goes from the client to the server (left to right) or from the server to the client (right to left). Depending on the direction and the type of the message, the router takes the appropriate action. In the direction from the client to the server, the router may receive a CO_SYN message, which stores the triple if it does not already exist. After receiving the CO_ACK message, the router resets the open_con variable, as the triple is to be deleted. Similarly, with the CO_FIN message the triple is stored in memory until the CO_FIN_ACK message is received: if the triple is already there, no action takes place; otherwise the triple is stored with a bandwidth value different from 0. When a router receives the CO_FIN_ACK message, it removes the triple from its storage. We use a different variable for closing the connection than for opening it, which helps to verify the protocol's properties. In all these cases the router forwards the message to the next channel on the right.

When the router receives a message from the server to the client, it checks which case applies according to the type. For the CO_ACK message, if the triple is still stored in memory, it checks whether there is available bandwidth or not. This is chosen nondeterministically for the verification (see lines 57-70 in the router process). Similarly, after receiving a CO_FIN message, the router checks whether the triple is there and whether the bandwidth in the triple is not zero (bw ≠ 0). That implies this is the first time that the router receives this message for this session. So, it increases the available bandwidth and changes the bandwidth in the triple to zero. For the remaining messages, CO_NAC, ERR and CO_ACK when the triple has already been removed from memory, the router just forwards the message to the next channel on the left. Note that even if the triple has been removed, the router still needs to forward the message, because one of the components might not have received the message and the server sends a repetition of the message after a timeout.
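The effect of leaving the bandwidth check open can be mimicked by enumerating both branches for every router. The sketch below is a Python illustration only, under the simplifying assumptions that no message is lost and that each router starts with one free unit; `deliver_ack` is a hypothetical helper, not part of the Promela model.

```python
from itertools import product

NUM_ROUTERS = 3


def deliver_ack(has_bandwidth):
    """Pass one CO_ACK from the server to the client.

    has_bandwidth[i] is the branch taken by router i: True for the 'yes'
    branch, False for the 'no' branch. Returns the message that reaches
    the client and the available bandwidth of each router afterwards.
    """
    available_bw = [1] * NUM_ROUTERS
    # The ACK meets the router next to the server first.
    for router in reversed(range(NUM_ROUTERS)):
        if not has_bandwidth[router]:
            # ERR replaces the ACK; later routers only forward it.
            return "ERR", available_bw
        available_bw[router] -= 1
    return "CO_ACK", available_bw


outcomes = {}
for choice in product([True, False], repeat=NUM_ROUTERS):
    message, available_bw = deliver_ack(choice)
    outcomes[choice] = message
    if message == "CO_ACK":
        # Same property the client asserts on entering C_Connected.
        assert available_bw == [0, 0, 0]
```

Of the eight combinations only the one in which every router takes the 'yes' branch delivers a CO_ACK to the client, and in that case every router has made its reservation, which is the condition checked by the client's assertion in the connected state.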


Client

The client process has not changed much from the connection management protocol. We simplify the process by removing code that is no longer necessary because the number of nodes is reduced from 4 to 2. Furthermore, we take care of the extra CO_FIN_ACK message: in contrast with the previous client process, here the client answers the server's CO_FIN message. We also add the reception of an ERR message, to which the client reacts in the same way as to a CO_NAC message.

 1 proctype client(){
 2   mtype t_type;
 3
 4 C_Closed:
 5   if
 6   :: in[CLIENT_CHANNELS]!CO_SYN -> /* sends a new request */
 7     goto C_WaitSynAck;
 8   :: out[CLIENT_CHANNELS]?t_type ->
 9     assert(t_type == CO_FIN); /* server finishing an old connection */
10     /* won't reply because of one server - it will send SYN instead */
11     goto C_Closed;
12   fi;
13
14 C_WaitSynAck:
15   if
16   :: out[CLIENT_CHANNELS]?t_type; /* receives a msg */
17     if
18     :: t_type == CO_ACK -> /* expected msg */
19       in[CLIENT_CHANNELS]!CO_ACK;
20       goto C_Connected;
21     :: (t_type == CO_NAC || t_type == ERR) -> /* the connection fails */
22       in[CLIENT_CHANNELS]!CO_FIN; /* finalize connection */
23       goto C_WaitFinAck;
24     :: t_type == CO_FIN -> /* server finishing an old connection */
25       /* don't reply if it is my server */
26       goto C_WaitSynAck;
27     :: else ->
28       assert(false);
29     fi;
30
31   :: in[CLIENT_CHANNELS]!CO_SYN; /* TIMEOUT - resend msg */
32     goto C_WaitSynAck;
33   fi;
34
35 C_Connected:
36   /* check that the reservation is done in all the routers */
37   assert(available_bw[0]==0 && available_bw[1]==0 && available_bw[2] == 0);
38   if
39   :: out[CLIENT_CHANNELS]?t_type -> /* receives msg */
40     assert(t_type == CO_ACK); /* repetition of ACK msg */
41     in[CLIENT_CHANNELS]!CO_ACK;
42     goto C_Connected;
43   :: in[CLIENT_CHANNELS]!CO_FIN -> /* finalizes connection */
44     goto C_WaitFinAck;
45   fi;
46
47 C_WaitFinAck:
48   if
49   :: out[CLIENT_CHANNELS]?t_type -> /* receives a msg */
50     if
51     :: t_type == CO_ACK -> /* repetition of an ACK msg */
52       in[CLIENT_CHANNELS]!CO_FIN;
53       goto C_WaitFinAck;
54     :: t_type == CO_FIN -> /* receives FIN */
55       in[CLIENT_CHANNELS]!CO_FIN_ACK;
56       assert(available_bw[0]==1 && available_bw[1]==1
57         && available_bw[2] == 1);
58       goto C_Closed;
59
60     /* repetition of ERR or NAC msg */
61     :: ((t_type == ERR) || (t_type == CO_NAC)) ->
62       in[CLIENT_CHANNELS]!CO_FIN;
63       goto C_WaitFinAck;
64     :: else ->
65       assert(false);
66     fi;
67   :: in[CLIENT_CHANNELS]!CO_FIN -> /* TIMEOUT IN REALITY */
68     goto C_WaitFinAck;
69   fi;
70 }

Server

The server process is also similar to the one in the original connection management protocol. The fundamental differences are the new server state, S_WaitFinAck, and the reception of the CO_FIN_ACK message. In addition, we create a competitive environment for the single client (as in the case with multiple clients), so that the case of a NAC message is also covered (see lines 12-20 in the server's process).

 1 proctype server(){
 2   mtype t_type;
 3
 4 S_Listen:
 5   in[SERVER_CHANNELS]?t_type; /* the server only receives msg */
 6   if
 7   :: t_type == CO_FIN_ACK -> /* repetition of FIN ACK */
 8     goto S_Listen;
 9   :: t_type == CO_FIN -> /* when NAC was sent a FIN can be received */
10     out[SERVER_CHANNELS]!CO_FIN;
11     goto S_WaitFinAck;
12   :: t_type == CO_SYN -> /* if a new connection is requested */
13     /* random busy - non busy technique */
14     /* create an environment with more than 2 nodes */
15     if
16     :: out[SERVER_CHANNELS]!CO_ACK; /* can be ACKed */
17       goto S_WaitSynAck;
18     :: out[SERVER_CHANNELS]!CO_NAC -> /* or NACed */
19       goto S_Listen;
20     fi;
21   :: else ->
22     assert(false);
23   fi;
24
25 S_WaitSynAck:
26   if
27   :: in[SERVER_CHANNELS]?t_type -> /* receives msg */
28     if
29     :: t_type == CO_ACK -> /* expected msg */
30       goto S_Connected;
31     :: t_type == CO_FIN -> /* caused because of previous NAC or ERR */
32       out[SERVER_CHANNELS]!CO_FIN;
33       goto S_WaitFinAck;
34     :: t_type == CO_SYN -> /* repetition of SYN msg */
35       out[SERVER_CHANNELS]!CO_ACK;
36       goto S_WaitSynAck;
37     :: else ->
38       assert(false);
39     fi;
40   :: out[SERVER_CHANNELS]!CO_ACK -> /* TIMEOUT - repeats ACK */
41     goto S_WaitSynAck;
42   fi;
43
44 S_Connected:
45   /* all the triples are erased from the routers */
46   assert(open_con[0]==false && open_con[1]==false && open_con[2]==false);
47   assert(available_bw[0]==0 && available_bw[1]==0 && available_bw[2] == 0);
48
49   /* data messages are omitted */
50   in[SERVER_CHANNELS]?t_type; /* only receiving msg */
51   if
52   :: t_type == CO_FIN -> /* client asks to close the connection */
53     out[SERVER_CHANNELS]!CO_FIN; /* server accepts */
54     goto S_WaitFinAck;
55   :: t_type == CO_ACK -> /* repetition of ACK msg */
56     goto S_Connected;
57   :: else ->
58     assert(false);
59   fi;
60
61 S_WaitFinAck:
62   if
63   :: in[SERVER_CHANNELS]?t_type -> /* receives a message */
64     if
65     :: t_type == CO_FIN -> /* repetition of FIN msg */
66       out[SERVER_CHANNELS]!CO_FIN;
67       goto S_WaitFinAck;
68     :: (t_type == CO_FIN_ACK || t_type == CO_SYN) -> /* client is closed */
69       goto S_Listen;
70     :: else ->
71       assert(false);
72     fi;
73   :: true -> /* timeout */
74     out[SERVER_CHANNELS]!CO_FIN;
75     goto S_WaitFinAck;
76   fi;
77 }



In this section, we enumerate the correctness properties of the protocol and show how we verify them in Spin. In addition, we discuss the abstractness of the model.

5.5.1 Correctness Properties & their Implementation

Within this implementation, we check four different properties. Their definitions and the way we check them are explained below.

Flow and correct reception of messages

One of the main reasons to use Spin is to understand the flow of the messages and, consequently, the flow of the protocol itself. We ensure that we consider all messages that can be received in each state of each component and handle their reception. To achieve this, we include assertions in each state (Listen, Closed, etc.) of each component (router, client, server), so that any reception of an unexpected message is flagged: in that case Spin reports an error. This method helped us discover cases we could not think of at first glance. Some examples of this are:

• Line 41 in router’s process in Section 5.4

• Line 22 in client’s process in Section 5.4

• Line 28 in server’s process in Section 5.4
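The idea of rejecting every message that a state does not expect can be sketched as a transition table. The Python fragment below is an illustration only: the table is read off the receive branches of the server listing in Section 5.4, the names `SERVER_TRANSITIONS`, `receive` and `run` are hypothetical, and for CO_SYN in S_Listen only the ACK branch is modelled (an assumption, since the Promela model chooses between ACK and NAC non-deterministically).

```python
# (state, received message) -> next state, taken from the server listing.
SERVER_TRANSITIONS = {
    ("S_Listen", "CO_FIN_ACK"): "S_Listen",
    ("S_Listen", "CO_FIN"): "S_WaitFinAck",
    ("S_Listen", "CO_SYN"): "S_WaitSynAck",   # assumption: ACK branch only
    ("S_WaitSynAck", "CO_ACK"): "S_Connected",
    ("S_WaitSynAck", "CO_FIN"): "S_WaitFinAck",
    ("S_WaitSynAck", "CO_SYN"): "S_WaitSynAck",
    ("S_Connected", "CO_FIN"): "S_WaitFinAck",
    ("S_Connected", "CO_ACK"): "S_Connected",
    ("S_WaitFinAck", "CO_FIN"): "S_WaitFinAck",
    ("S_WaitFinAck", "CO_FIN_ACK"): "S_Listen",
    ("S_WaitFinAck", "CO_SYN"): "S_Listen",
}


def receive(state, msg):
    """Return the next state; an unlisted pair plays the role of `else -> assert(false)`."""
    if (state, msg) not in SERVER_TRANSITIONS:
        raise AssertionError(f"unexpected {msg} in state {state}")
    return SERVER_TRANSITIONS[(state, msg)]


def run(messages, state="S_Listen"):
    """Feed a sequence of received messages to the server, starting in `state`."""
    for msg in messages:
        state = receive(state, msg)
    return state


# A normal open/close cycle brings the server back to S_Listen.
final_state = run(["CO_SYN", "CO_ACK", "CO_FIN", "CO_FIN_ACK"])
print(final_state)

# A message that the state does not expect is reported as an error.
try:
    receive("S_Connected", "CO_SYN")
    unexpected_detected = False
except AssertionError:
    unexpected_detected = True
print(unexpected_detected)
```

Unlike Spin, this sketch only checks the sequences it is given; the model checker explores all interleavings and therefore finds the unexpected receptions automatically.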

Triple management

For the router management protocol, much of the discussion and many of the considerations concerned the triple (client, server, bandwidth). We ensure that all routers add and remove the triples correctly. Because of possible repeated or lost messages and the different, non-synchronized states of the nodes, this was not trivial. For this purpose, we created two global variables which indicate whether a triple is stored or not: open_con and close_con. The former indicates whether the triple is stored because the connection is being set up, and the latter indicates whether the triple is stored because the connection is closing. Assertions in the code of the server ensure the deletion of the triple. An example is line 46 in the server's process in Section 5.4. When the server is in state Connected, it has received an ACK message, so all triples stored for opening the connection must have been deleted from the routers. However, this is not the case for the predicate close_con: a client may send a FIN message that stores the triple in the routers again while the server is still connected. Lastly, triples are important because they are created to prevent any double reservation or return of bandwidth. The correctness of the triple management is therefore indirectly covered by the next two properties.

Bandwidth reservation

With this property, we ensure that bandwidth is reserved in all routers during connection setup and released when the connection is closed. The predicate used for this property is the byte array available_bw. Its values are set in the router's process and checked in the client's and server's processes. The available bandwidth of each router changes depending on the status of the triple and the type of the message being received. We check this property by using assertions in three places in the client's and server's code:


• Line 37 in client's code in Section 5.4: Here the client is in the "Connected" state, so the reservation has been made in all routers and the value of the available bandwidth is 0;

• Lines 56-57 in client's code in Section 5.4: Here the client is in the "WaitFinAck" state and has received a FIN message, so the bandwidth has been released in all routers and the value of the available bandwidth is 1;

• Line 47 in server's code in Section 5.4: Here the server is in the "Connected" state, so the reservation has been made in all routers and the value of the available bandwidth is 0.

Unique transactions of bandwidth reservation and return

 1 /* define correct available bandwidth values for each router */
 2 #define r1 (available_bw[0]==0 || available_bw[0]==1)
 3 #define r2 (available_bw[1]==0 || available_bw[1]==1)
 4 #define r3 (available_bw[2]==0 || available_bw[2]==1)
 5
 6 never {
 7 T0_init:
 8   if
 9   /* ensure in each execution that bw has only correct values */
10   :: (r1 && r2 && r3) -> goto T0_init;
11
12   /* if one of the routers does not
13      acceptance cycle occur -> show error */
14   :: else -> goto accept_all;
15   fi;
16 accept_all:
17   skip
18 }

Figure 35: Never claim for the verification

With this property we prove that there are no duplicated additions to or subtractions from the available bandwidth. The available bandwidth values are all initialized to 1. In our model there are only two nodes, so only one connection can be set up at any time. Consequently, the available bandwidth can have only two values: 0 when this connection is open and 1 when the connection is closed. This property must hold in every execution of the algorithm. A never claim is suitable for this purpose (see Figure 35). We define the correct status for each of the 3 routers: r1, r2 and r3. In each step of an execution, line 10 of the never claim is checked. If at any step one of the routers has an available bandwidth value different from 0 or 1, the never claim jumps to the accept_all label and terminates. Spin treats the completed claim as a violation, reported in the same way as an acceptance cycle, and produces an error.
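The invariant expressed by the never claim can be pictured as a monitor that inspects every state of one execution. The Python sketch below is an illustration under stated assumptions, not a replacement for Spin: the function name `first_violation` and both traces are hypothetical, and a single trace is checked where Spin checks all reachable states.

```python
def first_violation(trace, allowed=(0, 1)):
    """Return the index of the first state in which some router's available
    bandwidth is outside `allowed`, or None if the invariant always holds."""
    for step, available_bw in enumerate(trace):
        if not all(bw in allowed for bw in available_bw):
            return step
    return None


# Hypothetical trace: three routers reserve one after another, then release.
good_trace = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0),
              (0, 0, 1), (0, 1, 1), (1, 1, 1)]

# Hypothetical faulty trace: the third router returns bandwidth twice.
bad_trace = [(1, 1, 1), (1, 1, 0), (1, 1, 1), (1, 1, 2)]

print(first_violation(good_trace))
print(first_violation(bad_trace))
```

The first call finds no violation; the second reports step 3, the point at which the never claim of Figure 35 would leave T0_init for accept_all.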


5.5.2 Abstraction

A significant consideration in the verification phase is the choice of the level of abstraction. The abstraction in our model reduces the verification complexity while aiming not to prevent the protocol's properties from being fully verified. The number of processes, the number of channels and their specifications, and the number of execution paths within the processes all play an important role in the verification complexity. For the router management protocol, we consider and analyze the following parameters.

• Number of routers: The purpose of this implementation is to model the router management protocol in combination with the connection management protocol. The connection management protocol was already fully verified in the previous section, so the routers and their configuration are the main focus of this section. We therefore use three routers in the middle, between the client and the server, so that all different positions of a router (next to a client or a server, between other routers) are covered. In addition, with more than 2 routers, the case in which one router has sufficient bandwidth and another does not is also tested.

• Number of nodes: We use only one server and one client (two nodes in total). The reason is that we already analyzed the client and server behavior, with all their states, in the previous section. We use the minimum number of nodes that can create a connection, but we implement them in such a way that an environment with multiple clients and servers is covered; e.g., the server can still send a NAC even if there is no other client. Moreover, the routers also send an ERR to cover cases in which the bandwidth is insufficient. There is no dead code, and all the states and cases found in the previous section are still in use, together with the aforementioned extra messages.

• Channels and channel capacity: The channel capacity indicates the number of messages that a channel can hold (the router buffer in practice). The lower it is, the less time and memory are needed for verification. We set it to 1 because that is enough to cover all the cases. There are 8 channels in total, which is the minimum number of channels needed to model our network. Each one has a capacity of one message, which implies that there can be 4 messages on hold from client to server and another 4 from server to client. As explained in the previous section, those are enough to cover the cases where the client's and server's messages cross. Note that with a channel capacity of zero, we have rendezvous communication: if a router receives a message, it has to process it immediately and send it to the next router, which does the same, until the message arrives at the end node. This reduces to the scenario in which the server does not send anything until the message from the client arrives, or the other way around. When one end sends a message, the other is blocked waiting for the message to arrive. If it does not wait, at least one of the messages will get lost when it meets the other message in a router, so at most one message can proceed. So, rendezvous communication does not cover the scenario where messages cross.

• 2-way & 3-way closing: The choice of using only the 3-way closing saves a lot of verification overhead. To support both, another message type or an extra one-bit variable would have to be added to each message (messages in each channel have exactly the same template). For the NAC message, the 2-way closing is indirectly used, because the server knows it does not have to wait for the FIN_ACK to arrive. In the case of ERR, 3-way closing is used.
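The channel counts quoted in the list above follow from the topology client, three routers, server. The short Python sketch below redoes that arithmetic; the function name `channel_layout` is hypothetical and the assumption is one channel per direction on every link.

```python
def channel_layout(routers, capacity):
    """Channels and buffered messages for a chain client - routers - server."""
    links = routers + 1                          # 3 routers give 4 links
    channels = 2 * links                         # one channel per direction
    in_flight_per_direction = links * capacity   # messages that can be on hold
    return channels, in_flight_per_direction


channels, in_flight = channel_layout(routers=3, capacity=1)
print(channels, in_flight)

# With capacity 0 (rendezvous) no message can be on hold in a channel.
print(channel_layout(routers=3, capacity=0))
```

With three routers and capacity 1 this gives 8 channels and 4 messages on hold per direction, matching the figures in the text.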


6 Run Time

| Tool | Version | Compute Nodes # |
|---|---|---|
| Spin | 5.1.7 | 1 |
| DiVinE - reachability | 1.0 build 9 | multiple |
| DiVinE - distr map | 1.1 build 5 | multiple |
| DiVinE - OWCTY | 1.0 build 14 | multiple |

Table 3: Tool versions which have been used

In this section, we present some results of the verifications of the three main protocols separately. We use the first protocol, USW, to learn and compare Spin's options, and to compare DiVinE's performance and its different algorithms. In Spin, we compare the verifications with different compression algorithms, while in DiVinE we compare the simulations on different numbers of DAS nodes (see footnote 11), [1, 2, 4, 8, 16, 32], and with three different distributed algorithms: one reachability algorithm (reachability) and two cycle detection algorithms (OWCTY and MAP). The first one searches for assertion violations and the other two for accepting cycles (whether a never claim is accepted). Table 3 shows all the tools, their versions and the algorithms used. An example of a script that we use to run the Promela code on DAS3 is shown in Appendix A.1. Before running the verification, the Promela code is compiled. Then, we run the verification on 2 compute nodes (-np2) using 2 processors on each (-1). We first use the reachability algorithm and afterwards the OWCTY one.

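| | USW cc=2 | USW cc=3 | USW cc=4 | CM cc=1 | CM cc=2 | RM cc=1 |
|---|---|---|---|---|---|---|
| Assertions | √ | √ | √ | √ | √ | √ |
| Never Claims | √ | √ | √ | | | √ |
| Spin | √ | | | √ | | |
| DiVinE | √ | √ | ≥ 32 | √ | ≥ 4 | ≥ 16 |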

Table 4: Protocols’ Overview

Table 4 presents all the protocols, USW, CM and RM, tested with different values of channel capacity (cc). For USW and RM, we use assertions and never claims to verify the protocols, but for CM we only use assertions. The last two rows of the table show the cases in which the verification completed. When the √ is absent, the verification could not finish because of memory limitations. In the case of DiVinE, a number means that the execution could not finish within the time limit when fewer compute nodes than this number were used. During comparisons, we concentrate on the execution time of the full verification and the total memory needed to store the different states.

11 Each node consists of at least two processors, and we use both of them in all the experiments.

No Spin-DiVinE comparisons are shown for the last two protocols, connection and router management. The main reasons are:

• For small problems, the results are similar to those of the USW protocol.

• Assertion violations are not fully supported in DiVinE (they are detected, but support for error trails is not yet in the default version on DAS-3). In the 2nd protocol, we only use assertions and our interest is in assertion violations. As Spin finished a full verification of it with channel capacity equal to 1, use of DiVinE was not necessary. In addition, we test the protocol on DAS using DiVinE with channel capacity equal to 2.

• The router management protocol with channel capacity set to 1 could not be verified in Spin on a single machine, but only in DiVinE by using more than 8 DAS3 nodes.

| | OS | Nodes | Cores | Speed | Network |
|---|---|---|---|---|---|
| DAS3-VU | Scientific Linux | | 2 | 2.4GHz | Myri10G & GbE |
| ST-Ericsson | Linux | jupiter7 | 4 | 3.16GHz | |

Table 5: Machines where simulations were executed

Table 5 shows the specifications of the machines on which we run our verifications. The first row gives the specification of the nodes of the cluster at Vrije Universiteit (VU) where DiVinE is used, and the second row represents the single machine on which we run the simulations in Spin. All values presented in the tables and plots are the average of 3 different executions. Standard deviation and standard error are not presented, as in all cases the 3 executions had negligible numerical differences. Values are rounded to at most 2 decimal places for time and to integers for memory. The number of states is always the same for a given tool and problem, so we only compare the states when we investigate the parameters of a problem.

6.1 USW protocol

For the USW protocol we test two never claims independently: the first never claim checks the 1st (error control) and 2nd (order) correctness properties, and the second one checks the 3rd (unique) property. Both of them are fully verified and, as we will see, there is a negligible difference in states, memory and time. So, in the comparisons we show only the results for the 1st never claim and leave out the 2nd never claim (some more results can be found in the Appendix). The conclusions for the 2nd never claim are the same as for the 1st.

6.1.1 Spin simulation

Spin was only able to verify the protocol with maximum window size N ≤ 3. For a bigger N value, the available memory was insufficient and the full verification could not finish. Trying to verify the protocol with N = 4 by using Spin's compression options did not help either. Bear in mind that increasing the value of N also increases the number of messages that a channel can hold (channel capacity), and that the sequence number of a message can then vary in [0, 1, 2, 3] instead of [0, 1, 2]. Recalling the formula in Section 2.2, this can lead to a steep increase in the number of states, the total memory and the verification time.

| | None | -DCOLLAPSE | -DMA=84 | -DHC | -DBITSTATE |
|---|---|---|---|---|---|
| Memory (MB) | 473 | 470 | 390 | 11 | 2 |
| Time (sec) | 1.43 | 1.27 | 4.23 | 0.62 | 0.86 |
| States | 451692 | 451692 | 451692 | 451692 | 451662 |

Table 6: Spin’s compression algorithms tested for USWcc=2

In Table 6 we show the memory needed to store the states, the execution time and the total number of states for the 4 different compression options and N = 3. The first column shows the results when no extra compression option is used. The second and third columns (-DCOLLAPSE and -DMA=84) are compression algorithms with which a full verification is still performed (the number of states searched is the same in these three cases). As one observes, -DCOLLAPSE is the quickest of these three even though the compression overhead is added, but it makes no big difference in the memory needed compared with the first column. In contrast, the -DMA algorithm saves 17.5% of memory, at the cost of extra computation overhead. Normally, there is a well-known tradeoff between time and memory use. This is not a representative case for the -DCOLLAPSE option, because the problem is small and compression has little need or effect. The fourth and fifth columns (-DHC and -DBITSTATE) represent approximations of the full verification. For our protocol the two approximation algorithms perform well, but with the default random seed the -DBITSTATE option finds fewer distinct states than the correct number. Thus, there is no reason for using them for even greater N values.
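The percentages discussed above can be recomputed directly from Table 6. The Python sketch below does so; the dictionary and function names are hypothetical, while the numbers are copied from the table.

```python
# Values copied from Table 6 (USW, N = 3).
memory_mb = {"none": 473, "-DCOLLAPSE": 470, "-DMA=84": 390,
             "-DHC": 11, "-DBITSTATE": 2}
states = {"none": 451692, "-DCOLLAPSE": 451692, "-DMA=84": 451692,
          "-DHC": 451692, "-DBITSTATE": 451662}


def memory_saving_percent(option):
    """Memory saved by `option` relative to the uncompressed run, in percent."""
    baseline = memory_mb["none"]
    return 100.0 * (baseline - memory_mb[option]) / baseline


for option in ("-DCOLLAPSE", "-DMA=84"):
    print(option, round(memory_saving_percent(option), 1))

# States missed by the bitstate approximation with the default seed.
missed_states = states["none"] - states["-DBITSTATE"]
print(missed_states)
```

This reproduces the 17.5% saving quoted for -DMA=84, shows that -DCOLLAPSE saves well under 1% here, and gives a difference of 30 states for -DBITSTATE.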

From our experience, the total memory is directly influenced by the number of variables, the types of the variables and the complexity of the code. So, we keep the code as simple and small as possible. We avoid "printf" statements during verification and we reduce the number of variables needed. In addition, if the value of a variable is known, it is better written as a macro than recalculated over and over again. This results in a better full verification time.

The output of the verification with the 1st never claim enabled is shown in Appendix B.1.1, and with the 2nd never claim enabled in Appendix B.1.2.

6.1.2 DiVinE

In DiVinE we test the USW protocol's correctness properties by using 3 different algorithms: one reachability algorithm (reachability) and two cycle detection algorithms (OWCTY and MAP). Examples of the output that we get from DiVinE are shown in Appendix B.1.3 for reachability and B.1.4 for accepting cycles. A correct execution is one with no assertion violations for the former and with no accepting cycles for the latter.

Figure 36: Total memory usage when the 1st never claim is enabled and N = 3 (horizontal axis: nodes)

Figure 37: Total memory usage when the 1st never claim is enabled and N = 4 (horizontal axis: nodes)

In contrast with the verification in Spin on a single machine, DiVinE runs on DAS3 and its algorithms aim to relieve the memory limitations met in Spin. However, the verification time can be expected to increase sharply as N increases. We test the USW protocol with N = 3, N = 4 and N = 5 on [1, 2, 4, 8, 16, 32] dual nodes. For the execution with N = 5, we show the result only for 32 nodes, as it takes too long to finish.

Figures 36 and 37 show the total memory usage as the number of nodes increases, for N = 3 and N = 4 respectively. The difference between the 3 algorithms when N = 3 is invisible because the problem is too small. When N = 4, there is a constant difference between the 3 algorithms, with the reachability algorithm using the least memory and the OWCTY algorithm the most. The memory increases gradually as the number of nodes is increased, as a result of the -H20 option used during the verification. This option reserves a specific amount of hash memory in each node, where the states can be stored. Thus, even if it is not needed for small problems, this memory is still counted and shown in the total memory used. That is why the total memory used increases gradually, whatever the size of the problem. The memory needed in DiVinE is slightly less than that in Spin. We saw similar behavior when the 2nd never claim was tested (see Figure 45 in the Appendix). An interesting point is that, for larger problems, the memory increase with the number of nodes is smaller.

Figure 38: Execution time when the 1st never claim is enabled and N = 3 (horizontal axis: nodes)

Figures 38 and 39 show the verification execution time for N = 3 and N = 4. As expected, when N = 3 the problem is relatively small to run on a cluster. The horizontal line represents the time that Spin took to finish the execution on a single machine, and that time is only reached when 8 dual nodes are used. The time becomes even worse when 32 dual nodes are used. On the other hand, the problem with N = 4 behaves very well (see footnote 12). The speedup of the executions as a function of the number of nodes is shown in Figures 40 and 41. The line (x = y) represents the perfect speedup and the other lines the different algorithms. Clearly, the N = 3 problem gives bad speedups, as the distribution overhead is greater than the execution time. However, we have an ideal speedup with N = 4, which makes DiVinE useful. Similarly, the 2nd never claim for N = 3 is tested and gives similar results (see Figures 46 and 47 in the Appendix).

12 Remember that Spin wasn't able to finish the execution.

Figure 39: Execution time when the 1st never claim is enabled and N = 4 (horizontal axis: nodes)

Figure 40: Speedup when the 1st never claim is enabled and N = 3 (horizontal axis: nodes; reference line: ideal)

Figure 41: Speedup when the 1st never claim is enabled and N = 4 (horizontal axis: nodes; reference line: ideal)

| | Spin / 1 proc | N = 3 | N = 4 | N = 5 |
|---|---|---|---|---|
| Memory (MB) | 473 / 370 | 5689 (10^4) | 10973 (10^5) | 108963 (10^6) |
| Time (sec) | 1.43 / 10.9 | 1.85 | 10.8 | 226.5 |
| States | 451692 / 1291063 | 1291063 (10^7) | 3373862 (10^7) | 615232954 (10^9) |

Table 7: Comparisons by increasing the value N for 32 nodes

Finally, we show a table in which we keep the number of nodes fixed at 32 (except for the first column) and vary N over [3, 4, 5] (see Table 7). In the first column, the value before the "/" is the result of the verification in Spin for N = 3, and the value after the "/" is the result in DiVinE using only one node. One can observe that by increasing N, memory and states increase by almost a factor of 10. It is again clear that DiVinE performs well for big problems that could not be verified by Spin, which makes it worth studying. We also highlight the difference in the number of states between Spin and DiVinE on a single machine for N = 3: DiVinE creates many more states but needs less memory to store them than Spin. In conclusion, the number of states increases sharply with the value of N, and DiVinE's states have a different meaning than those in Spin.

6.2 Connection Management protocol

The connection management protocol needs channel capacity cc = 1 in order to cover all the possible cases of our model. The model was fully verified in Spin, and we extend its verification by running a simulation on DAS using DiVinE with channel capacity cc = 2. Results are shown below.


6.2.1 Spin simulation

In our Promela model for the connection management protocol, we keep the number of channels low to reduce the complexity of the verification. This leads to an error in Spin, reported as a timeout: there are some combinations in which all the channels are full and all nodes need to send a message to some other node. As there is no space in any of the channels, no process can proceed, and Spin reports that. One would first think that increasing the channel capacity might help. This fails because, whatever the capacity is, there is always a case in which repeated messages fill the channels. Another idea is to add, wherever a message is sent, a mechanism that can choose between sending and losing the message; when a channel is full, Spin automatically takes the second choice. However, there is a better solution, offered by Spin's option "-m". This option changes the semantics of send events: if a message is sent to a full channel, the message is lost.
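The effect of the lossy send semantics can be pictured with a small bounded queue. The Python sketch below is only an analogy for what the "-m" option does to a full channel; the class name `LossyChannel` and its methods are hypothetical and are not part of Spin.

```python
from collections import deque


class LossyChannel:
    """Bounded channel whose send never blocks: a message sent to a full
    channel is dropped, mirroring the semantics described for "-m"."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.lost = 0

    def send(self, msg):
        if len(self.buffer) >= self.capacity:
            self.lost += 1          # channel full: the message is lost
            return False
        self.buffer.append(msg)
        return True

    def receive(self):
        # Returns None when the channel is empty (a real receiver would wait).
        return self.buffer.popleft() if self.buffer else None


channel = LossyChannel(capacity=1)
# A timeout makes the client repeat CO_SYN before the first copy is consumed.
results = [channel.send("CO_SYN"), channel.send("CO_SYN")]
print(results, channel.lost)
print(channel.receive(), channel.receive())
```

Because the sender is never blocked, the situation in which every process waits on a full channel cannot arise; the protocol's timeouts and repetitions then have to cope with the dropped message.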

| Total States | Time [sec] | Memory [MB] |
|---|---|---|
| 4807137 | 56 | 118 |

Table 8: Simulation in Spin

The verification analyzes 4807137 states in 56 seconds and uses 118 MBytes of storage (see Table 8). The output of a successful full verification with channel capacity cc = 1 is presented in Appendix B.2.1. In addition, we report results for channel capacity cc = 0 (which implies rendezvous communication) in Appendix B.2.2. In this case Spin reports unreached (dead) code. This is an indicator that cc = 0 is not enough to fully verify our protocol.

6.2.2 DiVinE simulation

Even though we informally proved that a channel capacity of one message is enough to verify our protocol, we tried to extend the verification to a channel capacity of two messages. As Spin was unable to finish this verification, we ran it on DAS using DiVinE. Indeed, this verification finishes successfully. We present one of the outputs in Appendix B.2.3.

In the connection management model we only use assertions. Thus, the only DiVinE algorithm we can use is the one for reachability. Each execution has a time limit of 2 hours (7200 sec). We tried the verification with different numbers of nodes, [1, 2, 4, 8, 16, 32], of which only [4, 8, 16, 32] could finish within the time limit. We show the results in the logarithmic plots of Figures 42, 43 and 44. Figure 42 shows the memory usage for the different numbers of nodes. For 1 and 2 nodes this value is set to zero, as DiVinE does not show any information for incomplete verifications. Figure 43 shows the execution time for the different numbers of nodes. The value 7200 is used for the incomplete verifications. It is important to observe that the completed verification takes no more than 100 seconds (84.9 seconds). Although the expected execution time on 2 nodes is 200 seconds, it is not able to finish within 7200 seconds. This overhead is a consequence of the memory overload on each node (causing excessive swapping to disk).

Figure 42: Total memory usage for channel capacity cc = 2

Figure 43: Execution time for channel capacity cc = 2 (horizontal axis: nodes)

The values in the speedup graph of Figure 44 are calculated relative to the execution time on 4 nodes. The straight line shows the ideal speedup for any number of nodes. Our results closely follow this line for 4 or more nodes.
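A speedup relative to a baseline other than a single node can be computed as in the Python sketch below. This is one possible reading of the normalization (speedup on n nodes = time on the baseline divided by time on n nodes), stated here as an assumption. The function names are hypothetical, and the timings are placeholders chosen for illustration, not the measurements behind Figure 44.

```python
def relative_speedup(times, base_nodes):
    """Speedup of each run relative to the run on `base_nodes` nodes."""
    base_time = times[base_nodes]
    return {n: base_time / t for n, t in sorted(times.items())}


def ideal_speedup(node_counts, base_nodes):
    """Ideal (linear) speedup under the same normalization."""
    return {n: n / base_nodes for n in sorted(node_counts)}


# Placeholder timings in seconds (made up; they halve with every doubling).
placeholder_times = {4: 100.0, 8: 50.0, 16: 25.0, 32: 12.5}

measured = relative_speedup(placeholder_times, base_nodes=4)
ideal = ideal_speedup(placeholder_times, base_nodes=4)
for n in measured:
    print(n, measured[n], ideal[n])
```

With perfectly scaling timings the two dictionaries coincide, which is what "following the ideal line" means in such a plot.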

6.3 Router Management protocol

In the router management protocol we need 8 channels, each with a capacity of at least one message. This affects the verification complexity. We tried to fully verify our model in Spin using the highest compression option (-DMA=84) and with the option of losing data when channels are full (-m) switched on. The simulation terminates after only partially verifying the model, as it runs out of memory after depth 88949473 is reached. Up to this depth, no error was found. The output is shown in Appendix B.3.1. DiVinE, however, is successful in this verification: there our model could be fully verified, without further simplification, on 16 or more nodes.

Figure 44: Speedup for channel capacity cc = 2 (horizontal axis: nodes)

6.3.1 DiVinE simulation

We simulate our model, written in Promela, with a channel capacity of one message (cc = 1) on DAS3. As the router management verification code contains assertions and never claims, we use both kinds of DiVinE algorithms, reachability and accepting cycles. We run our simulation on 16 and 32 nodes. Some of the results are shown below.


| Nodes | reachability | OWCTY | MAP |
|---|---|---|---|
| 16 | 43076 | 52493 | 47775 |
| 32 | 45494 | 54144 | 49388 |

Table 9: Memory[MB] usage with channel capacity cc = 1

Table 9 shows the total memory in MBytes needed to store the different states. As in the USW protocol, the OWCTY algorithm needs slightly more memory than the other two algorithms. Another observation is the relatively small difference in MBytes between 16 nodes and 32 nodes. Note that the average memory usage per node when we run the simulation on 16 nodes is almost double that on 32.
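The per-node figures behind this remark can be derived from Table 9. The Python sketch below divides each total by the number of nodes; the variable names are hypothetical and the totals are copied from the table.

```python
# Total memory in MB, copied from Table 9.
total_memory_mb = {
    "reachability": {16: 43076, 32: 45494},
    "OWCTY": {16: 52493, 32: 54144},
    "MAP": {16: 47775, 32: 49388},
}

# Average memory per node for each algorithm and node count.
per_node_mb = {
    alg: {nodes: total / nodes for nodes, total in row.items()}
    for alg, row in total_memory_mb.items()
}

# How much more memory each node holds on 16 nodes than on 32 nodes.
per_node_ratio = {alg: row[16] / row[32] for alg, row in per_node_mb.items()}

for alg in total_memory_mb:
    print(alg, round(per_node_mb[alg][16]), round(per_node_mb[alg][32]),
          round(per_node_ratio[alg], 2))
```

The ratios come out at roughly 1.9 for all three algorithms, consistent with "almost double".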

| Nodes | reachability | OWCTY | MAP |
|---|---|---|---|
| 16 | 249.05 | 466.6 | 312.2 |
| 32 | 125.7 | 135.1 | 130.4 |

Table 10: Execution time[sec] with channel capacity cc = 1

The execution times in the different cases are shown in Table 10. Among the three algorithms, the reachability one is the fastest, and of the two accepting-cycle algorithms MAP is the faster. An interesting point is the relation between the two data rows. One expects the first row to have double the value of the second row. This holds for reachability and, roughly, for the MAP algorithm, but not for OWCTY. The limited amount of memory per node explains this imperfection: some swapping was most probably necessary.
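The relation between the two rows of Table 10 can be made explicit by dividing the 16-node time by the 32-node time. The Python sketch below does this; the variable names are hypothetical and the times are copied from the table.

```python
# Execution times in seconds, copied from Table 10.
time_sec = {
    "reachability": {16: 249.05, 32: 125.7},
    "OWCTY": {16: 466.6, 32: 135.1},
    "MAP": {16: 312.2, 32: 130.4},
}

# Ideal scaling from 16 to 32 nodes would give a ratio of 2.
ratio_16_to_32 = {alg: t[16] / t[32] for alg, t in time_sec.items()}

for alg, ratio in ratio_16_to_32.items():
    print(f"{alg}: {ratio:.2f}")
```

The ratios are about 1.98 for reachability, 2.39 for MAP and 3.45 for OWCTY, so the 16-node OWCTY run is the one that departs most from the expected factor of two.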

An example of the output for the reachability algorithm (assertions) is provided in Appendix B.3.2 and one for accepting cycles (never claims) in Appendix B.3.3.


7 Conclusions & Future Work

In this report, we analyzed and verified different properties and protocols of the UniPro interface technology. UniPro is an abstract interface designed to interconnect multiple components within a mobile device. Its high standards and specifications place UniPro among the top innovations of its class, making it suitable for current and future requirements. As a crucial technology, it requires deep research into and understanding of its main functionalities. Thus, proving the correctness of its protocols is a challenging process.

We studied and proved the correctness of several UniPro protocols in three main sections. The first protocol, USW, is already implemented in UniPro and corresponds to the Data Link layer of the OSI model. As it belongs to the sliding window protocol family, it guarantees reliable data transmission and flow control between two nodes. The last two parts belong to the same suite: connection establishment between the server and the client through multiple routers. In the connection management protocol section we concentrated on the correct flow of the protocol and the transfer of messages between client and server. In the last main section, on the router management protocol, we extended our connection management protocol with bandwidth reservation on the intermediate links before a connection is established.

USW is a variation of the Go-back-N protocol. Our implementation allows up to I unacknowledged data packets to be outstanding at any time, where I is the maximum packet identity (identity 0 is included). In addition to the ACK responses, we include NAC responses, which prevent the network from being flooded by unnecessary packets. Our interest was to prove the correctness and reliability of the protocol by checking that all messages sent are eventually received in order and without duplication. The USW protocol was successfully verified in both Spin (for N = 3 different packet identities) and DiVinE (for N = 3, 4, 5). USW was tested in its simplest form. We believe that in practice there are smarter solutions that would reduce the number of wasted messages even more, e.g., based on the round-trip time. A UniPro simulator can be used to extend the protocol and find an optimized solution based on its specific requirements.
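To illustrate the Go-back-N idea with negative acknowledgements, the following Python sketch simulates a sender and a receiver over a lossy channel. It is a simplified, round-based approximation and not the Promela model verified in this report: the function name, the loss model, the seed, and the assumption that the sender reacts to a NAC immediately are all our own.

```python
import random


def go_back_n(messages, max_id=3, loss=0.3, seed=7):
    """Deliver `messages` in order over a lossy channel (illustrative only).

    Identities run from 0 to max_id, so at most max_id frames may be
    unacknowledged. The receiver answers an in-order frame with ACK and
    any other frame with NAC; both carry the index it expects next.
    """
    rng = random.Random(seed)
    modulus = max_id + 1      # number of distinct identities
    window = max_id           # maximum number of unacknowledged frames
    base = 0                  # sender: index of oldest unacknowledged message
    expected = 0              # receiver: index of next in-order message
    delivered = []
    frames_sent = 0

    while base < len(messages):
        reply = None
        for idx in range(base, min(base + window, len(messages))):
            frames_sent += 1
            if rng.random() < loss:
                continue                      # frame lost on the channel
            if idx % modulus == expected % modulus:
                delivered.append(messages[idx])
                expected += 1
                reply = ("ACK", expected)
            else:
                reply = ("NAC", expected)
                break                         # assumed: sender stops and goes back
        if reply is not None and rng.random() >= loss:
            base = reply[1]                   # cumulative acknowledgement
        # Otherwise the reply was lost or nothing arrived: resend from base.
    return delivered, frames_sent


delivered, frames_sent = go_back_n(list(range(20)))
print(delivered == list(range(20)), frames_sent)
```

Because the window is one smaller than the number of identities, a frame identity can never be confused with that of an older or newer message, which is why comparing identities modulo `max_id + 1` is enough for in-order delivery without duplicates in this sketch.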

The connection management protocol ensures that the client and the server exchange data only after they both know about the connection and, similarly, that the connection is closed only when both of them agree upon it. In addition, no other client or server interferes with an already established connection, even if the environment is erroneous (for instance, with repetition or loss of messages). We show that a 3-way handshake is required and we introduce a simple 2-way closing method. Our CM protocol is simpler than TCP's, as the closing method requires fewer messages and no session or timing variables are needed. A full verification was completed in Spin (for a channel capacity of one message) and a more complex verification (for a channel capacity of two messages) was finished in DiVinE. Further research can be done on our connection management protocol, as well as testing its performance on a real platform.
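The shape of a 3-way opening followed by a 2-way closing can be sketched as two small state machines. The Python code below is a hypothetical illustration over a reliable channel, not our Promela model: the message names CO_REQ and DATA and all state names are assumptions made for this example (only CO_ACK and CO_FIN appear in the verification output of Appendix B.2.2), and loss or repetition of messages is not modeled.

```python
from collections import deque


def client_step(state, msg, outbox, pending):
    """Client transition; returns the new state."""
    kind, _ = msg
    if state == "REQ_SENT" and kind == "CO_ACK":
        outbox.append(("CO_ACK", None))               # third handshake message
        outbox.extend(("DATA", p) for p in pending)   # data follows the handshake
        outbox.append(("CO_FIN", None))               # first closing message
        return "FIN_SENT"
    if state == "FIN_SENT" and kind == "CO_FIN":
        return "CLOSED"                               # server agreed to close
    return state                                      # ignore unexpected messages


def server_step(state, msg, outbox, received):
    """Server transition; returns the new state."""
    kind, payload = msg
    if state == "LISTEN" and kind == "CO_REQ":
        outbox.append(("CO_ACK", None))               # second handshake message
        return "ACK_SENT"
    if state == "ACK_SENT" and kind == "CO_ACK":
        return "ESTABLISHED"
    if state == "ESTABLISHED" and kind == "DATA":
        received.append(payload)
        return state
    if state == "ESTABLISHED" and kind == "CO_FIN":
        outbox.append(("CO_FIN", None))               # second closing message
        return "LISTEN"
    return state                                      # data before setup is dropped


def run_session(payloads):
    """Run one session and return (client state, server state, received data)."""
    to_server, to_client = deque([("CO_REQ", None)]), deque()
    client, server, received = "REQ_SENT", "LISTEN", []
    while to_server or to_client:
        if to_server:
            server = server_step(server, to_server.popleft(), to_client, received)
        if to_client:
            client = client_step(client, to_client.popleft(), to_server, payloads)
    return client, server, received


print(run_session(["a", "b"]))
```

In this sketch the server accepts data only in the ESTABLISHED state, and each side leaves the connection only after it has seen the other side's CO_FIN, which mirrors the two properties stated above.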

The last part of our study concentrated on route reservation before a connection is set up. We extended the idea of the previous protocol by adding several routers and limited bandwidth in each channel. We showed two solutions. In the first one, there is no need to change our connection management protocol, but the routers are required to remember the connection during data exchange. In order to avoid this storage, from connection establishment to closing, we analyzed a second solution. The second solution adds an extra message to our closing message sequence. It outperforms the first one, as more connections can be established in a given amount of time. An intuition was given, and informally proven, about how this extra message can be avoided in some specific cases. By using DiVinE (for a channel capacity of one message), we formally proved that the flow of the algorithm performs well in an imperfect environment and that link reservation and release are successfully done only once per session. As a further step, verification of our intuition can be considered, as well as other variations of our algorithms, e.g., in the case of 3-way closing the server may be the one that first sends the FIN message. The most valuable variations could be analyzed and compared. In addition, more UniPro protocols could be tested and fully verified.
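The "once per session" property of link reservation can be illustrated with a small sketch in the spirit of the first solution, where each link remembers which sessions hold bandwidth on it. The Python code below is a hypothetical illustration only: the class `Link`, the functions `open_route` and `close_route`, and the session identifiers are assumptions for this example and do not reproduce our Promela model.

```python
class Link:
    """One link between routers with a fixed bandwidth capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = {}                  # session id -> reserved bandwidth

    def free(self):
        return self.capacity - sum(self.reserved.values())

    def reserve(self, session, bandwidth):
        if session in self.reserved:
            return True                     # repeated request: reserve only once
        if bandwidth > self.free():
            return False                    # not enough bandwidth left
        self.reserved[session] = bandwidth
        return True

    def release(self, session):
        self.reserved.pop(session, None)    # repeated release is a no-op


def open_route(path, session, bandwidth):
    """Reserve bandwidth on every link of the path, or on none of them."""
    taken = []
    for link in path:
        if not link.reserve(session, bandwidth):
            for done in taken:
                done.release(session)       # roll back the partial reservation
            return False
        taken.append(link)
    return True


def close_route(path, session):
    """Release the session's bandwidth on every link of the path."""
    for link in path:
        link.release(session)


path = [Link(10), Link(5)]
print(open_route(path, "s1", 4), [link.free() for link in path])
```

Keying the reservation by session makes duplicated reservation or release messages harmless, which is the behaviour one wants in an environment that may repeat messages.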

During the verifications we compared the two tools, Spin and DiVinE. Spin is a widely used model checker, but it runs only on a single machine. Because of the time and memory costs of the more complex verifications, we ran them on multiple nodes using the distributed verification environment DiVinE. For small problems (like USW with N = 3), Spin performs better than DiVinE. For problems where the distribution overhead of DiVinE is hidden, DiVinE is better, with a good speedup. Where Spin is not capable of verifying a model, DiVinE is (e.g., the router management protocol). DiVinE is fast as long as the memory needed per node is less than the node's memory. Otherwise, the execution time can be extremely high due to paging overhead (the same is true for Spin, however). DiVinE is still under development. Better feedback to the user should be considered, such as an error trail, execution details when the verification is only partially finished, etc. Moreover, introducing compression algorithms might significantly reduce DiVinE's memory usage, which currently exceeds Spin's.

A Codes

A.1 Script created to run on DAS

#!/bin/sh
export DEBIS_TOOLS=/home0/ogk200/tools

#check arguments
if [ $# -ne 1 ]; then
    echo "Usage: $0 promela_source" >&2
    exit 1
fi

#set my environment
#module load default-myrinet
#module del mpich
#module add openmpi/gcc/64/1.3-snapshot

#compile source
echo Compilation...
$DEBIS_TOOLS/compile_pml.sh $1
#compile-pml $1

#run the prun for reachability
echo Run for reachability...
prun -1 -np 2 -sge-script $DEBIS_TOOLS/sge-script-openmpi \
    /usr/local/package/divine-0.7.2-openmpi-vu/bin/divine.distr_reachability \
    -S -H20 $1.i.pr.b

#run for LTL analysis
echo Run for LTL analysis...
prun -1 -np 2 -sge-script $DEBIS_TOOLS/sge-script-openmpi \
    /usr/local/package/divine-0.7.2-openmpi-vu/bin/divine.owcty \
    -S -H20 $1.i.pr.b

echo End.

B Full Verification Outputs

B.1 USW protocol

B.1.1 1st never claim

(Spin Version 5.1.6 -- 9 May 2008)
        + Partial Order Reduction

Full statespace search for:
        never claim             +
        assertion violations    + (if within scope of claim)
        acceptance cycles       + (fairness disabled)
        invalid end states      - (disabled by never claim)

State-vector 76 byte, depth reached 9972, errors: 0
   167820 states, stored
   283872 states, matched
   451692 transitions (= stored+matched)
   234634 atomic steps
hash conflicts: 29430914 (resolved)

Stats on memory usage (in Megabytes):
   16.645   equivalent memory usage for states (stored*(State-vector + overhead))
   11.807   actual memory usage for states (compression: 70.93%)
            state-vector as stored = 46 byte + 28 byte overhead
    4.000   memory used for hash table (-w19)
  457.764   memory used for DFS stack (-m10000000)
  473.483   total actual memory usage

unreached in proctype Tx
        line 122, state 61, "-end-"
        (1 of 61 states)
unreached in proctype Rx
        line 161, state 13, "-end-"
        (1 of 13 states)
unreached in proctype noise
        line 183, state 6, "-end-"
        (1 of 6 states)
unreached in proctype Source
        line 201, state 17, "-end-"
        (1 of 17 states)

pan: elapsed time 1.46 seconds
pan: rate 114945.21 states/second

B.1.2 2nd never claim

(Spin Version 5.1.6 -- 9 May 2008)
        + Partial Order Reduction

Full statespace search for:
        never claim             +
        assertion violations    + (if within scope of claim)
        acceptance cycles       + (fairness disabled)
        invalid end states      - (disabled by never claim)

State-vector 76 byte, depth reached 9972, errors: 0
   170031 states, stored
   283874 states, matched
   453905 transitions (= stored+matched)
   234634 atomic steps
hash conflicts: 18884227 (resolved)

Stats on memory usage (in Megabytes):
   16.864   equivalent memory usage for states (stored*(State-vector + overhead))
   12.001   actual memory usage for states (compression: 71.17%)
            state-vector as stored = 46 byte + 28 byte overhead
    4.000   memory used for hash table (-w19)
  457.764   memory used for DFS stack (-m10000000)
  473.678   total actual memory usage

unreached in proctype Tx
        line 122, state 61, "-end-"
        (1 of 61 states)
unreached in proctype Rx
        line 161, state 13, "-end-"
        (1 of 13 states)
unreached in proctype noise
        line 183, state 6, "-end-"
        (1 of 6 states)
unreached in proctype Source
        line 201, state 17, "-end-"
        (1 of 17 states)

pan: elapsed time 1.13 seconds
pan: rate 150469.91 states/second

B.1.3 1st never claim of DiVinE - reachability algorithm - on 2 nodes

Reading bytecode source...
WARNING: Unable to find unreachable code since the system
interface cannot work with transitions. Turning the
unreachable code detection mode off.

WARNING: Cannot perform deadlock detection for promela
bytecodes. Turning the deadlock detection mode off.

WARNING: Test for assertion violation is not fully supported
for promela bytecodes. If performed and an assetion is
violated, NIPS RUNTIME ERROR occurs.

t 3 0 i 2 #R 5964 #S 5902 #Sr 0 a 8249 R 15.6 T 15.5 Tr 0.0 P 112.5 8280 3 1 8 S1 1 U0 356561
t 3 0 i 1 #R 5938 #S 5941 #Sr 0 a 8243 R 15.6 T 15.6 Tr 0.0 P 112.5 0 3 0 8 S1 4 U0 354694
t 3 0 i 3 #R 5928 #S 5952 #Sr 0 a 8251 R 15.5 T 15.6 Tr 0.0 P 112.5 0 3 0 8 S1 1 U0 354471
t 3 0 i 0 #R 5933 #S 5978 #Sr 0 a 8245 R 15.5 T 15.7 Tr 0.0 P 112.5 0 3 0 8 S1 4 U0 354432
No assertion violated.
States: 1291063
transitions: 3376034
cross transitions: 2531638
size of initial state: 13
all memory: 534.0 MB
time: 5.6 s
---------------------------------------------
0: local states: 322461
2: local states: 323010
0: local memory: 132.7
2: local memory: 133.7
1: local states: 322536
3: local states: 323056
1: local memory: 133.7
3: local memory: 133.7

B.1.4 1st never claim of DiVinE - owcty algorithm - on 2 nodes

Reading bytecode source...
=======================================
Reachability & Reset... t 3 0 i 0 #R 5433 #S 5513 #Sr 0 a 8240 R 14.2 T 14.4 Tr 0.0 P 119.7 0 3 0 8 S1 3 U0 324505
t 3 0 i 3 #R 5407 #S 5565 #Sr 0 a 8237 R 14.2 T 14.6 Tr 0.0 P 119.7 0 3 0 8 S1 1 U0 323383
t 3 0 i 2 #R 5546 #S 5315 #Sr 0 a 8246 R 14.4 T 13.9 Tr 0.0 P 119.7 0 3 0 8 S1 2 U0 329244
t 3 0 i 1 #R 5459 #S 5457 #Sr 0 a 8200 R 14.3 T 14.2 Tr 0.0 P 119.7 8280 3 1 8 S1 3 U0 325825
done. (1291063 states)
Number of states in S: 0
all memory: 596.8 MB
=======================================
--- No accepting cycle ---
=======================================
States: 1291063
transitions: 3376034
iterations: 0
size of a state: 13
size of appendix: 80
hashtable size: 1048576
cross transitions: 2531638
all memory: 596.8 MB
time: 6.2 s
------------------------
0: local states: 322461
2: local states: 323010
3: local states: 323056
1: local states: 322536
0: local memory: 149.2
2: local memory: 149.2
3: local memory: 149.2
1: local memory: 149.2

B.2 Connection Management protocol

B.2.1 Channel capacity of 1 message in Spin

(Spin Version 5.1.6 -- 9 May 2008)
        + Partial Order Reduction

Full statespace search for:
        never claim             - (none specified)
        assertion violations    +
        acceptance cycles       - (not selected)
        invalid end states      +

State-vector 88 byte, depth reached 328821, errors: 0
   864641 states, stored
  3942496 states, matched
  4807137 transitions (= stored+matched)
  5760633 atomic steps
2.45496e+06 lost messages
hash conflicts: 1.2374857e+09 (resolved)

Stats on memory usage (in Megabytes):
   95.652   equivalent memory usage for states (stored*(State-vector + overhead))
   68.059   actual memory usage for states (compression: 71.15%)
            state-vector as stored = 55 byte + 28 byte overhead
    4.000   memory used for hash table (-w19)
   45.777   memory used for DFS stack (-m1000000)
  117.745   total actual memory usage

unreached in proctype server
        line 48, state 10, "assert(0)"
        line 73, state 25, "assert(0)"
        line 82, state 34, "assert(0)"
        line 109, state 53, "assert(0)"
        line 118, state 62, "assert(0)"
        line 150, state 88, "assert(0)"
        line 161, state 99, "assert(0)"
        line 165, state 105, "-end-"
        (8 of 105 states)
unreached in proctype client
        line 215, state 24, "assert(0)"
        line 268, state 62, "assert(0)"
        line 278, state 75, "-end-"
        (3 of 75 states)
unreached in proctype :init:
        (0 of 19 states)

pan: elapsed time 56 seconds
pan: rate 15453.816 states/second

B.2.2 Rendezvous communication in Spin

(Spin Version 5.1.6 -- 9 May 2008)
        + Partial Order Reduction

Full statespace search for:
        never claim             - (none specified)
        assertion violations    +
        acceptance cycles       - (not selected)
        invalid end states      +

State-vector 88 byte, depth reached 10287, errors: 0
    12239 states, stored
    23508 states, matched
    35747 transitions (= stored+matched)
    39073 atomic steps
hash conflicts: 482822 (resolved)

Stats on memory usage (in Megabytes):
    1.354   equivalent memory usage for states (stored*(State-vector + overhead))
    1.050   actual memory usage for states (compression: 77.56%)
            state-vector as stored = 62 byte + 28 byte overhead
    4.000   memory used for hash table (-w19)
   45.777   memory used for DFS stack (-m1000000)
   50.753   total actual memory usage

unreached in proctype server
        line 56, state 10, "assert(0)"
        line 81, state 25, "assert(0)"
        line 90, state 34, "assert(0)"
        line 117, state 53, "assert(0)"
        line 126, state 62, "assert(0)"
        line 158, state 88, "assert(0)"
        line 169, state 99, "assert(0)"
        line 173, state 105, "-end-"
        (8 of 105 states)
unreached in proctype client
        line 223, state 24, "assert(0)"
        line 226, state 28, "assert((t_type==CO_FIN))"
        line 245, state 40, "ch[my_server]!CO_ACK,my_id"
        line 247, state 42, "assert((t_type==CO_FIN))"
        line 242, state 43, "((t_source==my_server))"
        line 242, state 43, "else"
        line 268, state 54, "ch[my_server]!CO_FIN,my_id"
        line 271, state 57, "ch[my_server]!CO_FIN,my_id"
        line 276, state 62, "assert(0)"
        line 279, state 66, "assert((t_type==CO_FIN))"
        line 286, state 75, "-end-"
        (10 of 75 states)
unreached in proctype :init:
        (0 of 19 states)

pan: elapsed time 0.1 seconds
pan: rate 122390 states/second

B.2.3 Channel capacity of 2 messages in DiVinE - on 32 nodes

No assertion violated.
States: 62728822
transitions: 112580999
cross transitions: 110822652
size of initial state: 13
all memory: 15596.4 MB
time: 11.0 s

B.3 Router Management protocol

B.3.1 Channel capacity of 1 message running in Spin

pan: out of memory
hint: to reduce memory, recompile with
  -DCOLLAPSE # good, fast compression, or
  -DHC # hash-compaction, approximation
  -DBITSTATE # supertrace, approximation

(Spin Version 5.1.6 -- 9 May 2008)
Warning: Search not completed
        + Partial Order Reduction
        + Graph Encoding (-DMA=84)

Full statespace search for:
        never claim             +
        assertion violations    + (if within scope of claim)
        acceptance cycles       + (fairness disabled)
        invalid end states      - (disabled by never claim)

State-vector 104 byte, depth reached 88949473, errors: 0
1.5706852e+08 states, stored
3.0225784e+08 states, matched
4.5932636e+08 transitions (= stored+matched)
        4 atomic steps
5.87806e+07 lost messages
hash conflicts: 0 (resolved)

Stats on memory usage (in Megabytes):
17975.066   equivalent memory usage for states (stored*(State-vector + overhead))
 1295.508   actual memory usage for states (compression: 7.21%)
 2670.288   memory used for DFS stack (-m100000000)
 3965.601   total actual memory usage

pan: elapsed time 6.01e+03 seconds
pan: rate 26112.935 states/second

B.3.2 Channel capacity of 1 message running in DiVinE - reachability algorithm - on 32 nodes

No assertion violated.
States: 188333827
transitions: 997001146
cross transitions: 981424249
size of initial state: 13
all memory: 45505.2 MB
time: 123.7 s

B.3.3 Channel capacity of 1 message running in DiVinE - owcty algorithm - on 32 nodes

all memory: 54142.5 MB
=======================================
--- No accepting cycle ---
=======================================
States: 188333827
transitions: 997001146
iterations: 0
size of a state: 13
size of appendix: 80
hashtable size: 1048576
cross transitions: 981424249
all memory: 54142.5 MB
time: 132.2 s

C Further Elaboration on Run Time section

C.1 Running 2nd never claim using DiVinE

Figure 45: Total memory usage when the 2nd never claim is enabled and N = 3 (x-axis: nodes)

Figure 46: Execution time when the 2nd never claim is enabled and N = 3 (x-axis: nodes)

Figure 47: Speedup when the 2nd never claim is enabled and N = 3 (x-axis: nodes)

References

[1] Gerard J. Holzmann, The SPIN Model Checker: Primer and Reference Manual (book), 2004.

[2] Mobile Industry Processor Interface (MIPI), MIPI Alliance Specification for Unified Protocol (UniPro), draft version 1.10.00, 2008.

[3] Branislav Kusy, Sherif Abdelwahed, FTSP Protocol Verification using SPIN, 2006.

[4] David Cypher, David Lee, Marta Martin-Villalba, Christiaan Prins and David Su, Formal specification, verification, and automatic test generation of ATM routing protocol: PNNI. In Formal Description Techniques & Protocol Specification, Testing, and Verification (FORTE/PSTV), IFIP, 1998.

[5] P. Wolper, Specifying interesting properties of programs in propositional temporal logic. In Proceedings of the 13th ACM Symposium on Principles of Programming Languages, 1986.

[6] Jose Garcia-Fanjul, Javier Tuya and Jose Antonio Corrales, Formal Verification and Simulation of the NetBill Protocol using Spin. In Proceedings of the Fourth Spin Workshop, 1998.

[7] Kees Verstoep, Henri E. Bal, Jiri Barnat and Lubos Brim, Efficient Large-Scale Model Checking, 2009.

[8] Distributed ASCI Supercomputer 3 (DAS3), www.cs.vu.nl/das3/

[9] Andrew S. Tanenbaum, Computer Networks, Prentice Hall, 4th edition, 2002.

[10] UniPro reference, http://en.wikipedia.org/wiki/Unipro

[11] G.J. Holzmann, The Model Checker Spin. IEEE Transactions on Software Engineering, volume 23, No. 5, 279-295, 1997.

[12] E.M. Clarke, O. Grumberg and D.A. Peled, Model Checking, MIT Press, 2000.

[13] Dimitri Chkliaev, Jozef Hooman and Eric de Vink, Verification and Improvement of the sliding window protocol. In Proc. 9th Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'03), LNCS 2619, Springer-Verlag, p. 113-127, 2003.

[14] M.A. Bezem and J.F. Groote, A correctness proof of a one bit sliding window protocol in µCRL. The Computer Journal, 37(4): 289-307.

[15] A. Udaya Shankar, Verified data transfer protocols with variable flow control. ACM Transactions on Computer Systems, 7(3):281-316, 1989.

[16] Yong Deng and Zhangqin Huang, Modeling and Performance analysis of a sliding window protocol

[17] Internetworking with TCP/IP: Principles, Protocols, and Architecture. Prentice Hall, 5th edition, 2006.

[18] DiVinE, http://divine.fi.muni.cz/

[19] B. Badban, W. J. Fokkink and J. C. van de Pol, Mechanical verification of a two-way sliding window protocol. In (P.H. Welch et al.) Proc. 9th Conference on Communicating Process Architectures - CPA'08, York, Concurrent Systems Engineering Series 66, pp. 179-202, IOS Press, 2008.