Page 1: Formal and Informal Software Speci cations€¦ · Formal and Informal ... speci cations by providing a link between the formal language ... a programmer who is supposed to implement

Thesis for the Degree of Doctor of Philosophy

Formal and Informal Software Specifications

Kristofer Johannisson

Department of Computer Science and Engineering
Chalmers University of Technology and Göteborg University

SE-412 96 Göteborg, Sweden

Göteborg, 2005


Formal and Informal Software Specifications
Kristofer Johannisson
ISBN 91-628-6535-8

© Kristofer Johannisson, 2005

Technical Report no. 6 D
Department of Computer Science and Engineering
Language Technology Research Group

Department of Computer Science and Engineering
Chalmers University of Technology and Göteborg University
SE-412 96 Göteborg, Sweden
Telephone + 46 (0)31-772 1000

Printed at Chalmers, Göteborg, Sweden, 2005


Abstract

The topic of this thesis is to bridge the gap between formal and informal software specifications. Formal specifications are required for the use of formal methods to verify the correctness of software. If we expect formal methods to be used in realistic software development projects, we need to enable people with varying levels of familiarity with formal specification languages to understand, maintain and create formal specifications.

To address these problems, we provide a tool for translating specifications written in the formal language OCL, which is part of the UML standard, to natural language. We also provide a multilingual, syntax-directed editor where OCL and natural language specifications can be edited in parallel.

The implementation of our work is to a large extent based on the Grammatical Framework (GF). GF is a grammar formalism based on type theory, which provides a special purpose language for defining grammars, and a compiler for this language. We have developed a GF grammar for specifications in OCL and natural language. The grammar captures the OCL type system, its built-in constructions and the predefined types of the OCL library. It is dynamically extended with domain-specific concepts by generating GF grammar modules from UML class diagrams. The generated modules make use of a grammar-level API of common constructions, which means that these modules can be modified without requiring GF expertise. To improve the readability of the translation of OCL specifications, the grammar includes formatting of the produced natural language. Inspired by Natural Language Generation techniques like aggregation, we also apply transformations to abstract syntax trees using a program external to the GF grammar.

Our tool is a part of the KeY system, which integrates formal software specification and verification into the industrial software engineering processes. The tool has successfully been used for a non-trivial case study: translating the OCL specifications of the Java Card API into English.


List of Included Papers

This thesis is based on work contained in the following papers:

Paper 1: Kristofer Johannisson. Natural Language Specifications. Preliminary version of a chapter of The KeY Book, edited by Bernhard Beckert, Reiner Hähnle and Peter H. Schmitt, which will be published by Springer in the LNAI subseries.

Paper 2: Reiner Hähnle, Kristofer Johannisson, and Aarne Ranta. An authoring tool for informal and formal requirements specifications. In ETAPS/FASE-2002: Fundamental Approaches to Software Engineering, edited by R. D. Kutsche and H. Weber, Springer LNCS, vol. 2306, pp. 233–248, 2002.

Paper 3: Kristofer Johannisson. Disambiguating Implicit Constructions in OCL. Accepted to the OCL and Model Driven Engineering Workshop, at the UML conference in Lisbon, 2004. Online proceedings at http://www.cs.kent.ac.uk/projects/ocl/oclmdewsuml04/description.htm.

Paper 4: David A. Burke and Kristofer Johannisson. Translating formal specifications to natural language: a grammar-based approach. In Proceedings of LACL 2005, edited by Philippe Blache and Edward Stabler, number 3492 in Springer LNAI, 2005 (forthcoming).


Contents

Introduction

1 Overview

2 The KeY Project

3 The Object Constraint Language (OCL)

4 The Grammatical Framework (GF)

5 An Example GF Grammar

6 Included Papers

7 Related Work

8 Future Work

9 Contributions

Paper 1: Natural Language Specifications

Paper 2: An Authoring Tool for Informal and Formal Requirements Specifications

Paper 3: Disambiguating Implicit Constructions in OCL

Paper 4: Translating Formal Software Specifications to Natural Language: A Grammar-Based Approach


Acknowledgements

During the course of my PhD studies, my supervisor Aarne Ranta has been a source of generous support, encouragement and friendship, for which I am very grateful.

I have also benefited a lot from the support of Reiner Hähnle. This work is based on the ideas of Aarne Ranta and Reiner Hähnle, and would not have been possible without them. The cooperation with and supervision of David A. Burke and Hans-Joachim Daniels has also been an important part of this work.

Furthermore, I would like to thank all fellow PhD students and other employees of our department for making it an enjoyable place to work in.

Finally, I am grateful to my Master’s thesis supervisor Peter Dybjer, who got me started as a PhD student.


Introduction

1 Overview

The topic of this thesis is bridging the gap between formal and informal software specifications by providing a link between the formal language OCL and natural language.

Formal specifications are required for using formal methods: if we want to prove that a program is correct, we need a formal specification of what properties it should satisfy. Also, even if we do not intend to formally verify that a program is correct, we may still benefit from the precision of a formal specification. However, producing formal specifications is not part of common software engineering practice. Instead, informal specifications, often natural language ones, are used.

Our approach is to make it possible to translate OCL specifications to natural language. This means that once formal OCL specifications have been produced, they can be translated and then understood by people who do not know OCL. The translation may for instance be used by a customer who needs to validate that a formal specification captures the intended behavior of a system, or by a programmer who is supposed to implement a system according to a formal specification.

To support the development of OCL specifications, we provide a multilingual, syntax-directed editor in which specifications can be edited in OCL and natural language in parallel. This editor can be used to create and maintain OCL specifications by people who are not OCL experts.

The context of our work is given by the KeY project, which integrates formal software specification and verification into the industrial software engineering processes. Sect. 2 gives a brief introduction to KeY, followed by a section about the language OCL.

The implementation of our work is to a large extent based on the Grammatical Framework (GF). GF is described in sections 4 and 5.

Sect. 6 describes the papers which make up this dissertation, thereby also giving an overview of our work. This is followed by sections on related work, future work, and on the contributions of the thesis.

2 The KeY Project

The KeY project [1] aims to integrate formal software specification and verification into the industrial software engineering processes. The starting point is a commercial CASE (Computer Aided Software Engineering) tool, which is augmented by capabilities for formal specification and verification. The ultimate goal is to make the verification process transparent for the user with respect to the informal object-oriented model.

The KeY system supports specification and verification of Java Card [23] programs. Java Card is a subset of Java (e.g. there is no threading or garbage collection), tailored to smart cards and other devices with limited resources. It is a suitable target for formal methods since it has a simpler semantics than full Java, and also because applications are expected to be security or cost critical (e.g. consider electronic banking, or phone cards which are distributed in large numbers and cannot be easily upgraded).

[Figure 1: Example Class Diagram. Two classes, Company and Person; Person has an attribute salary : Integer. One association links Company and Person with role names employers and employees, both with multiplicity 0..*. A second association, with multiplicities 0..* and 1, carries the role name president.]

In the KeY system, specifications of Java Card programs are developed in the Object Constraint Language (OCL), which will be described in the next section. For verification, the KeY system translates OCL specifications, along with the corresponding Java Card programs, into proof obligations in a Dynamic Logic for Java Card. Using a theorem prover which supports both interactive and automatic deduction, the programs can then be verified against their specifications.

3 The Object Constraint Language (OCL)

The Object Constraint Language (OCL) [10] is a part of the UML standard [25]. It is a language of side-effect free expressions which can be used for navigating UML models (typically class diagrams) and for formulating constraints to make UML models more precise. Expressions are allowed to use the attributes, associations and side-effect free operations from the UML model, as well as a library of standard types (e.g. integers, strings, and collections). An OCL expression of boolean type can be attached to a UML model as an invariant of a class, or as a pre- or postcondition of an operation. These boolean expressions are reminiscent of first order logic formulas using an object oriented notation, adapted to the UML setting.

Fig. 1 shows an example class diagram with two classes Person and Company. We could now specify, for instance, that no employee of a company has a higher salary than the president by giving an OCL invariant for Company:

    context Company inv:
      self.employees->forAll(p : Person |
        p.salary <= self.president.salary)

This OCL constraint makes use of three properties from the UML model: employees (a collection of persons), salary (an integer attribute) and president (a person). From the OCL library we are using forAll and the <= comparison operator (defined for real numbers, but Integer is a subtype of Real). The arrow (->) is used instead of a dot when making property calls to collections in OCL. The keyword self refers to an instance of the class given by the context, in this case Company.
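To make the reading of the invariant concrete, the following is a minimal Python sketch, not part of the KeY system, that evaluates the same condition over a toy object model. The class and function names (`Person`, `Company`, `salary_invariant`) are assumptions introduced only for this illustration; `all` plays the role of OCL's forAll.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    salary: int


@dataclass
class Company:
    president: Person
    employees: List[Person] = field(default_factory=list)


def salary_invariant(company: Company) -> bool:
    """Mirror of: self.employees->forAll(p | p.salary <= self.president.salary)."""
    return all(p.salary <= company.president.salary for p in company.employees)


# Hypothetical instances of the class diagram in Fig. 1.
boss = Person(salary=100)
ok_company = Company(president=boss, employees=[boss, Person(salary=80)])
bad_company = Company(president=boss, employees=[Person(salary=120)])

print(salary_invariant(ok_company))
print(salary_invariant(bad_company))
```

As with forAll in OCL, the condition holds trivially for a company without employees.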

The official definition of OCL 2.0 [10] contains an informal description of the language, a semi-formal definition formulated in terms of UML and OCL, as well as a formal semantics (based on set theory) given as an appendix. There are also other attempts at formally defining OCL, e.g. [4, 3]. According to one of these alternative definitions, the expressive power of OCL version 2.0 is that of the primitive recursive functions, while OCL version 1.4/5 is Turing complete [3]. As noted above, the KeY tool defines the semantics of OCL by a translation of OCL to Dynamic Logic [2, 22].

4 The Grammatical Framework (GF)

The GF Formalism. The Grammatical Framework (GF) is a formalism for defining grammars [18]. A GF grammar consists of one part which describes abstract syntax, and another part which describes concrete syntax. The abstract syntax part is formulated in a version of Martin-Löf’s type theory [16], and can be seen as a description of how to construct abstract syntax trees. The concrete syntax then consists of linearization rules telling how to present these trees as expressions of a particular language. This means that grammars are written from the perspective of linearization rather than parsing. In fact, we can consider the GF formalism as a linearization (or generation) oriented typed functional language.

By having multiple concrete syntaxes for the same abstract syntax we achieve multilinguality: we can present the same tree in several languages in parallel, and we can translate (within the language fragment described by the grammar) by parsing using one concrete syntax and linearizing with another.

The GF system. The GF system [19] provides functionality such as parsing and linearization for grammars written in the GF formalism. The system also includes a syntax editor [15] in which the user can load a GF grammar and then edit the abstract syntax trees described by the grammar. The trees are at all times presented in the languages defined by the concrete syntaxes of the grammar. By editing the abstract syntax tree and observing the results in a familiar language, a user can then interactively produce texts in foreign languages.

The GF Resource Grammar Library. An important part of the GF project is the resource grammar library [17], which provides an API of types and functions for common linguistic structures. There are resource grammars available for English, Finnish, French, German, Italian, Russian and Swedish, which to a large extent share the same interface.

Grammar Engineering. A typical GF application grammar describes a well-defined fragment of natural language for a restricted domain, e.g. in our case software specifications. The resource grammar library provides a division of labour: the author of an application grammar can be a domain expert, who does not need to be familiar with linguistic details. His or her task is to come up with an abstract syntax which models the domain, and to link the abstract syntax to concrete language by using the resource grammars. The linguistic expert is in turn responsible for the implementation of the resource grammars, where no knowledge of a particular domain is needed.

5 An Example GF Grammar

In this section we will explain the basic ideas of the GF formalism by considering a small example grammar in some detail.

We go back to the class diagram in Fig. 1, which shows a class Person with an integer attribute salary. We will define a grammar for a very small language which allows us to express for instance that the salary of a person is greater than zero. We will define an abstract syntax where such a sentence can be represented, and concrete syntaxes to present abstract syntax trees in English and German, as well as in an OCL-like notation.

Abstract Syntax. The first step is to define the abstract syntax, as follows:

    cat Class;
        Expression (c:Class);
        Sentence;

    fun Integer, Person : Class;
        zero : Expression Integer;
        salary : Expression Person -> Expression Integer;
        greaterThan : (a,b : Expression Integer) -> Sentence;
        self : (c:Class) -> Expression c;

This introduces three categories (types), and some functions for building trees in these categories. Class represents classes, and Integer and Person are two constants of type Class. To represent expressions, we use a dependent category: for every class c, there is a category Expression c. In this way we encode a type system for expressions in the grammar. So, we can type zero as an integer expression, and salary as a function which takes a person as an argument (i.e. an argument of type Expression Person) and returns an integer. There is just one way to build a sentence in the grammar: by applying greaterThan to two integer expressions. Finally we include an OCL self construction, which takes a class c as an argument, and returns an Expression c.


A sentence stating that the salary of a person is greater than zero could now be represented as the tree greaterThan (salary (self Person)) zero.
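The way the dependent category Expression c encodes a type system can be imitated outside GF. The following Python sketch is only an illustration under that reading: it hard-codes the five functions of the abstract syntax and infers the category of a tree. The names `Tree`, `infer` and `is_well_typed` are hypothetical helpers, not GF functionality.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Tree:
    fun: str
    args: Tuple["Tree", ...] = ()


CLASSES = {"Integer", "Person"}


def infer(tree: Tree) -> Union[str, Tuple[str, str]]:
    """Return 'Class', 'Sentence' or ('Expression', c) for a well-typed tree."""
    f, args = tree.fun, tree.args
    if f in CLASSES and not args:
        return "Class"
    if f == "zero" and not args:
        return ("Expression", "Integer")
    if f == "self" and len(args) == 1 and infer(args[0]) == "Class":
        # Dependent typing: the result category depends on the argument class.
        return ("Expression", args[0].fun)
    if f == "salary" and len(args) == 1 and infer(args[0]) == ("Expression", "Person"):
        return ("Expression", "Integer")
    if f == "greaterThan" and len(args) == 2 and all(
        infer(a) == ("Expression", "Integer") for a in args
    ):
        return "Sentence"
    raise TypeError(f"ill-typed tree at {f!r}")


def is_well_typed(tree: Tree) -> bool:
    try:
        infer(tree)
        return True
    except TypeError:
        return False


# greaterThan (salary (self Person)) zero
example = Tree(
    "greaterThan",
    (Tree("salary", (Tree("self", (Tree("Person"),)),)), Tree("zero")),
)
print(infer(example))
print(is_well_typed(Tree("salary", (Tree("zero"),))))
```

The second call is rejected because salary expects an Expression Person, whereas zero is an Expression Integer.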

Concrete Syntax for OCL. The first concrete syntax we will define presents trees in a notation inspired by OCL. The tree greaterThan (salary (self Person)) zero will be linearized as “self.salary > 0”. This is achieved by the following linearization judgements:

    lin Integer = {s = "Integer"};
        Person = {s = "Person"};
        zero = {s = "0"};
        salary p = {s = p.s ++ "." ++ "salary"};
        greaterThan x y = {s = x.s ++ ">" ++ y.s};
        self _ = {s = "self"};

There is one linearization judgement for each function in the abstract syntax. The first one says that the linearization of Integer is a record, containing just one field s, which is set to the string “Integer”. The operator ++ denotes string concatenation, and the dot is used for selecting a field from a record.

On the right hand side of a judgement we can refer to the linearization of the arguments (or subtrees) of the function. For instance, in the linearization of greaterThan we refer to the s fields of the linearization of the subtrees x and y. There is no way of accessing the subtrees themselves in the concrete syntax; this is what makes GF grammars compositional.

The function self is always linearized as “self”. The argument, of type Class, is not used and is therefore not given a name.

Concrete Syntax for English. The concrete syntax for presenting trees in English is fairly similar to the one for OCL; it differs mainly in the choice of strings:

    lin Integer = {s = "integer"};
        Person = {s = "person"};
        zero = {s = "0"};
        salary p = {s = ["the salary of"] ++ p.s};
        greaterThan x y = {s = x.s ++ ["is greater than"] ++ y.s};
        self c = {s = "the" ++ c.s};

One difference is that we now make use of the argument to self. The linearization of our example tree using this concrete syntax is: “the salary of the person is greater than 0”.
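Compositionality and multilinguality can be mimicked in a few lines of Python. The sketch below is an analogy rather than GF itself: each rule receives only the already linearized subtrees, and the same tree is rendered with two rule sets. The names `OCL`, `ENGLISH` and `linearize` are assumptions made for this illustration, and token spacing is simplified compared to GF's ++ operator.

```python
from typing import Callable, Dict

# A tree is a tuple: (function_name, *subtrees).
# Each rule sees the linearizations of the subtrees, never the subtrees themselves.
OCL: Dict[str, Callable[..., str]] = {
    "Integer": lambda: "Integer",
    "Person": lambda: "Person",
    "zero": lambda: "0",
    "salary": lambda p: p + ".salary",
    "greaterThan": lambda x, y: f"{x} > {y}",
    "self": lambda _c: "self",  # the class argument is ignored
}

ENGLISH: Dict[str, Callable[..., str]] = {
    "Integer": lambda: "integer",
    "Person": lambda: "person",
    "zero": lambda: "0",
    "salary": lambda p: "the salary of " + p,
    "greaterThan": lambda x, y: f"{x} is greater than {y}",
    "self": lambda c: "the " + c,  # the class argument is used
}


def linearize(tree: tuple, rules: Dict[str, Callable[..., str]]) -> str:
    """Linearize bottom-up: subtrees first, then the rule for the root."""
    fun, *args = tree
    return rules[fun](*(linearize(a, rules) for a in args))


# greaterThan (salary (self Person)) zero
example = ("greaterThan", ("salary", ("self", ("Person",))), ("zero",))

print(linearize(example, OCL))
print(linearize(example, ENGLISH))
```

Adding a further language in this analogy only requires another rule table; the tree and the `linearize` function stay unchanged.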

Concrete Syntax for German. Defining a concrete syntax for German is not quite as simple as for English. For instance, we have to take into account that nouns have a gender, and are inflected in case. We will therefore introduce parameter types for gender and case:


    param Gender = Masc | Fem | Neutr;
          Case = Nom | Dat;

For the few constructions included in our small grammar, we will never need the accusative or genitive case, which is why we left them out in the definition of Case. We allow ourselves this ad hoc approach since we show a more general solution below, using the GF resource grammar library.

We also have to use records which contain more than just a string field. We define new linearization types for the categories Class and Expression:

    lincat Class = {s : Case => Str; g : Gender};
           Expression = {s : Case => Str};

For Class we add an inherent feature for gender as a new record field. Also, the s field for both Class and Expression is now an inflection table: a finite function from Case to strings.

Given these parameters and linearization types, we can proceed with the linearization judgements:

    lin Integer = {s = table {Nom => ["ganze Zahl"];
                              Dat => ["ganzen Zahl"]};
                   g = Fem};
        Person = {s = table {_ => "Person"}; g = Fem};
        zero = {s = table {_ => "0"}};
        salary p = {s = table {
                      Nom => ["das Gehalt von"] ++ p.s ! Dat;
                      Dat => ["dem Gehalt von"] ++ p.s ! Dat
                    }};
        greaterThan x y = {s = x.s ! Nom ++ ["ist größer als"] ++
                               y.s ! Nom};
        self c = {s = table {
                    Nom => case c.g of {
                      Masc => "der" ++ c.s ! Nom;
                      Fem => "die" ++ c.s ! Nom;
                      Neutr => "das" ++ c.s ! Nom};
                    Dat => case c.g of {
                      Fem => "der" ++ c.s ! Dat;
                      _ => "dem" ++ c.s ! Dat}
                  }};

The linearization of Integer and Person now specifies a gender. We also see the use of the table construction to introduce inflection tables: for every function of type Expression or Class, we must supply a nominative as well as dative inflection. The exclamation mark is used for selecting one row from a table. For instance, in salary, we use p.s ! Dat to select the dative inflection of p.s, since the preposition “von” must be followed by dative. The linearization of Integer is again slightly ad hoc: we only give the weak adjective inflection, since the strong inflection will not be needed.

The linearization of the tree greaterThan (salary (self Person)) zero with this German concrete syntax is “das Gehalt von der Person ist größer als 0”.
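The roles of inflection tables and inherent features can be sketched in Python by using dictionaries keyed by case and a gender field on class records. This is an illustrative analogy to the German judgements, not generated from GF; all function names (`lin_Person`, `lin_self`, and so on) are hypothetical, and the sentence is returned as a plain string rather than a record.

```python
NOM, DAT = "Nom", "Dat"

# Definite article by case and gender (only the two cases used by the grammar).
ARTICLE = {
    NOM: {"Masc": "der", "Fem": "die", "Neutr": "das"},
    DAT: {"Masc": "dem", "Fem": "der", "Neutr": "dem"},
}


def const(s: str) -> dict:
    """Inflection table with the same form in every case (table {_ => s})."""
    return {NOM: s, DAT: s}


def lin_Integer() -> dict:
    return {"s": {NOM: "ganze Zahl", DAT: "ganzen Zahl"}, "g": "Fem"}


def lin_Person() -> dict:
    return {"s": const("Person"), "g": "Fem"}


def lin_zero() -> dict:
    return {"s": const("0")}


def lin_salary(p: dict) -> dict:
    # "von" governs the dative, so the argument is always selected with DAT.
    return {"s": {NOM: "das Gehalt von " + p["s"][DAT],
                  DAT: "dem Gehalt von " + p["s"][DAT]}}


def lin_self(c: dict) -> dict:
    # The article depends on the inherent gender of the class and on the case.
    return {"s": {case: ARTICLE[case][c["g"]] + " " + c["s"][case]
                  for case in (NOM, DAT)}}


def lin_greaterThan(x: dict, y: dict) -> str:
    return x["s"][NOM] + " ist größer als " + y["s"][NOM]


sentence = lin_greaterThan(lin_salary(lin_self(lin_Person())), lin_zero())
print(sentence)
```

Selecting an entry with `table[case]` corresponds to the `!` operator in the GF judgements.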

Using the Resource Grammar Library. While the German concrete syntax above is very small, it illustrates some general ideas: we need linguistic knowledge when defining concrete syntax for a natural language. Also, we will have to solve similar problems, such as keeping track of the gender and inflection of nouns, every time we define German linearization rules in other grammars. As mentioned above, these are the motivations for the GF resource grammar library. This library provides an API of linguistically motivated parameter types (e.g. for gender and case), linearization types (e.g. nouns and verbs) and helper functions (e.g. for forming a sentence from a subject and a predicate).

By using the resource grammar library, we can redefine our German concrete syntax as follows:

    lincat Class = CN;
           Expression = NP;
           Sentence = S;

    lin Integer = ModAdj (apReg "ganz") (UseN (nFrau "Zahl"));
        Person = UseN (nFrau "Person");
        zero = npReg "0";
        salary p =
          DefOneNP (AppFun (funVon (nBuch "Gehalt" "Gehälter")) p);
        greaterThan x y =
          predAComp (aDeg3 "groß" "größer" "größt") x y;
        self c = DefOneNP c;

We are now using a lot of types and helper functions from the resource API. We will not explain them all here, but just note that we have abstracted away from explicitly defining parameter types and inflection tables. Instead we make use of more general concepts, such as common nouns (CN) or noun phrases (NP). Of course, we have to spend some effort in learning the API, which makes sense in case of larger grammars, but perhaps not for our small example.

The resource grammar library provides more or less the same API for English, Finnish, French, German, Italian, Russian and Swedish. This is helpful when writing concrete syntaxes for several of these languages. For instance, a resource grammar version of our English concrete syntax is very similar to the German one, as shown below:

    lincat Class = CN;
           Expression = NP;
           Sentence = S;

    lin Integer = UseN (nNonhuman "integer");
        Person = UseN (nHuman "person");
        zero = npReg "0";
        salary p = DefOneNP (AppFun (funNonhuman "salary") p);
        greaterThan x y = predAComp (aReg "great") x y;
        self c = DefOneNP c;

6 Included Papers

Paper 1: Natural Language Specifications

This paper is a preliminary version of a chapter in a forthcoming book on the KeY system. Although it is chronologically the last of the included papers, it is placed as the first one in this thesis since it serves as an introduction to our work.

The paper describes our work from the point of view of a user of the KeY system. We motivate and describe the two kinds of natural language functionality which we provide: translation of OCL specifications to natural language, and a multilingual, syntax-directed editor which allows editing of specifications in OCL, English and German in parallel. We also give an overview of how the different components of our work fit together. The central component is a multilingual GF grammar for specifications in OCL, English and German.

This paper was written by the thesis author. The underlying work is for the most part also described in papers 2, 3 and 4. The implementation of the integration of GF and KeY was started by the author, and is currently being worked on also by Master’s thesis student Hans-Joachim Daniels. The German parts of the GF grammar were developed as a study thesis by Daniels [7], supervised by the author and Aarne Ranta.

The KeY Book is edited by Bernhard Beckert, Reiner Hähnle and Peter H. Schmitt, and will be published by Springer in the LNAI subseries.

Paper 2: An Authoring Tool for Informal and Formal Requirements Specifications

This paper describes the basic motivations and design principles of our work. It points out the gap between formal and informal software specifications, and suggests how we can bridge that gap by a multilingual GF grammar for specifications in OCL and English. This addresses the problems of authoring and maintaining specifications (by providing a multilingual, syntax-directed editor), as well as synchronizing specifications of different levels of formality (by enabling the translation of OCL into English).

This paper was written together with Reiner Hähnle and Aarne Ranta. Published in ETAPS/FASE-2002: Fundamental Approaches to Software Engineering, edited by R. D. Kutsche and H. Weber, Springer LNCS, vol. 2306, pp. 233–248, 2002. The contribution of this author to the work described in the paper was developing the GF grammar for specifications in OCL and English.

Paper 3: Disambiguating Implicit Constructions in OCL

This paper describes work done in connection with the implementation of a custom OCL parser and typechecker to be used with our GF based system. It presents disambiguating rules for dealing with the various implicit constructions of the concrete syntax of OCL, in the form of a type derivation system. Similar rules are also given in the OCL 2.0 specification [10], but in the form of an attribute grammar using OCL itself for formulating the disambiguating rules.

The paper describes work done by this author. It was accepted to the OCL and Model Driven Engineering Workshop, at the UML conference in Lisbon, 2004.

Paper 4: Translating Formal Specifications to Natural Language: A Grammar-Based Approach

In paper 4, we show that our approach using a GF grammar for linking OCL to natural language scales well enough to handle a non-trivial case study: translating OCL specifications of the Java Card API to English. We describe a number of improvements made to the system as described in paper 2.

To make the translation more readable, we add formatting (LaTeX and HTML) to the GF concrete syntax for English. Inspired by Natural Language Generation techniques such as aggregation, we introduce a program (external to the GF grammar) to perform transformations on GF abstract syntax trees before linearization.

To handle domain specific vocabulary, we generate GF grammar modules from UML class diagrams. The generated modules make use of a grammar-level API of common constructions, which makes them customizable also by a user who is not a GF expert.
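The idea of generating grammar rules from a class diagram can be sketched as follows. This Python fragment is a loose illustration only: it emits declarations in the style of the toy abstract syntax of Sect. 5, not the modules or the grammar-level API actually produced by the tool, and the function name `generate_abstract_syntax` and the dictionary encoding of the diagram are assumptions.

```python
from typing import Dict


def generate_abstract_syntax(classes: Dict[str, Dict[str, str]]) -> str:
    """Emit GF-style 'fun' declarations for classes and their attributes.

    `classes` maps a class name to its attributes (attribute name -> type name).
    """
    lines = []
    for cls in classes:
        lines.append(f"fun {cls} : Class;")
    for cls, attributes in classes.items():
        for attr, attr_type in attributes.items():
            lines.append(
                f"fun {attr} : Expression {cls} -> Expression {attr_type};"
            )
    return "\n".join(lines)


# Hypothetical encoding of the classes in Fig. 1 (associations omitted).
diagram = {"Person": {"salary": "Integer"}, "Company": {}}
print(generate_abstract_syntax(diagram))
```

A corresponding generator for concrete syntax would emit one linearization rule per declaration; keeping those rules in terms of a small API is what allows a non-expert to adjust the generated wording.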

This paper is based on the Master’s thesis of David Burke, which was supervised by the author. The author was involved in developing the basic ideas together with Burke. Burke worked independently on the case study of the Java Card API by extending previous work (by the author and Hans-Joachim Daniels) on the GF grammars. The author also did some work on the GF level, as well as all of the work related to parsing OCL and transforming GF abstract syntax trees.

The paper has been accepted for publication in Proceedings of LACL 2005, edited by Philippe Blache and Edward Stabler, number 3492 in Springer LNAI, 2005 (forthcoming).


7 Related Work

Natural Language Generation. Natural Language Generation (NLG) is informally described in [20] as producing understandable natural language text from a non-linguistic representation of information. This very general description also fits GF linearization: taking abstract syntax trees into expressions of a concrete language.

A slightly more refined view of NLG is that it consists of two problems: discourse planning (“what to say”) and surface realization (“how to say it”) [14]. Discourse planning consists of deciding what pieces of information to present from a knowledge base. It therefore does not seem to correspond to linearization in a natural way: when performing linearization in GF, we already know “what to say”, since we have the abstract syntax tree. Surface generation seems to be more similar to linearization: we can think of linearization as a two-step procedure, where the linearization rules go from non-linguistic abstract syntax to linguistically motivated resource grammar constructions. The resource grammar implementation then takes the step to surface strings in natural language.

To improve the quality of the natural language generated by our GF grammar, our work has been inspired by NLG surface generation techniques, in particular what is known as aggregation [6]. Such improvements can partly be performed in the linearization rules of concrete syntax, but we may also need to manipulate abstract syntax trees using a program external to GF.
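As a rough illustration of what an aggregation-style tree transformation might look like, the Python sketch below merges two conjoined statements that share a predicate and a second argument, so that "x is greater than 0 and y is greater than 0" could later be rendered as "x and y are greater than 0". It is not the transformation program used in this work; the constructor names `and` and `both` and the function `aggregate` are hypothetical.

```python
def aggregate(tree):
    """Rewrite 'A pred C and B pred C' into '(both A B) pred C', bottom-up.

    Trees are tuples (function_name, *subtrees); leaves are plain strings.
    """
    if not isinstance(tree, tuple):
        return tree
    fun, *args = tree
    args = [aggregate(a) for a in args]
    if fun == "and" and len(args) == 2:
        left, right = args
        if (isinstance(left, tuple) and isinstance(right, tuple)
                and len(left) == 3 and len(right) == 3
                and left[0] == right[0]      # same predicate
                and left[2] == right[2]):    # same second argument
            return (left[0], ("both", left[1], right[1]), left[2])
    return (fun, *args)


before = ("and", ("greaterThan", "x", "zero"), ("greaterThan", "y", "zero"))
print(aggregate(before))
```

Because the rewrite operates on abstract syntax trees, the same transformation would apply to every target language before linearization.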

Proof Presentation. Our aggregation-inspired work to improve the quality of the natural language is also similar to [5], which presents rewriting rules to transform proofs in the Coq proof assistant to natural language. Some of these rules ensure that iterated constructions are transformed into non-repetitive natural language.

The idea of dynamically extending a grammar with rules for domain-specific concepts is also used in [12], which is about presenting proofs of the Alfa proof assistant using GF grammars which are extended with user defined concepts.

OCL. As mentioned in Sect. 3, there are a number of papers giving alternativedefinitions of OCL. Our work in Paper 3 has in particular been inspired by [4]and [3] which define their own semantics and type systems for OCL.

A survey of available OCL tools is given in [24]. Important to the KeY system is the Dresden OCL Toolkit [8, 13]. This toolkit offers, among other things, OCL parsing and typechecking, as well as generation of Java assertion code from OCL specifications. In the KeY system it is used in the translation of OCL into Dynamic Logic.

As far as we know, there is no other tool for linking OCL to natural language.

Requirements Engineering. In the field of requirements engineering and conceptual modelling, there is work on producing natural language from a formal representation to enable validation by people not trained in formal languages. For instance, the paper [11] presents a system for generating natural language explanations from conceptual models, and also gives an overview of related work. The basic difference to our approach is that we translate textual OCL specifications (not, e.g., class diagrams or other parts of a UML model) into natural language, while [11] generates explanations from “... process-oriented and static ER-like languages ...”, e.g. data flow diagrams. The generation of explanations is an NLG problem, which includes discourse planning as well as surface generation.

There is also work on going in the other direction: from informal, natural language text to conceptual models, e.g. [21]. However, the problem of automatically translating informal, natural language text into OCL falls outside the scope of this thesis.

The fragment of English described by our GF grammar can be considered as a controlled language of specifications, and in this sense our approach is similar to [9].

8 Future Work

There are many possible lines of future work, e.g. improving the quality of the generated natural language, adding concrete syntaxes for more languages (formal and natural ones), or involving users in an evaluation of the translation tool and the syntax editor. We are currently focusing on two issues: 1) Making it possible to use formal notation such as OCL for parts of a natural language specification, e.g. for an arithmetic expression or a method call. 2) Improving the syntax editing of specifications in OCL and natural language. This entails work both on the level of the syntax-editor GUI and on the GF grammar level.

9 Contributions

This thesis provides a link between OCL and natural language in the form of a tool which enables the translation of OCL to natural language, and provides a multilingual editor in which OCL and natural language specifications can be edited in parallel. This is one step towards bridging the gap between formal (as needed for formal methods) and informal (as used in software engineering practice) specifications. By linking OCL to natural language, we make understanding, maintenance and creation of formal specifications possible for people with varying levels of familiarity with formal specification languages.

Our tool is a non-trivial application of the Grammatical Framework. We have developed a multilingual GF grammar for specifications in OCL and natural language. The grammar captures the OCL type system, its built-in constructions and the predefined types of the OCL library. It is dynamically extended with domain-specific concepts by generating GF grammar modules from UML class diagrams. The generated modules make use of a grammar-level API of common constructions, which means that these modules can be modified without requiring GF expertise. To improve the readability of the translation of OCL specifications, the grammar includes formatting of the produced natural language. Inspired by Natural Language Generation techniques like aggregation, we also apply transformations to abstract syntax trees using a program external to the GF grammar.

Parts of our work on a separate OCL parser and typechecker are presented as disambiguating rules for the concrete syntax of OCL in the form of a type derivation system.

Our tool is a part of the KeY system, which integrates formal software specification and verification into the industrial software engineering processes. It has successfully been used for a non-trivial case study: translating the OCL specifications of the Java Card API into English.

References

[1] Wolfgang Ahrendt, Thomas Baar, Bernhard Beckert, Richard Bubel, Martin Giese, Reiner Hähnle, Wolfram Menzel, Wojciech Mostowski, Andreas Roth, Steffen Schlager, and Peter H. Schmitt. The KeY tool. Software and Systems Modeling, 4:32–54, 2005.

[2] Bernhard Beckert, Uwe Keller, and Peter H. Schmitt. Translating the Object Constraint Language into first-order predicate logic. In Proceedings, VERIFY, Workshop at Federated Logic Conferences (FLoC), Copenhagen, Denmark, 2002. Available at i12www.ira.uka.de/~key/doc/2002/BeckertKellerSchmitt02.ps.gz.

[3] María Victoria Cengarle and Alexander Knapp. OCL 1.4/1.5 vs. 2.0 expressions: Formal semantics and expressiveness. In Software and Systems Modeling, volume 3, 2004.

[4] Tony Clark. Type checking UML static diagrams. In Robert France and Bernhard Rumpe, editors, UML’99 – The Unified Modeling Language. Beyond the Standard. Second International Conference, Fort Collins, CO, USA, October 28-30, 1999, Proceedings, volume 1723 of LNCS, pages 503–517. Springer, 1999.

[5] Yann Coscoy, Gilles Kahn, and Laurent Théry. Extracting text from proofs. In M. Dezani-Ciancaglini and G. Plotkin, editors, Proc. Second Int. Conf. on Typed Lambda Calculi and Applications, volume 902 of LNCS, pages 109–123, 1995.

[6] Hercules Dalianis and Eduard Hovy. Aggregation in natural language generation. In Giovanni Adorni and Michael Zock, editors, Trends in Natural Language Generation: an Artificial Intelligence Perspective, EWNLG’93, Fourth European Workshop, volume 1036 of LNAI. Springer, 1996.


[7] Hans-Joachim Daniels. Eine deutsche Grammatik für OCL. Studienarbeit, 2003. http://www.cs.chalmers.se/~krijo/gfspec/.

[8] Dresden OCL toolkit homepage, 2005. http://dresden-ocl.sourceforge.net/.

[9] Norbert E. Fuchs, Uta Schwertel, and Rolf Schwitter. Attempto Controlled English – not just another logic specification language. In P. Flener, editor, Logic-Based Program Synthesis and Transformation, Eighth International Workshop LOPSTR’98, volume 1559 of LNCS. Springer, 1999.

[10] Object Management Group. OCL 2.0 specification, 2003. http://www.omg.org/cgi-bin/doc?ptc/2003-10-14.

[11] Jon Atle Gulla. A general explanation component for conceptual modeling in CASE environments. ACM Transactions on Information Systems, 14(3), 1996.

[12] Thomas Hallgren and Aarne Ranta. An extensible proof text editor. In M. Parigot and A. Voronkov, editors, Logic for Programming and Automated Reasoning, LPAR, LNAI 1955, pages 70–84. Springer, 2000.

[13] Heinrich Hussmann, Birgit Demuth, and Frank Finger. Modular architecture for a toolset supporting OCL. In Andy Evans, Stuart Kent, and Bran Selic, editors, Proc. 3rd Int. Conf. on the Unified Modeling Language, LNCS 1939, pages 278–293. Springer, 2000.

[14] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Prentice-Hall, 2000.

[15] Janna Khegai, Bengt Nordström, and Aarne Ranta. Multilingual syntax editing in GF. In A. Gelbukh, editor, Intelligent Text Processing and Computational Linguistics (CICLing-2003), number 2588 in LNCS. Springer, 2003.

[16] Per Martin-Löf. Intuitionistic Type Theory. Bibliopolis, Napoli, 1984.

[17] Aarne Ranta. The GF resource grammar library, 2004. http://www.cs.chalmers.se/~aarne/GF/lib/resource/.

[18] Aarne Ranta. Grammatical Framework: A Type-theoretical Grammar Formalism. The Journal of Functional Programming, 14(2):145–189, 2004.

[19] Aarne Ranta. Grammatical Framework homepage, 2005. www.cs.chalmers.se/~aarne/GF/.

[20] Ehud Reiter and Robert Dale. Building applied natural language generation systems. Journal of Natural Language Engineering, 3(1):57–87, 1997.


[21] Colette Rolland and C. Proix. A natural language approach for requirements engineering. In Advanced Information Systems Engineering, 4th International Conference CAiSE ’92, volume 593 of LNCS. Springer, 1992.

[22] Peter H. Schmitt. A model theoretic semantics of OCL. In B. Beckert, R. France, R. Hähnle, and B. Jacobs, editors, Proceedings, IJCAR Workshop on Precise Modelling and Deduction for Object-oriented Software Development, Siena, Italy, pages 43–57. Technical Report DII 07/01, Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Siena, 2001.

[23] Sun Microsystems. Java Card homepage, 2005. http://java.sun.com/products/javacard/.

[24] Ambrosio Toval, Víctor Requena, and José Luis Fernández. Emerging OCL tools. Software and Systems Modeling, 2, December 2003.

[25] Unified Modeling Language specification homepage, 2005. http://www.uml.org/.


Paper 1.

Natural Language Specifications

Kristofer Johannisson. Natural Language Specifications. Preliminary version of a chapter of The KeY Book, edited by Bernhard Beckert, Reiner Hähnle and Peter H. Schmitt, which will be published by Springer in the LNAI subseries.


Natural Language Specifications *

Kristofer Johannisson

This chapter describes how to use the KeY tool to bridge the gap between formal and informal specifications. Specifications need to be understood, maintained and authored by people with varying levels of familiarity with a formal specification language such as OCL. While a user of the KeY theorem prover should know a formal specification language, we cannot expect the same from a typical software developer, manager or customer. Hence there is a need for specifications of different levels of formality, and we need to keep these different versions synchronized.

The KeY tool addresses these problems by making it possible to automatically translate formal (OCL) specifications to natural language (NL), and by providing a multilingual editor in which specifications can be edited in OCL and natural language in parallel.

This chapter starts with an overview of the natural language features of KeY in Sect. 1. Sections 2 and 3 describe basic principles and components. The multilingual editor is described in Sect. 4. We outline how domain specific vocabulary is handled in Sect. 5, and conclude with pointers to further reading and a summary in Sects. 6 and 7.

1 Feature Overview

This section gives an overview of the natural language features of the KeY tool. While the later sections give a more thorough description, this should give you an idea of what it is possible to achieve, and what limitations there are.

1.1 Translating OCL to Natural Language

Using the KeY tool, it is possible to translate all OCL specifications in a TogetherCC¹ project to natural language. Figures 1 and 2 show a class diagram and OCL specifications from the KeY example project PayCard, and Fig. 3 contains the NL translation provided by KeY.

The translation in Fig. 3 is produced automatically; no user interaction is required (unless we want to customize the translation). The output is formatted, using either LaTeX (as shown here) or HTML.

Note that the structure of the natural language text is very similar to the structure of the OCL specification, and has the same level of abstraction. We

* This is a preliminary version of a chapter in the upcoming KeY book, edited by Bernhard Beckert, Reiner Hähnle and Peter H. Schmitt, to be published in the Springer LNAI subseries.

¹ TogetherCC is the commercial CASE tool which is used in the KeY system.


PayCard
    id : Integer
    limit : Integer
    balance : Integer
    unsuccessfulOperations : Integer
    available() : Integer
    charge(amount : Integer)

PayCardJunior
    juniorLimit : Integer
    createCard() : PayCardJunior
    checkSum(sum : Integer) : Integer
    complexCharge(amount : Integer)

Fig. 1. Example Class Diagram

context PayCard::charge(amount : Integer)
    pre: amount > 0
    post: balance >= balance@pre

context PayCard::available() : Integer
    post: result = balance or unsuccessfulOperations > 3

context PayCardJunior
    inv: (self.balance >= 0) and (self.balance < juniorLimit)
         and (juniorLimit < limit)

context PayCardJunior::createCard() : PayCardJunior
    post: result.limit = 10

context PayCardJunior::charge(amount : Integer)
    pre: amount > 0
    post: if (balance@pre + amount < juniorLimit)
          then (balance = balance@pre + amount)
          else ((balance = balance@pre) and
                (unsuccessfulOperations = unsuccessfulOperations@pre + 1))
          endif

context PayCardJunior::checkSum(sum : Integer) : Integer
    post: if (result = 1) then (sum < juniorLimit)
          else (sum >= juniorLimit) endif

context PayCardJunior::complexCharge(amount : Integer)
    pre: amount > 0
    post: if (balance@pre + amount < limit)
          then (amount = balance - balance@pre)
          else (balance = balance@pre) and
               (unsuccessfulOperations = unsuccessfulOperations@pre + 1)
          endif

Fig. 2. Example OCL Constraints


For the operation charge ( amount : Integer ) of the class PayCard ,
given the following pre-condition :

    – amount is greater than 0

then the following post-condition should hold :

    – the balance is at least the previous value of the balance

For the operation available () : Integer of the class PayCard ,
the following post-condition should hold :

    – the result is equal to the balance or the unsuccessful operations is greater than 3

For the class PayCardJunior the following invariant holds :

    – the following conditions are true
        • the balance is at least 0
        • the balance is less than the junior limit
        • the junior limit is less than the limit

For the operation createCard () : PayCardJunior of the class PayCardJunior ,
the following post-condition should hold :

    – the limit of the result is equal to 10

For the operation charge ( amount : Integer ) of the class PayCardJunior ,
given the following pre-condition :

    – amount is greater than 0

then the following post-condition should hold :

    – if the previous value of the balance plus amount is less than the junior limit then:
        • the balance is incremented by amount
      otherwise:
        • the balance does not change and the unsuccessful operations is incremented by 1

For the operation checkSum ( sum : Integer ) : Integer of the class PayCardJunior ,
the following post-condition should hold :

    – if the result is equal to 1 then:
        • sum is less than the junior limit
      otherwise:
        • sum is at least the junior limit

For the operation complexCharge ( amount : Integer ) of the class PayCardJunior ,
given the following pre-condition :

    – amount is greater than 0

then the following post-condition should hold :

    – if the previous value of the balance plus amount is less than the limit then:
        • amount is equal to the balance minus the previous value of the balance
      otherwise:
        • the balance does not change and the unsuccessful operations is incremented by 1

Fig. 3. Example Natural Language Translation of OCL Constraints


get a direct translation of the OCL specification, not an informal explanation of what it means.

For translating the domain specific concepts from a class diagram (classes, attributes, operations and associations) we use some heuristics which often work well, but not always. For instance, translating juniorLimit as “junior limit” is probably fine, while for unsuccessfulOperations we may prefer e.g. “number of unsuccessful operations” rather than the default translation “unsuccessful operations”. We therefore allow user customization of the translation of domain specific concepts, as described in Sect. 5.
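The idea of a default heuristic combined with user overrides can be sketched in a few lines of Python. This is only an assumed illustration: the function names `default_phrase` and `phrase_for`, the camel-case splitting rule and the dictionary of overrides are hypothetical, and the heuristics actually used by KeY are not described in this section.

```python
import re

def default_phrase(identifier):
    """Assumed heuristic: split a camelCase identifier into lower-case words."""
    # Insert a space wherever a lower-case letter or digit is followed by a capital.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier).lower()

def phrase_for(identifier, overrides=None):
    """Prefer a user-supplied phrase; fall back to the default heuristic."""
    overrides = overrides or {}
    return overrides.get(identifier, default_phrase(identifier))

# Hypothetical user customization for one attribute of the PayCard example.
custom = {"unsuccessfulOperations": "number of unsuccessful operations"}

print(phrase_for("juniorLimit", custom))
print(phrase_for("unsuccessfulOperations", custom))
```

Keeping the overrides in a separate table means that a regenerated default vocabulary never discards the phrases a user has chosen.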

1.2 Multilingual Specification Editor

The KeY tool provides a multilingual, syntax-directed editor for editing of OCL and natural language specifications in parallel. In this editor the user constructs an abstract syntax tree of a specification (for instance an invariant of a class) by selecting alternatives from menus. The syntax tree is at all times presented to the user in both OCL and English.

Figure 4 shows an example editing session, where we have just started editing an invariant for the class PayCard. There are three main parts of the editor window: the syntax tree display (top left), the linearization area (top right), and the refinements menu (bottom). The syntax tree display shows the abstract representation of the specification, while the linearization area presents the specification in OCL and English. Unfinished parts of the specification, called goals or metavariables, are shown as question marks. The refinements menu presents possible ways of filling in the goals. Basic editing proceeds by selecting a goal (by clicking in the text or in the tree) and a refinement (by clicking in the refinements menu). Since the tree is presented in both OCL and English, it is enough to know one of the languages.

Assume that we wish to complete the unfinished invariant in Fig. 4 into e.g. balance >= 0 (OCL) or “the balance is at least 0” (English). We would then proceed in a top-down fashion, first adding the comparison operator, and then its left and right arguments. In Fig. 4, the refinements menu is shown in English (the user can choose between OCL and English), so we should select the “at least” refinement. Figure 5 shows the editor after performing this one step.

We now have a specification ? >= ? (OCL) or “? is at least ?” (English). Since the comparison operator takes two arguments, we have two new goals to fill in. In the figure, the leftmost goal has been selected. Notice that the refinements menu presents only type correct alternatives, which in this case means that we are only allowed to fill in instances of the OCL library type Real (or any of its subtypes).

To complete the example, we have to fill in the left goal with balance, and the right with 0, but we omit these steps here. The syntax editor is further explained in Sect. 4.


Fig. 4. Example Editor Session 1

Fig. 5. Example Editor Session 2


1.3 Suggested Use Cases

Translation of OCL to Natural Language. Being able to automatically translate OCL to natural language means that OCL specifications can be presented to people who do not know OCL. The translation can for instance be shown to a customer, who can then validate whether it captures the desired behavior of a system, or to a programmer who does not know OCL but needs to implement a system according to the specifications.

However, the provided natural language translations are on the same abstraction level as the original OCL specifications (as noted above). The intended reader of the translations must therefore be comfortable with this abstraction level. For instance, we cannot expect a translation of OCL specifications involving low-level implementation issues to be understandable to a customer.

The Multilingual Editor. The editor allows editing of OCL and natural language in parallel, and does not allow the construction of syntactically incorrect OCL. It should therefore be useful for instance to a person who is not an OCL expert, but who needs to modify existing OCL specifications, as well as to people learning OCL.

For people who are already proficient in OCL, and who are not concerned with natural language translation, a traditional text editor is a more suitable tool for creating and modifying OCL specifications.

OCL as Single Source. An important part of our approach is to use OCL as “single source”: by creating and maintaining specifications in OCL (possibly using the multilingual editor), and then automatically translating them to natural language, we avoid the problem of having different versions of the same specification which need to be synchronized.

2 The Grammatical Framework

The natural language functionality in KeY is based on a multilingual grammar of specifications written in the Grammatical Framework (GF) formalism [6].

A GF grammar defines abstract and concrete syntax. The abstract syntax gives rules for how to form abstract syntax trees. In a typical GF application grammar these trees are used as a non-linguistic, semantic representation of a restricted domain. In our case, we use abstract syntax trees to represent requirements specifications.

The concrete syntax defines how to present abstract syntax trees as expressions of a particular language, which can be a formal or a natural one. By having several concrete syntaxes for the same abstract syntax we get a multilingual grammar. We have defined concrete syntaxes for OCL, English, and German,² which means that specifications represented in GF abstract syntax can be presented in these three languages.

The multilingual grammar for OCL, English and German specifications is written in the GF formalism. The GF system then provides functionality based on this grammar: it derives parsers and linearizers for the three languages as shown in Fig. 6. We can e.g. parse an OCL specification (resulting in an abstract syntax tree) and then linearize it into English or German. Although we can also parse English or German specifications, the fragment of these languages described by our grammar is very small: we cannot expect to successfully parse arbitrary informal English or German specifications.

[Diagram: OCL Text, English Text and German Text are each linked to the Abstract Syntax Tree by a parsing arrow and a linearization arrow.]

Fig. 6. GF Parsing and Linearization

As noted above in Sect. 1.1, the structure of the natural language translation of an OCL specification provided by our tool is very similar to the structure of the original OCL specification. We can now explain the reason for this: the translation and the original specification both share the same abstract syntax, and the linearization rules as defined by the concrete syntaxes for OCL, English and German cannot be arbitrarily complex. GF linearization rules must be compositional, meaning that the linearization of a tree is always expressed in terms of the linearization of its subtrees, not the subtrees themselves.
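To make the notion of one abstract syntax with several compositional linearizations more concrete, here is a small Python sketch. It is not GF code and not part of our grammar: the node classes `Attr`, `IntLit` and `AtLeast` and the functions `lin_ocl` and `lin_eng` are hypothetical names chosen for the illustration. The point is only that the rule for a compound node combines the strings produced for its subtrees and never inspects the subtrees themselves.

```python
from dataclasses import dataclass

@dataclass
class Attr:
    name: str      # OCL identifier, e.g. "balance"
    phrase: str    # English noun, e.g. "balance"

@dataclass
class IntLit:
    value: int

@dataclass
class AtLeast:     # the comparison operator >=
    left: object
    right: object

def lin_ocl(tree):
    """Linearize a tree into OCL-like concrete syntax."""
    if isinstance(tree, Attr):
        return tree.name
    if isinstance(tree, IntLit):
        return str(tree.value)
    if isinstance(tree, AtLeast):
        # Compositional: only the linearizations of the subtrees are used.
        return f"{lin_ocl(tree.left)} >= {lin_ocl(tree.right)}"
    raise TypeError(f"unknown node: {tree!r}")

def lin_eng(tree):
    """Linearize the same tree into English."""
    if isinstance(tree, Attr):
        return f"the {tree.phrase}"
    if isinstance(tree, IntLit):
        return str(tree.value)
    if isinstance(tree, AtLeast):
        return f"{lin_eng(tree.left)} is at least {lin_eng(tree.right)}"
    raise TypeError(f"unknown node: {tree!r}")

tree = AtLeast(Attr("balance", "balance"), IntLit(0))
print(lin_ocl(tree))
print(lin_eng(tree))
```

Because both functions traverse the same tree in the same way, the two outputs necessarily have the same structure, which is the effect observed in Fig. 3.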

An important aspect of our multilingual GF grammar is that it consists of a static as well as a dynamic part. The static part captures the OCL type system, basic OCL constructions such as invariants or if-then-else expressions, and the predefined types and operations of the OCL library. The dynamic part is a description of the domain specific concepts (classes, attributes, operations and associations) found in the class diagram of the current TogetherCC project. This part of the GF grammar is generated from the current class diagram. Sect. 5 describes the basics of this generation, and how it can be customized.

² We show no German examples in the current version of this chapter, since the translation of the domain-specific concepts from the class diagram is work in progress for this language.

3 System Overview

There are a number of components involved in linking OCL to natural language: a multilingual GF grammar, the GF system, a syntax-directed editor, a GF grammar generator taking class diagrams as input, and also a stand-alone OCL parser and typechecker. Fig. 7 shows how these components relate to each other in terms of input and output.

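[Diagram: OCL Text and the UML Class Diagram are input to Parsing & Typechecking, which produces a GF Abstract Syntax Tree. The UML Class Diagram is also input to the Grammar Generator, which produces the dynamic GF Grammar Modules. GF takes the abstract syntax tree together with the static and dynamic GF Grammar Modules, and outputs OCL Text, English Text and German Text.]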

Fig. 7. System Components

Grammar Generation. All functionality relies on the existence of the GF grammar for specifications, and as described in Sect. 2 above, parts of this grammar are dynamically generated from a class diagram. The class diagram is in turn extracted from TogetherCC.

OCL Parsing and Typechecking. When translating an OCL specification to natural language, or when starting the multilingual editor for a given OCL specification, the first step is to turn the OCL text into a GF abstract syntax tree. To do this, we are not using the parser automatically derived by GF, but a custom parser and typechecker. Note that typechecking OCL also requires the class diagram as input.

There are a number of reasons for using a custom parser and typechecker: we need to work around a limitation in the parser derived by GF for our particular grammar, it makes it simpler to deal with all the various implicit forms in OCL concrete syntax, and it also makes it possible to give better error messages, e.g. when encountering type errors. Finally, we expect the external parser, which is derived using a standard context-free parser generator, to be more efficient than the GF parser when parsing large specifications.

GF. The input to GF is the grammar (static and dynamic parts) and an abstract syntax tree. To translate OCL to natural language, the tree is then just linearized into English and German. In the case of the editor, the user will manipulate the syntax tree in the editor, while viewing the result in OCL, English and German in parallel.

4 The Multilingual Editor

The multilingual editor allows you to edit specifications in OCL, English and German in parallel. The editor is started from the KeY submenu of the context menu of any class or operation in TogetherCC. If the class or operation is already annotated with an OCL specification, it will be parsed and shown in the editor; otherwise the editor starts up with an empty invariant (for classes) or with empty pre- and postconditions (for operations). The editor is intended for editing the OCL specification of one class or operation at a time.

As of this writing, the editor is under active development.³ We will therefore only present the general principles of the editor here, since it is not yet clear what the details will be.

4.1 Syntax-Directed Editing

The editor is syntax-directed: editing consists of manipulating the abstract syntax tree of a specification, rather than a string of characters as in a typical text editor. The tree is at all times presented in OCL, English and German, as defined by the GF grammar for specifications (the user can choose which languages to show). Since we are editing a syntax tree, we can only construct syntactically correct specifications. The editor also includes a type system and ensures that the syntax tree is always type-correct.

There are two basic ways of manipulating a syntax tree in the editor: refinement (top-down editing) and wrapping (bottom-up editing).

4.2 Top-Down Editing: Refinement

Refinement consists of selecting a goal (an unfinished part of the tree, displayed as a question mark) and filling in this goal by selecting a refinement from a menu. The selected refinement may in turn contain new goals which need to be filled in.

³ The forthcoming Master’s thesis of Hans-Joachim Daniels concerns the development of a customized version of the generic GF syntax-directed editor, tailored for developing specifications in KeY.

Each goal has a type, and the refinements menu only lists refinements of this type. A type can for instance be “integer expressions”, “sentences”, or “attributes”. The types and refinements available are given by the underlying GF grammar.
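How a menu can be restricted to type-correct alternatives may be sketched as follows in Python. The sketch is a simplification under stated assumptions: the table `REFINEMENTS`, the relation `SUBTYPE_OF` and the function `menu_for` are hypothetical, and subtyping is checked directly here, whereas the GF grammar itself has no built-in subtyping.

```python
# Hypothetical data for the PayCard example: each candidate refinement is
# listed with the type of the expression it produces.
REFINEMENTS = {
    "balance": "Integer",
    "limit": "Integer",
    "0": "Integer",
    "3.14": "Real",
    "true": "Boolean",
}

# Direct supertype of each type that has one (Integer is a subtype of Real).
SUBTYPE_OF = {"Integer": "Real"}

def is_subtype(actual, expected):
    """True if `actual` equals `expected` or is a (transitive) subtype of it."""
    while actual is not None:
        if actual == expected:
            return True
        actual = SUBTYPE_OF.get(actual)
    return False

def menu_for(goal_type):
    """Names of all refinements whose type fits the selected goal."""
    return sorted(name for name, ty in REFINEMENTS.items()
                  if is_subtype(ty, goal_type))

print(menu_for("Real"))
print(menu_for("Boolean"))
```

For a goal of type Real the menu contains the integer-valued entries as well, while a goal of type Integer would not be offered the Real literal.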

We will consider the example from the beginning of this chapter again, as shown in Fig. 8. In the upper left part of the editor window we see the abstract syntax tree of a specification, which is presented in OCL and natural language in the upper right part of the window. There are two unfinished parts (goals), one for each argument to the comparison operator (>=). The editor shows the type of the current goal (the left one), which in this case is the OCL library class Real. The refinements menu in the lower part of the window only lists constructions which have type Real.

Fig. 8. Editing by Refinement

The refinements menu can be set to show OCL, English or German, or just use the naming from the abstract syntax. It is also possible to show the type of each refinement in the menu.

4.3 Bottom-Up Editing: Wrapping

Wrapping consists of selecting any part of the syntax tree, with or without unfinished parts, and replacing it with a new construction, which will contain the previously selected subtree as a part. For instance, if we have constructed the invariant self.balance >= 0, and would like to add that balance should also be smaller than limit, we do this by wrapping it using and. The first step is to select the subtree corresponding to self.balance >= 0, as shown in Fig. 9.


Fig. 9. Editing By Wrapping, Step 1

The current selection is now a sentence. Since and is a construction which takes two sentences into a new sentence, we can wrap the current selection using and. This is done by clicking on “w Sent 0 and Sent 1 0” in the refinements menu. The “w” stands for wrapping; the 0 is an index which refers to the first argument of and. The result is shown in Fig. 10: the previously selected subtree balance >= 0 has now been wrapped as the first argument to and, resulting in balance >= 0 and ?.
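The wrapping step can be pictured as a small tree operation. The Python sketch below is an assumption-laden illustration, not the editor's implementation: trees are encoded as tuples of the form (operator, left, right), goals are the string "?", and `wrap` and `show` are hypothetical helper names.

```python
GOAL = "?"   # placeholder for an unfinished part of the tree

def wrap(tree, path, constructor, arity, index):
    """Wrap the subtree found at `path` in a new `constructor` node.

    The old subtree becomes argument number `index` of the new node;
    all other arguments are fresh goals.
    """
    if not path:
        args = [GOAL] * arity
        args[index] = tree
        return (constructor, *args)
    label, children = tree[0], list(tree[1:])
    children[path[0]] = wrap(children[path[0]], path[1:],
                             constructor, arity, index)
    return (label, *children)

def show(tree):
    """Render a binary tree in infix notation."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"{show(left)} {op} {show(right)}"

invariant = (">=", "balance", "0")
wrapped = wrap(invariant, [], "and", 2, 0)   # select the whole tree, wrap with 'and'
print(show(wrapped))
```

The `index` argument plays the role of the trailing 0 in the menu entry: it states which argument position of the new construction receives the previously selected subtree.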

4.4 Other Editor Features

The editor also includes other features: as you would expect, there is a clipboard for copying and pasting syntax trees, as well as an undo command. Another feature is refinement by parsing: instead of filling in a goal by selecting a refinement, one can enter a text string. The string is then parsed and (if parsing was successful) the goal is filled in with the resulting syntax tree. In this case, it is the parser derived by GF which is being used, not the custom OCL parser and typechecker.

4.5 Expressions and Sentences

The editor makes a distinction between expressions and sentences. Expressions are instances of any of the classes from the class diagram, or of the OCL library types such as Integer or Boolean. Sentences are used to express invariants, pre- and postconditions. An example expression is self.balance (“the balance”); an example sentence is self.balance >= 0 (“the balance is at least 0”). We mention this distinction since it is not present in OCL itself: there is no concept of sentences in the OCL language specification. Instead, expressions of type Boolean are used for invariants, pre- and postconditions. However, in the editor expressions and sentences are two different types: goals of expression type cannot be filled in with a sentence, and vice versa.

Fig. 10. Editing by Wrapping, Step 2

All OCL library operations as well as all domain specific attributes and operations which return Boolean from the point of view of OCL are considered as sentences in the editor. It is always possible to convert a sentence into a boolean expression, but this has to be done explicitly.

4.6 Subtyping

The OCL type system includes subtyping: wherever an expression of a type T is expected, we can also use an expression of type T′ as long as T′ is a subtype of T. For instance, the OCL comparison operators <, >, <=, and >= are all defined for the class Real. However, since Integer is a subtype of Real, we can also use them to compare integers.

GF has no built-in notion of subtyping. In the GF grammars for specifications, this problem is solved by including explicit coercions (typecasts). These coercions are part of the abstract syntax tree, but are not visible in the OCL or natural language rendering of the tree. The editor will usually create these coercions automatically without requiring user interaction, but sometimes (in particular when an existing specification is modified) the user has to be aware of the coercions.
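The idea of coercions that live in the tree but leave the rendering unchanged can be sketched as follows. The names SUBTYPES, Expr, Coerce, fit and render are hypothetical and chosen only for this illustration; the sketch assumes a single direct subtype pair and does not model transitive subtyping or the editor's actual algorithm.

```python
from dataclasses import dataclass
from typing import Union

# Assumed subtype pairs for the example: Integer conforms to Real.
SUBTYPES = {("Integer", "Real")}


@dataclass
class Expr:
    """A typed leaf expression with its textual rendering."""
    text: str
    type: str


@dataclass
class Coerce:
    """An explicit coercion node from type `sub` to type `sup`."""
    inner: "Tree"
    sub: str
    sup: str

    @property
    def type(self) -> str:
        return self.sup


Tree = Union[Expr, Coerce]


def fit(tree: Tree, expected: str) -> Tree:
    """Return a tree of the expected type, inserting a coercion if needed."""
    if tree.type == expected:
        return tree
    if (tree.type, expected) in SUBTYPES:
        return Coerce(tree, tree.type, expected)
    raise TypeError(f"{tree.type} does not conform to {expected}")


def render(tree: Tree) -> str:
    """Coercions are part of the tree but invisible in the rendering."""
    if isinstance(tree, Coerce):
        return render(tree.inner)
    return tree.text


age = Expr("self.age", "Integer")
as_real = fit(age, "Real")       # a Coerce node is inserted
print(type(as_real).__name__)    # Coerce
print(render(as_real))           # self.age
```

The user only notices the coercion when the tree is modified: replacing the inner expression by one of a different type may require the coercion node to be removed or changed as well.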

5 Translation of Domain Specific Concepts

As previously mentioned, the translation of domain specific concepts is defined by GF grammar modules which are generated from the class diagram of the current project in TogetherCC. This generation is based on some simple rules described below. If the automatically derived translation is not appropriate, it can be customized by hand.

5.1 Grammar Generation

The grammar generation provides default translations for the concepts (classes, attributes, operations, and associations) in a class diagram. Currently, this generation is based on a few simple rules:

– Classes are treated as common nouns, or as common noun phrases. If the name of the class contains internal capital letters (as in PublicKey), it is split into separate words, where the last word is considered as a noun which is modified by the other words. For instance, a class Person is treated as a common noun “person”, while a class PublicKey is treated as a common noun phrase “public key”.

– Properties (attributes, operations and associations) are treated as noun phrases, except for boolean properties, which are treated as sentences. Capitalization is also used to split property names, e.g. an attribute juniorLimit is translated as the noun phrase “junior limit”. Boolean properties which start with “is-”, e.g. isEmpty or isValidated, are treated as adjectives (e.g. “... is empty”, “... is validated”).
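The two rules above can be sketched as a small name-splitting routine. This is only an illustration of the rules as stated, not the generator used by the tool: the function names and the dictionary layout are assumptions made for this example.

```python
import re


def split_words(name: str) -> list:
    """Split an identifier such as 'juniorLimit' at capital letters."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return [p.lower() for p in parts]


def translate_class(name: str) -> dict:
    """Classes: the last word is the head noun, the others modify it."""
    words = split_words(name)
    return {"category": "common noun",
            "modifiers": words[:-1],
            "head": words[-1]}


def translate_property(name: str, is_boolean: bool = False) -> dict:
    """Properties: noun phrases, unless boolean (sentences or adjectives)."""
    words = split_words(name)
    if is_boolean and len(words) > 1 and words[0] == "is":
        # isEmpty -> adjective "empty", rendered as "... is empty"
        return {"category": "adjective", "text": " ".join(words[1:])}
    if is_boolean:
        return {"category": "sentence", "text": " ".join(words)}
    return {"category": "noun phrase", "text": " ".join(words)}


print(translate_class("PublicKey"))
# {'category': 'common noun', 'modifiers': ['public'], 'head': 'key'}
print(translate_property("juniorLimit"))
# {'category': 'noun phrase', 'text': 'junior limit'}
print(translate_property("isEmpty", is_boolean=True))
# {'category': 'adjective', 'text': 'empty'}
```

Defaults of this kind are purely syntactic, which is why a name such as unsuccessfulOperations may need manual customization.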

5.2 Customizing the Translation

If the translation provided by the generated grammar modules is not appropriate, it can be customized by hand. We plan to make it possible to perform such customization by having the user add annotations to the TogetherCC class diagram, but at present there is no such functionality. To customize the translation, one must instead modify the generated GF grammar files directly. However, as described below, this can be done without requiring GF expertise.

Customization is done on the level of concrete syntax. The generated concrete syntax makes use of a grammar-level API, which contains functions for common constructions. This API abstracts from the complexity of the rest of the grammar. To modify the generated concrete syntax it is therefore enough to have an understanding of the API; it is not necessary to be a GF expert.

This API is described in detail elsewhere (see footnote 4); here we will just consider a small example. As mentioned in the previous example in Sect. 1.1, the default translation of the unsuccessfulOperations attribute of the PayCard class is “unsuccessful operations”, although “number of unsuccessful operations” might be a more natural translation. The generated GF concrete syntax for unsuccessfulOperations is the following:

4 There will be a website accompanying the KeY book. In the meantime, we refer to the website of the OCL-Natural Language tool [3].

lin unsuccessfulOperations = mkSimpleProperty (adjCN["unsuccessful"] ((strCN ["operations"])));

The left hand side of this linearization judgement is simply the name of the construction in the abstract syntax which represents unsuccessfulOperations. The right hand side gives the linearization of this construction, expressed using the functions mkSimpleProperty, adjCN and strCN of the grammar API.

This generated linearization can be changed to produce “the number of unsuccessful operations” instead by using the ofCN and strCN functions:

lin unsuccessfulOperations = mkSimpleProperty (ofCN(strCN "number") (adjCN ["unsuccessful"] ((strCN["operations"]))));

6 Further Reading

The basic motivations and design principles of a GF based tool to link OCL and natural language are described in the paper [2]. A later paper shows that the tool scales well enough to handle a case study: translating OCL specifications of the Java Card API to natural language [1]. There is also a website for the tool [3].

An extensive description of GF is found in the paper [6]. There is also a paper about the generic GF syntax editor [5] as well as a manual [4]. Finally there is a GF website [7].

7 Summary

The KeY tool makes it possible for people who are not OCL experts to create and maintain OCL specifications, by providing a multilingual, syntax-directed editor in which specifications can be edited in OCL and natural language in parallel. OCL specifications can also be translated to natural language independently of the editor, which enables people who have no knowledge of OCL to make use of formal specifications.

A limitation is that the provided natural language translation has roughly the same structure and level of abstraction as the original OCL specification. In this sense, we do not provide informal explanations of formal specifications. Also, automatic formalization of arbitrary informal specifications falls outside the scope of the KeY tool.

The natural language tools are built around a multilingual Grammatical Framework grammar for specifications in OCL, English and German. The translation of domain-specific concepts can be customized on the grammar level.

Acknowledgements

We thank Aarne Ranta and Philipp Rümmer for careful reading of drafts of this paper.

References

1. David A. Burke and Kristofer Johannisson. Translating formal software specifications to natural language: a grammar-based approach. In Philippe Blache and Edward Stabler, editors, LACL 2005, number 3492 in LNAI. Springer, 2005.

2. Reiner Hähnle, Kristofer Johannisson, and Aarne Ranta. An authoring tool for informal and formal requirements specifications. In Ralf-Detlef Kutsche and Herbert Weber, editors, Fundamental Approaches to Software Engineering (FASE), Part of Joint European Conferences on Theory and Practice of Software, ETAPS, Grenoble, volume 2306 of LNCS, pages 233–248. Springer, 2002.

3. Kristofer Johannisson. OCL-Natural Language tool homepage, 2005. http://www.cs.chalmers.se/~krijo/gfspec/.

4. Janna Khegai. Tutorial for the GF Java GUI, WWW homepage, 2004. http://www.cs.chalmers.se/~aarne/GF/doc/javaGUImanual/javaGUImanual.htm.

5. Janna Khegai, Bengt Nordström, and Aarne Ranta. Multilingual syntax editing in GF. In A. Gelbukh, editor, Intelligent Text Processing and Computational Linguistics (CICLing-2003), number 2588 in LNCS. Springer, 2003.

6. Aarne Ranta. Grammatical Framework: A type-theoretical grammar formalism. The Journal of Functional Programming, 14(2):145–189, 2004.

7. Aarne Ranta. The Grammatical Framework WWW homepage, 2005. http://www.cs.chalmers.se/~aarne/GF.

Paper 2. An Authoring Tool for Informal and Formal Requirements Specifications

Reiner Hähnle, Kristofer Johannisson, and Aarne Ranta. An authoring tool for informal and formal requirements specifications. In ETAPS/FASE-2002: Fundamental Approaches to Software Engineering, edited by R. D. Kutsche and H. Weber, Springer LNCS, vol. 2306, pp. 233–248, 2002.

An Authoring Tool for Informal and Formal Requirements Specifications

Reiner Hähnle, Kristofer Johannisson, and Aarne Ranta

Chalmers University of Technology, Department of Computing Science, S-41296 Gothenburg, Sweden, {reiner,krijo,aarne}@cs.chalmers.se

Abstract. We describe foundations and design principles of a tool that supports authoring of informal and formal software requirements specifications simultaneously and from a single source. The tool is an attempt to bridge the gap between completely informal requirements specifications (as found in practice) and formal ones (as needed in formal methods). The user is supported by an interactive syntax-directed editor, parsers and linearizers. As a formal specification language we realize the Object Constraint Language, a substandard of the UML; on the informal side, a fragment of English. The implementation is based on the Grammatical Framework, a generic tool that combines linguistic and logical methods.

1 Introduction

The usage of formal and semi-formal languages for requirements specifications is becoming more widespread. Witness, for example, the Java Modeling Language (JML) [11], closely related to which is the ESC/Java specification language used in Extended Static Checking [12], the constraint language Alloy [9], and the Object Constraint Language (OCL) [15,21]. The OCL is not only used in meta-modeling to supply a precise semantics for UML diagrams, but also in requirements specification. A subset of the OCL is also used in iContract [10], the Java variant of design-by-contract, as an assertion language.

Although these languages make an effort to be more “user-friendly” than earlier formal notations that were based on set theory and predicate logic, it still takes a considerable effort to master them and use them effectively. Moreover, it should not be forgotten that by far the most popular language in which software specifications are written today is still natural language (NL).

None of the approaches mentioned above offers support for authoring, understanding, and maintaining formal specifications. We consider this deficiency to be a serious obstacle to routine usage and further development of formal and semi-formal methods. Specifically, the following problems have to be addressed if formal and semi-formal notations are to become a standard item in the software engineer’s toolbox:

Authoring. Support is needed for authoring well-written, well-formed formal specifications. A syntax-directed editor is of help along with specification templates.

Maintenance. Large and complex expressions in any formal language are not easy to read, even if, like OCL, this language was designed to enhance readability. In realistic scenarios, numerous and complex expressions have to be maintained and, therefore, understood by people who did not necessarily author them or who may not even be familiar with formal languages.

Mapping Different Levels of Formality. No specification language fits all needs. For different audiences and purposes it is important to have renderings in, say, NL, OCL, and first-order logic. For effective communication, parts of these must be mappable into each other efficiently and with a clear semantics.

Synchronisation. If a system is specified in languages of differing levels of precision, it is important to propagate changes consistently. For example, any change in an OCL constraint should be instantly reflected in the corresponding NL description. It will not do to perform these changes manually.

In this paper we suggest a solution to the problems just outlined. We show that a systematic connection between specification languages on differing levels of precision is possible. We concentrate on OCL and NL as specification languages, but the method is not limited to this configuration.

Our approach is based on the Grammatical Framework (GF) [18], a flexible mechanism that allows linguistic and logical methods to be combined. The key idea is to specify (i) an abstract syntax for a specification language (in our case roughly corresponding to OCL) together with semantic conditions of well-formedness and type-correctness, and (ii) concrete syntaxes for all supported notations (in our case, concrete OCL expressions as well as a fragment of English). For each set of abstract/concrete syntaxes the GF system then implements algorithms for parsing, linearization, translation, type checking, and a graphical syntax editor. The abstract grammar is much richer than the usual context-free OCL grammar [15] and, together with the syntax editor, enables interactive editing of templates for frequently needed specifications. The result is an authoring system for requirements specifications that supports creation and maintenance of informal and formal specifications from a single source.

In Section 2 we walk through an example that serves as motivation and at the same time demonstrates what can be done with our system. In Section 3 we give some background on the GF formalism that is necessary to understand Section 4, where the implementation is discussed in detail. In Section 5 we evaluate our approach and we show how the problems outlined above are addressed in our system. The paper is rounded off with brief sections on related work, on future work, and by concluding remarks.

The latest prototype of our system can be downloaded from http://www.cs.chalmers.se/~krijo/GF/specifications.html.

2 Motivating Example

As a motivating example we will consider a standard queue data structure, a class Queue, and show how to use our system for developing specifications of this class in OCL and natural language.

2.1 A Class for Queues

For the purpose of this example, we need to specify the interface of a class Queue for integer queues (we need not consider implementation details). We use standard OCL types in doing this, as shown in the following class diagram:

Queue
  enqueue(i: Integer): Integer
  dequeue(): Integer
  getFirst(): Integer {query}
  size(): Integer {query}
  asSequence(): Sequence(Integer) {query}

This class should be straightforward. We have an operation enqueue for enqueueing an integer on the queue and an operation dequeue for removing the first integer of the queue. The return value of enqueue is simply the value of its argument. We also have an operation getFirst for inspecting the first element of the queue. The operation asSequence gives us a Sequence (standard OCL type) with all the elements from the queue, in their correct order. This operation is included for specification purposes; in an actual implementation of the class, asSequence is not required.

Note also that all operations which do not affect the state of the queue (“observer methods” or “queries”) have been tagged with {query}, using standard UML notation.

2.2 Using the GF-based System

Our system is based on the GF system (described in Section 3) with grammars for OCL and English (Section 4). It features an interactive editor for formulating constraints in OCL and English:

Fig. 1. The Interactive Editor (regions of the window are labelled 1–5)

Suppose that we want to author a postcondition for the method enqueue of the class Queue in the interactive editor. Figure 1 shows what the editor looks like after a few initial steps.

In this screen-shot we see the beginning of a postcondition for an operation. The main part of the window shows the postcondition in OCL (1) and in English (2), and also an abstract (internal to GF) representation (3). To complete this postcondition, we select a subgoal (that is, metavariable or placeholder) of the form [?...?] and then select one of the possible refinements in the lower left subwindow (4), until there are no more subgoals to fill in. In the example, the next logical step is to specify the operation for which the current postcondition is intended, that is, enqueue. So we select the subgoal [?Operation?] and the refinement enqueue. Figure 2 shows the result.

Now a new subgoal [?BindPPCond?] is active, and new refinements appear in the lower left menu. The subgoal [?Class?] was automatically filled in with Queue, since this was the only correct refinement left after we chose the operation enqueue. The system is able to infer this automatically.

Note that we edit the postcondition in OCL and in English in parallel. Every change is instantly reflected in both the OCL and the English version. What is actually going on is that we are editing the abstract representation, which is linearized to English and OCL. This means that the user of the editor can produce OCL constraints, even though he or she only understands the English form of the constraint.

There are also other ways to interact with the editor. Aside from choosing refinements from a menu to fill in a subgoal, we can simply enter a string (at (5) in Figure 1) in English or OCL which will be parsed by the GF editor. We can also wrap a term in a function, that is, perform bottom-up editing instead of top-down.

As will be seen in Section 3, the interactive editor is merely one part of GF: having grammars for OCL and English means that we also have a parser for OCL and for a fragment of English as well as a translator between OCL and this fragment of English.

Fig. 2. The Interactive Editor—one editing step later

2.3 More Examples

We present some more constraints for methods in the Queue class authored with our system and highlight some of the problems that had to be solved in order to obtain a smooth rendering in English. In the OCL versions of the constraints only line breaks and spaces were inserted by hand (this is a current limitation). The formatting of the English version of the constraints was achieved by including LaTeX commands in the English grammar.

Operation getFirst

OCL: context Queue::getFirst() : Integer
     pre: self.size() > 0
     post: result = self.asSequence()->first

English: for the operation getFirst() : Integer of the class Queue, the following precondition should hold:
    the size of the queue is greater than zero
and the following postcondition should hold:
    the result is equal to the first element of the queue

The meaning of the OCL expression self depends on the context. In these examples, self refers to an instance of Queue, since we are formulating constraints for an operation of the class Queue. In English, self corresponds to an anaphoric expression, which in this particular case is “the queue”.

The operation asSequence can be seen as a way of converting a Queue to an OCL Sequence. While this type cast is necessary in OCL, it is not that interesting in English. It is therefore omitted, so the OCL expression self.asSequence() simply corresponds to “the queue” in English.

Operation dequeue

OCL: context Queue::dequeue() : Integer
     pre: self.size() > 0
     post: (self.size() > 0 implies self.asSequence() =
            self.asSequence@pre() -> subSequence(2, self.size() + 1))
           and result = self.getFirst@pre()

English: for the operation dequeue() : Integer of the class Queue, the following precondition should hold:
    the size of the queue is greater than zero
and the following postconditions should hold:
    – if the size of the queue is greater than zero, then the queue is equal to the subsequence of the queue at the start of the operation which starts at index 2 and ends at the index equal to the size of the queue plus one
    – the result is equal to the first element of the queue at the start of the operation

Here we see that a sequence of conjuncts in OCL (such as x and y and ...) can be displayed as an itemized list in English. This implies that we need to have the word “postconditions” in plural form (in contrast to the getFirst example, where we have “postcondition”). We can also note that @pre in OCL simply corresponds to “at the start of the operation” in English.

The OCL operation subSequence requires its second argument to be greater than or equal to its first argument: it never returns an empty sequence. This explains why we use the condition that the size of the queue is greater than zero in the first postcondition.
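The role of the guard can be checked on a small model. The following sketch represents the queue as a Python list and is only an illustration: sub_sequence, dequeue and check_dequeue_post are hypothetical helper names, and sub_sequence models the behaviour described above (1-based, inclusive bounds, second argument at least as large as the first) rather than quoting any OCL implementation.

```python
def sub_sequence(seq, lower, upper):
    """Model of OCL subSequence: 1-based, inclusive, never empty."""
    if not (1 <= lower <= upper <= len(seq)):
        raise ValueError("subSequence bounds out of range")
    return seq[lower - 1:upper]


def dequeue(queue):
    """Reference model: remove and return the first element."""
    return queue.pop(0)


def check_dequeue_post(pre, post, result):
    """Evaluate the dequeue postcondition on pre- and post-state lists."""
    size = len(post)  # self.size() in the post-state
    # 'a implies b' is evaluated as 'not a or b', so sub_sequence is only
    # called when the queue is non-empty after the operation.
    tail_ok = (not size > 0) or post == sub_sequence(pre, 2, size + 1)
    return tail_ok and result == pre[0]


pre_state = [3, 1, 4]
queue = list(pre_state)
result = dequeue(queue)
print(check_dequeue_post(pre_state, queue, result))  # True

# A one-element queue: the guard makes the first conjunct hold trivially.
single = [7]
print(check_dequeue_post([7], [], dequeue(single)))  # True
```

Without the guard, the one-element case would call sub_sequence([7], 2, 1), which violates the bounds requirement.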

3 Grammatical Framework

The Grammatical Framework (GF) is a framework for defining grammars and working with them [18]. It is used for defining special-purpose grammars on top of a semantic model, which is expressed in type theory [13]. Type theory is a part of GF, the abstract syntax part. The concrete syntax part tells how type-theoretical formulas are translated into a natural language or a formal notation.

The first application of GF was in a project on Multilingual Document Authoring at Xerox Research Centre Europe [4]. The idea in multilingual authoring is to build an editor whose user can write a document in a language she does not know (for example, French), while at the same time seeing how it develops in a language she knows (for example, English). From the system’s point of view, the object constructed by the user is a type-theoretical formula, of which the French and English texts are just alternative views.

The GF programming language, as well as the tools supporting multilingual authoring, are designed to be generic over both the subject matter and the target language. While prototypes have been built for documents such as tourist information and business letters, the most substantial application so far has been natural-language rendering of formalized proofs [5]. Software specifications are another natural GF application, since it is usually clear how specifications are expressed in type theory. Most uses of type theory as a specification language have been based on the Curry-Howard isomorphism, but we will here use it for OCL specifications.

3.1 Abstract Syntax

GF, like other logical frameworks in the LF [6] tradition, uses a higher-order type theory with dependent types. In this type theory, it is possible to define logical calculi, as well as mathematical theories, simply by type signatures. The type-theoretical part of a GF grammar is called the abstract syntax of a language.

To take an example, we first define the types of propositions and proofs, where the type of proofs depends on a proposition.

cat Prop ; Proof Prop ;

We then define implication as a two-place function on propositions, and the implication introduction rule is a function whose argument is a function from proofs of the antecedent to proofs of the succedent:

fun Imp : Prop -> Prop -> Prop ;
fun ImpI : (A,B:Prop) -> (Proof A -> Proof B) -> Proof (Imp A B) ;

As usual in functional languages, GF expresses function application by juxtaposition, as in Proof A, and uses parentheses only for grouping purposes.

3.2 Concrete Syntax

On top of an abstract syntax, a concrete syntax can be built, as a set of linearization rules that translate type-theoretical terms into strings of some language. For instance, English linearization rules for the two functions above could be

lin Imp A B = {s = "if" ++ A.s ++ "then" ++ B.s} ;
lin ImpI A B c = {s = "assume" ++ A.s ++ "." ++ c.s ++ "." ++
                      "Hence" ++ "if" ++ A.s ++ "then" ++ B.s} ;

As shown by these examples, linearization is not just a string, but a record of concrete-syntax objects, such as strings and parameters (genders, modes, etc.), and parameter-dependent strings. Notice that linearization rules can generate not only sentences and their parts, but also texts. For instance, a proof of the implication A & B → A as generated by the rules above, together with a rule for conjunction elimination, is a term that linearizes to the text:

Assume A and B. By the assumption, A and B. A fortiori, A. Hence if A and B then A.
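As a rough illustration of linearization rules producing records, the two English rules can be mimicked with dictionaries that carry a field s. This is a sketch in Python, not GF: the helper names atom, lin_imp and lin_imp_i are assumptions made for this example, token concatenation with ++ is modelled by joining with spaces, and the final capitalization and punctuation spacing of the text shown above is not reproduced.

```python
def atom(text):
    """A linearization record with a single string field s."""
    return {"s": text}


def lin_imp(a, b):
    """Model of: lin Imp A B = {s = "if" ++ A.s ++ "then" ++ B.s}."""
    return {"s": " ".join(["if", a["s"], "then", b["s"]])}


def lin_imp_i(a, b, c):
    """Model of the ImpI rule: the proof text c is embedded in the result."""
    tokens = ["assume", a["s"], ".", c["s"], ".",
              "Hence", "if", a["s"], "then", b["s"]]
    return {"s": " ".join(tokens)}


premise = atom("A and B")
conclusion = atom("A")
print(lin_imp(premise, conclusion)["s"])
# if A and B then A
print(lin_imp_i(premise, conclusion, atom("P"))["s"])
# assume A and B . P . Hence if A and B then A
```

Here "P" merely stands in for the linearization of the subproof; in the example text above that position is filled by the steps using the assumption and conjunction elimination.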

Different languages generally have different types of concrete-syntax objects. For instance, in French a proposition depends on the parameter of mode, which we express by introducing a parameter type of modes and defining the linearization type of Prop accordingly:

param Mode = Ind | Subj ;
lincat Prop = {s : Mode => Str} ;

The French linearization rule for the implication is

lin Imp A B =
  {s = table {m => si (A.s ! Ind) ++ "alors" ++ B.s ! m}} ;

which tells that the antecedent is always in the indicative mode and that the main mode of the sentence is received by the succedent. One may also notice that si is not a constant string, but depends (in a way defined elsewhere in the grammar) on the word following it (as in s’il vous plaît).
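A parameter-dependent linearization can be sketched by letting the field s map each mode to a string. The code below is a Python model under stated assumptions, not the French grammar itself: prop, si and lin_imp_fr are hypothetical names, and si only implements the elision before "il" and "ils", which is the case the example s’il vous plaît illustrates.

```python
MODES = ("Ind", "Subj")


def prop(indicative, subjunctive):
    """A proposition whose string depends on the mode parameter."""
    return {"s": {"Ind": indicative, "Subj": subjunctive}}


def si(following):
    """Prefix 'si', elided to s' when the next word is 'il' or 'ils'."""
    words = following.split()
    if words and words[0] in ("il", "ils"):
        return "s'" + following
    return "si " + following


def lin_imp_fr(a, b):
    """Antecedent always indicative; the succedent receives the mode m."""
    return {"s": {m: si(a["s"]["Ind"]) + " alors " + b["s"][m]
                  for m in MODES}}


rain = prop("il pleut", "il pleuve")
wet = prop("la route est mouillée", "la route soit mouillée")
sentence = lin_imp_fr(rain, wet)
print(sentence["s"]["Ind"])
# s'il pleut alors la route est mouillée
print(sentence["s"]["Subj"])
# s'il pleut alors la route soit mouillée
```

The result is again a table over modes, so an implication can itself be embedded in a context that selects the subjunctive.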

Finally, in formal logical notation, linearization depends on a precedence parameter:

lincat Prop = {s : Prec => Str} ;
lin Imp A B = {s = mkPrec p0 (A.s ! p1 ++ "->" ++ B.s ! p0)} ;

where the function mkPrec (defined elsewhere in the concrete syntax) controls the usage of parentheses around formulas.
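Since mkPrec is defined elsewhere, the following is only one plausible reading of how such a function could behave, written as a Python sketch with two precedence levels (0 and 1). The names mk_prec, atom and lin_imp are assumptions for this example: a formula of level n is parenthesized whenever the context asks for a higher precedence than n.

```python
LEVELS = (0, 1)


def mk_prec(level, text):
    """Table from context precedence to string, adding parentheses
    when the context demands a higher precedence than `level`."""
    return {"s": {p: text if p <= level else "(" + text + ")"
                  for p in LEVELS}}


def atom(name):
    """Atomic formulas have the highest level and never need parentheses."""
    return mk_prec(1, name)


def lin_imp(a, b):
    """Implication has level 0; its left argument is requested at level 1,
    its right argument at level 0, so '->' associates to the right."""
    return mk_prec(0, a["s"][1] + " -> " + b["s"][0])


A, B, C = atom("A"), atom("B"), atom("C")
print(lin_imp(lin_imp(A, B), C)["s"][0])  # (A -> B) -> C
print(lin_imp(A, lin_imp(B, C))["s"][0])  # A -> B -> C
```

Parentheses therefore appear only where they are needed to preserve the structure of the abstract syntax tree.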

The examples above illustrate what is needed to achieve genericity in GF. In the abstract syntax, we need a powerful type theory in order to express dependencies among parts of texts, such as in inference rules. In the concrete syntax, we need to define language-dependent parameter systems and complex structures of grammatical objects using them.

3.3 Functionalities

GF helps the programmer of grammar applications by providing framework-level functionalities that apply to any GF grammar. The main functionalities are linearization (translation from abstract to concrete syntax), parsing (translation from concrete to abstract syntax), type-checking of abstract-syntax objects, syntax editing, and user interfaces (both line-based and graphical). Although these functionalities apply generically to all grammars, it is often useful to customize them for the task at hand. To this end, the GF source code (written in Haskell) provides an API for easy access to the main functionalities.

4 Implementation

4.1 Classes and Objects

In this section we give a general idea of how we have implemented GF grammars for OCL and natural language (at present English). We begin by defining categories and functions for handling standard object-oriented concepts such as classes, objects, attributes and operations:

cat Class;

cat Instance (c : Class);

There is a category (type) Class of classes, and a dependent type Instance; hence, for every class c there is a type Instance c of the instances of this class. OCL expressions are represented as instances of classes (and we can of course see an instance of a class as an object).

Classes are introduced by judgements like the following:

fun Bool : Class; Integer : Class; Real : Class;

This means that we have type checking in the abstract grammar: for example, where a term of type Instance Bool is expected, we cannot use a term of type Instance Integer.

The linearization of these functions to OCL is easy: for Integer we simply take lin Integer = {s = "Integer"}, and so on. In English, a class can be linearized either as a noun (which can be in singular or plural form, say, “integer” or “integers”) or as an identifier of a class (“Integer”). In GF we handle this by using parameters, as explained in Section 3.

Subtyping (inheritance) between classes is handled by defining a subtype relation and a coercion function:

cat Subtype (sub, super : Class);

fun coerce : (sub,super:Class) -> Subtype sub super ->
             Instance sub -> Instance super;

The function coerce is used for converting an instance of a class c into an instance of any superclass of c. The arguments to this function are the classes in question, a proof that the subtyping relation holds between the classes, and finally an instance of the subclass. The result is an instance of the superclass. For every pair of classes in OCL’s subtyping relation we introduce a term (a proof) of the type Subtype, e.g.:

fun intConformsToReal : Subtype Integer Real;

The linearization of coerce is interesting: since the whole point is to change the type (but not the meaning) of a term, the linearization rule will leave everything as it is. For both OCL and English we have:

lin coerce _ _ _ obj = obj;

GF converts grammars to context-free grammars in order to implement parsing, which makes this rule circular (it has the form Instance -> Instance). This means that we cannot use our grammars to parse OCL or English with the GF system as it is now. We will have to implement custom modifications for coercion rules.

4.2 Attributes, Operations and Queries

For operations and attributes we have three categories:

cat Attribute (c,a : Class);
    Operation (c:Class) (args:ClassList) (returns:Class);
    OperationQ (c:Class) (args:ClassList) (returns:Class);

Attributes are simple enough: the two arguments to Attribute give the class to which the attribute belongs, and the type (class) of the attribute itself, respectively. For operations, we need to know if they have side-effects, i.e. whether they are marked with {query} in the underlying UML model or not. This explains why there are two categories for operations. The first argument of these categories is, again, the class to which they belong. The second argument is a (possibly empty) list of the types of the arguments to the operation, and the third argument is the return type (possibly void) of the operation. The use of lists makes these categories general (they can handle operations with any number of arguments), but this generality also makes the grammar a bit more complex at places.

Here is how we use a UML attribute or query method (a term of type OperationQ) within an OCL expression:

fun valueOf : (c, result:Class) -> (Instance c) ->
              Attribute c result -> Instance result;
    query : (c:Class) -> (args:ClassList) -> (ret:Class) ->
            Instance c -> OperationQ c args ret -> InstList args ->
            Instance ret;

The arguments to query are, in turn: the class of the object we want to query, a list of the classes of the arguments to the query, the return type of the query, the object we want to query, the query itself, and finally a list of the arguments of the query. The result is an instance (an object) of the return type.

The linearization to OCL is fairly simple:

lin query _ _ ret obj op argsI =
      dot1 obj (mkConstI (op.s ++ argsI.s ! brackets));

What happens here is that the list of arguments is linearized as a comma-separated list enclosed in parentheses (argsI.s ! brackets), then we put the name of the query (op.s) in front, and finally add the object we query and a dot (dot1 ensures correct handling of precedence), so we end up with something like obj.query(arg1, arg2, ...).
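As a rough illustration of the steps just described, here is a minimal Python sketch. It is not the GF rule itself: the function names are hypothetical, arguments are plain strings, and the precedence handling performed by dot1 is left out.

```python
def brackets(args):
    """Linearize an argument list as a comma-separated list in parentheses."""
    return "(" + ", ".join(args) + ")"


def lin_query_ocl(obj, op, args):
    """Put the query name in front of the bracketed arguments, then
    prefix the queried object and a dot (no precedence handling here)."""
    return obj + "." + op + brackets(args)


no_args = lin_query_ocl("self", "getFirst", [])
two_args = lin_query_ocl("obj", "query", ["arg1", "arg2"])
```

With these inputs, two_args has the shape obj.query(arg1, arg2) mentioned in the text.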

For the English linearization, we have the problem of having one category for all queries, regardless of the number of arguments they depend on. Our solution is to give a custom “natural” linearization of queries having up to three arguments (this applies to all query operations in Queue). For instance, the linearization of the query getFirst produces “the first element of the queue”. For asSequence we take, as could be observed in Section 2, simply “the queue”. The implementation is based on the following:


param Prep = To | At | Of | NoPrep;

lincat OperationQ = {s : QueryForm => Str;
                     preps : {pr1 : Prep; pr2 : Prep; pr3 : Prep}};

The idea is that the linearization of a query includes up to three prepositions which can be put between the first three arguments. If there are more than three arguments, these prepositions are ignored and we choose a more formal notation like “query(arg1, arg2, ...) of the queue”.
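The preposition scheme can be sketched in Python as follows. This is an illustration under assumptions, not the actual GF rule: the text does not spell out how the prepositions, the arguments and the queried object are combined, so the arrangement below (phrase, then preposition and argument pairs, then “of” and the object) is a guess, and elementAt is a hypothetical query used only for the example.

```python
# Surface words for the Prep parameter of the grammar (To | At | Of | NoPrep).
PREP_WORDS = {"To": "to", "At": "at", "Of": "of", "NoPrep": ""}


def lin_query_english(name, phrase, preps, obj, args):
    """Natural wording for up to three arguments, formal notation otherwise."""
    if len(args) > 3:
        # More than three arguments: ignore the prepositions.
        return name + "(" + ", ".join(args) + ") of " + obj
    words = [phrase]
    for prep, arg in zip(preps, args):
        words.extend([PREP_WORDS[prep], arg])
    words.extend(["of", obj])
    # Drop the empty strings produced by NoPrep.
    return " ".join(w for w in words if w)


NO_PREPS = ("NoPrep", "NoPrep", "NoPrep")
first = lin_query_english("getFirst", "the first element", NO_PREPS, "the queue", [])
at = lin_query_english("elementAt", "the element", ("At", "NoPrep", "NoPrep"),
                       "the queue", ["position 2"])
formal = lin_query_english("query", "", NO_PREPS, "the queue", ["a", "b", "c", "d"])
```

The first call reproduces the wording quoted in the text for getFirst; the last call falls back to the formal notation because it has four arguments.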

4.3 Constraints

For handling OCL constraints (invariants, pre- and postconditions) we introduce a category Constraint and various ways of constructing terms of this type. The simplest form of constraint is an invariant for a class:

cat Constraint;

fun invariant : (c:Class) -> (VarSelf c -> Instance Bool) ->
                Constraint;

To construct an invariant we supply the class for which the invariant should hold: the first argument of invariant. We require a boolean expression (a term of type Instance Bool) which represents the actual invariant property. An additional complication is that we want to be able to refer to (the current instance of) the class c in this boolean expression; in terms of OCL this means using the variable self. This accounts for the type of the second argument, VarSelf c -> Instance Bool, which can be thought of as a term of type Instance Bool where we have access to a bound variable of type VarSelf c. This bound variable can only be used for one purpose: to form an expression self of the correct type:

fun self : (c:Class) -> VarSelf c -> Instance c;

The linearization of invariant is simple, and we show the linearizations for both OCL and English:

lin invariant c e = {s = "context"++c.s++"inv:"++e.s};

lin invariant c e = {s = ["the following invariant holds for all"]
                         ++ (c.s ! cn pl) ++ ":" ++ e.s} ;

Notice the choice of the plural form of a class: c.s ! cn pl produces, for example, “queues”, for Queue.
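The two linearization rules can be mimicked in Python to see the resulting strings. This is a sketch under assumptions: the dictionary QUEUE with its keys is a hypothetical stand-in for the parameterized linearization of a class, the body strings are invented examples, and tokens are simply joined with spaces in the way the GF operator ++ concatenates tokens.

```python
# Hypothetical record for the class Queue: OCL identifier plus noun forms.
QUEUE = {"ocl": "Queue", "cn_sg": "queue", "cn_pl": "queues"}


def lin_invariant_ocl(c, e):
    """OCL rendering: context <Class> inv: <expression>."""
    return " ".join(["context", c["ocl"], "inv:", e])


def lin_invariant_english(c, e):
    """English rendering, selecting the plural noun form of the class."""
    return " ".join(["the following invariant holds for all", c["cn_pl"], ":", e])


ocl = lin_invariant_ocl(QUEUE, "self.size() >= 0")
english = lin_invariant_english(QUEUE, "the size of the queue is at least 0")
```

Both functions take the same abstract information (the class and the body) and differ only in the surrounding keywords and in which form of the class name they select.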

For formulating pre- and postconditions of an operation, we use the same technique employing bound variables. In this case one bound variable for each argument of the operation is required, besides the ones for self and result.

4.4 The OCL Library and User Defined Classes

The grammar has to include all standard types (and their properties) of OCL. Just as an example, we show the Sequence type and some of its properties:


fun Sequence : Class -> Class;
    subSequence : (c:Class) -> Instance (Sequence c) ->
                  (a,b : Instance Integer) -> Instance (Sequence c);
    seqConforms2coll : (c:Class) ->
                       Subtype (Sequence c) (Collection c);

The operations of Sequence (or any standard OCL type) are not terms of type OperationQ; they are simply modelled as functions in GF. This is very convenient, but it also means that the grammar does not allow constraints to be expressed for the standard OCL operations. User defined operations, however, must permit constraints, so they are defined using Operation and OperationQ. Here are some operations of the class Queue from Section 2:

fun Queue : Class;
    Queue_size : OperationQ Queue nilC Integer;
    Queue_enqueue : Operation Queue (consC Integer nilC) Integer;

Note the use of the constructors nilC and consC to build lists of the types of the arguments to the operations.
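The list constructors can be pictured with a small Python analogy. The ClassList type, the to_list helper and the tuple encoding of signatures are hypothetical; they only mirror the shape of the GF terms above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ClassList:
    head: Optional[str] = None
    tail: Optional["ClassList"] = None


nilC = ClassList()  # the empty list of argument types


def consC(cls: str, rest: ClassList) -> ClassList:
    """Prepend one argument type to a list of argument types."""
    return ClassList(cls, rest)


def to_list(cl: ClassList) -> List[str]:
    """Flatten a cons-list into an ordinary Python list."""
    out = []
    while cl.tail is not None:
        out.append(cl.head)
        cl = cl.tail
    return out


# (owner class, argument types, return type), following the Queue example.
QUEUE_SIZE = ("Queue", nilC, "Integer")
QUEUE_ENQUEUE = ("Queue", consC("Integer", nilC), "Integer")
```

An operation with several arguments nests the constructor, e.g. consC("Integer", consC("Boolean", nilC)).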

5 Evaluation

5.1 Advantages

Our approach to building an authoring tool has a number of advantages for the development of requirements specifications:

Single Source Technology. Each element of a specification is kept only in one version: the annotated syntax tree of the abstract grammar. Concrete expressions are generated from this on demand. In addition, edits made in one concrete representation are reflected instantly in all others. This provides a solution to the maintenance and synchronization problems discussed in Section 1. The following two items address the mapping problem:

Semantics. The rules of abstract and concrete GF grammars can be seen as a formal semantics for the languages they implement: for each pair of concrete languages, they induce a function that gives to each expression its “meaning” in the other language. Working with the syntax directed editor, which displays abstract and concrete expressions simultaneously, makes it easy for users to develop an intuition for expressing requirements in different specification languages.

Extensibility. GF grammars constitute a declarative and fairly modular formalism to describe languages and the relationships among them. This makes it relatively easy to adapt and extend our implementation.

These positive features rest mainly on the design principles of GF. From an implementor’s point of view, the GF base provides a number of additional advantages. The fact that GF is designed as a framework is crucial:

Tools. GF provides a number of functionalities for each set of abstract and concrete grammars as detailed in Section 3.3 and an interactive syntax directed editor coming with a GUI. In particular, we have a parser for full OCL incorporating extensive semantic checks.

Development Style. The declarative way in which knowledge about specific grammars is stored in GF permits a modern, incremental style of development: rapid design-implementation-test cycles, addition of new features on demand, and availability of a working prototype almost from the start are a big asset.

5.2 Limitations

GF gives a number of functionalities for free, so that applications can be built simply by writing GF grammars. The result, however, is not always what one would expect from a production-quality system. Software built with GF is more like a prototype that needs to be optimized (more accurately: the grammars can be retained, but the framework-level algorithms must be extended). In the case of specification authoring, we encountered the following limitations:

Parsing. The generic GF parsers are not optimized for parsing formal languages like OCL, for which more efficient algorithms exist. More seriously, the parser has to be customized to avoid the circularity problem due to instance coercions (Section 4.1).

Compositionality. Texts generated by GF necessarily have the same structure as the corresponding code. One would like to have methods to rephrase and summarize specifications.

The need for grammars. All new, user-defined concepts (classes and their features) have to be defined in a GF grammar. It would be better to have the possibility to create grammars dynamically from UML class diagrams given in some suitable format (for example, in the UML standard exchange format XMI [20]); this can be done in the same way as GF rules are generated from Alfa declarations [5].

A general limitation, which is a problem for any natural-language interface, is:

Closedness. Only those expressions that are defined in the grammar are recognized by the parser.

This means a gap persists between formal specifications and informal legacy specifications. One could imagine heuristic natural language processing methods to rescue some of this material, but GF does not have such methods at present.

Finally, an obstacle to the applicability of syntax-directed editors for programming languages, for which special techniques are required [19], is the phenomenon that top-down development as enforced by stepwise refinement is usually incompatible with the direction of the control flow. The latter, however, is more natural from an implementor’s point of view. This problem does not arise in the context of specifications due to their declarative nature.

6 Related Work

We know of no natural-language interfaces to OCL, but there are some earlier efforts for specifications more generally: Holt and Klein [7] have a system for translating English hardware specifications to the temporal logic CTL; Coscoy, Kahn and Théry [3] have a translator from Coq code (which can express programs, specifications, and proofs) into English and French. Both of these systems function in batch mode, and they are unidirectional, whereas GF is interactive and bidirectional. Power and Scott [16] have an interactive editor for multilingual software manuals, which functions much like GF (by linearizations from an underlying abstract representation), but does not have a parser. In all these systems, the grammar is coded directly in the program, whereas GF has a separate grammar formalism. The mentioned systems are fine-tuned for the purposes that they are used for, and hence produce or recognize more elegant and idiomatic language. But they cannot be dynamically extended by new forms of expression. An idea from [3] that would fit nicely to GF is optimization by factorization: for example, “x is even or odd” is an optimization of “x is even or x is odd”.
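The factorization idea can be shown on a toy representation. The sketch below is only an illustration and makes strong assumptions: a clause is a hypothetical (subject, predicate) pair and the only pattern handled is a disjunction of two clauses of the form “subject is predicate”. It is not the method of [3] nor anything implemented in GF.

```python
def factorize_or(left, right):
    """Render 'left or right', sharing the subject when both clauses have the same one."""
    (subj1, pred1), (subj2, pred2) = left, right
    if subj1 == subj2:
        # Shared subject: state it once and join the predicates.
        return f"{subj1} is {pred1} or {pred2}"
    return f"{subj1} is {pred1} or {subj2} is {pred2}"


shared = factorize_or(("x", "even"), ("x", "odd"))
distinct = factorize_or(("x", "even"), ("y", "odd"))
```

The transformation is meaning-preserving only because both disjuncts talk about the same subject; with different subjects the full form has to be kept.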

The context free grammar of OCL 1.4 [15] is a concrete grammar, which is not suitable as a basis for an abstract grammar for both OCL and English. Furthermore, it provides no notion of type correctness. A proposal for OCL 2.0 [14] addresses these problems: both an abstract and a concrete grammar are included, as well as a mechanism for type correctness. However, these grammars are partly specified by metamodelling, in the sense that UML and OCL themselves are used in the formal description of syntax and semantics. It is, therefore, not obvious how to construct a GF grammar directly based on the OCL 2.0 proposal.

A general architecture for UML/OCL toolsets including a parser and type checker is suggested in [8], but informal specifications are not discussed there.

7 Future Work

Besides overcoming the limitations expressed in Section 5.2, we will concentrate on the following issues:

Integration. For our authoring tool to be practically useful, it must be tightly integrated with mainstream software development tools. In the KeY project [1] a design methodology plus CASE-tool is developed that allows seamless integration of object-oriented modeling (OOM) with program development, generation of formal specifications, as well as formal verification of code and specifications. The KeY development system is based on a commercial UML-CASE tool for OOM. Following, for example, [8] we will integrate our tool into KeY and, hence, into the CASE tool underlying KeY. Users of the CASE tool will be able to use our authoring tool regardless of whether they want to do formal reasoning.

Stylistic Improvements. To improve the style of texts, we plan to use techniques like factorization [3] and pronominalization [17], which can be justified by type-checked definitions inside GF grammars. To some extent, such improvements can even be automated. However, one should not underestimate the difficulty of this problem: it is essentially the same problem as taking a piece of low-level code and restructuring it into high-level code.


More and Larger Case Studies. We started to author a combined natural language/OCL requirements specification of the API of the Java Collections Framework based on textual specifications found there.

Further Languages. It is well-known how to develop concrete grammars for natural languages other than English. Support for further formal specification languages besides OCL might require changes in the abstract grammar or could even imply shifting information from the abstract to the concrete grammars. It will be interesting to see how one can accommodate languages such as Alloy or JML.

Improve Usability. The usability of the current tool can be improved in various ways: the first are obvious improvements of the GUI such as context sensitive pop-up menus, powerful pretty-printing, active expression highlighting, context-sensitive help, etc. A conceptually more sophisticated idea is to enrich the abstract grammar with rules that provide further templates for frequently required kinds of constraints. For example, a non-terminal memberDeleted could guide the user in writing a proper postcondition specifying that a member was deleted from a collection object. This amounts to encoding pragmatics into the grammar.

Increase Portability. The GUI of GF’s syntax editor is written with the Haskell Fudgets library [2]. We plan to port it to Java. This is compatible with the KeY system, which is written entirely in Java.

8 Conclusion

We described theoretical foundations, design principles, and implementation of a tool that supports authoring of informal and formal software requirements specifications. Our research is motivated by the gap between completely informal specifications and formal ones, while usage of the latter is becoming more widespread. Our tool supports development of formal specifications in OCL: it features (i) a syntax-directed editor with (ii) templates for frequently needed elements of specifications and (iii) a single source for formal/informal documents; in addition, (iv) parsers, (v) linearizers for OCL and a fragment of English, and (vi) a translator between them are obtained.

The implementation is based on a logico-linguistic framework anchored in type theory. This yields a formal semantics, separation of concrete and abstract syntax, separation of declarative knowledge and algorithms. It makes the system easy to extend and to modify.

In summary, we think that our approach is a good basis to meet the challenges in creating formal specifications outlined in the introduction.

Acknowledgements

We would like to thank Wojciech Mostowski and Bengt Nordström for the careful reading of a draft of this paper, for pointing out inaccuracies, and for suggestions to improve the paper.


References

1. W. Ahrendt, T. Baar, B. Beckert, M. Giese, E. Habermalz, R. Hähnle, W. Menzel, and P. H. Schmitt. The KeY approach: Integrating object oriented design and formal verification. In M. Ojeda-Aciego, I. P. de Guzmán, G. Brewka, and L. M. Pereira, editors, Proc. JELIA, LNAI 1919, pages 21–36. Springer, 2000.

2. M. Carlsson and T. Hallgren. Fudgets - Purely Functional Processes with applications to Graphical User Interfaces. PhD thesis, Department of Computing Science, Chalmers University of Technology, 1998.

3. Y. Coscoy, G. Kahn, and L. Théry. Extracting text from proofs. In M. Dezani-Ciancaglini and G. Plotkin, editors, Proc. Second Int. Conf. on Typed Lambda Calculi and Applications, volume 902 of LNCS, pages 109–123, 1995.

4. M. Dymetman, V. Lux, and A. Ranta. XML and multilingual document authoring: Convergent trends. In COLING, Saarbrücken, Germany, pages 243–249, 2000.

5. T. Hallgren and A. Ranta. An extensible proof text editor. In M. Parigot and A. Voronkov, editors, Logic for Programming and Automated Reasoning, LPAR, LNAI 1955, pages 70–84. Springer, 2000.

6. R. Harper, F. Honsell, and G. Plotkin. A framework for defining logics. JACM, 40(1):143–184, 1993.

7. A. Holt and E. Klein. A semantically-derived subset of English for hardware verification. In Proc. Ann. Meeting Ass. for Comp. Ling., pages 451–456, 1999.

8. H. Hussmann, B. Demuth, and F. Finger. Modular architecture for a toolset supporting OCL. In A. Evans, S. Kent, and B. Selic, editors, Proc. 3rd Int. Conf. on the Unified Modeling Language, LNCS 1939, pages 278–293. Springer, 2000.

9. D. Jackson. Alloy: A lightweight object modelling notation. sdg.lcs.mit.edu/~dnj/pubs/alloy-journal.pdf, July 2000.

10. R. Kramer. iContract - the Java Designs by Contract tool. In Proc. Technology of OO Languages and Systems, TOOLS 26. IEEE CS Press, Los Alamitos, 1998.

11. G. T. Leavens, A. L. Baker, and C. Ruby. Preliminary design of JML: A behavioral interface specification language for Java. Technical Report 98-06i, Iowa State Univ., Dept. of Computer Science, Feb. 2000.

12. K. R. M. Leino, G. Nelson, and J. B. Saxe. ESC/Java user’s manual. Technical Note #2000-002, Compaq Systems Research Center, Palo Alto, USA, May 2000.

13. B. Nordström, K. Petersson, and J. M. Smith. Martin-Löf’s type theory. In S. Abramsky, D. Gabbay, and T. Maibaum, editors, Handbook of Logic in Computer Science, volume 5. Oxford University Press, 2000.

14. Object Modeling Group. Response to the UML 2.0 OCL RfP, Aug. 2001. cgi.omg.org/cgi-bin/doc?ad/01-08-01.

15. Object Modeling Group. Unified Modelling Language Specification, version 1.4, Sept. 2001. www.omg.org/cgi-bin/doc?formal/01-09-67.

16. R. Power and D. Scott. Multilingual authoring using feedback texts. In COLING-ACL 98, Montreal, Canada, 1998.

17. A. Ranta. Type Theoretical Grammar. Oxford University Press, 1994.

18. A. Ranta. Grammatical framework homepage, 2000. www.cs.chalmers.se/~aarne/GF/index.html.

19. T. Teitelbaum and T. Reps. The Cornell program synthesizer: a syntax-directed programming environment. CACM, 24(9):563–573, 1981.

20. Unisys Corp. et al. XML Metadata Interchange (XMI), Oct. 1998. ftp://ftp.omg.org/pub/docs/ad/98-10-05.pdf.

21. J. Warmer and A. Kleppe. The Object Constraint Language: Precise Modelling with UML. Addison-Wesley, 1999.


Paper 3. Disambiguating Implicit Constructions in OCL

Kristofer Johannisson. Disambiguating Implicit Constructions in OCL. Accepted to the OCL and Model Driven Engineering Workshop, at the UML conference in Lisbon, 2004. Online proceedings at http://www.cs.kent.ac.uk/projects/ocl/oclmdewsuml04/description.htm.


Disambiguating Implicit Constructions in OCL

Kristofer Johannisson

Department of Computing Science
Chalmers University of Technology and Göteborg University
S-41296 Göteborg, Sweden, [email protected]

Abstract. A rule system for type checking and semantic annotation of OCL is presented. Its main feature is the semantic annotation and disambiguation of syntax trees provided by an OCL parser, in particular for implicit property calls and implicit bound variables. It is intended as a component to be plugged in to other systems which handle OCL. An implementation of the system is available.

1 Introduction

A suitable structure for a computer program which handles the Object Constraint Language (OCL [7, 8]) is that of a compiler (cf. [10]) with at least three components:

1. a component which parses OCL specifications into syntax trees.

2. a component which performs some kind of semantic analysis of OCL syntax trees, e.g. type checking.

3. a component which does something interesting with the result of the semantic analysis, e.g. transforms it into a proof obligation in Dynamic Logic to be fed into a theorem prover [1], or generates assertions to be inserted into Java source code [10].

In this paper we focus on part 2: we present a rule system for type checking and semantic annotation of OCL. The input to the system is a syntax tree from an OCL parser, and a representation of the user UML model. The result is a syntax tree annotated with type information and other semantic distinctions which disambiguate the output of the parser. This makes the job of part 3 easier than if it had to work directly with the parser output.

The disambiguation concerns in particular the “property call” syntactic structure, the semantics of which has many special cases: calls to properties defined in the user UML model or the OCL library, variable binding constructions (e.g. forAll or collect), and meta-level constructions (e.g. allInstances). With respect to property calls one must also consider various implicit or special forms: self can be left out, variable bindings can be left out, collect can be left out, and the return type of associations of multiplicity one (as opposed to other multiplicities) can be considered to be a collection type or a basic type. All these cases share the same basic syntactic structure, which motivates a disambiguating step between part 1 and part 3.

Our system was originally developed as a part of a project for linking OCL specifications to informal specifications in natural language [9], which is in turn a part of the KeY project [1]. In this context, part 3 would be the translation to natural language. However, we here present our work as a standalone system.

1.1 Paper Outline

Some background, including related work, is given in Section 2. Then we define the language of annotated OCL in Section 3. Section 4 describes the annotating rule system itself. This is followed by a brief discussion of the implementation in Section 5. Section 6 concludes and gives directions for future work. We assume that the reader is reasonably familiar with the OCL specification, version 1.x or 2.0 [7, 8].

2 Background

2.1 Implicit Features of OCL

In this paper we will use the term “implicit” for various features of OCL. We will make this use precise by the definition of annotated OCL and the annotating rules in the following sections, but we give some informal examples below to get started:

Implicit self refers to property calls to self where self has been left out ([7] Sect. 6.3.3). E.g. attr is an implicit form of self.attr.

Implicit bound variables refers to the use of variable binding collection operations of the OCL library where one does not explicitly bind variables ([7] Sect. 6.6.1). E.g. collection->collect(attr) is an implicit form of collection->collect(x|x.attr).

Implicit property calls refers to property calls where either self or an implicit bound variable is “left out” ([7] Sect. 6.6.7). E.g. collection->collect(attr) is an implicit form of either collection->collect(self.attr) or collection->collect(x|x.attr) (and is, as noted in [7], potentially semantically ambiguous).

Implicit collect means that collect has been left out ([7] Sect. 6.6.2.1). E.g. collection.attr is an implicit form of collection->collect(x|x.attr).
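The explicit forms listed above can be produced mechanically once the intended reading is known. The Python sketch below only builds the strings from the examples; the function names are hypothetical, and deciding which reading applies (for instance between self.attr and x.attr inside collect) requires the type information discussed later, which is not modelled here.

```python
def explicit_self(prop):
    """attr  ->  self.attr"""
    return f"self.{prop}"


def explicit_bound_variable(coll, op, prop, var="x"):
    """collection->collect(attr)  ->  collection->collect(x|x.attr)"""
    return f"{coll}->{op}({var}|{var}.{prop})"


def explicit_collect(coll, prop, var="x"):
    """collection.attr  ->  collection->collect(x|x.attr)"""
    return explicit_bound_variable(coll, "collect", prop, var)


def property_call_readings(coll, op, prop, var="x"):
    """Both candidate readings of an implicit property call inside op(...)."""
    return [f"{coll}->{op}({explicit_self(prop)})",
            explicit_bound_variable(coll, op, prop, var)]


readings = property_call_readings("collection", "collect", "attr")
```

The two entries of readings correspond to the potentially ambiguous case noted for implicit property calls.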

2.2 Related Work

Semantic analysis of OCL is of course in some sense performed in any OCL tool. Our rule system is in particular inspired by [4] and [3], which define OCL type derivation systems and operational semantics, and also prove various results such as subject reduction and type uniqueness. Of the two, the more recent [3] handles more OCL constructions, and it also gives comparisons to many other systems.

In comparison to [3] we do not provide an operational semantics or proofs about our system. Our perspective is more that of giving a formal description of a concrete implementation which has to handle “real” OCL files. This means that our system must include constructions such as implicit self, implicit bound variables and collection operations which can be expressed using iterate, none of which are included in [3]. Also, in contrast to our system, [3] does not employ any disambiguating annotations besides types. For instance, it handles implicit collect by normalization, which means an annotated implicit collect expression cannot be easily distinguished from an annotated expression using explicit collect.

2.3 From OCL to Natural Language

As a part of ongoing work on linking formal specifications to informal ones [9], we wish to translate OCL into Natural Language (NL), e.g. English or German.

The system in [9] is based on grammars in the Grammatical Framework (GF) formalism [12]. These grammars give an abstract syntax of specifications, which can be presented in NL (English and German) as well as OCL.

GF grammars are written in the same style as programs in typed functional languages. In our GF grammars, there are types and functions corresponding to the classes and properties from the OCL library and user UML model. The grammars have to be dynamically extended with new types and functions for each new user UML model. Since there is no subtyping in GF, we use explicit coercions (typecasts). Variable binding constructions are modelled as higher order functions.

In other words, the GF syntax trees provide a typed and semantic representation of OCL specifications. Now, if we want to transform the syntax trees provided by a standard OCL parser into GF trees, it makes sense to do this in a modular way as described in the introduction: First we add types and disambiguating annotations to the syntax trees from the parser. Then it will be fairly straightforward to transform annotated syntax trees into GF syntax trees.

Although the system we present in this paper was originally a module in the OCL to NL translation, we think that it is general enough to be useful for other purposes, and therefore worth presenting on its own.

3 Annotated OCL

We will give a grammar for unannotated OCL, based on OCL version 1.5 [7], and then define annotated OCL by extending this grammar: adding new categories, and new rules to existing categories.

There are a number of OCL 1.5 features that we do not support: replacing self with a named variable, named constraints, the def stereotype (i.e. “global let-definitions”), the types OclExpression and OclState, qualified associations, association classes, enumerations, and implicit flattening of collections. The main reason for these limitations, and for not supporting the new features in OCL 2.0, is practical: our system is intended as a part of the KeY [1] system, which has not yet moved beyond OCL 1.5.

In comparison to [3], the most important OCL features we add are implicit property calls and implicit bound variables. We also handle packages as well as pre- and postconditions, but we will not discuss them in this paper. On the other hand, [3] includes features we do not support, e.g. OCL 2.0 tuples and undefined values.

3.1 Unannotated OCL

The grammar for unannotated OCL below describes the syntax trees provided by the OCL parser (for the sake of presentation the grammar used in the actual implemented parser has been somewhat simplified here).

OclPackage ::= package PathName Constraint {Constraint} endpackage
PathName ::= Ident{:: Ident}
Constraint ::= context Context ConstrBody {ConstrBody}
Context ::= Ident | Ident :: Ident ([FormalParam {,FormalParam}]) [: Class]
FormalParam ::= Ident : Class
Class ::= PathName | CollKind(Class)
CollKind ::= Collection | Set | Bag | Sequence
ConstrBody ::= (inv|pre|post): [LetExp {LetExp} in] Expr
LetExp ::= let Ident [([FormalParam {,FormalParam}])] [: Class] = Expr
Expr ::= Expr InfixOp Expr |
         PrefixOp Expr |
         Literal |
         if Expr then Expr else Expr endif |
         PropCall |
         Expr ApplOper PropCall
Literal ::= IntLit | RealLit | StringLit | true | false |
            CollKind {[CollItem{,CollItem}]}
CollItem ::= Expr | Expr .. Expr
InfixOp ::= + | - | / | * | = | <> | < | > | <= | >=
PrefixOp ::= - | not
ApplOper ::= . | ->
PropCall ::= PathName[@pre][PropCallParams]
PropCallParams ::= ([Declarator |][Expr{,Expr}])
Declarator ::= Ident{,Ident}[:PathName][;Ident:Class=Expr]

We leave the categories IntLit, RealLit and StringLit abstract. We will focus our disambiguation efforts on the implicit property call rule Expr ::= PropCall, and on the PropCall and PropCallParams categories.


3.2 Types

We will use the Class category to annotate expressions with their types. We also want to annotate properties (or property calls) with their types, and we therefore add a category PropType:

\[ \mathit{PropType} ::= T \xrightarrow{A} T \;\mid\; T \xrightarrow{Q} \{T \rightarrow\}\, T \;\mid\; T \xrightarrow{St.} T \]

A property can be either an attribute (or association), a query, or a singleton (of multiplicity one) association. Consider the properties in this example class diagram:

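[Class diagram: class Person with attributes age : Integer and isFemale : Boolean, operations name() : String {query} and income(Date) : Integer {query}, and an association from Person to itself with ends mother (multiplicity 1) and child (multiplicity 0..*).]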

These properties would be typed as follows:

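$$\begin{aligned}
\mathit{age} &: \mathrm{Person} \xrightarrow{A} \mathrm{Integer}\\
\mathit{isFemale} &: \mathrm{Person} \xrightarrow{A} \mathrm{Boolean}\\
\mathit{name} &: \mathrm{Person} \xrightarrow{Q} \mathrm{String}\\
\mathit{income} &: \mathrm{Person} \xrightarrow{Q} \mathrm{Date} \rightarrow \mathrm{Integer}\\
\mathit{child} &: \mathrm{Person} \xrightarrow{A} \mathrm{Set}(\mathrm{Person})\\
\mathit{mother} &: \mathrm{Person} \xrightarrow{St.} \mathrm{Person}
\end{aligned}$$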
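As an illustration only, the property typings above can be written down as plain data. The sketch below is in Python rather than the Haskell of the actual implementation, and the names `Kind`, `PropType` and `PERSON_PROPS` are hypothetical, chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Kind(Enum):
    """The three arrows of PropType."""
    ATTRIBUTE = "A"     # attribute or (non-singleton) association
    QUERY = "Q"         # side-effect free operation
    SINGLETON = "St."   # association of multiplicity one


@dataclass(frozen=True)
class PropType:
    receiver: str             # class the property is defined on
    kind: Kind
    params: Tuple[str, ...]   # parameter types (non-empty only for queries)
    result: str               # type of the property's value

    def show(self) -> str:
        """Render the type in an ASCII version of the arrow notation."""
        parts = [self.receiver, f"-{self.kind.value}->"]
        parts += [f"{p} ->" for p in self.params]
        parts.append(self.result)
        return " ".join(parts)


# The six properties of the example class Person.
PERSON_PROPS = {
    "age": PropType("Person", Kind.ATTRIBUTE, (), "Integer"),
    "isFemale": PropType("Person", Kind.ATTRIBUTE, (), "Boolean"),
    "name": PropType("Person", Kind.QUERY, (), "String"),
    "income": PropType("Person", Kind.QUERY, ("Date",), "Integer"),
    "child": PropType("Person", Kind.ATTRIBUTE, (), "Set(Person)"),
    "mother": PropType("Person", Kind.SINGLETON, (), "Person"),
}
```

Keeping the arrow kind as a separate field makes the later special treatment of singleton associations a simple comparison rather than a string match.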

3.3 Annotating Expressions

Aside from annotating an expression with its type, we also want to disambiguate the rule Expr ::= PropCall in the unannotated grammar. This syntactic structure could be a variable, a class literal, or an implicit property call to self or an implicit bound variable. We also introduce annotation for a singleton association considered as a set.

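$$\begin{aligned}
\mathit{Expr} ::=\ & \mathit{Expr} : \mathit{Class}\\
\mid\ & \mathbf{Var}(\mathit{Ident})\\
\mid\ & \mathbf{ClassLit}(\mathit{Class})\\
\mid\ & \mathbf{implicit}((\mathit{self} \mid \underline{\mathit{Ident}}), \mathit{PropCall})\\
\mid\ & \{\mathit{Expr}\}_1
\end{aligned}$$

We use a bold font to distinguish semantic annotations from unannotated OCL. We use underline for indicating implicit bound variables.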


We will also annotate expressions with explicit coercions (typecasts) by the following rule:

$$\mathit{Expr} ::= [\mathit{Expr}]^{\mathit{Class}}_{\mathit{Class}}$$

If expression $e$ has type $T_1$, and we know that $T_1$ is a subtype of $T_2$, then $[e]^{T_1}_{T_2}$ represents the expression $e$ coerced (upcast) into type $T_2$. We will slightly abuse notation by considering $[e]^{T_1}_{T_2}$ to be the same as just $e$ when $T_1$ is the same class as $T_2$.
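The coercion convention can be sketched in a few lines. This is a Python illustration under stated assumptions, not the implementation described in this paper: the `SUPERTYPES` table, the `Typed` and `Coerce` node types and the `coerce` function are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union

# Hypothetical direct-supertype table (Integer conforms to Real in the OCL library).
SUPERTYPES: Dict[str, Set[str]] = {
    "Integer": {"Real"},
    "Real": {"OclAny"},
    "OclAny": set(),
}


def conforms(t1: str, t2: str) -> bool:
    """True if t1 is t2 or a transitive subtype of t2."""
    if t1 == t2:
        return True
    return any(conforms(s, t2) for s in SUPERTYPES.get(t1, set()))


@dataclass(frozen=True)
class Typed:
    expr: str   # the expression, kept as text in this sketch
    type: str   # its type annotation


@dataclass(frozen=True)
class Coerce:
    expr: Typed
    source: str  # the upper index of the coercion
    target: str  # the lower index of the coercion


def coerce(e: Typed, target: str) -> Union[Typed, Coerce]:
    """Wrap e in an explicit upcast, or return e unchanged if the types coincide."""
    if not conforms(e.type, target):
        raise TypeError(f"{e.type} does not conform to {target}")
    if e.type == target:
        return e  # the abuse of notation: a coercion from T to T is just e
    return Coerce(e, e.type, target)
```

For example, `coerce(Typed("0", "Integer"), "Real")` produces an explicit coercion node, while coercing to `"Integer"` returns the typed literal unchanged.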

3.4 Annotating Property Calls

Property calls will be annotated with their types, i.e. with a PropType. We also disambiguate property calls: a property call can be a "normal" property call (in which case we use no annotation except the PropType), a variable binding operation (e.g. iterate or select) or an implicit collect:

$$\begin{aligned}
\mathit{PropCall} ::=\ & \mathit{PropCall} : \mathit{PropType}\\
\mid\ & (\mathbf{iterate} \mid \mathbf{forAll} \mid \mathbf{exists} \mid \mathbf{one} \mid \mathbf{isUnique} \mid \mathbf{any} \mid \mathbf{sortedBy} \mid \mathbf{select} \mid \mathbf{reject} \mid \mathbf{collect})\ [@\mathrm{pre}]\ \mathit{PropCallParams}\\
\mid\ & \mathbf{implCollect}(\mathit{PropCall})
\end{aligned}$$

In the unannotated grammar, there is no reserved word for the OCL library iterate construction. The identifier iterate could refer either to the OCL library construction, or to a user-defined property. In the annotated grammar, $\mathbf{iterate}$ (in bold) will refer to the OCL library construction, and iterate (in plain font) to a user-defined property (if there is such a property in the UML model).

We also extend Declarator to enable implicit binding of variables (again using underlining to distinguish explicit from implicit binding):

$$\mathit{Declarator} ::= \underline{\mathit{Ident} : \mathit{Class}}$$

3.5 Annotated Example

This unannotated OCL constraint says that the age of a person must be non-negative:

context Person inv: age >= 0

Using the rule system in Sect. 4, we add type annotations on the property age and the integer literal 0. Also, we disambiguate the property call age by an implicit self annotation. Finally, since age and 0 both have type Integer, but the comparison operator >= works on the supertype Real, we add explicit coercions from Integer to Real:

context Person inv:

$$[\mathbf{implicit}(\mathit{self} : \mathrm{Person},\ \mathit{age} : \mathrm{Person} \xrightarrow{A} \mathrm{Integer}) : \mathrm{Integer}]^{\mathrm{Integer}}_{\mathrm{Real}} \;>=\; [0 : \mathrm{Integer}]^{\mathrm{Integer}}_{\mathrm{Real}}$$


Since we add annotation in almost every node of the syntax tree, even this small example becomes unwieldy when annotated. Of course, the annotated syntax is intended as input to programs, not as a convenient notation to be read or written manually.

3.6 Semantics of Annotated OCL

While we have not defined a formal semantics for our annotated OCL language, we hope that the informal semantics is clear (as far as the semantics in [7, 8] is clear). However, one way of achieving a formal semantics would be to drop the annotations in a systematic way, with the goal of ending up in the unannotated OCL fragment of [3], which has an operational semantics. From this perspective, we have then given a semantics to the OCL fragment of [3] extended with implicit property calls and implicit bound variables.

An informal outline of how to de-annotate annotated OCL is the following:

– The type annotations of Expr with Class and PropCall with PropType are simply dropped, and so are explicit coercions.

– The annotations $\mathbf{implCollect}$ and $\{\ldots\}_1$ are also just dropped, since implicit collect and singleton associations are handled by [3].

– $\mathbf{Var}$ and $\mathbf{ClassLit}$ expressions should be transformed into their counterpart in [3].

– The $\mathbf{implicit}$ annotation has to be normalized. E.g. $\mathbf{implicit}$(self, age) should be changed into self.age, and $\mathbf{implicit}$(x, age) into x.age.

– $\mathbf{iterate}$ is just changed into plain iterate. The other variable-binding collection operations (forAll, exists, ...) have to be rewritten in terms of iterate, according to the definitions in [7, 8].

– Implicit bound variables and declarators are made into normal variables and declarators (i.e. just remove the underlining). Before doing this the implicit bound variables should be renamed to avoid name clashes.

Essentially we remove all annotations except the ones for implicit property calls and implicit bound variables, which are not handled by [3]. These are instead replaced with the corresponding "explicit" constructions.
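A few of the de-annotation steps outlined above can be sketched as a recursive function. This is a Python illustration only: the tuple encoding of syntax trees and the tag names are assumptions made for this example, and the sketch does not cover class literals, the rewriting of forAll and friends in terms of iterate, or the renaming of implicit bound variables.

```python
def deannotate(t):
    """Strip annotations from a tiny tuple-encoded annotated tree.

    Hypothetical node shapes:
      ("typed", e, T)            type annotation e : T
      ("coerce", e, T1, T2)      explicit coercion from T1 to T2
      ("implCollect", pc)        implicit collect marker
      ("singleton", e)           singleton association considered as a set
      ("var", name)              Var(name)
      ("implicit", recv, pc)     implicit(recv, pc)
      ("dot", e, pc)             e.pc
      ("infix", op, left, right)
      ("ident", name), ("lit", value)
    """
    tag = t[0]
    if tag in ("typed", "coerce", "implCollect", "singleton"):
        return deannotate(t[1])            # annotation is simply dropped
    if tag == "var":
        return ("ident", t[1])             # back to a plain identifier
    if tag == "implicit":
        # implicit(self, age) becomes the explicit call self.age
        return ("dot", ("ident", t[1]), deannotate(t[2]))
    if tag == "dot":
        return ("dot", deannotate(t[1]), deannotate(t[2]))
    if tag == "infix":
        return ("infix", t[1], deannotate(t[2]), deannotate(t[3]))
    return t                               # identifiers and literals


# The annotated version of "age >= 0" from Sect. 3.5, in this encoding.
ANNOTATED = (
    "infix", ">=",
    ("coerce",
     ("typed",
      ("implicit", "self", ("typed", ("ident", "age"), "Person -A-> Integer")),
      "Integer"),
     "Integer", "Real"),
    ("coerce", ("typed", ("lit", 0), "Integer"), "Integer", "Real"),
)
```

Applied to `ANNOTATED`, the function yields the tree for `self.age >= 0`, i.e. the implicit call made explicit and every other annotation removed.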

4 Rule System

The task of the rule system is to take a syntax tree produced by an OCL parser, typecheck it, and annotate it with semantic information. To do this, information about the user UML model is also required. Inspired by [3], the system will use judgements of the form

$$E \vdash t \triangleright t'$$

where $E$ is an environment, $t$ is a syntax tree, and $t'$ is a typechecked, annotated syntax tree. Note that the rule system annotates implicit constructions; it does not replace them with explicit ones, as is done with e.g. implicit collect in [3].


We do not give a complete and fully formal description of our system; we focus mostly on the disambiguation of constructions involving PropCall.

In the rules we will use variables ranging as follows: $x, y, i \in \mathit{Ident}$, $e \in \mathit{Expr}$, $T \in \mathit{Class}$, $PT \in \mathit{PropType}$, $p \in \mathit{PathName}$, $C \in \mathit{CollKind}$.

4.1 Environments

An environment may contain a signature and a context. Depending on the syntactic category we are annotating, there may also be other components. A signature $\Sigma$ contains class names, operations, properties, and a subtyping relation, i.e. information from the user UML model and the definition of the OCL library classes. A context $\Gamma$ contains typing of variables and properties defined in let-definitions, and the name of the current package. We omit the many details of how signatures and contexts are represented, but use the constructions below to access and update their respective components:

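$$\begin{array}{ll}
T_1 <:_\Sigma T_2 & \text{type } T_1 \text{ conforms to } T_2 \text{ in signature } \Sigma\\
\sqcup_\Sigma\{T_1, \ldots, T_n\} & \text{the least common supertype of types } T_1, \ldots, T_n \text{ in } \Sigma\\
T \in \mathit{classes}_\Sigma & T \text{ is a class in } \Sigma\\
(x : T) \in \Gamma & x \text{ has type } T \text{ in context } \Gamma\\
\Gamma, (x : T) & \text{update type of } x \text{ in } \Gamma \text{ to } T\\
\mathit{package}_\Gamma & \text{current package in } \Gamma
\end{array}$$

Since a context $\Gamma$ contains the current package, we always assume that an unqualified, non-OCL-library class $T$ belongs to the current package in a lookup $(x : T) \in \Gamma$ or update $\Gamma, (x : T)$.

To find out what property an identifier refers to in a given environment, we use the following partial functions ($[\mathit{Class}]$ is the type of lists of Class):

$$\begin{aligned}
\mathit{lookupAttr}_{\Sigma,\Gamma} &: \mathit{Ident} \rightarrow \mathit{Class} \rightarrow \mathit{PropType}\\
\mathit{lookupProp}_{\Sigma,\Gamma} &: \mathit{Ident} \rightarrow \mathit{Class} \rightarrow [\mathit{Class}] \rightarrow \mathit{PropType}
\end{aligned}$$

Again, we omit all details on how these functions are defined, but note that they are partial: there might not be a matching property for a given identifier, or the result might be ambiguous (e.g. in case of multiple inheritance).

We define two functions recType and retType on PropType:

$$\begin{aligned}
\mathit{recType}\,(T_1\ (\xrightarrow{A} \mid \xrightarrow{Q} \mid \xrightarrow{St.})\ \ldots\ T_2) &= T_1\\
\mathit{retType}\,(T_1\ (\xrightarrow{A} \mid \xrightarrow{Q} \mid \xrightarrow{St.})\ \ldots\ T_2) &= T_2
\end{aligned}$$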
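To make the signature operations of this section concrete, here is a small Python sketch. It is not how the implementation represents signatures (those details are omitted in the text); the `PARENTS` table and the tuple encoding of a PropType as `(receiver, kind, params, result)` are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical signature: each class maps to its direct supertypes.
PARENTS: Dict[str, List[str]] = {
    "OclAny": [],
    "Real": ["OclAny"],
    "Integer": ["Real"],
    "String": ["OclAny"],
}


def ancestors(t: str) -> Set[str]:
    """t together with all its transitive supertypes."""
    result = {t}
    for parent in PARENTS[t]:
        result |= ancestors(parent)
    return result


def conforms(t1: str, t2: str) -> bool:
    """The conformance relation T1 <: T2."""
    return t2 in ancestors(t1)


def least_common_supertype(types: List[str]) -> Optional[str]:
    """Least common supertype of a non-empty list of types, or None if there is none."""
    common = set.intersection(*(ancestors(t) for t in types))
    # The least element is the common supertype that conforms to all the others.
    least = [c for c in common if all(conforms(c, d) for d in common)]
    return least[0] if len(least) == 1 else None


PropType = Tuple[str, str, Tuple[str, ...], str]  # (receiver, kind, params, result)


def rec_type(pt: PropType) -> str:
    """recType: the receiver type, i.e. the first type of the PropType."""
    return pt[0]


def ret_type(pt: PropType) -> str:
    """retType: the result type, i.e. the last type of the PropType."""
    return pt[3]
```

Returning `None` when no unique least supertype exists mirrors the partiality that multiple inheritance can introduce.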

4.2 Property Call Parameters

We go bottom-up and start with the category PropCallParams, where we handle variable bindings. The environment $\Sigma;\Gamma;T;B$ consists of signature and context, a type $T$, and a boolean $B$. $T$ is used for typing bound variables (if there are any). If $B$ is true, then we insert an implicit bound variable in case there are no explicit ones.


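If there are no bound variables, and $B$ is false, then we just annotate the parameters:

$$\frac{\Sigma;\Gamma \vdash e_1 \triangleright e'_1 : T_1 \quad \cdots \quad \Sigma;\Gamma \vdash e_n \triangleright e'_n : T_n}{\Sigma;\Gamma;T;\mathrm{False} \vdash (e_1, \ldots, e_n) \triangleright (e'_1 : T_1, \ldots, e'_n : T_n)}$$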

When there is a declarator (or an implicit bound variable), there must be exactly one parameter. There may or may not be typings on the bound variables.

$$\frac{\Sigma;\Gamma,(x_1 : T),\ldots,(x_n : T) \vdash e \triangleright e' : T_1}{\Sigma;\Gamma;C(T);B \vdash (x_1,\ldots,x_n \mid e) \triangleright (x_1,\ldots,x_n \mid e' : T_1)}$$

$$\frac{T <:_\Sigma T' \qquad \Sigma;\Gamma,(x_1 : T'),\ldots,(x_n : T') \vdash e \triangleright e' : T_1}{\Sigma;\Gamma;C(T);B \vdash (x_1,\ldots,x_n : T' \mid e) \triangleright (x_1,\ldots,x_n \mid e' : T_1)}$$

$$\frac{\Sigma;\Gamma,(x : T) \vdash e \triangleright e' : T_1}{\Sigma;\Gamma;C(T);\mathrm{True} \vdash (e) \triangleright (\underline{x : T} \mid e' : T_1)}$$

Side condition: $x$ is fresh.

If there is an accumulator, there must be exactly one bound variable (because accumulators are only used with iterate):

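$$\frac{\Sigma;\Gamma \vdash e_y \triangleright e'_y : T_2 \qquad \Sigma \vdash T_2 <:_\Sigma T_y \qquad \Sigma;\Gamma,(x : T),(y : T_y) \vdash e \triangleright e' : T_1}{\Sigma;\Gamma;C(T);B \vdash (x;\ y : T_y = e_y \mid e) \triangleright (x;\ y : T_y = [e'_y : T_2]^{T_2}_{T_y} \mid e' : T_1)}$$

$$\frac{T <:_\Sigma T' \qquad \Sigma;\Gamma \vdash e_y \triangleright e'_y : T_2 \qquad \Sigma \vdash T_2 <:_\Sigma T_y \qquad \Sigma;\Gamma,(x : T'),(y : T_y) \vdash e \triangleright e' : T_1}{\Sigma;\Gamma;C(T);B \vdash (x : T';\ y : T_y = e_y \mid e) \triangleright (x : T';\ y : T_y = [e'_y : T_2]^{T_2}_{T_y} \mid e' : T_1)}$$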
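The case analysis on bound variables in this section (untyped declarator, typed declarator, implicit bound variable, or none) can be sketched as follows. This is a Python illustration under stated assumptions: `CONFORMS`, `fresh_name` and `bind_variables` are hypothetical names, accumulators are left out, and the function only computes the variables added to the context, not the annotated body.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical conformance facts (subtype, supertype); reflexivity is handled below.
CONFORMS = {("Person", "OclAny"), ("Integer", "Real")}


def conforms(t1: str, t2: str) -> bool:
    return t1 == t2 or (t1, t2) in CONFORMS


def fresh_name(used: Set[str], base: str = "x") -> str:
    """Return a name not in `used` (the side condition 'x is fresh')."""
    i = 0
    while f"{base}{i}" in used:
        i += 1
    return f"{base}{i}"


def bind_variables(
    declared: List[str],           # variables written in the declarator, if any
    declared_type: Optional[str],  # the typing T' on the declarator, if any
    elem_type: str,                # T, the element type of the collection C(T)
    insert_implicit: bool,         # the boolean B of the environment
    used: Set[str],                # names already in use in the context
) -> List[Tuple[str, str, bool]]:
    """Return (name, type, is_implicit) for each variable to add to the context."""
    if declared:
        var_type = elem_type
        if declared_type is not None:
            if not conforms(elem_type, declared_type):   # premise T <: T'
                raise TypeError(f"{elem_type} does not conform to {declared_type}")
            var_type = declared_type
        return [(x, var_type, False) for x in declared]
    if insert_implicit:
        return [(fresh_name(used), elem_type, True)]
    return []
```

For instance, with no declarator and `insert_implicit=True`, a single fresh variable of the element type is introduced and flagged as implicit, which corresponds to the underlined binding in the third rule.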

4.3 Property Calls

A property call (PropCall) $pc$ is annotated in an environment $\Sigma;\Gamma;T;e;(. \mid \texttt{->})$, meaning that it occurred in an expression context $e\,(. \mid \texttt{->})\,pc$ where $e : T$. According to the grammar, every property has an optional @pre. This does not affect the typing or annotation, however, so in this section we give all rules without @pre.

We distinguish between normal property calls (attributes, associations and properties from the user UML model or the OCL library), variable binding constructions (e.g. iterate), implicit collect, and meta-level operations (e.g. allInstances). All property calls are annotated with their PropType; variable binding constructions and implicit collect are also annotated as special constructions. The variable binding constructions could be seen as higher order functions, but in the PropType annotations we ignore this. We have chosen not to give the meta-level operations special annotation, but they do require special rules for type checking.

In variable binding constructions, there may or may not be variables in the Declarator, and the variables may or may not have typings. These cases are handled in the annotation of PropCallParams above, and make no difference in the PropCall rules of this section. Therefore, we give the rules only for when there are variables explicitly given, and the variables have typings.

Attributes/associations

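$$\frac{\mathit{lookup}_{\Sigma;\Gamma}(a, T) = T_1 \xrightarrow{A} T_2}{\Sigma;\Gamma;T;e;. \vdash a \triangleright a : T_1 \xrightarrow{A} T_2}$$

For singleton associations we have the same rule except that $\xrightarrow{A}$ is replaced with $\xrightarrow{St.}$.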

Queries

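$$\frac{\begin{array}{c}
\Sigma;\Gamma;T;\mathrm{False} \vdash (e_1, \ldots, e_n) \triangleright (e'_1 : T_1, \ldots, e'_n : T_n)\\
\mathit{lookup}_{\Sigma;\Gamma}(q, T, [T_1, \ldots, T_n]) = T' \xrightarrow{Q} T'_1 \rightarrow \cdots \rightarrow T'_{n+1}
\end{array}}{\Sigma;\Gamma;T;e;ao \vdash q(e_1, \ldots, e_n) \triangleright q([e'_1 : T_1]^{T_1}_{T'_1}, \ldots, [e'_n : T_n]^{T_n}_{T'_n}) : T' \xrightarrow{Q} T'_1 \rightarrow \cdots \rightarrow T'_{n+1}}$$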

Note that n might be 0 here.

Iterate

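$$\frac{\Sigma;\Gamma;\mathrm{Collection}(T);\mathrm{False} \vdash (x : T_1;\ acc : T_2 = e_2 \mid e_3) \triangleright (x : T_1;\ acc : T_2 = e'_2 \mid e'_3)}{\Sigma;\Gamma;\mathrm{Collection}(T);e_1;\texttt{->} \vdash \mathrm{iterate}(x : T_1;\ acc : T_2 = e_2 \mid e_3) \triangleright \mathbf{iterate}(x : T_1;\ acc : T_2 = e'_2 \mid e'_3) : \mathrm{Collection}(T) \xrightarrow{Q} T_2 \rightarrow T_2}$$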

Other variable binding collection operations

The rules for forAll, exists, one, isUnique, any, sortedBy, select, reject and collect are quite similar. We here give the ones for forAll and collect to exemplify.

In the informal descriptions in [7] (Sect. 6.6.3) and [8] (Sect. 7.6.3) it is explicitly said that for forAll, binding of several variables at once in the declarator is allowed, and there is no mention that other operations would allow it. This is apparently contradicted by [8] (Sect. 11.9) which lists certain constructions as only allowing one bound variable (the others would then implicitly allow several).

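$$\frac{\begin{array}{c}
\Sigma;\Gamma;\mathrm{Collection}(T);\mathrm{True} \vdash (x_1, \ldots, x_n : T_1 \mid e_2) \triangleright (x_1, \ldots, x_n : T_1 \mid e'_2 : T_2)\\
\Sigma \vdash T_2 <:_\Sigma \mathrm{Boolean}
\end{array}}{\Sigma;\Gamma;\mathrm{Collection}(T);e_1;\texttt{->} \vdash \mathrm{forAll}(x_1, \ldots, x_n : T_1 \mid e_2) \triangleright \mathbf{forAll}(x_1, \ldots, x_n : T_1 \mid [e'_2 : T_2]^{T_2}_{\mathrm{Boolean}}) : \mathrm{Collection}(T) \xrightarrow{Q} \mathrm{Boolean} \rightarrow \mathrm{Boolean}}$$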

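The collect construction is defined for Set, Bag, and Sequence, but not for Collection, so in this rule $C \in \{\mathrm{Set}, \mathrm{Bag}, \mathrm{Sequence}\}$. If $C$ is Sequence, $D$ is Sequence, otherwise $D$ is Bag.

$$\frac{\Sigma;\Gamma;C(T);\mathrm{True} \vdash (x : T_1 \mid e_2) \triangleright (x : T_1 \mid e'_2 : T_2)}{\Sigma;\Gamma;C(T);e_1;\texttt{->} \vdash \mathrm{collect}(x : T_1 \mid e_2) \triangleright \mathbf{collect}(x : T_1 \mid e'_2 : T_2) : C(T) \xrightarrow{Q} T_2 \rightarrow D(T_2)}$$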
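The choice of the result collection kind D in the collect rule is small enough to spell out. The following Python sketch is illustrative only; the function names and the ASCII rendering of the PropType are assumptions made for this example.

```python
def collect_result_kind(kind: str) -> str:
    """D in the collect rule: Sequence stays Sequence, Set and Bag become Bag."""
    if kind not in ("Set", "Bag", "Sequence"):
        # collect is not defined on the abstract kind Collection
        raise TypeError(f"collect is not defined for {kind}")
    return "Sequence" if kind == "Sequence" else "Bag"


def collect_prop_type(kind: str, elem_type: str, body_type: str) -> str:
    """ASCII rendering of the PropType C(T) -Q-> T2 -> D(T2)."""
    d = collect_result_kind(kind)
    return f"{kind}({elem_type}) -Q-> {body_type} -> {d}({body_type})"
```

Collecting over a Set yields a Bag because the body expression may map distinct elements to equal values.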

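Implicit collect

An expression of collection type followed by a dot and a property call might be an implicit collect property call. As with collect, $C \in \{\mathrm{Set}, \mathrm{Bag}, \mathrm{Sequence}\}$:

$$\frac{\Sigma;\Gamma;C(T);e_1;\texttt{->} \vdash \mathrm{collect}(pc) \triangleright \mathbf{collect}(\underline{i : T} \mid i.pc') : T_2}{\Sigma;\Gamma;C(T);e_1;. \vdash pc \triangleright \mathbf{implCollect}(pc') : T_2}$$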

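Meta-level operations

The meta-level operations involve OclType and class literals.

$$\frac{\Sigma;\Gamma;\mathrm{OclAny};\mathrm{False} \vdash (e_1) \triangleright (\mathbf{ClassLit}(T) : \mathrm{OclType})}{\Sigma;\Gamma;\mathrm{OclAny};e;. \vdash \mathrm{oclAsType}(e_1) \triangleright \mathrm{oclAsType}(\mathbf{ClassLit}(T) : \mathrm{OclType}) : \mathrm{OclAny} \xrightarrow{Q} \mathrm{OclType} \rightarrow T}$$

$$\frac{}{\Sigma;\Gamma;\mathrm{OclType};\mathbf{ClassLit}(T);. \vdash \mathrm{allInstances}() \triangleright \mathrm{allInstances}() : \mathrm{OclType} \xrightarrow{Q} \mathrm{Set}(T)}$$

$$\frac{}{\Sigma;\Gamma;\mathrm{OclType};e;. \vdash \mathrm{allSupertypes}() \triangleright \mathrm{allSupertypes}() : \mathrm{OclType} \xrightarrow{Q} \mathrm{Set}(\mathrm{OclType})}$$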

4.4 Expressions: Implicit property calls

The rule Expr ::= PropCall has to be disambiguated. It represents either a variable, a class literal or an implicit property call.


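Variables

Mark variables as being variables.

$$\frac{(x : T) \in \Gamma}{\Sigma;\Gamma \vdash x \triangleright \mathbf{Var}(x) : T}$$

Class literals

Mark class literals as being class literals.

$$\frac{\mathit{package}_\Gamma = p \qquad p{::}i \in \mathit{classes}_\Sigma}{\Sigma;\Gamma \vdash i \triangleright \mathbf{ClassLit}(i) : \mathrm{OclType}}$$

$$\frac{i_1{::}\ldots{::}i_n \in \mathit{classes}_\Sigma}{\Sigma;\Gamma \vdash i_1{::}\ldots{::}i_n \triangleright \mathbf{ClassLit}(i_1{::}\ldots{::}i_n) : \mathrm{OclType}}$$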

Implicit property call to self or bound variables

Here we try putting a self or an implicit bound variable in front of the property call. Implicit property calls can be ambiguous if self and an implicit bound variable (or two implicit bound variables) conform to the same type, in which case typechecking should fail. This is expressed as side conditions on the two rules.

$$\frac{\Sigma;\Gamma \vdash \mathit{self}.pc \triangleright e.pc' : T}{\Sigma;\Gamma \vdash pc \triangleright \mathbf{implicit}(\mathit{self}, pc') : T}$$

Side condition: There are no $x, pc', T$ such that $\Sigma;\Gamma \vdash pc \triangleright \mathbf{implicit}(\underline{x}, pc') : T$.

We assume here that implicit bound variables have non-collection types, hence we can safely use a dot (and not an arrow) when annotating the property call.

$$\frac{(\underline{x} : T_1) \in \Gamma \qquad \Sigma;\Gamma \vdash \underline{x}.pc \triangleright e.pc' : T_2}{\Sigma;\Gamma \vdash pc \triangleright \mathbf{implicit}(\underline{x}, pc') : T_2}$$

Side conditions: There are no $pc', T$ such that $\Sigma;\Gamma \vdash pc \triangleright \mathbf{implicit}(\mathit{self}, pc') : T$, and there is exactly one $\underline{x} \in \Gamma$ such that there exist $e, pc', T$ such that $\Sigma;\Gamma \vdash \underline{x}.pc \triangleright e.pc' : T$.
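Taken together, the two side conditions say that exactly one receiver, either self or a single implicit bound variable, may own the property. A minimal Python sketch of that check follows; the `PROPS` table and the function name are hypothetical, and property lookup is reduced to set membership without inheritance.

```python
from typing import Dict, Set

# Hypothetical model: the properties declared on each class.
PROPS: Dict[str, Set[str]] = {
    "Person": {"age", "name"},
    "Account": {"balance"},
}


def resolve_implicit(prop: str, self_type: str, implicit_vars: Dict[str, str]) -> str:
    """Return 'self' or the name of the unique implicit bound variable owning prop.

    implicit_vars maps each implicit bound variable in scope to its type.
    Typechecking fails (TypeError) if there is no candidate or more than one.
    """
    candidates = []
    if prop in PROPS.get(self_type, set()):
        candidates.append("self")
    for var, var_type in implicit_vars.items():
        if prop in PROPS.get(var_type, set()):
            candidates.append(var)
    if len(candidates) != 1:
        raise TypeError(f"cannot resolve implicit receiver of {prop!r}: {candidates}")
    return candidates[0]
```

With `self` of type Account and an implicit bound variable `p` of type Person, `age` resolves to `p` and `balance` to `self`; if `self` were also a Person, `age` would be rejected as ambiguous.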

4.5 Expressions: Explicit property calls

We annotated the category PropCall separately in Sect. 4.3. This means that we annotate an explicit property call expression $e\,(. \mid \texttt{->})\,pc$ by annotating the property call $pc$, and then explicitly coercing $e$ into the correct type. Note that when annotating $pc$ in the context $e\,(. \mid \texttt{->})\,pc$, the environment is extended with the type of $e$, $e$ itself, and also the . or ->. Here $ao \in \{., \texttt{->}\}$.

$$\frac{\Sigma;\Gamma \vdash e \triangleright e' : T \qquad \Sigma;\Gamma;T;e';ao \vdash pc \triangleright pc' : PT}{\Sigma;\Gamma \vdash e\ ao\ pc \triangleright [e' : T]^{T}_{\mathit{recType}(PT)}\ ao\ pc' : \mathit{retType}(PT)}$$


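Singleton associations

The only special case is e->pc when e is a singleton association property call, in which case we need to treat e as a singleton set ($a, a' \in \mathit{PropCall}$):

$$\frac{\begin{array}{c}
\Sigma;\Gamma \vdash e \triangleright e' : T \qquad e' = e''.a \ \text{ or } \ e' = \mathbf{implicit}(e'', a)\\
a = a' : T_1 \xrightarrow{St.} T_2 \qquad \Sigma;\Gamma;\mathrm{Set}(T);\{e'\}_1;\texttt{->} \vdash pc \triangleright pc' : PT
\end{array}}{\Sigma;\Gamma \vdash e\ \texttt{->}\ pc \triangleright [\{e'\}_1 : \mathrm{Set}(T)]^{\mathrm{Set}(T)}_{\mathit{recType}(PT)}\ \texttt{->}\ pc' : \mathit{retType}(PT)}$$

In contrast to [3], we do not allow any non-collection expression to be considered as a set, but only direct property calls to singleton associations.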

4.6 Expressions: other cases

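The rules for infix, prefix, literal and if-then-else expressions should be unsurprising. We give only the rule for if-then-else here, as an example.

$$\frac{\begin{array}{c}
\Sigma;\Gamma \vdash e_1 \triangleright e'_1 : T_1 \qquad \Sigma \vdash T_1 <:_\Sigma \mathrm{Boolean}\\
\Sigma;\Gamma \vdash e_2 \triangleright e'_2 : T_2 \qquad \Sigma;\Gamma \vdash e_3 \triangleright e'_3 : T_3 \qquad \sqcup_\Sigma\{T_2, T_3\} = T_4
\end{array}}{\Sigma;\Gamma \vdash \mathrm{if}\ e_1\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3\ \mathrm{endif} \triangleright \mathrm{if}\ [e'_1 : T_1]^{T_1}_{\mathrm{Boolean}}\ \mathrm{then}\ [e'_2 : T_2]^{T_2}_{T_4}\ \mathrm{else}\ [e'_3 : T_3]^{T_3}_{T_4}\ \mathrm{endif} : T_4}$$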

4.7 Other Categories

We do not present the rules for the categories LetExp, ConstrBody, Constraint or OclPackage. These rules mainly deal with adding let-definitions, self, result, formal parameters and the name of the current package to the context.

4.8 KeY extensions

Our system is developed as part of the KeY system [1], in which OCL is used for specifications of JavaCard programs. To handle the JavaCard concepts of null values and exceptions, the constructions null and excThrown are used in KeY OCL specifications.

We add a class Null to the OCL library package, an expression null of type Null, and let Null be a subtype of all other types. We consider null as a reserved word in both unannotated and annotated OCL.

$$\frac{}{\Sigma;\Gamma \vdash \mathrm{null} \triangleright \mathrm{null} : \mathrm{Null}}$$

Null and null seem to work just as Void and undef in [3].


We handle excThrown as a new property of the OCL library class OclAny, taking an argument of type OclType, and returning Boolean. Assuming a user UML model containing JavaCard packages, excThrown(java::lang::Exception) can be used as a predicate stating that an exception of type java::lang::Exception has been thrown. This is an implicit self property call, and since self always conforms to OclAny it can be used in any context. We use the following rule for the property call excThrown:

$$\frac{\Sigma;\Gamma \vdash e_1 \triangleright \mathbf{ClassLit}(T_1) : \mathrm{OclType} \qquad \Sigma \vdash T_1 <:_\Sigma \mathrm{java{::}lang{::}Exception}}{\Sigma;\Gamma;T;e;. \vdash \mathrm{excThrown}(e_1) \triangleright \mathrm{excThrown}(\mathbf{ClassLit}(T_1) : \mathrm{OclType}) : \mathrm{OclAny} \xrightarrow{Q} \mathrm{OclType} \rightarrow \mathrm{Boolean}}$$

5 Implementation

The system for annotating OCL syntax trees is implemented in the functional language Haskell, hence it can be used as a component of Haskell programs. We also provide a command line interface taking as input two text files: one containing OCL and another containing the UML model (for which we have defined a simple custom format). The output is a textual representation of the typechecked and annotated OCL file, or a message giving a type error.

The textual representation used in the output of the command line interface is by default normal OCL syntax, but extended with our annotations. To use our implementation with another, non-Haskell program, a format which is more suitable for parsing might be preferred. We have defined an example of such a simple interchange format, which is supported by the command line tool and for which we provide a Java parser.

The parsers used in our system are generated from context free grammars using the BNF converter (BNFC, [6]), a front-end to standard lexer and parser generators for Haskell, Java, C++, and C. The grammar used for generating the OCL parser is based on the grammars given in [7] and [5]. We are investigating using the LALR grammar in [2] instead.

Using BNFC makes it simple to experiment with e.g. adjustments to the OCL grammar, or to define tailor-made interchange formats for the command line interface between our Haskell based system and other systems in Java, C++ or C.

The implementation is available on the web [11].

6 Conclusion

We have presented a rule system for type checking and semantic annotation of OCL, with disambiguation of implicit property calls and implicit bound variables as the main feature. It is intended to function as a component of a larger system, sitting in between an OCL parser and some other component performing transformations on OCL specifications. So far it has only been tested in ongoing work in the KeY project [1] with translating OCL specifications to Natural Language. Besides adding features of OCL 1.5 and 2.0 currently missing from the system, this is the most important line of future work: to try to link it to other parts of the KeY project which handle OCL, e.g. the transformation of OCL to Dynamic Logic, and ongoing work with partial evaluation of OCL.

Acknowledgements: We thank Aarne Ranta for discussions on drafts of this paper, Daniel Larsson for feedback on the implementation, and the anonymous reviewers for suggestions on improving the paper.

References

1. Wolfgang Ahrendt, Thomas Baar, Bernhard Beckert, Richard Bubel, Martin Giese, Reiner Hähnle, Wolfram Menzel, Wojciech Mostowski, Andreas Roth, Steffen Schlager, and Peter H. Schmitt. The KeY tool. Software and System Modeling, 2004. Online First issue, to appear in print.

2. David Akehurst and Octavian Patrascoiu. OCL: Implementing the Standard. In OCL 2.0 - "Industry standard or scientific playground?" - Proceedings of the UML'03 workshop, page 19. Electronic Notes in Theoretical Computer Science, November 2003.

3. María Victoria Cengarle and Alexander Knapp. OCL 1.4/1.5 vs. 2.0 expressions: Formal semantics and expressiveness. In Softw. Syst. Model., 2004.

4. Tony Clark. Type checking UML static diagrams. In Robert France and Bernhard Rumpe, editors, UML'99 - The Unified Modeling Language. Beyond the Standard. Second International Conference, Fort Collins, CO, USA, October 28-30, 1999, Proceedings, volume 1723 of LNCS, pages 503–517. Springer, 1999.

5. Frank Finger. Dresden OCL toolkit homepage, 2002. http://dresden-ocl.sourceforge.net/.

6. Markus Forsberg and Aarne Ranta. The BNF converter: A high-level tool for implementing well-behaved programming languages. In NWPT'02 proceedings, Proceedings of the Estonian Academy of Sciences, 2003.

7. Object Management Group. OCL 1.5 specification, 2003. http://www.omg.org/cgi-bin/apps/doc?formal/03-03-13.pdf.

8. Object Management Group. OCL 2.0 specification, 2003. http://www.omg.org/cgi-bin/apps/doc?ptc/03-10-14.pdf.

9. Reiner Hähnle, Kristofer Johannisson, and Aarne Ranta. An authoring tool for informal and formal requirements specifications. In R.-D. Kutsche and H. Weber, editors, Fundamental Approaches to Software Engineering, number 2306 in LNCS, 2002.

10. Heinrich Hussmann, Birgit Demuth, and Frank Finger. Modular architecture for a toolset supporting OCL. In Andy Evans, Stuart Kent, and Bran Selic, editors, Proc. 3rd Int. Conf. on the Unified Modeling Language, LNCS 1939, pages 278–293. Springer, 2000.

11. Kristofer Johannisson. OCL tool implementation homepage, 2004. http://www.cs.chalmers.se/~krijo/ocltc/.

12. Aarne Ranta. Grammatical Framework: A Type-theoretical Grammar Formalism. The Journal of Functional Programming, 14(2):145–189, 2004.


Paper 4.

Translating Formal Software Specifications to Natural Language: A Grammar-Based Approach

David A. Burke and Kristofer Johannisson. Translating formal specifications to natural language: a grammar-based approach. In Proceedings of LACL 2005, edited by Philippe Blache and Edward Stabler, number 3492 in Springer LNAI, 2005 (forthcoming).


Translating Formal Software Specifications to Natural Language

A Grammar-Based Approach

David A. Burke and Kristofer Johannisson

Department of Computing Science, Chalmers University of Technology and Göteborg University, SE-41296 Göteborg, Sweden

Abstract. We describe a system for automatically translating formal software specifications to natural language. The system produces natural language which is acceptable to a human reader, and it supports by-hand optimization by users who are not experts in our system. The translation system is implemented using the Grammatical Framework, a grammar formalism based on Martin-Löf's type theory. We show that this grammar-based approach scales well enough to handle a non-trivial case study: translating the Object Constraint Language specifications of the Java Card API into English.

1 Introduction

The goal of this work is to automatically translate formal software specifications into natural language. Our motivation is a wish to link formal specifications (as needed for formal methods) to informal ones (as found in software engineering practice). Our work is a part of the KeY project [1], which integrates formal software specification and verification into the industrial software engineering processes.

We have implemented a system, earlier described in [2], using the Grammatical Framework (GF), a grammar formalism based on Martin-Löf's type theory [3, 4]. In this paper we show that our grammar-based approach scales to such a degree that we can handle a non-trivial case study.

The case study consists of specifications for the Java Card API [5], written in the Object Constraint Language, which have been translated into English by our system. To improve the quality of the translation, we have extended our system with formatting and automatic generation of grammar modules for domain-specific vocabulary, which can then be modified without requiring GF expertise. We have also added various simple stylistic improvements inspired by techniques familiar from Natural Language Generation [6]. As far as possible, all these improvements are implemented in a declarative way in the GF grammars; some of them also require manipulation of syntax trees by a separate program, external to the grammars.


1.1 Paper Overview

We start with background on the GF formalism, formal specifications, the KeY project, and the Java Card API specification case study in Sect. 2. In Sect. 3 we then give a motivating example from the case study, showing an example formal specification, as well as English translations before and after our improvements.

Sect. 4 explains the overall architecture of the translation system, while Sect. 5 is concerned with grammar engineering: how to design the grammar-based system to meet our goals.

Sect. 6 describes related work, mainly Natural Language Generation. Some figures on the size of the case study are given in Sect. 7, and we then conclude in Sect. 8.

2 Background

2.1 The Grammatical Framework

The GF Formalism. The Grammatical Framework (GF) is a formalism for defining grammars [3]. A GF grammar consists of one part which describes abstract syntax, and another part which describes concrete syntax. The abstract syntax part is formulated in a version of Martin-Löf's type theory [4], and can be seen as a description of how to construct abstract syntax trees. The concrete syntax then consists of linearization rules telling how to present these trees as expressions of a particular language. This is a distinguishing feature of GF as compared to many other grammar formalisms: grammars are written from the perspective of linearization rather than parsing. In fact, we can consider the GF formalism as a linearization (or generation) oriented typed functional language.

The concrete syntax is based on record types, strings and finite parameter types, enabling the representation of e.g. inflection tables and discontinuous constituents. A central notion in GF is compositionality: the linearization of a tree is always expressed in terms of the linearization of its subtrees; we have no access to the subtrees themselves in a linearization rule. This restriction is important for the implementation of GF.

By having multiple concrete syntaxes for the same abstract syntax we achieve multilinguality: we can present the same tree in several languages in parallel, and we can translate (within the language fragment described by the grammar) by parsing using one concrete syntax and linearizing with another. Compositionality imposes a restriction of structural similarity on the languages sharing the same abstract syntax. However, this restriction is to some degree countered by the expressiveness of the concrete syntax.
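The idea of one abstract syntax with several compositional concrete syntaxes can be mimicked in a few lines of Python. This is a toy model, not GF: real GF linearizations are records and tables rather than plain strings, and the tree, the function names `AtMost`, `TryCounter` and `MaxTries`, and both rule tables are made up for illustration.

```python
# Abstract syntax trees as nested tuples: (function_name, *subtrees).
TREE = ("AtMost", ("TryCounter",), ("MaxTries",))

# A concrete syntax maps each abstract function to a rule that receives only
# the already-linearized subtrees (strings), never the subtrees themselves.
OCL = {
    "AtMost": lambda a, b: f"{a} <= {b}",
    "TryCounter": lambda: "self.tryCounter",
    "MaxTries": lambda: "self.maxTries",
}

ENGLISH = {
    "AtMost": lambda a, b: f"{a} is at most {b}",
    "TryCounter": lambda: "the try counter",
    "MaxTries": lambda: "the maximum number of tries",
}


def linearize(tree, concrete):
    """Compositional linearization: linearize the subtrees, then apply the rule."""
    fun, *args = tree
    return concrete[fun](*(linearize(arg, concrete) for arg in args))
```

Linearizing `TREE` with `OCL` and with `ENGLISH` presents the same tree in two languages; translation in the GF sense would additionally require parsing one of the strings back into the tree, which this sketch omits.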

The GF System. The GF system [7] provides functionality such as parsing and linearization for grammars written in the GF formalism. The system also includes a syntax editor [8] in which the user can load a GF grammar and then edit the abstract syntax trees described by the grammar. The trees are continuously presented in the languages defined by the concrete syntaxes of the grammar. By editing the abstract syntax tree and observing the results in a familiar language, a user can then interactively produce texts in foreign languages.

The GF Resource Grammar Library. An important part of the GF project is the resource grammar library [9], which provides an API of types and functions for common linguistic structures. There are resource grammars available for English, Finnish, French, German, Italian, Russian and Swedish, which to a large extent share the same interface.

Grammar Engineering. A typical GF application grammar describes a well-defined fragment of natural language for a restricted domain, e.g. in our case software specifications. The resource grammar library provides a division of labour: the author of an application grammar can be a domain expert, who does not need to be familiar with linguistic details. His or her task is to come up with an abstract syntax which models the domain, and to link the abstract syntax to concrete language by using the resource grammars. The linguistic expert is in turn responsible for the implementation of the resource grammars, where no knowledge of a particular domain is needed.

2.2 Formal Specifications and the KeY Project

The KeY Project. The KeY project [1] attempts to integrate formal software specification and verification into the industrial software engineering processes. The starting point is a commercial CASE (Computer Aided Software Engineering) tool, which is augmented by capabilities for formal specification and verification. The ultimate goal is to make the verification process transparent for the user with respect to the informal object-oriented model.

Formal and Informal Specifications. Formal methods require formal specifications, but in software engineering practice, informal specifications are commonly used. We cannot expect everyone who needs to deal with specifications (e.g. customers, managers, or software engineers) to master a formal notation (cf. [10] p. 131 "... most customers don't understand formal specifications and are reluctant to accept it as a system contract"). This motivates the need for a systematic link between formal and informal specifications: to support authoring of specifications as well as synchronizing and maintaining formal and informal versions of specifications, and to present the specifications to different audiences using different levels of formality.

The Object Constraint Language. The Object Constraint Language (OCL) is a formal specification language used to specify precise requirements for object-oriented software systems [11]. It is a sub-standard of the Unified Modelling Language (UML) [12]. An OCL specification is always given in the context of some particular UML model,1 and it consists of a boolean expression which is used as an invariant of a class, or as a pre-condition or post-condition of a method. The attributes, queries and associations from the UML model, as well as a library of predefined types (e.g. integers, strings and collections) are available for constructing OCL expressions. For example, given a class OwnerPIN, with attributes tryCounter and maxTries, we can specify the requirement “the try counter is at most the maximum number of tries” using the OCL:

1 In this paper, a UML model is simply a class diagram, containing classes, attributes, methods and associations. Side-effect free methods are called queries.

context OwnerPIN
inv: self.tryCounter <= self.maxTries

As we see in this example, each OCL expression is given in the context of a particular class, and the expression self refers to an instance of that class.
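To make the reading of this invariant concrete, here is a minimal Python sketch that evaluates the same condition on an object. The dataclass and the helper name check_invariant are hypothetical and only mirror the attribute names of the OCL example; OCL itself is declarative and is not executed this way.

```python
from dataclasses import dataclass


@dataclass
class OwnerPIN:
    # Attribute names mirror the OCL example; the class is illustrative only.
    tryCounter: int
    maxTries: int


def check_invariant(self_: OwnerPIN) -> bool:
    """inv: self.tryCounter <= self.maxTries"""
    return self_.tryCounter <= self_.maxTries


if __name__ == "__main__":
    print(check_invariant(OwnerPIN(tryCounter=2, maxTries=3)))  # invariant holds
    print(check_invariant(OwnerPIN(tryCounter=4, maxTries=3)))  # invariant violated
```

The parameter self_ plays the role of the OCL expression self: it stands for an arbitrary instance of the context class.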

2.3 The Java Card API Specification

Java Card technology [5] allows software developers to write Java programs that run on smart cards and other devices with very limited memory and processing capabilities. The Java Card API (Application Programming Interface) is a set of library classes used in Java Card programs. It is a subset of the standard Java API and is specifically designed for smart card programming.

Due to the size and nature of the applications that use Java Card, formal methods could be useful in verifying the correctness of these programs. With this in mind, OCL constraints have been defined for the Java Card API in [13] (based on JML specifications in [14]). These OCL specifications provided the basis for a case study using our translation tool. The specifications of 37 Java classes were fully translated, and examples from this are used throughout this paper. Details on the case study are available in [15] and on the web [16].

3 Motivating Example

In this section we take an OCL specification from the Java Card API case study (Fig. 1) and show how it gets translated into natural language. For comparison, we start with a translation produced with an earlier version of our system (Fig. 2). We then discuss some possible improvements of this translation, which lead to the translation in Fig. 3, the output of the current system. The machinery behind the improvements will be explained later.

3.1 The OCL Specification

We consider the OCL specification for the method check of the class OwnerPIN. OwnerPIN stores the PIN code of a smart card, and keeps track of the maximal number of attempts allowed to present the correct PIN before the card is locked. The purpose of the method check is to compare a given PIN number with the PIN value in the OwnerPIN class itself. If they match and the PIN is not blocked, it sets the validated flag and resets the try counter to its maximum. If it does not match, it decrements the try counter and, if the counter has reached zero, blocks the PIN. The try counter is specified in the first element of the attribute triesLeft, which is an array. The validated flag can be accessed using the isValidated() method. The PIN comparison can be done using the arrayCompare() method which is defined in the Util class of the Java Card API. Fig. 1 shows the OCL specification of check (including a definition of a helper attribute tryCounter).

context OwnerPIN
def: let tryCounter = self.triesLeft->at(1)

context OwnerPIN::check(pin: Sequence(Integer),
                        offset: Integer, length: Integer): Boolean
post: self.tryCounter = 0 implies result = false
post: (self.tryCounter > 0 and pin <> null and offset >= 0 and length >= 0
       and offset+length <= pin->size()
       and Util.arrayCompare(self.pin, 0, pin, offset, length) = 0
      ) implies (result = true and self.isValidated() and tryCounter = maxTries)
post: (self.tryCounter > 0 and not (pin <> null and offset >= 0 and length >= 0
       and offset+length <= pin->size()
       and Util.arrayCompare(self.pin, 0, pin, offset, length) = 0)
      ) implies (not self.isValidated() and self.tryCounter = tryCounter@pre-1 and
                 (( not excThrown(java::lang::Exception) and result = false)
                  or excThrown(java::lang::NullPointerException)
                  or excThrown(java::lang::ArrayIndexOutOfBoundsException)))

Fig. 1. OCL specification from the Java Card API
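As a reading aid for Fig. 1, the following toy Python model implements the behaviour that the prose describes and that the three postconditions constrain. It is a sketch under stated assumptions, not the Java Card implementation: the class name OwnerPINModel is hypothetical, Python's TypeError and IndexError stand in for NullPointerException and ArrayIndexOutOfBoundsException, and list slicing stands in for Util.arrayCompare.

```python
class OwnerPINModel:
    """Toy model of the behaviour described in the text (not Java Card code)."""

    def __init__(self, pin, max_tries):
        self.pin = list(pin)
        self.max_tries = max_tries
        self.try_counter = max_tries
        self.validated = False

    def check(self, candidate, offset, length):
        if self.try_counter == 0:
            # First postcondition: a blocked PIN always yields false.
            return False
        # Assume failure first: the third postcondition requires the counter
        # to drop by one even when an exception is thrown.
        self.try_counter -= 1
        self.validated = False
        if candidate is None:
            raise TypeError("pin is null")  # stand-in for NullPointerException
        if offset < 0 or length < 0 or offset + length > len(candidate):
            # stand-in for ArrayIndexOutOfBoundsException
            raise IndexError("offset/length outside the candidate array")
        if list(candidate[offset:offset + length]) == self.pin[:length]:
            # Second postcondition: validated, counter back at its maximum.
            self.validated = True
            self.try_counter = self.max_tries
            return True
        return False
```

For example, with a stored PIN [1, 2, 3, 4] and two allowed tries, a matching call leaves the counter at its maximum, while each failing call (including one that raises) lowers it by one until every further call returns False.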

3.2 A First Attempt

In Fig. 2 we show the translation of the OCL specification produced by an earlier version of our system. The English text is basically correct, but it is clumsy and very hard to read.

for the class OwnerPIN introduce the following definition : the tryCounter is defined as the element at index 1 of the triesLeft of the ownerPIN for the operation check ( pin : Seq(Integer) , offset : Integer , length : Integer ) : Boolean of the class javacard::framework::OwnerPIN the following holds : the following postconditions should hold : (*) if the tryCounter of the ownerPIN is equal to 0 , the result is equal to false (*) if the tryCounter of the ownerPIN is greater than 0 and pin is not equal to null and offset is at least 0 and length is at least 0 and offset plus length is at most the size of pin and the query arrayCompare ( the pin of the ownerPIN , 0 , pin , offset , length ) to Util is equal to 0 , the result is equal to true and the query isValidated ( ) holds for the ownerPIN and the tryCounter of the ownerPIN is equal to the maxTries of the ownerPIN (*) if the tryCounter of the ownerPIN is greater than 0 and it is not the case that pin is not equal to null and offset is at least 0 and length is at least 0 and offset plus length is at most the size of pin and the query arrayCompare ( the pin of the ownerPIN , 0 , pin , offset , length ) to Util is equal to 0 , it is not the case that the query isValidated ( ) holds for the ownerPIN and the tryCounter of the ownerPIN is equal to the tryCounter of the ownerPIN at the beginning of the Operation minus 1 and it is not the case that an exception is thrown and the result is equal to false or a nullPointerException is thrown or an arrayIndexOutofBoundsException is thrown

Fig. 2. Translation of OCL specification (before)


3.3 An Improved Translation

Fig. 3 shows an improved version of the translation: the output of the current version of our system. Below we go through the improvements made. Each one is quite simple in itself, but the end result is in our opinion a text of acceptable quality, which shows that our approach works for non-trivial specifications.

for the class OwnerPIN introduce the following definition :

– the try counter is defined as the element at index 1 of the triesLeft attribute

for the operation check ( pin : Sequence(Integer) , offset : Integer , length : Integer ) : Boolean of the class javacard::framework::OwnerPIN , the following post-conditions should hold :

– if the try counter is equal to 0 then this implies that the result is equal to false
– if the following conditions are true
    • the try counter is greater than 0
    • pin is not equal to null
    • offset is at least 0
    • length is at least 0
    • offset plus length is at most the size of pin
    • the query arrayCompare ( the pin , 0 , pin , offset , length )1 on Util is equal to 0
  then this implies that the following conditions are true
    • the result is equal to true
    • this owner PIN is validated
    • the try counter is equal to the maximum number of tries
– if the try counter is greater than 0 and at least one of the following conditions is not true
    • pin is not equal to null
    • offset is at least 0
    • length is at least 0
    • offset plus length is at most the size of pin
    • the query arrayCompare ( the pin , 0 , pin , offset , length )2 on Util is equal to 0
  then this implies that the following conditions are true
    • this owner PIN is not validated
    • the try counter is equal to the previous value of the try counter minus 1
    • at least one of the following conditions is true
        ∗ an exception is not thrown and the result is equal to false
        ∗ a null pointer exception is thrown
        ∗ an array index out of bounds exception is thrown

1 Compares the specified source array, beginning at the specified position, with the destination array beginning at the specified position from left to right. A result of 0 indicates that the arrays are equal.

2 Compares the specified source array, beginning at the specified position, with the destination array beginning at the specified position from left to right. A result of 0 indicates that the arrays are equal.

Fig. 3. Translation of OCL specification (after)

Formatting. Two of the most important problems in Fig. 2 are (1) the specification is just a big piece of text, where the structure is very hard to discern, and (2) it is hard or impossible to determine the scope of each “and” and “or”, which makes the specification ambiguous. To address these problems we introduce formatting: line breaks are inserted, keywords are printed in bold and arguments to the method are italicized. Furthermore, lists of constraints, as well as sequences of and/or statements, are made into bullet lists. The formatting consists of HTML or LaTeX tags in the text; what we see in Fig. 3 is the LaTeX version.

Negation. The scope of the negations is hard to determine, and negating sentences as in e.g. “it is not the case that an exception is thrown” is a clumsy construction. Using “an exception is not thrown” instead solves both these problems.
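One way to picture the difference between the two styles of negation is a clause that carries both a positive and a negative verb phrase, so that negating it selects a form instead of wrapping the whole sentence. The Python sketch below illustrates that idea only; the Clause type and the function names are hypothetical, and the text does not state that the grammar is implemented this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    subject: str
    positive: str  # verb phrase used when the clause is asserted
    negative: str  # verb phrase used when the clause is negated


def linearize(clause: Clause, negated: bool = False) -> str:
    """Negation picks the negative verb phrase, so its scope is one clause."""
    verb_phrase = clause.negative if negated else clause.positive
    return f"{clause.subject} {verb_phrase}"


def linearize_clumsy(clause: Clause) -> str:
    """Sentence-level negation, in the style of Fig. 2."""
    return f"it is not the case that {linearize(clause)}"


thrown = Clause("an exception", "is thrown", "is not thrown")
print(linearize_clumsy(thrown))
print(linearize(thrown, negated=True))
```

Because the negative form attaches to a single clause, a reader never has to guess how far a leading “it is not the case that” extends.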

Making Use of the Context. In Fig. 2 the text “of the ownerPIN” is used very frequently. Since the specification is given as postconditions in the context of a method of the class OwnerPIN, we should be able to just leave out all occurrences of “of the ownerPIN”, resulting in a much less repetitive text. (In OCL we are allowed to do the same thing, by leaving out self.) This can be seen as a simple case of referring expression generation [6].
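The rule described here can be sketched in a few lines of Python. The function attribute_phrase and its parameters are hypothetical names chosen for illustration; the sketch only shows the decision itself, namely to drop the owner when it coincides with the context class.

```python
def attribute_phrase(attribute: str, owner: str, context: str) -> str:
    """Mention the owner of an attribute only when it is not the context class."""
    if owner == context:
        return f"the {attribute}"
    return f"the {attribute} of the {owner}"


# Inside a specification whose context is the owner PIN class:
print(attribute_phrase("try counter", "owner PIN", context="owner PIN"))
# The same attribute mentioned from some other (hypothetical) context:
print(attribute_phrase("try counter", "owner PIN", context="applet"))
```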

Domain-Specific Vocabulary. Based on the type and capitalization of identifiers, we can improve the translation of domain-specific vocabulary. For instance, the class OwnerPIN can be automatically translated as “owner PIN”, instead of just “ownerPIN” as in Fig. 2.

The attributes triesLeft and maxTries are similarly translated as “the tries left” and “the max tries” by default. Although these translations are quite adequate, we can improve them even further by making manual changes. Thus, using our system we can change the translations by hand to “the triesLeft attribute” and “the maximum number of tries” for these attributes.

Two methods are used in this OCL constraint: isValidated and arrayCompare. For isValidated, which returns a boolean, we introduce some simple heuristics which by default translate it to “... is validated” instead of “the query isValidated() holds”. The second method, arrayCompare, gets linearized to “the query arrayCompare ( the pin , 0 , pin , offset , length ) on Util”. Unfortunately, this method is not so easily translated into simple English: the task it carries out does not fit nicely as part of the translated constraint. To solve this problem, we use the ‘note’ facility provided in the grammar. We can add a note for the method by hand, and this will then be displayed as a tool-tip when HTML formatting is used or, as in this case, as a footnote when LaTeX formatting is used (a current limitation is the needless duplication of footnotes).

4 System Overview

The system is built around a GF grammar for specifications: there is an abstract syntax giving rules for how to form abstract syntax trees of specifications, as well as three concrete syntaxes to present abstract syntax trees in OCL, English and German, respectively.


Given this grammar, the GF system provides us with a syntax editor where we can edit specifications in OCL, English and German in parallel. We also get parsers and linearizers for OCL and (fragments of) English and German, which we can use for translating e.g. OCL specifications into English, by first parsing OCL into an abstract syntax tree, and then linearizing the tree into English.
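The idea of one abstract syntax tree with several concrete presentations can be illustrated with a minimal Python sketch. It covers only the linearization half (parsing is omitted), handles a single construction, and uses hypothetical names (Attr, AtMost, lin_ocl, lin_eng); it is not how GF represents grammars internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    name: str      # identifier as it appears in OCL
    english: str   # phrase used in the English presentation


@dataclass(frozen=True)
class AtMost:
    left: Attr
    right: Attr


def lin_ocl(tree) -> str:
    """Present the abstract tree as OCL text."""
    if isinstance(tree, Attr):
        return f"self.{tree.name}"
    return f"{lin_ocl(tree.left)} <= {lin_ocl(tree.right)}"


def lin_eng(tree) -> str:
    """Present the same abstract tree as English text."""
    if isinstance(tree, Attr):
        return f"the {tree.english}"
    return f"{lin_eng(tree.left)} is at most {lin_eng(tree.right)}"


tree = AtMost(Attr("tryCounter", "try counter"),
              Attr("maxTries", "maximum number of tries"))
print(lin_ocl(tree))
print(lin_eng(tree))
```

Translating from OCL to English then amounts to recovering the tree from the OCL text and calling the English linearizer on it.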

However, our system does not just consist of a GF grammar and the functionality provided directly by GF. What makes things more complex is that (1) parts of the grammar are dynamically generated depending on the context (the grammar is not closed), (2) we use a separate program for parsing OCL specifications, and (3) we use a separate program for turning them into GF abstract syntax trees. These external programs (as well as the GF system itself) are implemented in the functional language Haskell. Fig. 4 shows the overall structure of the system.

There are two current prototypes of our system: one which allows syntax editing of specifications in OCL and English inside KeY, another which allows batch-translation from OCL to English (yet to be integrated more closely with KeY). The concrete German grammar has so far only been used for small examples [17]. The system is available for download [16].

Syntax editing of specifications is briefly described in [2], except for the integration with the KeY system. In this paper we focus on translating existing OCL specifications into English.

[Diagram. Inputs: OCL text, UML model. Steps and artefacts: parsing / typechecking, Syntax Tree, GF syntax tree, GF grammar modules (generated), GF grammar modules (static), GF. Outputs: OCL, English, German.]

Fig. 4. From OCL to Natural Language

4.1 External Programs

Grammar Generation. An OCL specification uses domain-specific vocabulary as defined by a UML model. When a user adds e.g. a class or an attribute to the model, he or she also extends the language of specifications of that model. We therefore generate GF grammar modules from the UML model to dynamically extend the grammar with domain-specific vocabulary. This is described in more detail in Sect. 5.2. The general idea of dynamically extending a grammar with user-defined concepts is also used in [18].

External Parser. Given an OCL specification and a UML model, the OCL is first parsed using a standard context-free parser, and then type checked with respect to the UML model, resulting in an annotated syntax tree of the OCL specification. This step is described in detail in [19].

The original motivation for adding an external parser was that the parser derived by GF for our particular grammar had termination problems.2 However, there are also more general reasons for using an external parser: (1) Efficiency: An external, context-free parser for a formal language (in our case OCL) is more efficient than a parser derived from a GF grammar. (2) Modularity: The GF abstract syntax does not have to handle all particularities of OCL. For instance, OCL has various implicit forms which require disambiguation (see e.g. [19]). This can be done by the external parser and typechecker.

Transformation Into GF Trees. The annotated trees returned by the external parser are transformed into GF abstract syntax trees. The context-free structure for the most part maps into the GF abstract syntax in a straightforward way. However, we also perform some structural transformations in order to improve the quality of the natural language, e.g. the transformation described in Sect. 5.3. These transformations could probably be avoided by extending the GF abstract and concrete syntax instead, but we believe that expressing the transformations in the Haskell programming language instead of the GF formalism is in this case a simpler and more modular solution.

5 Grammar Engineering

We start this section by giving a very brief introduction to our GF grammar, to give a general idea of what writing an application grammar for specifications amounts to (Sect. 5.1). We then explain how the improvements described in the motivating example section are implemented in the GF grammar. There is not enough room for describing everything, so we give two representative examples: dynamically extending the grammar with domain-specific concepts (Sect. 5.2), and formatting (Sect. 5.3). Throughout this section, we make use of excerpts from the grammars without explaining all details of the GF formalism.

5.1 An Application Grammar for Specifications

Representing Specifications: Abstract Syntax. In a typical GF application grammar, the abstract syntax part is used for defining a semantic domain, without any linguistic considerations. We define categories (types) and functions which give the rules for how to form trees in these categories. In our case, we are interested in the domain of (OCL) specifications. We consider OCL specifications as expressions formed using the attributes and queries of the classes in the UML model (including the predefined OCL types). For each class in the UML model, there will be a corresponding GF function c as well as a (dependent) category Instance c representing expressions of that class. As an example, this is how the size query of the OCL library class String (which returns the length of a string as an integer) is represented in GF abstract syntax judgements:

2 As explained in [2], this is because our grammar makes use of dependent types in such a way that the derived GF parser, which in a first step disregards dependent types, contained cyclic rules. In the meantime, this problem has been given a general solution in [20] (the implementation of which is in progress).

cat Class;
cat Instance (c:Class);
fun StringC, IntegerC : Class;
fun size : Instance StringC -> Instance IntegerC;

This defines the GF function size as taking trees of type Instance StringC and returning something of type Instance IntegerC. Note that with the dependent category Instance, we have introduced type-checking into the grammar: only trees representing type-correct specifications can be built. As already mentioned, this leads to complications for the GF derived parser, see [2] for an explanation of this.
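The effect of indexing Instance by a class can be imitated in Python with a runtime check, as in the sketch below. This is only an analogy: GF rejects ill-typed trees when they are built by the type checker, whereas this code raises an error at run time. The constructor string_literal is a hypothetical helper that does not appear in the grammar excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    cls: str   # the Class index, e.g. "StringC" or "IntegerC"
    tree: str  # printable form of the abstract syntax tree


def string_literal(value: str) -> Instance:
    # Hypothetical constructor, used here only to obtain an Instance StringC.
    return Instance("StringC", f'"{value}"')


def size(x: Instance) -> Instance:
    # Mirrors:  fun size : Instance StringC -> Instance IntegerC
    if x.cls != "StringC":
        raise TypeError(f"size expects Instance StringC, got Instance {x.cls}")
    return Instance("IntegerC", f"size ({x.tree})")


length = size(string_literal("abc"))
print(length.cls)  # the result is an integer-valued expression
```

Applying size to the result of size is rejected, since that result is an Instance IntegerC rather than an Instance StringC.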

There are many choices to be made on exactly how to model specifications in abstract syntax, most of which we do not discuss here. One example, however, is introducing a category Sent for representing sentences. It is for instance used for the equality operator, with which we can state that any two instances x and y of the same class c are equal:

cat Sent;
fun equal : (c:Class) -> (x,y : Instance c) -> Sent;

The introduction of Sent is motivated by the fact that in natural language, we want to distinguish between expressions and sentences. In OCL, however, there is no such distinction: sentences just correspond to expressions of boolean type. This is an example of an interlingua problem: if a semantic distinction is made in one language, it has to be introduced into the abstract syntax, even if it is not present in the other languages.

Using the Resource Grammars: Concrete Syntax. In the concrete syntax, we give linearization rules presenting abstract syntax trees in English and German (we will not discuss the concrete syntax for OCL). To each category C in the abstract syntax, we must associate a record type: the linearization category of C. For each function in the abstract syntax, we define a linearization rule which builds a record in the corresponding linearization category. For instance, we might start with the abstract category Class, to be treated like common noun phrases in concrete syntax:3

3 This example is simplified: in the real grammar, Class has a more complex linearization category which represents something more than just common noun phrases.


param Number = Sg | Pl;
lincat Class = {s : Number => Str};
lin IntegerC = {s = table {Sg => "integer"; Pl => "integers"}};

Here we define a parameter type for number. The linearization category of Class is a record type which has one field s, which is a string inflected in number. Then we define the linearization of IntegerC as the noun “integer” (in singular and plural form). To complete the concrete syntax, we would then have to go on defining lincat and lin rules for the rest of the abstract syntax types and functions, along with the required record and parameter types. However, we instead make use of the GF resource grammar library [17].

The resource grammars provide an API of linguistically motivated record and parameter types, along with utility functions to be used in linearization rules. For instance, there is a parameter type Number, as well as a record type CN for common noun phrases. Using the resource grammars raises the level of abstraction in the concrete syntax: instead of dealing directly with issues of e.g. inflection or word order, we can use the linguistic structures provided by the resource API. As long as we use only the API, all type-correct uses of the resource grammar preserve grammaticality. Since the API is available for seven languages including English and German, we can often reuse the same concrete syntax. Without explaining any details, an example of our use of the resource grammar is the following English linearization rule for the size method:

lin size x = DefOneNP (AppFun (funOfCN(useN (nNonhuman "length"))) x);

The functions used on the right hand side in the linearization rule are part of the resource grammar API. The linearization of the tree size x using this rule will be “the length of x”. To provide the German linearization “die Länge von x”, the rule is almost the same:

lin size x = DefOneNP (AppFun (funVonCN (useN (nFrau "Länge"))) x);

5.2 Domain-Specific Vocabulary

In order to translate from OCL to English, the grammar needs to contain information about the UML model upon which the OCL is based. A grammar generation program therefore generates GF modules based on the UML model. The automatic generation does not always produce the most suitable translation, so it is also possible for the user to manually improve the translation by modifying the generated grammars, in particular the linearization rules in the concrete syntax.

To aid in the construction of the domain-specific concrete module, a resource module has been defined, which contains many operations that are useful when linearizing classes, attributes etc. We call this the API for Domain-specific Vocabulary. This API provides a layer of abstraction which hides some of the complexity of the rest of the grammar, making it easier to generate the linearizations for the domain entities. It also makes subsequent hand modifications possible without full knowledge about GF and the resource grammars.


Using the API. Although the API uses concepts taken from the OCL grammar and from the resource grammars, the interface provided is simple enough to not require a deep knowledge of the underlying grammars. We will consider OCL classes as an example. The following operation is provided for constructing the linearization of a class (ClassL is the linearization category of Class):

oper mkClass : CN -> Str -> ClassL;

Classes are defined as consisting of a common noun phrase (CN) that corresponds to the class name as it will appear in natural text, and an identifier (a string), which is the actual name of the class in the UML diagram. The class identifier is used when formally specifying the class name.

Common ways of constructing the CN are included in the API, such as constructing a CN from a String, or adding an adjective to the CN. Irregular ways of constructing a CN can be found in the resource grammar modules. Take for example the class OwnerPIN, which is linearized using operations defined in the API:

lin OwnerPIN = mkClass (adjCN "owner" (strCN "PIN")) "OwnerPIN";

This will result in the class name being represented as a common noun phrase in natural text: “the maximum PIN size of the owner PIN is greater than 0”, while the class identifier is used in a more formal setting: “for the class OwnerPIN the following invariants hold :”.
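The shape of this small API can be mimicked in Python as follows. The names mk_class, adj_cn and str_cn echo the GF operations mkClass, adjCN and strCN, but their bodies are assumptions made for illustration (in particular the naive plural in str_cn); the real operations build resource grammar structures, not plain strings.

```python
from dataclasses import dataclass
from typing import Dict

# A common noun phrase as a table from number ("Sg"/"Pl") to a string,
# in the spirit of the GF type  Number => Str.
CN = Dict[str, str]


def str_cn(word: str) -> CN:
    # Naive regular plural; good enough for an illustration only.
    return {"Sg": word, "Pl": word + "s"}


def adj_cn(modifier: str, cn: CN) -> CN:
    return {number: f"{modifier} {form}" for number, form in cn.items()}


@dataclass(frozen=True)
class ClassL:
    noun: CN          # how the class is named in natural text
    identifier: str   # the class name as written in the UML diagram


def mk_class(cn: CN, identifier: str) -> ClassL:
    return ClassL(cn, identifier)


owner_pin = mk_class(adj_cn("owner", str_cn("PIN")), "OwnerPIN")
print(f"the {owner_pin.noun['Sg']}")            # use in running text
print(f"for the class {owner_pin.identifier}")  # use in a formal heading
```

Keeping both fields in one record is what lets the same class appear as “owner PIN” in a sentence and as OwnerPIN in a heading.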

Grammar Generation. The grammar generator uses some heuristics to derive a reasonable linearization for a domain entity (a class, an attribute, a method or an association) from its name and type. Given a UML model, it produces an abstract syntax module with one function for each domain entity, and a concrete module with corresponding English linearizations. A concrete module with OCL linearizations is also generated. The concrete English module makes use of the API for domain-specific vocabulary, and the resource grammars. Since GF supports separate compilation of modules, only these generated modules need to be recompiled whenever the UML model changes, not the whole grammar.

The heuristics are based on types, and on splitting an identifier into words based on capitalization. E.g., the identifier OwnerPIN is split into the strings “owner” and “PIN”. Since we also know the type, i.e. in this case that OwnerPIN is the name of a class, we build a noun “owner PIN” as described just above. Another simple rule is special handling of boolean properties that start with “is”, e.g. isValidated becomes a sentence saying “... is validated”.
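A minimal Python sketch of such a heuristic is given below. The regular expression, the function names and the kind labels ("class", "attribute", "boolean") are assumptions for illustration; the actual generator is a Haskell program whose rules are not reproduced in the text.

```python
import re
from typing import List

# A run of capitals not followed by a lower-case letter (an acronym such as
# "PIN"), or an optionally capitalised lower-case word, or a run of digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier: str) -> List[str]:
    """Split on capitalization; keep acronyms, lower-case everything else."""
    words = _WORD.findall(identifier)
    return [w if (w.isupper() and len(w) > 1) else w.lower() for w in words]


def default_phrase(identifier: str, kind: str) -> str:
    """Pick a default English phrase from the words and the entity's kind."""
    words = split_identifier(identifier)
    if kind == "boolean" and words[:1] == ["is"]:
        return "... is " + " ".join(words[1:])
    if kind == "attribute":
        return "the " + " ".join(words)
    return " ".join(words)


print(default_phrase("OwnerPIN", "class"))
print(default_phrase("maxTries", "attribute"))
print(default_phrase("isValidated", "boolean"))
```

The defaults produced this way are the starting point that the user can later override by hand.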

The heuristics for grammar generation obviously depend on the natural language used for identifiers, in this case English. Good heuristics for German would be more complex, e.g. they would require access to a lexicon for determining the gender of nouns. The heuristics also require a consistent convention for word boundaries in identifiers (e.g. is it isValidated or is validated?).


Modifying the Grammar. The generated grammar does not always succeed in producing the best translation. Using the API it is possible to make hand-modifications to the generated grammar without too much difficulty. For example, the attributes triesLeft and maxTries are translated as “the tries left” and “the max tries” by default using the generated judgements we see below.

lin maxTries = mkSimpleProperty (adjCN "max" (strCN "tries"));
lin triesLeft = mkSimpleProperty (adjCN "tries" (strCN "left"));

Although these translations are quite adequate, we can improve them by making manual changes to the generated grammar using some of the operations provided in the API. Thus, using the judgements below, we can construct the text “the triesLeft attribute” and “the maximum number of tries” for these attributes.

lin maxTries = mkSimpleProperty (ofCN (adjCN "maximum" (strCN "number")) (strCN "tries"));
lin triesLeft = mkSimpleProperty (attrCN "triesLeft");

5.3 Grammar-Based Formatting

The use of formatting in the translated text has a dramatic effect on the readability of the output. As we see in the motivating example in Fig. 3, the formatting includes e.g. breaking the text into paragraphs, using different fonts for headings and argument variables, and presenting various structures in the form of bullet lists.

Most of this formatting is done completely on the level of concrete syntax: an interface module has been defined that contains operations required to perform formatting tasks, without specifying an implementation. This interface can then be implemented in many different ways using different instances. Currently three instances exist, allowing the possibility to have no formatting, HTML formatting or LaTeX formatting. These instances do not define their own pretty-printing rules; instead they simply use formatting tags, leaving the actual layout to be handled by the LaTeX and HTML rendering engines. The linearization rules of the concrete syntax then make use of the operations specified by the formatting interface.
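The interface-and-instances arrangement can be pictured with a small Python analogue. The class names, the two operations bold and bullets, and the concrete tags are assumptions chosen for illustration; the GF interface module and its instances are not shown in the text.

```python
from abc import ABC, abstractmethod
from typing import List


class Formatting(ABC):
    """The operations that linearization rules may call."""

    @abstractmethod
    def bold(self, text: str) -> str: ...

    @abstractmethod
    def bullets(self, items: List[str]) -> str: ...


class NoFormatting(Formatting):
    def bold(self, text: str) -> str:
        return text

    def bullets(self, items: List[str]) -> str:
        return " ".join(f"(*) {item}" for item in items)


class HtmlFormatting(Formatting):
    def bold(self, text: str) -> str:
        return f"<b>{text}</b>"

    def bullets(self, items: List[str]) -> str:
        return "<ul>" + "".join(f"<li>{item}</li>" for item in items) + "</ul>"


class LatexFormatting(Formatting):
    def bold(self, text: str) -> str:
        return f"\\textbf{{{text}}}"

    def bullets(self, items: List[str]) -> str:
        body = "".join(f"\\item {item} " for item in items)
        return f"\\begin{{itemize}} {body}\\end{{itemize}}"


def postconditions(fmt: Formatting, conditions: List[str]) -> str:
    # A "linearization rule" written once against the interface.
    heading = fmt.bold("the following post-conditions should hold :")
    return heading + " " + fmt.bullets(conditions)


print(postconditions(HtmlFormatting(), ["the result is equal to false"]))
```

The rule postconditions never mentions a tag, so switching the output format means passing a different instance and nothing else.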

Using Lists for Aggregation. There is one exception to the rule that all formatting is done just in concrete syntax: formatting lists of conjunctions and disjunctions as bullet lists. This requires changes also in the abstract syntax, as well as support from the external program which transforms the result of the context-free OCL parser into GF abstract syntax.

To treat lists of conjunctions (the machinery for disjunctions is just the same) in a special way we simply introduce a new category AndList in the abstract syntax, along with functions for creating such lists, and converting them into sentences:


fun oneAnd : Sent -> Sent -> AndList;
fun consAnd : Sent -> AndList -> AndList;
fun andList2Sent : AndList -> Sent;

The base case oneAnd takes two sentences and builds an AndList, containing just one conjunction. The function consAnd prepends a sentence to an existing AndList. Once the list is built, andList2Sent allows us to consider it as a sentence.

A list containing just one conjunction, i.e. a tree oneAnd x y, would be linearized just as “x and y”, while using consAnd should result in a bullet list, saying “the following conditions are true: ...”. However, this is dealt with in the concrete syntax (which is omitted here); the abstract syntax just provides the required structure. In fact, we use the same kind of abstract syntax for sums and products, where we are not interested in formatting. In that case, the problem is to translate e.g. an OCL expression 2+3 as “2 plus 3”, but 1+2+3 as “the sum of 1, 2 and 3”.

When translating OCL to English, we must find OCL expressions where lists of conjunctions occur, and make sure that they are treated as AndLists. This can be seen as a simple aggregation problem [6]. As mentioned above, this step is not performed inside the grammars, but in the transformation from context-free OCL syntax trees (as returned by the external parser) to GF abstract syntax trees.
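The aggregation step and its effect on the output can be sketched in Python as follows. The data types and function names are hypothetical, the bullet marker is arbitrary, and the real transformation is a Haskell program operating on annotated OCL trees; the sketch only shows nested binary conjunctions being flattened into one list and then rendered differently depending on the length of that list, with the same treatment applied to sums.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Atom:
    text: str


@dataclass(frozen=True)
class And:
    left: "Sent"
    right: "Sent"


Sent = Union[Atom, And]


def conjuncts(s: Sent) -> List[Sent]:
    """Flatten nested binary conjunctions into one list (the aggregation step)."""
    if isinstance(s, And):
        return conjuncts(s.left) + conjuncts(s.right)
    return [s]


def linearize(s: Sent) -> str:
    if isinstance(s, Atom):
        return s.text
    items = [linearize(c) for c in conjuncts(s)]
    if len(items) == 2:
        # Corresponds to a tree  oneAnd x y
        return f"{items[0]} and {items[1]}"
    # Corresponds to a list built with  consAnd
    lines = ["the following conditions are true:"]
    lines += [f"  * {item}" for item in items]
    return "\n".join(lines)


def linearize_sum(terms: List[int]) -> str:
    if len(terms) < 2:
        raise ValueError("a sum needs at least two terms")
    words = [str(t) for t in terms]
    if len(words) == 2:
        return f"{words[0]} plus {words[1]}"
    return "the sum of " + ", ".join(words[:-1]) + " and " + words[-1]


nested = And(And(Atom("offset is at least 0"), Atom("length is at least 0")),
             Atom("pin is not equal to null"))
print(linearize(nested))
print(linearize_sum([1, 2, 3]))
```

Flattening before linearization is what makes the scope of each conjunction visible: every item of the list sits at the same level.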

6 Related Work

Natural Language Generation (NLG) is described in [6] as producing understandable natural language text from a non-linguistic representation of information. This very general description also fits GF linearization: linearization can be considered as a two-step procedure, where the linearization rules go from non-linguistic abstract syntax to linguistically motivated resource grammar constructions. The resource grammar implementation then takes the step to surface strings in natural language. However, while linearization is therefore clearly more than just linguistic realization (cf. the discussion in [6] on realization as the inverse of parsing), it is much simpler than a typical NLG system. Linearization rules (and the resource grammars) are written in GF concrete syntax, a restricted functional language, and linearization rules are always compositional. In contrast, [6] describes a typical NLG system architecture as a pipeline consisting of the phases text planning, sentence planning and linguistic realization, along with separate intermediate representation formats.

When using our system for translating OCL to English (as opposed to syntax editing), there is also the external OCL parser / typechecker and grammar generation, and the architecture is more similar to that of a compiler than that of an NLG system. We also do some transformations to the GF syntax trees in an external program. Some of these transformations could be described in terms of NLG concepts, e.g. aggregation (Sect. 5.3). They are also similar to some of the ideas in [21], which describes generation of natural language text from formal proofs as a process resembling code generation in a compiler.

7 The Case Study in Numbers

The Java Card API case study consists of OCL specifications of 37 classes; the word count of the English translation is close to 17000. The generated grammar modules of domain-specific vocabulary contain about 1100 concepts, i.e. 1100 abstract syntax functions, each one with a corresponding linearization rule. 361 of these concepts are actually used in the translated specifications. By-hand modifications were made to the linearization rules of 73 of these 361 concepts, i.e. 20% of the used domain-specific concepts needed modifications. 18 of these modifications are of a trivial nature and could probably be automated if one introduces domain-specific heuristics for grammar generation, which leaves 55 concepts, or 15% of the used domain-specific concepts, that require non-trivial modifications.
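The percentages follow directly from the counts given above, as the following short Python check shows. The helper percent is introduced only for this illustration, and the total of 1100 generated concepts is the approximate figure from the text, so the last ratio is approximate as well.

```python
def percent(part, whole):
    """Return part / whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


generated = 1100  # approximate number of generated concepts
used = 361        # concepts occurring in the translated specifications
modified = 73     # used concepts whose linearization was changed by hand
trivial = 18      # modifications that could probably be automated

nontrivial = modified - trivial

print("modified share of used concepts:", percent(modified, used), "%")
print("non-trivial modifications:", nontrivial)
print("non-trivial share of used concepts:", percent(nontrivial, used), "%")
print("used share of generated concepts:", percent(used, generated), "%")
```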

8 Conclusion

We have presented a tool for translating formal OCL specifications into natural language based on GF grammars. By adding a domain-specific vocabulary API, formatting, and other stylistic improvements, we achieve a translation of a non-trivial case study of OCL specifications which is acceptable to a human reader. Relatively few by-hand modifications using the API were necessary; the modifications are made on the grammar level, but do not require linguistic or GF expertise. Although we add external programs, the compositional and declarative GF formalism remains the centre of our work: the external programs are used either to generate GF grammar modules, or to manipulate GF abstract syntax trees. The tool and the Java Card API case study are available on the web [16].

8.1 Future Work

Important lines of future work include: (1) Further improvements to the natural language, e.g. by more sophisticated use of aggregation and referring expression generation. (2) A more formal evaluation of the quality of the generated natural language. (3) Integration into the KeY system, most importantly providing a user interface for manipulating domain-specific vocabulary, based on the API we have defined.

Acknowledgements

We thank Reiner Hähnle, Aarne Ranta and the anonymous referees for valuable suggestions on how to improve the paper.


References

1. Ahrendt, W., Baar, T., Beckert, B., Bubel, R., Giese, M., Hähnle, R., Menzel, W., Mostowski, W., Roth, A., Schlager, S., Schmitt, P.H.: The KeY tool. Software and System Modeling 4 (2005) 32–54

2. Hähnle, R., Johannisson, K., Ranta, A.: An authoring tool for informal and formal requirements specifications. In Kutsche, R.D., Weber, H., eds.: Fundamental Approaches to Software Engineering. Number 2306 in LNCS (2002)

3. Ranta, A.: Grammatical Framework: A Type-theoretical Grammar Formalism. The Journal of Functional Programming 14 (2004) 145–189

4. Martin-Löf, P.: Intuitionistic Type Theory. Bibliopolis, Napoli (1984)

5. Sun Microsystems: Java card homepage (2004) http://java.sun.com/products/javacard/.

6. Reiter, E., Dale, R.: Building applied natural language generation systems. Journal of Natural Language Engineering 3 (1997) 57–87

7. Ranta, A.: Grammatical Framework homepage (2005) www.cs.chalmers.se/~aarne/GF.

8. Khegai, J., Nordström, B., Ranta, A.: Multilingual syntax editing in GF. In Gelbukh, A., ed.: CICLing-2003, Mexico City, Mexico. LNCS, Springer (2003)

9. Ranta, A.: The GF resource grammar library (2004) http://www.cs.chalmers.se/~aarne/GF/lib/resource/.

10. Sommerville, I.: Software Engineering. Seventh edn. Addison Wesley (2004)

11. The Object Management Group: Object constraint language specification (2004) http://www.omg.org/docs/formal/03-03-13.pdf.

12. The Object Management Group: Unified modelling language homepage (2004) http://www.uml.org.

13. Larsson, D., Mostowski, W.: Specifying Java Card API in OCL. In Schmitt, P.H., ed.: OCL 2.0 Workshop at UML 2003. Volume 102C of ENTCS, Elsevier (2004) 3–19

14. Meijer, H., Poll, E.: Towards a full formal specification of the Java Card API. In Attali, I., Jensen, T., eds.: Smart Card Programming and Security. Number 2140 in LNCS, Springer (2001) 165–178

15. Burke, D.A.: Improving the natural language translation of formal software specifications. Master’s thesis, Chalmers University of Technology, SE-412 96 Göteborg, Sweden (2004)

16. Johannisson, K.: OCL to natural language tool homepage (2004) http://www.cs.chalmers.se/~krijo/gfspec/.

17. Daniels, H.J.: Eine deutsche Grammatik für OCL. Studienarbeit (2003) http://www.cs.chalmers.se/~krijo/gfspec/.

18. Hallgren, T., Ranta, A.: An extensible proof text editor. In Parigot, M., Voronkov, A., eds.: Logic for Programming and Automated Reasoning, LPAR. LNAI 1955, Springer (2000) 70–84

19. Johannisson, K.: Disambiguating implicit constructions in OCL (2004) Online proceedings of OCL and Model Driven Engineering Workshop at UML 2004, http://www.cs.kent.ac.uk/projects/ocl/oclmdewsuml04/description.htm.

20. Ljunglöf, P.: Expressivity and complexity of the Grammatical Framework. PhD thesis, Chalmers University of Technology, Göteborg University, SE-412 96 Göteborg, Sweden (2004)

21. Coscoy, Y., Kahn, G., Théry, L.: Extracting text from proofs. In Dezani-Ciancaglini, M., Plotkin, G., eds.: Proc. Second Int. Conf. on Typed Lambda Calculi and Applications. Volume 902 of LNCS. (1995) 109–123
