
Data, Syntax and Semantics

An Introduction to Modelling Programming Languages

J V Tucker

Department of Computer Science
University of Wales Swansea
Singleton Park
Swansea SA2 8PP
Wales

K Stephenson

QinetiQ
St Andrews Road
Malvern WR14 3PS
England

Copyright J V Tucker and K Stephenson © 2006

This is an almost complete first draft of a text-book. It is the text for a second year undergraduate course on the Theory of Programming Languages at Swansea. Criticisms and suggestions are most welcome.

20th March 2006


“. . . yet muste you and all men take heed, that . . . in al mennes workes, you be not abused by their autoritye, but evermore attend to their reasons, and examine them well, ever regarding more what is saide, and how it is proved, then who saieth it: for autoritie often times deceaveth many menne.”

Robert Recorde
The Castle of Knowledge, 1556


Contents

1 Introduction ... 1
1.1 Science and the aims of modelling programming languages ... 2
1.2 Some Scientific Questions about Programming Languages ... 3
1.3 A Simple Imperative Programming Language and its Extensions ... 6
1.3.1 What is a while Program? ... 7
1.3.2 Core Constructs ... 10
1.4 Where do we find while programs? ... 12
1.4.1 Gallery of Imperative Languages ... 13
1.4.2 Pseudocode and Algorithms ... 15
1.4.3 Object-Oriented Languages ... 15
1.4.4 Algebraic, Functional and Logic Languages ... 17
1.5 Additional Constructs of Interest ... 18
1.5.1 Data ... 18
1.5.2 Importing Data Types ... 20
1.5.3 Input and Output ... 20
1.5.4 Atomic Statements ... 20
1.5.5 Control Constructs ... 21
1.5.6 Exception and Error handling ... 22
1.5.7 Non-determinism ... 22
1.5.8 Procedures, Functions, Modules and Program Structure ... 23
1.6 Conclusion ... 24
1.6.1 Kernel and Extensions ... 24

2 Origins of Programming Languages ... 29
2.1 Historical Pitfalls ... 30
2.2 The Analytical Engine and its Programming ... 30
2.3 Development of Universal Machines ... 34
2.4 Early Programming in the Decade 1945-1954 ... 36
2.5 The Development of High Level Programming Languages ... 39
2.5.1 Early Tools ... 39
2.5.2 Languages ... 40
2.5.3 Syntax ... 42
2.5.4 Declarative languages ... 44
2.5.5 Applications ... 45
2.6 Specification and Correctness ... 46
2.7 Data Types and Modularity ... 47


I Data 49

3 Basic Data Types and Algebras ... 53
3.1 What is an Algebra? ... 54
3.2 Algebras of Booleans ... 57
3.2.1 Standard Booleans ... 57
3.2.2 Bits ... 58
3.2.3 Equivalence of Booleans and Bits ... 59
3.2.4 Three-Valued Booleans ... 60
3.3 Algebras of Natural Numbers ... 62
3.3.1 Basic Arithmetic ... 62
3.3.2 Tests ... 64
3.3.3 Decimal versus Binary ... 66
3.3.4 Partial Functions ... 67
3.4 Algebras of Integers and Rationals ... 67
3.4.1 Algebras of Integer Numbers ... 68
3.4.2 Algebras of Rationals ... 68
3.5 Algebras of Real Numbers ... 70
3.5.1 Measurements and Real Numbers ... 70
3.5.2 Algebras of Real Numbers ... 71
3.6 Data and the Analytical Engine ... 73
3.7 Algebras of Strings ... 74
3.7.1 Constructing Strings ... 75
3.7.2 Manipulating Strings ... 75

4 Interfaces and Signatures, Implementations and Algebras ... 83
4.1 Formal Definition of a Signature ... 85
4.1.1 An Example of a Signature ... 85
4.1.2 General Definition of a Signature ... 86
4.2 Examples of Signatures ... 89
4.2.1 Examples of Signatures for Basic Data Types ... 89
4.2.2 Signature for Subsets ... 91
4.2.3 Signature for Strings ... 92
4.2.4 Storage Media ... 93
4.2.5 Machines ... 93
4.3 Formal Definition of an Algebra ... 94
4.3.1 General Notation and Examples ... 95
4.4 Examples of Algebras ... 96
4.4.1 Examples of Algebras for Basic Data Types ... 96
4.4.2 Storage Media ... 98
4.4.3 Machines ... 99
4.4.4 Sets ... 99
4.4.5 Strings ... 100
4.5 Algebras with Booleans and Flags ... 101
4.5.1 Algebras with Booleans ... 101
4.5.2 Algebras with an Unspecified Element ... 103


4.6 Generators and Constructors ... 105
4.7 Subalgebras ... 108
4.7.1 Examples of Subalgebras ... 109
4.7.2 General Definition of Subalgebras ... 110
4.8 Expansions and Reducts ... 112
4.9 Importing Algebras ... 114
4.9.1 Importing the Booleans ... 115
4.9.2 Importing a Data Type in General ... 118
4.9.3 Example ... 120

5 Specifications and Axioms ... 127
5.1 Classes of Algebras ... 128
5.1.1 One Signature, Many Algebras ... 129
5.1.2 Reasoning and Verification ... 130
5.2 Classes of Algebras Modelling Implementations of the Integers ... 131
5.3 Axiomatic Specification of Commutative Rings ... 134
5.3.1 The Specification ... 134
5.3.2 Deducing Further Laws ... 135
5.3.3 Solving Quadratic Equations in a Commutative Ring ... 138
5.4 Axiomatic Specification of Fields ... 140
5.5 Axiomatic Specification of Groups and Abelian Groups ... 143
5.5.1 The Specification ... 143
5.5.2 Groups of Transformations ... 145
5.5.3 Matrix Transformations ... 146
5.5.4 Reasoning with the Group Axioms ... 147
5.6 Boolean Algebra ... 148
5.7 Current Position ... 149

6 Examples: Data Structures, Files, Streams and Spatial Objects ... 157
6.1 Records ... 160
6.1.1 Signature/Interface of Records ... 160
6.1.2 Algebra/Implementation of Records ... 161
6.2 Dynamic Arrays ... 163
6.2.1 Signature/Interface of Dynamic Arrays ... 164
6.2.2 Algebra/Implementation of Dynamic Arrays ... 165
6.3 Algebras of Files ... 168
6.3.1 A Simple Model of Files ... 168
6.4 Time and Data: A Data Type of Infinite Streams ... 172
6.4.1 What is a stream? ... 173
6.4.2 Time ... 173
6.4.3 Streams of Elements ... 175
6.4.4 Algebra/Implementation ... 176
6.4.5 Finitely Determined Stream Transformations and Stream Predicates ... 176
6.4.6 Decimal Representation of Real Numbers ... 179
6.5 Space and Data: A Data Type of Spatial Objects ... 183
6.5.1 What is space? ... 184


6.5.2 Spatial objects ... 187
6.5.3 Operations on Spatial Objects ... 189
6.5.4 Volume Graphics and Constructive Volume Geometry ... 191

7 Abstract Data Types and Homomorphisms ... 205
7.1 Comparing Decimal and Binary Algebras of Natural Numbers ... 208
7.1.1 Data Translation ... 209
7.1.2 Operation Correspondence ... 209
7.2 Translations between Algebras of Data and Homomorphisms ... 211
7.2.1 Basic Concept ... 211
7.2.2 Homomorphisms and Binary Operations ... 214
7.2.3 Homomorphisms and Number Systems ... 215
7.2.4 Homomorphisms and Machines ... 220
7.3 Equivalence of Algebras of Data: Isomorphisms and Abstract Data Types ... 221
7.3.1 Inverses, Surjections, Injections and Bijections ... 222
7.3.2 Isomorphisms ... 224
7.3.3 Abstract Data Types ... 226
7.4 Induction on the Natural Numbers ... 227
7.4.1 Induction for Sets and Predicates ... 228
7.4.2 Course of Values Induction and Other Principles ... 230
7.4.3 Defining Functions by Primitive Recursion ... 230
7.5 Naturals as an Abstract Data Type ... 232
7.6 Digital Data Types and Computable Data Types ... 235
7.6.1 Representing an Algebra using Natural Numbers ... 236
7.6.2 Algebraic Definition of Digital Data Types ... 238
7.6.3 Computable Algebras ... 238
7.7 Properties of Homomorphisms ... 239
7.8 Congruences and Quotient Algebras ... 241
7.8.1 Equivalence Relations and Congruences ... 241
7.8.2 Quotient Algebras ... 242
7.9 Homomorphism Theorem ... 244

8 Terms and Equations ... 251
8.1 Terms ... 252
8.1.1 What is a Term? ... 252
8.1.2 Single-Sorted Terms ... 255
8.1.3 Many-Sorted Terms ... 260
8.2 Induction on Terms ... 265
8.2.1 Principle of Induction for Single-Sorted Terms ... 265
8.2.2 Principle of Induction for Many-Sorted Terms ... 266
8.2.3 Functions on Terms ... 267
8.2.4 Comparing Induction on Natural Numbers and Terms ... 270
8.3 Terms and Trees ... 271
8.3.1 Examples of Trees for Terms ... 271
8.3.2 General Definitions by Structural Induction ... 272
8.4 Term Evaluation ... 274


8.5 Equations ... 277
8.5.1 What is an equation? ... 277
8.5.2 Equations, Satisfiability and Validity ... 280
8.5.3 Equations are Preserved by Homomorphisms ... 282
8.6 Term Algebras ... 284
8.7 Homomorphisms and Terms ... 288
8.7.1 Structural Induction, Term Evaluation and Homomorphisms ... 288
8.7.2 Initiality ... 290
8.7.3 Representing Algebras using Terms ... 291

9 Abstract Data Type of Real Numbers ... 297
9.1 Representations of the Real Numbers ... 298
9.1.1 The Problem ... 299
9.1.2 Method of Richard Dedekind (1858) ... 300
9.1.3 Method of Georg Cantor (1872) ... 302
9.1.4 Examples ... 303
9.2 The Real Numbers as an Abstract Data Type ... 304
9.2.1 Real Numbers as a Field ... 305
9.2.2 Real Numbers as an Ordered Field ... 306
9.2.3 Completeness of the ordering ... 309
9.3 Uniqueness of the Real Numbers ... 311
9.3.1 Preparations and Overview of Proof of the Uniqueness of the Reals ... 311
9.3.2 Stage 1: Constructing The Rational Ordered Subfields ... 313
9.3.3 Stage 2: Approximation by the Rational Ordered Subfield ... 319
9.3.4 Stage 3: Constructing the Isomorphism ... 322
9.4 Cantor’s Construction of the Real Numbers ... 325
9.4.1 Equivalence of Cauchy Sequences ... 326
9.4.2 Algebra of Cauchy Sequences ... 328
9.4.3 Algebra of Equivalence Classes of Cauchy Sequences ... 333
9.4.4 Cantor Reals are a Complete Ordered Field ... 338
9.5 Computable Real Numbers ... 345
9.6 Representations of Real Numbers and Practical Computation ... 347
9.6.1 Practical Computation ... 347
9.6.2 Floating Point Representations ... 349

II Syntax 353

10 Syntax and Grammars ... 357
10.1 Alphabets, Strings and Languages ... 358
10.1.1 Formal Languages ... 358
10.1.2 Simple Examples ... 360
10.1.3 Natural Language Examples ... 360
10.1.4 Languages of Addresses ... 363
10.1.5 Programming Language Examples ... 364
10.1.6 Designing Syntax using Formal Languages ... 366


10.2 Grammars and Derivations ... 367
10.2.1 Grammars ... 367
10.2.2 Examples of Grammars and Strings ... 368
10.2.3 Derivations ... 371
10.2.4 Language Generation ... 372
10.2.5 Designing Syntax using Grammars ... 373
10.3 Specifying Syntax using Grammars: Modularity and BNF Notation ... 373
10.3.1 A Simple Modular Grammar for a Programming Language ... 374
10.3.2 The Import Construct and Modular Grammars ... 377
10.3.3 User Friendly Grammars and BNF Notation ... 378
10.4 What is an Address? ... 381
10.4.1 Postal Addresses ... 381
10.4.2 World Wide Web Addresses ... 388

11 Languages for Interfaces, Specifications and Programs ... 397
11.1 Interface Definition Languages ... 398
11.1.1 Target Syntax: Mathematical Definition of Signatures ... 399
11.1.2 A Simple Interface Definition Language for Data Types ... 399
11.1.3 Comparisons with Target Syntax ... 403
11.2 A Modular Interface Definition Language for Data Types ... 404
11.2.1 Signatures with Imports ... 404
11.3 Extensions of a Kernel and Flattening ... 407
11.3.1 Repositories ... 407
11.3.2 Dependency Graphs ... 408
11.3.3 Flattening Algorithm ... 409
11.3.4 Example ... 409
11.3.5 Comparison with Target Syntax ... 411
11.4 Languages for Data Type Specifications ... 411
11.4.1 Target Syntax: Data Type Specifications ... 411
11.4.2 Languages for First-Order Specifications ... 415
11.4.3 Languages for Equational Specifications ... 418
11.4.4 Comparison with Target Syntax ... 420
11.5 A While Programming Language over the Natural Numbers ... 421
11.5.1 Target Syntax: Simple Imperative Programs ... 421
11.5.2 A Grammar for while Programs over Natural Numbers ... 422
11.5.3 Operator Precedence ... 430
11.5.4 Comparison with Target Syntax ... 432
11.6 A While Programming Language over a Data Type ... 433
11.6.1 While Programs for an Arbitrary, Fixed Signature ... 433
11.6.2 Comparison with Target Syntax ... 439
11.7 A While Programming Language over all Data Types ... 440
11.7.1 A Grammar for While Programs over all Data Types ... 440
11.7.2 Comparison with Target Syntax ... 446


12 Chomsky Hierarchy and Regular Languages ... 449
12.1 Chomsky Hierarchy ... 450
12.1.1 Examples of Equivalent Grammars ... 452
12.1.2 Examples of Grammar Types ... 453
12.2 Regular Languages ... 457
12.2.1 Regular Grammars ... 457
12.2.2 Examples of Regular Grammars ... 457
12.3 Languages Generated by Regular Grammars ... 463
12.3.1 Examples of Regular Derivations ... 463
12.3.2 Structure of Regular Derivations ... 465
12.4 The Pumping Lemma for Regular Languages ... 468
12.5 Limitations of Regular Grammars ... 472
12.5.1 Applications of the Pumping Lemma for Regular Languages ... 472

13 Finite State Automata and Regular Expressions ... 479
13.1 String Recognition ... 480
13.1.1 Rules of the Recognition Process ... 480
13.1.2 Example ... 480
13.1.3 Algorithm ... 484
13.1.4 Generalising ... 484
13.2 Nondeterministic Finite State Automata ... 485
13.2.1 Definition of Finite State Automata ... 486
13.2.2 Example ... 487
13.2.3 Different Representations ... 488
13.3 Examples of Automata ... 490
13.3.1 Automata to Recognise Numbers ... 490
13.3.2 Automata to Recognise Finite Matches ... 496
13.4 Automata with Empty Move Transitions ... 500
13.4.1 Description ... 500
13.4.2 Empty Move Reachable ... 502
13.4.3 Simulating Empty Move Automata ... 502
13.5 Modular Nondeterministic Finite State Automata ... 505
13.5.1 Recognising Integers ... 505
13.5.2 Program Identifiers ... 507
13.6 Deterministic Finite State Automata ... 511
13.6.1 The Equivalence of Deterministic and Nondeterministic Finite State Automata ... 511
13.7 Regular Grammars and Nondeterministic Finite State Automata ... 515
13.7.1 Regular Grammars to Nondeterministic Finite State Automata ... 515
13.7.2 Nondeterministic Finite State Automata to Regular Grammars ... 522
13.7.3 Modular Regular Grammars and Modular Finite State Automata ... 524
13.7.4 Pumping Lemma (Nondeterministic Finite State Automata) ... 527
13.8 Regular Expressions ... 528
13.8.1 Operators ... 528
13.8.2 Building Regular Expressions ... 530
13.9 Relationship between Regular Expressions and Automata ... 531
13.9.1 Translating Regular Expressions into Finite State Automata ... 531


13.10 Translating Finite State Automata into Regular Expressions ... 538

14 Context-Free Grammars and Programming Languages ... 545
14.1 Derivation Trees for Context-Free Grammars ... 546
14.1.1 Examples of Derivation Trees ... 550
14.1.2 Ambiguity ... 555
14.1.3 Leftmost and Rightmost Derivations ... 561
14.2 Normal and Simplified Forms for Context-Free Grammars ... 562
14.2.1 Removing Null Productions ... 563
14.2.2 Removing Unit Productions ... 569
14.2.3 Chomsky Normal Form ... 573
14.2.4 Greibach Normal Form ... 579
14.3 Parsing Algorithms for Context-Free Grammars ... 585
14.3.1 Grammars and Machines ... 586
14.3.2 A Recognition Algorithm for Context-Free Grammars ... 588
14.3.3 Efficient Parsing Algorithms ... 589
14.4 The Pumping Lemma for Context-Free Languages ... 590
14.4.1 The Pumping Lemma for Context-Free Languages ... 590
14.4.2 Applications of the Pumping Lemma for Context-Free Languages ... 592
14.5 Limitations of context-free grammars ... 594
14.5.1 Variable Declarations ... 594
14.5.2 Floyd’s Theorem ... 597
14.5.3 Sort Declaration Property ... 598
14.5.4 The Concurrent Assignment Construct ... 599

15 Abstract Syntax and Algebras of Terms ... 607
15.1 What is Abstract Syntax? ... 609
15.2 An Algebraic Abstract Syntax for while Programs ... 612
15.2.1 Algebraic Operations for Constructing while programs ... 612
15.2.2 A Signature for Algebras of while Commands ... 616
15.3 Representing while commands as terms ... 618
15.3.1 Examples of Terms and Trees ... 618
15.3.2 Abstract Syntax as Terms ... 620
15.4 Algebra and Grammar for while Programs ... 622
15.4.1 Algebra and Grammar for while Program Commands ... 622
15.4.2 Algebra and Grammar for while Program Expressions ... 626
15.4.3 Algebra and Grammar for while Program Tests ... 630
15.5 Context Free Grammars and Terms ... 630
15.5.1 Outline of Algorithm ... 630
15.5.2 Construction of a signature from a context free grammar ... 631
15.5.3 Algebra T(ΣG) of language terms ... 632
15.5.4 Observation ... 634
15.5.5 Algebra of while commands revisited ... 636
15.6 Context Sensitive Languages ... 637


III Semantics 643

16 Input-Output Semantics 64716.1 Some Simple Semantic Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 648

16.1.1 Semantics of a Simple Data Type . . . . . . . . . . . . . . . . . . . . . . 64816.1.2 Semantics of a Simple Construct . . . . . . . . . . . . . . . . . . . . . . 64916.1.3 Semantics of a Simple Program . . . . . . . . . . . . . . . . . . . . . . . 651

16.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65316.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655

16.3.1 Algebras of Naturals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65616.3.2 Algebra of Reals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656

16.4 States . . . . . . 657
16.4.1 Example . . . . . . 659
16.4.2 Substitutions in States . . . . . . 659

16.5 Operations and tests on states . . . . . . 659
16.5.1 Expressions . . . . . . 660
16.5.2 Example Expression Evaluation . . . . . . 660
16.5.3 Tests . . . . . . 661

16.6 Statements and Commands: First Definition . . . . . . 662
16.6.1 First Definition of Input-Output Semantics . . . . . . 662
16.6.2 Examples . . . . . . 664
16.6.3 Non-Termination . . . . . . 665

16.7 Statements and Commands: Second Definition using Recursion . . . . . . 667
16.8 Adding Data Types to Programming Languages . . . . . . 668

16.8.1 Adding Dynamic Arrays . . . . . . 669
16.8.2 Adding Infinite Streams . . . . . . 669

17 Proving Properties of Programs 671
17.1 Principles of Structural Induction for Programming Language Syntax . . . . . . 672

17.1.1 Principle of Induction for Expressions . . . . . . 673
17.1.2 Principle of Structural Induction for Boolean Expressions . . . . . . 673
17.1.3 Principle of Structural Induction for Statements . . . . . . 674
17.1.4 Proving the Principles of Structural Induction . . . . . . 675

17.2 Reasoning about Side Effects Using Structural Induction . . . . . . 676
17.3 Local Computation Theorem and Functions that Cannot be Programmed . . . . . . 678

17.3.1 Local Computation and Expressions . . . . . . 678
17.3.2 Local Computation and while Programs . . . . . . 680
17.3.3 Local Computation and Functions . . . . . . 683

17.4 Invariance of Semantics . . . . . . 683
17.4.1 Example of the Natural Numbers . . . . . . 684
17.4.2 Isomorphic State Spaces . . . . . . 686
17.4.3 Isomorphism Invariance Theorem . . . . . . 688
17.4.4 Isomorphism Invariance and Program Equivalence . . . . . . 693
17.4.5 Review of Terms . . . . . . 695

17.5 Performance Measures . . . . . . 695
17.5.1 Performance of Data . . . . . . 695



17.5.2 Expressions . . . . . . 696
17.5.3 Tests . . . . . . 696
17.5.4 Performance of Programs . . . . . . 697

18 Operational Semantics 701
18.1 Execution Trace Semantics . . . . . . 702

18.1.1 Execution Traces . . . . . . 703
18.1.2 Operational Semantics of while programs . . . . . . 705

18.2 Structural Operational Semantics . . . . . . 711
18.2.1 General approach . . . . . . 711
18.2.2 Structural Operational Semantics of while Programs . . . . . . 712

18.3 Algebraic Operational Semantics . . . . . . 716
18.3.1 Producing Execution Traces . . . . . . 717
18.3.2 Deconstructing Syntax . . . . . . 719
18.3.3 Algebraic Operational Semantics of while programs . . . . . . 719

18.4 Comparison of Semantics . . . . . . 721
18.5 Program Properties . . . . . . 722

19 Virtual Machines 725
19.1 Machine Semantics and Operational Semantics . . . . . . 725

19.1.1 Generating Machine State Execution Traces . . . . . . 726
19.2 The Register Machine . . . . . . 727

19.2.1 Informal Description . . . . . . 727
19.2.2 Data Type . . . . . . 728
19.2.3 States . . . . . . 729
19.2.4 Programs . . . . . . 730
19.2.5 Instructions . . . . . . 731
19.2.6 Example . . . . . . 736

19.3 Constructing Programs . . . . . . 738
19.3.1 Operations on Programs . . . . . . 738
19.3.2 Building Programs . . . . . . 740

19.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740

20 Compiler Correctness 745
20.1 What is Compilation? . . . . . . 746

20.1.1 Input-Output Semantics . . . . . . 747
20.1.2 Operational Semantics . . . . . . 749

20.2 Structuring the Compiler . . . . . . 750
20.2.1 Defining a Compiler . . . . . . 750
20.2.2 Algebraic Model of Compilation . . . . . . 751
20.2.3 Structuring the States . . . . . . 752

20.3 Proof Techniques . . . . . . 752
20.3.1 Other Formulations of Correctness . . . . . . 752
20.3.2 Data Types . . . . . . 753
20.3.3 Recursion and Structural Induction . . . . . . 753

20.4 Comparing while Statements and Register Machine Programs . . . . . . . . . . 754



20.5 Memory Allocation . . . . . . 755
20.5.1 Data Equivalence . . . . . . 755
20.5.2 Memory Allocation and Structuring the Register Machine States . . . . . . 756
20.5.3 State Equivalence . . . . . . 759

21 Compiler Verification 763
21.1 Comparing while Statements and Register Machine Programs . . . . . . 763
21.2 Memory Allocation . . . . . . 764

21.2.1 Data Equivalence . . . . . . 764
21.2.2 Memory Allocation and Structuring the Register Machine States . . . . . . 765
21.2.3 State Equivalence . . . . . . 768

21.3 Defining the Compiler . . . . . . 769
21.3.1 Compiling the Identity Statement . . . . . . 770
21.3.2 Compiling Assignment Statements . . . . . . 770
21.3.3 Compiling Sequencing . . . . . . 771
21.3.4 Compiling Conditionals . . . . . . 772
21.3.5 Compiling Iterative Statements . . . . . . 774
21.3.6 Summary of the Compiler . . . . . . 776
21.3.7 Constructing Compiled Programs . . . . . . 777
21.3.8 Compiled Statements Behaviour . . . . . . 777

21.4 Correctness of Compilation: Statements . . . . . . 779
21.4.1 Requirements for Compiled Expressions . . . . . . 779
21.4.2 Requirements for Compiled Boolean Expressions . . . . . . 780
21.4.3 Correctness of Statements . . . . . . 780

22 Further reading 793
22.1 On the study of programming languages . . . . . . 793
22.2 The theory of programming languages . . . . . . 794

22.2.1 Data . . . . . . 795
22.2.2 Syntax . . . . . . 795
22.2.3 Semantics . . . . . . 796
22.2.4 Specification and correctness . . . . . . 796
22.2.5 Compilation . . . . . . 796

22.3 Advanced Topics . . . . . . 797
22.3.1 Abstract Data Types, Equational Specifications and Term Rewriting . . . . . . 797
22.3.2 Fixed Points and Domain Theory . . . . . . 797
22.3.3 λ-Calculus and Type Theory . . . . . . 797
22.3.4 Concurrency . . . . . . 797
22.3.5 Computability theory . . . . . . 797

Bibliography 798




List of Figures

1 Structure of the book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii

1.1 General form of a while program . . . . . . 10
1.2 The while language as a kernel language . . . . . . 25

2.1 A program trace for solving simultaneous equations on the Analytical Engine . . . . . . 33
2.2 Some stages in the development of electronic universal machines . . . . . . 35

3.1 A right-angled triangle with hypotenuse of length √2 . . . . . . 70

4.1 A is a subalgebra of B . . . . . . 111
4.2 An impression of the idea of importing . . . . . . 115
4.3 Architecture of the algebra AReals with Integer Rounding . . . . . . 121

5.1 One interface, many implementations. One signature, many algebras. . . . . . . 129
5.2 A subclass K of the Σ-algebras satisfying properties in T . . . . . . 130
5.3 Classes of integer implementations. . . . . . . 132
5.4 Cyclic arithmetic. . . . . . . 133
5.5 Integral domain specification. . . . . . . 140

6.1 A typical interactive system computing by transforming streams . . . . . . 158
6.2 A dynamic 1-dimensional array . . . . . . 164
6.3 Model of a finite array . . . . . . 166
6.4 Examples of models of discrete and continuous time . . . . . . 173
6.5 Time modelled as cycles . . . . . . 174
6.6 System with finite determinacy principle . . . . . . 177
6.7 Spaces and their coordinates . . . . . . 185
6.8 The effect of spatial object operations . . . . . . 193
6.9 Operations on heart data visualising fibres . . . . . . 196
6.10 Tree structure of a CVG term and the scene defined by it . . . . . . 197
6.11 Requirement for coordinate systems to be equivalent . . . . . . 202

7.1 Relating decimal to binary representations of the naturals . . . . . . 210
7.2 Relating binary to decimal representations of the naturals . . . . . . 211
7.3 Commutative diagram illustrating the Operation Equation . . . . . . 212
7.4 Preservation of operations under a homomorphism . . . . . . 214
7.5 Preservation of relations under a Boolean-preserving homomorphism . . . . . . 214
7.6 Function f continuous on R . . . . . . 217
7.7 Function f discontinuous at x = . . . , −1, 0, 1, 2, . . . . . . . . . 217




7.8 Machine S1 simulates machine S2 . . . . . . 221
7.9 A function and its inverse . . . . . . 224
7.10 Isomorphisms preserve operations . . . . . . 225
7.11 The class of all ΣNaturals-algebras . . . . . . 235
7.12 Tracking functions simulating the operations of A . . . . . . 237
7.13 The image im(φ) of a homomorphism φ . . . . . . 240
7.14 First Homomorphism Theorem . . . . . . 245

8.1 Examples of basic formulae . . . . . . 254
8.2 Tree representation of some terms of the algebra T (ΣNaturals1, ∅) . . . . . . 272
8.3 Tree representation of some terms of the algebra T (ΣNaturals1, X) . . . . . . 272
8.4 Tree representation of some terms of the algebra T (ΣNat+Tests, X) . . . . . . 272
8.5 The tree Tr(f (t1, . . . , tn)) . . . . . . 273
8.6 The many-sorted tree Trs(f (t1, . . . , tn)) . . . . . . 274

9.1 Dedekind’s continuity axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

10.1 The upper-case alphabet of Attic Greek of the Classical Period. . . . . . . 362
10.2 Designing a language . . . . . . 367
10.3 Possible derivations from the grammar G0∗1∗ . . . . . . 369
10.4 Possible derivations from the grammar Ganbn . . . . . . 370
10.5 Possible derivations from the grammar Ga2n . . . . . . 370
10.6 Specifying a language using a grammar . . . . . . 373
10.7 Component grammars used to construct the grammar Gwhile . . . . . . 377
10.8 Structure of addresses . . . . . . 382
10.9 Structure of http addresses . . . . . . 389

11.1 Architecture of grammar for signatures without imports . . . . . . 402
11.2 Architecture of signatures with imports . . . . . . 405
11.3 Signatures and modular signatures . . . . . . 407
11.4 Dependency graph . . . . . . 410
11.5 Modular grammar for first order formulae specifications . . . . . . 415
11.6 Modular grammar for equational logic specifications . . . . . . 419
11.7 General structure of Euclid’s Algorithm . . . . . . 422
11.8 Architecture of while programs over the natural numbers . . . . . . 423
11.9 Architecture of while programs over a fixed signature . . . . . . 435
11.10 Architecture of while programs over any data type . . . . . . 441

12.1 The Chomsky hierarchy . . . . . . 451
12.2 Equivalent grammars to generate the language La2n = { a^i | i is even } . . . . . . 454
12.3 Derivation of aabbcc in the context-sensitive grammar Ganbncn . . . . . . 456
12.4 Hierarchy of languages L(Gan), L(Ganbn) and L(Ganbncn) . . . . . . 456
12.5 Derivation tree Tr for the string z = uvw . . . . . . 471

13.1 Recognising the letter ‘s’ . . . . . . 481
13.2 Recognising the letter ‘t’ . . . . . . 481
13.3 Recognising the string ‘st’ . . . . . . 481
13.4 Recognising and distinguishing between the strings ‘sta’ and ‘sto’ . . . . . . 482



13.5 Recognising and distinguishing between the strings ‘star’ and ‘stop’ . . . . . . 483
13.6 Recognising and distinguishing between the strings ‘start’ and ‘stop’ . . . . . . 483
13.7 Recognising an arbitrary string a1a2 · · · an . . . . . . 484
13.8 Automaton for ‘start’ and ‘stop’ following the general technique . . . . . . 484
13.9 A typical finite state automaton . . . . . . 486
13.10 Finite state automaton that accepts the strings ‘start’ and ‘stop’ . . . . . . 488
13.11 Automaton to recognise the numbers 1 to 3 . . . . . . 490
13.12 Automaton to recognise the numbers 1 to 3 with a single final state . . . . . . 491
13.13 Automaton to recognise the numbers 1 to 3 with merged transitions . . . . . . 491
13.14 Automaton to recognise the numbers 0, 1, . . . , 999 . . . . . . 492
13.15 Automaton MD∗ to recognise strings of digits of length zero or more . . . . . . 493
13.16 Automaton MD+ to recognise strings of digits . . . . . . 494
13.17 Automaton MN+ recognises non-zero natural numbers . . . . . . 495
13.18 Automaton MN to recognise numbers . . . . . . 495
13.19 Automaton to recognise strings ab, aabb, aaabbb and aaaabbbb . . . . . . 497
13.20 Automaton recognises infinite strings but without matching . . . . . . 498
13.21 Automaton recognises infinite strings with matching but in disturbed order . . . . . . 499
13.22 Automaton with empty moves to recognise the language { a^i b^j | i, j ≥ 0 } . . . . . . 501
13.23 Automaton without empty moves to recognise the language { a^i b^j | i, j ≥ 0 } . . . . . . 503
13.24 Automaton MZ to recognise integers by re-using the automata MN and MN+ . . . . . . 506
13.25 Flattened automaton MFlattened Z to recognise integers . . . . . . 507
13.26 Modular automaton to recognise program identifiers . . . . . . 509
13.27 Flattened automaton MFlattened Identifier to recognise program identifiers . . . . . . 510
13.28 Deterministic automaton to recognise the language { a^i b^j | i, j ≥ 0 } . . . . . . 513
13.29 Translation of rules of the form A → a . . . . . . 517
13.30 Translation of rules of the form A → Ba or A → aB . . . . . . 517
13.31 Simulation automaton for GRegular Number . . . . . . 521
13.32 Path chosen through the automaton to recognise the string z = uvw ∈ L . . . . . . 527
13.33 Empty set finite state automata . . . . . . 532
13.34 Empty string finite state automata . . . . . . 532
13.35 Single terminal symbol finite state automata . . . . . . 533
13.36 Concatenation operation on finite state automata . . . . . . 534
13.37 Union operation on finite state automata . . . . . . 535
13.38 Iteration operation on finite state automata . . . . . . 536
13.39 Automaton to recognise the language { a^i b^j | i, j ≥ 0 } . . . . . . 539
13.40 Regular expressions associated with paths through the automaton . . . . . . 540

14.1 Tree for the derivation of the string aaabbb from the grammar Gab . . . . . . 546
14.2 Representation of the application of a production rule A → X1X2 · · ·Xn . . . . . . 547
14.3 Derivation sub-tree for production rule A → a1 . . . an . . . . . . 547
14.4 Derivation sub-tree for production rule A → u0A1u1 · · · un−1Anun . . . . . . 548
14.5 Derivation trees for Examples 10.2.2(1) . . . . . . 550
14.6 Derivation trees for Examples 10.2.2(2) . . . . . . 551
14.7 Derivation trees for Examples 10.2.2(3) . . . . . . 552
14.8 Derivation trees for generating Boolean expressions . . . . . . 552
14.9 Derivation tree for the program statement x :=y; y :=r; r :=x mod y . . . . . . 554



14.10 Two derivation trees for the string ab . . . . . . 556
14.11 Derivation trees produced by the ambiguous BNF DanglingElse . . . . . . 557
14.12 Derivation tree produced by the unambiguous BNF MatchElseToClosestUnmatchedIf . . . . . . 560
14.13 Derivation trees for the string true and false and not true from original grammar and its CNF variant . . . . . . 579
14.14 Derivation trees for the string true and false and not true from original grammar and its GNF variant . . . . . . 585
14.15 Derivation tree Tr for the string z = uvwxy . . . . . . 592
14.16 Architecture of while programs over the natural numbers with declarations . . . . . . 595

15.1 Parsing abstract syntax and unparsing concrete syntax . . . . . . 610
15.2 Possible relationships between abstract syntax, concrete syntax and semantics . . . . . . 610
15.3 Abstract syntax tree for the program x :=y; y :=r; r :=x mod y . . . . . . 611
15.4 The algebra WP(Σ) of while commands with definitions of operations . . . . . . 617
15.5 The tree representation of the term x2 :=3; skip . . . . . . 619
15.6 The tree representation of the term while not x0 = 2 do x0 :=x0 + 1 od . . . . . . 619
15.7 Pretty-printing from abstract syntax to concrete syntax . . . . . . 620
15.8 Derivations and algebraic trees and terms of sample phrases . . . . . . 635

16.1 Extending the kernel while language . . . . . . 649
16.2 The store modelled by state σ . . . . . . 658

18.1 Execution trace for Euclid’s algorithm . . . . . . 711
18.2 Derivation trace for Euclid’s algorithm using structural operational semantics . . . . . . 716
18.3 Semantic kernel of a language . . . . . . 717
18.4 Semantic kernel of while programs . . . . . . 719

19.1 Form of register machine . . . . . . 727
19.2 Data registers ρD ∈ RegD of the VRM . . . . . . 729
19.3 Test registers ρB ∈ RegB of the VRM . . . . . . 730
19.4 Virtual machine instructions . . . . . . 731

20.1 The correctness of a compiler for a single program P ∈ ProgS . . . . . . 748
20.2 Compiler correctness for input-output semantics . . . . . . 749
20.3 Compiler correctness for operational semantics . . . . . . 750

21.1 Execution of compiled sequenced statements . . . . . . 772
21.2 Execution of compiled conditional statements . . . . . . 773
21.3 Execution of compiled iterative statements . . . . . . 775


List of Tables

14.1 Machine characterisations of languages . . . . . . . . . . . . . . . . . . . . . . . 587

16.1 An example of real-sorted and Bool-sorted states . . . . . . 659
16.2 New state after substitution . . . . . . 660






Preface

Data, syntax and semantics are among the Big Ideas of Computer Science. The concepts are extremely general and can be found throughout Computer Science and its applications. Wherever there are languages for specifying, designing, programming or reasoning, one finds data, syntax and semantics.

A programming language is simply a notation for expressing algorithms and performing computations with the help of machines. There are many different designs for programming languages, tailored to the computational needs of many different types of users. Programming languages are a primary object of study in Computer Science, influencing most of the subject and its applications.

This book is an introduction to the mathematical theory of programming languages. It is intended to provide a first course, one that is suitable for all university students of Computer Science to take early in their education; for example, at the beginning of their second year, or, possibly, in the second half of their first year. The background knowledge needed is a first course in imperative programming and in elementary set theory and logic. The theory will help develop their scientific maturity by asking simple and sometimes deep questions, and by weaning them off examples and giving them a taste for general ideas, principles and techniques, precisely expressed. We have picked a small number of topics, and attempted to make the book self-contained and relevant.

The book contains much basic mathematical material on data, syntax and semantics. There are some seemingly advanced features and contemporary topics that may not be common in the elementary text-book literature: data types and their algebraic theory, real numbers, interface definition languages, algebraic models of abstract syntax, use of algebraic operational semantics, connections with computability theory, virtual machines and compiler correctness. Where our material is standard (e.g., grammars), we have tried to include new and interesting examples and case studies (e.g., internet addressing).

The book is also intended to provide a strong foundation for the further study of the theory of programming languages, and related subjects in algebra and logic, such as: algebraic specification; initial algebra semantics; term rewriting; process algebra; computability and definability theory; program correctness logic; λ-calculus and type theory; domains and fixed point theory; etc. There are a number of books available for these later stages, and the literature is discussed in a final chapter on further reading.

The book is based on the lectures of J V Tucker (JVT) to undergraduates at the University of Leeds and primarily at the University of Wales Swansea. In particular, it has developed from the notes for a compulsory second year course on the theory of programming languages, established at Swansea in 1989. These notes began their journey from ring binder to book-shop in 1993, when Chris Tofts taught the course in JVT’s stead, and provided the students with a typescript of the lecture notes. Subsequently, as JVT continued to teach the course, Karen Stephenson (KS) maintained and improved the notes, and assisted greatly in the seemingly endless process of revision and improvement. She also contributed topics and conducted a number of successful experiments on algebraic methods with the students. KS became a co-author of the book in 2000. Together, we radically revised the text and added new chapters on regular languages, virtual machines and compiler correctness.

JVT’s interest and views on the theory of programming languages owe much to J W de Bakker, J A Bergstra and J I Zucker. Ideas for this book have been influenced by continuous conversations and collaborations starting at the Mathematical Centre (now CWI), Amsterdam, in 1979. However, its final contents and shape have been determined by our own discussions and our work with several generations of undergraduate students.

We would like to thank the following colleagues at Swansea: Dafydd Rees, Chen Min, Jens Blanck, Andy Gimblett, Neal Harman, Chris Whyley, Oliver Kullmann, Peter Mosses, . . . for their reading of selected chapters, suggestions and advice. A special debt is owed to Markus Roggenbach for a reflective and thorough reading of the text. We are grateful to the following students who formed a reading group and gave useful criticism of earlier drafts of the text: Carl Gilbert, Kevin Hicks, Rachel Holbeche, Tim Hutchison, Richard Knuszka, Paul Marden, Ivan Phillips and Stephan Reiff. We are grateful to Hasib Kabir Chowdhury who, as a student, read a later version of the manuscript and made many suggestions.

Our colleagues and students have made Swansea a warm and inspiring environment in which to educate young people in Computer Science.

The ambiguities and errors that remain in the book are solely our responsibility.

J V Tucker
Perriswood, Gower, 2003

K Stephenson
Malvern, 2003



A Guide for the Reader

The educational objectives of the book are as follows:

Educational Objectives

1. To study some theoretical concepts and results concerning abstract data types; programming language syntax; and programming language semantics.

2. To develop mathematical knowledge and skills in mathematically modelling computing systems.

3. To introduce some of the intellectual history, technical development, organisation and style of the scientific study of programming languages.

At the heart of the book is the idea that the theory presented is intended to answer some simple scientific questions about programming languages. A long list of scientific questions is given in Section 1.2. To answer these questions we must mathematically model data, syntax and semantics, and analyse the models in some mathematical depth. In addition to meeting the objectives above, we hope that our book will help the students learn early in their intellectual development the following:

Intellectual Experiences

1. That the mathematical theory gives interesting and definitive answers to essential questions about programming.

2. That the theory improves the capacity for practical problem solving.

3. That the answers to the scientific questions posed involve a wide range of technical material that was invented and developed by many able people over most of the twentieth century, and has its roots in the nineteenth century.

4. That the theory contains technical ideas that may be expected to be useful for decades if not centuries.

5. That intellectual curiosity and the ability to satisfy it are ends in themselves.

It will be helpful to summarise the structure and contents of the book, part by part, to ease its use by the reader, whether student or teacher.

Subjects

Data, syntax and semantics are fundamental and ubiquitous in Computer Science. This book introduces these three subjects through the theoretical study of programming languages. The subjects are intimately connected, of course, and our aim is to integrate their study through the course of the whole book. However, we have found it natural and practical to divide the text into three rather obvious parts, namely on data, syntax and semantics. We begin with a general overview of the whole subject, and a short history of the development of imperative programming languages.


xxii A GUIDE FOR THE READER

Part I is on data. It is an introduction to abstract data types. It is based on the algebraic theory of data and complements their widespread use in practice. We cover many examples of basic data types, and model the interfaces and implementations of arbitrary data types, and their axiomatic specifications. We focus on the data types of the natural numbers and real numbers, data in time and space, and terms.

Part II is on syntax. It introduces the problem of designing and specifying syntax using formal languages and grammars, and it develops a little of the theory of regular and context-free grammars. Again, there are plenty of examples to motivate and illustrate the language definition methods and their mathematical theory. Our three main case studies are simple but influential: addresses, interface definition languages and imperative programming languages. We round off the topic by applying data type theory to abstract syntax and the definition of languages.

Part III is on semantics. It introduces some methods for defining the semantics of imperative programs using states and their transformation. It also deals with proving properties of programs using structural induction. We conclude with a study of compiler correctness. We define a simple virtual machine and a compiler from an imperative language into its virtual assembler, and prove by structural induction that it is correct.

In the book explanations are detailed and examples are abundant. There are plenty of exercises, both easy and hard, including some longer assignments.

The final chapter of each part contains advanced undergraduate material that brings together ideas and results from earlier chapters. Occasionally, sections in other chapters will also contain such advanced material. We will mark chapters and sections with advanced material using *.

Options

Rarely are the chapters of a scientific textbook read in order. Lecturers and students have their own ideas and needs that lead them to re-order, select or plunder the finely gauged writings of authors. This textbook presents a coherent and systematic account of the theory of programming languages, but it is flexible and can be used in several ways. We certainly hope the book will be plundered for its treasures.

A selection of materials to support courses based on the book can be found on the web site:

http://www...

The structure of the book is depicted in Figure 1, which describes the dependency of one chapter upon another. The chapters may be read in a number of different orders. For example, all of Part I can be read independently of Part II, but Chapter 16 of Part III depends on Chapters 3–6 of Part I.

The complete course can be given by reading all the chapters in order. As noted, it is easy to swap the order of Part I and Part II.

A concise course that covers the mathematical modelling of an imperative language with arbitrary data types, but without the advanced topics, can be based on this selection of chapters:

Data Chapters 1, 3–7



Figure 1: Structure of the book.

Syntax Chapters 10–14

Semantics Chapters 16 and 17

Shorter and easier courses are also practical:

Data Chapters 1, 3–6

Syntax Chapters 10 and 11

Semantics Chapters 16 and 18

A set of lectures on data types can be based on:

Data Chapters 3–9

Syntax Chapter 15

Semantics Chapters 16 and 17

A set of lectures on syntax can be based on:

Data Chapters 3–4



Syntax Chapters 10–15

A set of lectures on hierarchical structure and correct compilation can be based on:

Syntax Chapters 10 and 11.

Semantics Chapters 16–17 and 19–21.

Prerequisites

Throughout, we assume that readers have a sound knowledge and experience of

(i) programming with an imperative language;

(ii) sets, functions and relations;

(iii) number systems and induction;

(iv) propositional and predicate logic; and

(v) axioms and deduction.

However, our chapters provide plenty of reminders and opportunities for revising these essential subjects. Students who take the trouble to revise these topics thoroughly, here at the start, or even as they are needed in the book, will progress smoothly and speedily. Occasionally, for certain topics, we will use or mention some more advanced concepts and results from abstract algebra, computability theory and logic. Hopefully, these ideas will be clear and only add to the richness of the subject.

Notation

We will follow the discipline of mathematical writing, adopting its standard notations, forms of expression and practices. It should be an objective for the student of Computer Science to master the elements of mathematical notation and style.

Notation in Computer Science is very important and always complicated by the need for concepts to have three forms of notation:

Mathematical Notation is designed to facilitate mathematical expression and analysis of general ideas and small illustrative examples;

Descriptive Notation is designed to facilitate reading and comprehension of large examples and case studies; and

Machine Notation is designed to facilitate processing by machine.

Several of our key concepts (algebras, grammars, programs, etc.) will be equipped with two standard notations, one serving as the Mathematical Notation and one as the Descriptive Notation.



Exercises and Assignments

At the end of each chapter there is a set of problems and assignments designed to improve and test the student's understanding and technique. Some have been suggested by students. The problems illustrate, complete, explore, or generalise the concepts, examples and results of the chapter.

Revision

We end this Guide with an assignment that invites students to revise the prerequisites, and with a piece of advice: do this assignment, and do it well.

Prepare a concise list of concepts, notations and results that you believe are the basics of the five topics mentioned in the Prerequisites. Add to this list as you study the following chapters.

Here is a start.

Programming

variable
expression
assignment
sequencing
conditional branching
iteration
procedure
array, list, queue, stack
...

Sets

∅    A ≠ ∅
x ∈ A    x ∉ A
A ⊆ B    A ⊈ B
A ∪ B    A ∩ B
A − B    P(A)
|A|

...

Functions

f : A → B
g ∘ f : A → C where f : A → B and g : B → C.
f is total.
f is partial.
f is injective, or one-to-one.
f is surjective, or onto.
f is bijective, or a one-to-one correspondence.

...



Relations

R ⊂ A × B    xRy
≡ equivalence relation    [a] equivalence class
≤ partial order
≤ total order

...

Number Systems

N natural numbers
Z integers
Q rational numbers
R real numbers

Logic

p ∧ q
p ∨ q
p ⇒ q
¬p

∀x : R(x)
∃x : R(x)

Axiomatic Theory

definition
axiom
postulate
law
lemma
theorem
corollary
proof
deduction
counter-example
conjecture
...


Welcome



Chapter 1

Introduction

Data, syntax and semantics are three fundamental ideas in Computer Science. They are Big Ideas. That is: they are ideas that are general and widely applicable; they are deep and are made precise in different ways; they lead to beautiful and useful theories. The ideas are present in any computing application. As its contributions to Science and human affairs mature, Computer Science will influence profoundly many areas of thinking, making and doing. Thus data, syntax and semantics are three ideas which are fundamental in Science and other intellectual fields.

Our introduction to these concepts of data, syntax and semantics is through the task of modelling and analysing programming languages. Programming languages abound in Computer Science and range from large general purpose languages to the small input languages of application packages. Some languages are well known and widely used, some belong to communities of specialists, some are purely experimental and do not have a user group. The number of programming languages cannot be counted!

In learning a programming language the aim is to read, write and execute programs. A programmer should understand the organisation and the operation of the program, and be able to predict and test its input-output behaviour, to know, for example, if the program computes a specified function. After gaining fluency in one or more programming languages it is both natural and necessary to reflect on the components that make up the languages and enquire about their individual roles and properties. A multitude of questions can be formulated about the capabilities and shortcomings of a language and its relation with other languages. For example, How is this programming language specified? and Is this language more powerful than that? To awaken the reader's curiosity, we will ask many questions like these shortly, in Section 1.2. To understand something of the nature and essential elements of programming languages, we will look at their overall structure and separate components rather abstractly and analytically, and answer some of these questions.

Now, this analysis, comparison and classification of language components requires a large-scale scientific investigation. The investigation is organised as the study of three aspects of programs:

Data the information to be transformed by the program

Syntax the text of the program

Semantics the behaviour of the program

To create a theory of programming languages we need to discover fundamental concepts, methods and results that can reveal the essential structure of programming languages, and can answer our questions.

In this first chapter we will simply prepare our minds for theoretical investigations. We will explore the scientific view of programming languages (Section 1.1), raise plenty of questions about programming languages for which we need answers (Section 1.2), and look at some raw material for modelling programming languages (Section 1.3).

We will focus on the theory of programming in the small and have only occasional encounters with programming in the large. Measured against the state of contemporary language technology, our scientific goal seems modest indeed:

To understand the general principles, structure and operation of imperative programs that compute functions on any kind of data.

Yet attaining this goal will bring rich rewards.

1.1 Science and the aims of modelling programming languages

Computing systems are artificial. They are designed and developed by and for human use. They are superseded and are discarded, sometimes with uncomfortable speed. Some are publicly managed and some are commercial products. However, the scientific study of computing systems is remarkably similar to the scientific study of physical systems, which are God-given and timeless. Roughly speaking, scientific studies have theoretical and practical objectives, pursue answers to definite questions, and require mathematical models and experimental methods.

Theoretical Computer Science develops mathematical models and theories for the design and analysis of

• data;

• specifications of computational problems with data;

• algorithms for transforming data;

• programs and programming languages for expressing algorithms;

• systems and machine architectures for implementing programming languages.

Simply put, the subject is aimed towards the discovery and application of fundamental principles, models, methods and results, which are intended

• to help understand the nature, scope and limits of computing systems;

• to help software and hardware engineers make computing systems.

Fundamentally, Theoretical Computer Science is speculative. It thrives on curiosity and the freedom to imagine, idealise and simplify. Theoretical Computer Science creates ideas for new data types and applications, new ways of specifying, programming and reasoning, and new designs for machines and devices. For example, over the past decades, theorists have been thinking about new ways of doing exact calculations with real numbers, the integration of simulated and experimental data in virtual worlds, languages for specifying and programming distributed and mobile devices, ways of modelling and proving concurrent systems correct, asynchronous designs for chips, and quantum and biological computers. In time, some theories become the foundations of mature technologies, whilst others stubbornly resist practical or commercial exploitation. Theoretical Computer Science also has many problems that retain importance long after the technology is established. The theory of programming languages is laden with such legacy problems: the theories of types, concurrency, objects and agents are rich in difficulties and mysteries, both conceptual and mathematical.

Theories of programming languages and program construction are a fundamental area of Theoretical Computer Science. There are many programming constructs and program development techniques and tools, all of which are the fruits of, or require, theoretical investigation. In our time, it is widely believed that the development of theories is necessary for the practical craft of program construction to become a mathematically well-founded engineering science. Independently of such a goal, we believe that the development of theories is necessary to satisfy our curiosity and to understand what is being done, what could be done, and how to do better.

The scientific approach of the theory of programming languages places three intellectual requirements on the reader:

Intellectual Aims

1. To ask simple questions.

2. To make and analyse simple mathematical models.

3. To think deeply and express ideas precisely.

Now we will formulate a number of questions and set the scene for some mathematical theories to answer them. Our theories will show how it is possible

• to model mathematically any kind of data;

• to model mathematically the syntax of a programming language;

• to model mathematically the semantics of a programming language.

1.2 Some Scientific Questions about Programming Languages

Let us begin by posing some simple questions about a programming language. The questions will make us reflect on our ideas about programming languages and programs. They require us

to think generally and abstractly.

They also require us

to explore our existing knowledge, experience and prejudices.



This is part of the raw material from which theories are made. Most of the questions we pose are insufficiently precise to allow a definite answer. Thus, one of our most important tasks will be

to make questions precise, using mathematical models, and hence turn them into technical problems that can be solved definitively.

By no means all of the questions will be given an answer in this book: readers are invited to return to this section from time to time to see which questions they can formulate and answer precisely.

Let L and L′ be any programming languages, real or imaginary. Try to answer the following seemingly vague questions.

Data

What is data and where does it come from? How do we choose operations on data? How do we compare the choice of different sets of operations? How do we know we have enough operations on data? What exactly are data types? What is an interface to a data type? What is an implementation of a data type? How do we specify data types like integers, reals or strings, independently of programming languages? How do we model data types for users' applications? To what extent is a data type independent of its implementation? How do we compare two implementations of a data type? Can any data type be implemented using an imperative language? Are the representations of the natural numbers based upon decimal, binary, octal and Roman notations equivalent? How accurate as approximations are implementations of the real numbers? What are the effects of approximating infinite sets of data by finite subsets? Are error messages necessary for data types? What are the base data types of L? What data types can be implemented in L? Can any data type be implemented in L?

Syntax

What is syntax and why is there so much of it? Is any notation a syntax? How do we choose and make a syntax using symbols and rules? How do we check for errors in syntax? How do we transform program syntaxes, as in substitution and expansion, compilation and interpretation? How do we specify and transform texts for pretty printing, internet publication and slide projection? What are the syntactic categories, such as identifiers, types, operations, declarations, commands, procedures and classes? What are scope and binding rules? What are the benefits of prefix, infix, postfix, or mixfix notations? How do we define exactly the syntax of L? How do we specify the set of legal programs of L? Is there an algorithm that checks that the syntax of a program of L is correct?

Semantics

What is semantics? Why do we need to define behaviour formally? How do we choose one from the many possible semantics for a construct? Is there a "right" semantics for every construct? What is input-output behaviour? What are deterministic and nondeterministic behaviours? Can every partial function be extended to a total function by adding an undefined flag? How do we model the behaviour of any program containing while, goto, and recursion? Can one use tests that return a "don't know" flag? What happens when a program raises an exception? How do procedures work? What is encapsulation and information hiding? What is parallel execution and concurrent communication? What are type systems, classes and objects, inheritance and polymorphism? What is a program library? How do we define exactly the semantics of L? How do we specify the meaning of the data types and commands of L? How do we specify the operation or dynamic behaviour of programs of L? How do we specify the input-output behaviour of programs required by users? What is a program trace? What is the relationship between the number of steps in a computation and its run time? What exactly does it mean for two programs of L to be equivalent?

Expressiveness and Power

How expressive or powerful is L? Can L implement any desired data type, function or specification? Which specifications cannot be implemented in L? Is L as expressive as another programming language L′? There are four possibilities:

• L and L′ are equivalent;

• L can accomplish all that L′ can and more;

• L′ can accomplish all that L can and more; or

• L can accomplish some tasks that L′ cannot, and L′ can accomplish some tasks that L cannot.

What exactly does it mean for two languages L and L′ to be equivalent in expressiveness or power? Which are the most expressive: imperative, object-oriented, functional or logic programming languages? Are parallel languages more expressive than sequential languages? Is L a universal language, i.e., can it implement any specification that can be implemented in some other programming language? Do universal languages exist? What is the smallest set of imperative constructs necessary to make a universal language?

Program Properties

What properties of the programs of L can be specified? Are there properties that cannot be specified? What exactly are static and dynamic properties? What properties of the programs of L can be checked by algorithms? What properties are decidable, or undecidable, when the program is being compiled? What properties are decidable, or undecidable, when the program is being executed? What is the relationship between the expressiveness or power of the language L and its decidable properties? What properties of programs in L can be proved? Given a program and an identifier, can we decide whether or not

(i) the identifier appears in the program?

(ii) given an input, the identifier changes its value in the resulting computation?

(iii) the identifier changes its value in some computation?

Given a program and an input, can we decide whether or not the program halts when executed on the input? Given two programs, can we decide whether or not they are equivalent?



Correctness

How do we specify the purpose of a program in L? To what extent can we test whether or not a program of L meets its specification? How do we prove the correctness of a program with respect to its specification? Given a specification method, are there programs that are correct with respect to a specification but which cannot be proved to be correct? Is there an algorithm which, given a specification and a program, decides whether or not the program meets its specification?

Compilation

What exactly is compilation? How do we structure a compiler from the syntax of L to the syntax of L′? What does it mean for a compiler from L to L′ to be correct? Is correctness defined using a class of specifications, or by comparing the semantics of L and L′? How do we prove that a compiler from L to L′ is correct?

Efficiency

Are the programs of L efficient for a class of specifications? If L and L′ are equally expressive, will the programs written in them execute equally efficiently? Can L be compiled efficiently on a von Neumann machine, and is the code generated efficient? Are imperative programming languages more efficient than logic and functional languages? Are programs involving only while and other so-called structured constructs less efficient than those allowing gotos? Are parallel languages more efficient than sequential languages?

Attempting to answer the questions should reveal what the reader knows about programming languages, before and after studying this book.

1.3 A Simple Imperative Programming Language and its Extensions

Clearly, the questions of Section 1.2 are a starting point for a major scientific investigation, one that is ongoing in the Computer Science research community. We will examine only some of the questions by analysing in great detail an apparently simple imperative language.

Definition (Imperative Language) An imperative language is based on the idea of placing data in a store and having commands or instructions that act on the store, creating and testing data, and moving it around the store.

In summary,

Imperative Program = Data + Store + Algorithm

The language we will use for theoretical analysis is called the

language WP of while programs with user-defined data types.



The main features of WP are its very general data types and very limited control constructs. The language WP can compute on any data, but the most complicated control construct is the while loop. The language can be extended in many ways and is valuable as a kernel language, from which equally expressive but more convenient languages can be made by adding new constructs.

1.3.1 What is a while Program?

In our language WP , a program is created using

(i) some data equipped with some atomic operations and tests;

(ii) some store containing data; and

(iii) some algorithm that uses these atomic operations and tests on the store.

We combine the data and the given operations, and form a programming construct called a

data type.

The statements of a program invoke operations and tests via an interface for a data type. A program is a text composed of atomic statements that are scheduled by control constructs. The atomic statements are assignments that both compute and store the values of expressions made from the given atomic operations on data. The control constructs use tests made from the given Boolean-valued operations on data.

Thus, we have an apparently simple idea of a program, namely:

Imperative Program = Data Type + Store + Algorithm on Store

In particular, in the language WP of while programs we carefully separate data from the store and the algorithm.

This means we can study data independently of the core imperative programming constructs which involve the store. This is an important step, and is an example of a general idea:

Working Principle Separate concerns, isolate concepts and study them independently.

What do the programs of our language WP look like? Here is an example of a while program expressing a famous ancient algorithm:



program Euclid

signature Naturals for Euclidean Algorithm

sorts      nat, bool

constants  0 : → nat;
           true, false : → bool

operations mod : nat × nat → nat;
           ≠ : nat × nat → bool

endsig

body

var x, y, r : nat

begin
  read(x, y);
  r := x mod y;
  while r ≠ 0 do
    x := y;
    y := r;
    r := x mod y
  od;
  write(y)
end

Euclid's algorithm computes the greatest common divisor of two natural numbers. Let us examine its form and constructs (we consider its computational behaviour later, in Chapter 16).
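For comparison only, the while program can be transcribed almost line by line into a modern language. The Python sketch below is our illustration, not part of the WP language; the function name euclid_gcd is our invention, and Python's % operator plays the role of mod.

```python
# Euclid's algorithm, transcribed from the while program above.
# Assumes x and y are natural numbers with y > 0, as the algorithm requires.
def euclid_gcd(x: int, y: int) -> int:
    r = x % y          # r := x mod y
    while r != 0:      # while r /= 0 do
        x = y          #   x := y;
        y = r          #   y := r;
        r = x % y      #   r := x mod y
    return y           # od; write(y)
```

For example, euclid_gcd(12, 8) returns 4, since 4 is the largest natural number dividing both 12 and 8.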

The form of the program is two parts, with three components in total.

Interface The first part is an

interface for a data type

which is called a

signature.

Specifically, it is:

(i) a declaration listing names for the data and names for the operations on data.

Body The second part is an

algorithm that uses the data store

which is called a body. Specifically, the body is:



(ii) a declaration of variables and their types which specifies a store or memory for the data of the data type; and

(iii) a list of commands expressing the algorithm, which uses operations from the data type interface (i) and variables from the declarations (ii).

The constructs of the program are of two kinds. First, there are commands that

(a) apply operations on data that compute new data from old, and

(b) store and transfer data.

For example, the assignments r := x mod y and x := y

can combine the computation and storage of data. Secondly, there are commands that

(c) apply operations on data that test data and control the order of execution of commands.

For example, while r ≠ 0 do . . . od

The program is a list of instructions that we will call

commands or statements.

The list is built by placing the commands in order using the construct of

sequencing

denoted by ; and usually introducing a new line of text. However, the list has some nesting. Notice that the while construct "contains" a block of three sequenced commands.

Finally, notice that the program obeys the convention that the algorithm uses operations and variables that are declared inside the program.

Next we will describe the general form of a while program and the constructs that form the core of the language.

These observations on the structure of Euclid's algorithm are merely the beginning of our search for general principles, theories and tools for building programming languages. On the basis of this example, possibly complemented by a few more, is it possible to gain a clear picture of what our while programs look like and to describe what they do?

On the basis of examples, it is possible to recognise, adapt and create many new examples of while programs. This is how most people learn and hence "know" a programming language! However, it is not possible, on the basis of examples, even to recognise all while programs, since it is not possible to define exactly what is, and what is not, a while program. It is not possible to give exact rules, derived from or at least supported by general principles, for the composition of its programs. It is not possible to write an algorithm that would check the correctness of the program syntax (as we find in compilers). Clearly, on the basis of examples, it is not possible to answer the demanding questions we posed in Section 1.2.

Working Principle Examples must be complemented by general concepts and principles.



1.3.2 Core Constructs

Let us review and formulate in more general terms some of the observations we made in the previous section.

Program Structure The general form of a while program over a data type is summarised in Figure 1.1.

program <name of program>

signature <name of data type interface>

  <list of names for data called sorts, constants, and operations>

endsig

body <name of body>

variables <list of variables and their sorts>

begin

  <algorithm expressed as a list of commands using operations from the signature and variables from the variable declaration>

end

Figure 1.1: General form of a while program.

It has the form

Program = Data Type + while Program Body

We concentrate on the elements of the components. More precisely, the principal constructs of the while language WP are as follows:

Data Any set of data, and any operations and tests on the data, can be defined. Data, and operations and tests on data, together form a

data type.

The interface by which the algorithm of a while program invokes a data type is called a signature and has the form:



signature a data type

sorts . . . , s, . . . , bool

constants . . . , c : → s, . . .

operations . . . , f : s1 × · · · × sn → s, . . .
           . . . , r : s1 × · · · × sm → bool, . . .

endsig

Here sorts are names for the sets of data, constants are names for special data, and operations are names for functions and tests. Note that Booleans are needed for the tests.

Variables and State For an imperative language, there are infinitely many variable names available, e.g.,

x0, x1, x2, . . .

also denoted a, b, c, . . . , m, n, . . . , x, y, z, . . . etc. when convenient. Normally, only finitely many variables are used in a program. The variables name "locations" in which to "store" data. The variables give rise to a notion of state of a store, memory and, indeed, computation. A state is a list of data recording what data is to be found in the locations named by the variables. The algorithm changes these states as its statements change data and locations. Thus, a computation can be seen as a sequence

σ0, σ1, . . . , σt, . . .

of states produced by the algorithm step by step in time.
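The idea of a state can be sketched concretely. In the illustration below (ours, not the book's formal definition, which comes in Part III) a state is represented as a mapping from variable names to data, and each assignment produces the next state in the sequence:

```python
# A state maps variable names to the data stored at their locations.
# An assignment transforms one state into the next; we keep states as
# separate snapshots so the sequence sigma_0, sigma_1, ... is visible.
def assign(state, var, value):
    next_state = dict(state)   # copy the old state ...
    next_state[var] = value    # ... and update one location
    return next_state

# One pass of Euclid's loop body, as three state transformations:
sigma0 = {"x": 12, "y": 8, "r": 12 % 8}                  # after r := x mod y
sigma1 = assign(sigma0, "x", sigma0["y"])                # x := y
sigma2 = assign(sigma1, "y", sigma1["r"])                # y := r
sigma3 = assign(sigma2, "r", sigma2["x"] % sigma2["y"])  # r := x mod y
```

After these steps sigma3 is {"x": 8, "y": 4, "r": 0}: the test r ≠ 0 now fails, the loop terminates, and y holds the greatest common divisor.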

Atomic Statements An expression e, built from the variables and operators of the data type, can be evaluated and the result taken to be the new value of any variable x by an

Assignment x := e.

We will also often include a dummy atomic statement which does nothing:

Null skip

For many purposes, skip is equivalent to trivial assignments such as x := x.

Control Constructs The order of performing operations is controlled by three constructs. Given programs S0, S1, S2 and test b, we can construct the new programs:

Sequencing S1 ;S2

Conditional if b then S1 else S2 fi

Iteration while b do S0 od


12 CHAPTER 1. INTRODUCTION

Statements A while program’s algorithm is built from atomic statements by the repeated application of control constructs.

The theory of this small set of programming constructs will provide answers to many of the questions in Section 1.2, and be strong enough to serve as a theoretical foundation for much larger languages.

1.4 Where do we find while programs?

The answer is: Almost everywhere! The while language is a small language yet it captures the primary ideas of imperative programming, namely, that data is placed in a store and a sequence of commands calculates and transforms the data in the store.

Before we put imperative programming under the microscope, using the constructs of the while language, let us look at how these constructs are expressed in different ways in different programming languages. We consider procedures that express Euclid’s gcd algorithm, which we have already written as a while program.
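For reference, here is a reconstruction (ours, not a quotation of the earlier chapter) of Euclid’s gcd algorithm in the while notation of Section 1.3, leaving gcd(a, b) in the variable x:

```
x := a;
y := b;
while y ≠ 0 do
  r := x mod y;
  x := y;
  y := r
od
```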

Our while language is a theoretical language designed to express examples clearly and to be analysed by people. The texts below are from languages that are designed to code large applications and be processed by machines. In each case we have given function procedures rather than programs.

To an experienced eye the variety is not surprising and the differences are negligible because the essential structure of each procedure — i.e., the basic operations and loop — is the same. These ideas are laid bare in our while language, though even in theoretical investigations there are plenty of variations of notations.

However, in using procedures we do simplify comparisons by omitting other parts of a gcd program. For example, we have omitted input and output code, which

(i) may belong to the program (e.g., written using read and write constructs in Pascal), or

(ii) may not belong to the program (e.g., imported from library classes in Java), or

(iii) may not even belong to the language (e.g., in pure Algol 60, input and output were not specified and were in the hands of machine-specific routines, though some dialects offered read and write).

Functional and logic languages implement computational loops in entirely different ways. There is no end to the variety of ways of programming. Thus, it is not always trivial to rewrite a simple gcd program in another language.

To an inexperienced eye the variety may be daunting and the differences in appearance may be surprising. The potential variety of texts is enormous, for there are hundreds of languages in which to write a gcd procedure.

Imagine a hundred gcd procedures, in a hundred languages, with the same essential structure but different texts. Do we see them all as really the same? No. For it is a fact that, in practice, some we like and some we do not. There is something personal about languages and we can have extra difficulties in using languages we do not like.

The syntax of a language is like the clothes worn by a person. Clothes combine a purpose and a style, and emphasise practicality or display. They are something we have chosen, or a social group has chosen for us. Often clothes matter even when they should not.


Finally, before we leave our luxurious island of theoretical programming languages, and swim in the sea of practical programming languages to collect our specimens of gcd code, we must take one or two safety precautions.

First, in our while language we can use any data by simply declaring a signature. The gcd algorithm works on natural numbers. However, in expressing the gcd algorithm in practical languages it is easier to use the “standard” data type of the integers. Thus, the procedures below all compute with integers and have been called IGCD for integer greatest common divisor. In the case of integers x and y, which may be negative, we define

igcd(x, y) = gcd(|x|, |y|),

where

|x| =   x    if x ≥ 0;
       −x    if x < 0.

This is the function on integers we expect to program.

Second, basic operations on the integers, like the modulus function, not only have different notations in the different languages, but can produce different answers. This variation belongs to the topic of data type design, of course.

So much for the “standard” data type of the integers! Thus, let us adopt one true standard convention of practical programming:

caveat emptor

i.e., let the buyer beware. The reader is invited to check how these fragments compute igcd outside the natural numbers.
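To see the point in one concrete case (a sketch of ours, with our own function names): in C, the % operator truncates toward zero (C99), so transliterating the bare gcd loop without taking absolute values first can return a negative answer.

```c
#include <stdlib.h>  /* abs */

/* The bare gcd loop, as in the gallery below. */
int gcd_loop(int x, int y)
{
    while (y != 0) {
        int r = x % y;   /* C99: result truncates toward zero */
        x = y;
        y = r;
    }
    return x;
}

/* igcd(x, y) = gcd(|x|, |y|): take absolute values before the loop. */
int igcd(int x, int y)
{
    return gcd_loop(abs(x), abs(y));
}
```

On a C99 platform, gcd_loop(12, -18) returns −6, while igcd(12, -18) returns 6 as the definition requires.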

1.4.1 Gallery of Imperative Languages

Some general purpose languages of historical interest are:

Fortran 66, Fortran 77, Fortran 90, Algol 60, ALGOL 68, PL/I, Basic, Pascal,Modula-2, Ada, BCPL, B, C

They are all imperative languages. Here are a few samples of gcd procedures taken from the list.

Fortran 66

      FUNCTION IGCD(X, Y)
      INTEGER X, Y, R
   10 IF (Y.EQ.0) GOTO 20
      R = MOD(X, Y)
      X = Y
      Y = R
      GOTO 10
   20 IGCD = X
      RETURN
      END


Algol 60

BEGIN
  INTEGER x, y;
  INTEGER PROCEDURE igcd(x, y);
    VALUE x, y; INTEGER x, y;
  BEGIN
    INTEGER r;
loop:
    IF y NOTEQUAL 0 THEN
      BEGIN
        r := x REM y;
        x := y;
        y := r;
        GOTO loop
      END;
    igcd := x
  END igcd;
END

Pascal

function igcd (x, y : integer) : integer;
var
  r : integer;
begin
  while y <> 0 do
    begin
      r := x mod y;
      x := y;
      y := r
    end;
  igcd := x
end;

C

int igcd(int x, int y)
{
    while (y)
    {
        int r = x % y;
        x = y;
        y = r;
    }
    return x;
}


1.4.2 Pseudocode and Algorithms

Pseudocode is a language for people to describe, explain, and discuss algorithms. It is simply a blend of programming constructs and natural language. A great deal of pseudocode is a blend of simple imperative constructs and English. Usually, the constructs are those we will meet in the while language and its extensions.

Pseudocode is like natural language: it has well understood forms, and a core that can be described and even specified. But since its purpose is communication between people, it can change and extend itself to fit the problems at hand and the fashions of the day. Pseudocode originates in and remains close to the ideas of Algol 60. One of the primary aims of the designers of Algol 60 was to provide a language in which people could express algorithms elegantly and communicate them clearly (see Chapter 2). This goal they achieved. As new algorithms and programming ideas are discovered, new constructs are created. The best of them will end up in pseudocode, long after their algorithms and languages have fallen into disuse.

1.4.3 Object-Oriented Languages

Object-oriented imperative languages use the idea of objects to abstract from the idea of a program as combining data, a store and an algorithm. An object can be seen as a basic unit or component of a program, having its own data, store and algorithms. The object’s algorithms implement procedures, called methods, on the data and store of the object. The program is built from a number of objects, specially written or taken from a library. With this view of a program, the construction of new objects from given objects is fundamental and has led to the development of many different ideas of inheritance and polymorphism.

A key idea is that the single monolithic store associated with a classical imperative program is broken up into independent pieces. Thus, the basic intuition of variables indicating registers in a single global store, where data can be placed and found, is removed from programming and replaced by a new, basic but more abstract, conception of objects.

The idea of an object is struggling to find its most general form, applicable to declarative and other languages. For our imperative purposes, objects simply generalise and unify the ideas of data types and procedures on the store. Objects are instances of classes.

Some general-purpose object-oriented languages are:

Simula 67, Smalltalk, Eiffel, C++, Java, C#

In object-oriented programming, methods generalise procedures and functions. Thus, we can add to our gallery of gcd code by rewriting the gcd procedure as a method in different object-oriented languages. The examples of gcd methods continue to make the point that the essential ideas of while programs are alive and well in contemporary object-oriented programming languages.

The development of object-oriented programming languages has extended imperative programming. There is a new emphasis on programming in the large, through the ideas of class and operations on libraries of classes.

Though not necessary for the purpose of the gallery, it is interesting to examine the classes where these methods reside. Immediately, we see differences. Are they superficial or deep? The differences are magnified by the fact that we are in the early stages of research into theoretical models of object-oriented programming. We do not understand the fundamental ideas fully,


not to mention the needlessly complex features that inhabit practical technologies. Computer scientists have been here before: once upon a time, there was little stability in the matter of the key imperative constructs. One thinks of the history of recursive procedures, for example. Some of these aspects will be discussed in Chapter 2.

Let us continue our gallery of gcd code. One object-oriented sample will suffice.

Java

In Java, traditional functions and procedures, characteristic of mathematical operations, are written as static methods. Thus a simple re-programming of the gcd procedure in Java is

public static int gcd(int x, int y)
{
    while (y != 0)
    {
        int r = x % y;
        x = y;
        y = r;
    }
    return x;
}

The public refers to the access rules for objects and ensures that the method is generally available if placed in a public class.

How can the gcd static method be inserted into a class?

Perhaps the basic ideas of object-oriented programming would be served simply by extending the standard Integer class, supplied with the language, with a gcd method. For example, we can write:

public class GCDInteger extends Integer
{
    public Integer gcd(Integer a)
    {
        int y = a.intValue();
        int x = this.intValue();
        while (y != 0)
        {
            int r = x % y;
            x = y;
            y = r;
        }
        return new Integer(x);
    }
}

The calculation is made with data of type int, but the data is presented to the class as Integer objects, thus conversion from Integer to int and back is included.

Unfortunately, the Integer class is designated as final, which means that, in Java, classes may not inherit from it. Hence, the above simple idea and neat text is not allowed.

In fact, to see our little gcd as a class, the next step is to create an independent public class.


public class GCD
{
    private int a;

    public GCD(int b)
    {
        a = b;
    }

    public int gcd(int y)
    {
        int x = a;
        while (y != 0)
        {
            int r = x % y;
            x = y;
            y = r;
        }
        return x;
    }
}

Why this gcd should be such a palaver is an interesting question about bindings.

1.4.4 Algebraic, Functional and Logic Languages

Algebraic, functional and logic languages are based on the idea that one computes solely with functions and relations on data. The functions and relations can be defined and combined in different ways. The model of computation underlying an algebraic, functional or logic programming language is

(i) to declare a new function or relation which has certain properties in terms of other functions and relations;

(ii) to evaluate the function or relation using deduction in a formal calculus.

The language is called declarative because functions and relations are declared by properties that specify them “axiomatically”. Specifically, there is no abstraction of a machine, with commands transforming states, involved in programming.

The most important method of definition is to write formulae that a function must satisfy, such as algebraic equations, recursion equations, or Horn clauses. The formulae constitute a program that defines the function, and its implementation solves for the unknown function or relation in the equations or formulae. Algebraic languages (such as OBJ and Maude) are based on deductions that rewrite terms using algebraic substitutions and equations. Functional languages (such as SML and Haskell) are based on deductions in special formal calculi for reducing functional expressions, such as various lambda calculi. Logic languages (such as Prolog) are based on deductions in formal predicate calculi using proof rules, such as Horn clause logic.

In practice, declarative languages can acquire some familiar imperative constructs such as the assignment and the while loop. They are often considered to be imperfections (as in SML) or advanced constructs (as in Haskell).

One sample gcd in each of the three types of language will be sufficient.


Haskell

The gcd function is predefined in Haskell. However, we can re-code it as follows.

igcd :: Integer -> Integer -> Integer
igcd 0 0 = error "igcd 0 0 is undefined"
igcd x y = gcd (abs x) (abs y)
  where gcd x 0 = x
        gcd x y = gcd y (rem x y)

Maude

In the algebraic language Maude we can write a functional module.

fmod GCD is
  protecting INT .
  op GCD : Int Int -> Int .
  vars x y : Int .
  eq GCD(x, 0) = x .
  ceq GCD(x, y) = GCD(y, x rem y) if y =/= 0 .
endfm

Prolog

igcd(X, 0, X).
igcd(X, Y, Z) :-
    Y > 0,
    R is X mod Y,
    igcd(Y, R, Z).

1.5 Additional Constructs of Interest

There are many omissions from the rich collection of constructs used in everyday imperative programming. Hence, there are many possible additional constructs of interest, each giving rise to a language that is an enhancement of WP. Let us consider some missing constructs, only some of which we will meet later. We will not stop to explain what these missing constructs do, for, in most cases, it will be “obvious” until the moment comes to explain them properly. We simply want to review the scope of our language.

1.5.1 Data

The while language has the Booleans as a basic data type and allows us to declare any data type. This means that all kinds of special data types can be added to extend the language.


First, in addition to the Booleans, data such as naturals, integers, reals, characters, etc. can be added as basic types.

Second, there are more specialised data types modelling

time and space.

Time can be made explicit and constructs that schedule data according to a clock can be added. Discrete time is defined by a clock that simply counts clock cycles, for example,

• clocks:

T = 0 , 1 , 2 , . . . , t , . . .

A sequence of data scheduled in time by a clock is called a

• stream:

a(0), a(1), a(2), . . . , a(t), . . .

Similarly, space defined by a

• coordinate system

S ⊆ R3

can be specified, and data from a set A distributed in space by a function φ : S → A is called a

• field:

for x ∈ S , φ(x ).

Both time and space are very abstract notions and, hence, have very many applications. In Chapter 6 we will study the operations and tests that will make them important data types.
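A minimal sketch (our own, with hypothetical names) of these two notions: a stream is a function from clock cycles T = 0, 1, 2, . . . to data, and a field is a function from points of a space S to data.

```c
/* Stream: data scheduled in time by a clock; a(t) is the datum at cycle t.
   Here, the stream 0, 1, 4, 9, ... of squares. */
int a(int t)
{
    return t * t;
}

/* Field: data distributed in space; phi(p) is the datum at point p.
   A 3D integer grid stands in for a coordinate system S within R^3. */
struct point { int x, y, z; };

/* A temperature-like field: squared distance from the origin. */
int phi(struct point p)
{
    return p.x * p.x + p.y * p.y + p.z * p.z;
}
```

Both are just functions; what makes them data types are the operations and tests on them studied later.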

Third, structures for

storing data

can be added, such as records, stacks, lists, queues, trees, files and, for example,

• arrays:

a[i] := e

Notice that data structures are created to store and organise data in ways that are more abstract and suited to algorithms than those of the program store.

Scheduling data in time, distributing data in space, and storing data are examples of

constructing new data types from given data types

For example, we make new data types that schedule real numbers, distribute colours, and store strings.

The while language will be easily extendable by new data types since we can simply declare them.


1.5.2 Importing Data Types

The while language over arbitrary user defined data types allows us to study data independently from the store and algorithms. In fact, the constructs used for declaring the data types in while

programs constitute an independent

interface definition language,

or IDL, for data types. This IDL can be extended in a number of ways.

Now, we have just pointed out the need for constructs that make new data from old, such as

making streams and fields over given data. Commonly, the new data types “contain” the old. Thus, there is a need for constructs that enable a new data type to import a given data type. To the declaration of data type interfaces we may add the construct

import Name1 , . . . ,Namen

Other ways of expressing import use terms such as

extends, expands, or inherits.

We will add such a construct to the while language later.

1.5.3 Input and Output

The while language does not possess input and output statements such as

read and write.

Such constructs can be added, though not without some clear thinking about what is to be modelled: too much attention to low-level ideas about systems can lead to complications. One approach is to see input and output statements as accessing data from streams at particular times. Thus the data type of streams, mentioned earlier in Section 1.5.1, can be used to introduce input and output into the language.

1.5.4 Atomic Statements

The computation of basic operations and the storage of the results can be performed simultaneously by the

• Concurrent assignment:

x1 ,x2 , . . . ,xn :=e1 ,e2 , . . . ,en

which evaluates the expressions and assigns them to the variables in parallel. For example, values can be swapped by

x1 ,x2 :=x2 ,x1 .
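The intended semantics — evaluate all the right-hand sides in the old state first, then update all the variables — can be sketched for the swap as follows (our illustration, not the book's):

```c
/* x1, x2 := x2, x1: both expressions are evaluated before either
   assignment takes effect, so temporaries capture the old state.
   No third "visible" program variable is needed. */
void concurrent_assign_swap(int *x1, int *x2)
{
    int t1 = *x2;   /* value of the first right-hand side, e1 = x2 */
    int t2 = *x1;   /* value of the second right-hand side, e2 = x1 */
    *x1 = t1;       /* now perform both assignments */
    *x2 = t2;
}
```

Evaluating first and assigning afterwards is what distinguishes the concurrent assignment from the sequential x1 := x2; x2 := x1, which would lose the original value of x1.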

Rather like skip, special statements can be added that stop or terminate a computation:

• halt

or force an endless continuation or the non-termination of a computation:

• diverge


1.5.5 Control Constructs

Other forms of testing and control can be added. For example, given a program S and test b, there are:

• Bounded iteration:

do e times S od

for i :=1 to e do S od

for i :=e downto j do S od

where e is an expression of type natural number that defines the number of repetitions of S that are performed.

• Unbounded iteration: There are various forms in addition to the while statement, such as

repeat S until b end

unless b do S od

• Infinite iteration:

repeat S forever

• Conditional: Multiple conditionals are useful, such as

case or switch statements

A case statement may have the form

case
  b1 then S1;
  ...
  bn then Sn
end

which executes the first statement Si whose Boolean test bi is true, and otherwise does nothing.

• Branching and jumps: Unrestricted jumps are obtained by adding labels to commands and the construct

goto label.

A more limited form of jump allows labels and the construct

exit label

inside the while (or other iterative) constructs.


1.5.6 Exception and Error handling

Interrupts can be added in different ways as an:

• Atomic statement:

error name

• Control construct:

exception when b do S od

1.5.7 Non-determinism

Languages made by adding the above constructs are all deterministic. Each construct determines one and only one operation or sequence of operations on the store. Hopefully, most of them are instantly recognisable.

Now, we can consider constructs that introduce non-determinism into computations. Perhaps these are new or less obvious to the reader. They each have valuable and essential roles in program design.

Non-deterministic Atomic Statements

There are constructs that choose data from specific sets such as natural, integer or real numbers, or from any set:

• Non-deterministic assignments choose elements without constraints:

x:=?

or choose elements that satisfy a basic test b or programmed property P :

x := choose z : b(z)
x := some y such that P(y)

Familiar constructs, such as

x:=random(0,1)

which chooses a real number between 0 and 1, have this form.
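One concrete realisation of x := random(0,1) is the standard-library pseudo-random generator scaled to the unit interval (a sketch of ours; a genuine implementation of x := ? or x := choose z : b(z) would need a search or an oracle, so this covers only the familiar special case):

```c
#include <stdlib.h>  /* rand, RAND_MAX */

/* x := random(0,1): a pseudo-random real in [0, 1].
   rand() returns an integer in [0, RAND_MAX]; dividing by RAND_MAX
   scales it into the unit interval. */
double random01(void)
{
    return rand() / (double) RAND_MAX;
}
```

Each call may yield a different value, which is exactly the non-deterministic flavour of the construct: the program does not determine a unique next state.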

Non-deterministic Control Constructs

There are constructs that make a choice in passing control:

• Non-deterministic selections

choose S1 or S2 ro

which choose between two statements S1 and S2 .

• Conditional guarded commands


if b1 → S1 □ b2 → S2 □ · · · □ bn → Sn fi

which select one among a number of statements Si whose guards bi are true, and otherwise do nothing.

• Iterative guarded commands

do b1 → S1 □ b2 → S2 □ · · · □ bn → Sn od

which select on each iteration one among a number of statements whose guards are true; when no guards are true, the iteration halts. If no guards are true initially, then the construct does nothing.
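A well-known illustration of the iterative guarded command is a subtraction-based gcd program: do x > y → x := x − y □ y > x → y := y − x od. Its execution rule — on each iteration pick some true guard, halt when none is true — can be sketched as follows (our code; rand mimics the arbitrary choice):

```c
#include <stdlib.h>  /* rand */

/* do  x > y -> x := x - y  []  y > x -> y := y - x  od
   for positive x and y. On each pass, collect the indices of the true
   guards, pick one arbitrarily, run its statement; halt when no guard
   is true (here, when x == y, which is then the gcd). */
int gcd_guarded(int x, int y)
{
    for (;;) {
        int enabled[2], n = 0;
        if (x > y) enabled[n++] = 0;   /* guard b1 */
        if (y > x) enabled[n++] = 1;   /* guard b2 */
        if (n == 0)
            return x;                  /* no guard true: loop halts */
        switch (enabled[rand() % n]) {
        case 0: x = x - y; break;      /* S1 */
        case 1: y = y - x; break;      /* S2 */
        }
    }
}
```

In this particular program at most one guard is true at a time, but the selection code shows the general rule for overlapping guards.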

1.5.8 Procedures, Functions, Modules and Program Structure

Programs can be built using many kinds of “subprogram”. The early programmers soon invented the idea of the

subroutine, procedure and function procedure

to organise their codes. A procedure is a piece of program that performs a particular task. It is intended to be independent and can be reused, since the procedure can be called to perform its task in one or more places in the program, or placed in a library for possible use in other programs. Along with procedures comes a more abstract view of program structure, for the program is organised around independent tasks; here is the beginning of the idea of program architecture. One might think of procedures as abstractions of program statements, function procedures as abstractions of terms, and modules as abstractions of bindings.

To compute on the data type, abstractions from the state and operation of a program can be added to the language using:

• Function procedures such as:

function y = F(x) is S end

or

func in a out b begin S end

function F(a:data): data begin S end

Abstractions from the control and operation of a program can be added:

• Recursively defined procedures and functions.

Procedures mark the beginning of a huge increase in the complication of a language, for with procedures come many options and choices for their operation.

For example, there are several ways of initialising the parameters of a procedure in a program, including:

call by name, call by value, call by reference.
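Two of these can be sketched directly in C (our illustration): call by value hands the procedure a copy of the argument, call by reference hands it the variable's location. (Call by name, Algol 60's re-evaluated textual substitution, has no direct C counterpart.)

```c
/* Call by value: x is a copy; the caller's variable is untouched. */
void inc_by_value(int x)
{
    x = x + 1;      /* changes only the local copy */
}

/* Call by reference: x is the location of the caller's variable,
   so the update is visible outside the procedure. */
void inc_by_ref(int *x)
{
    *x = *x + 1;
}
```

After inc_by_value(n) the variable n is unchanged; after inc_by_ref(&n) it has been incremented.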


For example, may procedures use other procedures? And, if so, can procedures call themselves? In fact, recursive procedures are now seen as an excellent programming device, but it was not always so, for their practical implementation and theoretical understanding were once a headache: recursion was inefficient and theoretically complex.

With procedures and ideas about organising programs come options and choices for the composition of texts. For example, there is the idea of blocks of text in which variable declarations can be made that are relevant only to that block. With this simple syntactic device come options and choices that concern binding and scope rules for local and global variables.

The usefulness of increasing the independence of parts of a program is a powerful theme in programming. It leads to a better understanding of the structure of the program through modularisation, and to the benefits of reuse.

The principles of information hiding in developing a program are intended to do just that. In programming languages, a good place to apply information hiding is to data.

Data types can be made independent using programming ideas such as the module, which is a distinct unit of program that might be separately compiled. This use of modules was accompanied by the development of the theory of abstract data types.

Out of this search for greater independence in a language come the ideas of the class and object.

1.6 Conclusion

We have introduced the theoretical investigations of programming languages to come by asking questions and inspecting some raw material in need of mathematical analysis.

1.6.1 Kernel and Extensions

The while language over arbitrary user defined data types captures the basic ideas of imperative programming and, thus, we can use the while language as a kernel language from which more elaborate languages can be made, as shown in Figure 1.2.

Definition A kernel language is a language that is small, self-contained and well understood, and can be extended by adding new features that are either constructs added for convenience, which do not increase the expressive power of the language, or constructs added in order to increase the power of the language.

The while language WP is certainly small and self-contained. In the course of this book, we will show that it is very well understood. By increasing the “expressive power” of the language, we mean adding constructs that allow us to compute more functions on data.

For example, it is convenient to add case statements and repeat statements to the while language. However, it is not necessary, because these can be programmed by if-then-else and while statements. No new functions are computed in the extended languages.
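To make the claim concrete, here is a sketch (our own, shown in C) of how a repeat statement and a two-branch case statement can be simulated using only while and if-then-else:

```c
/* repeat S until b  is simulated by  S; while not b do S od.
   Here S is "n := n / 2" and b is "n <= 1". */
int repeat_halve(int n)
{
    n = n / 2;             /* execute S once, unconditionally */
    while (!(n <= 1))      /* then: while not b do S od */
        n = n / 2;
    return n;
}

/* case b1 then S1; b2 then S2 end  is simulated by nested
   if-then-else: the first true test fires, and an empty else
   branch covers the "otherwise do nothing" case (here a default). */
int sign(int n)
{
    if (n > 0)       return 1;    /* b1 then S1 */
    else if (n < 0)  return -1;   /* b2 then S2 */
    else             return 0;    /* no test true */
}
```

Both simulations use only core constructs, so no new functions become computable: that is the sense in which case and repeat are conveniences.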

It may be necessary to add arrays and non-deterministic constructs to the while language. It can be proved that, for certain data types, these constructs allow larger classes of functions to be computed. In particular, the addition of arrays is necessary to obtain a while language that is universal over any data type.


[Figure 1.2: The while language as a kernel language. The kernel, the while language over user defined data types, is extended to richer languages by new constructs for data, storage, control flow, abstraction, and modularisation.]

Definition A language L is universal if it contains a program that takes as input any program P of its language and any data x of the appropriate type, and evaluates P(x).

The idea of a kernel language and its extensions is theoretically clear and attractive. But, like many aspects of programming languages, in practice it can be tricky to pin down in particular cases.

At the heart of our analysis of the while language WP and its extensions is its formal definition. The answers to the questions of Section 1.2, when WP is substituted for the arbitrary language L, depend on the answers to the questions:

Data How do we define exactly the data types of WP, or of an extension?

Syntax How do we define exactly the legal program texts of WP, or of an extension?

Semantics How do we define exactly the computations by the programs in WP, or of an extension?

As we gain a precise understanding of data, syntax and semantics, we can proceed to formulate the questions exactly and answer them.

The main objectives are to develop the elements of

• a theory of data that is a foundation for computation;

• a theory of syntax that allows the specification and processing of languages; and

• a theory of semantics that defines the meaning and operation of constructs and explains how a program produces computations.

These theories are the foundations for understanding extensions to the language and for new objectives, such as creating a theory of program specification and a theory of compilation.


Exercises for Chapter 1

1. Give three Big Ideas in Physics, Chemistry, Biology, Psychology, Economics, History and Literature.

2. What is the point of making mathematical models of systems in Physics, Biology and Economics?

3. Attempt to write short answers to the questions posed in Section 1.2 before reading further chapters.

4. Write out in English the meaning of the following constructs:

(a) assignment x :=y;

(b) assignment x :=y + z;

(c) assignment x :=√y;

(d) concurrent assignments x1, . . . , xn := e1, . . . , em;

(e) sequencing S1 ;S2 ;

(f ) conditional if b then S1 else S2 fi;

(g) iteration while b do S0 od; and

(h) infinite iteration repeat S0 forever;

where x1, . . . , xn are variables, e1, . . . , em are expressions, n need not equal m, b is a Boolean test and S0 , S1 , S2 are programs. Try to make your description exact and complete. Note any possible curious variations.

5. Using the core constructs of the while language WP (as explained and exemplified in Section 1.3), write programs for the following:

(a) sorting a list of integers using the bubble-sort algorithm;

(b) listing prime numbers using the sieve algorithm of Eratosthenes;

(c) finding an approximation to the square root of a real number; and

(d) drawing a straight line on the integer grid using Bresenham’s algorithm.

Take care to define the interfaces to the underlying data types.

6. Rewrite the while program for gcd as procedures in Modula and Ada.

7. Rewrite the while program for gcd as procedures in Smalltalk, C++ and Eiffel.

8. Rewrite the while programs in Question 5 as Pascal, C and Java programs.

9. Show how to express the following constructs in terms of the core constructs of the while

language WP :

(a) the repeat statement;

(b) the repeat-forever statement;


(c) the for statement; and

(d) the case statement.

10. Consider the constructs of the while language. Give an argument to prove that adding the iteration construct to the assignment, sequencing and conditional constructs increases the computational power of the language. That is, one can compute more with the while statement than without it.

11. Show that we do not need the conditional construct if we have the constructs of assignment, sequencing and iteration. That is, the if-then-else construct can be implemented using the other constructs.

12. Use a non-deterministic assignment to construct a program that, given a real number x and a rational number ε, finds a rational number within a distance ε from x.

13. Discuss the advantages and disadvantages of not allowing labels and goto statements in the while language. You may wish to read Knuth [1974].

14. Investigate the use of guarded commands in programming methodology by reading Dijkstra [1976].

15. Consider swapping the values of two variables x and y. For any set A of data there is a standard procedure based on using a temporary variable t:

procedure swap (x, y : A);
variable t : A;
begin
  t := x;
  x := y;
  y := t
end

In the case of a particular set of data, such as the integers A = Z, the operations allow other options:

procedure swap int (x, y : Z);
begin
  x := x − y;
  y := x + y;
  x := y − x
end

Show that if the variables are distinct, i.e., x ≢ y, then swap int does indeed swap x and y. What happens if we call

swap int(x, x)?

16. Make a list of 100 programming languages, complete with their:

(a) full names and acronyms;


(b) designers’ names and locations;

(c) release dates; and

(d) purpose.


Chapter 2

Origins of Programming Languages

“There is nothing quite so practical as a good theory.”

Albert Einstein.

We will continue our introduction to the theories of data, syntax and semantics by looking at the historical development of our present understanding of programming. We will give an impression of how practical problems in programming require conceptual insights and mathematical theories for their solution. Again and again in history, we see mathematical theories solve difficult and controversial research problems and transform solutions into standard practices.

We will focus on the ideas, technical themes and problems which have motivated and shaped the theories in this book, namely:

• Machine architectures and high-level programming constructs.

• Machine independence of programming languages.

• Unambiguous and complete definitions of programming languages.

• Translation and compilation of programs.

• User defined data types and their implementation.

• Specification and correctness of programs.

• Independence of program components.

The theories of data, syntax and semantics make their entrance and establish themselves in the first four decades of computer programming, i.e., 1945–1985. Each theory was started by applying concepts and results from mathematical logic and algebra to practical problems in programming. The mathematical models gave understanding which led to solutions. Once created, each theory was soon independent of its pure mathematical origin and became a part of the foundations of Computer Science. Yet Pure Mathematics remains a vast treasure house of ideas and methods for the solution of technological problems in Computer Science.

We begin with the birth of imperative programming in the work of Charles Babbage on the Analytical Engine. Then we jump to some twentieth century problems, research and developments, associated with the above themes. These will be grouped into decades, the first starting in 1945, the era of the first electronic stored-program universal computers.

2.1 Historical Pitfalls

Right from the start, one must beware that the historical analysis of technical subjects is beset by three temptations:

Temptations

1. The wish to reconstruct the past as a clearly recognisable path leading to the present.

2. The wish to find and create heroic figures whose achievements more or less define the story.

3. The wish to separate the work from the life and times of the society in which it took place.

The three temptations are well known to all historians. The first is called teleological history. The second and third are particular curses of writers on the history of technical subjects (like physics, mathematics and computing).

All three are temptations to be selective, to be blind to the complexity and messiness of scientific work, to the slowness of scientific understanding, to the immense amount of work, by many people, that is necessary to produce a high quality and useful subject in science and engineering. To simplify in this way can lead to a history that is so misleading as to be almost worthless.

A distorted history can and will distort our picture of the present and future.

The early development of computers and programming languages illustrates a common state of affairs in scientific research: there are many objectives, problems, methods, projects, and people playing independent and significant roles over long periods of time. Gradually, some convergence and standardisation takes place, as problems, ideas and people are celebrated or forgotten. Histories of computers and programming languages must reflect the scale and diversity of these activities, in order to be honest and useful to contemporary readers who may be caught up in similarly messy intellectual circumstances.

Of course the problem with history, as with software and much else, is that it is difficult to get big things “right”! In the sketch that follows, we can only hope to awaken curiosity and to give a health warning about the simplifications we are forced to make.

2.2 The Analytical Engine and its Programming

Our contemporary ideas about programs and programming can clarify and organise our attempts to appreciate and reconstruct the ideas of Charles Babbage on programming. Equally, they may obscure or mislead us in reinterpreting and extending his achievements (compare Temptations 1 and 2 above!). There is a small number of draft programs and notes, in contrast with the large amount of information on the machine. There are no complete programs and there is no detailed specification of all the instructions. There are the descriptions of the machine, the paper by L Menabrea, and the important set of notes by Ada Lovelace of 1843.

The evolution of the Analytical Engine began in 1833 when work on the Difference Engine stopped. The first notes of 1834 are workable designs. In late 1837 there is a major revision which was the basis for the development until 1849, when Babbage ceased work until starting up again in 1856.

A clear and concise specification of an early form of the Analytical Engine is found in the letter Babbage wrote to the Belgian statistician Adolphe Quetelet, dated 27 April 1835:

It is intended to be capable of having 100 variables (or numbers susceptible of change) placed upon it; each number to consist of 25 figures.

The engine will then perform any of the following operations or any combination of them: v1, v2, . . . , v100 being any number, it will

add vi to vk;
subtract vi from vk;
multiply vi by vk;
divide vi by vk;
extract the square root of vk;
reduce vk to zero.

Hence if f (v1, v2, . . . , vn), n < 100, be any given function which can be formed by addition, multiplication, subtraction, division, or extraction of square root, the engine will calculate the numerical value of f. It will then substitute this value instead of v1 or any other variable and calculate the second function with respect to v1.

It appears that it can tabulate almost any equation of finite differences.

Also suppose you had by observation a thousand values of a, b, c, d, and wish to calculate by the formula

p = √((a + b)/cd).

The engine would be set to calculate the formula. It would then have the first set of the values of a, b, c, d put in. It would calculate and print them, then reduce them to zero and ring a bell to inform you that the next set of constants must be put into the engine.

Whenever a relation exists between any number of coefficients of a series (provided it can be expressed by addition, subtraction, multiplication, division, powers, or extraction of roots) the engine will successively calculate and print those terms and it may then be set to find the value of the series for any values of the variable.

Extract of a letter from C Babbage to A Quetelet, 27 April 1835.

Works of Babbage. Volume III. Analytical Engine and Mechanical Notation.

Ed. M Campbell-Kelly, W Pickering, London, 1989. pp. 12-13.

The idea of the machine contained in this letter can be explored by making a simple mathematical model based on a few features. In particular, one might first guess that:

(i) the machine simply computes arithmetic expressions based on the five operators and 100 variables mentioned;

(ii) the expression is changed;

(iii) each expression is repeatedly evaluated on a finite sequence or stream of values for the variables; and

(iv) loops do not occur.

What does the remark about tabulating all finite difference equations imply? The specification is too loose and vague for us to answer semantic questions with confidence.
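The machine suggested by guesses (i)–(iv) can be sketched as a program that evaluates one fixed arithmetic expression once per record in a finite stream of inputs. The following is our own illustration in Python (the name run_engine and the use of Babbage's formula p = √((a + b)/cd) as the fixed expression are our assumptions, not part of his specification):

```python
import math

def run_engine(expression, stream):
    """Evaluate a fixed expression on each record of a finite stream of
    values, as guesses (i)-(iv) suggest: no loops, just repeated evaluation."""
    return [expression(record) for record in stream]

# The formula p = sqrt((a + b)/cd) from the letter, as the fixed expression.
p = lambda r: math.sqrt((r["a"] + r["b"]) / (r["c"] * r["d"]))

observations = [
    {"a": 2.0, "b": 2.0, "c": 1.0, "d": 4.0},
    {"a": 7.0, "b": 2.0, "c": 3.0, "d": 3.0},
]
print(run_engine(p, observations))   # [1.0, 1.0]
```

Changing the machine's behaviour under this model amounts only to replacing the expression, which matches guess (ii); tabulating finite difference equations would need something more, which is exactly where the specification runs out.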

We will develop a formal model based on these four assumptions in due course.

This description of 1835 predates the idea of using Jacquard cards (which has been dated to the night of 30 June 1836). The idea of the Jacquards, or punched cards, is fundamental. It enriches the conception of the machine and its use, making explicit the breaking down and sequencing of its actions. It is an essential feature of mechanisation to break down processes into actions and operations. A number of types of cards were envisaged, notably variable and operation cards. In a description of the engine in 1837, the operation cards were to be used for addition, subtraction, multiplication and division.

In the paper by L Menabrea, there is an emphasis on (what we may call) the algorithmic aspects of the Analytical Engine, and this is further developed in the Lovelace notes. A few “programs” are discussed and, to add to our stock of raw materials for our theories, we will look at the “program” for solving a pair of simultaneous equations.

Consider the pair

mx + ny = d

m′x + n′y = d′

of equations, where x and y are the unknowns. By eliminating the variable y in the usual way, we find

x = (dn′ − d′n)/(n′m − nm′)

and, similarly eliminating x, we find

y = (d′m − dm′)/(mn′ − m′n).

The problem is to program the machine to calculate x and y according to these formulae. The letters

V0, V1, V2, V3, V4 and V5

denote the columns reserved for the coefficients and have values

m, n, d, m′, n′ and d′

respectively. A series of Jacquard cards provides the series of arithmetic operations that evaluate the expressions on the machine. They involve a number of working variables V6, . . . , V14, and two output variables V15 and V16. The series of operations and their effects on all the columns of the machine is presented in a table given as Figure 2.1.
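The computation the card sequence carries out can be sketched in modern code. The following Python fragment is our own illustration of the elimination formulae for x and y, not a reconstruction of the actual card program; the name solve_pair is ours, and exact rational arithmetic stands in for the engine's 25-figure columns:

```python
from fractions import Fraction

def solve_pair(m, n, d, m2, n2, d2):
    """Solve mx + ny = d and m2*x + n2*y = d2 by the elimination formulae:
    x = (d*n2 - d2*n)/(m*n2 - m2*n), y = (d2*m - d*m2)/(m*n2 - m2*n)."""
    det = m * n2 - m2 * n              # assumed non-zero
    x = Fraction(d * n2 - d2 * n, det)
    y = Fraction(d2 * m - d * m2, det)
    return x, y

# 2x + 3y = 8 and x + 2y = 5 have the solution x = 1, y = 2.
print(solve_pair(2, 3, 8, 1, 2, 5))   # (Fraction(1, 1), Fraction(2, 1))
```

The engine's card program computes the same two quotients, but spelled out as a fixed sequence of single operations on the working columns V6, . . . , V14.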

Figure 2.1: A program trace for solving simultaneous equations on the Analytical Engine.

We leave the reader to ponder on the calculation and the underlying program from this trace. Notice that when the value of a variable column is no longer required, then it is re-initialised to the value 0.

From time to time we will look at the treatment of data, program control, and the methodology of programming the Analytical Engine. Our aim in looking at old ideas is not to write their history. We will pick on historical concepts and methods to suggest examples to illuminate and enrich contemporary ideas.

2.3 Development of Universal Machines

The current conception of programming is based on the ideas of a stored-program computer and, in particular, a universal computer. The concept and basic theory of the universal computer was developed in mathematical logic, starting with A Turing's work in 1936. However, most of the early projects on computing machines did not use, and were not aware of, this theory until J von Neumann entered the area.

The development of computers is an interesting subject to study for many obvious reasons. An unobvious reason is that the subject is so complicated. There are many contributions (concerning electronics, mathematics, computer architecture, programming and applications) to examine and classify. There are many historical problems in evidence (recall Section 2.1).

To complement our remarks on the Analytical Engine, we will present a very simplified picture of the development of universal machines based on a classification of their programming capabilities; it is summarised in Figure 2.2. We hope it is a useful reference for a proper study of the origins of programming.

[Figure 2.2 is a diagram classifying early machines by how their programs were sequenced and stored: manual versus automatic program sequencing; external programs (G Stibitz's Model I, completed 1940; K Zuse's Z3, completed 1941; H Aiken's Mark I, completed 1944); plugged programs (J P Eckert and J W Mauchly's ENIAC, completed 1946, and its 1948 modification to fixed program storage following Clippinger); and internal, stored programs with variable program storage (J P Eckert and J W Mauchly; J von Neumann and H Goldstine; F C Williams and T Kilburn, completed 1949; M V Wilkes, completed 1949).]

Figure 2.2: Some stages in the development of electronic universal machines.

2.4 Early Programming in the Decade 1945-1954

• Stored program computers.

• Logical specification of computer architectures.

• Algorithmic and programming languages.

• Automatic programming seen as compilation or translation of programs.

• Problem of programming cost recognised.

• Problem of program correctness recognised.

Programming languages and programs emerged with computers. During the Second World War (1939–1945) electro-mechanical and electronic digital computing machines were produced for special applications; for example:

Name                    Responsible                                    Demonstration Year
Model I                 G Stibitz                                      1940
Z3                      K Zuse                                         1941
Differential Analyser   J Atanasoff & C Berry                          1942
ASCC (Harvard Mark I)   H Aiken, C D Lake, F E Hamilton & B M Durfee   1943
COLOSSUS                T H Flowers & M H A Newman                     1943
ENIAC                   J Mauchly & J P Eckert                         1945

Zuse’s Z3 is the earliest electro-mechanical general purpose program controlled computer.The ENIAC (Electronic Numerical Integrator and Computer) is well known as an early elec-tronic computer that was close to being general purpose. It was programmed by setting switchesand plugging in cables. Later it was redeveloped with stored program facilities (the EDVACcompleted in 1952), and led to a commercially available machine the UNIVAC.

By 1949 some electronic universal stored program machines were operational; for example:

Name                          Responsible                     Demonstration Year
Prototype Manchester Mark I   F C Williams & T Kilburn        1948
EDSAC                         M Wilkes                        1949
IAS Computer                  J von Neumann & H Goldstine     1951

The Manchester Mark I was the first electronic stored program machine to operate, on 21 June 1948; it was developed into the first commercially available machine, the Ferranti Mark I, and sold to 11 customers. The EDSAC (Electronic Delay Storage Automatic Calculator) was successful in practical terms and played a significant role in the development of programming. The IAS Computer was developed 1946-52 and was influential from its inception. The designs of these and other machines were copied and refined by many groups and companies within the decade.

Like Babbage’s creations, these early electrical machines are great achievements. For ourpurposes, that of reflecting on the origins of programming, the early machines can be classifiedby the physical properties of their programs. For example, the means for program sequencingand the storage of programs lead us to Figure 2.2.

The design of such machines combined engineering and mathematical ideas and, in the case of the earliest machines, they were programmed by the people who built them. To establish the independent status of programming, a logical view of a machine is essential, one that abstracts from the machine's physical implementation. Isolating the logical view of a machine and reflecting on programming structures was the work of scientists trained in mathematical logic.

Mathematical logic has played a completely fundamental role in computer science from its inception, especially in the field of programming languages. The most important contribution was Alan Turing's concept of a mathematical machine and the theorems about its scope and limits that he proved in 1936. The theorem on the existence of a universal Turing machine able to simulate all other Turing machines is the discovery of the universal stored program computer. Turing's discoveries led to the theory of computable functions which played a decisive role in the development of mathematical logic in the previous decade 1935-1944. Historically, the theory of computable functions is the start of the theory of programming languages.

As part of the ENIAC project, the logical idea of machine architecture was made explicit in John von Neumann's First draft of a report on the EDVAC of 1945. The independence of the architecture from technology was emphasised by describing components not in terms of current technologies but using logical components taken from McCulloch and Pitts's pioneering work on logical models of the nervous system of 1943; their theory of neural networks was inspired by computability theory. A full account of the basic features of a computer architecture appears in A Burks, H Goldstine and John von Neumann's Preliminary discussion of the logical design of an electronic computing instrument. The report was written in 1946, at the start of the project to build the IAS Computer, and was widely circulated. The report helped promote the term von Neumann architecture, though many of the ideas were known to others.

Programming these machines was difficult, partly because of the deficiencies of the hardware, and partly because of the codes used to communicate with the machine. To develop computing, the general problem of programming must be tackled:

General Question How can programming be made less time consuming and error prone?

Plus ça change, plus c'est la même chose!

However, in the decade 1945-55 this timeless question led to some timeless theoretical problems and first steps to their solution.

Problem 1 How can the data and logical control of a program be modelled and understood?

In 1945-46, Konrad Zuse invented a general language for expressing algorithms called the Plankalkül. It was not intended to be machine implementable but to provide a framework for preparing programs. Appropriately, it is referred to as an algorithmic language rather than a programming language. It was based on the data type of booleans and had some type forming operations. Sample programs included numerical calculations and testing correctness of expressions. See Bauer and Wossner [1972] for a summary.

Herman Goldstine and John von Neumann wrote a series of papers called Planning and coding problems for an electronic computing instrument in 1947-48 (see Goldstine and von Neumann [1947]). As can be seen from their classification of the programming process into six stages, their view was that programs were needed to solve mainly applied mathematical problems. Among their ideas, they used flow diagrams as a pictorial way of representing the flow of control in a program containing jump instructions. These reports were essentially the only papers on programming widely available until the first textbook on programming, namely Wilkes et al. [1951], based on the EDSAC.

In addition to understanding the logic of programs, there was the following:

Problem 2 How can we make programming notations that are more expressive of their applications, easier to understand and to translate into machine codes?

There was a need for high level notations and tools to translate higher level notations into machine codes. Hardware was limited and much needed to be programmed. Since programming meant writing programs in terms of primitive codes for specific machines, this led to the idea of automatic programming. Early on, some recognised that the inadequacies of programming notations and tools for programming were the main barriers to the development of computing. Linked to the ideas of programming methodologies and languages was the following:

Problem 3 How can we reason about the behaviour of programs?

Problems of reasoning about programs were noted in the Planning and coding reports of Goldstine and von Neumann [1947]. At the inaugural conference of the EDSAC in 1949, Alan Turing lectured on checking a large routine (see Morris and Jones [1984]).

2.5 The Development of High Level Programming Languages

• Independent levels of computational abstraction.

• Compiler construction.

• Machine independence of programming languages.

• Formal definition of programming language syntax.

• Mathematical theory of grammars and parsing.

• Functional languages.

In the first decade of electronic computing, programming was the art of encoding algorithms into a notation system that reflected rather precisely the design of a particular machine. Programs were truly codes and specific to a machine. The intimate connection between a machine and its programs created the three problems we mentioned in the last section.

In the second decade, 1955–1964, programming was transformed by the development of some high level languages for programming. They had the following remarkable properties that enabled computer scientists to make great progress on Problems 1-3:

1. Languages were made from abstract notations that were closer to a contemporary user's algebraic formulae, or English language descriptions, than a machine's numerical codes;

2. Languages were designed systematically and possessed compilers to translate their high-level programs into machine code; and

3. Programs in the languages were intended to be independent of particular machines and were given publicly defined standards.

We will review how high level languages were established in the period. Our account is largely based on the following sources, which are recommended for further details. An excellent short account of the early technical development of programming languages is Knuth and Pardo [1977]. Accounts of early programming by John Backus, Andrei Ershov, Maurice Wilkes, Edsger Dijkstra, and F L Bauer can be found conveniently in Metropolis et al. [1980]. Full accounts of the well-known languages, written by some of their designers, are in Wexelblat [1981]. The early development of formal languages is described in Griebach [1981]. A story of the 1960s is told in Lohr [2002a].

2.5.1 Early Tools

First, let us emphasise that at the beginning of the decade, programming fundamentally meant encoding. The idea that high level notations — such as flowcharts — were necessary to express and think about algorithms was eminently sensible. It was expected that such notations would aid the process of encoding algorithms into programs by programmers. However, the idea that high level notations could be transformed into programs by the machine itself was much more complicated, radical and controversial. The idea of making such tools for constructing programs was called

automatic programming.

At the time, the word program meant a list of numerical codes for instructions, so the machine translation of an abstract description or specification of an algorithm into such a program was seen as automatic programming. The idea of automatic programming was known but considered to be practically implausible, and even unnecessary, by many programmers of the day. This should not be surprising if one remembers just what computers of the day demanded (e.g., complex numerical codes for programs) and provided, or rather did not provide (e.g., no index registers, no floating point arithmetic). See the recollections in Backus [1980], for example.

The early attempts at abstract notations that were translated via automatic programming were specific to machines and not seen as independent languages. In early research in automatic programming, the dominant problem was the translator:

Problem 4 Can an abstract notation be translated automatically into a program for a machine, and how efficient is the program produced?

Early research on automatic programming was varied and independent, since there was little communication between groups of programmers. An early effort was that of Grace Murray Hopper (1906-1992) who wrote a tool called A-0 in 1951–52 for the UNIVAC I. Hopper noted that, commonly, programmers would share and use subroutines, copying them into their own programs and introducing errors. The tool A-0 would, given a “call word”, take and use a routine from a library of routines. This she termed a compiler, as the tool built or compiled the program from components from a library. Of course, this usage differs from the use of the term compiler for a translator of programs from a source language to a target language.

The A-0 was further developed into new tools (e.g., the A-2) which attracted some attention. Indeed, Grace Hopper's contributions to automatic programming were immensely influential. In particular, between 1953–55 she developed a high level notation and compiler for data processing applications using English commands. This tool was called FLOWMATIC and was influential in the development of the business language COBOL, which was the primary language for commercial applications for decades.

During 1949-52, Heinz Rutishauser gave an analysis of how a simple algebraic language, with a for statement as its only non-sequential construct, could be compiled into a theoretical machine. Corrado Böhm described a compiler in his doctoral thesis of 1951. The first compiler to be implemented and used, however, seems to be the AUTOCODE system of Alick E Glennie, created for the Manchester Mark I and in operation in 1952. The notation was tied to the machine. J H Laning and N Zierler's system for the WHIRLWIND machine was a tool that translated algebraic style notations to codes and was in operation in 1953. The algebraic notation was machine independent but lacked loops and was hardly a language.

2.5.2 Languages

The motivation for the development of FORTRAN by John Backus was simple to state:

• to reduce the time and cost of writing programs.

The development period for FORTRAN was 1954–56 and it was seen as an automatic programming system intended for a new machine, the IBM 704, which had floating point in hardware. The language was algebraic in style, using variables and operations in the standard way. The algebraic notation reflected the mathematical formulae that were the basis of its intended users' scientific computations. The development team did not see it as a difficult task to design the language. The real task was to generate efficient code.

The advantages of high level languages include improving the productivity of programmers and making possible the machine independence of programs. However, in the period these aspects were not always appreciated. First, there could be severe penalties in efficiency and, hence, assembler languages were preferred for efficient implementations — a tradition that is kept alive today in some corners of the subject. Second, the high capital cost of machines and their infrastructure did not make portability of programs a priority. However, by 1958, when Backus surveyed twenty six of the 704 sites, it was clear that the FORTRAN automatic coding system for the IBM 704 was a convincing solution to Problem 4 and was making a great contribution to the development of programming.

Interesting recollections of the times in which FORTRAN was developed are given in Backus [1980] and Backus [1981]. Since the 1950s, the language has been overhauled several times to meet the needs of scientists and engineers and it has survived to this day. It has served science and engineering well, and been involved in triumphs and disasters. For example, already in the summer of 1962, a single incorrect character in a FORTRAN program caused the Mariner I spacecraft to misbehave four minutes into a journey to Venus. It had to be destroyed.

New versions of FORTRAN have lagged behind the latest thinking about programming languages. Without new programming ideas for new problems, the language has almost disappeared from view in computer science. However, it is important for computer science that FORTRAN continues. This historic language is so remarkable that one day someone must write its biography and others will have to rescue, conserve and preserve it.

The progress of automatic programming, and the success of FORTRAN, set the scene for new language developments with considerable ambition. Now, the problem in designing a programming language was this:

Problem 5 How to choose programming constructs, define precisely the set of legal programs of the language, and create a translator that produces efficient programs?

The development of the language ALGOL in the period 1957–60 is, without question, the most important to us. Its selection of constructs, elegant notation, and complete formal definition of all valid programs made it a great landmark in the history of programming languages. In its construction we see real progress being made in the solution to Problem 5.

The motivation for the design of ALGOL is interesting. By 1956, the first generation of programming languages was born. Programs were texts that were becoming more about algorithms and less about machines. The machine was beginning to sink in software. A revolution in programming had begun.

It was also becoming clear that language diversity was going to add to the complications of growing machine diversity. Languages like FORTRAN, which subsequently became one of the oldest international programming standards, were then, of course, a new commercial product of IBM, tied to their machines. The idea of standardising languages was attractive for several reasons, including productivity, communication and portability.

Initiatives had begun in Germany in 1955, under the auspices of the Gesellschaft für angewandte Mathematik und Mechanik, and in the USA in 1957, under the auspices of the Association for Computing Machinery, to seek a language with universal appeal to programmers. In 1958 the aims of these learned societies were combined officially and committees established which designed the language ALGOL.

ALGOL was intended as a machine independent high-level language for numerical computation. The name ALGOL denotes Algorithmic Language. It was designed in two stages, recorded in reports on the languages ALGOL 58 and ALGOL 60. The design process has been discussed in some detail by Alan Perlis and Peter Naur in Naur and Perlis [1981]. Their accounts are extremely interesting and worthy of study and reflection.

According to Backus [1959], the objectives of the ALGOL designers were to create a new language

(i) close to standard mathematical notation and readable with little explanation;

(ii) suitable for describing computing processes in publications; and

(iii) mechanically translatable into machine programs.

With such objectives the language had to be both abstract in relation to machines and assembly programs, and algebraic to provide notations well suited to its users.

The final design ALGOL 60 contained many concepts new to languages and programmers: declarations, identifiers and types; high level constructs that were compound statements that could be nested, such as while and if-then-else. It had dynamic arrays. The language provided new constructs and methods for structuring and organising programs, such as typed procedures, call by name and call by value, scope of declarations, blocks in nested commands, local and global variables, begin and end. It also supported recursive procedures. There were new techniques in its translation, such as the use of stacks and recursive descent parsing.

The subsequent story of the language is interesting, both as a technical achievement and as a standard. It met the three objectives of the designers. In the case of the second and third objectives, the language was immensely successful. However, in the case of the first objective, there is a limiting factor:

a computer language must be able to express computational issues that are new, have not been analysed, have no mathematical notation, and need lots of explanation.

This soon became known to the designers.

Some research groups saw the publication of ALGOL as a stimulus to produce new languages, and its influence has been immense. Reactions to the language have been described by Bemer [1969].

2.5.3 Syntax

Perhaps the most important contribution of ALGOL was the recognition of the need to define the language formally, and the invention of a method for doing so, at least in part. The method is to use a special notation for defining rules for forming syntax that can be augmented with conditions expressed in natural language. It is due to John Backus and is derived from the concept of

production system

introduced in Computability Theory by Emil Post in 1943; see Backus 1959 (Backus [1959]) and Post 1943 (Post [1943]). Production systems use rules to generate sets of strings. The notation came to be known as Backus Normal Form or Backus-Naur Form (BNF), after the proposer and Peter Naur who applied it in the report of the language. The methods had an immediate impact on the translation of programs for, in defining the structure of the language, they provided a basis for organising the structure of a compiler. As Donald Knuth observed in 1962, compilers early in the period were difficult to explain "since the various phases of translation were jumbled together into a huge sprawling algorithm" (Knuth 1962, Knuth [1962]).

The discovery of formal methods for the definition of the syntax of programming languages has had a profound and lasting effect on the understanding, design and implementation of all forms of syntax. The key practical and theoretical problems of processing syntax were solved through the use of mathematical models of syntax called

formal languages and grammars.

A formal language is simply a set of strings. A grammar is a set of rules for generating strings. Formal languages and grammars were proposed by Noam Chomsky in 1956 (Chomsky [1956]) to provide mathematical foundations for the analysis of natural languages. Like Backus's BNF, Chomsky's grammars were based on Emil Post's production systems.
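The idea that a grammar is a set of rewrite rules generating a set of strings can be sketched in a few lines of Python. The sketch below is our own illustration, not from the text: the rules S → aSb and S → ab generate the classic context free language of strings a…ab…b with equal numbers of a's and b's.

```python
# A tiny grammar in the style of Post's production systems: rewrite
# rules generate a set of strings, i.e. a formal language.
# Rules: S -> aSb | ab, which generate { a^n b^n : n >= 1 }.

def generate(max_depth):
    """Return all terminal strings derivable from S in at most max_depth steps."""
    language = set()
    sentential_forms = {"S"}
    for _ in range(max_depth):
        next_forms = set()
        for form in sentential_forms:
            if "S" in form:
                # apply each rule to the leftmost nonterminal
                next_forms.add(form.replace("S", "aSb", 1))
                next_forms.add(form.replace("S", "ab", 1))
            else:
                language.add(form)  # no nonterminals left: a sentence
        sentential_forms = next_forms
    return language

print(sorted(generate(5), key=len))  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Bounding the number of derivation steps is an artificial device to keep the enumeration finite; the grammar itself generates an infinite language.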

Linguists had begun to study formally the general properties of languages. For example, sentences were structured by breaking them down into their immediate constituents and into atomic linguistic entities called morphemes. Zellig Harris had attempted to construct sentences using equations as rules for substituting morphemes (Harris 1946, Harris [1946]).

In Chomsky 1959 (Chomsky [1959]), grammars were classified into four types, called type 0, 1, 2 and 3, by placing conditions on the form of the rules. Later, this scheme was called the Chomsky hierarchy; type 1 grammars were called context sensitive grammars, type 2 grammars were called context free grammars, and type 3 grammars were called regular grammars. Context free grammars were proved to be equivalent to BNF notations. A mathematical theory of formal languages had started that explained what sort of languages could or could not be defined by grammars, and how the complexity of recognising and parsing strings depended on the complexity of the rules of the grammar. In computability theory, in the 1940s, Post had already observed that the languages generated by production systems (type 0 grammars) were precisely the computably enumerable sets (Post 1943, Post [1943]). Post had also proved the undecidability of the correspondence problem, which soon became an important tool in the theory of formal languages (Post 1946, Post [1946]).

Another important discovery was that the formal languages defined by the simplest of Chomsky's grammars (type 3) were precisely the languages

(i) recognised by deterministic and non-deterministic finite state machines,

(ii) denoted by Kleene’s regular expressions, and

(iii) recognised by neural networks.


These equivalent ways of defining the languages of regular grammars opened up the subject both theoretically and practically. The regular expressions were invented by S C Kleene in 1951 to characterise the computational power of McCulloch-Pitts neural networks (Kleene 1956, Kleene [1956]). Today, these equivalence theorems, which are basic to the theory, appear to us as both elegant and elementary, but at the time they needed several people, working over several years, to refine the messy technical notions and construct the translations.

Initially, the theory belonged to, and was developed in, the discipline of general linguistics and, in particular, the new subject of the automatic machine translation of natural languages. Programming languages are far simpler and more regular than natural languages, and their machine translation is more urgent and vital. Thus, mathematical research on formal languages and grammars proved to be an attractive and rewarding enterprise for computer scientists, more so than for linguists.

After 1964, formal language theory became one of the core subjects in Computer Science. It has proved to be a rich subject with remarkable influence in Computer Science and applications outside (e.g., biosciences). Certainly, the theory of formal languages has transformed our understanding of the concept of a language and its structure. Today, grammars, including BNF and its extensions, are everyday tools for problems involving syntax. They turned the writing of a simple compiler from a controversial research problem into a routine student project. We will learn a great deal more about the ideas about syntax that emerged in this decade in the pages that follow.

2.5.4 Declarative languages

The decade 1955–64 was immensely more active than we can adequately describe here, for we are focusing on the historical background of the theoretical subjects we intend to explain. Yet, it must be mentioned that functional languages were created in the period. Although the functional languages are not discussed in this book, their influence is everywhere.

From the beginning, programs were written to manipulate data other than numbers for scientific computations. Code breaking led to COLOSSUS, the first electronic computer. On several early machines games were programmed and played. Research on using computers to automate reasoning gathered momentum in the decade.

Many involved in research on computing knew some mathematical logic. The subject was seen as an important source of ideas about practical computation. Boolean algebra, computability theory, propositional logic, and first order logic provided computer science with theories that were far in advance of practical computation.

Information was represented in a formal logical language and deduction made using the rules of a logic. In logic, notions of syntax and semantics were well understood, and the logical languages were abstract and high-level. Lists were a good representation of logical syntax. To represent logical languages in a program called Logic Theorist, A Newell, J C Shaw and H Simon had created a list processing language IPL that worked on a computer called the JOHNNIAC.

With similar research interests in reasoning, in 1956, John McCarthy considered the idea of an algebraic list processing language for the IBM 704. This was realised in the language LISP, which was designed and implemented in 1958–59. LISP stands for LISt Processing. LISP processes data that are expressions made from atomic expressions and lists. Rules transform expressions. In addition to the nature of the data, other primary features of the language are the use of conditional expressions, recursion and functions. The language was heavily influenced by the state of computability theory. Although it borrows notation for functions from the λ-calculus of Church, it is foremost a programming language intended for applications in the emerging new subject of Artificial Intelligence. In describing the language in McCarthy 1960 (McCarthy [1960]), the combination of a new theoretical investigation and practical language tool was presented. The value of the recursive functions for reasoning about programs was also much appreciated in its design.

Later McCarthy wrote elegantly and influentially about the problems of programming and the prospects for their theoretical investigation in McCarthy 1963 (McCarthy [1963]), an article worth returning to for students of the subject at all levels.

2.5.5 Applications

The decade saw two influential programming language innovations in business and education.

The business language COBOL (COmmon Business Oriented Language) was created in 1959–60, under the auspices of the US Department of Defense. The first formal meeting of people from government, users, consultants and manufacturers, in 1959, formulated the goal of a new common business language that would

(i) use simple English in programs

(ii) be easy to use, if less expressive or powerful

(iii) help to widen the range of people who could make programs, and

(iv) not be biased by contemporary compilation problems.

Committees were formed to create the language, an account of which is to be found in Sammet 1981 (Sammet [1981]). The language COBOL 60 possessed plenty of new or improved features. Most noteworthy are: its syntax, which is easy to read and easy to document (thanks to Hopper), and its separation of procedures and data from machine and system features. Of particular interest is its handling of data, possessing data descriptions that are machine independent and recursive data definitions (data fields within data fields).

COBOL met the needs of its users for a universal language for data processing. It was a competitor of the commercial business systems of the manufacturers which had influenced its design (e.g., the Univac FLOWMATIC) and so there was resistance to be overcome. However, the announcement that the US Government would buy or lease only computers with COBOL 60 compilers proved helpful to the language! COBOL became a dominant computer language, well standardised, and survives today.

Finally, let us mention that at the end of the period, 1963–64, Thomas Kurtz and John Kemeny developed BASIC (Beginner's All-purpose Symbolic Instruction Code) for the purpose of teaching programming.

In this exciting decade, we find many of our fundamental ideas about high level programming languages.


2.6 Specification and Correctness

The decade 1965–1974 saw:

• Formal definition of programming language semantics.

• Vienna definition language for operational semantics.

• Denotational semantics of programming languages.

• Axiomatic semantics of programming languages.

• Specification and verification of programs.

• Formal logics for program correctness.

• Automatic verification.

• Data structures developed.

• Automatic programming as interpretation and compilation of specification.

• Modularisation of programs.

• Object-oriented languages.

• Logic programming languages.

• Interfaces with pull down menus, buttons and mice.


2.7 Data Types and Modularity

The decade 1975–1984 saw:

• Abstract data types.

• Algebraic theory of data types.

• Abstraction and modularisation principles for the formal definition of large systems.

• Domain theory.

• Algebraic description of concurrency.

• Specification languages.

• Modal and temporal logics and model checking.

• Programming environments and software tools.

• Concurrent programming languages.


Exercises for Chapter 2

1. Describe the input-output behaviour of the Analytical Engine of 1835. List the programming constructs that are needed to formulate assumptions (i–iv).

2. Write a program for solving simultaneous equations, in a while language, that would generate a trace similar to that of Lovelace's program trace in Section 2.2.

3. Examine Lovelace’s notes and

(a) explain the role of the superscript i in the notation iVj;

(b) describe what data could be processed;

(c) describe what control constructs were intended; and

(d) write an account of the calculation of the Bernoulli numbers.

4. Was there a “Dark Age” for Computer Science from the time of Babbage to the 1930s?

5. Write a list of concepts in mathematical logic in the decades 1925-34 and 1935-44 that are relevant for the history of computer science. Explain these ideas and their history. When and in what ways did they influence the development of programming languages?

6. Trace the history of the following over the same time-scale:

(a) graphics programming;

(b) machine translation of natural languages; and

(c) parallel computation, starting with the COLOSSUS, neural networks and cellular automata.

7. Complete the account of the list of concepts for the decades 1965-74 and 1975-84.

8. Write a list of concepts for the decade 1985-94.


Part I

Data



Introduction

Data is represented, stored, communicated, transformed and displayed throughout Computer Science. Data is one of the Big Ideas of Computer Science — it is as big as the ideas of energy in physics, molecules in chemistry and money in economics. Theories about data are fundamental to the subject. In Part I, we will introduce the study of data in general, using ideas from abstract algebra. The subject may be called the

algebraic theory of data

and it will be applied to the theory of programming languages in several ways.

Indeed, we begin the study of the syntax and semantics of programming languages by examining the idea of a data type. A data type is a programming construct for defining data. Its purpose is to name and organise data, and the basic operations and tests that we may apply to data, in a program. It consists of an interface and an implementation.

We will introduce the mathematical concepts of

many-sorted signature and many-sorted algebra

to model a data type. A signature contains names for types of data, particular data, operations and tests on data. It models the interface of the data type. An algebra contains sets of data, particular data, operations and tests on the sets of data. It models an implementation of the data type. The theory of many-sorted algebras is the basis of the general theory of data in Computer Science which is commonly called the

theory of abstract data types.

The many-sorted algebra is the single most important concept in this book.

To start the development of the theory of abstract data types, in Chapter 3 we give a temporary working definition of an algebra and present several examples of algebras in order to familiarise the reader with some of the concepts and objectives of the theory. In Chapter 4, we replace the working definition of an algebra with a detailed formal definition of a signature and an algebra. We show how signatures and algebras model interfaces and implementations of data types.

In Chapter 5, we introduce the idea of specifying a data type using properties of its operations. We consider some principles of reasoning using axiomatic specifications, and apply them to basic examples of data types.

Next, we meet lots more examples made by general methods for constructing some new algebras from old; for example, in Chapter 6 we study algebras of records, arrays, files, streams and spatial objects.

To compare two algebras we need mappings between their data that preserve the constants and operations. These mappings are called homomorphisms and are introduced in Chapter 7. The comparison of implementations is at the heart of the theory of abstract data types.

To conclude Part I, we use many of the ideas we have introduced in an analysis of the data type of terms in Chapter 8 and the data type of real numbers in Chapter 9.


Chapter 3

Basic Data Types and Algebras

In any computation there is some data. But to perform a computation with that data there must also be some “primitive” or “atomic” operations and tests on data. The algorithm or program that generates the computation uses the primitive operations and tests in producing some output from some input. The concept of data type brings together some data in a particular representation and some operations and tests to create a starting point for computation; roughly speaking:

Data Type = Data + Operations + Tests.

To make a precise mathematical model of a data type is easy because we have the theory of sets, functions and relations from Mathematics.

The theory of sets was an advanced theory created by Georg Cantor (1845–1918) to sort out some problems in the foundations of mathematical practice, largely to do with conceptions of the infinite. Cantor's later definition of a set, of 1895, reveals the abstract nature and wide scope of set theory:

Definition (Cantor) “A set is a collection into a whole of definite distinct objects of ourperception or thought. The objects are called the elements (members) of the set.”

This degree of abstraction is exactly what we need to create a general mathematical theory of data. First, data is collected into sets; indeed, for our purposes, the elements of a set can be simply called data. Next, the operations on data are modelled by functions on those sets, and the tests are modelled by either relations on the sets, or functions on the sets returning a Boolean truth-value. Choosing the latter option, we have a precise mathematical model of a data type called an algebra which is:

Algebra = Sets + Functions + Boolean-valued Functions.

In this chapter we will begin to develop this model by examining in detail some basic examples of data, and of the operations and tests that algorithms use to compute with the data. The examples of basic data types we model as algebras simply equip the following sets of data with operations and tests:



Boolean truth-values     B = {tt, ff}
Natural numbers          N = {0, 1, 2, . . .}
Integer numbers          Z = {. . . , −2, −1, 0, 1, 2, . . .}
Rational numbers         Q = {p/q | p, q ∈ Z, q ≠ 0}
Real numbers             R
Characters and strings   {a, b, c, . . . , x, y, z} and {a, b, c, . . . , x, y, z}∗

Most other data types in computing are implemented using these data and various data structures. The data structures are programming language constructs for storing and accessing data, such as records and arrays, or higher level data types, such as streams, graphs and files. We will study data structures in Chapter 6.

In Section 3.1 we introduce a provisional definition of an algebra. We illustrate this definition with examples of algebras of Booleans (Section 3.2), natural numbers (Section 3.3), integer and rational numbers (Section 3.4), real numbers (Section 3.5), machine data (Section 3.6) and strings (Section 3.7).

In these examples, we meet some basic raw material and ask theoretical questions about algebras which lead us, in Chapter 4, to reformulate our definitions. A thorough study of this chapter is also a way to revise some essential logic and set theory.

3.1 What is an Algebra?

Let us begin with a provisional definition of the general idea of an algebra that will help us collect and reflect upon some basic examples. The provisional definition is inadequate because it lacks the important associated idea of signature; it will be refined to its final form in Section 4.3.

Simply said, an algebra consists of sets of data together with some functions on the sets of data. The functions provide some basic operations and tests for working with the data. An algebra can contain many different kinds of data and any number of functions. For example, to have tests on data the algebra must contain the set of Booleans. The general idea of an algebra does not assume there are tests.

Provisional Definition (Many-Sorted Algebra)

A many-sorted algebra A consists of:

(i) A family

. . . , As , . . .

of non-empty sets of data indexed by s ∈ S. The elements . . . , s, . . . of the non-empty set S that index or name the sets containing the data are called sorts. Each set As is called a carrier set of sort s ∈ S of the algebra.

(ii) A collection of elements from the carrier sets of the form

. . . , a ∈ As , . . .

called the constants of the algebra.


(iii) A collection of functions on the carrier sets of the form

. . . , f : As(1) × · · · × As(n) → As , . . .

for s , s(1 ), . . . , s(n) ∈ S , called the operations of the algebra.

Now, the algebra A is called a

many-sorted algebra

because there can be many carrier sets As of data, named by the many sorts s ∈ S. When the set S of sorts has only one element, S = {s}, then A is called a

single-sorted algebra

because A contains one carrier set of data, named by one sort.

The constants are sometimes treated as functions with no arguments and are called 0-ary functions or 0-ary operations. In this case, instead of a ∈ As we write

a :→ As or a : As .

An algebra A consists of these three components. It is written concisely as a list of the form

A = (. . . ,As , . . . ; . . . , a, . . . ; . . . , f , . . .).

This is convenient for mathematical reasoning about algebras in general.

Alternatively, an algebra may be displayed expansively in the form:

algebra A

carriers . . . , As , . . .

constants . . . , a : → As , . . .

operations . . . , f : As(1) × · · · × As(n) → As , . . .

This is convenient for introducing and reasoning about particular examples.

The idea of an algebra includes tests on data. Suppose a test is defined as a relation on data. Then we can define functions to check whether or not data satisfies the test, or equivalently, is in the relation. Here is a definition for the current situation.


Definition (Characteristic Function) Let

R ⊆ As(1 ) × · · · × As(n)

be a relation on sets As(1 ), . . . ,As(n). The characteristic function

fR : As(1) × · · · × As(n) → {tt, ff}

is defined for any a ∈ As(1 ) × · · · × As(n) by

fR(a) = tt if a ∈ R;
        ff if a ∉ R.

By including the Booleans and the characteristic functions of relations as operations, tests are included in algebras.
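The definition above can be rendered directly in code. The sketch below is our own illustration, assuming a hypothetical helper `char_fn` that turns a relation (given as a predicate) into its Boolean-valued characteristic function:

```python
# The characteristic function f_R of a relation R returns tt exactly
# when its arguments stand in the relation R, turning a test into an
# operation that can live inside an algebra.

def char_fn(relation):
    """Return the characteristic function of a relation given as a predicate."""
    def f(*args):
        return "tt" if relation(*args) else "ff"
    return f

# Example relation (our own choice): "less than" on pairs of naturals.
less_than = char_fn(lambda x, y: x < y)

print(less_than(2, 5))  # tt, since (2, 5) is in the relation
print(less_than(5, 2))  # ff
```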

The operations of an algebra may be either total or partial functions.

Definition (Total functions) Let

f : As(1 ) × · · · × As(n) → As

be a function. Then f is a total function if, for every input a ∈ As(1) × · · · × As(n), f(a) is defined; we write

f(a) ↓.

If f(a) = b then we may sometimes write

f(a) ↓ b.

Definition (Partial functions) Let

f : As(1 ) × · · · × As(n) → As

be a function. Then f is a partial function if, for some input a ∈ As(1) × · · · × As(n), f(a) is not defined; we write

f(a) ↑.

The general idea of an algebra is easy to grasp. Whenever there is a function f : A → B we can construct an algebra

(A, B; f)

with carriers A, B and operation f. Thus, we see that:

The concept of an algebra is as general as that of a function.

Indeed, thinking of a collection of functions, we see that:

By simply grouping together the sets of data, elements and functions, we form an algebra.

Thus, algebras are simply a way of organising a collection of elements and functions on sets. Wherever there are functions on sets, there are algebras.

We will give a series of examples of algebras by choosing a set of data and choosing some elements and functions on the data.
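The provisional definition can itself be written down as a small data structure: carriers, constants and operations grouped together. The sketch below is our own illustration (the class `Algebra` and its field layout are not the book's notation), instantiated with the algebra (B; tt, ff; Not, And):

```python
# A minimal rendering of the provisional definition: an algebra is a
# family of carriers, a collection of constants, and a collection of
# operations over those carriers.
from dataclasses import dataclass

@dataclass
class Algebra:
    carriers: dict      # sort name -> set of data
    constants: dict     # constant name -> (sort, element)
    operations: dict    # operation name -> (argument sorts, result sort, function)

B = {"tt", "ff"}
Booleans = Algebra(
    carriers={"bool": B},
    constants={"tt": ("bool", "tt"), "ff": ("bool", "ff")},
    operations={
        "Not": (("bool",), "bool", lambda b: "ff" if b == "tt" else "tt"),
        "And": (("bool", "bool"), "bool",
                lambda b1, b2: "tt" if b1 == b2 == "tt" else "ff"),
    },
)

Not = Booleans.operations["Not"][2]
And = Booleans.operations["And"][2]
print(And(Not("ff"), "tt"))  # tt
```

Here the algebra is single-sorted; a many-sorted example would simply have more than one entry in `carriers`.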


3.2 Algebras of Booleans

We begin with what is, perhaps, the simplest and most useful data type: the Booleans.

3.2.1 Standard Booleans

The set B = {tt, ff} of truth values or Booleans, where

tt represents true and ff represents false,

has associated with it many useful functions or operations; usually they are called

logical or propositional connectives.

For example, we assume the reader recognises the propositional connectives

Not : B→ B

And : B× B→ B

Or : B× B→ B

Xor : B× B→ B

Implies : B× B→ B

Equiv : B× B→ B

Nand : B× B→ B

Nor : B× B→ B

which are normally defined by truth tables. Here are the truth tables for Not and And :

b    Not(b)
ff   tt
tt   ff

b1   b2   And(b1, b2)
ff   ff   ff
ff   tt   ff
tt   ff   ff
tt   tt   tt

It is also worth noting logical implication Implies and equivalence Equiv, which defines equality on Booleans:

b1   b2   Implies(b1, b2)
ff   ff   tt
ff   tt   tt
tt   ff   ff
tt   tt   tt

b1   b2   Equiv(b1, b2)
ff   ff   tt
ff   tt   ff
tt   ff   ff
tt   tt   tt

By simply choosing constants and a set of connectives we can make various algebras of Booleans such as

(B; tt ,ff ;Not ,And)

(B; tt ,ff ;Not ,Or)

(B; tt ,ff ;And ,Or)

(B; tt ,ff ;And , Implies)

(B; tt ,ff ;Nand)

(B; tt ,ff ;Not ,And ,Or , Implies ,Equiv)

(B; tt ,ff ;Not ,And ,Xor ,Nand ,Nor)


Most of the functions listed are binary operations, i.e., they have two arguments. There are interesting functions of more arguments.

Lemma (Counting) The number of functions on B with k arguments is 2^(2^k). The number of algebras on B of the form

(B; Fk)

where Fk is a set of k-ary operations, is 2^(2^(2^k)).

Proof. Given two sets X and Y with cardinalities |X| = n and |Y| = m, the number of maps X → Y is m^n.

Now |B| = 2; take X = B^k and Y = B. Then |X| = 2^k and |Y| = 2, so the number of k-ary functions B^k → B is 2^(2^k).

Given a set X with |X| = n, there are 2^n subsets of X. Thus, the number of sets Fk of k-ary operations on B, with which we can make algebras, is 2^(2^(2^k)).

Over the set B, in the case k = 2, there are 16 possible binary connectives and hence at least 2^16 algebras to be made from them.
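The count 2^(2^k) can be checked by brute force for k = 2. The sketch below is our own illustration: a binary connective is determined by its column of outputs on the four possible argument pairs, so enumerating those columns enumerates the connectives.

```python
# Checking the Counting Lemma for k = 2: enumerate every binary
# Boolean connective as a truth-table column and count them.
from itertools import product

B = ("tt", "ff")
inputs = list(product(B, repeat=2))   # the 2^2 = 4 argument pairs

# Each connective is a choice of output for each of the 4 inputs,
# so there are 2^(2^2) = 16 of them.
connectives = list(product(B, repeat=len(inputs)))
print(len(connectives))  # 16
```

The number of algebras (B; F2) would then be the number of subsets of these 16 connectives, 2^16, as the lemma states.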

All the usual Boolean connectives can be constructed by composition from the functions Not and And. So, for theoretical purposes, the most useful algebra is the first that we listed, namely:

algebra Booleans

carriers B

constants tt ,ff : → B

operations Not : B→ B

And : B× B→ B

Notice that both elements of these algebras are listed as constants. It is sufficient to pick one, say tt, since Not creates the other element: ff = Not(tt).
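The claim that Not and And suffice can be checked mechanically. The sketch below is our own illustration: it defines Or and Implies by composition (via De Morgan's law and material implication) and compares them against their truth tables.

```python
# Deriving Or and Implies from Not and And alone, then checking the
# derived connectives against their truth tables.

def Not(b):
    return "ff" if b == "tt" else "tt"

def And(b1, b2):
    return "tt" if b1 == b2 == "tt" else "ff"

def Or(b1, b2):           # b1 or b2  =  not(not b1 and not b2)
    return Not(And(Not(b1), Not(b2)))

def Implies(b1, b2):      # b1 implies b2  =  not(b1 and not b2)
    return Not(And(b1, Not(b2)))

for b1 in ("ff", "tt"):
    for b2 in ("ff", "tt"):
        print(b1, b2, Or(b1, b2), Implies(b1, b2))
```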

There are several common notations for propositional connectives, all of them infix: for example,

              Logical   Boolean      Programming
And(x, y)     x ∧ y     x & y, x.y   x && y
Or(x, y)      x ∨ y     x + y        x || y
Implies(x, y) x ⇒ y     x → y        x ⊃ y
Not(x)        ¬x        x̄            !x

3.2.2 Bits

Algebras of Booleans are used everywhere in Computer Science, notably to define and evaluate tests on data in control constructs. They are also used to define and design digital hardware.


In the case of hardware design it is customary to use the set

Bit = {1, 0}

with

1 representing high voltage and 0 representing low voltage.

Many simple designs for digital circuits can be developed from

(Bit ; 1 , 0 ;NotBit ,AndBit).

This algebra is equivalent to the algebra on B

(B; tt ,ff ;NotB,AndB)

where

1 corresponds with tt and 0 with ff .

Then there is an obvious practical sense in which these choices result in equivalent algebras. For instance, the tables defining the operations correspond:

x    NotBit(x)
0    1
1    0

x    y    AndBit(x, y)
0    0    0
0    1    0
1    0    0
1    1    1

3.2.3 Equivalence of Booleans and Bits

It seems clear that Booleans and Bits are essentially the same data, though they are different and have different roles in computing. It is tempting to say that they are the same data in different notations or representations. Other notations for truth and falsity are

T ,F t , f true, false

in a choice of typefaces,

T, F t, f true, false T,F t, f true, false

What is going on exactly?

Equivalence of Algebras Can we formulate in what precise theoretical sense these different algebras are equivalent?

The correspondence is made precise by a function

φ : B→ Bit


defined by

φ(tt) = 1 and φ(ff) = 0

that converts truth values to bits.

Conversely, we have a function

ψ : Bit → B

defined by

ψ(1) = tt and ψ(0) = ff

that converts bits to truth values.

We note that these functions express equivalence because

ψ(φ(tt)) = tt and ψ(φ(ff)) = ff
φ(ψ(1)) = 1 and φ(ψ(0)) = 0.

Clearly φ is a bijection or one-to-one correspondence with inverse ψ.

However, the equivalence also depends upon relating the operations on truth values and bits. For instance, for any b, b1, b2 ∈ B,

φ(NotB(b)) = NotBit(φ(b))

φ(AndB(b1 , b2 )) = AndBit(φ(b1 ), φ(b2 )).

And, conversely, for any x , x1 , x2 ∈ Bit ,

ψ(NotBit(x )) = NotB(ψ(x ))

ψ(AndBit(x1 , x2 )) = AndB(ψ(x1 ), ψ(x2 )).

These two sets of equations show that the conversion mappings φ and ψ preserve the operations of the algebras. Thus,

equivalence can be made precise by a mapping φ and its inverse ψ, both of which preserve operations.

This precise sense of equivalence is a fundamental notion, and is called the

isomorphism

of algebras: two algebras are equivalent when they are isomorphic. We will study isomorphism later (in Chapter 7).
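Because the carriers here are finite, the isomorphism conditions above can be verified exhaustively. The sketch below is our own illustration of that check: φ and ψ as dictionaries, and the preservation equations tested over every argument.

```python
# Verifying that phi : B -> Bit is an isomorphism: it is a bijection
# with inverse psi, and it preserves Not and And.

def NotB(b):      return "ff" if b == "tt" else "tt"
def AndB(b1, b2): return "tt" if b1 == b2 == "tt" else "ff"
def NotBit(x):    return 1 - x
def AndBit(x, y): return x & y

phi = {"tt": 1, "ff": 0}
psi = {1: "tt", 0: "ff"}

# phi and psi are mutually inverse ...
assert all(psi[phi[b]] == b for b in phi)
assert all(phi[psi[x]] == x for x in psi)

# ... and phi preserves the operations (the homomorphism conditions).
assert all(phi[NotB(b)] == NotBit(phi[b]) for b in phi)
assert all(phi[AndB(b1, b2)] == AndBit(phi[b1], phi[b2])
           for b1 in phi for b2 in phi)
print("isomorphic")
```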

3.2.4 Three-Valued Booleans

We can usefully extend the design of these algebras of Booleans by introducing a special value u to model an unknown truth value in calculations. Let

Bu = B ∪ {u}.

How do we incorporate the “don’t know” element u in the definition of our logical connectives?


Strict Interpretation

Consider the simple principle:

if one does not know the truth value of an input to a connective then one does not know the truth value of the output of the connective.

This leads us to extend the definition of the connectives on B to connectives on Bu as follows. Let f : B × B → B be a binary connective on the Booleans. We define its extension

f u : Bu × Bu → Bu

on known and unknown truth values by

f u(b1, b2) = f(b1, b2)  if b1, b2 ∈ B;
             u           if b1 = u or b2 = u.

By this method the truth tables of Not and And are extended to:

b    Notu(b)
ff   tt
tt   ff
u    u

b1   b2   Andu(b1, b2)
ff   ff   ff
ff   tt   ff
ff   u    u
tt   ff   ff
tt   tt   tt
tt   u    u
u    ff   u
u    tt   u
u    u    u

This extension of Boolean logic is called Kleene 3-valued logic. Applying this principle leads to adaptations of all the algebras given earlier. For example,

algebra Kleene 3 valued logic

carriers Bu

constants tt ,ff , u : → Bu

operations Notu : Bu → Bu

Andu : Bu × Bu → Bu

This algebra for evaluating Booleans may be used to raise errors and exceptions in programming constructs involving tests. However, a case can be made for other methods of extending the operations.


Concurrent Interpretation

Consider the table for Andu. An alternative decision for the definition of this particular connective is given by the observation that in processing the conjunction Andu(b1, b2) of two tests b1 and b2, if b1 is false then for any b2 = tt, ff or u,

Andu(ff , b2 ) = ff

(and similarly for b2 false and b1 = tt, ff or u). This results in a connective that ignores the fact that b2 may be unknown. Here is its truth table:

b1   b2   Candu(b1, b2)
ff   ff   ff
ff   tt   ff
ff   u    ff
tt   ff   ff
tt   tt   tt
tt   u    u
u    ff   ff
u    tt   u
u    u    u

It is well known to implementors of programming languages, and is sometimes called parallel or concurrent and, or simply Cand, the idea being that, in computing b1 and b2 in parallel, as soon as one is found to be false, the computation may halt and return And(ff, b2) = ff.

Unlike the concurrent calculation of a truth value, we note that a “sequential” calculation that requires both inputs to be known for the output to be known would be described by the first truth table.
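The concurrent interpretation can be sketched in Python as well (again with `"u"` as an illustrative token for the unknown value); since Cand is symmetric, a false argument on either side decides the answer.

```python
U = "u"  # unknown truth value; tt/ff modelled by True/False

def cand(b1, b2):
    """Parallel/concurrent and: any false argument forces the result ff,
    even if the other argument is unknown."""
    if b1 is False or b2 is False:
        return False
    if b1 == U or b2 == U:
        return U
    return b1 and b2

print(cand(U, False))   # -> False: contrast with the strict extension, which gives u
print(cand(True, U))    # -> u: a true argument alone does not decide
```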

3.3 Algebras of Natural Numbers

The data type of natural numbers is profoundly important for all aspects of the theory ofcomputing.

3.3.1 Basic Arithmetic

We can define many useful algebras by selecting a set of functions on the set

N = {0, 1, 2, . . .}


of natural numbers. Consider the following functions:

Succ : N → N
Succ(x) = x + 1

Pred : N → N
Pred(x) = x − 1   if x ≥ 1;
          0       otherwise.

Add : N × N → N
Add(x, y) = x + y

Sub : N × N → N
Sub(x, y) = x − y   if x ≥ y;
            0       otherwise.

Mult : N × N → N
Mult(x, y) = x.y

Fact : N → N
Fact(x) = x.(x − 1). · · · .2.1   if x ≥ 1;
          1                      if x = 0.

Quot : N × N → N
Quot(x, y) = largest k such that ky ≤ x   if y ≠ 0;
             0                            if y = 0.

Mod : N × N → N
Mod(x, y) = x mod y

Exp : N × N → N
Exp(x, y) = x^y

Log : N × N → N
Log(x, y) = largest k such that x^k ≤ y   if x > 1 and y ≠ 0;
            0                             if x = 0 or y = 0;
            1                             if x = 1.

Max : N × N → N
Max(x, y) = x   if x ≥ y;
            y   otherwise.

Min : N × N → N
Min(x, y) = x   if x ≤ y;
            y   otherwise.
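These definitions translate directly into code. A Python sketch of a few of the totalised operations (the function names follow the text; the implementation details and the choice of test values are mine):

```python
def pred(x):
    """Cut-off predecessor: Pred(0) = 0."""
    return x - 1 if x >= 1 else 0

def sub(x, y):
    """Cut-off ("monus") subtraction: Sub(x, y) = 0 when x < y."""
    return x - y if x >= y else 0

def quot(x, y):
    """Largest k with k*y <= x; totalised by Quot(x, 0) = 0."""
    return x // y if y != 0 else 0

def log_(x, y):
    """Largest k with x**k <= y for x > 1 and y != 0; 0 or 1 in the edge cases."""
    if x == 0 or y == 0:
        return 0
    if x == 1:
        return 1
    k = 0
    while x ** (k + 1) <= y:
        k += 1
    return k

assert sub(1, 10) == 0 and sub(1, 1000) == 0   # information is lost, as noted later
assert quot(7, 2) == 3 and quot(7, 0) == 0
assert log_(2, 1000) == 9                      # 2^9 = 512 <= 1000 < 1024 = 2^10
```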

Selected in various combinations, these functions make the following algebras, each of whichhas interesting properties and applications:

(N;Zero; Succ)

(N;Zero;Pred)

(N;Zero; Succ,Pred)

(N;Zero; Succ,Add)

(N;Zero; Succ,Pred ,Add , Sub)

(N;Zero; Succ,Add ,Mult)

(N;Zero; Succ,Pred ,Add , Sub,Mult ,Quot ,Mod)

(N;Zero; Succ,Add ,Mult ,Exp)

(N;Zero; Succ,Pred ,Add , Sub,Mult ,Quot ,Mod ,Exp,Log)


The first algebra in the list is called the standard model of Presburger arithmetic. It is important in computation because many sets and functions on N concern counting and the operation Succ creates all natural numbers from 0. We say Succ is a constructor. We display it:

algebra Standard Model of Presburger Arithmetic

carriers N

constants Zero : → N

operations Succ : N→ N

In fact, every function on N that can be defined by an algorithm can be programmed using a while program over this algebra.

The algebra has some delightful algebraic properties we will meet in Chapter 7.

The algebra of natural numbers with successor, addition and multiplication (the sixth in the list above) is called the standard model of Peano arithmetic. It is important in logical reasoning because many sets and functions are definable by first-order logical languages over these three operations. We display it:

algebra Standard Model of Peano Arithmetic

carriers N

constants Zero : → N

operations Succ : N→ N

Add : N× N→ N

Mult : N× N→ N

Again, we note that the standard notations for these operations are all infix: for example,

Succ(x)      x + 1
Add(x, y)    x + y
Mult(x, y)   x.y

There will be many occasions when we use the familiar notation

(N; 0; x + 1, x + y, x.y).

3.3.2 Tests

So far, the algebras we have presented have contained one carrier or data set, i.e., they are single-sorted algebras. Now we consider algebras with several carrier or data sets, i.e., many-sorted algebras.


To the operations on natural numbers, we may add the characteristic functions of basic relations (as defined in Section 3.1). For example:

Eq : N × N → B
Eq(x, y) = tt   if x = y;
           ff   otherwise.

Lt : N × N → B
Lt(x, y) = tt   if x < y;
           ff   otherwise.

And we may add the characteristic functions of interesting sets of numbers:

Even : N → B
Even(x) = tt   if x is even;
          ff   otherwise.

Odd : N → B
Odd(x) = tt   if x is odd;
         ff   otherwise.

Prime : N → B
Prime(x) = tt   if x is prime;
           ff   otherwise.

There are other operations that involve both Booleans and natural numbers, such as:

If : B × N × N → N
If(b, x, y) = x   if b = tt;
              y   otherwise.
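A Python sketch of these Boolean-valued tests and of If; a trial-division primality test stands in for the mathematical definition, and all names are my own rendering of the text's operations.

```python
def eq(x, y): return x == y
def lt(x, y): return x < y
def even(x): return x % 2 == 0

def prime(x):
    """Trial division: tt iff x >= 2 has no divisor d with 2 <= d <= sqrt(x)."""
    if x < 2:
        return False
    return all(x % d != 0 for d in range(2, int(x ** 0.5) + 1))

def if_(b, x, y):
    """If : B x N x N -> N, selecting between two numbers by a Boolean."""
    return x if b else y

assert [n for n in range(20) if prime(n)] == [2, 3, 5, 7, 11, 13, 17, 19]
assert if_(lt(3, 5), 100, 200) == 100
```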

These, and any other tests on N, require us to add the Booleans to our algebras. For example, we can define the following algebra to form Peano Arithmetic with the Booleans:

algebra Standard Model of Peano Arithmetic with the Booleans

carriers N,B

constants Zero : → N

tt ,ff : → B

operations Succ : N→ N

Add : N× N→ N

Mult : N× N→ N

Eq : N× N→ B

Lt : N× N→ B

If : B× N× N→ N

And : B× B→ B

Not : B→ B


Note, however, that if we replace {tt, ff} by {1, 0} ⊆ N then tests could be seen as functions on N and the addition of a second sort could be avoided.

3.3.3 Decimal versus Binary

In these examples of algebras of natural numbers, we have not considered the precise details of the representation of natural numbers. We have assumed a standard decimal representation, i.e., one with radix b = 10. The functions and algebras described above can be developed for any number representation system, for example the binary, octal and hexadecimal representations with radix b = 2, 8 or 16.

Equivalence of Algebras   The algebras obtained from using radix b are all implementations of natural number arithmetic: in what precise theoretical sense are these different algebras equivalent?

Consider the equivalence of two radix b representations, namely

the algebra A10 of decimal arithmetic (with b = 10)

and

the algebra A2 of binary arithmetic (with b = 2).

First, we have a transformation of decimal to binary, which is a function

φ : A10 → A2.

Second, we have a transformation of binary to decimal, which is a function

ψ : A2 → A10.

The transformations must be compatible, namely: for decimals x ∈ A10

ψ(φ(x)) = x

and for binaries y ∈ A2

φ(ψ(y)) = y.

The functions must also “preserve the operations” of the algebras. This means that the standard operations on decimal and binary notation correspond with each other. For example, adding in decimal then translating to binary gives the same result as translating to binary then adding in binary.
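Using strings of digits as stand-in representations (Python's `int`, `bin` and `str` do the conversions), one can check the compatibility and preservation conditions on samples; the encoding is an illustrative choice of mine, not the book's.

```python
# Decimal numerals as strings of digits, binary numerals as strings of bits.
def phi(d):  # decimal -> binary
    return bin(int(d))[2:]

def psi(b):  # binary -> decimal
    return str(int(b, 2))

def add10(x, y):  # addition performed "in decimal"
    return str(int(x) + int(y))

def add2(x, y):   # addition performed "in binary"
    return bin(int(x, 2) + int(y, 2))[2:]

# Compatibility: the translations are mutually inverse.
assert psi(phi("2004")) == "2004"
assert phi(psi("11111010100")) == "11111010100"

# Preservation: add in decimal then translate = translate then add in binary.
assert phi(add10("19", "23")) == add2(phi("19"), phi("23"))
```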

Again, as in the case of Booleans and bits, this basic idea of equivalence of algebras is formalised by the notion of isomorphism. More specifically, functions that preserve operations are called

homomorphisms,

and the homomorphisms that are also bijections are called

isomorphisms.

Further algebras of natural numbers arise from the attempt to make finite counting systems based on sets of the form

{0, 1, 2, . . . , n − 1}

for some n ≥ 1. We will discuss finite number systems later in Section 5.1.


3.3.4 Partial Functions

Here are some further types of functions that can be used in algebras of natural numbers. An observation made easily at this early stage is that operations of algebras could be

partial functions.

Several functions on N, including Pred, Sub and Log, have odd-looking definitions. For example,

Pred : N→ N

is defined

Pred(x) = x − 1   if x ≥ 1;
          0       if x = 0.

Now 0 − 1 is not a natural number, so clearly another option is to leave Pred(x) undefined for x = 0, rather than force it to be 0. We write

Pred(x) = x − 1   if x ≥ 1;
          ↑       if x = 0.

where ↑ indicates Pred(0) does not have any value.

It is tempting to interpret the computation of Pred(0) as an error. In order to define

Pred(0) = error

we have to expand N to

Nerror = N ∪ {error}

and define

Pred error : Nerror → Nerror

for x ∈ Nerror by

Pred error(x) = x − 1   if x > 0;
                error   if x = 0;
                error   if x = error.

In the case of predecessor, we can at least test if x = 0. This is not an option for all partial functions defined by programs.
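One way to model the error-flag approach in Python (the sentinel `"error"` is an illustrative stand-in for the distinguished element error):

```python
ERROR = "error"  # the distinguished error element adjoined to N

def pred_error(x):
    """Pred on N_error: an error is raised at 0 and then propagates."""
    if x == ERROR:
        return ERROR        # error values are absorbing
    if x == 0:
        return ERROR        # Pred(0) has no natural-number value
    return x - 1

assert pred_error(5) == 4
assert pred_error(pred_error(1)) == "error"   # 1 -> 0 -> error
```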

We may allow functions to be partial and serve as operations in algebras. Such algebras are

partial algebras.

3.4 Algebras of Integers and Rationals

The extension of the natural numbers by negative numbers, to make the set Z of integers, facilitates calculation and leads to more interesting and useful algebras. The same remark is true of the extension of the integers to the set Q of rationals to accommodate division; the extension of the rationals to the set R of reals to accommodate measurements of irrational line segments; and the extension of the reals to the set C of complex numbers to accommodate the solution of polynomial equations. First let us look at the integers.


3.4.1 Algebras of Integer Numbers

The subtraction x − y of natural numbers x and y is unclear when x < y; the answer 0 is ad hoc and loses information:

1− 10 = 0 and 1− 1000 = 0

are different calculations that ought to have different consequences.

To calculate efficiently with subtraction we extend the set of natural numbers N to the set

Z = {. . . , −2, −1, 0, 1, 2, . . .}

of integers.

Many functions on N extend to functions on Z in simple ways.

An important algebra of integers is

(Z;Zero,One;Add ,Minus ,Times)

which is displayed:

algebra Integers

carriers Z

constants Zero : → Z

One : → Z

operations Add : Z× Z→ Z

Minus : Z→ Z

Times : Z× Z→ Z

In the more familiar infix notation, we write the algebra

(Z; 0, 1; +,−, .)

Here −x is called the additive inverse operation and subtraction is derived by defining

x− y = x+ (−y).

This algebra is called the ring of integers because it satisfies a certain set of algebraic laws; we will meet these laws in the next chapter.

Notice all integers can be created from the constants 0 and 1, by applying the operations of + and −. These operations are constructors.

3.4.2 Algebras of Rationals

The division x/y of integers x and y is not always defined. Take 22/7: clearly, 7 divides 22 3 times with remainder 1, and therefore 22/7 is not an integer.

To calculate efficiently with division we extend the set Z of integers to the set

Q = {p/q | p, q ∈ Z and q ≠ 0}


of rational numbers. Strictly speaking, we have defined a set of representations in which infinitely many different elements denote the same rational number. For example,

2/1, 10/5, 20/10, 100/50, . . .

or

4/2, 6/3, 8/4, . . .

are the same. We define equality by

p1/q1 = p2/q2 if, and only if, p1.q2 = p2.q1.

The basic functions on the integers extend, for example:

p1/q1 + p2/q2 = (p1.q2 + p2.q1)/(q1.q2)

−(p/q) = (−p)/q

(p1/q1).(p2/q2) = (p1.p2)/(q1.q2)
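These defining equations can be sketched directly as a toy rational-number implementation in Python, representing p/q as a pair of integers; the function names are mine.

```python
# A rational p/q is represented by a pair (p, q) of integers with q != 0.
def rat_eq(r, s):
    (p1, q1), (p2, q2) = r, s
    return p1 * q2 == p2 * q1          # p1/q1 = p2/q2 iff p1.q2 = p2.q1

def rat_add(r, s):
    (p1, q1), (p2, q2) = r, s
    return (p1 * q2 + p2 * q1, q1 * q2)

def rat_neg(r):
    (p, q) = r
    return (-p, q)

def rat_mul(r, s):
    (p1, q1), (p2, q2) = r, s
    return (p1 * p2, q1 * q2)

assert rat_eq((2, 1), (10, 5))                  # many pairs, one rational
assert rat_eq(rat_add((1, 2), (1, 3)), (5, 6))  # 1/2 + 1/3 = 5/6
assert rat_eq(rat_mul((2, 3), (3, 2)), (1, 1))  # 2/3 . 3/2 = 1
```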

Thus, the rationals are implemented in terms of the integers.

An important algebra of rationals is

(Q;Zero,One;Add ,Minus ,Times , Inverse)

which we display:

algebra Rationals

carriers Q

constants Zero : → Q

One : → Q

operations Add : Q×Q→ Q

Minus : Q→ Q

Times : Q×Q→ Q

Inverse : Q→ Q

In the more familiar infix notation, we write the algebra

(Q; 0, 1; +,−, ., x−1)

This algebra is called the field of rational numbers because it satisfies a certain set of algebraic laws. The function −1 is called the multiplicative inverse operation. Note that x^−1 is not defined for x = 0; thus this operation is a partial function. Full division is derived by

x.y−1.

Notice all rational numbers can be created from the constants 0 and 1 by applying +, −, −1. Thus, these are constructor operations.


3.5 Algebras of Real Numbers

The data type of real numbers is the foundation upon which geometry, and the measurement and modelling of physical processes, is built. We will study these in depth later.

3.5.1 Measurements and Real Numbers

In making a measurement there is a ruler, scale or gauge that is based on a chosen unit and a fixed subdivision of the unit, for example: feet and inches, grams and milligrams, hours and minutes, etc. Measurements are then approximated up to the nearest sub-unit. The numbers that record such measurements are the rational numbers. For example, 3 minutes 59 seconds is 239/60 minutes.

In ancient Greek mathematics, it was known that certain basic measurements could not be represented exactly by rational numbers. By Pythagoras’ Theorem, the hypotenuse of a right-angled triangle whose other two sides each measure 1 unit has a length of √2 units. (See Figure 3.1.) But √2 is not a rational number and so this hypotenuse cannot be measured exactly.

Figure 3.1: A right-angled triangle with hypotenuse of length √2.

An argument demonstrating that √2 is not a rational number appears in Aristotle (Prior Analytics, Book 1 §23). Here is a detailed version:

Theorem   √2 is not a rational number.

Proof   We use the method of reductio ad absurdum, or, as it is also known, proof by contradiction.

Suppose that √2 were a rational number. This means that there exist some p, q ∈ Z, such that q ≠ 0 and

(p/q)^2 = 2.    (∗)

By dividing out all the common factors of p and q we can assume, without any loss of generality, that p/q is a rational number in its lowest form, i.e., there is no integer, other than 1 or −1, that divides both p and q.

Now simplifying Equation (∗) gives

p^2 = 2q^2.    (∗∗)

Thus, we know that p^2 is an even number, and this implies that p is an even number. If p is even, then there exists some r > 0 such that

p = 2r.


Substituting in Equation (∗∗), we get

(2r)^2 = 2q^2
4r^2 = 2q^2
2r^2 = q^2.

Thus, q^2 is also an even number, and this implies q is an even number. We have deduced that both p and q are even and divisible by 2. This contradicts the fact that p and q have no common divisor, and hence the assumption that p and q exist. □

The real numbers are designed to allow a numerical quantity to be assigned to every point on an infinite line or continuum. Thus, a real number is used to measure and calculate exactly the sizes of any continuous line segments or quantities. There are a number of standard ways of defining the reals, all of which are based on the idea that

real numbers can be approximated to any degree of accuracy by rational numbers.

To define a real number we think of an infinite process of approximation that allows us to find a rational number as close to the exact quantity as desired. As we will see in Chapter 9, these constructions or implementation methods for the real numbers (such as Cauchy sequences, Dedekind cuts or infinite decimals) can be proved to be equivalent.

The real numbers, like the natural numbers, are one of the truly fundamental data types. But unlike a natural number, a real number is an infinite datum and may not be representable exactly in computations. The approximations to real numbers used in computers must have finite representations or codings. In practice, there are gaps and separations between adjacent pairs of the real numbers that are represented. In fixed-point representations, the separation may be the same between all numbers, whereas in floating-point representations the separation may vary and depend on the size of the adjacent values. Calculations with real numbers on a computer must take account of these approximations and the unusual properties that they exhibit.
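The varying gaps in floating-point representations can be observed directly. In Python, `math.ulp(x)` (available from Python 3.9) gives the distance from x to the next representable float; this small illustration is mine, not from the text.

```python
import math

# The gap between adjacent representable floats grows with magnitude.
print(math.ulp(1.0))      # about 2.2e-16
print(math.ulp(1e16))     # -> 2.0: reals between 1e16 and 1e16 + 2 are skipped

# Consequence: adding a small number can be absorbed entirely.
assert 1e16 + 1.0 == 1e16
assert 0.1 + 0.2 != 0.3   # decimal fractions are only approximated in binary
```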

We will discuss the nature of real numbers in greater depth in Chapter 9. For the momentwe are interested in making algebras of real numbers.

3.5.2 Algebras of Real Numbers

There are many interesting and useful algebraic operations on the set R of real numbers. Consider some of the functions associated with R:

+ : R × R → R
− : R → R
. : R × R → R
−1 : R → R
√ : R → R
| | : R → R
exp : R × R → R
log : R × R → R
sin : R → R
cos : R → R
tan : R → R


Some simple algebras of real numbers can be obtained by selecting various subsets of functions and combining them with R. For example:

(R; 0, 1; x + y, x.y, −x)
(R; 0, 1; x + y, x.y, −x, x^−1)
(R; 0, 1; x + y, x.y, −x, x^−1, √x, |x|)

We may add the Booleans and some basic tests to these algebras, for example

= : R × R → B

< : R × R → B

Collecting all these functions, and some famous constants, we may display an algebra thus:

algebra Reals

carriers R,B

constants 0, 1, π, e : → R

tt ,ff : → B

operations + : R× R→ R

− : R→ R

× : R × R → R

−1 : R → R

exp : R× R→ R

log : R × R → R

√ : R → R

| | : R→ R

sin : R→ R

cos : R→ R

tan : R→ R

= : R × R → B

< : R × R → B

and : B× B→ B

not : B→ B

Many more functions could be added, of course, but there is much to say about the operations included in the algebra above.

Several are operations which do not return a value on all real number arguments:

−1, log, √, tan;

they are partial functions rather than total functions. So this example is a partial algebra.

Division −1 : R → R is not defined on the argument x = 0. It can be defined as a total function by defining division on the set

R − {0}

of non-zero real numbers:

−1 : (R − {0}) → R.


Thus, the design of an algebra can be altered by restricting the domain of division to ensure it is total, adding its domain R − {0} as a carrier to the algebra. This is more tricky for some of the other partial functions.

Alternatively, we can require that an error flag is raised, i.e., we define a set

Rerror = R ∪ {error}

of real numbers with a distinguished element error , and we set

0−1 = error .

Here, we must add the carrier Rerror and the error constant, and redefine the operation

−1 : R→ Rerror

along with its values under all the other operations. Similar problems and options arise with the other partial functions.

Many calculators and programming languages provide error information for partial operations. For any partial function f : R^n → R with domain of definition

dom(f) = {x ∈ R^n | f(x) ↓}

we may add error messages by defining the new operation

f error : (Rerror )n → Rerror

defined by:

f error(x1, . . . , xn) = f(x1, . . . , xn)   if x ∈ dom(f);
                         error               if x ∉ dom(f);
                         error               if xi = error for some 1 ≤ i ≤ n.
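This error-propagating construction is easy to sketch in Python as a generic wrapper; the sentinel `"error"` and the helper name `totalise` are my own, and partiality is detected here by catching the exceptions Python happens to raise.

```python
import math

ERROR = "error"

def totalise(f):
    """Turn a partial function f into f_error: propagate error inputs,
    and map arguments outside dom(f) to error."""
    def f_error(*xs):
        if any(x == ERROR for x in xs):
            return ERROR                  # error arguments are absorbing
        try:
            return f(*xs)
        except (ValueError, ZeroDivisionError):
            return ERROR                  # argument outside dom(f)
    return f_error

inv = totalise(lambda x: 1.0 / x)
sqrt = totalise(math.sqrt)

assert inv(4.0) == 0.25
assert inv(0.0) == "error"                 # 0^-1 = error
assert sqrt(inv(0.0)) == "error"           # errors propagate through sqrt
assert sqrt(-1.0) == "error"               # sqrt is undefined on negatives in R
```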

However, the study of algebras of these numbers has been dominated by the study of the algebras made from the basic operations of addition, subtraction, multiplication and, except in the case of the integers, division. These particular algebras have many properties in common and are best studied algebraically as examples of two general types of algebras called

rings and fields.

Other allied algebras made from matrices, polynomials and power series are also examples of rings and/or fields.

3.6 Data and the Analytical Engine

Recall Babbage’s letter of 1835 about the Analytical Engine in Section 2.2. The conception of the engine was to perform the following operations, or any combination of them, on numbers consisting of 25 figures:


add vi to vk;
subtract vi from vk;
multiply vi by vk;
divide vi by vk;
extract the square root of vk;
reduce vk to zero.

The example is given of evaluating the formula p = √((a + b)/(cd)).

The five operations of addition, subtraction, multiplication, division, and extraction of roots mentioned in the Letter to Quetelet are repeated in the introduction of Section III of his 1837 description of the Analytical Engine. There he notes that (i) further operations could be added to the mill and that (ii) the engine “possesses the power of treating the signs of the quantities on which it operates according to the rules of algebra and thus its use is greatly extended.”

Let us make an idealised model of the data type. Let Rk be the set of all positive and negative real numbers that can be represented by k figures or digits using the decimal notation. Clearly, there are bounds ±Mk on the range of numbers allowed and

Rk ⊆ [−Mk, +Mk];

for example, Mk = 10^k − 1. The details of the set Rk of positive and negative approximations to reals depend upon the arrangements for number representations and are of great importance for the work on the machine.

We have an algebra:

algebra Analytical Engine

carriers Rk

constants 0, 1,+Mk,−Mk : → Rk

operations + : Rk × Rk → Rk

− : Rk × Rk → Rk

× : Rk × Rk → Rk

/ : Rk × Rk → Rk

√ : Rk → Rk

The discussion of data in works on the Analytical Engine is not limited to numbers, operations on numbers, and the signs and sizes of numbers. Other aspects of data are present, such as the coding of data, functions, addresses and symbols on punched cards. Further ideas arise concerning algebraic notions such as computing with imaginary numbers and manipulating symbolic expressions. Some of the discussions appear in Ada Lovelace’s notes, and in Babbage’s notebooks, but are short and speculative.

3.7 Algebras of Strings

Syntax is constructed by joining together symbols from some alphabet. A string is a sequence of symbols. Here, we consider how we can build algebras of strings. These are essential building blocks that we shall use later on to construct more complex models of syntax for programming languages.


3.7.1 Constructing Strings

Let T be some non-empty alphabet or set of symbols. The nature of this set T is not relevant when we consider how we can form and manipulate strings over T. So T could equally well be the digits 0 to 9,

T = {0, 1, 2, . . . , 9}

or the letters of the English language,

T = {a, b, c, . . . , z}

or digits and letters

T = {a, b, c, . . . , z, 0, 1, 2, . . . , 9}

or digits, letters and punctuation symbols

T = Set of ASCII characters.

We focus here on the set T ∗ of strings over T. The simplest string we can have is the empty string

ε : → T ∗.

Alternatively, we can build strings from symbols by repeatedly using an operation

Prefix : T × T ∗ → T ∗

to add a symbol to the start of a string. For example, for T = {0, 1, 2, . . . , 9}:

Prefix (2,Prefix (0,Prefix (0,Prefix (4, ε)))) = 2004

These two operations are sufficient to be able to build up the set T ∗ of all possible strings over T. The operations ε and Prefix are constructors. We model this with an algebra:

algebra Basic Strings

carriers T,T ∗

constants ε : → T ∗

operations Prefix : T × T ∗ → T ∗
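A sketch in Python, modelling strings over T as ordinary Python strings with `""` standing for ε; the representation choice is mine.

```python
EMPTY = ""  # the empty string epsilon

def prefix(t, w):
    """Prefix : T x T* -> T*, adding one symbol to the front of a string."""
    return t + w

# Building 2004 as in the text: Prefix(2, Prefix(0, Prefix(0, Prefix(4, eps))))
s = prefix("2", prefix("0", prefix("0", prefix("4", EMPTY))))
assert s == "2004"
```

Every string is reached this way: peeling symbols off the front reverses the construction, which is why ε and Prefix suffice as constructors.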

3.7.2 Manipulating Strings

Once we have a set T ∗ of strings, there are many useful operations we can define on it. We can compose, or concatenate, two strings together with an operation

Concat : T ∗ × T ∗ → T ∗.

For example,

Concat(ab, ba) = abba.


Instead of always placing a symbol at the start of a string, we can insert a symbol into a specified position with a function

Insert : T × N× T ∗ → T ∗.

To ensure that this is a total function, we can place the symbol at the end of the string if the string is not long enough for the specified position. For example:

Insert(r, 3, sting) = string        Insert(r, 7, sting) = stingr

The inverse of Insert is to project out a symbol from a string:

Proj : N × T ∗ → (T ∪ {error}).

Again, this is a potentially hazardous operation; if we choose a non-existent position, we return a value error to ensure that we have a total function, whilst flagging the problem. We cannot hide this problem by returning a value that is in the set T, nor can we return the empty string as ε ∉ T.

For example,

Proj(4, 2004) = 4        Proj(5, 2004) = error.
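A Python sketch of the totalised Insert and Proj, with positions counted from 1 to match the examples; the sentinel `"error"` is an illustrative stand-in for error.

```python
ERROR = "error"

def insert(t, i, w):
    """Insert symbol t at position i (1-indexed); if the string is too
    short, place the symbol at the end instead, keeping the function total."""
    if i > len(w) + 1:
        return w + t
    return w[:i - 1] + t + w[i - 1:]

def proj(i, w):
    """Project out the symbol at position i, or error for a
    non-existent position."""
    if 1 <= i <= len(w):
        return w[i - 1]
    return ERROR

assert insert("r", 3, "sting") == "string"
assert insert("r", 7, "sting") == "stingr"
assert proj(4, "2004") == "4"
assert proj(5, "2004") == "error"
```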

We can define an operation

| | : T ∗ → N

to calculate the length of a string. For example,

|ε| = 0 |ynys| = 4.

We can also define tests on strings. For example, we can test whether two strings are the same with an operation

Eq : T ∗ × T ∗ → B.

Note that to define this operation, we would need to suppose that the alphabet T also has a test for equality on its symbols. Given such a test, we can also define a test

In : T × T ∗ → B

to check whether a symbol is present in a string.

We can design different algebras of strings by choosing some of these operations on strings. For example,

(T, T ∗; ε; Prefix)
(T ∗; ε; Concat)
(T, T ∗; ε; Prefix, Concat)
(T, T ∗, N; ε; Prefix, | |)
(T ∗, B; Eq)
(T, T ∗, B, N; ε, tt, ff, 0; Prefix, In, | |, And, Succ)

We display the fourth of these algebras:


algebra Strings with Length

carriers T ,T ∗,N

constants ε : → T ∗

Zero : → N

operations Prefix : T × T ∗ → T ∗

| | : T ∗ → N


Exercises for Chapter 3

1. List all the 16 binary truth connectives

f : {0, 1} × {0, 1} → {0, 1}.

Make a table of values:

b1   b2   · · ·   ∧(b1, b2)   · · ·
0    0    · · ·   0           · · ·
0    1    · · ·   0           · · ·
1    0    · · ·   0           · · ·
1    1    · · ·   1           · · ·

2. Show that every connective can be expressed in terms of And and Not. Can every connective be expressed in terms of:

(a) Or and Not?

(b) Implies and Not?

3. Let X and Y be non-empty sets with |X| = n and |Y | = m. Prove that the number of functions X → Y is m^n and the number of subsets of X is 2^n.

Using the provisional definition of an algebra from Section 3.1, derive a formula for the number of algebras containing only the set X with n elements and k-ary operations.

How many two-sorted algebras contain both X and Y and have binary operations?

4. Give truth tables for the connectives Or and Implies on {tt, ff, u} with the lazy interpretation. What identities hold between the connectives? Is it the case that

Implies(x, y) = Not(And(x,Not(y)))?

5. It is possible to represent pairs of numbers by single numbers using pairing functionswhich are a bijection

Pair : N× N→ N,

together with its inverse

Unpair : N → (N × N)

with coordinate functions Unpair 1 : N→ N and Unpair 2 : N→ N, such that

Unpair(x) = (Unpair1(x), Unpair2(x))

for x ∈ N. Prove that

Pair(x, y) = 2^x (2y + 1) − 1

is a bijection. What is its inverse?

Using pairing functions show how to define functions

n pair : Nn → N

that represent n-tuples of numbers as single numbers for all n = 2, 3, . . ..


6. Extend the definitions of the functions Quot and Mod, and Max and Min, on N to Z. Can Fact, Exp and Log be extended?

7. Some functions on N are defined from functions on the set Q of rationals or the set R of real numbers. For example, consider the function:

NaturalsSqrt : N → N

NaturalsSqrt(x) = ⌊√x⌋,

where

⌊ ⌋ : R+ → N

⌊x⌋ = (largest k ∈ N)(k ≤ x)

is the floor function which rounds down a real number to the nearest integer. Write a while program on N to compute NaturalsSqrt.

8. Define four algebras over the set Q of rational numbers.

9. Define the set of integers as the product set

Z = {+, −} × N.

Thus, an integer is represented by a pair (+, n) or (−, n), for a natural number n ∈ N. Write out the definition of addition, additive inverse, and multiplication for this representation of the integers as pairs.

Define a functionφ : N→ Z

byφ(n) = (+, n).

Show that for any m,n ∈ N:

φ(m+ n) = φ(m) + φ(n)

φ(−n) = −φ(n)

φ(m.n) = φ(m).φ(n)

10. Represent the set Q of rational numbers as a subset of Z× Z, namely

Q = {(p, q) | p, q ∈ Z and q ≠ 0}

and write out the definitions of equality, addition, additive inverse, multiplication and multiplicative inverse in terms of this representation of the rational numbers.

11. Show that the operations of addition, additive inverse, and multiplication on Q “simulate” the corresponding operations of

Z = {(p, q) ∈ Q | q = 1}

in the following sense: define a function φ : Z → Q by φ(x) = (x, 1) for all x ∈ Z. Show that for all x, y ∈ Z,

φ(x+ y) = φ(x) + φ(y)

φ(−x) = −φ(x)

φ(x.y) = φ(x).φ(y)


12. Show that √3 is not a rational number.

13. Show that √2 + √3 is not a rational number.

14. Which of the following functions on the set R of real numbers are partial (i.e., they are not always defined) and what are their domains of definition?

(a) x;
(b) 1/x;
(c) x^2 − 2x + 1;
(d) 1/(x^2 − 2x + 1);
(e) x^2 + 1;
(f) (x^2 − 1)/(x^2 + 1);
(g) sin(x);
(h) 1/sin(x);
(i) cos(x);
(j) 1/cos(x);
(k) tan(x);
(l) 1/tan(x);
(m) e^x;
(n) e^−x;
(o) √x; and
(p) x√x.

15. In each of these algebras of real numbers, what set of numbers is constructed by repeatedly applying the operations to the constants:

(a) (R; 0, 1; +);

(b) (R; 0, 1; +,−);

(c) (R; 0, 1; +,−,−1 ); and

(d) (R; 0, 1; +,−, .,−1 )?

16. Does there exist an algebra of real numbers, with a finite set of constants and operations, that can construct all real numbers?

17. Define the following functions on strings:

(a) the concatenation function

Concat : T ∗ × T ∗ → T ∗

by using the function Prefix ;


(b) the length function

| | : T ∗ → N

by using the function Prefix ;

(c) the equality test

Eq : T ∗ × T ∗ → B

on strings by using the functions Prefix and the equality test EqT : T × T → B on symbols; and

(d) the test

In : T × T ∗ → B

for membership of a string by using the functions Prefix and the equality test EqT : T × T → B on symbols.

18. Define functions

Left : T ∗ × N → T ∗

Right : T ∗ × N→ T ∗

on strings such that

Left(w, i)

gives the string consisting of the first i characters of w and

Right(w, i)

gives the string consisting of the characters of w from position i + 1 onwards.

Show that for all strings w ∈ T ∗, and splitting points i ∈ N,

Concat(Left(w, i), Right(w, i)) = w.

19. Define a test

IsRepeated : T × T ∗ → B

to check if a given symbol appears more than once in a string.

Using the function IsRepeated , define a test

IsDistinct : T ∗ → B

that returns true if none of the symbols in a given string appears more than once.

20. A complex number z is an expression of the form x + iy where x and y are real numbers and i denotes √−1. Let C be the set of complex numbers. Consider the representation of the set C by all ordered pairs

(a, b) ∈ R^2

of real numbers. Define the operations of addition, subtraction, multiplication and division on this representation R^2 of the complex numbers.


21. Define an algebra over the complex numbers. For example, consider the following operations on the complex numbers:

mod(x + iy) = √(x^2 + y^2)

re(x+ iy) = x

im(x+ iy) = y

conj (x+ iy) = x− iy.

22. Design a two-sorted algebra containing the sets

Bit = {0, 1} and n-Word = Bit^n

together with constants and operations appropriate for constructing and processing computer words.


Chapter 4

Interfaces and Signatures, Implementations and Algebras

We are developing the idea of a data type in several stages. In the first stage, in Chapter 3, we observed that a data type does not consist of data only, but of

data together with operations and tests.

We made a mathematical model of this programming concept using the idea of an algebra, which we defined, provisionally, as a family of sets combined together with families of elements and functions. We saw plenty of examples of data types and algebras in this sense.

In this chapter we reach the second stage. Here we will add two vital components to the concept of data type and algebra, namely, the programming idea of

interface

and the corresponding mathematical idea of

signature

to model it.

Now, a data type is a collection of data, operations and tests in a specific representation.

Representations of data are many and varied. There can be minor or major differences in equivalent data representations. For example, it seems hardly to matter whether the Booleans are represented by {tt, ff} or {T, F}, or some other two-element set; however, representing the natural numbers in base b = 10 or b = 2, or some other base, has significant consequences. If the representations of the data types are not “equivalent” then we can expect the differences to be important at some point in their use.

In the second stage of developing our idea of a data type, we want to separate the form of the data type from the details of the representations used in its implementation. To do this, we revise our idea of a data type, by introducing these two aspects:

Data Type = Interface + Implementation.

The interface is defined as a declaration containing names for the data, constants, operations and tests: it specifies the form of the data type, including the types of data, operations and tests. The implementation is defined in terms of specific representations for the data, operations

and tests in the data type. The names of the data and the operations can be fixed by declaring an interface for a data type, but there will always be considerable variation in the details of how the data and operations are implemented.

Now this distinction between form and representation also makes perfect sense for the mathematical idea of an algebra. We can also revise the provisional definition by adding a declaration of names for the sets, constants, and operations. This collection of names is called a

signature.

It acts simply as an interface to the sets and functions of the algebra. The sets and functions that interpret these names constitute an algebra in the earlier, provisional sense of Chapter 3. A data type is still modelled by a many-sorted algebra, but now we have:

Algebra = Signature + Interpretation.

It is the distinction between the signature and the interpretation that allows us to model explicitly, and begin to analyse mathematically, the endless variation of major and minor details associated with data representations within software. In this analysis, problems arise that lead to further stages in the development of the notion of data type.

In this chapter, we will also introduce some simple ideas about building new data types and algebras. We will describe in general terms

sharing, adding, and removing data sets and operations

for signatures and algebras. These ideas correspond with the technical concepts of

subalgebra, expansion and reduct,

respectively.

The idea of an expansion is one formalisation of the idea that a

data type imports or inherits another data type.

The idea of a reduct is one formalisation of the idea that a

data type encapsulates or hides part of another data type.

Thus, early in our theory, we see data types and algebras acquire some form of architecture.

In Section 4.1, we define the notion of signature. Section 4.2 is filled with examples of signatures. In Section 4.3, we give the new definition of algebra based upon signatures. Section 4.4 is filled with examples of algebras. In Section 4.5, we introduce the ideas of constructions for generating data inside an algebra. In Section 4.6, we use the new definition in looking back at some of the special algebras we met in the last chapter. In Section 4.7, we discuss subalgebras, expansions and reducts. In Section 4.9, we extend signatures and algebras to allow imports.

4.1 Formal Definition of a Signature

The provisional definition of an algebra we gave in Section 3.1 must be refined to meet the needs of both the theoretical analysis and practical application of data types. What is missing from the provisional definition is an explicit, independent and complete notation for naming and uniquely identifying each set, constant and operation in an algebra. A collection of names for the sets of data, constants and operations of an algebra is called a

signature.

Thus, the proper definition of an algebra consists of both

Syntax the names for the kinds of data, specific data elements and operations on data listed in the signature, and

Semantics the sets, elements and functions that are assigned to interpret the names.

The importance of the concept of signature in the development and application of the theory of data cannot be overstated:

A signature is the interface between an algebra of data and its users.

Therefore, in programming terms,

A signature is a model of the interface between a data type and its users.

We begin by considering the idea using an example.

4.1.1 An Example of a Signature

Reconsider the algebra

A = (N; Zero; Succ, Add, Mult)

of natural numbers that we introduced in Section 3.3.1; it was called the Standard Model of Peano Arithmetic.

To make a signature we simply choose names for all the components of this algebra and declare them. To keep close to the notation we used for A, we choose these names:

signature Peano Arithmetic

sorts nat

constants zero : → nat

operations succ : nat → nat
           add : nat × nat → nat
           mult : nat × nat → nat

endsig

The name of the signature is the identifier Peano Arithmetic. We will think of the algebra A as the result of assigning to these names some specific sets, elements and functions, a process we call

interpreting the signature.

Thus,

• sort nat is assigned the set N,

• constant zero is assigned the element Zero ∈ N,

• and operation symbols succ, add and mult are assigned the functions Succ, Add and Mult on N.

On assigning the sets and functions to the names in ΣPeano Arithmetic , we get the algebra

A = (N;Zero; Succ,Add ,Mult).

What is the point? For familiar algebras with standard notations, the signature may seem pedantic or quite unnecessary. However, a crucial point is that several other algebras could be used to interpret these names. For example,

(i) an equivalent algebra based on binary rather than decimal notation, or

(ii) an inequivalent algebra based on a finite set of numbers, or even

(iii) an algebra having nothing to do with natural numbers.

To a user of the data type, different implementations of ΣPeano Arithmetic are likely to matter. This possible diversity is made explicit by using the signature as a model of an interface to data.
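To see this diversity concretely, here is a sketch in Python (our own illustration, not the book's notation): two classes give two interpretations of the same interface, one the standard naturals in case (i), and one an inequivalent finite algebra as in case (ii). All class and function names are our hypothetical choices.

```python
class StandardNat:
    """One interpretation: sort nat becomes the Python ints >= 0."""
    zero = 0
    def succ(self, x): return x + 1
    def add(self, x, y): return x + y
    def mult(self, x, y): return x * y

class NatMod8:
    """An inequivalent interpretation on the finite set {0, ..., 7}."""
    zero = 0
    def succ(self, x): return (x + 1) % 8
    def add(self, x, y): return (x + y) % 8
    def mult(self, x, y): return (x * y) % 8

# The same client code runs against either algebra: only the
# interpretation of the names zero, succ, add, mult differs.
def double_then_succ(alg, x):
    return alg.succ(alg.add(x, x))

print(double_then_succ(StandardNat(), 5))  # 11
print(double_then_succ(NatMod8(), 5))      # 3
```

The client function never mentions a representation; it is written against the signature alone, which is exactly what an interface is for.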

4.1.2 General Definition of a Signature

We now give a formal definition of a signature. The definition is designed to capture a general concept in a mathematically satisfying way.

Definition (Sorts and Types) Let S be a non-empty set whose elements will be used to name the sets of data in an algebra. These elements

. . . , s , . . .

of S we call sorts.

If S is a non-empty set of sorts naming sets, then an expression

s(1 )× · · · × s(n) or s(1 ) · · · s(n)

for n ≥ 0 can be used to name a Cartesian product of the sets named by s(1), . . . , s(n). This includes the empty expression λ (when n = 0). These expressions we call product types, and the number n of sorts is called the arity of the product type. Let S∗ be the set of all such expressions, and S⁺ = S∗ − {λ}.

We can also name the type of a function on the sets by pairs

(s(1) × · · · × s(n), s) or (s(1) · · · s(n), s)

which we will write

s(1) × · · · × s(n) → s.

These expressions we call operation or function types, and the number n of arguments is called the arity of the function type.

Example Consider these notations for computing with natural numbers. First, there is the set of sorts. There is just one type of data, so let

S = {nat}.

The sets of tuples of natural numbers are named by the product types:

λ, nat, nat × nat, nat × nat × nat, . . . , nat × · · · × nat (n times), . . .

or, simply:

λ, nat, nat², nat³, . . . , natⁿ, . . .

The operation types are

λ → nat, nat → nat, nat² → nat, . . . , natⁿ → nat, . . .

and these describe the type of elements of nat, unary operations, binary operations, . . . , and n-ary operations, respectively.

Note that with one sort there is precisely one product type and one operation type of arity n.

With this notation for sorts and types, we can define a signature.

Definition (Many-Sorted Signature) A signature consists of:

(i) a name Name for the signature which we call its identifier ;

(ii) a non-empty set S , the elements of which we call sorts ; and

(iii) an S∗ × S-indexed family

〈Σw,s | w ∈ S∗, s ∈ S〉

of sets, the elements of which we call constant and operation symbols :

Constant Symbols For the empty word λ ∈ S ∗ and any sort s ∈ S , each element

c ∈ Σλ,s

is called a constant symbol of sort s . We also write

c :→ s .

Operation Symbols For each non-empty word w = s(1 ) · · · s(n) ∈ S + and any sort s ∈ S ,each element

f ∈ Σw ,s

is called an operation symbol with domain type w, range type s and arity n. We also write

f : s(1 )× · · · × s(n)→ s .

Thus, the signature is the triple

ΣName = (Name, S , 〈Σw ,s | w ∈ S ∗, s ∈ S 〉).
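The triple (Name, S, 〈Σw,s〉) is easy to render as a concrete value. The following Python sketch is our own hypothetical encoding, not the book's: the indexed family becomes a dictionary from pairs (w, s) to sets of symbols, with the empty tuple playing the role of the empty word λ.

```python
from dataclasses import dataclass, field

@dataclass
class Signature:
    """A finite many-sorted signature (Name, S, <Sigma_{w,s}>)."""
    name: str
    sorts: set
    # maps (w, s) -> set of symbols, where w is a tuple of argument
    # sorts; the empty tuple () stands for the empty word lambda,
    # so symbols[((), s)] are the constant symbols of sort s.
    symbols: dict = field(default_factory=dict)

    def declare(self, sym, args, result):
        # every sort mentioned must have been declared in S
        assert result in self.sorts and all(a in self.sorts for a in args)
        self.symbols.setdefault((tuple(args), result), set()).add(sym)

# The signature Peano Arithmetic of Section 4.1.1:
peano = Signature("Peano Arithmetic", {"nat"})
peano.declare("zero", (), "nat")
peano.declare("succ", ("nat",), "nat")
peano.declare("add", ("nat", "nat"), "nat")
peano.declare("mult", ("nat", "nat"), "nat")

print(sorted(peano.symbols[(("nat", "nat"), "nat")]))  # ['add', 'mult']
```

Note that the value `peano` records only names and types; no sets or functions appear anywhere, which is precisely the syntax/semantics split the definition is making.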

Example Consider the signature for Peano Arithmetic we used in Section 4.1.1. As a tuple of sorts, constants and operations, a signature has the form

ΣName = (Name; S ; 〈Σw ,s | w ∈ S ∗, s ∈ S 〉).

In this example, the signature is

ΣPeano Arithmetic

since the name is Peano Arithmetic and the sort set is

S = {nat},

the set of constant names is

Σλ,nat = {zero},

the sets of operation names are

Σnat,nat = {succ}
Σnat nat,nat = {add, mult}
Σnatⁿ,nat = ∅

for n > 2.

Recall our observations on notation in the Guide to the Reader. The definition of a signature

as a tuple is an example of a definition designed for mathematical analysis because it

(i) captures the concept and its components both abstractly and precisely, and

(ii) is compact and efficient when reasoning about the concept in general.

The definition is essentially an expression or encoding of the idea in the language of set theory. However, as a notation for working with examples, the definition can be too concise and formal. So we use another notation to display the definition.

We also display a signature ΣName in the following manner:

signature Name

sorts . . . , s , . . .

constants ...c : → s

...

operations ...f : s(1 )× · · · × s(n)→ s

...

endsig

Our signatures will declare finitely many sorts, constants and operations, so we make a restriction.

Definition (Finite Signatures) A signature is finite if:

(i) there are finitely many sorts, i.e., S is finite; and

(ii) there are finitely many constants and operations, i.e., Σw,s ≠ ∅ for at most finitely many w, s.

Henceforth we make the assumption:

Assumption We will assume that each signature Σ is finite.

Now let us look at lots of examples of finite signatures.

4.2 Examples of Signatures

We will illustrate the general idea of signature by giving signatures for some of the algebras of Booleans and numbers we met in Chapter 3, and for some new data. For particular algebras, a number of choices of notation for the sorts, constants and operations in a signature are always available. Sometimes we write operators in prefix form, sometimes in infix, postfix or mixfix. Our examples will be displayed rather than presented as tuples.

4.2.1 Examples of Signatures for Basic Data Types

Booleans From among the many algebras of Booleans, here is a signature for a standard set of Boolean operations:

signature Booleans

sorts Bool

constants true, false : → Bool

operations and : Bool × Bool → Bool
           not : Bool → Bool

endsig

The operations are in prefix notation. Notice that we have shortened the declaration of constants by making a list rather than two declarations, i.e., true : → Bool and false : → Bool.

Naturals We met a signature ΣPeano Arithmetic for Peano Arithmetic in Section 4.1.1; it used prefix notation and was based on the algebra in Section 3.3. If we want to use the standard infix notation for successor, addition and multiplication then we can choose the following signature Naturals:

signature Naturals

sorts nat

constants 0 : → nat

operations + 1 : nat → nat
           + : nat × nat → nat
           . : nat × nat → nat

endsig

Integers In Section 3.4.1, we introduced an algebra of integers using a prefix notation for addition, subtraction and multiplication. Here is a signature Integers:

signature Integers

sorts int

constants zero, one : → int

operations add : int × int → int
           minus : int → int
           times : int × int → int

endsig

Next, we give a signature IntegersInfix for the integers that uses the standard infix notation for addition, subtraction and multiplication:

signature IntegersInfix

sorts int

constants 0 , 1 : → int

operations + : int × int → int
           − : int → int
           . : int × int → int

endsig

Reals In Section 3.5, we introduced an algebra of real numbers with lots of basic operations. Here is a signature Reals containing some infix and some prefix notations.

signature Reals

sorts real, Bool

constants 0 : → real
          1 : → real
          π : → real
          true, false : → Bool

operations + : real × real → real
           − : real → real
           . : real × real → real
           ⁻¹ : real → real
           √ : real → real
           | | : real → real
           sin : real → real
           cos : real → real
           tan : real → real
           exp : real → real
           log : real → real
           = : real × real → Bool
           < : real × real → Bool
           and : Bool × Bool → Bool
           not : Bool → Bool

endsig

Notice that some of these names are for operations that are partial when applied to all real numbers. The functions of division ⁻¹, square root √, tangent tan, and logarithm log are not defined on all real numbers. However, this is not visible in the notation here. Should it be? We will not introduce a special notation for partiality. Usually our operations will be total.

Perhaps we should refine the signature and introduce special names for sorts for the domains of partial operations. Sometimes this seems easy, as for the sort of non-zero real numbers, e.g.,

⁻¹ : real≠0 → real

and positive real numbers, e.g.,

√ : real≥0 → real.

Of course, we will need to add operations that link these new sorts to the sort real. But it can also seem clumsy, as in the case of tan.
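In a programming language, the same partiality question surfaces as a design choice: either the signature of the operation is narrowed (the caller must prove membership in the domain), or the operation stays total on real and signals failure at run time. This Python sketch of ours (not the book's) shows the second convention, with the domain restriction made explicit in a check:

```python
import math

def inverse(x: float) -> float:
    """x^-1, partial: defined only on real_{!=0}."""
    if x == 0:
        # the domain restriction, invisible in the type real -> real,
        # is enforced here at run time instead
        raise ValueError("inverse is undefined at 0")
    return 1.0 / x

def sqrt(x: float) -> float:
    """Square root, partial: defined only on real_{>=0}."""
    if x < 0:
        raise ValueError("sqrt is undefined on negative reals")
    return math.sqrt(x)

print(inverse(4))    # 0.25
print(sqrt(2.25))    # 1.5
```

Narrowing the sort instead (a `NonZeroReal` type whose values cannot be 0) would move the check to the caller, mirroring the refined signature ⁻¹ : real≠0 → real sketched above.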

4.2.2 Signature for Subsets

Suppose we want to calculate with subsets of a given set. We shall have constants

empty :→ subset

to represent the set with no elements, and

universe :→ subset

to represent the set containing all the elements we are calculating over.

We shall combine subsets with the usual set manipulation operations:

union : subset × subset → subset

intersection : subset × subset → subset

complement : subset → subset

signature Subsets

sorts subset

constants empty : → subsetuniverse : → subset

operations union : subset × subset → subset
           intersection : subset × subset → subset
           complement : subset → subset

endsig

4.2.3 Signature for Strings

Suppose we have a signature Basic Strings with Length for calculating with strings. We shall want to be able to create a string from some alphabet. If we allow strings of zero length, then we shall want a constant

empty :→ string

for the empty string. We can build a string by prefixing letters from the alphabet using an operation

prefix : alphabet × string → string .

To measure the length of a string, we shall use the natural numbers, and an operation

length : string → nat .

signature Basic Strings with Length

sorts alphabet , string , nat

constants empty : → string
          zero : → nat

operations prefix : alphabet × string → string
           length : string → nat
           succ : nat → nat

endsig

4.2.4 Storage Media

Thinking abstractly, a storage medium is simply something that stores data. We imagine that it must have a store, and means of putting data into the store and of taking data from the store. These “means” are operations on the store. Storage media are specified by different stores equipped with different input-output operations.

What would be a simple interface for abstract storage media?

The following signature Storage models such an interface:

signature Storage

sorts store, address, data

constants empty : → store

operations in : data × address × store → store
           out : address × store → data

endsig

For example, data structures store and access data in different ways: there are the record, the array, the stack, the list, the queue, and many more. Some of these, such as the stack, do not have addresses.
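One possible implementation of the Storage interface can be sketched in Python as follows. The representation (a store as an immutable mapping from addresses to data, with a default value for unwritten addresses) is our own assumption, not fixed by the signature; `put` stands in for the operation named `in`, which is a reserved word in Python.

```python
EMPTY = {}        # interprets the constant empty : -> store
DEFAULT = None    # our choice of datum for an address never written

def put(d, a, store):
    """Interprets in : data x address x store -> store.

    Returns a NEW store; the old one is unchanged, so stores
    behave as values, just as in the algebra."""
    new = dict(store)
    new[a] = d
    return new

def out(a, store):
    """Interprets out : address x store -> data."""
    return store.get(a, DEFAULT)

s = put("hello", 3, EMPTY)
print(out(3, s))   # hello
print(out(7, s))   # None (address 7 was never written)
```

A stack would interpret the same sorts quite differently, with no address sort at all, which is why the text notes that not every storage medium fits this particular interface.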

4.2.5 Machines

Thinking abstractly, a machine is simply a device that processes input and output. It has a memory and a program; indeed, it may be programmable. For simplicity, we imagine that it must have

(i) a state that combines memory and commands, and

(ii) a means of reading input data into state and of writing the output data from the state.

The input of data changes the state of the machine. These ideas can be expressed by two operations on the state. What would be a simple interface for abstract machines? An interface to a machine is modelled by this signature Machine:

signature Machine

sorts state, input, output

constants initial : → state

operations next : state × input → state
           write : state × input → output

endsig

4.3 Formal Definition of an Algebra

With the idea of a signature, we can define the idea of an algebra properly by assigning sets, elements and functions to each sort, constant and operation named in a signature.

Definition (Many-Sorted Algebra) Let

ΣName = (Name; S ; 〈Σw ,s | w ∈ S ∗, s ∈ S 〉)

be a signature. An algebra A with signature ΣName or, more briefly, a

ΣName-algebra A

consists of:

(i) An S-indexed family

〈As | s ∈ S〉

of non-empty sets enumerated by sorts, where for each sort s ∈ S the set As interprets the name s and is called the carrier of sort s.

(ii) An S∗ × S-indexed family

〈ΣAw,s | w ∈ S∗, s ∈ S〉

of sets of elements and sets of functions, enumerated by the constant and operation names.

Constants For each sort s ∈ S and empty string λ ∈ S ∗,

ΣAλ,s = {cA | c ∈ Σλ,s},

where the element

cA ∈ As

is called a constant of sort s ∈ S which interprets the constant symbol c ∈ Σλ,s in the algebra.

Operations For each non-empty word w = s(1 ) · · · s(n) ∈ S + and each sort s ∈ S

ΣAw,s = {f A | f ∈ Σw,s},

where

f A : As(1) × · · · × As(n) → As

or, more concisely,

f A : Aw → As

is called an operation or function with domain Aw = As(1) × · · · × As(n), range As and arity n, which interprets the function symbol f ∈ Σw,s in the algebra.

This long and complicated definition (which includes within it the definition of a signature) seems a long way from the intuition that an algebra is just some sets together with some elements and functions. However, it is essentially the result of pinning down what the names for the sets, elements and functions are.

As we have already seen in the provisional definition, and in our illustrative examples, we display algebras in a similar way to signatures:

algebra A

carriers . . . ,As , . . .

constants ...cA : → As

...

operations ...f A : As(1 ) × · · · × As(n) → As

...

4.3.1 General Notation and Examples

In our definition of a Σ-algebra A, we have introduced the following new, general but simple, notation for the sets, elements and operations making up an algebra, based on the names in the signature Σ:

Name in Signature                               Interpretation in Algebra
sort name s                                     non-empty set As
product type w = s(1) × · · · × s(n)            Cartesian product As(1) × · · · × As(n)
constant name c : → s                           element cA ∈ As
operation name f : s(1) × · · · × s(n) → s      function f A : As(1) × · · · × As(n) → As

The point is that this is a general notation for algebras, designed to define and analyse data in general. This notation is fine for general cases but may seem cumbersome in examples, especially examples with familiar or standard notation. However, even in familiar cases, where a user knows what to expect of a name, there is plenty of room for error. The signature demands an answer to the question: given sort name s, what exactly is the implementation of the data in As? Let us try out this general notation.

Example Let ΣPeano Arithmetic be the signature Peano Arithmetic in Section 4.1.1. Any ΣPeano Arithmetic-algebra A that interprets the signature ΣPeano Arithmetic can be written as follows:

algebra A

carriers Anat

constants zeroA : → Anat

operations succA : Anat → Anat

addA : Anat × Anat → Anat

multA : Anat × Anat → Anat

Now, we choose the signature Peano Arithmetic in Section 4.1.1 with a standard interpretation in mind, namely the Standard Model of Peano Arithmetic in Section 3.3.1, in which

(i) the carrier Anat = N = {0, 1, 2, . . .}, i.e., the natural numbers in decimal notation;

(ii) the constant zeroA = 0 ; and

(iii) the operations succA = Succ, addA = Add , and multA = Mult .

Of course, for such a familiar algebra there will be occasions when we will use the standard notation

zeroA = 0 , succA = +1 , addA = +, and multA = .

This standard ΣPeano Arithmetic-algebra is written as follows.

algebra AStandard

carriers N

constants 0 : → N

operations Succ : N→ N

Add : N× N→ N

Mult : N× N→ N

Clearly, there are other choices for Anat. The signature requires us to pause and be very precise about our interpretation.
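The slogan Algebra = Signature + Interpretation can itself be sketched in code. In this Python illustration of ours (the dictionary encoding and the term format are our own assumptions), an interpretation is just a binding of each declared name to a set, element or function; a tiny evaluator then gives every term built from the signature a value in the algebra:

```python
# The standard interpretation of the names of Peano Arithmetic:
standard = {
    "nat":  lambda x: isinstance(x, int) and x >= 0,  # carrier, as a membership test
    "zero": 0,                                        # interprets zero : -> nat
    "succ": lambda x: x + 1,
    "add":  lambda x, y: x + y,
    "mult": lambda x, y: x * y,
}

def evaluate(alg, term):
    """Evaluate a term such as ('add', ('succ', 'zero'), 'zero').

    A bare string is a constant name; a tuple applies an operation
    name to evaluated argument terms."""
    if isinstance(term, str):
        return alg[term]
    op, *args = term
    return alg[op](*(evaluate(alg, a) for a in args))

# succ(zero) + succ(zero) = 2 in the standard model
print(evaluate(standard, ("add", ("succ", "zero"), ("succ", "zero"))))  # 2
```

Swapping `standard` for a different dictionary with the same keys reinterprets every term at once; nothing in `evaluate` depends on the chosen carrier.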

We will now play around with these notations for algebras.

4.4 Examples of Algebras

4.4.1 Examples of Algebras for Basic Data Types

Let us illustrate the new definition and notations for algebras by interpreting some of the signatures for the Booleans and numbers given in Section 4.2. Our examples will be displayed rather than presented as tuples.

Booleans Given the signature ΣBooleans for a standard set of Boolean operations, first we apply the general notation for interpreting ΣBooleans. An arbitrary ΣBooleans-algebra has the form:

algebra A

carriers ABool

constants trueA, falseA : → ABool

operations andA : ABool × ABool → ABool

notA : ABool → ABool

The standard interpretation of the Booleans is based on the set

B = {tt, ff}

and we might write, using common notation:

algebra AStandard

carriers B

constants tt ,ff : → B

operations ∧ : B× B→ B

¬ : B→ B

Since tt, ff ∈ B, and ∧ and ¬ are functions on B, we have no need of superscripts B, etc.

Integers We introduced a signature Integers for an algebra of integers using a prefix notation for addition, subtraction and multiplication. Here is the general form of an Integers-algebra using the general notation:

algebra A

carriers Aint

constants zeroA, oneA : → Aint

operations addA : Aint × Aint → Aint

minusA : Aint → Aint

timesA : Aint × Aint → Aint

We also introduced a signature IntegersInfix for the integers that uses the standard infix notation for addition, subtraction and multiplication. The standard form of an IntegersInfix-algebra is based on the set

Z = {. . . , −2, −1, 0, 1, 2, . . .}

of integers in decimal notation. In the standard case, we need not drop the reference to Z on the operators of the signature, say by writing + for +Z.

algebra A

carriers Z

constants 0 , 1 : → Z

operations +Z : Z× Z→ Z

−Z : Z→ Z

.Z : Z× Z→ Z

Reals Here is a ΣReals-algebra of real numbers containing some infix and some postfix notations. We choose some standard representation R of the real numbers.

algebra Reals

carriers R,B

constants 0 R : → R

1R : → R

πR : → R

trueR, falseR : → B

operations +R : R× R→ R

−R : R→ R

.R : R × R → R

⁻¹R : R → R

√R : R → R

| |R : R→ R

sinR : R→ R

cosR : R→ R

tanR : R→ R

expR : R→ R

logR : R→ R

=R: R× R→ B

<R: R× R→ B

andR : B× B→ B

notR : B→ B

In many situations we might drop the superscript and subscript references to R provided there could be no confusion as to which functions on the reals were intended.

4.4.2 Storage Media

Thinking abstractly, an implementation of a storage medium is simply an algebra of signature ΣStorage which, using the general notation for interpreting signatures, has the form:

algebra A

carriers Astore ,Aaddress ,Adata

constants emptyA : → Astore

operations inA : Adata × Aaddress × Astore → Astore

outA : Aaddress × Astore → Adata

However, we might choose a more suggestive notation for such an algebra:

algebra A

carriers Store,Address ,Data

constants emptyA : → Store

operations inA : Data × Address × Store → Store
           outA : Address × Store → Data

4.4.3 Machines

Thinking abstractly, an implementation of a machine is simply an algebra of signature ΣMachine. Using a suggestive notation for the carriers, a typical machine is an algebra of the form:

algebra A

carriers State, Input ,Output

constants initialA : → State

operations nextA : State × Input → State
           writeA : State × Input → Output

If the set State is a finite set then A is called a finite state machine.
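Here is a small finite state machine as a ΣMachine-algebra, sketched in Python. The particular machine (a parity tracker over bits) and all names are our own illustration, not from the book:

```python
# Carriers: State = {"even", "odd"} (finite), Input = bits {0, 1},
# Output = booleans. The machine tracks the parity of the 1-bits read.

STATES = {"even", "odd"}
initial = "even"                      # interprets initial : -> state

def next_state(state, bit):
    """Interprets next : state x input -> state."""
    if bit == 1:
        return "odd" if state == "even" else "even"
    return state

def write(state, bit):
    """Interprets write : state x input -> output.

    Outputs True iff, after reading this bit, the machine has seen
    an even number of 1-bits."""
    return next_state(state, bit) == "even"

# Run the machine on an input stream by iterating the two operations:
s = initial
for bit in [1, 0, 1, 1]:
    out = write(s, bit)
    s = next_state(s, bit)
print(s, out)  # odd False  (three 1-bits read in total)
```

Since the carrier of sort state has just two elements, this algebra is a finite state machine in the sense of the definition above.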

4.4.4 Sets

Using the signature ΣSubsets of Section 4.2.2, any ΣSubsets-algebra A can be displayed, using the general notation, in the following form:

algebra A

carriers Asubset

constants emptyA : → Asubset

universeA : → Asubset

operations unionA : Asubset × Asubset → Asubset

intersectionA : Asubset × Asubset → Asubset

complementA : Asubset → Asubset .

The intended interpretation is an algebra A in which

(i) the carrier set Asubset is the power set P(X) of a non-empty set X, i.e., the set of all subsets of the set X;

(ii) the constants are

emptyA = ∅, the empty set;

universeA = X , the given set X ;

(iii) the operations are defined for any V ,W ∈ P(X ) by

unionA(V ,W ) = V ∪W ;

intersectionA(V, W) = V ∩ W;

complementA(V) = X − V.

This gives us the algebra:

algebra A

carriers Asubset = P(X )

constants ∅ : → P(X)
          X : → P(X)

operations ∪ : P(X) × P(X) → P(X)
           ∩ : P(X) × P(X) → P(X)
           − : P(X) → P(X)
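The intended interpretation above is directly executable. This Python sketch (our own rendering, using frozensets for the elements of P(X)) implements the Subsets-algebra over the set X = {1, 2, 3, 4, 5}:

```python
# Carrier: the power set P(X) of X = {1, ..., 5}, with subsets
# represented as Python frozensets.

X = frozenset(range(1, 6))     # interprets the constant universe
EMPTY = frozenset()            # interprets the constant empty

def union(v, w):               # interprets union
    return v | w

def intersection(v, w):        # interprets intersection
    return v & w

def complement(v):             # interprets complement: X - V
    return X - v

v = frozenset({1, 2})
print(sorted(complement(v)))             # [3, 4, 5]
print(sorted(union(v, complement(v))))   # [1, 2, 3, 4, 5]
```

Note that the carrier P(X) has 2⁵ = 32 elements, all reachable from `EMPTY` by the operations together with singleton insertions; the signature as given, however, generates far fewer, a point taken up in Section 4.6.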

4.4.5 Strings

Using the signature Basic Strings with Length from Section 4.2.3 and applying our general mathematical notation, any ΣBasic Strings with Length-algebra A will have the following mathematical form:

algebra A

carriers Aalphabet ,Astring ,Anat

constants emptyA : → Astring

zeroA : → Anat

operations prefixA : Aalphabet × Astring → Astring

lengthA : Astring → Anat

However, when we interpret ΣBasic Strings with Length, we will actually have an algebra of strings of the form defined in Section 3.7.2, where

(i) the carrier sets

Aalphabet = T, Astring = T∗, and Anat = N;

(ii) the constants

emptyA = ε, zeroA = 0;

and

(iii) the operations

prefixA = Prefix, lengthA = | |, and succA = + 1.

This gives us the algebra:

algebra A

carriers T ,T ∗,N

constants ε : → T ∗

0 : → N

operations Prefix : T × T ∗ → T ∗

| | : T ∗ → N

+ 1 : N→ N
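This string algebra is easy to realise concretely. In the Python sketch below (our own choice of representation), the alphabet T is the set of one-character strings, T∗ is the Python `str` type, and the interpretations of the names in Basic Strings with Length are ordinary functions:

```python
# Carriers: Aalphabet = single characters, Astring = Python str,
# Anat = the non-negative ints.

empty = ""                # interprets empty : -> string (epsilon)

def prefix(letter, s):    # interprets prefix : alphabet x string -> string
    return letter + s

def length(s):            # interprets length : string -> nat (| |)
    return len(s)

def succ(n):              # interprets succ : nat -> nat (+1)
    return n + 1

# Build "ab" by prefixing onto the empty string, then measure it:
w = prefix("a", prefix("b", empty))
print(w, length(w))   # ab 2
```

The operation `length` is the one link between the sorts string and nat in the signature, which is why the naturals and their successor must travel along in the algebra.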

4.5 Algebras with Booleans and Flags

Looking back on the many examples of algebras in Sections 3.2–3.7, some common features are noticeable. One prominent feature is the special role of the Booleans in defining tests on data in an algebra. Another is the use of special data to flag errors, unknowns, and other exceptional, and usually undesirable, circumstances. We will examine these features in general, and exercise further our new official definition of a Σ-algebra.

4.5.1 Algebras with Booleans

Tests on data are needed to govern the flow of control in computations. Thus, most algebras that model data types will contain the Booleans, as tests are Boolean-valued functions. To define these algebras, we need to

(i) choose a notation for Booleans, and

(ii) require that the notation has a standard interpretation.

Definition (Algebras with Booleans)

(i) A signature Σ is a signature with Booleans if

Bool

is a sort name in Σ;

true, false : → Bool

are constant symbols in Σ ; and

not :Bool → Bool

and :Bool × Bool → Bool

are operation symbols in Σ .

(ii) A Σ-algebra A is an algebra with Booleans if

Σ

is a signature with Booleans;

ABool = B = {tt, ff}

is a carrier of A;

trueA = tt
falseA = ff

are constants in A; and

notA(b) = ¬b
andA(b1, b2) = b1 ∧ b2

are the standard operations on B. That is, the sort Bool and its associated constants and operations have their standard interpretation in A.

Once one has the Booleans, two kinds of test may be added to any algebra, namely,

equality and conditionals.

Total and Partial Equality

For each sort s , we can add the operation name

eqs : s × s → Bool

in Σ . We can interpret this with the operation

eqAs : As × As → B

which we define to have the standard interpretation

eqAs(x, y) = tt if x = y;
             ff if x ≠ y.

Now, it may not be desirable to add equality with the standard interpretation. Two variations are:

eqAs(x, y) = tt if x = y;
             ↑ if x ≠ y.

eqAs(x, y) = ↑ if x = y;
             ff if x ≠ y.

The last interpretation is to be expected when testing infinite data such as real numbers or infinite sequences. Given two infinite sequences

x = x (0 ), x (1 ), x (2 ), . . . and y = y(0 ), y(1 ), y(2 ), . . .

it is possible to search for a difference between them, i.e., to

find some n where x(n) ≠ y(n)

and so x ≠ y. But it may not be possible to test that they are equal, i.e., check that

for all n, x (n) = y(n).

and hence x = y .

Page 133: Data, Syntax and Semantics - Better Together Toolkit

4.5. ALGEBRAS WITH BOOLEANS AND FLAGS 103

Conditional

For each sort s , we can add the operation name

if s : Bool × s × s → s

in Σ . We can interpret this with the operation

if As : B× As × As → As

which we define by

ifAs(b, x, y) = x if b = tt;
                y if b = ff.
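The two generic tests are trivial to realise once an algebra has the Booleans. In this Python sketch of ours, Python's `bool` stands for B = {tt, ff}, and the definitions follow the standard interpretations given above:

```python
def eq(x, y):
    """Total equality eq_s : s x s -> Bool, standard interpretation."""
    return x == y

def if_(b, x, y):
    """Conditional if_s : Bool x s x s -> s."""
    return x if b else y

# Both tests are polymorphic in the sort s: they work over any carrier.
print(if_(eq(2 + 2, 4), "yes", "no"))  # yes
```

Note that `eq` here is the *total* interpretation; the partial variants sketched above for infinite data would instead fail to return on one of the two branches.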

4.5.2 Algebras with an Unspecified Element

There are several reasons why we might add a new element to an existing set of data. For example, we have added:

(i) u to denote unknown or unspecified in the Booleans (Section 3.2.4); and

(ii) error to denote an exception or error in data types (Sections 3.3.4 and 3.5.1).

And other situations require the addition of:

(iii) over to denote overflow in finite number systems;

(iv) +∞,−∞ to denote points in infinite number systems;

(v) ↑ or ⊥ to denote the undefined value of a function.

Let us define one of these processes in general.

Let A be any Σ-algebra with the Booleans. We make a new signature Σu by adding a new constant symbol unspecified s of each sort . . . , s, . . . of Σ, and transforming the operations of Σ to accommodate the unspecified elements:

signature Σ u

sorts . . . , su , . . .

constants . . . , cu : → su, . . .
          . . . , unspecified s : → su, . . .

operations . . . , f u : s(1)u × · · · × s(n)u → su, . . .
           . . . , is unspecified s : su → Bool, . . .

Now we show how to make a Σu-algebra Au from the Σ-algebra A. Let A be a Σ-algebra and consider the effect of augmenting A with special objects

us ∉ As

to represent an undefined or unspecified datum of sort s. We will make a new algebra Au with carriers

Aus = As ∪ us.

The constants of Aus are those of As together with

us

interpreting unspecified s. The operations of Au are derived from those of A as follows: let

F : As(1 ) × · · · × As(n) → As

be an operation of A. Then define the new operation

F u : Aus(1 ) × · · · × Aus(n) → Aus

by

F u(x1 , . . . , xn) = F (x1 , . . . , xn) if x1 ∈ As(1 ), . . . , xn ∈ As(n);
                      us               otherwise.

The restriction of operations on Au that makes them return an unspecified value if any of the inputs is unspecified is sometimes called a strictness assumption.

We can extract A from Au by means of the Boolean-valued function

IsUnspecified s : Aus → B

that interprets is unspecified s, and which we define for any a ∈ Aus by

IsUnspecified s(a) = tt if a = us ;
                     ff otherwise.

algebra Au

carriers   . . . , Aus , . . .

constants  . . . , us : → Aus , . . .
           . . . , C u : → Aus , . . .

operations . . . , IsUnspecified s : Aus → B, . . .
           . . . , F u : Aus(1 ) × · · · × Aus(n) → Aus , . . .

definitions

. . . , IsUnspecified s(a) = tt if a = us ;
                             ff otherwise., . . .

. . . , F u(x1 , . . . , xn) = F (x1 , . . . , xn) if x1 ∈ As(1 ), . . . , xn ∈ As(n);
                               us               otherwise., . . .
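These definitions can be sketched in code. The following Python fragment (illustrative; `UNSPECIFIED` is our own sentinel for the new element us) lifts an operation F of A to the strict operation F u of Au, and defines the test interpreting is unspecified:

```python
UNSPECIFIED = object()  # the new element u_s, chosen outside every carrier A_s

def is_unspecified(a):
    """Interpretation of is_unspecified_s : s^u -> Bool."""
    return a is UNSPECIFIED

def lift_strict(f):
    """Turn F : A_s(1) x ... x A_s(n) -> A_s into
    F^u : A^u_s(1) x ... x A^u_s(n) -> A^u_s under the strictness
    assumption: any unspecified argument makes the result unspecified."""
    def f_u(*args):
        if any(a is UNSPECIFIED for a in args):
            return UNSPECIFIED
        return f(*args)
    return f_u

add_u = lift_strict(lambda x, y: x + y)
print(add_u(2, 3))                            # 5
print(is_unspecified(add_u(2, UNSPECIFIED)))  # True: strictness propagates u_s
```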


4.6 Generators and Constructors

The constants and operations of an algebra can be chosen for any number of reasons. They might be essential or convenient for handling data in a particular situation or application. Different situations will suggest different operations. Now, one job operations do is to compute, construct, or generate new data from given data. A simple question, relevant to any Σ-algebra A, is this:

Given a subset G of elements of A, if we apply repeatedly all the operations of A to the elements of G, what set of elements do we generate? Can we find a G from which we can generate all the elements of A?

In terms of modelling data types, these questions concern what data can be made from the constants and by applying the operations, and, hence, can be accessed by a program applying the constants and operations declared in the interface. Perhaps there are data in the implementation that cannot be obtained via the interface?

The complete answers to these questions involve a considerable number of mathematical ideas, including subalgebras and terms. We will take our time over their explanation (ending in Chapter 8!). However, the basic intuitions are easy to grasp. Let us study some examples, and formulate an important definition.

Example (Constructing Natural Numbers) Let Σ be the signature

signature

sorts nat

constants 0 :→ nat

operations succ : nat → nat

endsig

and consider the standard Σ -algebra

A = (N; 0 ; Succ)

where Succ(n) = n + 1. Suppose we apply all the operations of A repeatedly to all the constants of A. For such a simple algebra, it is easy to see that we generate:

0 , Succ(0 ), Succ(Succ(0 )), Succ(Succ(Succ(0 ))), . . .

which in the standard algebra A is

0 , 1 , 2 , 3 , . . . .

In this example, we conclude that

every element of A can be constructed from 0 by applying Succ.

Example (Natural Numbers Again) Now let us change the signature and the algebra slightly. Let Σ be the signature:


signature

sorts nat

constants 0 :→ nat

operations pred : nat → nat

endsig

and consider the standard Σ -algebra

A = (N; 0 ,Pred)

where Pred(0) = 0 and Pred(n) = n − 1 if n > 0.

Suppose we apply all the operations of A repeatedly to all the constants of A. This time,

we get

0 , Pred(0 ), Pred(Pred(0 )), Pred(Pred(Pred(0 ))), . . .

which in the standard algebra A is

0 , 0 , 0 , 0 , . . . .

In this example, we can construct very little using the constants and operations.

Suppose we apply the operations of A repeatedly to some subset G ⊆ N. Let G = {10}.

Then we get:

Constant: 0 , Pred(0 ), Pred(Pred(0 )), Pred(Pred(Pred(0 ))), . . .
G:        10 , Pred(10 ), Pred(Pred(10 )), Pred(Pred(Pred(10 ))), . . .

which in A is:

0 , 0 , 0 , 0 , . . .
10 , 9 , 8 , 7 , 6 , 5 , 4 , 3 , 2 , 1 , 0 , 0 , 0 , . . .

That is, we generate the set:

{0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10}.

More generally, if we apply the operations of A repeatedly to

G = {n}, we get {0 , 1 , 2 , . . . , n},

and to

G = {m, n}, we get {0 , 1 , 2 , . . . , max(m, n)}.

In general, we have:

Lemma Let G be a finite subset of N. Let max(G) be the largest number in G. Then applying the constant 0 and the predecessor operation of A to G constructs the elements in

{0 , 1 , 2 , . . . , max(G)}.
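The set generated from G can be computed mechanically: start from the constants together with G, and apply the operations until no new elements appear. A small Python sketch of this closure process (our own formulation, restricted to unary operations, which is all this example needs):

```python
def generate(constants, operations, G):
    """Close constants ∪ G under a list of unary operations."""
    generated = set(constants) | set(G)
    frontier = set(generated)
    while frontier:
        # apply every operation to every newly found element
        new = {op(x) for op in operations for x in frontier} - generated
        generated |= new
        frontier = new
    return generated

pred = lambda n: n - 1 if n > 0 else 0
print(sorted(generate({0}, [pred], {10})))  # [0, 1, ..., 10] = {0, ..., max(G)}
```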

Example (Constructing Rational Numbers) Let Σ be the signature


signature Rationals

sorts rat

constants 0 , 1 : → rat

operations + : rat × rat → rat
           − : rat → rat
           . : rat × rat → rat
           −1 : rat → rat

endsig

and consider the standard ΣRationals-algebra

A = (Q; 0 , 1 ; +,−, .,−1 )

where the operations are the standard arithmetic operations.

Suppose we apply all the operations of A repeatedly to all the constants of A. We will list the elements created in stages.

Addition

0 , 0 + 0 , 0 + 0 + 0 , . . . ,
1 , 1 + 1 , 1 + 1 + 1 , . . . .

(We will omit terms such as 0 + 1 , 0 + 0 + 1 + 1 , . . . .) We add to these the negative numbers:

Subtraction

−1 ,−(1 + 1 ),−(1 + 1 + 1 ), . . .

In particular, in A, we get all the integers from addition and subtraction:

. . . ,−3 ,−2 ,−1 , 0 , 1 , 2 , 3 , . . . .

Multiplication Applying multiplication at this stage does not produce new data, as the product of two integers is an integer.

Division However, we introduce some fractions when we apply division:

1/1 , 1/(1 + 1 ), 1/(1 + 1 + 1 ), . . .

1/(−1 ), 1/(−(1 + 1 )), 1/(−(1 + 1 + 1 )), . . .

In particular, in A, we have added

1 , 1/2 , 1/3 , . . . and −1 , −1/2 , −1/3 , . . . .

Now, applying multiplication to the integers and the fractions we have, we introduce more numbers such as:

(1 + 1 )/(1 + 1 + 1 ), (1 + 1 + 1 )/(1 + 1 ), −(1 + 1 )/(1 + 1 + 1 ), −(1 + 1 + 1 )/(1 + 1 ),


or, in general, any fraction

(1 + 1 + · · · + 1 )/(1 + 1 + · · · + 1 ), with p ones in the numerator and q ones in the denominator, or

(1 + 1 + · · · + 1 )/(−(1 + 1 + · · · + 1 )), again with p ones in the numerator and q ones in the denominator.

In particular, in A, we have added any fraction

p/q

for q ≠ 0 .

We conclude that

Lemma (Rational Number Constructors) Every element of the set Q of rational numbers can be constructed from 0 and 1 by applying the operations of addition, subtraction, multiplication and division.
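One can watch these stages happen by closing {0, 1} under the four operations for a few rounds. A Python sketch using exact fractions (our own harness; bounded to four rounds, since the full closure is the infinite set Q):

```python
from fractions import Fraction

def step(S):
    """One round: apply +, -, . and inverse to all elements of S."""
    new = set(S)
    for x in S:
        new.add(-x)
        if x != 0:
            new.add(1 / x)  # x^-1
        for y in S:
            new.add(x + y)
            new.add(x * y)
    return new

S = {Fraction(0), Fraction(1)}
for _ in range(4):
    S = step(S)

# After a few rounds, fractions such as 1/2 and 2/3 have appeared.
print(Fraction(1, 2) in S, Fraction(2, 3) in S)  # True True
```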

These examples illustrate the following technical idea.

Definition (Constructors) Let A be a Σ-algebra. The constants and operations of Σ are constructors for A if every element of A can be computed by the repeated application of the operations of A to the constants of A.

4.7 Subalgebras

To conclude this chapter, we consider some simple ideas about comparing and changing algebras.

The natural numbers are contained in the integers. The integers are contained in the rational numbers. The rational numbers are contained in the real numbers. We are used to expressing these ideas using sets and the subset relation, thus:

N ⊆ Z ⊆ Q ⊆ R.

Furthermore, we are used to the idea that the operations on these numbers are equivalent. Since R extends Q, Q extends Z, and Z extends N, we expect, for m, n ∈ N,

n +N m, n +Z m, n +Q m, and n +R m

to be the same number. In fact, this is not correct: there are tricky details concerning representations of these numbers to take care of before we can make these statements precise and correct. For example, depending on the precise definitions of these sets, we have to convert between number representations, say, with transformations

n ∈ N ↦ +n ∈ Z

z ∈ Z ↦ +z/1 ∈ Q

p/q ∈ Q ↦ am · · · a0 .b0 · · · bn · · · ∈ R


This process introduces conceptual and technical complications that we will sort out much later (when we write about homomorphisms in Chapter 7). However, intuitively, the idea seems clear.

When calculating with these numbers we use slightly different operations and, therefore, we use different signatures and algebras. For example, we might use

(N; 0 ; n + 1 , n + m, n.m)
(Z; 0 , 1 ; x + y , −x , x .y)
(Q; 0 , 1 ; x + y , −x , x .y , x−1 )
(R; 0 , 1 ; x + y , −x , x .y , x−1 ).

In this section, to express the idea that two algebras share data and operations, such as Q and R, we will introduce the idea of a

subalgebra.

In the next section, to express the idea that one Σ-algebra has more data sets and operations than another, we will introduce the concept of an

expansion

and, conversely, that an algebra has fewer data sets and operations than another, we will introduce the concept of a

reduct.

4.7.1 Examples of Subalgebras

Let us look at a simple example of a subalgebra to introduce the idea. Consider an algebra of integers made from addition and subtraction.

Let Σ be the signature:

signature Integers

sorts int

constants 0 : → int

operations + : int × int → int
           − : int → int

endsig

Let A be the standard Σ -algebra of the integers based on the set

Z = {. . . , −2 , −1 , 0 , 1 , 2 , . . .}

of numbers in decimal notation:

A = (Z; 0 Z; +Z, −Z).

Consider the subset

Even = {. . . , −4 , −2 , 0 , 2 , 4 , . . .}


of all even integers, i.e., integers divisible by 2 . If we apply the constant and operations

0 Z,+Z,−Z

of A to even integers, we get even integers.

First note that 0 is even. Let 2z1 and 2z2 be any even integers, where z1 , z2 ∈ Z. Then

2z1 +Z 2z2 = 2 (z1 +Z z2 )

is also even. Let 2z be any even integer where z ∈ Z. Then

−Z2z = 2 (−Zz )

is also even.

We say that the even integers Even are

closed under the operations

of A. Now, because the set Even is closed, we can make a Σ-algebra

B = (Even; 0 Z; +Z,−Z)

that is contained in the Σ -algebra

A = (Z; 0 Z; +Z,−Z).

This B we call a Σ-subalgebra of A.

Notice that the subset

Odd = {. . . , −3 , −1 , 1 , 3 , . . .}

of all odd numbers is not closed under the operations of A. For example,

1 +Z 3 = 4 .
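These closure conditions can be probed mechanically. The following Python sketch (our own test harness) represents a candidate subset by a membership predicate and checks the constant and closure conditions for (Z; 0; +, −) over a finite sample of elements, rather than proving them for all integers:

```python
def closed_under(subset, samples):
    """Check the subalgebra conditions on a sample: the constant 0 lies in
    the subset, and the subset is closed under + and - for sampled members."""
    if not subset(0):
        return False
    members = [z for z in samples if subset(z)]
    return all(subset(a + b) and subset(-a) for a in members for b in members)

even = lambda z: z % 2 == 0
odd = lambda z: z % 2 != 0
samples = range(-10, 11)
print(closed_under(even, samples))  # True: Even is closed
print(closed_under(odd, samples))   # False: 0 is not odd, and 1 + 3 = 4 fails too
```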

4.7.2 General Definition of Subalgebras

Let us formulate the idea of a subalgebra in general.

Definition (Subalgebra) Let A be an S -sorted Σ -algebra. An S -indexed family of subsets

B = 〈Bs ⊆ As | s ∈ S 〉

forms a Σ -subalgebra of A if

(i) the subsets Bs contain the constants of A named in Σ ; and

(ii) the subsets Bs are closed under the operations of A named in Σ, i.e., for each f ∈ Σ, applying the operation f A of A to elements of B produces elements of B :

b1 ∈ Bs(1 ), . . . , bn ∈ Bs(n) ⇒ f A(b1 , . . . , bn) ∈ Bs .


[Figure 4.1 shows elements b1 , b2 , . . . , bn of B being sent by an operation f to an element b of B , inside the larger algebra A.]

Figure 4.1: B is a subalgebra of A.

Given these conditions, we can make a Σ -algebra using B and the operations of A, as illustratedin Figure 4.1.

If B is a Σ-subalgebra of A then we may omit reference to Σ and simply say that B is a subalgebra of A, writing

B ≤ A.

If B is a subalgebra of A but B ≠ A then we say that B is a proper subalgebra of A (or A is a proper extension of B ), and write

B < A.

Example (Integers) We can easily generalise the example of the even integers. Let Σ and A be as in Section 4.7.1. Let n ∈ N and define, for n > 0 ,

nZ = {z ∈ Z | z is divisible by n}.

A typical element of nZ has the form

nz

for some z ∈ Z. We show that nZ is the carrier of a subalgebra using the definition in Section 4.7.2.

Claim For any n > 0 , B = (nZ; 0 Z; +Z,−Z) is a subalgebra of A = (Z; 0 Z; +Z,−Z).

Proof Clearly nZ ⊆ Z. We must check the two closure conditions for constants and operations.

(i) There is only one constant symbol 0 ∈ Σ. In B it is interpreted as the standard integer zero of A, which is divisible by n, and

0 Z ∈ nZ

so nZ is closed under the constants of A.

(ii) There are two operations +, − ∈ Σ. Addition + is interpreted in B as the standard integer addition of A, and

nz1 +Z nz2 = n(z1 +Z z2 ) ∈ nZ.

So nZ is closed under the addition of A. Subtraction − is interpreted in B as the standard integer subtraction of A, and

−Znz = n(−Zz ) ∈ nZ.

So nZ is closed under the subtraction of A.


□

4.8 Expansions and Reducts

When computing with an algebra A it may be necessary, or convenient, to add new sets of data and appropriate operations. For instance, we may need to add

(i) Booleans,

(ii) natural numbers, or

(iii) finite and infinite sequences of elements from A.

Adding sets and operations leads to the construction of some new algebra B that is an expansion of A.

There is an extensive range of constructions for extending signatures and algebras. We will formulate one simple and fundamental definition of an expansion or augmentation of a signature and an algebra.

Definition (Signature Expansion and Reduct) Let Σ be an S -sorted signature and Σ′ an S′-sorted signature. We say that Σ′ is an expansion of Σ, or that Σ is a reduct or subsignature of Σ′, if, and only if, Σ′ has all the sorts, constant names and operation names of Σ. More precisely,

1. Any sort in S is also in S′; i.e.,

S ⊆ S′.

2. Any constant symbol in Σ is also in Σ ′; i.e., for any sort s ∈ S , and any constant c :→ s ,

c ∈ Σ implies c ∈ Σ ′.

3. Any function symbol in Σ is also in Σ′; i.e., for any sorts s , s(1 ), . . . , s(n) ∈ S , and for any function symbol f : s(1 ) × · · · × s(n) → s ,

f ∈ Σ implies f ∈ Σ ′.
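Modelling a signature as a set of sort names together with typed constant and function symbols (our own representation, chosen for illustration), the expansion relation is just a containment check:

```python
def is_expansion(big, small):
    """Sigma' is an expansion of Sigma iff Sigma' contains all of Sigma's
    sorts, constant symbols and function symbols, with the same types."""
    return (small["sorts"] <= big["sorts"]
            and small["constants"].items() <= big["constants"].items()
            and small["operations"].items() <= big["operations"].items())

sigma = {"sorts": {"real"},
         "constants": {"zero": "real"},
         "operations": {"add": (("real", "real"), "real")}}
sigma_prime = {"sorts": {"real", "bool"},
               "constants": {"zero": "real", "tt": "bool"},
               "operations": {"add": (("real", "real"), "real"),
                              "not": (("bool",), "bool")}}
print(is_expansion(sigma_prime, sigma))  # True: sigma' expands sigma
```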

Definition (Algebra Expansion) Let Σ′ be an expansion of an S -sorted signature Σ. Let A be a Σ-algebra and B a Σ′-algebra; then B is said to be a Σ′-expansion of A, or A is a Σ-reduct of B , if, and only if, B contains all the carriers, constants and operations of A. More precisely,

1. for each sort s ∈ S ,

As = Bs ;

2. for each sort s ∈ S and each constant symbol c : → s ∈ Σ ,

cA = cB ;

and


3. for any sorts s , s(1 ), . . . , s(n) ∈ S , any function symbol f : s(1 )× · · · × s(n)→ s ∈ Σ ,

f A = f B .

We write

B |Σ

to denote the Σ-algebra A obtained from the Σ′-algebra B by removing the Σ′ operations not named in Σ, i.e., the Σ-reduct of the Σ′-algebra B. In certain circumstances, the operations in Σ′ − Σ are called hidden functions.

Example Suppose we have a signature Σ for real numbers:

signature Reals

sorts real

constants 0 , 1 : → real

operations + : real × real → real
           − : real → real
           . : real × real → real
           −1 : real → real

endsig

and consider the standard algebra of real numbers

A = (R; 0 , 1 ; +,−, .,−1 ).

Let Σ 0 be Σ with the division operator removed. Then the Σ 0 -reduct is the Σ 0 -algebra

A|Σ0

which is A with the division operator removed.

Example We have seen the example of a signature and algebra with the Booleans in Section 4.3.1.

Suppose we expand Σ to Σ′ by adding sorts, constants and operations for Booleans. Then Σ′ is

signature Reals with Booleans

sorts real , bool

constants  0 , 1 : → real
           tt , ff : → bool

operations + : real × real → real
           − : real → real
           . : real × real → real
           −1 : real → real
           not : bool → bool
           and : bool × bool → bool

endsig


and consider the standard algebra of real numbers and Booleans

B = (R,B; 0 , 1 , tt ,ff ; +,−, .,−1 ,¬,∧).

Then the Σ-reduct of the Σ′-algebra B is

B |Σ

which is, of course, A.
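The reduct operation itself is simple: keep only the named parts. A Python sketch (our own representation), modelling an algebra as a dictionary from constant and operation names to their interpretations:

```python
def reduct(algebra, signature_names):
    """Sigma-reduct: keep only the constants and operations named in Sigma."""
    return {name: op for name, op in algebra.items() if name in signature_names}

# B: reals with Booleans; A: its reduct to the arithmetic signature Sigma.
B = {"zero": 0.0, "one": 1.0,
     "add": lambda x, y: x + y, "neg": lambda x: -x,
     "mul": lambda x, y: x * y,
     "tt": True, "ff": False, "not": lambda b: not b}
sigma = {"zero", "one", "add", "neg", "mul"}
A = reduct(B, sigma)
print(sorted(A))  # only the arithmetic part remains; the Booleans are hidden
```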

4.9 Importing Algebras

Suppose we have a data type with

interface ΣOld and implementation AOld

and we wish to use it in creating a new data type with

interface ΣNew and implementation ANew .

Suppose, too, that the old data type will not be changed in any way; it will simply be used in the new data type. For example, if we want to add arrays to some existing data type of real numbers, then we must make a new data type containing both the real numbers and arrays of real numbers. Thus, in the type of construction we have in mind, the contents of ΣOld

are included in ΣNew , and the contents of AOld are included in ANew . Using the definition of signature and algebra reduct, we express this formally by the condition

ΣNew is an expansion of ΣOld and the reduct ANew |ΣOld = AOld .

The construction is called

importing

and is depicted in Figure 4.2.

We start by reflecting on the construction of some of the algebras we met in the previous

chapter, in order to formulate some general techniques for adding and removing data sets and operations from algebras. We extend the notation for displaying signatures with a new component

import

which allows us to describe concisely the addition of new data and operations to an existing signature. We will use import as a handy notation for specific tasks. However, it is deceptively simple, and the general idea of importing is quite complicated, as we will see later.


[Figure 4.2 shows the signature ΣNew importing the contents of the signature ΣOld and adding new sorts, constants and function names, while the algebra ANew imports the contents of the algebra AOld and adds new carriers, constants and operations.]

Figure 4.2: An impression of the idea of importing.

4.9.1 Importing the Booleans

Most of our data types contain tests that are needed in computations. Therefore, most of our many-sorted algebras contain the Booleans, their basic operations (e.g., Not , And), and possibly other Boolean-valued operations (e.g., equality, conditional). They are algebras with Booleans, as defined in Section 4.5.1. Two examples are the two-sorted algebras of Peano Arithmetic with Booleans (in Section 3.3) and real numbers with Booleans (in Section 3.5.2).

Now, we may think of constructing such a data type by adding new sets and operations to an existing data type of Booleans. This leads to the idea of

importing the Booleans

into the new algebra, or of the new algebra

inheriting the Booleans.

Recalling the two stages of any algebraic construction, we describe this process as follows.

Old Signature/Interface First, let ΣBooleans be a signature for the Booleans

signature Booleans

sorts bool

constants true, false : → bool

operations not : bool → bool
           and : bool × bool → bool


New Signature/Interface Suppose we are constructing a new signature ΣNew from the signature ΣBooleans so that we may define some new operations and tests using the Booleans, such as conditionals and equality. The new signature ΣNew may be defined concisely by

signature New

import Booleans

sorts . . . , snew , . . .

constants . . . , c : → s , . . .

operations . . . , eq s : s × s → bool , . . .
           . . . , if s : bool × s × s → s , . . .
           . . . , f : s(1 ) × · · · × s(n) → snew , . . .

where the sorts

• the sorts . . . , snew , . . . used are new, i.e., bool is not in the list; and

• the sorts . . . , s , . . . , s(1 ), . . . , s(n), . . . used in the declarations of the new constants and functions may be from the new sorts declared, or the sort bool of the Booleans.

This import notation is interpreted by:

(i) substituting all the components of the signature named in import into the sorts, constants and operations declared after import in ΣNew ; and

(ii) allowing any sort to be included in the type of any new operation.

The new signature defined by the notation above is simply:

signature New 0

sorts      bool ,
           . . . , snew , . . .

constants  true, false : → bool
           . . . , c : → s , . . .

operations not : bool → bool
           and : bool × bool → bool
           . . . , eq s : s × s → bool , . . .
           . . . , if s : bool × s × s → s , . . .
           . . . , f : s(1 ) × · · · × s(n) → s , . . .

where we have used New 0 to indicate the removal of import. Clearly, ΣNew0 is an expansion of ΣBooleans in the precise sense defined in Section 4.8.


Old Algebra/Implementation Let B be the algebra based on the set B = {tt , ff } of Booleans in Section 3.2:

algebra Booleans

carriers B

constants tt ,ff : → B

operations Not : B→ B

And : B× B→ B

New Algebra/Implementation The new ΣNew -algebra ANew has the form:

algebra New

import B

carriers . . . ,As , . . .

constants . . . ,C : → As , . . .

operations . . . , Eq s : As × As → B, . . .
           . . . , If s : B × As × As → As , . . .
           . . . , F : As(1 ) × · · · × As(n) → As , . . .

This notation is interpreted by substituting the components of the standard model B of ΣBooleans into the relevant carriers, constants and operations. So, the new ΣNew -algebra is simply:

algebra New 0

carriers B

. . . ,As , . . .

constants tt ,ff : → B

. . . ,C : → As , . . .

operations Not : B→ B

And : B× B→ B

. . . ,Eq s : As × As → B, . . .

. . . , If s : B× As × As → As , . . .

. . . ,F : As(1 ) × · · · × As(n) → As , . . .

Clearly, the algebra ABooleans is a reduct of ANew0, i.e.,

ANew0|ΣBooleans = ABooleans .


4.9.2 Importing a Data Type in General

It is not difficult to see how the construction of importing the Booleans can be adapted to import other basic data types, such as the natural numbers or the real numbers, and indeed how it can also be generalised to import any data type. To import any data type, the construction is in two stages: on signatures and on algebras.

Old Signature/Interface Suppose we wish to construct the signature ΣNew by importing the signature ΣOld :

signature Old

sorts . . . , sold , . . .

constants . . . , cold : → sold , . . .

operations . . . , fold : s(1 )old × · · · × s(n)old → sold , . . .

New Signature/Interface The notation for signatures with imports that we use has the general form:

signature New

import Old

sorts . . . , snew , . . .

constants . . . , cnew : → s , . . .

operations . . . , fnew : s(1 )× · · · × s(n)→ s , . . .

Now the idea is that ΣNew contains all the sorts, constants and operations of ΣOld , together with some new sorts . . . , snew , . . ., and constants and operations involving, possibly, both old and new sorts from . . . , sold , . . . and . . . , snew , . . .. It is very convenient to make the following

Assumption The sort names are actually new, i.e.,

{. . . , sold , . . .} ∩ {. . . , snew , . . .} = ∅.

The declarations of the new constants and functions can use either old sorts from ΣOld or new sorts from ΣNew :

s , s(1 ), . . . , s(n), . . . ∈ {. . . , sold , . . .} ∪ {. . . , snew , . . .}.

Flattening

What exactly is this new signature with the import construct? The line

import Old


means that the signature above is an abbreviation; it abbreviates the signature formed by substituting the sorts, constants and operations of ΣOld as follows:

signature New 0

sorts      . . . , sold , . . .
           . . . , snew , . . .

constants  . . . , cold : → sold , . . .
           . . . , cnew : → s , . . .

operations . . . , fold : s(1 )old × · · · × s(n)old → sold , . . .
           . . . , fnew : s(1 ) × · · · × s(n) → s , . . .

Clearly, ΣNew0 is an expansion of ΣOld .

The removal of import by means of substitution is called flattening.

Old Algebra/Implementation To complete the construction, we must define an algebra ANew of signature ΣNew from an algebra AOld of signature ΣOld .

Let us suppose that we interpret the old signature ΣOld with an algebra AOld :

algebra Old

carriers . . . ,Aolds , . . .

constants  . . . , Cold : → Aolds , . . .

operations . . . , Fold : Aolds(1 ) × · · · × Aolds(n) → Aolds , . . .

New Algebra/Implementation Let ANew be constructed from AOld by:

algebra New

import AOld

carriers . . . ,Anews , . . .

constants . . . ,Cnew : → As , . . .

operations . . . ,Fnew : As(1 ) × · · · × As(n) → As , . . .

Flattening

Again, this notation means that the contents of AOld are to be substituted. This then gives us the algebra:


algebra New 0

carriers   . . . , Aolds , . . .
           . . . , Anews , . . .

constants  . . . , Cold : → Aolds , . . .
           . . . , Cnew : → As , . . .

operations . . . , Fold : Aolds(1 ) × · · · × Aolds(n) → Aolds , . . .
           . . . , Fnew : As(1 ) × · · · × As(n) → As , . . .

Clearly, on substituting for the import construct, we get the reduct

ANew0|ΣOld = AOld .

Thus ΣOld will be imported into ΣNew , or ΣNew will inherit ΣOld ; and, similarly, AOld will be imported into ANew , or ANew will inherit AOld .
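In this representation, flattening is just the union of the old and new components. A Python sketch (our own dictionary-based representation), assuming, as in the text, that the new sort names are disjoint from the old ones:

```python
def flatten(old, new):
    """Substitute the imported signature's components into the new one.
    Assumes the new sort names are disjoint from the old (the Assumption)."""
    assert not (old["sorts"] & new["sorts"]), "sort names must be new"
    return {"sorts": old["sorts"] | new["sorts"],
            "constants": {**old["constants"], **new["constants"]},
            "operations": {**old["operations"], **new["operations"]}}

booleans = {"sorts": {"bool"},
            "constants": {"true": "bool", "false": "bool"},
            "operations": {"not": (("bool",), "bool")}}
naturals = {"sorts": {"nat"},
            "constants": {"zero": "nat"},
            "operations": {"succ": (("nat",), "nat")}}
flat = flatten(booleans, naturals)
print(sorted(flat["sorts"]))  # ['bool', 'nat']
```

Restricting the flattened signature back to the names of `booleans` recovers it, matching the reduct condition ΣNew0 |ΣOld = ΣOld.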

4.9.3 Example

Consider the following signature for computing with real numbers:

signature Reals with Integer Rounding

import Booleans , Integers ,Reals

sorts

constants

operations less : real × real → bool
           include : int → real
           round up : real → int

Given the specific signatures

ΣBooleans , Σ Integers and ΣReals

this defines a new signature

ΣReals with Integer Rounding

by combining all the sorts, constants and operations, and adding the new ones declared.

Flattening the notation for ΣReals with Integer Rounding , we get the following expansion

ΣReals with Integer Rounding0

of ΣBooleans , Σ Integers and ΣReals :


signature Reals with Integer Rounding0

sorts      bool , int , real

constants  true, false : → bool
           zero, one : → int
           zeroreal , onereal , pi , e : → real

operations not : bool → bool
           and : bool × bool → bool
           add : int × int → int
           minus : int → int
           times : int × int → int
           addreal : real × real → real
           minusreal : real → real
           timesreal : real × real → real
           invert : real → real
           exp : real × real → real
           log : real × real → real
           sqrt : real → real
           abs : real → real
           sin : real → real
           cos : real → real
           tan : real → real
           include : int → real
           round up : real → int

Similarly, given algebras

ABooleans , AIntegers and AReals

interpreting the three imported signatures, the new ΣReals with Integer Rounding -algebra

AReals with Integer Rounding0

is defined by combining all the carriers, constants and operations, and adding the order relation, the sort conversion of integer to real, and the ceiling function.

This construction has a simple architecture as shown in Figure 4.3:

[Figure 4.3 shows the three algebras ABooleans , AIntegers and AReals being combined to form AReals with Integer Rounding .]

Figure 4.3: Architecture of the algebra AReals with Integer Rounding .


Exercises for Chapter 4

1. List all the product types and operation types over the two sorts in S = {nat , Bool}. Why do we assume

nat × Bool and Bool × nat

are different product types? Devise formulae that count the number of two-sorted product and operation types of arity n.

2. Devise formulae that count the number of m-sorted product types and operation types of arity n.

3. Using the provisional definition of an algebra from Section 3.1, are the algebras

(N; 0 ; n + 1 ) and (N; 0 ; n + 1 , n + 1 )

the same algebras or not? Recast the definitions of these algebras using the formal definition of an algebra in Section 4.1. How do signatures affect the Counting Lemma in Section 3.2.1?

4. Let Σ be a single-sorted signature with sort s and sets Σ s^k ,s of k -ary operation symbols, where k = 0 , 1 , 2 , . . . , K and K is the maximum arity. Show that the number of Σ-algebras definable on a carrier set X with cardinality |X | = n is

Π_{k=0}^{K} n^{n^k · |Σ s^k ,s |}.

The signature ΣRing of a ring contains two constants 0 and 1 , one unary operation −, and two binary operations + and · . How many ΣRing -algebras are definable on a carrier set X with n elements? Estimate how many ΣRing -algebras satisfy the axioms for a commutative ring.
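For intuition on the counting formula in Exercise 4, one can enumerate all algebras by brute force for a tiny carrier. A Python sketch (our own check) comparing the formula Π_k n^{n^k · |Σ_k|} with direct enumeration when n = 2 and Σ has one constant and one unary operation symbol:

```python
from itertools import product

n = 2                                   # carrier X = {0, 1}
X = range(n)

# Sigma: one constant symbol (k = 0) and one unary operation symbol (k = 1)
constants = list(X)                      # n^(n^0) = n interpretations
unaries = list(product(X, repeat=n))     # n^(n^1) interpretations, as value tables

count = len(constants) * len(unaries)    # every pair of choices gives an algebra
formula = n ** (n ** 0 * 1) * n ** (n ** 1 * 1)
print(count, formula)  # 8 8
```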

5. Let Σ be a two-sorted signature with sorts s(1 ) and s(2 ). Construct a formula for the number of Σ-algebras definable on sets X1 and X2 of cardinality |X1 | = n1 and |X2 | = n2 . Hence, give a formula for the special case s(1 ) = s , s(2 ) = Bool , |X1 | = n and X2 = {tt , ff }.

6. Write down signatures to model interfaces for the following algebras of bits:

(a) bits;

(b) bytes; and

(c) n-bit words.

7. Write down signatures to model interfaces for the following algebras of numbers:

(a) rational numbers; and

(b) complex numbers.

8. Write down a signature to model the interface for the data of Babbage's 1835 design for the Analytical Engine.


9. Write down signatures to model interfaces for the following data structures:

(a) the array;

(b) the list; and

(c) the stack.

10. A data type interface is modelled by a signature. Model an idea of "linking" two different interfaces by formulating conditions on mappings between signatures.

11. Use algebras to model the design of some of the data types in:

(i) a pocket calculator; and

(ii) a programming system of your choice.

12. Let A, B and C be any S -sorted Σ-algebras. Prove that if B ≤ A, C ≤ A and B ⊆ C then B ≤ C .

13. We expand the algebra of Section 4.7.2 for integer addition with multiplication. Let Σ be a signature for the algebra

A = (Z; 0 Z; +Z, −Z, .Z).

Show that for any n ≥ 1 ,

B = (nZ; 0 Z; +Z, −Z, .Z)

is a Σ -subalgebra of A.

14. Show that the operations of A = (Z; 0 Z; +Z, −Z) are not constructors. What must be added to A to equip it with a set of constructors?

15. Show that the algebra

(N; 0 ; n + 1 )

has no proper subalgebras. What are the subalgebras of

has no proper subalgebras. What are the subalgebras of

(N; 0 ; n.− 1 )?

16. Let A = (R; 0 ; +, −) be the algebra of real number addition and subtraction. What are the carriers of the following subalgebras:

(a) 〈0 〉;
(b) 〈1 〉;
(c) 〈1 , 2 〉;
(d) 〈√2 〉; and
(e) 〈π〉?

17. Let A = (R; 0 , 1 ; +, −, ., −1 ) be the algebra of real number addition, subtraction, multiplication and division. What are the carriers of the following subalgebras:


(a) 〈0 , 1 〉;
(b) 〈2 〉;
(c) 〈1/2 〉;
(d) 〈√2 〉; and
(e) 〈π〉?

18. Let A = ({a, b, aa, ab, ba, bb, . . .}; ε; ·) be the algebra of all strings over {a, b} with concatenation. Which of the following sets form subalgebras of A:

(a) {an | n ≥ 0};
(b) {bn | n ≥ 0};
(c) {(ab)n | n ≥ 0};
(d) {a2n | n ≥ 0};
(e) {b3n | n ≥ 0}?

19. Let

GL(2 , R) = { (a b; c d) | ad − bc ≠ 0 }

be the set of non-singular 2 × 2 matrices with real number coefficients, writing (a b; c d) for the matrix with rows a, b and c, d. This set is closed under matrix multiplication and matrix inversion, so forms an algebra

A = (GL(2 ,R); 1 ; .,−1 )

where

1 = (1 0; 0 1)

is the identity matrix. Which of the following sets form subalgebras:

(a) { (1 b; 0 1) | b ∈ Z };

(b) { (a b; 0 d) | ad ≠ 0 }?

20. When working with a particular class K of algebras, it is often important that

if A ∈ K and B is a subalgebra of A, then B ∈ K .

Definition A class K of Σ-algebras is said to be closed under the formation of subalgebras if, and only if, whenever A ∈ K and B ≤ A then B ∈ K .

(a) Are the classes of semigroups, groups, rings and fields closed under the formation of subalgebras?

(b) Is any class of algebras defined by equations closed under the formation of subalgebras?


(c) Is the class of all finite structures of any signature Σ closed under the formation of subalgebras?

21. Using the import notation, re-define the following signatures and algebras:

(a) the standard model of Peano arithmetic with the Booleans (see Section 3.3.1);

(b) the real numbers with the Booleans (see Section 3.5); and

(c) strings with length (see Section 3.7).

22. In the general account of import, how restrictive is the assumption that the sorts, constants and operation symbols must be new? Give an interpretation of import without the assumption, illustrating your answer with examples.


Chapter 5

Specifications and Axioms

We are developing the idea of a data type in several stages. In this chapter we reach the third stage. We will add a new component to the concept of both data type and algebra, namely, the programming idea of a

specification

and the corresponding mathematical idea of an

axiomatic theory

to model it. In the context of data types, the term axiomatic theory is renamed

axiomatic specification

In the second stage of developing our idea of a data type, we revised it by introducing these two aspects,

Data Type = Interface + Implementation

Whilst the names of the data and the operations can be fixed by declaring an interface for a data type, there will always be considerable variation in the details of how the data and operations are implemented.

In the third stage, we reflect on this variation of implementations. We need precise criteria for data type implementations to be either acceptable or unacceptable. We answer the following question:

Specification Problem

How can we specify the properties of a data type for its users?

The user communicates with the data type via its interface, which consists of operations. One solution is to list some of the algebraic properties of the operations in the interface that any acceptable implementation of the data type must possess. The algebraic criteria for acceptable data type implementations form a specification, and we propose that for a data type:

Specification = Interface + Properties of Operations.


For example, given names for operations on integers, we can require that any implementation must satisfy the basic laws of arithmetic, like associativity and commutativity, and perhaps some conditions on overflow.

In the mathematical model, the signature fixes a notation, and the interpretation models a choice for the data representation and the input-output behaviour of the algorithms implementing the operations of the data type. To analyse mathematically the diversity of representations and implementations we will postulate a list of algebraic properties of the constants and operations in the signature. We model data type specifications using the mathematical idea of an axiomatic theory, which has the form,

Axiomatic Theory = Signature + Axioms.

An algebra satisfies a theory if the axioms are true of its operations named in the signature.

The specification of data types is a deep subject with huge scope. In this book we are merely pointing out its existence; for some information and guidance, see the Further Reading.

In Section 5.1, we take up the idea that interfaces have many useful implementations that form classes of algebras of common signatures defined by axiomatic specifications. We reflect on how an axiomatic specification provides a foundation for

reasoning about classes of implementations.

We examine the fact that some properties can be proved true of all implementations of a specification, whilst others cannot, and must be added to a specification, if desired. In Section 5.2, we begin with a simple example of a class of implementations of the integers.

For the rest of the chapter, in Sections 5.3, 5.4, 5.5 and 5.7, we meet some axiomatic specifications of data types and examine some of their uses. To begin with, we are interested in specifying and reasoning with the data types of the integer, rational and real numbers. These systems have been studied in great depth by mathematicians over the centuries. Indeed, it is through the study of number systems that ideas and methods have emerged that can be applied to any data. Specifically, we look at the mathematical ideas of the

commutative ring, field and group

from the point of view of data type specifications, and at equation-solving in these kinds of algebras.

5.1 Classes of Algebras

It is usual in using or designing a data type that we end up with not one algebra, but a whole class of algebras that satisfy a range of design criteria. The class of algebras has a common interface with computations, namely its signature. It ought to have some standard properties that all its algebras possess. The signature plays a fundamental role in

(i) making precise the concept of an algebra as a model of an implementation;

(ii) defining classes of algebras with common properties;

(iii) proving or disproving further properties of operations of algebras; and

(iv) comparing algebras.


5.1.1 One Signature, Many Algebras

The usual situation when modelling data is that a signature Σ is proposed and several Σ-algebras A, B, C, . . . are constructed, as depicted in Figure 5.1. The signature models an interface and the different algebras model some different implementations for the interface. Commonly, given a signature Σ, there are infinitely many Σ-algebras of interest.


Figure 5.1: One interface, many implementations. One signature, many algebras.

For example, the signature ΣSubsets serves as an interface to an algebra of subsets of any chosen set; and the signature ΣBasic Strings with Length serves as an interface to an algebra of strings over any chosen alphabet.

Definition (All Σ -algebras) Let Σ be any signature. Let

Alg(Σ )

be the class of all algebras with signature Σ .

Thus, any Σ-algebra A is in the class Alg(Σ). Very rarely are we interested in all Σ-algebras. A signature Σ is designed for a purpose and Alg(Σ) will contain many algebras that are irrelevant to that purpose. Usually, given a signature Σ, we are interested in finding a relatively small subclass

K ⊆ Alg(Σ )

of algebras that model meaningful representations or implementations of Σ. To isolate and explore a subclass K we must find and postulate relevant properties of Σ-algebras that we require to be true of all the algebras in K.

Definition (Axiomatic Specification) Let Σ be any signature. Let T be any set of properties of Σ-algebras. The pair (Σ, T) is called an axiomatic theory or axiomatic specification. Let

Alg(Σ ,T )

be the class of all Σ -algebras satisfying all the properties in T .

In the case that a class K of Σ -algebras satisfies the properties of T , we have

K ⊆ Alg(Σ ,T )

and this is depicted in Figure 5.2.

The axiomatic theory or specification (Σ, T) performs three tasks when modelling data.

Restriction The properties in T limit and narrow the range of implementation models, since algebras failing to satisfy any property in T are discarded.



Figure 5.2: A subclass K of the Σ -algebras satisfying properties in T .

Standardisation The theory T establishes some basic properties demanded of all implementations, and it equips Σ with properties that are known to be independent of all implementations.

Analysis and Verification The theory T is a basis from which to prove or disprove further properties of the operations in Σ and the implementations.

5.1.2 Reasoning and Verification

Let us consider the last task of verification.

Let (Σ, T) be an axiomatic specification. Consider some property P based on the operations in Σ. For example, P might be an equation

t(x1 , . . . , xn) = t ′(x1 , . . . , xn)

where t(x1, . . . , xn) and t′(x1, . . . , xn) are terms formed by composing the operations in Σ and applying them to variables x1, . . . , xn. Or P might be the existence of a solution to an equation

(∃x1 , . . . , xn)[t(x1 , . . . , xn) = t ′(x1 , . . . , xn)].

Or P might be the correctness

{p} S {q}

of a program S with respect to an input condition p and an output condition q.

Suppose the property P can be proved using

(i) the axioms in T ; and

(ii) general principles of logical reasoning.

Then we expect that the property P will be true of every Σ-algebra that satisfies all the axioms in T. This expectation we will express as an informal principle of reasoning:

Soundness of Deduction Principle If a property P is provable from the specification (Σ, T) using logical reasoning, then P is true of all Σ-algebras A satisfying the axioms in T, i.e., all A ∈ Alg(Σ, T).

From the Soundness Principle, we deduce the following useful principle:


Counter-Example Principle If a property P is not true of some A ∈ Alg(Σ, T), then P cannot be proved from the specification (Σ, T).

To understand these ideas about proofs, we need to look at some examples, and this we shall do shortly. Indeed, to understand them fully we need to analyse mathematically the concepts of

(a) property; and

(b) general principles of logical reasoning.

This is the subject matter of Mathematical Logic. Typically, the analysis involves the creation and study of the syntax and semantics for logics, which consist of

(i) a formal logical language for expressing properties as formulae; and

(ii) axioms and proof rules for making formal deductions using the formulae.

Soundness is the condition that if a formula is deduced using the proof rules then it is indeed valid.

The most important logics are propositional logic and first-order logic.

We will explore the idea of classes further, starting with the integers.

5.2 Classes of Algebras Modelling Implementations of the Integers

Consider the data type of the integers with a simple set of operations. An interface for the integer data type is modelled by the following signature ΣIntegers:

signature Integers

sorts int

constants zero : → int

operations add : int × int → int
           minus : int → int
           times : int × int → int

endsig

The data type of integers allows many implementations. If each implementation is an algebra with signature ΣIntegers then the signature can be interpreted by many algebras modelling these implementations. To explore the data type of integers we must explore a class of algebras, each algebra having the above signature ΣIntegers, i.e., a class

K ⊆ Alg(Σ Integers).

In Section 5.3, we will give a set of axioms that defines such a class K .
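The "one signature, many algebras" picture can be mimicked directly in a programming language: an interface fixes the names and arities of the operations, and each implementation is one algebra. A minimal sketch (the class names are ours, not the book's), giving the standard algebra Z and the cyclic arithmetic of Example 2 below as two interpretations of the same interface:

```python
from abc import ABC, abstractmethod

class IntegersSig(ABC):
    """Plays the role of the signature Sigma_Integers: names and arities only."""
    @abstractmethod
    def zero(self): ...
    @abstractmethod
    def add(self, x, y): ...
    @abstractmethod
    def minus(self, x): ...
    @abstractmethod
    def times(self, x, y): ...

class StandardIntegers(IntegersSig):
    """The standard algebra Z = (Z; 0; +, -, .)."""
    def zero(self): return 0
    def add(self, x, y): return x + y
    def minus(self, x): return -x
    def times(self, x, y): return x * y

class CyclicIntegers(IntegersSig):
    """Cyclic arithmetic Z_n = ({0, ..., n-1}; 0; +_n, -_n, ._n)."""
    def __init__(self, n): self.n = n
    def zero(self): return 0
    def add(self, x, y): return (x + y) % self.n
    def minus(self, x): return (-x) % self.n
    def times(self, x, y): return (x * y) % self.n

Z, Z5 = StandardIntegers(), CyclicIntegers(5)
assert Z.add(3, 4) == 7
assert Z5.add(3, 4) == 2   # the same interface, a different algebra
```

Each class is one point in Alg(ΣIntegers); the abstract base class is the shared interface.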


Example 1: Standard Model of the Integers

An obvious ΣIntegers-algebra is the usual algebra

A = (Z; 0 ; x + y ,−x , x .y)

built from the infinite set

Z = {. . . , −2, −1, 0, 1, 2, . . .}

of integers and equipped with the standard operations of +, −, and . on the integers. Indeed, let us note that this algebra of integers is the arithmetic of all users. There are others. The set of integers is infinite and must be approximated in machine computations by a finite subset.

Consider the finite sets

{0, 1, 2, . . . , n − 1}

and

{−M, . . . , −1, 0, 1, . . . , +M}

consisting of initial segments of the integers. What operations for arithmetic can be defined on them? We want to build useful operations that can interpret those of the signature ΣIntegers. There are some interesting decisions to be made in the designs of algebras modelling implementations of the integers; see Figure 5.3. We will explore these examples here and in the


Figure 5.3: Classes of integer implementations.

exercises.

Again, the precise details of the representations of the numbers in these algebras have not been explained. The data may be written in binary, decimal or any other number base b and the functions defined accordingly. This observation contributes infinitely many more algebras to the class of possible semantics for the integers (one for each base b).

Example 2: Cyclic Arithmetic

In cyclic arithmetic, we take the initial segment

Zn = {0, 1, 2, . . . , n − 1}

and arrange that the successor of the maximum element n − 1 is the minimum 0. Thus, counting is circular, as shown in Figure 5.4.

Some functions on the integers Z are adapted to Zn by applying the modulus function

Modn : Z→ Zn



Figure 5.4: Cyclic arithmetic.

defined by

Modn(x) = (x mod n) = the remainder on dividing x by n.

Consider the ΣIntegers-algebra of integers whose operations are modulo n arithmetic,

Zn = ({0, 1, 2, . . . , n − 1}; 0; +n, −n, .n).

The operations +n, −n and .n are derived from +, − and . on Z as follows: for x, y ∈ {0, 1, . . . , n − 1},

x +n y = (x + y) mod n
−n x = (n − x) mod n
x .n y = (x.y) mod n.

This choice leads to an infinite family of algebras, one for each choice of n.

For example, take n = 5. The operations have the following tables:

+5 | 0 1 2 3 4      −5 |        .5 | 0 1 2 3 4
---+-----------     ---+---     ---+-----------
 0 | 0 1 2 3 4       0 | 0       0 | 0 0 0 0 0
 1 | 1 2 3 4 0       1 | 4       1 | 0 1 2 3 4
 2 | 2 3 4 0 1       2 | 3       2 | 0 2 4 1 3
 3 | 3 4 0 1 2       3 | 2       3 | 0 3 1 4 2
 4 | 4 0 1 2 3       4 | 1       4 | 0 4 3 2 1
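Tables like these are easy to regenerate mechanically. A quick sanity check of a few entries for n = 5 (the helper is ours, for illustration):

```python
def mod_ops(n):
    """Return the three Z_n operations +_n, -_n and ._n as functions."""
    add = lambda x, y: (x + y) % n
    neg = lambda x: (n - x) % n
    mul = lambda x, y: (x * y) % n
    return add, neg, mul

add5, neg5, mul5 = mod_ops(5)

# Spot-check entries against the n = 5 tables above:
assert add5(3, 4) == 2
assert neg5(1) == 4 and neg5(0) == 0
assert mul5(2, 3) == 1
assert [mul5(3, y) for y in range(5)] == [0, 3, 1, 4, 2]   # row 3 of .5
```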

The standard infinite integers Z and the modular arithmetics Zn both have mathematical elegance.

Example 3: Algebras that are not the integers

There are algebras A ∈ Alg(ΣIntegers) that do not model the integers.

Suppose we interpret the signature ΣIntegers in a trivial way as follows. Let A be any non-empty set, and choose some a ∈ A. We define the constants and operations in ΣIntegers as follows: for x, y ∈ A,

zeroA = a
addA(x, y) = a
minusA(x) = a
timesA(x, y) = a


The resulting algebra A ∈ Alg(ΣIntegers) does not qualify as a useful implementation of the integers, of course.

Other obvious examples are algebras of rational numbers

Q = (Q; 0 ; +,−, .) ∈ Alg(Σ Integers)

and real numbers

R = (R; 0; +, −, .) ∈ Alg(ΣIntegers).

Although these algebras are not models of the integers, they have very similar properties.

Next, we will consider specifications to remove unwanted models and implementations from

Alg(Σ Integers).

5.3 Axiomatic Specification of Commutative Rings

The standard algebra Z of integers has many convenient and useful algebraic properties, shared by the rational and real numbers. In particular, these number systems satisfy some simple equations which together form the set of laws or axiomatic specification for the class of algebras called the

commutative rings with identity .

5.3.1 The Specification

Here is a new, abstract signature to capture the basic operations of interest.

signature CRing

sorts ring

constants 0 , 1 : → ring

operations + : ring × ring → ring
           − : ring → ring
           . : ring × ring → ring

endsig

Let us postulate that these operations satisfy the following laws or axioms.


axioms CRing

Associativity of addition (∀x )(∀y)(∀z )[(x + y) + z = x + (y + z )] (1 )

Commutativity for addition (∀x )(∀y)[x + y = y + x ] (2 )

Identity for addition (∀x )[x + 0 = x ] (3 )

Inverse for addition (∀x )[x + (−x ) = 0 ] (4 )

Associativity for multiplication (∀x )(∀y)(∀z )[(x .y).z = x .(y .z )] (5 )

Commutativity for multiplication (∀x )(∀y)[x .y = y .x ] (6 )

Identity for multiplication (∀x )[x .1 = x ] (7 )

Distribution (∀x )(∀y)(∀z )[x .(y + z ) = x .y + x .z ] (8 )

Let TCRing denote the set of eight axioms for commutative rings. The pair

(ΣCRing ,TCRing)

is an axiomatic specification of the integers.

Let

Alg(ΣCRing ,TCRing)

denote the class of all ΣCRing -algebras that satisfy all the axioms in TCRing .

Definition A ΣCRing-algebra A is defined to be a commutative ring if, and only if, it satisfies the axioms above, i.e., A ∈ Alg(ΣCRing, TCRing).

Clearly the axioms are abstracted from the familiar properties of integer, rational and real arithmetic. Hence, the standard ΣCRing-algebras

Z,Q and R ∈ Alg(ΣCRing ,TCRing).

More surprisingly, and rather usefully:

Theorem For any n ≥ 2 , the cyclic arithmetic

Zn ∈ Alg(ΣCRing ,TCRing).
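Since each Zn is finite, the theorem can be confirmed for any fixed n by brute force: simply evaluate all eight axioms over every choice of elements. A sketch of such a checker (our own code, not the book's):

```python
from itertools import product

def is_commutative_ring(elems, zero, one, add, neg, mul):
    """Exhaustively check the eight CRing axioms over a finite carrier."""
    elems = list(elems)
    for x, y, z in product(elems, repeat=3):
        if add(add(x, y), z) != add(x, add(y, z)): return False   # (1)
        if mul(mul(x, y), z) != mul(x, mul(y, z)): return False   # (5)
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)): return False  # (8)
    for x, y in product(elems, repeat=2):
        if add(x, y) != add(y, x): return False   # (2)
        if mul(x, y) != mul(y, x): return False   # (6)
    for x in elems:
        if add(x, zero) != x: return False        # (3)
        if add(x, neg(x)) != zero: return False   # (4)
        if mul(x, one) != x: return False         # (7)
    return True

# Z_n with modulo-n operations satisfies all eight axioms:
for n in range(2, 8):
    assert is_commutative_ring(range(n), 0, 1 % n,
                               lambda x, y: (x + y) % n,
                               lambda x: (n - x) % n,
                               lambda x, y: (x * y) % n)
```

Of course, this only verifies finitely many instances; the theorem itself needs a proof that works for every n.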

5.3.2 Deducing Further Laws

The axioms for a commutative ring with unit are a foundation on which to build general methods for calculating and reasoning with number systems independently of their representations. To give a flavour of abstract calculation and reasoning, we will derive some further laws and techniques for equation solving.

We will prove some simple laws and identities from the axioms to demonstrate how algebraiclaws can be deduced.

Lemma Let A be a commutative ring with unit. For any x, y, z ∈ A, the following conditions hold:


(i) (x + y).z = x .z + y .z ;

(ii) 0 .x = x .0 = 0 ;

(iii) x .(−y) = (−x ).y = −(x .y); and

(iv) (−x ).(−y) = x .y.

Proof The deductions are as follows.

(i) (x + y).z = z.(x + y)      by Axiom 6;
             = z.x + z.y       by Axiom 8;
             = x.z + y.z       by Axiom 6.

(ii) 0 + 0 = 0                                     by Axiom 3;
     (0 + 0).x = 0.x                               multiplying both sides by x;
     0.x + 0.x = 0.x                               by part (i) of this Lemma;
     (0.x + 0.x) + (−(0.x)) = 0.x + (−(0.x))       adding −(0.x) to both sides;
     0.x + (0.x + (−(0.x))) = 0                    by Axioms 1 and 4;
     0.x + 0 = 0                                   by Axiom 4;
     0.x = 0                                       by Axiom 3.

We leave cases (iii) and (iv) to the exercises. □

To begin with, such calculations and derivations are long and slow. To proceed, we need to simplify notation so that arguments can be shorter and seem more familiar.

Parentheses

Axioms 1 and 2 allow us to drop brackets in summing a list of elements. For example, we can write simply

x + y + z

for any of

(x + y) + z,  x + (y + z),  (x + z) + y,  x + (z + y),
(y + x) + z,  y + (x + z),  (y + z) + x,  y + (z + x),
(z + y) + x,  z + (y + x),  (z + x) + y,  z + (x + y),

etc.

Another common convention is to drop the use of the multiplication symbol . in expressions. For example, we can write

xy + yz + zx

for

x.y + y.z + z.x

since the . is easily inferred from the expression.


Formal Integers

If 1A is the unit element in a commutative ring A, then we can denote

1A, 1A + 1A, 1A + 1A + 1A, 1A + 1A + 1A + 1A, . . .

by the familiar notation

1A, 2A, 3A, 4A, . . .

or, more simply,

1, 2, 3, 4, . . .

For example, the notation

5 = 5A = 1A + 1A + 1A + 1A + 1A

when interpreted in the standard infinite model Z of the integers is 5. However, in the finite cyclic arithmetic models, in Z3 it is 2, in Z4 it is 1, in Z5 it is 0, and in Zn for n > 5 it is 5. It is vital to remember that if one sees 5.x then this denotes

(1A + 1A + 1A + 1A + 1A).x ,

i.e., the element of the ring formed by adding the unit to itself 5 times and multiplying by x.
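The interpretations of 5A listed above can be computed directly by unfolding the definition "add the unit to itself k times". A small illustration (the helper name is ours):

```python
def formal_integer(k, one, add, zero):
    """Interpret the formal integer k_A = 1_A + ... + 1_A (k summands)
    in an algebra given by its unit, addition and zero."""
    total = zero
    for _ in range(k):
        total = add(total, one)
    return total

# In Z_n the unit is 1 and addition is mod n, so 5_A works out to 5 mod n:
for n, expected in [(3, 2), (4, 1), (5, 0), (7, 5)]:
    assert formal_integer(5, 1, lambda x, y: (x + y) % n, 0) == expected
```

This matches the interpretations given in the text: 2 in Z3, 1 in Z4, 0 in Z5, and 5 in Zn for n > 5.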

Definition (Formal Integers) The elements of the form

1A + 1A + · · ·+ 1A

are called formal integers.

We will make heavy use of them in working on the data type of real numbers in Chapter 9.

Polynomials and Factorisation With these conventions, expressions made from the operations of ΣCRing are made simple and familiar. However, their meaning in a particular ΣCRing-algebra may be complex and surprising. The axioms allow us to work abstractly with operations over a whole class Alg(ΣCRing, TCRing) of interpretations.

Familiar algebraic expressions, associated with the integers and reals, like the polynomials

x + 1,  x + 2,  x + 3,  x^2 + 3x + 2,  x^2 + 5x + 6,  x^3 + 6x^2 + 11x + 6

make sense in any commutative ring with unit.

Definition (Polynomials) Let ΣCRing be a signature for commutative rings. Let x be any variable. A polynomial in variable x of degree n over A is an expression of the form

a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0

where n ≥ 0, the coefficients a_n, a_(n−1), . . . , a_1, a_0 are formal integers, and a_n ≠ 0.


Using the axioms and lemmas, we can calculate with these formal polynomials. For example, we can expand algebraic expressions to reduce them to standard forms that are valid in all commutative rings:

(x + 1)(x + 2) = x^2 + 3x + 2.

The calculation is thus:

(x + 1)(x + 2) = (x + 1)x + (x + 1)2       by Axiom 8;
              = x^2 + 1x + 2x + 2          by Lemma (i), Axiom 6 and conventions;
              = x^2 + (1 + 2)x + 2         by Lemma (i);
              = x^2 + 3x + 2               by convention.

Similarly, we can deduce that

(x + 1).(x + 2).(x + 3) = x^3 + 6x^2 + 11x + 6

is valid in any commutative ring.

More generally, these calculations are summarised as follows.

Lemma (Factorisation) The following identities are valid in any commutative ring with unit:

(i) (x + p).(x + q) = x^2 + (p + q).x + pq,

(ii) (x + p).(x + q).(x + r) = x^3 + (p + q + r).x^2 + (pq + pr + qr).x + pqr

where p, q and r are formal integers.

Proof We calculate using the axioms:

(i) (x + p).(x + q) = (x + p).x + (x + p).q       by Distribution Law (8);
                    = x.x + p.x + x.q + p.q       by Lemma 5.3.2(i);
                    = x^2 + p.x + q.x + p.q       by Commutativity (6);
                    = x^2 + (p + q).x + pq        by Lemma 5.3.2(i).

(ii) We leave this case as an exercise. □
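Since each Zn is a commutative ring, identity (i) of the Factorisation Lemma must hold there; it can be confirmed exhaustively for small n (our own check, for illustration):

```python
from itertools import product

def check_factorisation(n):
    """Check (x+p)(x+q) = x^2 + (p+q)x + pq for all x, p, q in Z_n."""
    for x, p, q in product(range(n), repeat=3):
        lhs = ((x + p) * (x + q)) % n
        rhs = (x * x + (p + q) * x + p * q) % n
        if lhs != rhs:
            return False
    return True

assert all(check_factorisation(n) for n in range(2, 10))
```

The abstract proof above does more: it establishes the identity at once for every commutative ring, including the infinite ones.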

5.3.3 Solving Quadratic Equations in a Commutative Ring

The solution of polynomial equations of the form

a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0 = 0

is a central problem of mathematics with a huge and fascinating history. The problem depends on

(i) where the coefficients come from, and

(ii) where the solutions are to be found.


The usual places are the integers Z, reals R and complex numbers C, of course. We are also interested in Zn.

Only in the late eighteenth and early nineteenth centuries were there clear theoretical explanations of the problem and techniques of equation-solving. Two theorems stand out. C F Gauss proved the so-called

Theorem (Fundamental Theorem of Algebra) Any polynomial equation of degree n with complex number coefficients has n complex number solutions, counted with multiplicity.

Although there are simple algebraic formulae for finding solutions to polynomial equations of degree n = 1, 2, 3 and 4 with complex number coefficients, N H Abel proved the so-called

Theorem (Unsolvability of the Quintic) No simple algebraic formulae, based on polynomials augmented by n-th roots, exist for finding solutions to polynomial equations of degree n ≥ 5 with complex number coefficients.

The subject of solving polynomials has had a profound effect on the development of algebra, number theory and geometry, and on their many applications.

Here we will use equation-solving to give an impression of working with abstract specifications. We consider the problem when the coefficients and solutions come from a commutative ring.

Consider a simple quadratic equation. Given a commutative ring with unit A, find all x ∈ A such that

x^2 + 3x + 2 = 0.

From the Factorisation Lemma, we know that for every commutative ring with unit,

x^2 + 3x + 2 = 0 ⇔ (x + 1)(x + 2) = 0.

Now if A were Z, Q or R, our next step would be

(x + 1)(x + 2) = 0 ⇔ x + 1 = 0 or x + 2 = 0
                   ⇔ x = −1 or x = −2.

However, is this step valid for all commutative rings or just some?

Definition Let A be a commutative ring with unit. If x , y ∈ A and

x ≠ 0, y ≠ 0 and x.y = 0

then x and y are called divisors of zero.

Now the ring A has no divisors of zero if, and only if, for any x , y ∈ A

x .y = 0 implies x = 0 or y = 0 .

This is the property of Z, Q and R that we used in the last stage of solving the equation. Is it true of all commutative rings with unit?

Theorem The property

(∀x)(∀y)[x.y = 0 ⇒ x = 0 ∨ y = 0]

of having no zero divisors is not true of all commutative rings with unit. Hence, the property cannot be proved from the axioms.


Proof The property is not true of the ring Z4 of integers modulo 4. The multiplication table of Z4 is:

. | 0 1 2 3
--+--------
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 0 2
3 | 0 3 2 1

Clearly, in Z4, 2.2 = 0 but 2 ≠ 0.

Suppose the property were provable from the axioms. Then it would be true of all algebras satisfying the axioms. In particular, it would be true of Z4. Since it is not true of Z4, we have a contradiction, and so the property is not provable. □
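The counter-example is not isolated: a short search shows which Zn have zero divisors (a small helper of our own):

```python
def zero_divisors(n):
    """List the zero divisors of Z_n: non-zero x with x.y = 0 (mod n)
    for some non-zero y."""
    return sorted({x for x in range(1, n)
                     for y in range(1, n) if (x * y) % n == 0})

assert zero_divisors(4) == [2]          # 2.2 = 0 in Z_4, as in the proof
assert zero_divisors(6) == [2, 3, 4]    # 2.3 = 0 and 4.3 = 0 in Z_6
assert zero_divisors(5) == []           # Z_5 has no zero divisors
```

The pattern visible here is that Zn has zero divisors exactly when n is composite.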

Here is an illustration of the Soundness Principle and its corollary in action. We find that to proceed with our abstract analysis of quadratic equation-solving, we need to refine the specification by adding an extra necessary property.

Definition (Integral Domain) A commutative ring with unit having no zero divisors is called an integral domain.

Let TIntD be the set of axioms formed by adding the no zero divisors property to TCRing. Then we have a new specification (ΣCRing, TIntD) and a new class of algebras Alg(ΣCRing, TIntD), as shown in Figure 5.5.


Figure 5.5: Integral domain specification.

Lemma Let A ∈ Alg(ΣCRing, TIntD) and consider the quadratic equation

x^2 + ax + b = 0

in A. Then, if the quadratic polynomial factorises,

x^2 + ax + b = (x + p).(x + q)

for some p, q ∈ A, then

x = −p and x = −q

are solutions.

5.4 Axiomatic Specification of Fields

The standard ΣCRing-algebras of the integer, rational, real and complex numbers are all integral domains. The last three have an important additional operation:


division

Abstractly, division is given by a unary operation −1, the inverse operation for multiplication, in the sense that

(∀x)[x ≠ 0 ⇒ x.x^−1 = x^−1.x = 1].

If this operation −1 is added to those in ΣCRing, and the inverse axiom is added to TCRing, then we can create a new specification for a new data type called a field.

signature Field

sorts field

constants 0 , 1 : → field

operations − : field → field
           + : field × field → field
           . : field × field → field
           −1 : field → field

endsig

Let us postulate that these operations satisfy the following laws or axioms.

axioms Field

Associativity of addition (∀x )(∀y)(∀z )[(x + y) + z = x + (y + z )] (1 )

Commutativity for addition (∀x )(∀y)[x + y = y + x ] (2 )

Identity for addition (∀x )[x + 0 = x ] (3 )

Inverse for addition (∀x )[x + (−x ) = 0 ] (4 )

Associativity for multiplication (∀x )(∀y)(∀z )[(x .y).z = x .(y .z )] (5 )

Commutativity for multiplication (∀x)(∀y)[x.y = y.x] (6)

Identity for multiplication (∀x )[x .1 = x ] (7 )

Inverse for multiplication (∀x)[x ≠ 0 ⇒ x.x^−1 = x^−1.x = 1] (8)

Distribution (∀x )(∀y)(∀z )[x .(y + z ) = x .y + x .z ] (9 )

Distinctness 0 ≠ 1 (10)

We will study the axioms for a field, and refine them by adding further axioms about orderings, when we consider the data type of real numbers in Chapter 9.

Division allows us to advance with our equation-solving.

Lemma (Fields are Integral Domains) A field has no zero-divisors and hence is an integral domain.


Proof Let A be a field. Suppose x, y ∈ A are zero-divisors, so

x ≠ 0 and y ≠ 0 but x.y = 0.

Now the inverses x^−1 and y^−1 exist because A is a field. Multiplying by them, and substituting x.y = 0, we have

(x.y).(x^−1.y^−1) = 0.(x^−1.y^−1),

and so, by commutativity of . and by Lemma 5.3.2(ii),

(x.x^−1).(y.y^−1) = 0
1.1 = 0
1 = 0,

which contradicts the Distinctness axiom 0 ≠ 1. □

Adding division allows us to solve all linear equations

ax + b = 0

in a field.

Lemma (Linear Equation Field Solutions) Let A be a field. Then the equation

ax + b = 0

for a ≠ 0 has a solution

x = −(a^−1.b)

in A.

Proof Suppose a.x + b = 0. Adding −b to both sides gives us

(a.x + b) + (−b) = 0 + (−b)

which reduces to

a.x + (b + (−b)) = −b
a.x + 0 = −b
a.x = −b.

Multiplying both sides by a^−1 gives

a^−1.(a.x) = a^−1.(−b)
(a^−1.a).x = −(a^−1.b)
1.x = −(a^−1.b)
x = −(a^−1.b). □

In fact, the solution is unique.

We still cannot solve all equations, because of the square root function.
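The formula x = −(a^−1.b) is directly executable in a finite field. In Zp for p prime (example (vii) of Abelian groups below uses the same fact that every non-zero element of Zp is invertible), the inverse can be computed with Fermat's little theorem, a^(p−1) = 1 (mod p). A sketch (our own helper, assuming p is prime):

```python
def solve_linear(a, b, p):
    """Solve a.x + b = 0 in the field Z_p (p prime, a != 0 mod p),
    using x = -(a^-1 . b). The inverse a^-1 is a^(p-2) mod p, by
    Fermat's little theorem: a^(p-1) = 1 (mod p)."""
    a_inv = pow(a, p - 2, p)       # modular exponentiation
    return (-(a_inv * b)) % p

x = solve_linear(3, 4, 7)          # solve 3x + 4 = 0 in Z_7
assert (3 * x + 4) % 7 == 0

assert solve_linear(2, 1, 5) == 2  # 2.2 + 1 = 5 = 0 in Z_5
```

Uniqueness of the solution shows up concretely: the function returns a single residue, and no other element of Zp satisfies the equation.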


Example The equation x^2 − 2 = 0 does not have a solution in the field of rational numbers.

Lemma Let A be a field. Consider

a.x^2 + b.x + c = 0

where a, b, c ∈ A and a ≠ 0. The following are equivalent:

(i) the equation has the two solutions

x = (−b + √(b^2 − 4ac)) / 2a   and   x = (−b − √(b^2 − 4ac)) / 2a

in A; and

(ii) the element √(b^2 − 4ac) ∈ A.

5.5 Axiomatic Specification of Groups and Abelian Groups

The axioms for a commutative ring and a field abstract the basic properties of

+, −, . and −1

from the corresponding operations on integer, rational and real numbers. In the case of a field, the axioms state:

(i) + is associative and commutative, and has an identity 0 and an inverse operation −;

(ii) . is associative and commutative, and has an identity 1 and an inverse operation −1; and

(iii) how + and . interact according to the distributive laws.

Thus, we see that, as far as a field is concerned, + and . have the same basic properties, whilst being different operators. These properties of binary operations make up the concepts of group and Abelian group.

5.5.1 The Specification

We specify a group with the signature ΣGroup :

signature Group

sorts group

constants e : → group

operations ∘ : group × group → group
           −1 : group → group

endsig

and the laws TGroup:


axioms Group

Associativity of ∘   (∀x)(∀y)(∀z)[(x ∘ y) ∘ z = x ∘ (y ∘ z)] (1)

Identity for ∘   (∀x)[x ∘ e = x] (2)

Inverse for ∘   (∀x)[x ∘ x^−1 = e] (3)

end

Adding the commutativity of ∘ gives us an Abelian group:

signature Group

sorts group

constants e : → group

operations ∘ : group × group → group
           −1 : group → group

endsig

with the laws TAGroup :

axioms AGroup

Associativity of ∘   (∀x)(∀y)(∀z)[(x ∘ y) ∘ z = x ∘ (y ∘ z)] (1)

Commutativity of ∘   (∀x)(∀y)[x ∘ y = y ∘ x] (2)

Identity for ∘   (∀x)[x ∘ e = x] (3)

Inverse for ∘   (∀x)[x ∘ x^−1 = e] (4)

end

The pairs

(ΣGroup, TGroup) and (ΣGroup, TAGroup)

are axiomatic specifications. Let

Alg(ΣGroup ,TGroup) and Alg(ΣGroup ,TAGroup)

denote the classes of all ΣGroup-algebras that satisfy the axioms in TGroup and TAGroup, respectively.

Definition (Group) A ΣGroup-algebra A is defined to be a group if it satisfies the group axioms, i.e., A ∈ Alg(ΣGroup, TGroup).

Definition (Abelian Group) A ΣGroup-algebra A is defined to be an Abelian group if it satisfies the Abelian group axioms, i.e., A ∈ Alg(ΣGroup, TAGroup).

Example We have seen the following examples of Abelian groups:

(i) (Z; 0 ; +,−);


(ii) (Q; 0 ; +,−);

(iii) (Q − {0}; 1; ., −1);

(iv) (R; 0 ; +,−);

(v) (R − {0}; 1; ., −1);

(vi) (Zn ; 0 ; +,−); and

(vii) (Zp − {0}; 1; ., −1) for p a prime number.

There are countless more examples. The most important examples are made from composing transformations of data, objects and spaces.

5.5.2 Groups of Transformations

A transformation of a non-empty set X is simply a function

f : X → X .

Let T(X) be the set of all transformations of X.

The composition of functions is an operation

∘ : T(X) × T(X) → T(X)

defined for f, g ∈ T(X) by

(f ∘ g)(x) = f(g(x))

for all x ∈ X.

The operation of composition has an identity element, namely

i : X → X

defined for all x ∈ X byi(x ) = x ,

i.e., the identity function.Not every function in T (X ) has an inverse. If a transformation in T (X ) has an inverse,

then it is said to be invertible. A transformation is invertible if, and only if, it is surjective andinjective, i.e., it is bijective.

Let Sym(X ) be the set of all invertible transformations.Taking the inverse of a transformation is an operation

−1 : Sym(X )→ Sym(X ).

Since composition preserves invertible transformations, i.e.,

f , g ∈ Sym(X ) implies f g ∈ Sym(X )

and the identity transformation i ∈ Sym(X ), we can gather the operations together to form agroup

A = (Sym(X ); i ; ,−1 )

called the group of permutations on X , or symmetric group on X .


Lemma For any set X with cardinality |X| > 2, the algebra A = (Sym(X); i; ∘, −1) is a group, but it is not an Abelian group.

Proof Exercise. 2
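The proof is left as an exercise, but the failure of commutativity is easy to witness mechanically. A sketch in Python (the representation is ours): permutations of X = {0, 1, 2} as tuples p, where p[x] is the image of x, with composition and inverse as defined above:

```python
# The symmetric group Sym(X) on X = {0, 1, 2}, permutations as tuples.
from itertools import permutations

X = (0, 1, 2)
Sym = list(permutations(X))          # all 6 invertible transformations

def comp(f, g):                      # (f o g)(x) = f(g(x))
    return tuple(f[g[x]] for x in X)

def inverse(f):                      # f^-1 maps each image back to its source
    return tuple(f.index(x) for x in X)

i = (0, 1, 2)                        # the identity transformation

# Closure, identity and inverse all hold...
assert all(comp(f, g) in Sym for f in Sym for g in Sym)
assert all(comp(f, i) == f and comp(f, inverse(f)) == i for f in Sym)

# ...but commutativity fails, so Sym(X) is not Abelian:
f, g = (1, 0, 2), (0, 2, 1)          # two transpositions
print(comp(f, g), comp(g, f))        # two different permutations
```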

5.5.3 Matrix Transformations

Matrices are used to represent geometric transformations and form groups. The transformations are linear. This example requires an acquaintance with matrices.

A 2 × 2-matrix of real numbers is an array of the form

a = ( a11  a12 )
    ( a21  a22 )

where a11, a12, a21, a22 ∈ R. Let M(2, R) be the set of all 2 × 2-matrices. The 2 × 2-matrices represent linear transformations of the plane R2.

Matrix multiplication is an operation

. : M(2, R) × M(2, R) → M(2, R)

defined by

( a11  a12 ) . ( b11  b12 )  =  ( a11.b11 + a12.b21   a11.b12 + a12.b22 )
( a21  a22 )   ( b21  b22 )     ( a21.b11 + a22.b21   a21.b12 + a22.b22 )

The operation has an identity element, namely

i = ( 1  0 )
    ( 0  1 ).

Not every matrix has an inverse. If a matrix has an inverse, it is said to be non-singular. A matrix is non-singular if, and only if, its determinant, defined by the operation

det(a) = a11.a22 − a12.a21,

is non-zero. Let GL(2, R) be the set of all 2 × 2 non-singular matrices. Matrix inversion is an operation

−1 : GL(2, R) → GL(2, R)

defined by

a−1 = ( a11  a12 )−1  =  (  a22/det(a)   −a12/det(a) )
      ( a21  a22 )       ( −a21/det(a)    a11/det(a) )

Now matrix multiplication preserves non-singular matrices, i.e.,

a, b ∈ GL(2, R) implies a.b ∈ GL(2, R).

And the identity i ∈ GL(2, R). Thus, gathering the operations together forms a group

A = (GL(2, R); i; ., −1)

called the group of 2 × 2 non-singular matrices, or general linear group.


Lemma GL(2 ,R) is a group but it is not an Abelian group.

Proof. The checking of the three group axioms we leave to the exercises. To see that the commutative law of Abelian groups is not true of matrix multiplication, note that:

( 1  0 ) . ( 1  1 )  =  ( 1  1 )
( 1  1 )   ( 0  1 )     ( 1  2 )

( 1  1 ) . ( 1  0 )  =  ( 2  1 )
( 0  1 )   ( 1  1 )     ( 1  1 )

Since the two products differ, matrix multiplication is not commutative. 2
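The two products in the counterexample can be checked numerically. A small sketch in Python (matrices represented as nested tuples; names ours):

```python
# 2x2 matrix multiplication over the integers, matrices as row tuples.

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

a = ((1, 0), (1, 1))
b = ((1, 1), (0, 1))

print(matmul(a, b))   # ((1, 1), (1, 2))
print(matmul(b, a))   # ((2, 1), (1, 1))

# The two products differ: multiplication in GL(2, R) is not commutative.
assert matmul(a, b) != matmul(b, a)
```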

5.5.4 Reasoning with the Group Axioms

The group axioms are very simple. However, they capture the fundamental properties of symmetry in an abstract and general way. They enable us to develop and prove techniques and properties with many applications. We will look at equation-solving in groups. First, we note the following.

Theorem The commutative law

x ∘ y = y ∘ x

cannot be proved from the three axioms for a group.

Proof. Suppose, for a contradiction, that the commutative law were provable from the group axioms. Then, by the Soundness Principle, it would be true of all groups, i.e., all groups would automatically be Abelian. However, we know that the non-singular matrices GL(2, R) form a group that does not satisfy the commutative law. This contradicts the assumption. 2

Groups have all the properties needed for solving simple equations.

Lemma Let A be any group. For any a, b ∈ A, the equations

x ∘ a = b and a ∘ x = b

have unique solutions

x = b ∘ a−1 and x = a−1 ∘ b,

respectively.

Proof. We must use the three group axioms to check that the solutions given are correct, and then to show they are the only solutions possible.

Substituting x = b ∘ a−1 in the LHS of the equation,

(b ∘ a−1) ∘ a = b ∘ (a−1 ∘ a)    by associativity axiom;
              = b ∘ e            by inverse axiom;
              = b                by identity axiom.

So the solution given is indeed a solution.

For uniqueness, suppose x ∘ a = b. Then

x = x ∘ e              by identity axiom;
  = x ∘ (a ∘ a−1)      by inverse axiom;
  = (x ∘ a) ∘ a−1      by associativity axiom;
  = b ∘ a−1            by the initial equation.

Similar arguments work for a ∘ x = b and its solution x = a−1 ∘ b. 2
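The lemma can also be tested exhaustively in a small group. A sketch in Python for the additive group (Z5; 0; +, −), where ∘ is addition modulo 5 and the inverse of a is −a mod 5 (representation ours):

```python
# In (Z5; 0; +, -), for every a and b the equation x o a = b has exactly
# one solution, namely x = b o a^-1, i.e. (b - a) mod 5.

n = 5
for a in range(n):
    for b in range(n):
        solutions = [x for x in range(n) if (x + a) % n == b]
        assert solutions == [(b - a) % n]   # exactly one solution

print("unique solutions confirmed in Z5")
```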


Corollary For all a, b, c, d ∈ A,

c ∘ a = d ∘ a implies c = d
a ∘ c = a ∘ d implies c = d

A group has only one identity element, and each element has only one inverse element.

The beauty and utility of the general theories of rings and fields, especially those parts that focus on these number algebras, is amazing and we will pay tribute to them by not attempting to trivialise them in this text. The reader is recommended to study the elements of this theory independently. Elementary introductions are Birkhoff and MacLane [1965], Herstein [1964] and Fraleigh [1967]; advanced works are van der Waerden [1949] and Cohn [1982].

5.6 Boolean Algebra

The data types of Booleans tt, ff, bits 0, 1, and subsets P(X) of any set X, have many algebraic properties in common. Much of what they have in common can be captured in a set of axioms often called the laws of Boolean algebra. The axioms, or laws, are equations expressing basic properties of some simple operations. The set of operations and their axioms constitute an axiomatic specification

(ΣBA, TBA)

whose class

Alg(ΣBA, TBA)

contains precisely the Boolean algebras. There are a number of equivalent sets of axioms that characterise Boolean algebras. We will choose a slight adaptation of the axiomatisation first found by E V Huntington in 1904. Boolean algebra is a beautiful and deep field; see the Further Reading.

signature BA

sorts s

constants 0 : → s
          1 : → s

operations ∪ : s × s → s
           ∩ : s × s → s
           ′ : s → s

endsig


axioms BA

Associativity      (∀x)(∀y)(∀z)[x ∪ (y ∪ z) = (x ∪ y) ∪ z]
                   (∀x)(∀y)(∀z)[x ∩ (y ∩ z) = (x ∩ y) ∩ z]

Commutativity      (∀x)(∀y)[x ∪ y = y ∪ x]
                   (∀x)(∀y)[x ∩ y = y ∩ x]

Distribution       (∀x)(∀y)(∀z)[x ∪ (y ∩ z) = (x ∪ y) ∩ (x ∪ z)]
                   (∀x)(∀y)(∀z)[x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z)]

Zero               (∀x)[x ∪ 0 = x]

Unit               (∀x)[x ∩ 1 = x]

Complementation    (∀x)[x ∪ x′ = 1]
                   (∀x)[x ∩ x′ = 0]

end

The associativity axioms in the specification can be proved from the other eight axioms and so they are, strictly speaking, redundant. If we remove them, we are left with the original axioms in Huntington [1904].

Let A be the following ΣBA-algebra of the power set P(X) of an arbitrary set X, interpreting 0 as ∅ and 1 as X.

algebra A

carriers P(X)

constants ∅ : → P(X)
          X : → P(X)

operations ∪ : P(X) × P(X) → P(X)
           ∩ : P(X) × P(X) → P(X)
           ′ : P(X) → P(X)

Theorem The algebra A satisfies all the axioms in TBA, i.e.,

A ∈ Alg(ΣBA, TBA).
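For a finite X, the theorem can be verified by brute force. A sketch in Python for X = {1, 2, 3}, interpreting 0 as ∅, 1 as X, ∪ and ∩ as union and intersection, and ′ as complement (names ours):

```python
# Brute-force check of the ten Boolean algebra axioms on the power set
# algebra (P(X); emptyset, X; union, intersection, complement), X = {1,2,3}.
from itertools import chain, combinations

X = frozenset({1, 2, 3})
P = [frozenset(c) for c in chain.from_iterable(
        combinations(X, r) for r in range(len(X) + 1))]   # all 8 subsets

comp = lambda a: X - a

ok = (
    all(a | (b | c) == (a | b) | c for a in P for b in P for c in P)   # assoc
    and all(a & (b & c) == (a & b) & c for a in P for b in P for c in P)
    and all(a | b == b | a and a & b == b & a for a in P for b in P)   # comm
    and all(a | (b & c) == (a | b) & (a | c)                           # distr
            for a in P for b in P for c in P)
    and all(a & (b | c) == (a & b) | (a & c)
            for a in P for b in P for c in P)
    and all(a | frozenset() == a and a & X == a for a in P)            # 0, 1
    and all(a | comp(a) == X and a & comp(a) == frozenset() for a in P)
)
print(ok)   # True
```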

5.7 Current Position

The mathematical concepts, and the programming ideas about data that they model, are summarised below.


Mathematical Concept       Notation     Model for Programming Concept

signature                  Σ            interface for a data type

Σ-algebra                  A            concrete representation or implementation of a data type with interface Σ

class of all Σ-algebras    Alg(Σ)       class of all conceivable implementations or representations of a data type with interface Σ

axiomatic theory           (Σ, T)       specification of properties that representations or implementations of a data type must satisfy

axiomatic class            Alg(Σ, T)    class of all representations or implementations of a data type with signature Σ satisfying the properties in T

The discussion in this chapter suggests two points of general interest when designing and specifying a data type:

Interface We must select names for data and operations.

Specification We must consider what properties of the operations of the data type are needed or desired.

Looking ahead to Chapter 7, and recalling some of the examples in Chapter 3, a third point of general interest is:

Equivalence We must have ways of telling when two implementations of the data type are equivalent.

A notion of the equivalence of implementations is needed in the comparison of, say, decimal and binary data representations of the integers. In the algebraic theory of data, this is done by mappings between algebras with the same signature Σ, called Σ-homomorphisms. A Σ-homomorphism establishes a correspondence between the data in the algebras and the operations on that data named in Σ. In particular, two specific implementations, modelled by algebras A and B with the same signature Σ, are equivalent if there is a bijective Σ-homomorphism between them, called a Σ-isomorphism. These ideas are the basis for Chapter 7.


Exercises for Chapter 5

1. Prove that Zn is a commutative ring with unit.

2. Write out the tables for all the cyclic arithmetic operations of Z6 and Z7 .

3. Which of the following satisfy the no zero divisors property, i.e., are integral domains?

(a) Z5 ;

(b) Z6 ;

(c) Z7 ; and

(d) Z8 .

List all the zero divisors in each case, if any exist.

4. Prove that Zn has no divisors of zero, if, and only if, n is prime.

5. Let A be a commutative ring with unit. Consider the following cancellation law: for all x, y, z ∈ A,

x.y = x.z and x ≠ 0 implies y = z.

Prove that the following are equivalent:

(a) A satisfies the cancellation law; and

(b) A has no zero divisors.

6. Which of the following equations can be solved in the commutative ring of integers:

(a) x − 3 = 0 ;

(b) x + 3 = 0 ;

(c) 2x + 3 = 0 ;

(d) 3x + 3 = 0 ; and

(e) 9x + 3 = 0?

7. Find all the solutions to the equation 3x = 1 in the commutative rings with unit

(a) Z5 ;

(b) Z6 ;

(c) Z7 ; and

(d) Z8 .

8. Find all the solutions to the equation x² + 3x + 2 = 0 in the commutative rings with unit

(a) Z5 ;

(b) Z6 ;

(c) Z7 ; and


(d) Z8 .

9. Consider maximum and minimum integers. First add constants to the signature ΣIntegers of Section 5.2 of the integers to make the signature

signature Integers with max and min

sorts int

constants zero : → int
          max : → int
          min : → int

operations add : int × int → int
           minus : int → int
           times : int × int → int

endsig

Any ΣIntegers-algebra A can become a ΣIntegers with max and min-algebra on choosing elements of A and interpreting Max and Min.

(a) Consider the alternative overflow equations for the maximum and minimum elements

Max + 1 = Min and Min − 1 = Max.

Show that modulo n arithmetic Zn forms a ΣIntegers with max and min-algebra that satisfies these properties.

(b) Consider the overflow equations

Max + 1 = Max and Min − 1 = Min.

Show, by adding the symbols +∞ and −∞ to Z, how to extend the standard algebra of integers to form the ΣIntegers with max and min-algebra that satisfies the overflow equations. Does the algebra satisfy the properties of a commutative ring?

(c) Test how Max + 1 works

(i) in your favourite programming language;

(ii) using your favourite spreadsheet; and

(iii) using a pocket calculator.

10. Use the axioms in TBA to deduce that the following equations hold for every Boolean algebra:

(a) 0′ = 1
    1′ = 0

(b) x ∪ x = x
    x ∩ x = x    Idempotent Laws

(c) (x′)′ = x    Double Negation Law

(d) (x ∪ y)′ = x′ ∩ y′
    (x ∩ y)′ = x′ ∪ y′    De Morgan's Laws

11. Show the following equations are not valid in A:

(a) 0 ∩ x = x;

(b) 1 ∪ x = x; and

(c) 0 = 1.

Correct the equations and prove the corrected properties are valid in all Boolean algebras.

12. When working with a particular class K of algebras, it is often important that

if A ∈ K and B is a subalgebra of A, then B ∈ K.

Definition A class K of Σ-algebras is said to be closed under the formation of subalgebras if, and only if, whenever A ∈ K and B ≤ A then B ∈ K.

(a) Are the classes of semigroups, groups, rings and fields closed under the formation of subalgebras?

(b) Is any class of algebras defined by equations closed under the formation of subalgebras?

(c) Is the class of all finite structures of any signature Σ closed under the formation of subalgebras?


Chapter 6

Examples: Data Structures, Files, Streams and Spatial Objects

We began our study of data in Chapter 3 by introducing six kinds of data, namely

Booleans, natural numbers, integers, rational numbers, real numbers and strings.

We used them to explore some important ideas about data types. In particular, in Chapter 4, we showed how to model data types in general, using signatures and algebras. In Chapter 5, we showed how to specify data types using axiomatic theories. Most data types are constructed from these six data types. Indeed, in practice, one can make do with Booleans, integers, reals and strings.

Now we will consider modelling some new examples of data types, drawn from different subjects. Each example uses some general methods that construct new data types from old. The constructions involve adding new data sets and operations to data types, and importing one data type into another (recall Sections 4.7–4.9).

A data structure has operations for reading and updating, and is a data type to be modelled by an algebra. Data structures are general constructions which can be made to store any data. They are sometimes called generic data types because they are designed to be applied to a wide class of data types. We begin in Sections 6.1–6.2 with algebraic constructions of the popular data structures:

records and arrays.

We can apply these constructions to make many data types, such as,

records of strings and integers, and arrays of real numbers.

Hopefully, such data types are (or seem!) familiar. The constructions are not difficult. They provide us with practice in working with general ideas and in modelling data; and they result in some data types we will use later. Of course, there are dozens more data structures to consider, such as stacks, lists, queues, trees, graphs, etc. See textbooks on data structures (e.g., Dale and Walker [1996]) for these and many more to think about.

Next in Section 6.3, we model algebraically the ubiquitous data type of

files.


Again, this is a general construction and can be applied to make many data types, such as

files of text, measurements or images.

Files are a means of ensuring the persistency of data. Thus, a file is principally used for the storage of data for subsequent retrieval. The means of storing and accessing the data may vary, and leads to different models.

Now we turn to two data types that are natural and useful in modelling all sorts of systems — both natural and artificial — to be found in the world. We will model how data is distributed in time and space.

In Section 6.4, we model algebraically the important data type of

infinite streams.

Again this is a general construction and can be applied to make many data types such as

infinite streams of bits, real numbers or strings.

Now a stream is a sequence

. . . , at, . . .

of data at indexed by time t. Time may be discrete or continuous. Streams are used to model

interactive systems.

An interactive system is a system that interacts with an environment over time. The interactive system continually receives inputs from its environment, computes for a period, and returns a result to its environment. Examples of interactive systems abound in digital hardware, operating systems and networks; and, more visibly, in software that controls machines or provides interactive services for users. Typically, the system computes forever, measured by a discrete clock, and its behaviour can be modelled by a

transformation of streams

as shown in Figure 6.1.

Figure 6.1: A typical interactive system computing by transforming streams: the input stream . . . , at, . . . , a1, a0 is transformed into the output stream . . . , bt, . . . , b1, b0.

We define a simple property of stream transformations called

finite determinacy

which is a necessary property of stream transformers modelling computing systems. A stream transformation is finitely determined if its output at any time depends only on its input over a finite time interval.
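As an illustration of the definition (the example and names are ours, not the book's), here is a stream transformer in Python that is finitely determined: the output at time t depends only on the inputs a0, . . . , at, a finite time interval.

```python
# Streams modelled as functions nat -> data; a transformer maps streams
# to streams.  running_sum is finitely determined: b(t) is computed from
# the finite window of inputs a(0), ..., a(t).

def running_sum(a):
    """Transform stream a into the stream of its partial sums."""
    def b(t):
        return sum(a(i) for i in range(t + 1))   # finite window [0, t]
    return b

ones = lambda t: 1            # the stream 1, 1, 1, ...
b = running_sum(ones)
print([b(t) for t in range(5)])   # [1, 2, 3, 4, 5]
```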


As a case study, we examine the process of doing arithmetic with real numbers using their infinite decimal expansions. An infinite decimal is, essentially, a stream of digits, and multiplication transforms streams of digits to streams of digits. We prove that multiplication cannot be defined by a finitely determined stream transformer, and, hence, cannot be defined by an algorithm!

Lastly, in Section 6.5, we model algebraically a data type of

spatial objects.

This general construction can be applied to model

graphical objects and scenes

and

states of physical systems.

A spatial object is an association

. . . , ax, . . .

of data to all points x in space. Space can be continuous or discrete. The data is arbitrary. There are lots of operations on spatial objects modelled by an algebra. We apply these general ideas in creating a data type of use in computer graphics.

In Volume Graphics, objects are represented in three dimensions: every point in space has data of interest. Plenty of operations are needed to construct, transform and visualise scenes. We make a data type of spatial objects in which data that represents

visibility and colour

is assigned to every point of three-dimensional space. The data are real numbers specifying values for

opacity, and red, green and blue,

respectively. We define a collection of four simple operations

union, intersection, difference and blending

on these three-dimensional spatial objects to make an algebra capable of generating some beautiful visualisations in easy ways.

These data types involve constructions of new data types from existing data. Given our algebraic model of data types, we will model these data type constructions in two stages:

Signature/Interface Given a signature ΣOld, we construct a new signature ΣNew. This is done by adding new sorts and operations to ΣOld.

Algebra/Implementation Given any ΣOld-algebra AOld, we construct a new ΣNew-algebra ANew. This is done by adding new carriers and functions to AOld that interpret the new sorts and operations added to make ΣNew.

For example, given a signature Σ and Σ-algebra A, we show how to create a signature ΣArray of arrays over Σ, and a corresponding algebra AArray of arrays over A.

In constructing ΣNew from ΣOld, and ANew from AOld, we will use the idea of expanding discussed in Section 4.7, which introduces the import construct to the data types.


6.1 Records

The record is an invaluable programming construct for representing all sorts of user-defined data. It is based on the idea of the Cartesian product of sets. A record data structure has fixed length n, and is able to store n fields of data. These fields of data may be of different types. We will specify some simple operations on records, and so make an algebraic model of the data type of records.

Suppose we have collected all the relevant fields of data to form some Σ-algebra A. Suppose the sorts of Σ are the names

. . . , s, . . .

for these fields, chosen by the programmer with the application in mind. Suppose the fields of data are the carrier sets of A:

. . . , As, . . .

Typically, the field algebra A is ultimately made from a collection of basic types like Booleans, integers, strings and real numbers.

We fix the type of the record as follows. Let

w = s(1) × · · · × s(n)

be a product type over (some of) these sorts. Note that the individual fields are not necessarily distinct from each other (i.e., we allow the case where s(i) = s(j) but i ≠ j). Then we can construct an algebra

Recordw(A)

to model records with data fields As(1), . . . , As(n) and length n.

6.1.1 Signature/Interface of Records

We start by constructing a signature to name the data sets of the model that we will construct.

Old Signature/Interface Suppose we have some signature Σ for the fields with name Name and sorts

. . . , s, . . .

New Signature/Interface To the sorts . . . , s, . . . from the signature Σ, we add a new sort

recordw

to name the records of type w where

w = s(1) × · · · × s(n)

is a product type of not necessarily distinct field types.

Now let us consider what operations we want on records. As for any data structure, we need to create records and change them.

We create a record from constituent fields with a constructor function

create : s(1) × · · · × s(n) → recordw


This operation allows us to store data in a record.

Now we also want to be able to retrieve this data. For each of the n fields, we define a function

get field i : recordw → s(i)

to return the i th field of a record; in programming languages these operations are typically written

.s(i).

So far, we have created a static data structure — we cannot change any aspect of a record once we have created it. To rectify this situation, we can define functions

change field i : s(i) × recordw → recordw

that will change the data stored in just the i th field of a record.

Using the import notation, we combine these elements to give a signature ΣRecordw of records:

signature Recordw

import Name

sorts recordw

constants

operations create : s(1) × · · · × s(n) → recordw
           get field1 : recordw → s(1)
           ...
           get fieldn : recordw → s(n)
           change field1 : s(1) × recordw → recordw
           ...
           change fieldn : s(n) × recordw → recordw

These operations may be expected to satisfy the following axioms:

axioms

get fieldi(create(x1, . . . , xn)) = xi

get fieldi(change fieldj(R, x)) = x               if i = j;
                                  get fieldi(R)   if i ≠ j.

create(get field1(R), . . . , get fieldn(R)) = R

endaxioms

6.1.2 Algebra/Implementation of Records

Now we model an implementation of the signature of records.


Old Algebra/Implementation First, we need to import the Σ -algebra A with carrier sets

. . . ,As , . . .

of data.

New Algebra/Implementation To the carrier sets . . . , As, . . . from the algebra A, we add the new carrier set

Recordw(A) = As(1) × · · · × As(n)

to model the records of type w = s(1) × · · · × s(n). Thus, the i th field will be of type As(i).

Now let us implement the operations on records declared in the new signature ΣRecordw. We create a record with a function

Create : As(1) × · · · × As(n) → Recordw(A)

defined by

Create(a1, . . . , an) = (a1, . . . , an)

for fields a1 ∈ As(1), . . . , an ∈ As(n).

To retrieve the data stored within a record, we define a function

Get field i : Recordw(A) → As(i)

for each of the n fields, so that

Get field i((a1, . . . , an)) = ai

returns the field ai ∈ As(i) of the record.

To modify the data stored in a record, we define a function

Change field i : As(i) × Recordw(A) → Recordw(A)

for each of the n fields by

Change field i(b, (a1, . . . , an)) = (a1, . . . , ai−1, b, ai+1, . . . , an)

which replaces the field i of the record with the value b ∈ As(i), whilst leaving all the other components unaltered.

Combining the data sets involved, and the functions to construct, access and manipulate records, we get the algebra, in summary:


algebra Recordw(A)

import A

carriers Recordw(A) = As(1) × · · · × As(n)

constants

operations Create : As(1) × · · · × As(n) → Recordw(A)
           Get field1 : Recordw(A) → As(1)
           ...
           Get fieldn : Recordw(A) → As(n)
           Change field1 : As(1) × Recordw(A) → Recordw(A)
           ...
           Change fieldn : As(n) × Recordw(A) → Recordw(A)

definitions Create(a1, . . . , an) = (a1, . . . , an)

            Get field1((a1, . . . , an)) = a1
            ...
            Get fieldn((a1, . . . , an)) = an

            Change field1(b1, (a1, . . . , an)) = (b1, a2, . . . , an)
            ...
            Change fieldn(bn, (a1, . . . , an)) = (a1, . . . , an−1, bn)

It is possible to define more operations on records using operations on A and these simple operations on Recordw(A).
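As a sketch of how the algebra Recordw(A) behaves in practice (the concrete field types and the 1-indexing convention are our own choices), here are the three operations and the record axioms checked in Python for w = string × int:

```python
# Records of type w = string x int implemented as Python tuples.

def create(*fields):
    return tuple(fields)

def get_field(i, r):                        # 1-indexed, as in the text
    return r[i - 1]

def change_field(i, b, r):                  # replace field i, keep the rest
    return r[:i - 1] + (b,) + r[i:]

r = create("Recorde", 1556)

# The three record axioms, for this instance:
assert get_field(1, r) == "Recorde"                         # get o create
assert get_field(2, change_field(1, "Tucker", r)) == 1556   # get o change, i != j
assert get_field(1, change_field(1, "Tucker", r)) == "Tucker"   # i = j
assert create(get_field(1, r), get_field(2, r)) == r        # create o get

print(change_field(2, 2006, r))   # ('Recorde', 2006)
```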

6.2 Dynamic Arrays

Arrays store data of the same type. The data is stored in locations or cells that have addresses. Arrays have operations that allow any location to be read or updated. In particular, arrays can be

(i) static in the sense that they are of fixed length n, like the record structure considered in Section 6.1; or

(ii) dynamic in the sense that an array can grow in size as required — say, if an element is inserted into a position that is beyond the current length of the array.

The addresses are usually based on a simple indexing of locations in space. Space is usually 1, 2 or 3 dimensions.

We construct an algebraic model of dynamic arrays. We will do this in a very general way by augmenting an algebra A with arrays of arbitrary length to store the elements of A. Thus, for each set As of data in A, we will have an array for that data of sort s, as shown in Figure 6.2.

As usual, we construct a signature then an algebra.


Figure 6.2: A dynamic 1-dimensional array to store data from As for each sort s ∈ S: an s-array of length l holds data a1, a2, a3, . . . , al−1, al, with empty locations beyond position l.

6.2.1 Signature/Interface of Dynamic Arrays

We first produce a signature for arrays that simply lists the sorts, constants and operations of our model.

Old Signature/Interface Suppose we have a signature Σ with sorts . . . , s, . . . to provide us with information about the elements that we shall store.

In each array of sort s, some locations have data and others are empty or uninitialised. Thus, we shall need to model error cases that can arise from trying to access empty or uninitialised addresses. We construct the new signature

Σu

that will add to the signature Σ new sort names

. . . , su, . . .

and constants

us for each sort s in Σ

for distinguished elements which we can use to flag error conditions (as described in Section 4.5.2).

There are several choices for the addresses of arrays. For simplicity, we will model a one-dimensional array and use natural numbers as addresses. Thus, we shall also use the signature ΣNaturals for addresses.

New Signature/Interface Let us picture, informally, the idea of an array. We want to store and retrieve elements at given addresses within an array. So, we shall need sorts for data, addresses and arrays of data. We have sorts for data from the imported signature Σ, and we shall also import the signature ΣNaturals to provide us with a scheme for addresses.

For each sort s of the signature Σ, we are going to have an array that will store the elements of sort s. For each data sort s, let

array s

be the sort of arrays of sort s.


To store elements in an array of sort s, we define the operation

insert s : s × nat × array s → array s

that will allow us to insert an element of sort s at a given address into an array of sort s; and we define a constant

null s : → array s

that represents an array with no stored values. These constants and operations will allow us to create arrays.

For the array to be dynamic though, we shall also need an operation

length s : array s → nat

to determine the current length of an array of sort s.

Finally, to be able to access the information stored in an array of sort s, we need the operation

read s : nat × array s → su

that returns the data stored at a given address in an array; if the given array position has not been initialised (because no data has yet been inserted) then we can flag this with an unspecified element.

Using the import notation to combine the sets of sort names, constant symbols and operation symbols, we get the signature:

signature Array(Σ)

import Σu, ΣNaturals

sorts . . . , array s, . . .

constants . . . , null s : → array s, . . .

operations . . . , insert s : s × nat × array s → array s, . . .
           . . . , length s : array s → nat, . . .
           . . . , read s : nat × array s → su, . . .

These operations may be expected to satisfy the following axioms:

axioms

read s(i, null s) = us

read s(i, insert s(x, j, a)) = x              if i = j;
                               read s(i, a)   otherwise.

length s(null s) = 0

length s(insert s(x, j, a)) = max(j, length s(a))

endaxioms
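These axioms can be prototyped directly. A sketch in Python (our own representation, anticipating the model of Section 6.2.2): an array is a pair (a, l), with the finitely many initialised cells held in a dictionary and the unspecified flag us rendered as None:

```python
# Dynamic arrays over a data set A_s, modelled as pairs (a, l): a dict of
# initialised cells and the current length.  None plays the role of u_s.

u = None                                   # the "unspecified" flag u_s

null = ({}, 0)                             # the array with no stored values

def insert(x, j, arr):
    a, l = arr
    b = dict(a)                            # arrays are immutable values
    b[j] = x
    return (b, max(j, l))                  # grows dynamically if j > l

def read(i, arr):
    a, _ = arr
    return a.get(i, u)                     # uninitialised cells read as u_s

def length(arr):
    return arr[1]

arr = insert("x", 3, insert("y", 7, null))
assert read(7, arr) == "y" and read(5, arr) is u    # read/insert axioms
assert read(2, null) is u                           # read on null_s
assert length(null) == 0 and length(arr) == 7       # length axioms
```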

6.2.2 Algebra/Implementation of Dynamic Arrays

Now we will make a precise model of these operations by constructing a ΣArray-algebra AArray from a Σ-algebra A. There are different ways of modelling arrays. We will give a slightly elaborate model that emphasises the role of empty or uninitialised cells in an array.


Old Algebra/Implementation We implement the signature Σ with an algebra A to define the data that we want to store in the array. Then, we construct the algebra A^u from A to add extra elements for flagging errors as described in Section 4.5.2.

New Algebra/Implementation Let A_s be a non-empty set of data and let u_s ∉ A_s be an object we may use to mark unspecified data. We will also use the algebra N of natural numbers to implement the addressing mechanism. We can picture an example of an array of length l as shown in Figure 6.3.

Figure 6.3: Model of a finite array of length l . After position l , every cell is uninitialised. Some cells before position l may also be uninitialised.

We model a finite array by means of a function and a number, i.e., a pair

a∗ = (a, l),

where the function

a : N → A^u_s

gives the data stored at each address, i.e.,

a(i) = the datum from A_s stored at address i

and

a(i) = u_s means address i has not been initialised;

and

l ∈ N

gives the length of the array. Furthermore, since the array is finite and of length l , we will assume that

a(i) = u_s for all i > l .

We define the set

Array s(A) = { (a, l) ∈ [N → A^u_s] × N | a(i) = u_s for i > l }

of all finite arrays over A_s . Thus, a finite array is modelled as an infinite sequence a of data indexed by addresses, together with a bound l on the location of addresses that have been assigned elements, and an assignment of data to some of the addresses up to l .

To interpret the operation insert s we take the function

Insert s : A^u_s × N × Array s(A) → Array s(A)


defined by

Insert s(x , i , (a, l)) = (b, l) if i ≤ l ;
                           (b, i) if i > l ;

where the element x is inserted into the i th position of the array (a, l) ∈ Array s(A) by

b(j ) = a(j ) if j ≠ i ;
        x    if j = i .

Note that our operation Insert s automatically extends the length of the array if we try to insert an element in a position that would otherwise be past the end of the array, i.e., if i > l . In this case, the operation only adds one element to the array, as all the intermediate values in the positions between the end of the old array (at l) and the end of the new array (at i) retain their value of u_s .

In addition to the constants of the imported algebras A^u for data and N for addresses, we add that of the newly created array Null*_s ∈ Array s(A) defined as

Null*_s = (Null s , 0 )

where, for any i ∈ N,

Null s(i) = u_s .

To interpret the operation length s we take the function

Length s : Array s(A) → N

defined by

Length s((a, l)) = l .

To interpret the operation read s we take the function

Read s : N × Array s(A) → A^u_s

defined by

Read s(i , (a, l)) = a(i)

which reads the i th element of the array (a, l) ∈ Array s(A).

In summary, we have constructed the algebra:


algebra Array(A)

import A^u , N

carriers . . . , Array s(A), . . .

constants . . . , Null*_s : → Array s(A), . . .

operations . . . , Length s : Array s(A) → N, . . .
. . . , Read s : N × Array s(A) → A^u_s , . . .
. . . , Insert s : A^u_s × N × Array s(A) → Array s(A), . . .

definitions
. . . , Null*_s = (Null s , 0 ), . . .
. . . , Null s(i) = u_s , . . .
. . . , Read s(i , (a, l)) = a(i), . . .
. . . , Insert s(x , i , (a, l)) = (b, Max (i , l)), . . .
where
b(j ) = a(j ) if j ≤ Max (i , l) and j ≠ i ;
        x    if j ≤ Max (i , l) and j = i ;
        u_s  otherwise.
. . . , Length s((a, l)) = l , . . .
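The algebra Array(A) can also be sketched in executable form. The following Python fragment is our own illustration, not part of the book's formal development: it represents the total function a : N → A^u_s by a dictionary whose missing keys default to the unspecified element u_s, and checks the axioms of the signature Array(Σ) on a small example.

```python
# A sketch (ours, not the book's) of the algebra Array(A): a dynamic array
# over a data set A_s is a pair (a, l) where a : N -> A^u_s gives the cell
# contents and l is the length. We represent the total function a by a
# dictionary with default value u_s for uninitialised cells.

US = "u"  # stands for the unspecified element u_s

def null():
    """Null*_s = (Null_s, 0): the empty array, every cell unspecified."""
    return ({}, 0)

def insert(x, i, arr):
    """Insert_s(x, i, (a, l)) = (b, Max(i, l)), where b agrees with a except b(i) = x."""
    a, l = arr
    b = dict(a)
    b[i] = x
    return (b, max(i, l))

def read(i, arr):
    """Read_s(i, (a, l)) = a(i); uninitialised cells return u_s."""
    a, _ = arr
    return a.get(i, US)

def length(arr):
    """Length_s((a, l)) = l."""
    return arr[1]

# The axioms of the signature Array(Sigma) hold in this model:
a0 = null()
assert read(3, a0) == US                  # read s(i, null s) = u_s
assert length(a0) == 0                    # length s(null s) = 0
a1 = insert("x", 5, a0)
assert read(5, a1) == "x"                 # read after insert at the same address
assert read(2, a1) == US                  # other addresses are unchanged
assert length(a1) == max(5, length(a0))   # length s(insert s(x, j, a)) = max(j, length s(a))
```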

6.3 Algebras of Files

The concept of a file of characters has proved to be truly fundamental. For example, in the UNIX operating system, most of the basic objects are files. The many types of file are classified by the types of operations that can be applied to them.

What are the essential properties of files? Are files fundamentally full of characters, or are files more general? We can easily imagine files of any data. Thus, given a signature Σ and a Σ -algebra A modelling some data type, we want to construct a signature and an algebra

Σ File and AFile ,

to model a data type of files containing data from some algebra A.

We shall consider a simple model of files that store data; a file has a data content and a position element.

6.3.1 A Simple Model of Files

We consider a simple model of files in which we store

(i) data, and

(ii) the current position within that data.


Signature/Interface

We construct a signature Σ SimpleFiles of useful operations on files.

Old Signature/Interface We want to store and retrieve strings of elements within a file. Thus, we shall construct the signature Σ String for strings of elements from a signature Σ (recall Section 3.7). This contains the basic operation Concat of concatenation of strings. We shall also use the natural numbers to mark the current position within a file.

New Signature/Interface We introduce a sort file for our data type of files, and a sort read result , to take care of the side-effects that the read function produces.

We can construct a file with the constant

empty :→ file

for a newly created file with no contents, and the operation

write : file × string → file

which inserts a string at the current position within the file and moves the current position forward accordingly.

We can read a portion of a file with the operation

read : file × nat → read result .

This returns the next specified number of characters from the file starting at the current position, and as a side-effect, moves the current position of the file forward. To make this side-effect explicit, we take auxiliary functions,

string content : read result → string
file content : read result → file

which we can use to separate the file and string results from the application of the read operation.

We can also arbitrarily move the current position in the file with an operation

move pos : file × nat → file.

This results in a signature:

signature SimpleFiles

import String ,Naturals

sorts file, read result

constants empty : → file

operations write : file × string → file
read : file × nat → read result
move pos : file × nat → file
string content : read result → string
file content : read result → file


Algebra/Implementation

Old Algebra/Implementation We suppose that we implement the signature Σ with an algebra A. Then, we form strings over A to give us the algebra AString to implement the signature Σ String .

We shall also suppose that we implement the signature Σ Naturals with an algebra ANaturals in which we operate over the set N of natural numbers. Note that the algebra ANaturals will be provided from the algebra AString .

New Algebra/Implementation We can implement the signature Σ File with an algebra AFile in the following manner.

We have a carrier set

File = String × N

of files that implements the sort file, so that a file

(s , p) ∈ File

stores the data content s of the file and the current position p of the file.

We also have a carrier set

ReadResult = String × File

that implements the sort read result , so that

(w , f )

will store a string w and a file f .

We implement the constant symbol empty with the constant

Empty = (ε, 0 ) ∈ File;

that represents a file with no data content, and current position set at 0 .

We implement the function symbol write with the function

Write : File × String → File

so that

Write(f ,w)

inserts the string w into the data portion of the file f at the current position of f . It then also sets the current position of the file to be at the last character of the newly inserted string w .

We implement the function symbol read with the function

Read : File × N→ (String × File)

so that

Read(f , i)

returns the next i characters of the data portion of the file f from the current position of f . It then sets the current position of the file to be i characters further forwards. If this would take


us past the end of the file, we just take those characters that lie before the end of the file, and set the current position to the end of the file.

We implement the auxiliary functions string content and file content with projection functions

StringContent : ReadResult → String
FileContent : ReadResult → File

that are defined in the obvious manner:

StringContent((w , f )) = w

FileContent((w , f )) = f .

Hence,

StringContent(Read(f , i))

returns the string that results from reading i characters from the file f , and

FileContent(Read(f , i))

returns the updated file, whereby the current position has been advanced by i characters.

We implement the function symbol move pos with the function

MovePos : File × N→ File

so that

MovePos(f , p)

returns a file with data content the same as f , but with a new current position of p. If p is greater than the number of characters in the data content of f , we set the new current position to be at the last character of the file.

This gives us the algebra:


algebra SimpleFiles

import String , N

carriers File = String × N
ReadResult = String × File

constants Empty : → File

operations Write : File × String → File
Read : File × N → ReadResult
MovePos : File × N → File
StringContent : ReadResult → String
FileContent : ReadResult → File

definitions
Empty = (ε, 0 )
Write((f1 · · · fl , p),w) = (Concat(Concat(f1 · · · fp ,w), fp+1 · · · fl), p + |w |)
Read((f1 · · · fl , p), j ) = (ε, (f1 · · · fl , p)) if j = 0 ;
                             (fp+1 · · · fMin(p+j ,l), (f1 · · · fl ,Min(p + j , l))) otherwise.
MovePos((f , p), j ) = (f ,Min(j , |f |))
StringContent((w , f )) = w
FileContent((w , f )) = f
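The definitions above translate almost line for line into code. The following Python sketch is our own illustration, not the book's: strings and natural numbers play the roles of String and N, and Read returns a (string, file) pair so that its side-effect on the position is explicit.

```python
# A sketch (ours, not the book's) of the algebra SimpleFiles: a file is a
# pair (s, p) of a string s and a current position p.

EMPTY = ("", 0)  # Empty = (epsilon, 0)

def write(f, w):
    """Insert w at the current position and advance the position by |w|."""
    s, p = f
    return (s[:p] + w + s[p:], p + len(w))

def read(f, j):
    """Return the next j characters and the file with its position advanced."""
    s, p = f
    q = min(p + j, len(s))     # never move past the end of the file
    return (s[p:q], (s, q))

def move_pos(f, j):
    """Set the current position to j, clipped to the end of the file."""
    s, _ = f
    return (s, min(j, len(s)))

def string_content(rr):
    return rr[0]

def file_content(rr):
    return rr[1]

f = write(EMPTY, "hello")                # the file ("hello", 5)
f = move_pos(f, 0)
rr = read(f, 3)
assert string_content(rr) == "hel"
assert file_content(rr) == ("hello", 3)  # the position has advanced by 3
```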

6.4 Time and Data: A Data Type of Infinite Streams

Making models of systems is a commonplace activity. In computing, models are used to help understand a user's requirements for a system. Models are also used in making software, to clarify the implementation options and their consequences. An aim of this book is to show how models reveal and make precise the concepts that shape the languages and tools used for specifications and programming. In science and engineering, similarly, models are essential in discovering and understanding how a physical or biological system works. Systems are analysed mathematically, computationally and experimentally. Models are formulated in mathematical theories and in software to investigate the data the system processes, calculates or measures.

Data types play a prominent role in modelling all systems because to describe a system we need to define carefully

• the data that characterises the system, and

• the operations on data that define the behaviour of the system.

Modelling usually involves the operation or evolution of the system in time. Ultimately, the data is made from basic data types we have met.

Therefore, in modelling systems in the world, whether natural systems, like a particle in motion or a heart, or artificial systems, like a compiler or a bank, we need an inexhaustible supply of data and data types. Moreover, to model how these systems operate, evolve or react in time we need to model


data distributed in time.

This we will do using the idea of a stream and making an algebra that models a data type of streams. In the next section we will consider the fact that systems are extended in space and models involve data distributed in space.

6.4.1 What is a stream?

A stream is a sequence

. . . , a_t , . . .

of data a_t indexed by time points t . Time may be discrete, in which case typically it is modelled by the set N of natural numbers, or possibly the set Z of integers. Time may be continuous, in which case typically it is modelled by the set R+ of positive real numbers, or possibly the set R of all the real numbers. The sequences can be finite or infinite.

Figure 6.4: Examples of models of discrete and continuous time.

Processing streams of data is a truly fundamental activity in Computer Science and its applications.

Most computing systems operate in discrete time. The algorithms underlying computers, operating systems and networks process infinite streams of bits, bytes and words. The programs that are embedded in instruments to monitor and control machines that do the world's work are programmed to process infinite streams of Booleans, integers, reals and strings.

Many programs must be designed to compute forever, accepting an infinite stream of inputs and returning an infinite stream of outputs.

In contrast to the vast scope of programming with streams, our task here is simply to think about streams as a data type. For any data, we want to program with streams of that data. Thus, given any Σ -algebra A, we show how to construct a signature Σ Stream and a Σ Stream -algebra

AStream

that models streams over A. To do this, we first have to look at time.

6.4.2 Time

Time is a deeply complicated idea with a fascinating history. To glimpse its philosophical richness, consult Whitrow [1980]; to glimpse its amazing history, consult Borst [1993]. Time is completely fundamental to all forms of mechanisation, and certainly to computing.

Discrete time is thought of as either


an ordered sequence of isolated time units

as suggested in Figure 6.4, or

a consecutive sequence of time intervals or cycles

as suggested in Figure 6.5.

Figure 6.5: Time modelled as cycles.

The idea is evident in our experience of a digital clock showing hours and minutes: the clock displays 08:59 for an interval (of one minute) and then displays 09:00 for an interval (of one minute). On first reading the clock, we know we are somewhere in the interval 08:59. In either case, the time instants or time cycles are counted using the natural numbers. Thus, discrete time is modelled by a counting process: there is an initial or 0 th time instant or cycle, and a function that returns the next t + 1 th instant or cycle from the t th instant or cycle. A device that counts time is called a clock. This conception is easy to model as an algebra.

Signature/Interface The signature Σ Clock for our clock names time as a sort, an initial time value start as a constant, and a function tick to count the passing of time.

signature Clock

sorts time

constants start : → time

operations tick : time → time

This is a basic set of operations on time; some other useful operations on time concern the ordering of time.

Algebra/Implementation We implement the signature Σ Clock with a standard clock modelled by the algebra AClock :

algebra Clock

carriers N

constants Start : → N

operations Tick : N→ N

definitions
Start = 0

Tick(t) = t + 1


Other clocks are also interesting and provide different interpretations of Σ Clock . For example, a finite counting system like cyclic arithmetic Zn is of use. A timer for cooking might display only minutes and be based on a Σ Clock -algebra with carrier

Z60 .

Internally, it could also measure seconds and be based on the data

Z60 × Z60 .

The standard digital clock measures hours, minutes and seconds, and is implemented as an algebra with carrier

Z24 × Z60 × Z60 .

Notice these clocks require extra operations to be added to Σ Clock . The operation of Σ Clock counts in only one unit of time.
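A concrete instance of such an extended clock can be sketched as follows. This Python fragment is our own illustration, not the book's: it interprets tick on the carrier Z24 × Z60 × Z60, carrying between the units of cyclic arithmetic.

```python
# A sketch (ours) of a digital clock on the carrier Z24 x Z60 x Z60,
# counting hours, minutes and seconds with cyclic arithmetic.

START = (0, 0, 0)

def tick(t):
    """Advance the clock by one second, carrying into minutes and hours."""
    h, m, s = t
    s = (s + 1) % 60
    if s == 0:
        m = (m + 1) % 60
        if m == 0:
            h = (h + 1) % 24
    return (h, m, s)

assert tick((0, 0, 0)) == (0, 0, 1)
assert tick((8, 59, 59)) == (9, 0, 0)    # the display changes from 08:59:59 to 09:00:00
assert tick((23, 59, 59)) == (0, 0, 0)   # the clock wraps around at midnight
```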

6.4.3 Streams of Elements

Let us first clarify some mathematical ideas about streams and sequences.

Definition (Streams) Let A be any non-empty set. An infinite sequence or stream over A is a function

a : N→ A.

For t ∈ N, the t th element of the sequence is a(t) and sequences are often written

a = a(0 ), a(1 ), . . . or a = a_0 , a_1 , . . . .

Let the set of all infinite sequences over A be denoted by

[N→ A].

The term stream is appropriate when the set N models a set of time cycles.

Old Signatures/Interface We shall need two signatures to construct streams. We shall need a signature Σ for the data that we shall store in the streams, and we shall need a signature Σ Clock to count the passage of time.

New Signature/Interface We make a new signature Σ Stream from the signatures Σ for the data elements and Σ Clock for the clock.

signature Stream

import Name,Clock

sorts . . . , stream(s), . . .

constants

operations . . . , read s : stream(s) × time → s , . . .
. . . , write s : s × stream(s) × time → stream(s), . . .


Depending upon the application, other names for the operations read s and write s may be more appropriate, such as

send s and receive s

or

evaluate s and update s .

6.4.4 Algebra/Implementation

Old Algebra/Implementation We make a new algebra AStream from the algebra A that implements the signature Σ for the data elements, and the clock algebra AClock .

New Algebra/Implementation We make a Σ Stream -algebra AStream from any Σ -algebra A and the clock AClock :

algebra Stream

import Name,Clock

carriers . . . , [N → A_s ], . . .

constants

operations . . . , Read s : N × [N → A_s ] → A_s , . . .
. . . , Write s : A_s × N × [N → A_s ] → [N → A_s ], . . .

definitions
. . . , Read s(t , a) = a(t), . . .
. . . , Write s(x , t , a)(t ′) = x if t = t ′ ;
                                 a(t ′) if t ≠ t ′ , . . .

This algebra AStream simply involves infinite sequences as data, the elements of which may be read by means of the new operations. There are several other operations that may be added to AStream to model other uses of infinite sequences, for example, operations that insert data without losing data, translate data in time, or merge two streams (see Exercises).
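The algebra AStream can be sketched by representing streams directly as functions. This Python fragment is our own illustration, not the book's, of the Read and Write operations in the definitions above.

```python
# A sketch (ours, not the book's) of the algebra AStream: streams over a
# set A_s are represented as Python functions N -> A_s.

def read(t, a):
    """Read_s(t, a) = a(t): the element of stream a at time cycle t."""
    return a(t)

def write(x, t, a):
    """Write_s(x, t, a): the stream agreeing with a everywhere except at t, where it is x."""
    return lambda u: x if u == t else a(u)

zeros = lambda t: 0          # the constant stream 0, 0, 0, ...
s = write(7, 3, zeros)       # the stream 0, 0, 0, 7, 0, 0, ...
assert read(3, s) == 7
assert read(2, s) == 0
assert [read(t, s) for t in range(5)] == [0, 0, 0, 7, 0]
```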

6.4.5 Finitely Determined Stream Transformations and Stream Predicates

Imagine a computing system receiving data from, and returning data to, some environment. Suppose the system operates in discrete time measured by a discrete clock with time cycles

T = {0 , 1 , 2 , . . .}.

Let the input data come from a set A and the output data from a set B . We investigate the input-output behaviour of the system over time when it takes one of the following two forms:

1. Stream Transformer The system operates continuously in time and inputs a stream

a = a(0), a(1), a(2), . . . , a(t), . . . ∈ [T → A]


of data from A and outputs a stream

b = b(0), b(1), b(2), . . . , b(t), . . . ∈ [T → B ]

of data from B . Thus, the input-output behaviour of the system can be modelled by a stream transformation

F : [T → A]→ [T → B ].

2. Stream Predicate The system operates continuously in time and inputs a stream

a = a(0), a(1), a(2), . . . , a(t), . . . ∈ [T → A]

of data from A and outputs a single truth value

b ∈ B

from B = {tt , ff }. Thus, the input-output behaviour of the system can be modelled by a stream predicate

F : [T → A]→ B.

Now computing systems are built from

algorithms that process data in finitely many steps.

This suggests a property of stream processing:

The output of a stream transformer

F (a)(t)

at time t given input stream a, and the output of a stream predicate

F (a)

is computed in finitely many steps by algorithms that can access at most finitely many elements

a(0), a(1), a(2), . . . , a(l)

of the input stream a. In the case of stream transformers, the number l depends on a and t ; in the case of stream predicates, l depends on a.

This property is called finite determinacy and is shown in Figure 6.6. It has the following formal definition (which drops references to algorithms).

Figure 6.6: System with finite determinacy principle.


Definition (Finite Determinacy)

1. A stream transformer F : [T → A] → [T → B ] is finitely determined if, given any input stream a ∈ [T → A] and any time t ∈ T , there exists an l ∈ T such that for every stream b ∈ [T → A], if a and b agree up to time l , i.e.,

a(0) = b(0), a(1) = b(1), a(2) = b(2), . . . , a(l) = b(l)

then

F (a)(t) = F (b)(t).

2. A stream predicate F : [T → A] → B is finitely determined if, given any input stream a ∈ [T → A] there exists an l ∈ T such that for every stream b ∈ [T → A], if a and b agree up to time l , i.e.,

a(0) = b(0), a(1) = b(1), a(2) = b(2), . . . , a(l) = b(l)

then

F (a) = F (b).

An argument about the behaviour of computing systems and the role of algorithms was used to motivate the definition of this property of stream transformations and predicates. The argument actually suggests a hypothesis or thesis:

Thesis If a stream transformation or predicate is computable by algorithms then it is finitely determined. In particular, if a stream transformation or predicate is not finitely determined then it is not computable by algorithms.

Examples

Consider processing streams of integers.

1. Let F : [T → Z]→ [T → Z] be defined by

F (a)(t) = a(0) + a(1) + a(2) + · · ·+ a(2t).

This is finitely determined because for any time t and streams a, b, we have

a(0) = b(0), a(1) = b(1), a(2) = b(2), . . . , a(2t) = b(2t)⇒ F (a)(t) = F (b)(t).
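This instance of finite determinacy can be checked concretely. The following Python sketch is our own illustration, not the book's: it evaluates F on two streams that agree only on a prefix, showing that the witness l = 2t suffices.

```python
# A sketch (ours) of finite determinacy for the transformer
# F(a)(t) = a(0) + a(1) + ... + a(2t): the output at time t depends only
# on the prefix a(0), ..., a(2t), so l = 2t witnesses the definition.

def F(a, t):
    return sum(a(i) for i in range(2 * t + 1))

a = lambda i: i                       # the stream 0, 1, 2, 3, ...
b = lambda i: i if i <= 4 else 999    # agrees with a only up to l = 4

# For t = 2 we have 2t = 4: agreement up to 4 forces equal outputs.
assert F(a, 2) == F(b, 2) == 0 + 1 + 2 + 3 + 4
# For t = 3 the streams disagree inside the window 0..6, so outputs may differ.
assert F(a, 3) != F(b, 3)
```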

2. Let Z : [T → Z]→ B be the zero test defined by

Z(a) = tt if a = 0 , 0 , 0 , . . . , 0 , . . . ;
       ff if a ≠ 0 , 0 , 0 , . . . , 0 , . . . .

Now, making these conditions explicit, we have

Z(a) = tt if (∀t)[a(t) = 0 ];
       ff if (∃t)[a(t) ≠ 0 ].

The predicate Z is not finitely determined. Here is the proof.


Suppose for a contradiction it was finitely determined. Then, given a stream a, there exists a bound l such that for any stream b,

a(0) = b(0), a(1) = b(1), a(2) = b(2), . . . , a(l) = b(l)⇒ Z(a) = Z(b).

Choose a to be 0 everywhere and b to agree with a except at time l + 1 , where b(l + 1 ) ≠ 0 . Clearly,

a(0) = b(0), a(1) = b(1), a(2) = b(2), . . . , a(l) = b(l) but Z (a) = tt and Z (b) = ff .

This contradicts the finite determinacy property.
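The counterexample can be run. This Python sketch is our own illustration, not the book's: it shows that any tester which inspects only the prefix a(0), ..., a(l) must give the same answer on the two streams of the proof, although the true values of Z differ.

```python
# A sketch (ours) of the argument: a prefix-based tester cannot separate
# the all-zero stream a from a stream b that first differs at time l + 1.

def prefix_zero_test(a, l):
    """A hypothetical prefix-based approximation to Z: checks a(0), ..., a(l) only."""
    return all(a(i) == 0 for i in range(l + 1))

l = 10
a = lambda t: 0                        # Z(a) = tt
b = lambda t: 1 if t == l + 1 else 0   # Z(b) = ff, yet b agrees with a up to l

# The prefix-based tester cannot tell a and b apart, although Z(a) != Z(b).
assert prefix_zero_test(a, l) == prefix_zero_test(b, l) == True
```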

6.4.6 Decimal Representation of Real Numbers

The real numbers are a number system that we use to measure quantities exactly. To measure the length of a line or the circumference of a circle, we encounter numbers like

√2 and π

that are not rational numbers. The rational numbers are a number system that we use to measure quantities

exactly, using a ruler or gauge divided into units.

The real numbers are created by modelling the process of measurement using rational numbers. We use them to measure quantities

accurately or exactly to any degree of precision.

A real number is an abstraction from these measuring processes. Recall the discussion of real numbers in Chapter 3.

There are many ways of modelling this process and, hence, representing real numbers. Later, we will devote a whole chapter to the subject (Chapter 9) and these processes are examples of streams. Here, we will investigate the standard notation for real numbers, namely:

infinite decimals.

We will explain the method and show how to represent infinite decimals as streams. Then we will prove that they are unsuited to exact computation to arbitrary accuracy.

First, here are some real numbers and their infinite decimal representations:

1      1.000 · · · 0 · · ·
1      0.999 · · · 9 · · ·
1/3    0.333 · · · 3 · · ·
√2     1.41421356 · · ·
π      3.14159265 · · ·
22/7   3.14285714 · · ·

Most real numbers have one, and only one, infinite decimal representation. For example,

0 .333 · · · 3 · · · and 0 .666 · · · 6 · · ·

are the unique decimal representations of 1/3 and 2/3, respectively. However,

1 .000 · · · 0 · · · and 0 .999 · · · 9 · · ·

are two different decimal representations of 1 .

In calculation, we use some finite part of the decimal representation as an approximation.

For example, 1 .4142 approximates √2. The finite decimal notation

1 .4142

stands for the sum

1 + 4/10 + 1/100 + 4/1000 + 2/10000

of rational numbers, and is a rational number that measures using the base 10 units

1/10 , 1/10^2 , 1/10^3 , 1/10^4 .
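This sum can be computed exactly. The following Python sketch is our own illustration, not the book's: it evaluates the finite decimal 1.4142 as a sum of rationals using the standard library's Fraction type.

```python
# A sketch (ours) evaluating the finite decimal 1.4142 as the exact sum
# 1 + 4/10 + 1/100 + 4/1000 + 2/10000 of rational numbers.

from fractions import Fraction

digits = [1, 4, 1, 4, 2]   # b0 . a1 a2 a3 a4
value = sum(Fraction(d, 10 ** n) for n, d in enumerate(digits))

assert value == Fraction(14142, 10000)
# 1.4142 approximates sqrt(2): the true value lies within 1/10^4 of it.
assert value ** 2 < 2 < (value + Fraction(1, 10 ** 4)) ** 2
```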

However, the infinite decimal notation

1 .41421356 · · ·

stands for the process of summing

1/10^0 + 4/10^1 + 1/10^2 + 4/10^3 + 2/10^4 + 1/10^5 + 3/10^6 + 5/10^7 + 6/10^8 + · · ·

the rational numbers that measure using all possible base 10 units.

The term decimal can be confusing because it is used to denote two aspects of number notations:

(i) the use of base 10, and

(ii) the use of the so-called decimal point.

In this section, it is the use of the decimal point that attracts our attention.

Definition (Decimal Representation of Reals) A decimal representation of a real number has the form

b_m b_{m−1} · · · b_1 b_0 . a_1 a_2 a_3 · · · a_n · · ·

where

(i) m is a natural number, m ∈ N, and

(ii) b_m , b_{m−1} , . . . , b_1 , b_0 , a_1 , a_2 , a_3 , . . . , a_n , . . . ∈ {0 , 1 , . . . , 9} are digits.

The notation stands for

b_m 10^m + b_{m−1} 10^{m−1} + · · · + b_1 10^1 + b_0 10^0 + a_1 10^{−1} + a_2 10^{−2} + · · · + a_n 10^{−n} + · · ·


We will show that computing with decimal representations of real numbers is problematic. For simplicity in notation, we will examine the problem of computing with decimal representations of real numbers in the closed interval

[0 , 10 ] = {r ∈ R | 0 ≤ r ≤ 10}.

The numbers in [0 , 10 ] have a simple decimal representation that can easily be represented by streams as follows.

Definition (Stream Representation of Decimals) A real number x ∈ [0 , 10 ] has a decimal representation of the simpler form

b_0 . a_1 a_2 a_3 · · · a_n · · ·

where b_0 , a_1 , a_2 , a_3 , . . . , a_n , . . . ∈ {0 , . . . , 9}. In particular, there is just one digit before the decimal point.

There is a trivial way to view or represent such decimals as streams, namely:

a = b_0 , a_1 , a_2 , a_3 , . . . , a_n , . . .

Thus, the set of decimals, and hence the reals in [0 , 10 ], can be represented using the set

[N → {0 , 1 , . . . , 9}]

of all streams of digits.

Definition (Stream Representations of Functions) A function

f : [0, 10]→ [0, 10]

on real numbers is represented by a stream transformer

F : [N → {0 , 1 , . . . , 9}] → [N → {0 , 1 , . . . , 9}]

of streams of digits if, for any real number x ∈ [0 , 10 ] and stream a ∈ [N → {0 , 1 , . . . , 9}], we have that if the stream

a represents the decimal expansion of x

then the stream

F (a) represents the decimal expansion of f (x ).

We restrict the domain of our function f (x ) = 3x to the subset

[0 , 1 ] = {r ∈ R | 0 ≤ r ≤ 1}

so that the range of f remains in [0 , 10 ]. (Note that if 0 ≤ x ≤ 1 then 0 ≤ f (x ) ≤ 3 .)

Theorem The function f : [0 , 1 ]→ [0 , 10 ] defined by

f(x) = 3x

for x ∈ [0 , 1 ] cannot be defined by a finitely determined stream transformation

F : [N → {0 , 1 , . . . , 9}] → [N → {0 , 1 , . . . , 9}]

of the decimal representation of real numbers.


Proof Let the stream transformer

F : [N → {0 , 1 , . . . , 9}] → [N → {0 , 1 , . . . , 9}]

calculate the function f (x ) = 3x using decimals, i.e.,

stream a represents real x ∈ [0 , 1 ] implies stream F (a) represents real 3x .

Suppose for a contradiction that F is finitely determined. This means that for any stream a and output position k there is an input position l such that for all streams b

a(i) = b(i) for 0 ≤ i ≤ l ⇒ F (a)(k) = F (b)(k).

We derive the contradiction by investigating the way F calculates 3x for x = 1/3. The input 1/3 has the unique decimal representation

0.333 · · ·

and as an input stream is

a = 0, 3, 3, 3, . . . .

However, the output f (1/3) = 3 · 1/3 = 1 has two decimal representations

0.999 · · · or 1.000 · · · ,

and so the output stream is either

F (a) = 0, 9, 9, 9, . . . or F (a) = 1, 0, 0, 0, . . . .

The argument divides into two cases.

Case 1 F (a) = 0 , 9 , 9 , 9 , . . ..

By the assumption that F is finitely determined, for k = 0 there is l such that for any stream b

a(0) = b(0) = 0 and a(i) = b(i) = 3 for 1 ≤ i ≤ l ⇒ F (a)(0) = F (b)(0) = 0 .

Choose a stream b with 9 at position l + 1 and thereafter, i.e.,

b = 0, 3, 3, 3, . . . , 3, 9, 9, 9, . . . .

By choice, a(i) = b(i) for 0 ≤ i ≤ l . However, since

3 ∗ 0.333 · · · 3999 · · · = 1.000 · · · 02 · · ·

we see that F (b)(0 ) = 1 ; in particular, although a and b agree up to l ,

0 = F (a)(0) ≠ F (b)(0) = 1.

This contradicts the assumption that F is finitely determined.


Case 2 F (a) = 1 , 0 , 0 , 0 , . . ..

A similar argument leads to a contradiction. (Exercise.) □

Corollary The function f : R → R defined by f (x ) = 3x cannot be computed by an algorithm using the decimal representation of the real numbers.

Proof Suppose there was an algorithm that computed f (x ) = 3x using decimal representations. Then the input-output behaviour of the algorithm restricted to [0 , 1 ] can be defined by a stream transformation F on streams [N → {0 , 1 , . . . , 9}]. Because the algorithm determines the n th decimal place in finitely many steps l , the stream transformer F is finitely determined. However, by the Theorem, no such finitely determined stream transformer exists. □

Thus, given the Thesis, very little can be computed using infinite decimals.
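The obstacle exploited in the proof can be seen numerically. This Python sketch is our own illustration, not the book's: every finite prefix 0.33···3 of the input 1/3 is consistent with extensions whose product with 3 lies on either side of 1, so no finite prefix determines the first output digit of 3x.

```python
# A sketch (ours) of the obstacle in the proof: the prefix 0.33...3 of 1/3
# can be extended with 0s (product below 1) or with 9s (product above 1).

from fractions import Fraction

l = 8
lower = Fraction(int("3" * l), 10 ** l)   # 0.33333333, the prefix extended with 0s
upper = lower + Fraction(1, 10 ** l)      # 0.33333334, i.e. the prefix extended with 9s

assert lower < Fraction(1, 3) < upper     # both extensions share the prefix of 1/3
assert 3 * lower < 1                      # here the output must start 0.9...
assert 3 * upper > 1                      # here the output must start 1.0...
```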

6.5 Space and Data: A Data Type of Spatial Objects

In Section 6.4 we met streams. Streams are

data distributed in time,

and typical examples of streams are the input and output of interactive systems. We constructed a simple algebra to model a data type of streams.

Now, in this section, we will model

data distributed in space.

Here are some examples to shape our thinking:

• In computer graphics, data in three dimensions for transformation and visualisation (e.g., the data from medical scanners such as CT, NMR).

• In scientific modelling, states of a physical system, including both computed data from mathematical models and output from measuring instruments.

• In data storage media, states of both hardware (e.g., memories and disks) and software (e.g., arrays and other data structures).

Space and time are combined in animations of graphical data, in simulations of physical systems, and in the operation of storage media.

To model data distributed in space we will define the idea of a

spatial object or spatial data field

and define operations to make data types of spatial objects. We will apply this general concept by making an algebra of use in Volume Graphics.


6.5.1 What is space?

Ideas of space are as complicated as notions of time. The ideas have great philosophical, physical and mathematical depth. In simple terms, we think of a space as a set of points. We often equip the space with some form of coordinate or addressing system to locate points. Indeed, it is common to identify points with their representation by coordinates in a coordinate system for the space. For example, commonly, we think of a point in the 2 dimensional plane as a pair (x , y) of numbers in an orthogonal coordinate system. In particular, we think of a point in terms of some representation of the plane. Of course, there are infinitely many coordinate systems to choose from: we can simply vary the choice of origin, or the direction of the x-axis. With each choice, the pair (x , y) of numbers will represent a different point. More radically, we can drop the condition that the axes are at right-angles, or are even straight lines. For many tasks we use polar coordinates. There are some more examples of systems in Figure 6.7.
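The dependence of the pair (x, y) on the chosen coordinate system can be illustrated concretely. This Python sketch is our own illustration, not the book's: it converts between Cartesian and polar representations of the same point of the plane.

```python
# A sketch (ours): the same point of the plane has different representations
# in different coordinate systems. Here we convert between Cartesian
# coordinates (x, y) and polar coordinates (r, theta).

import math

def cartesian_to_polar(x, y):
    return (math.hypot(x, y), math.atan2(y, x))

def polar_to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

# The Cartesian point (1, 1) is (sqrt(2), pi/4) in polar coordinates.
r, theta = cartesian_to_polar(1.0, 1.0)
assert abs(r - math.sqrt(2)) < 1e-12
assert abs(theta - math.pi / 4) < 1e-12

# Converting back recovers the original pair.
x, y = polar_to_cartesian(r, theta)
assert abs(x - 1.0) < 1e-12 and abs(y - 1.0) < 1e-12
```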


6.5. SPACE AND DATA: A DATA TYPE OF SPATIAL OBJECTS

Space                                      Coordinate/Addressing system

2 dimensional continuous physical space    Real number coordinate system based on
                                           • Cartesian coordinates, or
                                           • polar coordinates.

3 dimensional continuous physical space    Real number coordinate system based on
                                           • Cartesian coordinates,
                                           • polar coordinates, or
                                           • cylindrical coordinates.

2 dimensional discrete physical space      Integer coordinate system based on
                                           • Cartesian grid coordinates.

3 dimensional discrete physical space      Integer coordinate system based on
                                           • Cartesian grid coordinates.

Data storage media                         Indexing addresses based on integers and strings.

Internet space                             Addresses such as URLs and IP addresses, based on integers and strings.

Static and mobile telephony space          Addresses such as phone and device numbers and physical coordinates, based on integers and strings.

Figure 6.7: Spaces and their coordinates.


The idea of space has been analysed in great depth in mathematics. It started with the ancient Greek view of geometry expounded in the 13 books of Euclid (c.300 BC). Here the points and lines, the circles and other objects they make up are created by algorithmic operations with ruler and compass. In the 17th century, a new view of space emerged that was essentially algebraic: in coordinate geometry, axes were used, objects like conic sections were defined by equations, and calculations were performed using the differential calculus. Again and again, new views of space are discovered and new theories are developed.

For our simple purposes we can define the notion of space rather simply:

Definition (Space) A space is a non-empty set X , the elements of which are called points.

Depending on the example or application we may call points locations or sites. We will also identify spaces with coordinate systems.

Example Continuous Space Usually, we represent the 2 dimensional continuous plane by taking X = R2 where

R2 = {(x, y) | x, y ∈ R},

and 3-dimensional space by taking X = R3 where

R3 = {(x, y, z) | x, y, z ∈ R}.

We represent the n-dimensional continuous space by taking X = Rn where

Rn = {(x1, . . . , xn) | x1, . . . , xn ∈ R}.

Discrete Space We represent a discrete plane by the 2-dimensional integer grid, taking X = Z2 where

Z2 = {(i, j) | i, j ∈ Z}.

We represent a discrete 3 dimensional space by the 3-dimensional integer grid, taking X = Z3 where

Z3 = {(i, j, k) | i, j, k ∈ Z}.

We represent the n-dimensional discrete space by taking X = Zn where

Zn = {(x1, . . . , xn) | x1, . . . , xn ∈ Z}.

Address Space We can represent the space X of addresses of a 1-dimensional array by taking

X = N or X = Z.

Similarly, the address space of a 2 dimensional array by taking

X = N2 or X = Z2 ,

the address space of a 3 dimensional array by taking

X = N3 or X = Z3 ,

and the address space of an n dimensional array by taking

X = Nn or X = Zn .


Identifiers In high level programming, variables and identifiers can be chosen to suit the program at hand. Typically, they are strings of symbols chosen from the English alphabet and digits. The variables specify locations where data is stored. In particular, the set X can be seen as a space of names that address a memory.

Since the late 19th century, ideas about space start from the idea of a set. Hence the ideas are abstract and general. Thinking about sets of points abstractly, without using a coordinate system representation, leads to concepts such as

metric spaces,

which are based on axiomatically specifying the process of measuring the distance between points, or the more abstract

topological spaces,

which are based on axiomatising the idea of subsets of neighbouring points. With such abstract views of space we can more thoroughly analyse ideas of space in ways that are independent of their representation in coordinate systems.

6.5.2 Spatial objects

Physical processes and abstract objects are modelled or measured by assigning some data to each point of a space. The basic idea of a spatial object is that any kind of data can be distributed all over any chosen space X.

Definition (Spatial object) Let X be a non-empty set representing a space. Let A1, . . . , Am be sets of data. This data we call the attributes of the points in the space. Let

A = A1 × · · · × Am .

Then a spatial data object or spatial data field is represented by a map

o : X → A,

which assigns attributes to points, i.e.,

o(x) = the m data attributes characterising the object at point x ∈ X .

Let O(X, A) be the set of all spatial objects.

Since A = A1 × · · · × Am there are m attributes of o(x), each of which depends on the point x. Specifically, there are m functions

a1 : X → A1 , . . . , am : X → Am ,

to compute the m attributes independently, and we can write

o(x) = (a1 (x), . . . , am(x)).

These functions a1 , . . . , am we call the attribute fields or components of the spatial object o.

This idea of a spatial object is very general.
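As an informal sketch of this definition (our own illustration, not part of the text's formalism; the space X = R2 and the particular fields a1, a2 are invented for the example), a spatial object can be programmed directly as a function from points to attribute tuples:

```python
# Sketch: a spatial object o : X -> A with X = R^2 and A = A1 x A2.
# The two attribute fields a1, a2 are invented for this illustration.

def a1(x, y):
    # attribute field a1 : X -> A1, the distance of (x, y) from the origin
    return (x * x + y * y) ** 0.5

def a2(x, y):
    # attribute field a2 : X -> A2, equal to 1.0 on the closed unit disc, 0.0 outside
    return 1.0 if x * x + y * y <= 1.0 else 0.0

def o(x, y):
    # the spatial object assigns to each point its tuple of attributes:
    # o(x) = (a1(x), a2(x))
    return (a1(x, y), a2(x, y))
```

Sampling o at a point yields the attributes there; for instance, o(0.0, 0.0) returns the pair (0.0, 1.0).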


Example An RGB Colour Model Consider data for the visualisation of 3-dimensional continuous space. Let the space X be 3-dimensional space R3. Suppose that for each point p ∈ R3 there are 4 attributes: one representing visibility by

op the opacity of the point p in space;

and three representing colour by

r the red value of the point p in space;

g the green value of the point p in space;

b the blue value of the point p in space.

The opacity op is measured on a real number scale from 0 to 1, i.e., the interval

[0, 1] = {r ∈ R | 0 ≤ r ≤ 1}.

Here the value 0 means that the point is transparent and the value 1 that it is opaque; values in between mean the point is translucent. The three colour values can also be measured using real numbers in the interval [0, 1].

So, to specify the appearance of a point p ∈ R3, we have a 4-tuple

(op, r, g, b)

of real numbers, each from the interval [0, 1]. Thus, in this case, with X = R3 as the space, and A = [0, 1]4 as the set of attributes, a spatial data object

o : X → A

is a map

o : R3 → [0, 1]4

defined at each point p ∈ R3 by its attribute fields

o(p) = (op(p), r(p), g(p), b(p)).

n-dimensional attributes Let the space X be 3-dimensional Euclidean space R3. Suppose there are n > 0 attributes for each point p in space, and each attribute is also a real number; then let A be n-dimensional space Rn. Here we have an n-tuple of real numbers as data measuring some properties at point p. Thus, in this case, each spatial data object

o : X → A

is a map

o : R3 → Rn

defined by n attribute fields

a1, . . . , an : R3 → R

at each point p ∈ R3 by

o(p) = (a1(p), . . . , an(p)).


States of systems Consider any system or process extended in space and changing in time. The behaviour of the system is commonly described using states. States provide snapshots of the system at time instants by specifying relevant properties of each point in space. A sequence of states is used to describe the behaviour of the system as it operates or evolves in time. If X is a space and A is the data of interest in the model of the system then a state is a map of the form σ : X → A. Thus, states can be seen as an example of a spatial object.

Identifiers Let Var be a set of variables and let A be a set of data; then the assignment of data to variables has the form

a : Var → A.

The assignment of data A in memory to a space Var of names can also be seen as a spatial object.

6.5.3 Operations on Spatial Objects

Next we look at some operations that transform spatial objects. The combination of some set of spatial objects and some operations is a data type and is modelled by an algebra. These data types are designed to help process spatial objects in particular applications.

The simplest kind of operation on objects is one derived from an operation on attributes, or on space.

Suppose we have some binary operation

f : A2 → A

on the data attributes. Then we can define a new binary operation

F : O(X ,A)2 → O(X ,A)

on spatial data objects by

“applying the operation f to the data at all points of the space X ”.

This is called a point-wise extension or lifting of the operation f and is defined as follows.

Definition (Point-wise Attribute Operations) Suppose

o1 : X → A and o2 : X → A

are any objects. Then we define the new object

F (o1 , o2 ) : X → A

by applying f to the data at each point x ∈ X as follows:

F (o1 , o2 )(x ) = f (o1 (x ), o2 (x ))


Example Let X be any space. Let the attributes A = [0, 1]. Let the function f on the attributes be

max : [0 , 1 ]× [0 , 1 ]→ [0 , 1 ]

defined by

max(s1, s2) = s1 if s1 ≥ s2;
              s2 if s1 < s2.

Then the point-wise extension

Max : O(X , [0 , 1 ])2 → O(X , [0 , 1 ])

of max is defined for all spatial objects o1 , o2 ∈ O(X , [0 , 1 ]) at point x ∈ X by

Max (o1 , o2 )(x ) = max (o1 (x ), o2 (x )).

The function Max simply makes a new spatial object Max(o1, o2) which has the highest value of the two spatial objects o1 and o2 at each point in X. We will use this operation on opacity shortly.
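The lifting construction is easily expressed as a program. The following Python sketch (our own illustration; the helper lift and the sample opacity fields are invented) returns the point-wise extension F of any binary attribute operation f:

```python
# Sketch: the point-wise extension (lifting) of a binary attribute
# operation f : A^2 -> A to F : O(X, A)^2 -> O(X, A).
# Spatial objects are represented as Python functions from points to attributes.

def lift(f):
    # F(o1, o2) is the spatial object with F(o1, o2)(x) = f(o1(x), o2(x))
    def F(o1, o2):
        return lambda x: f(o1(x), o2(x))
    return F

def max01(s1, s2):
    # the operation max on [0, 1] from the example
    return s1 if s1 >= s2 else s2

Max = lift(max01)

# two illustrative opacity fields on X = R (our own choices):
o1 = lambda x: 0.25                 # a constant field
o2 = lambda x: min(1.0, abs(x))     # opacity growing with distance from 0

combined = Max(o1, o2)              # a new spatial object
```

At each point, combined takes whichever of the two opacities is larger, exactly as the definition of Max prescribes.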

In the definition above we have used the notation A for the attributes. In most cases, this set A is built from m attributes and

A = A1 × · · · × Am .

Although the definition of the lifting is straightforward, as in the above definition, this fact complicates the form of the operations f. Clearly, substituting for A, the operation

f : A2 → A

is actually of the form

f : (A1 × · · · × Am)× (A1 × · · · × Am)→ (A1 × · · · × Am)

This means that to define an operation f we define its m component operations

f1 : (A1 × · · · × Am) × (A1 × · · · × Am) → A1,
...
fm : (A1 × · · · × Am) × (A1 × · · · × Am) → Am,

which calculate the values of the m attributes of f. Thus, for a, b ∈ A,

f (a, b) = (f1 (a, b), . . . , fm(a, b)).

We will see some examples of this shortly.


6.5.4 Volume Graphics and Constructive Volume Geometry

In the world, the objects and scenes we see are 3 dimensional. Behind the surfaces of clothes and skin are flesh and bone. A cloud, or a fire, fills space with water, or hot gases. In computer graphics, historically and currently, standard techniques represent 3 dimensional objects using their 2 dimensional surfaces.

The subject of volume graphics assumes that all objects are represented fully in 3 dimensions. Objects and scenes are built and rendered using data at points or small volumes of 3 dimensional space, called voxels, a term intended to contrast with the 2 dimensional notion of pixel.

Volume graphics originates in techniques for the visualisation of 3 dimensional arrays of data. In medical imaging, physical objects are measured by scanning instruments that produce a data file of measurements that approximate a 3 dimensional spatial object. In clinical applications, the aim is to visualise these digitalisations of bone and tissue for diagnosis and treatment. In physiology, for example, carefully produced digitised hearts allow us to use algorithms to perform digital dissections to explore anatomical structure; they are also needed to explore function by computational simulations and experimental measurements.

Clearly, all kinds of operations are needed to process such data sets. A fundamental general task is to combine a number of data sets to make more complex scenes. In physiology, for example, an aim is to integrate data from dissection, simulation and experiment to model the whole heart beating. In this cardiac application alone there is plenty of scope for discovering new operations.

Here we introduce a little volume graphics to illustrate the ideas about data distributed in space. In particular, the 3 dimensional data sets are examples of spatial objects. We define some operations by point-wise extensions, and give an algebra of spatial objects for modelling in volume graphics and visualisation. Our aim is to show a simple data type at work in a very complex application.

However, in the case of volume graphics, there are many applications and so there are many possible attributes to use, operations to define, and algebras to make. Indeed, the creation of a wide variety of data types of spatial objects constitutes a very general, high-level, algebraic approach to Volume Graphics and is called Constructive Volume Geometry (CVG).

Spatial Objects

Why are spatial objects needed in volume graphics? Raw volumetric field data, such as a digitised heart, is discrete and bounded in 3 dimensional space. These properties turn out to be an obstacle to a consistent specification of operations on the data sets. In order to create data types with such operations, we introduce the idea of spatial objects, which, because they are defined at every point in 3 dimensional space, are more inter-operable.

We have already introduced some spatial objects for volume graphics in the example of Section 6.5.2. Recall that they are three dimensional objects with the four attributes of opacity, red, green and blue, and have the form

o : R3 → [0 , 1 ]4 .

Each spatial object o is made from a 4-tuple of attribute fields

op, r , g , b : R3 → [0 , 1 ],


and is defined at each point p ∈ R3 by

o(p) = (op(p), r(p), g(p), b(p)).

Let O(R3, [0, 1]4) be the set of all such objects.

The opacity field “implicitly” defines the “visibility” of the object. Any point p that is not entirely transparent, i.e., op(p) ≠ 0, is potentially visible to a rendering algorithm.

Operations

The aim is to define a CVG algebra of spatial objects based on the opacity and RGB model. Here is its signature:

signature RGB algebra

sorts sp obj

constants

operations
∪ : sp obj × sp obj → sp obj
∩ : sp obj × sp obj → sp obj
− : sp obj × sp obj → sp obj
+ : sp obj × sp obj → sp obj

Any algebra with this signature has a single sort sp obj and four binary operations. The algebra we are building interprets the sort sp obj with the set

O(R3 , [0 , 1 ]4 ).

The four basic CVG operators are called

union          ∪(o1, o2)
intersection   ∩(o1, o2)
difference     −(o1, o2)
blending       +(o1, o2)

and examples of their use are illustrated in Figure 6.8.



Figure 6.8: The seven images illustrate the effect of the four operations on the interiors of spatial objects. They are a cylinder c, a sphere s, and the effects of the operations ∪(c, s), ∩(c, s), +(c, s), −(c, s) and −(s, c).


Mathematically, each one of these four operators is defined on all spatial objects by constructing its opacity, red, green and blue attribute fields. These attribute fields are in turn defined by point-wise extensions of simple arithmetic operations on the interval [0, 1].

First, here are the six operations on the interval [0, 1] that we will need.

max(s1, s2) = s1 if s1 ≥ s2;
              s2 if s1 < s2.

sub(s1, s2) = max(0, s1 − s2).

min(s1, s2) = s1 if s1 ≤ s2;
              s2 if s1 > s2.

add(s1, s2) = min(1, s1 + s2).

select(s1, t1, s2, t2) = t1 if s1 ≥ s2;
                         t2 if s1 < s2.

mix(s1, t1, s2, t2) = (t1·s1 + t2·s2)/(s1 + s2) if s1 + s2 ≠ 0;
                      (t1 + t2)/2               if s1 + s2 = 0.

Each of these operations on [0, 1] can be applied all over the space R3 to create a point-wise extension which we will use as operations on the attribute fields. The first extension, of max to Max, we gave as an example in 6.5.3. The six point-wise extensions are defined as follows:

Let a1, a2, b1, b2 : R3 → [0, 1]. Let p ∈ R3.

Max (a1 , a2 )(p) = max (a1 (p), a2 (p))

Min(a1 , a2 )(p) = min(a1 (p), a2 (p))

Sub(a1 , a2 )(p) = sub(a1 (p), a2 (p))

Add(a1 , a2 )(p) = add(a1 (p), a2 (p))

Select(a1 , b1 , a2 , b2 )(p) = select(a1 (p), b1 (p), a2 (p), b2 (p))

Mix (a1 , b1 , a2 , b2 )(p) = mix (a1 (p), b1 (p), a2 (p), b2 (p))

The binary operations on the set O(R3, [0, 1]4) of spatial objects are created from these operations on attribute fields as follows:

Let o1 = (op1 , r1 , g1 , b1 ) and o2 = (op2 , r2 , g2 , b2 ) be two spatial objects.


∪(o1, o2) = (Max(op1, op2), Select(op1, r1, op2, r2), Select(op1, g1, op2, g2), Select(op1, b1, op2, b2))

∩(o1, o2) = (Min(op1, op2), Select(op1, r1, op2, r2), Select(op1, g1, op2, g2), Select(op1, b1, op2, b2))

−(o1, o2) = (Sub(op1, op2), r1, g1, b1)

+(o1, o2) = (Add(op1, op2), Mix(op1, r1, op2, r2), Mix(op1, g1, op2, g2), Mix(op1, b1, op2, b2))
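As a hedged sketch of how these definitions compute (our own Python encoding, not the CVG implementation), the four operators can be programmed directly over spatial objects represented as functions from points p to tuples (op, r, g, b):

```python
# Sketch: the four CVG operators over RGBA spatial objects
# o : R^3 -> [0,1]^4, represented as functions p -> (op, r, g, b).
# Operator and helper names follow the text; the encoding is ours.

def sub(s1, s2): return max(0.0, s1 - s2)
def add(s1, s2): return min(1.0, s1 + s2)
def select(s1, t1, s2, t2): return t1 if s1 >= s2 else t2

def mix(s1, t1, s2, t2):
    if s1 + s2 != 0.0:
        return (t1 * s1 + t2 * s2) / (s1 + s2)
    return (t1 + t2) / 2.0

def union(o1, o2):
    def o(p):
        op1, r1, g1, b1 = o1(p)
        op2, r2, g2, b2 = o2(p)
        return (max(op1, op2),
                select(op1, r1, op2, r2),
                select(op1, g1, op2, g2),
                select(op1, b1, op2, b2))
    return o

def intersection(o1, o2):
    def o(p):
        op1, r1, g1, b1 = o1(p)
        op2, r2, g2, b2 = o2(p)
        return (min(op1, op2),
                select(op1, r1, op2, r2),
                select(op1, g1, op2, g2),
                select(op1, b1, op2, b2))
    return o

def difference(o1, o2):
    def o(p):
        op1, r1, g1, b1 = o1(p)
        op2, _, _, _ = o2(p)
        return (sub(op1, op2), r1, g1, b1)
    return o

def blend(o1, o2):
    def o(p):
        op1, r1, g1, b1 = o1(p)
        op2, r2, g2, b2 = o2(p)
        return (add(op1, op2),
                mix(op1, r1, op2, r2),
                mix(op1, g1, op2, g2),
                mix(op1, b1, op2, b2))
    return o
```

For example, combining a fully opaque red object with a half-transparent blue one, union keeps the red appearance at full opacity, while difference keeps the red colour but halves the opacity.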

Since the spatial objects fill space, this allows the idea to accommodate objects defined mathematically or by digitalisation, whether they have a geometry or are amorphous. The operations transform the data everywhere but opacity plays an important role in deciding how objects are combined.

In Figure 6.8 we have an example of creating a scene by applying the operations. In Figure 6.9 we have an example of visualising the heart by applying the operations.

Figure 6.10 shows how a complex scene can be constructed by applying a series of operations.


Figure 6.9: Operations on heart data visualising fibres.


Figure 6.10: Tree structure of a CVG term and the scene defined by it.


Historical Notes and Comments for Chapter 6

Volume graphics is an alternative paradigm for computer graphics in which objects are represented by volumes instead of surfaces. An early manifesto for the generality of the volume approach is Kaufman et al. [1993]; an updated vision appears in Chen et al. [2000]. Constructive volume geometry (CVG) is a high level approach to volume graphics based on creating algebras of spatial objects: see Chen and Tucker [2000]. In CVG there are a number of data types of spatial objects and operations that can be used to put together images to form complex scenes. See http://www.swan.ac.uk/compsci/research/graphics/vg/cvg/index.html for more information and downloadable papers. CVG is a generalisation of the constructive solid geometry (CSG) of surface graphics. In CSG, solids are described by characteristic functions s : R3 → B and algebras are created to build complex solid objects from simpler components. That technique is well established in CAD applications.


Exercises for Chapter 6

1. Use the record construction to create algebras that model the following data types:

(a) n-dimensional real space Rn ; and

(b) n-dimensional integer grid Zn .

2. Let Σ be a signature with n sorts, p constants and q operations. How many sorts, constants and operations does ΣArray have?

3. Design a signature and an algebra to model the stack data structure. Check that the operations satisfy the equation

Pops(Pushs(x, s)) = s.

Is the equation

Pushs(Pops(s), s) = s

true?

4. Design a signature Σ2−Array and a Σ2−Array-algebra A2−Array that model an interface and implementation of 2-dimensional arrays.

5. Design a signature ΣMatrix(n) and a ΣMatrix(n)-algebra AMatrix(n) that model an interface and implementation of n × n matrices over the real numbers.

6. Design a signature and an algebra to model each of the following data structures:

(a) lists of data;

(b) queues of data; and

(c) binary trees of data.

7. For each of the following equations over ΣSimpleFiles, state whether they are always true, always false, or sometimes true, in the algebra ASimpleFiles. For those equations which are not always true, give an example when it is true and when it is false.

(a) Read(i ,Empty) = (ε,Empty);

(b) Write(StringContent(Read((F , p), n)),FileContent(Read((F , p), n))) = (F , p + n)

(c) StringContent(Read((F, p), Succ(n)))
    = Prefix(StringContent(Read((F, p), Succ(Zero))),
             StringContent(Read((F, Succ(p)), n))).

8. Let X and Y be non-empty sets. Create an algebra that models the data type of all total functions

f : X → Y.

Compare your algebra with the algebras of arrays and streams.


9. Design a signature Σ and an algebra A that models a digital stopwatch that measures time intervals of 1/10th second, from 0 to 15 minutes. Suppose the stopwatch has stop, continue and reset buttons.

10. Design a signature Σ and an algebra A that models a digital alarm clock. Suppose the alarm clock displays hours, minutes and seconds, and rings at any time given by hours and minutes only.

11. Let ΣClock be the signature of the clock in Section 6.4.2. Define a ΣClock -algebra

AContinuous Clock

that models continuous time using the real numbers, rather than discrete time using the natural numbers. To calculate with continuous time, what new functions are useful?

What changes are needed in the construction of the ΣStream-algebra AStream from the Σ-algebra A to create an algebra of continuous streams over A?

12. Let A be a non-empty set and let T = {0, 1, 2, . . .} be a set of time cycles. Let

a = a(0), a(1), a(2), . . . , a(t), . . . and b = b(0), b(1), b(2), . . . , b(t), . . .

be any streams in [T → A]. Show that the following stream transformations are finitely determined; for each t, calculate l.

(a) Merge : [T → A]× [T → A]→ [T → A]

Merge(a, b) = a(0), b(0), a(1), b(1), . . . , a(t), b(t), . . .

(b) Insert : A× T × [T → A]→ [T → A]

Insert(x, t, a) = a(0), a(1), a(2), . . . , a(t− 1), x, a(t), a(t+ 1), . . .

(c) Forward : T × [T → A]→ [T → A]

Forward(t, a) = a(t), a(t+ 1), . . .

(d) Backward : T × A× [T → A]→ [T → A]

Backward(t, x, a) = x, x, x, . . . , x, a(0), a(1), . . .

where there are t copies of x before the elements a(0 ), a(1 ), . . . from a.
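These transformations can be sketched in Python by modelling a stream a ∈ [T → A] as a function from time cycles to data (an encoding of our own; the names are ours, and Insert is renamed insert_at to avoid clashing with ordinary usage):

```python
# Sketch: streams a in [T -> A] as Python functions from
# time cycles t = 0, 1, 2, ... to data values.

def merge(a, b):
    # Merge(a, b) = a(0), b(0), a(1), b(1), ...
    return lambda t: a(t // 2) if t % 2 == 0 else b(t // 2)

def insert_at(x, t0, a):
    # Insert(x, t0, a) = a(0), ..., a(t0 - 1), x, a(t0), a(t0 + 1), ...
    return lambda t: a(t) if t < t0 else (x if t == t0 else a(t - 1))

def forward(t0, a):
    # Forward(t0, a) = a(t0), a(t0 + 1), ...
    return lambda t: a(t0 + t)

def backward(t0, x, a):
    # Backward(t0, x, a) = x, ..., x (t0 copies), a(0), a(1), ...
    return lambda t: x if t < t0 else a(t - t0)
```

Each transformation computes the value of the output stream at cycle t from at most a bounded number of values of the input streams, which is the sense in which they are finitely determined.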

13. Write out the decimal expansions of

(a) 1/20,

(b) 1/60, and

(c) 1/19.


14. Prove that the decimal expansion of a rational number p/q, where q ≠ 0, must either be finite and end in zeros,

bm bm−1 · · · b1 b0 . a1 a2 · · · an 0 0 0 · · ·

or repeat, and end in a cycle,

bm bm−1 · · · b1 b0 . a1 a2 · · · an an+1 · · · an+k an+1 · · · an+k · · ·

In the latter case, the cycle an+1 · · · an+k has length k and repeats forever.

15. Show that the equality of real numbers in [0, 1] is not definable by a finitely determined stream predicate and cannot be computed.

16. Let X be a space. Let C be any set of coordinates or addresses for X. The association of coordinates to points is a mapping

α : C → X

where

α(c) = x means c is a coordinate or address for x.

Thus, a coordinate system is a pair (C, α). What conditions on α are needed to ensure that

(a) Every point x has at least one coordinate c?

(b) Every point x has one and only one coordinate c?

(c) Given any point, can its coordinates be calculated?

17. Let X be a space. Suppose that C1 and C2 are two different coordinate systems for X with functions

α1 : C1 → X and α2 : C2 → X

associating coordinates to points. We define the systems (C1, α1) and (C2, α2) to be equivalent if there are two mappings: a function

f : C1 → C2

that transforms each C1 coordinate of a point to some C2 coordinate and, conversely, a function

g : C2 → C1

that transforms each C2 coordinate of a point to some C1 coordinate. These two transformation conditions are expressed by the equations

for all c ∈ C1: α1(c) = α2(f(c))
for all d ∈ C2: α2(d) = α1(g(d))

which are depicted in the commutative diagrams shown in Figure 6.11:

In what circumstances does

g(f(c)) = c and f(g(d)) = d

for all c and d , i.e., are the transformations f and g inverses to one another?

In the case that X is the 2 dimensional plane, by constructing f and g , show that



Figure 6.11: Requirement for coordinate systems C1 and C2 to be equivalent.

(a) Rectangular coordinate systems based at origins p and q are equivalent.

(b) Rectangular coordinate systems based at the same origin p but at different directions are equivalent.

(c) Rectangular and polar coordinates at the same origin are equivalent.

18. Consider data for the visualisation of 3-dimensional discrete space. Let the space X be the 3-dimensional grid Z3. Suppose there are 4 attributes of each point p in space representing visibility and colour as follows:

op the opacity of the point p in space;

r the red value of the point p in space;

g the green value of the point p in space;

b the blue value of the point p in space.

The three colour values are reals. The opacity is measured on a scale from 0 (transparent) to 1 (opaque). Then let A be the set [0, 1] × R3. Here we have a 4-tuple of real numbers as data measuring the appearance at point x ∈ X.

Thus, in this case, a spatial data object

o : X → A

is a map

o : Z3 → ([0, 1] × R3)

defined at each point (i, j, k) ∈ Z3 in the form:

o(i, j, k) = (op, r, g, b).

Describe a method of extending the discrete space object o to a continuous space object

o′ : R3 → ([0, 1] × R3)

coincident with o.

19. Suppose we have some transformation

t : X → X


of the space. We can define a new operation

T : O(X ,A)→ O(X ,A)

on spatial data objects by applying the transformation on the underlying space as follows. For any spatial object

o : X → A

the new object

T(o) : X → A

is defined by

T(o)(x) = o(t(x)).
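This construction, too, is easy to program. A Python sketch of the lifting (our own illustration; the particular translation and scalar object are invented):

```python
# Sketch: lifting a space transformation t : X -> X to spatial objects,
# T(o)(x) = o(t(x)), with X = R^2 as the space.

def lift_space(t):
    # T : O(X, A) -> O(X, A) induced by the transformation t of the space
    return lambda o: (lambda x: o(t(x)))

translate = lambda p: (p[0] - 1.0, p[1] - 2.0)   # a translation of the plane
o = lambda p: p[0] + p[1]                        # a simple scalar spatial object

moved = lift_space(translate)(o)
```

The new object samples o at the translated point, so moved((1.0, 2.0)) equals o((0.0, 0.0)).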

Taking X to be the 2 dimensional plane, apply the following transformations to the spatial object: < To be completed. >

20. Show that the union ∪(o1, o2) and intersection ∩(o1, o2) operations on spatial objects are commutative:

∪(o1 , o2 ) = ∪(o2 , o1 )

∩(o1 , o2 ) = ∩(o2 , o1 )

and associative:

∪(o1 ,∪(o2 , o3 )) = ∪(∪(o1 , o2 ), o3 )

∩(o1 ,∩(o2 , o3 )) = ∩(∩(o1 , o2 ), o3 ).


Chapter 7

Abstract Data Types and Homomorphisms

The mathematical theory of many-sorted algebras serves as a basis for a mathematical theory of data. In Chapters 3 and 4, the notion of a many-sorted algebra was introduced and defined formally, and a number of fundamental examples were presented. These algebras, and the algebras modelling data type constructions in Chapter 6, will be used to define data types in computations in Part III.

In this chapter, we will study the concept of an

abstract data type.

Every data type has an interface and a family of implementations, representations or realisations. The idea of an abstract data type focuses on those properties of a data type that are common to, or are independent of, its implementations. Abstractions are made by fixing an interface and cataloguing properties of the operations. The idea of abstracting from implementation details is natural since data exists independently of its representation in programs. The idea is also practically necessary since representations of specific data types are rarely the same amongst programmers.

We will motivate and explore the general concept largely by investigating the

abstract data type of natural numbers.

The data type of natural numbers is the most simple, beautiful and important in Computer Science. Many ideas about data and computation can be usefully tested on the naturals, as indeed this chapter will show. Indeed, theoretically, we will argue that all data types that can be represented on a computer must be representable with natural numbers. In Chapter 9, we will investigate the abstract data type of real numbers.

We can think of a signature Σ as an interface and a many-sorted Σ-algebra A as a mathematical model of a specific and concrete representation, or implementation, of a data type. Thus, we can think of a data type as a class of many-sorted algebras. To explore the idea of an abstract data type, we can investigate properties that are common to the algebras in a class modelling a family of data type implementations. In particular, we think of an abstract data type as a class of many-sorted Σ-algebras for which the details of the algebras are hidden.

We will consider the question:


When are two representations, or implementations, of a data type with a common interface equivalent?

This becomes the mathematical question:

When are two many-sorted Σ -algebras equivalent?

We will introduce the mathematical notion of

Σ -homomorphism

to compare Σ -algebras. Simply put, a

homomorphism is a data transformation that preserves the operations on data.

From the notion of Σ -homomorphism, we can derive the notion of

Σ-isomorphism

which, simply put, is a

data transformation that can be reversed.

These notions are the key to the problem of how to abstract from implementations and to define precisely the notion of an abstract data type.
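For a first taste of the idea (an invented illustration of ours, not an example from the text), consider two implementations of a signature with a constant zero and a successor operation: naturals as machine integers, and naturals as unary tally strings. A map between them that preserves both operations is a homomorphism:

```python
# Sketch: two implementations of a signature {zero, succ} and a
# homomorphism between them.  The example and names are ours.

def succ_A(n):
    # implementation A: naturals as Python ints, with zero_A = 0
    return n + 1

def succ_B(w):
    # implementation B: naturals as unary tally strings, with zero_B = ""
    return w + "|"

def phi(n):
    # phi : A -> B sends each int to its unary representation
    return "|" * n

# phi preserves the operations of the signature:
#   phi(zero_A) = zero_B   and   phi(succ_A(n)) = succ_B(phi(n))
```

Here phi is in fact reversible, so the two implementations are equivalent in the sense the chapter goes on to make precise.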

A second question is

How do we characterise, or specify uniquely, a family of representations or implementations of a data type?

This becomes the mathematical question:

How do we characterise, or specify uniquely, a family of Σ -algebras?

We will use the axiomatic method, introduced in Chapter 4, to postulate properties of the operations of a data type that its implementations must possess. Specifically, we will answer these questions for the data type of the natural numbers.

We will focus on the simplest operations on natural numbers. We choose a signature containing names for the constant 0 and the successor operation n + 1, and present three axioms to govern their application. The most important axiom is

the principle of induction

which is based on the generation of all natural numbers by zero and successor. We prove Dedekind's Theorem which says, informally, that

All Σ-algebras satisfying the three axioms are Σ-isomorphic to one another and, in particular, to the standard Σ-algebras of natural numbers in base 10 and 2.

The techniques and results are those of Richard Dedekind (1831–1916) who published them in Was sind und was sollen die Zahlen? in 1888. The little book Dedekind [1888a] is inspiring to study. Dedekind's results were also obtained independently by Giuseppe Peano (1858–1932) to whom they are often attributed.

To explore further the power of the concept of a Σ -homomorphism, we use it to undertake another basic theoretical investigation. We ask:


What does it mean to implement a data type?

This becomes the mathematical question:

What does it mean to implement a Σ -algebra?

We will use homomorphisms to answer this question and, using computability theory on the natural numbers, we answer the question:

What data types can be implemented on a computer?

Homomorphisms are vitally important in our theory of data. It is not hard to see why this is so. Whenever we encounter data, we find we have

one interface and many implementations

which means that when modelling data, we have

one signature Σ and many Σ -algebras.

To compare implementations, we must compare Σ -algebras. This is done using Σ -homomorphisms.

However, it is not easy to learn and master these mathematical ideas. A rigorous understanding of the data types of natural and real numbers emerged in the late nineteenth century, some 2300 years after Euclid’s great work surveyed the Greeks’ excellent knowledge of geometry and arithmetic.

The mathematical theory of algebras and homomorphisms is abstract and precise. Things that are abstract are hard to learn. Things that are precise are hard to learn. But, having learnt them, they are wonderful tools for analysis and understanding.

In contrast, things that are concrete or imprecise are easier to learn, but poorer tools for analysis and comprehension.

We conclude this chapter with a taste of the mathematical theory of algebras and homomorphisms. Our objective is to understand, in much greater detail, the implications of having a Σ -homomorphism

φ : A→ B

between two Σ -algebras A and B.

We will examine the set im(φ) of data in B that is computed by φ. It forms a Σ -subalgebra of B, and is called the image of φ. The “size” of im(φ) measures how close φ is to a surjection.

We will examine the extent to which φ identifies elements of A using an equivalence relation ≡φ on A called the kernel of φ. Using the kernel, we build a new Σ -algebra

A/ ≡φ

which is called the factor or quotient algebra of A by ≡φ. The size of A/ ≡φ measures how close φ is to an injection.

These technical ideas are combined in the main theorem:

Homomorphism Theorem A/ ≡φ is isomorphic to im(φ).

This fact will be applied here and in later chapters.
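To give a first taste of the theorem, here is a small Python sketch (the finite example, the map φ(x) = 4x mod 12 and all names are our own illustration, not taken from the text) that computes the image and the kernel classes of an additive homomorphism on Z12 and checks that the quotient and the image have the same size.

```python
# Hypothetical finite instance of the Homomorphism Theorem:
# phi : Z12 -> Z12, phi(x) = (4 * x) % 12 preserves addition mod 12.
A = list(range(12))
phi = lambda x: (4 * x) % 12

image = sorted({phi(a) for a in A})          # im(phi)

# Kernel equivalence: a ~ b  iff  phi(a) == phi(b); collect the classes.
classes = {}
for a in A:
    classes.setdefault(phi(a), []).append(a)
quotient = list(classes.values())            # A / ~phi

# The theorem predicts a bijection [a] -> phi(a) between A/~phi and im(phi),
# so the two have the same number of elements; phi also preserves addition.
assert len(quotient) == len(image)
for a in A:
    for b in A:
        assert phi((a + b) % 12) == (phi(a) + phi(b)) % 12
print(image)  # [0, 4, 8]
```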

208 CHAPTER 7. ABSTRACT DATA TYPES AND HOMOMORPHISMS

7.1 Comparing Decimal and Binary Algebras of Natural Numbers

Let us begin by examining a problem that is easy to state and obviously important. Each of the following algebras may be thought of as a distinct implementation of the natural numbers:

• the algebra of natural numbers in base 10 or decimal form;

• the algebra of natural numbers in base 2 or binary form;

• the algebra of natural numbers in base b > 0 ;

• the algebra of natural numbers in 1s complement;

• the algebra of natural numbers in 2s complement;

• the algebra of natural numbers in modulo n arithmetic in a specific representation.

Given that there are many ways of representing the natural numbers, the question arises naturally as to:

When are two different representations of the natural numbers equivalent?

We can formulate this as the question:

When are two algebras of natural numbers equivalent?

Obviously, we will expect most of these algebras of natural numbers with common operations to be equivalent, but in what precise sense are they equivalent?

We also expect that modulo arithmetic over different factors will not be the same, i.e., using the sets

Zn = {0 , 1 , 2 , . . . , n − 1} and Zm = {0 , 1 , 2 , . . . , m − 1}

with m ≠ n, as a basis for the natural numbers, will not produce the same algebras. So our immediate problem is to provide criteria for deciding how two many-sorted algebras of natural numbers can be compared.

Consider two simple algebras of natural numbers with standard operations given by the signature:

signature Ordered Naturals

sorts nat , bool

constants 0 : → nat
          true, false : → bool

operations succ : nat → nat
           add : nat × nat → nat
           mult : nat × nat → nat
           eq : nat × nat → bool
           gt : nat × nat → bool


but with different number representations, namely the numbers in decimal notation

Adec = (Ndec,B; 0dec, tt ,ff , succdec, adddec,multdec, eqdec, gtdec),

where Ndec = {0 , 1 , 2 , 3 , . . .}, and the numbers in binary notation

Abin = (Nbin ,B; 0bin , tt ,ff , succbin , add bin ,multbin , eqbin , gtbin),

where Nbin = {0 , 1 , 10 , 11 , . . .}.

7.1.1 Data Translation

To compare these two algebras, and indeed to show they are equivalent, we use translation functions:

α : Ndec → Nbin

so that for each d ∈ Ndec,

α(d) = the binary number corresponding with decimal number d

and, conversely,

β : Nbin → Ndec

so that for each b ∈ Nbin ,

β(b) = the decimal number corresponding with binary number b.

Since we are aiming to demonstrate equivalence, we expect that these translations are reversible: for all d ∈ Ndec

β(α(d)) = d

and for all b ∈ Nbin

α(β(b)) = b.

As we shall see shortly, these properties of the functions α and β concerning translation amount to the fact that α and β are bijections between the sets Ndec and Nbin .
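As an illustration, the translations α and β can be sketched in Python (our choice of language; representing numerals as strings is our own device) and the reversibility equations checked on a sample:

```python
# A sketch of the translation functions alpha and beta between decimal
# and binary numerals, represented here as Python strings.
def alpha(d: str) -> str:
    """Decimal numeral -> binary numeral."""
    return bin(int(d, 10))[2:]

def beta(b: str) -> str:
    """Binary numeral -> decimal numeral."""
    return str(int(b, 2))

# Reversibility: beta(alpha(d)) = d and alpha(beta(b)) = b.
for n in range(100):
    d = str(n)
    assert beta(alpha(d)) == d
    b = bin(n)[2:]
    assert alpha(beta(b)) == b
```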

7.1.2 Operation Correspondence

However, this translation of data, exchanging decimal and binary notations, is not sufficient. Since data are characterised by their operations as well as by their representation, we require that the operators are related appropriately. For example, addition and multiplication of decimal numbers must correspond with the addition and multiplication of binary numbers.

Consider first the translation function α from decimal to binary. For example, take the successor operator. We expect that for any d ∈ Ndec,

Successor Equation α(succdec(d)) = succbin(α(d)).

This equation can be depicted in the commutative diagram shown in Figure 7.1.

Similarly, for the addition and multiplication operators, we expect that for all d1 , d2 ∈ Ndec,

Addition Equation α(adddec(d1 , d2 )) = addbin(α(d1 ), α(d2 ))
Multiplication Equation α(multdec(d1 , d2 )) = multbin(α(d1 ), α(d2 )).


Figure 7.1: Commutative diagrams illustrating the correspondence between the operations mapping from decimal to binary representations of the natural numbers.

For the relations, we expect that for all d1 , d2 ∈ Ndec,

Equality Equation eqdec(d1 , d2 ) = eqbin(α(d1 ), α(d2 ))
Order Equation gtdec(d1 , d2 ) = gtbin(α(d1 ), α(d2 ))

The same properties, expressed by the same type of equations, should also hold for the translation function β from Nbin to Ndec. Thus, we expect that for all b, b1 , b2 ∈ Nbin ,

Successor Equation β(succbin(b)) = succdec(β(b))
Addition Equation β(addbin(b1 , b2 )) = adddec(β(b1 ), β(b2 ))
Multiplication Equation β(multbin(b1 , b2 )) = multdec(β(b1 ), β(b2 ))
Equality Equation eqbin(b1 , b2 ) = eqdec(β(b1 ), β(b2 ))
Order Equation gtbin(b1 , b2 ) = gtdec(β(b1 ), β(b2 ))

This is illustrated in Figure 7.2.

These properties of α and β concerning operations make α and β into so-called homomorphisms. Homomorphisms are translations between algebras that preserve the operations. Since α and β are inverses to one another, they are termed isomorphisms.

These types of criteria are appropriate for comparing any pair of algebras of numbers. We now investigate these mathematical properties in general, for any algebra, only later returning to the problem of comparing Ndec and Nbin .
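The operation correspondence equations for α can likewise be checked mechanically on a finite sample. A Python sketch (the helper names `succ_dec`, `add_bin`, etc. are our own):

```python
# Checking the Successor, Addition, Equality and Order Equations for alpha
# on a finite sample of decimal numerals (strings, as before).
def alpha(d: str) -> str:
    return bin(int(d))[2:]

succ_dec = lambda d: str(int(d) + 1)
succ_bin = lambda b: bin(int(b, 2) + 1)[2:]
add_dec  = lambda d1, d2: str(int(d1) + int(d2))
add_bin  = lambda b1, b2: bin(int(b1, 2) + int(b2, 2))[2:]

for m in range(20):
    for n in range(20):
        d1, d2 = str(m), str(n)
        assert alpha(succ_dec(d1)) == succ_bin(alpha(d1))
        assert alpha(add_dec(d1, d2)) == add_bin(alpha(d1), alpha(d2))
        # Relations land in the shared Booleans, so no translation is applied:
        assert (int(d1) == int(d2)) == (int(alpha(d1), 2) == int(alpha(d2), 2))
        assert (int(d1) > int(d2)) == (int(alpha(d1), 2) > int(alpha(d2), 2))
```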


Figure 7.2: Commutative diagrams illustrating the correspondence between the operations mapping from binary to decimal representations of the natural numbers.

7.2 Translations between Algebras of Data and Homomorphisms

In the light of the comparison of algebras of natural numbers in Section 7.1, we will now present a series of definitions that allow us to compare any algebras. Specifically, for any signature Σ and any Σ -algebras A and B , we will define translations from the data of A to the data of B that preserve the operations named in Σ . These translations will be called

Σ -homomorphisms.

Of course, we will sometimes use Σ -homomorphisms with inverses which are also Σ -homomorphisms. These mappings will be called

Σ -isomorphisms

and will be treated later, in Section 7.3.

7.2.1 Basic Concept

Here is the general concept of a homomorphism.


Definition (Homomorphism) Let A and B be many-sorted algebras with common signature Σ . A Σ -homomorphism φ : A→ B from A to B is a family

φ = 〈φs | s ∈ S 〉

of mappings such that for each sort s ∈ S :

Data Translation

(i) φs : As → Bs ;

Operation Correspondence

(ii) the map φ preserves constants, i.e., for each constant c :→ s in Σ

Constant Equation for c φs(cA) = cB ;

(iii) the map φ preserves operations, i.e., for each operation

f : s(1 )× · · · × s(n)→ s

of Σ , and any a1 ∈ As(1 ), a2 ∈ As(2 ), . . ., an ∈ As(n) then

Operation Equation for f φs(f A(a1 , a2 , . . . , an)) = f B(φs(1 )(a1 ), φs(2 )(a2 ), . . . , φs(n)(an));

This Operation Equation is depicted in the commutative diagram of Figure 7.3.

Figure 7.3: Commutative diagram illustrating the Operation Equation.
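For finite algebras, the Operation Equation can be checked by brute force. The following Python sketch (the dictionary representation of signatures and algebras is our own device, not the book's notation) tests condition (iii) directly:

```python
from itertools import product

# A generic check of the Operation Equation on finite carriers (a sketch).
def is_homomorphism(ops, A_carriers, A_ops, B_ops, phi):
    """ops maps an operation name to (argument sorts, result sort)."""
    for name, (arg_sorts, res_sort) in ops.items():
        fA, fB = A_ops[name], B_ops[name]
        for args in product(*(A_carriers[s] for s in arg_sorts)):
            translated = tuple(phi[s](a) for s, a in zip(arg_sorts, args))
            if phi[res_sort](fA(*args)) != fB(*translated):
                return False
    return True

# Hypothetical example: x -> 2x from Z6 to Z12 preserves addition,
# since 2((x + y) mod 6) = (2x + 2y) mod 12.
ops = {"add": (("nat", "nat"), "nat")}
A = {"nat": range(6)}
A_ops = {"add": lambda x, y: (x + y) % 6}
B_ops = {"add": lambda x, y: (x + y) % 12}
phi = {"nat": lambda x: 2 * x}
print(is_homomorphism(ops, A, A_ops, B_ops, phi))  # True
```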

Commonly, our algebras have tests, so we have equipped our algebras with a standard copy of the Booleans B. Here, B occurs in both algebras A and B , and need not be transformed.

Definition (Homomorphism preserving Booleans) Let A and B be many-sorted algebras with the Booleans and common signature Σ . A Σ -homomorphism φ : A→ B from A to B is a family

φ = 〈φs | s ∈ S 〉

of mappings such that for each sort s ∈ S :

Data Translation


(i) φs : As → Bs ;

(ii) for the sort bool ∈ S , Abool = Bbool = {tt ,ff } and φbool : Abool → Bbool is the identity mapping;

Operation Correspondence

(iii) the map φ preserves constants, i.e., for each constant c :→ s in Σ

Constant Equation for c φs(cA) = cB ;

(iv) the map φ preserves operations, i.e., for each operation

f : s(1 )× · · · × s(n)→ s

of Σ , and any a1 ∈ As(1 ), a2 ∈ As(2 ), . . ., an ∈ As(n) then

Operation Equation for f φs(f A(a1 , a2 , . . . , an)) = f B(φs(1 )(a1 ), φs(2 )(a2 ), . . . , φs(n)(an));

and

(v) the map φ preserves relations, i.e., for each relation

r : s(1 )× · · · × s(n)→ bool

in Σ , and any a1 ∈ As(1 ), a2 ∈ As(2 ), . . ., an ∈ As(n) then, recalling condition (ii),

Relation Equation for r rA(a1 , a2 , . . . , an) = rB(φs(1 )(a1 ), φs(2 )(a2 ), . . . , φs(n)(an)).

To simplify notation in conditions (iv) and (v), let us define the product type

w = s(1 )× s(2 )× · · · × s(n)

and a function

φw : Aw → Bw

by

φw(a1 , a2 , . . . , an) = (φs(1 )(a1 ), φs(2 )(a2 ), . . . , φs(n)(an))

that simply combines the corresponding component mappings . . . , φs , . . . of φ. We can rewrite the Operation Equation as

φs(f A(a)) = f B(φw(a))

for all a = (a1 , . . . , an) ∈ Aw . This is an equation between functions,

φs ◦ f A = f B ◦ φw ;

and is displayed by the simple commutative diagram in Figure 7.4.

Condition (v) is the special case of (iv), taking s = bool and applying property (ii). For relations, the equation is

rA = rB ◦ φw

and the diagram in Figure 7.5 must commute.


Figure 7.4: Commutative diagram illustrating the preservation of operations under a homomorphism.

Figure 7.5: Commutative diagram illustrating the preservation of relations under a Boolean-preserving homomorphism.

7.2.2 Homomorphisms and Binary Operations

To understand the very general conditions (i) – (v) in the definition of a homomorphism, we need lots of illustrations. We will begin with some simple examples involving only a single constant, a binary operation and a binary relation. Binary operations and relations are operations and relations with two arguments. They occur everywhere and we have seen many in the concepts of group, ring and field.

Let ΣBinSys be the signature

signature Binary System

sorts s , bool

constants e : → s
          tt , ff : → bool

operations ◦ : s × s → s
           = : s × s → bool
           and : bool × bool → bool
           not : bool → bool

Here are some examples of ΣBinSys-algebras:

• the natural numbers, integers, rational and real numbers equipped with either e = 0 and ◦ = addition, or e = 1 and ◦ = multiplication;

• strings over any alphabet equipped with e = the empty string and ◦ = concatenation;


• subsets of a set with e = the empty set, and ◦ = union, intersection, or difference;

together with equality and the standard Boolean operations — some 11 types of algebra in all.

Suppose A and B are ΣBinSys-algebras. Applying the definition of a homomorphism that preserves the Booleans from Section 7.2.1, clause by clause, we obtain the following.

A ΣBinSys-homomorphism φ : A→ B is a family

φ = 〈φs , φbool〉

of two mappings, such that:

(i) φs : As → Bs and φbool : Abool → Bbool ;

(ii) φbool : {tt ,ff } → {tt ,ff } is defined by

φbool(tt) = tt and φbool(ff ) = ff ;

(iii) Constant Equation φs(eA) = eB ;

(iv) For all x , y ∈ As ,

Binary Operation Equation φs(x ◦A y) = φs(x ) ◦B φs(y);

(v) For all x , y ∈ As ,

Binary Boolean-valued Operation Equation φs(x =A y) = (φs(x ) =B φs(y));

or, more familiarly,

x =A y if, and only if, φs(x ) =B φs(y).

Note that in the presence of equality as a basic operation, the homomorphism is automatically an injection.
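This remark can be illustrated concretely. In the Python sketch below (a hypothetical example of ours), a candidate translation that merges two distinct elements necessarily violates the Equality Equation:

```python
# Why equality as a basic operation forces injectivity (a small sketch).
# If eq^A(x, y) must equal eq^B(phi(x), phi(y)), then a non-injective phi
# fails: for some x != y we have phi(x) == phi(y), so eq^A says ff
# while eq^B says tt.
A_s = [0, 1, 2, 3]
phi = lambda x: x % 2          # not injective: phi(0) == phi(2)

violations = [(x, y) for x in A_s for y in A_s
              if (x == y) != (phi(x) == phi(y))]
print(violations[:1])  # [(0, 2)] — the Equality Equation fails here
```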

7.2.3 Homomorphisms and Number Systems

We begin with some simple examples of homomorphisms between number systems. The number systems are simply algebras of real numbers or integers equipped with addition or multiplication. The algebras are all examples of Abelian groups (recall Section 5.5).

Let ΣGroup be the signature:

signature Group

sorts group

constants e : → group

operations ◦ : group × group → group
           −1 : group → group

The laws for the operations of an Abelian group are as follows:


axioms AbelianGroup

Associativity (∀x )(∀y)(∀z )[x ◦ (y ◦ z ) = (x ◦ y) ◦ z ]

Commutativity (∀x )(∀y)[x ◦ y = y ◦ x ]

Identity (∀x )[x ◦ e = x ]

Inverse (∀x )[x ◦ x−1 = e]

Example 1: Real Numbers with Addition

The addition and subtraction of real numbers are modelled by the ΣGroup-algebra

G = (R; 0 ; +,−)

which satisfies the axioms of an Abelian group. We give some examples of ΣGroup-homomorphisms φ : G → G .

Let λ ∈ R be any real number and define a mapping

φλ : R→ R

by

φλ(x ) = λ.x .

For instance, if λ = 2 , then we have defined a doubling operation

φ2 (x ) = 2 .x .

Lemma For any λ ∈ R,

φλ : G → G

is a ΣGroup-homomorphism.

Proof We must show that φλ preserves the constants and operations named in ΣGroup . We must check that three equations hold.

First, we consider the identity. Now

φλ(0 ) = λ.0 = 0 .

Let x , y ∈ R. Then,

φλ(x + y) = λ.(x + y) (by definition of φλ)
          = λ.x + λ.y (by the distributive law)
          = φλ(x ) + φλ(y). (by definition of φλ)

Thus, we have shown the validity of

Group Operation Equation φλ(x + y) = φλ(x ) + φλ(y).

Finally, we consider subtraction. For x ∈ R,


Inverse Equation φλ(−x ) = λ.(−x ) = −(λ.x ) = −φλ(x ). □

These homomorphisms φλ are important, both practically and theoretically. Practically, they represent a linear change of scale and can be generalised to linear transformations of space. Theoretically, they are the only homomorphisms that are also continuous functions on the real numbers.
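The lemma can be checked numerically on sample points. A Python sketch (sample-based, with a floating-point tolerance; a test, not a proof):

```python
import random

# phi_lambda(x) = lambda * x as a candidate Sigma_Group-homomorphism
# of (R; 0; +, -); we pick lambda = 2 and check the three equations.
lam = 2.0
phi = lambda x: lam * x

assert phi(0.0) == 0.0                                   # Identity Equation
for _ in range(100):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    assert abs(phi(x + y) - (phi(x) + phi(y))) < 1e-9    # Operation Equation
    assert abs(phi(-x) - (-phi(x))) < 1e-9               # Inverse Equation
```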

Recall that, roughly speaking, a function f : R→ R is continuous if its graph can be drawn “without taking the pencil off the paper”; see Figures 7.6 and 7.7. A technical definition is this:

Figure 7.6: Function f continuous on R.

Figure 7.7: Function f discontinuous at x = . . . ,−1 , 0 , 1 , 2 , . . ..

Definition (Continuous function) A function f : R → R is continuous if, for any margin of error ε > 0 , ε ∈ R, on the output of f , there exists a margin of error δ > 0 , δ ∈ R, on the input, such that for all x , y ∈ R,

|x − y | < δ implies |f (x )− f (y)| < ε.

Lemma Let ψ : (R; 0 ; +,−) → (R; 0 ; +,−) be any group homomorphism. If ψ is continuous, then there exists some λ ∈ R such that

ψ(x ) = φλ(x ) = λ.x

for all x ∈ R.

The proof is not difficult, but requires some properties of continuous functions on R that would be better explained elsewhere. There are uncountably many group homomorphisms from (R; 0 ; +,−) to itself, and some have remarkable properties, for instance:


There is a group homomorphism φ : (R; 0 ; +,−) → (R; 0 ; +,−) such that, for any open interval

(a, b) = {x ∈ R | a < x < b}

and any real number y ∈ R, there exists an x ∈ (a, b) such that

φ(x ) = y .

Such a homomorphism is severely discontinuous.

Example 2: Real Numbers — Comparing Addition and Multiplication

We compare the operations of addition and subtraction with those of multiplication and division on the real numbers. Let

G1 = (R; 0 ; +,−)

be the group of all real numbers based on addition. Let

G2 = (R+; 1 ; .,−1 )

be the group of all strictly positive real numbers, i.e.,

R+ = {x ∈ R | x > 0},

based on multiplication.

For any positive real number a, the exponential to base a is a map

expa : R→ R+

defined for x ∈ R by

expa(x ) = a^x .

Now the familiar index laws for exponentials are:

a^0 = 1

a^(x+y) = a^x . a^y

a^(−x) = 1/a^x

and they translate into the following equations

Identity Equation expa(0 ) = 1
Group Operation Equation expa(x + y) = expa(x ). expa(y)
Inverse Equation expa(−x ) = expa(x )−1 .

These are precisely the equations that state that

expa : G1 → G2 is a ΣGroup-homomorphism.

The logarithm to base a is a map

loga : R+ → R

which is also a ΣGroup-homomorphism loga : G2 → G1 , because for all x , y ∈ R+,


Identity Equation loga(1 ) = 0
Group Operation Equation loga(x .y) = loga(x ) + loga(y)
Inverse Equation loga(x−1 ) = − loga(x ).

Thus, the ideas that “exponentials turn addition into multiplication” and “logarithms turn multiplication into addition” are made precise by the statements that expa and loga are ΣGroup-homomorphisms. Of course, the fact that multiplication can be simulated by addition is an important discovery in the development of practical calculation; we will discuss this shortly, when we return to these examples in Section 7.3.
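The homomorphism equations for expa and loga can be checked numerically, say with a = 10. A Python sketch using the standard math library (floating point, hence approximate equality):

```python
import math

# exp10 : G1 -> G2 and log10 : G2 -> G1 as candidate homomorphisms.
exp10 = lambda x: 10.0 ** x
log10 = math.log10

assert exp10(0.0) == 1.0                                  # Identity Equation
x, y = 1.7, 2.9
assert math.isclose(exp10(x + y), exp10(x) * exp10(y))    # Operation Equation
assert math.isclose(exp10(-x), 1.0 / exp10(x))            # Inverse Equation

u, v = 3.5, 42.0
assert math.isclose(log10(u * v), log10(u) + log10(v))
assert math.isclose(log10(1.0 / u), -log10(u))
```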

Example 3: Integers

Recall the discussion of the integers Z and the cyclic integers Zn for n ≥ 2 in Section 5.2, where they were presented as a family of algebras sharing the basic algebraic properties of rings. The point was that Zn are finite algebras that look like Z. We will make this point in another way using homomorphisms to model the idea that Z simulates Zn for all n ≥ 2 .

Let

G1 = (Z; 0 ; +,−)

be the additive group of integers, and let

G2 = (Zn ; 0 ; +,−)

be the additive group of integers modulo n ≥ 2 .

Lemma The map φ : Z→ Zn defined for all x ∈ Z by

φ(x ) = (x mod n)

is a ΣGroup-homomorphism.

Proof We must show that φ preserves the constants and operations named in ΣGroup . The first property is trivial since

Identity Equation φ(0 ) = 0 .

We must show that for all x , y ∈ Z:

Addition Equation φ(x + y) = φ(x ) + φ(y)Inverse Equation φ(−x ) = −φ(x )

Consider addition. Let

x = q1n + r1 and y = q2n + r2

for 0 ≤ ri < n. Thus,

φ(x ) = r1 and φ(y) = r2 .

Consider the right-hand-side of the addition equation and add these values in Zn :

φ(x ) + φ(y) = (r1 + r2 mod n).


Now suppose in Z that

φ(x ) + φ(y) = r1 + r2 = q3n + r3

for 0 ≤ r3 < n. Then

φ(x ) + φ(y) = r3

in Zn . Now consider the left-hand-side of the addition equation. On adding,

x + y = (q1 + q2 )n + r1 + r2
      = (q1 + q2 + q3 )n + r3

where 0 ≤ r3 < n. Hence

φ(x + y) = r3

in Zn . Thus, φ(x + y) = φ(x ) + φ(y).

Consider the inverse. Let x = qn + r1 for 0 ≤ r1 < n. Then for the right-hand-side of the inverse equation,

φ(x ) = r1 and −φ(x ) = n − r1 .

For the left-hand-side of the inverse equation,

−x = −qn − r1
   = −(q + 1 )n + n − r1

where 0 ≤ n − r1 < n. Thus,

φ(−x ) = n − r1

in Zn and φ(−x ) = −φ(x ). □

7.2.4 Homomorphisms and Machines

To suggest, even at this early stage, the breadth of the applications of algebraic homomorphisms, we look at an algebraic formulation of machines and the idea that one machine simulates another.

Definition A deterministic state machine M consists of

(i) a non-empty set I of input data;

(ii) a non-empty set O of output data;

(iii) a non-empty set S of states;

(iv) a state transition function

Next : S × I → S ;

(v) an output function

Out : S × I → O .

We write

M = (I ,O , S ;Next ,Out).

If each of the component sets I , O and S is finite, then M is called a finite state machine or Mealy machine.


It is easy to see that, as defined, M is a three-sorted algebra with two operations!

Let ΣMachine be a signature for machines such as:

signature Machine

sorts input , output , state

constants

operations next : state × input → state
           out : state × input → output

Formally, a machine is just a ΣMachine-algebra. A finite deterministic state machine is simply a finite ΣMachine-algebra.

Consider two machines

M1 = (I1 ,O1 , S1 ;Next1 ,Out1 ) and M2 = (I2 ,O2 , S2 ;Next2 ,Out2 ).

Definition (Machine Simulation) Machine M1 simulates machine M2 if there exists aΣMachine-homomorphism φ : M1 → M2 .

Let us unpack the algebraic definition of homomorphism to see what is involved.

Since the machine algebras are three-sorted, a ΣMachine-homomorphism φ consists of three maps:

φinput : I1 → I2 , φoutput : O1 → O2 , φstate : S1 → S2

that preserve each operation of ΣMachine . That is, for any state s ∈ S1 and input x ∈ I1 ,

State Transition Equation Next2 (φstate(s), φinput(x )) = φstate(Next1 (s , x ))
Output Equation Out2 (φstate(s), φinput(x )) = φoutput(Out1 (s , x )).

These equations, when expressed as commutative diagrams, are as shown in Figure 7.8.

Figure 7.8: Machine M1 simulates machine M2 .
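A concrete simulation can be sketched in Python. The two counters below and the maps φstate, φinput, φoutput are a hypothetical example of ours, chosen so that the State Transition and Output Equations hold:

```python
# A mod-4 counter M1 simulates a mod-2 counter M2 via phi_state(s) = s mod 2.
I = [0, 1]                      # shared input alphabet
S1, S2 = range(4), range(2)

next1 = lambda s, x: (s + x) % 4    # M1 transitions
out1  = lambda s, x: (s + x) % 2    # M1 outputs the parity
next2 = lambda s, x: (s + x) % 2    # M2 transitions
out2  = lambda s, x: (s + x) % 2    # M2 outputs

phi_state  = lambda s: s % 2
phi_input  = lambda x: x
phi_output = lambda o: o

# Check the State Transition Equation and the Output Equation pointwise.
for s in S1:
    for x in I:
        assert next2(phi_state(s), phi_input(x)) == phi_state(next1(s, x))
        assert out2(phi_state(s), phi_input(x)) == phi_output(out1(s, x))
print("simulation verified")
```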

7.3 Equivalence of Algebras of Data: Isomorphisms and Abstract Data Types

Our immediate objective is to define precisely the equivalence of two algebras of common signature Σ . We now generalise our examination of the equivalence of different algebras of natural numbers in Section 7.1, and formalise the idea that


Two Σ -algebras A and B are equivalent when there are two data transformations

α : A→ B and β : B → A

that are inverse to one another and preserve the operations on data named in Σ .

This results in the concept of Σ -isomorphism, which allows us to abstract from data representations and formalise the idea of an abstract data type.

7.3.1 Inverses, Surjections, Injections and Bijections

First, let us recall some basic properties of functions and their inverses that, clearly, we will need shortly.

Definition (Inverses of Maps) Let X and Y be non-empty sets. Let f : X → Y be a map.

1. A map g : Y → X is a right inverse function for f : X → Y if for all y ∈ Y ,

(f ◦ g)(y) = f (g(y)) = y

or, simply,

f ◦ g = idY

where idY : Y → Y is the identity map. A right inverse g is also called a section of f .

2. A map g : Y → X is a left inverse function for f : X → Y if for all x ∈ X ,

(g ◦ f )(x ) = g(f (x )) = x

or, simply,

g ◦ f = idX

where idX : X → X is the identity map. A left inverse g is also called a retraction of f .

3. A map g : Y → X is an inverse function for f : X → Y if it is both a left and right inverse, i.e.,

f ◦ g = idY and g ◦ f = idX .

Now these inverse properties, especially (1) and (2), may seem rather abstract, but they correspond nicely with some other, more familiar, basic properties of functions.

Definition (Properties of Functions) Let X and Y be non-empty sets. We identify three basic properties of a function f : X → Y .

(i) The function f is said to be surjective, or onto, if for any element y ∈ Y there exists an element x ∈ X , such that

f (x ) = y .

Note that many elements of X may be mapped to the same element in Y .


(ii) The function f is said to be injective, or one-to-one, if for any elements x1 , x2 ∈ X

x1 ≠ x2 implies f (x1 ) ≠ f (x2 )

or, equivalently,

f (x1 ) = f (x2 ) implies x1 = x2 .

Note that there may be elements of Y to which no element of X is mapped.

(iii) The function f is said to be bijective, or a one-to-one correspondence, if f is both injective and surjective.

Here is one half of the connection.

If f : X → Y is surjective then there exists a map g : Y → X such that for y ∈ Y ,

(f ◦ g)(y) = y . (1 )

The idea is that for y ∈ Y , the value

g(y) = any choice of x ∈ X such that f (x ) = y .

There are many choices for solutions and, therefore, many maps. Each g that satisfies equation (1 ) above is injective and is called a right-inverse or a section of f .

If f : X → Y is injective then there exists a map g : Y → X such that for x ∈ X ,

(g ◦ f )(x ) = x . (2 )

The idea is that for y ∈ Y ,

g(y) = the unique x ∈ X such that y = f (x ), if y ∈ im(f );

g(y) = any element of X , if y ∉ im(f ).

There are many solutions and, therefore, many maps. Each g that satisfies equation (2 ) above is surjective and is called a left-inverse or a retraction of f .

If f : X → Y is bijective it is both surjective and injective, so there exists a map g : Y → X which is both a right- and left-inverse for f , i.e., for all x ∈ X , y ∈ Y

(f ◦ g)(y) = y and (g ◦ f )(x ) = x .

In this case there is only one map with this property; it is called the inverse of f and it is written f −1 .

We can picture the map f and its inverse f −1 as a pair of functions with f : X → Y and f −1 : Y → X as shown in Figure 7.9.

In summary, we have:

Lemma (Inverses) Let X and Y be non-empty sets. Let f : X → Y be a map. Then,

(i) f has a right-inverse if, and only if, f is surjective;

(ii) f has a left-inverse if, and only if, f is injective; and

(iii) f has an inverse if, and only if, f is bijective.

To complete the proof, one has to show that the existence of inverses implies surjectivity and injectivity.
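The constructions of sections and retractions can be illustrated on small finite sets. A Python sketch (the particular maps f, g, h and r are our own examples):

```python
# f : {0,1,2,3} -> {0,1}, f(x) = x % 2 is surjective; a section g picks
# one preimage per value, so that f(g(y)) = y.
f = lambda x: x % 2
g = {0: 0, 1: 1}.get           # one of many possible right-inverses
assert all(f(g(y)) == y for y in [0, 1])

# h : {0,1} -> {0,1,2,3}, h(x) = 2*x is injective; a retraction r
# recovers x from h(x), so that r(h(x)) = x.
h = lambda x: 2 * x
r = lambda y: y // 2           # one of many possible left-inverses
assert all(r(h(x)) == x for x in [0, 1])
```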


Figure 7.9: A function and its inverse.

7.3.2 Isomorphisms

Now we can define the key notion of isomorphism as a “reversible” homomorphism.

Definition (Isomorphism 1) Let A and B be Σ -algebras. A Σ -homomorphism φ : A→ B is a Σ -isomorphism if there is a Σ -homomorphism ψ : B → A that is both a right- and left-inverse for φ. That is, if

φ = 〈φs : As → Bs | s ∈ S 〉 and ψ = 〈ψs : Bs → As | s ∈ S 〉

then, for all a ∈ As and all b ∈ Bs ,

(ψs ◦ φs)(a) = a

and

(φs ◦ ψs)(b) = b.

The map ψ is unique and is written φ−1 .

Definition The algebras A and B are Σ -isomorphic if there exists a Σ -isomorphism φ : A→ B , and we write

A ∼= B .

Let us spell out the use of isomorphisms:

Two algebraic structures of common signature Σ are considered identical if, and only if, there is a Σ -isomorphism between them.

Suppose that A and B are isomorphic by some φ, then:

(i) the elements of A and B correspond uniquely to one another under φ and there is an inverse function φ−1 : B → A such that

φ−1(φ(a)) = a and φ(φ−1(b)) = b;

and

(ii) these correspondences φ : A → B and φ−1 : B → A preserve the operations of both A and B , since for each operation f : w → s of Σ , the diagrams shown in Figure 7.10 commute.

We identify two classes of Σ -homomorphism φ : A→ B :


Figure 7.10: Isomorphisms preserve operations.

Definition (Properties of homomorphisms) Let A and B be Σ -algebras.

1. A Σ -homomorphism φ = 〈φs | s ∈ S 〉 is a Σ -epimorphism if for each sort s ∈ S , the function φs : As → Bs is surjective or onto; i.e., for every element b ∈ Bs there exists an element a ∈ As , such that

φs(a) = b.

2. A Σ -homomorphism φ = 〈φs | s ∈ S 〉 is a Σ -monomorphism if for each sort s ∈ S , the function φs : As → Bs is injective or one-to-one; i.e., for all elements a1 , a2 ∈ As ,

a1 ≠ a2 implies φs(a1 ) ≠ φs(a2 )

or, equivalently,

φs(a1 ) = φs(a2 ) implies a1 = a2 .

The following simple fact allows us a short cut to the notion of isomorphism.

Lemma Let φ : A → B be a Σ -homomorphism. If φ is a bijection then its unique inverse φ−1 : B → A is also a Σ -homomorphism.

Thanks to this lemma, it is common to see an isomorphism defined to be a bijective homomorphism. Thus:

Definition (Isomorphism 2) A Σ -homomorphism φ = 〈φs | s ∈ S 〉 is a Σ -isomorphism if for each sort s ∈ S , the function φs : As → Bs is bijective or a one-to-one correspondence; i.e., φs is both injective and surjective.

Example: An Equivalence between Addition and Multiplication of Real Numbers

Recall the examples of expa and loga from Section 7.2.3. These group homomorphisms are inverses of one another.

Lemma The groups G1 = (R; 0 ; +,−) and G2 = (R+; 1 ; .,−1 ) are isomorphic.

This fact, and the concept of homomorphism, is a basis for the traditional use of logarithms and antilogarithms in calculations. To speed the multiplication and division of real numbers by transforming them into problems using addition and subtraction, tables of values of log10 and exp10 were used to calculate with the formulae:

x .y = exp10 (log10 (x ) + log10 (y))

x ÷ y = exp10 (log10 (x )− log10 (y))
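These formulae can be checked directly. A Python sketch using the standard math library (floating point, so equality is approximate):

```python
import math

# Napier's idea in modern form: multiply and divide via logarithms and
# antilogarithms, exploiting the isomorphism between (R+; 1; *) and (R; 0; +).
def mul_by_logs(x, y):
    return 10.0 ** (math.log10(x) + math.log10(y))

def div_by_logs(x, y):
    return 10.0 ** (math.log10(x) - math.log10(y))

assert math.isclose(mul_by_logs(123.0, 456.0), 123.0 * 456.0)
assert math.isclose(div_by_logs(123.0, 456.0), 123.0 / 456.0)
```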


The early discovery of logarithms involved more complex isomorphisms than exp10 and log10 , however.

John Napier (1550–1617) published the idea of tables of logarithms in his Mirifici logarithmorum canonis descriptio in 1614; the Latin title means A description of the marvellous rule of logarithms. The problem Napier solved was to simulate, or implement, multiplication and division by addition and subtraction. In the preface of his later work of 1616 (Napier [1616]) we find:

“Seeing there is nothing (right well beloved students in the Mathematics) that is so troublesome to Mathematicall practise, nor that doth more molest and hinder Calculators, than the Multiplications, Divisions, square and cubical Extractions of great numbers, which beside the tedious expense of time, are for the most part subject to many slippery errors. I began therefore to consider in my minde, by what certaine and ready Art I might remove those hindrances. And having thought upon many things to this purpose, I found at length some excellent briefe rules to be treated of (perhaps) hereafter. But amongst all, none more profitable than this, which together with the hard and tedious Multiplications, Divisions and Extractions of rootes, doth also cast away from the worke itselfe, even the very numbers themselves that are to be multiplied, divided and resolved into rootes, and putteth other numbers in their place, which performe as much as they can do, onely by Addition and Subtraction, Division by two or Division by three . . . ”

The first table based on the decimal system — where log 1 = 0 and log 10 = 1 — was published by Henry Briggs (1561-1630) in his Arithmetica Logarithmica of 1624. See Struik [1969] for extracts of Napier's works and Fauvel and Gray [1987] for a selection of relevant extracts.

7.3.3 Abstract Data Types

Imagine some data type that can be implemented by a number of methods. The data type has an interface Σ. Each method leads to a Σ-algebra.

Definition (Data Type) A data type consists of:

(i) a signature Σ ; and

(ii) a class K of algebras . . . ,A, . . . of signature Σ .

Each algebra A ∈ K is a model of a representation of the data type. We want to think about data and its properties abstractly, that is, independently of representation. We formalise the idea that two implementations, modelled by Σ-algebras A and B, are equivalent by using the notion of Σ-isomorphism, of course.

Definition (Abstraction Principle for Data Types) Let P be a property of a data type implementation. We say that P is an abstract property if P is invariant under isomorphism, i.e., if P holds of an algebra A, and A and B are isomorphic algebras, then P will also hold of B. Thus:

If P is true of A and A ∼= B then P is true of B .

Simple examples of such properties include


(i) finiteness;

(ii) having constructors;

(iii) any property definable by a set of equations; and

(iv) any property definable by a set of first-order formulae.

In the light of the above discussion of abstract properties of implementations, we can define an abstract data type to be a class of implementations satisfying certain properties.

Definition (Abstract Data Type) An abstract data type consists of

(i) a signature Σ ; and

(ii) a class K of algebras of signature Σ that is closed under isomorphism, i.e.,

A ∈ K and B ∼= A implies B ∈ K .

For a few classical abstract data types, like the natural numbers and real numbers, we can create classes with the special property:

All algebras in K are isomorphic, i.e.,

A,B ∈ K implies A ∼= B .

We will use the natural numbers as our first important example of an abstract data type.

7.4 Induction on the Natural Numbers

We begin our examination of the natural numbers by recalling the Principle of Induction. We assume the reader is familiar with the Principle and some of its applications, so our remarks will be brief. The Principle is based on an algebraic property of numbers.

Consider the signature

signature Naturals

sorts nat

constants zero : → nat

operations succ : nat → nat

which we can implement with the algebra

algebra Naturals

carriers N

constants Zero : → N

operations Succ : N→ N

definitions Zero = 0

Succ(n) = n + 1


As we noticed earlier (in Chapter 3), all the natural numbers are generated from the constant zero by the repeated application of the operation succ. Thus,

zero, succ(zero), succ(succ(zero)), succ(succ(succ(zero))), . . .

are terms denoting every natural number, whose values are

Zero, Succ(Zero), Succ(Succ(Zero)), Succ(Succ(Succ(Zero))), . . .

or

0, 1, 1 + 1, 1 + 1 + 1, . . .

The fact that all data can be built up from the constants by applying the operations is an important property of an algebra; such algebras are said to be minimal. Notice another special property of the way Succ generates the numbers: each number has one, and only one, construction using Succ.

A consequence of these properties of this algebra of natural numbers is a method of reasoning called the Principle of Induction.

Let P be a property of natural numbers. Sometimes this means P is a set of numbers,

P ⊆ N

and we write P(n) to mean n ∈ P.

Alternatively, sometimes this means P is a predicate or Boolean-valued function,

P : N → B

and we say P holds at n if

P(n) = tt

or, simply, P(n) is true. The Boolean-valued function P determines the set

S = {n ∈ N | P(n) = tt}

of elements for which P : N → B holds.

7.4.1 Induction for Sets and Predicates

Suppose we want to prove that

all natural numbers are in a set S

or, equivalently,

all natural numbers have a property P.

The Principle of Induction is based upon two statements. Let S ⊆ N. Suppose that:

Base Case 0 ∈ S .

Induction Step If n ∈ S then succ(n) ∈ S .


Then we deduce that:

0 ∈ S because of the Base Case

1 ∈ S because 0 ∈ S and the Induction Step

2 ∈ S because 1 ∈ S and the Induction Step

...

n + 1 ∈ S because n ∈ S and the Induction Step

...

We want to conclude that all the natural numbers are in the set, i.e.,

0 , 1 , 2 , . . . ∈ S or S = N,

because the natural numbers are precisely those made from 0 by applying the operator succ. To conclude from the argument we simply assume the following general principle or axiom is true.

Principle of Induction on Natural Numbers: Sets

Let S be any set of natural numbers. If the following two statements hold:

Base Case 0 ∈ S.

Induction Step If n ∈ S then n + 1 ∈ S.

Then n ∈ S for all n ∈ N, i.e., S = N.

The Principle of Induction for the natural numbers is commonly stated in terms of theproperties, rather than sets, as follows:

Principle of Induction on Natural Numbers: Properties

Let P be any property of the natural numbers. If the following two statements hold:

Base Case P(0 ) is true.

Induction Step If P(n) is true then P(n + 1 ) is true.

Then P(n) is true for all n.

Alternatively, if P is the name of the property and n a variable, the Principle is concisely stated in logical formulae,

(P(0) ∧ (∀n)[P(n) ⇒ P(n + 1)]) ⇒ (∀n)[P(n)].

The Principle can be weakened by placing restrictions on the classes of sets or predicatesthat can appear in the statement.


7.4.2 Course of Values Induction and Other Principles

A variation is course-of-values induction.

Principle of Course of Values Induction on Natural Numbers

Let P be any property of the natural numbers. If the following two statements hold:

Base Case P(0 ) is true.

Induction Step If P(i) is true for all i < n then P(n) is true.

Then P(n) is true for all n.

Alternatively, if P is the name of the property and n a variable, the Course of Values Induction Principle is concisely stated in logical formulae,

(P(0) ∧ (∀n)[(∀i < n)P(i) ⇒ P(n)]) ⇒ (∀n)[P(n)].

7.4.3 Defining Functions by Primitive Recursion

The inductive generation of the natural numbers provides an inductive method of defining functions. A function f : N → N is defined on all numbers n ∈ N by defining f(0) and giving a method that computes the value of f(n + 1) from the value of f(n). Here is a definition of one mechanism.

Definition (Primitive Recursion) Let g : Nk → N and h : N × Nk × N → N be total functions. Then the function

f : N× Nk → N

is defined by primitive recursion from the functions g and h if it satisfies the equations

f (0 , x ) = g(x )

f (n + 1 , x ) = h(n, x , f (n, x ))

for all n ∈ N and x ∈ Nk.

The equations allow us to calculate the values

f (0 , x ), f (1 , x ), f (2 , x ), . . .

of the function f on any x by substitution as follows:

f (0 , x ) = g(x )

f (1 , x ) = h(0 , x , f (0 , x ))

= h(0 , x , g(x ))

f (2 , x ) = h(1 , x , f (1 , x ))

= h(1 , x , h(0 , x , g(x )))

f (3 , x ) = h(2 , x , f (2 , x ))

= h(2 , x , h(1 , x , h(0 , x , g(x ))))

...
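This substitution process is entirely mechanical, and can be sketched as a Python combinator (our illustration, not the text's notation):

```python
# Given total functions g and h, build the function f defined by
# primitive recursion:  f(0, x) = g(x),  f(n + 1, x) = h(n, x, f(n, x)).
def primitive_recursion(g, h):
    def f(n, x):
        value = g(x)           # f(0, x)
        for m in range(n):     # compute f(1, x), ..., f(n, x) in turn
            value = h(m, x, value)
        return value
    return f

# For example, factorial satisfies fact(0) = 1 and fact(n + 1) = (n + 1) * fact(n),
# so g ignores its argument and h multiplies by n + 1.
factorial = primitive_recursion(lambda x: 1, lambda n, x, v: (n + 1) * v)
```

The loop makes the uniqueness of f plausible: each value is forced by the previous one, exactly as in the Uniqueness Lemma that follows.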


It seems clear that there is only one function f that satisfies the equations and that we have an operational method for computing it on any n ∈ N.

Lemma (Uniqueness) If g and h are total functions then there is a unique total function f that is defined by primitive recursion from g and h.

Proof We can prove uniqueness. Suppose that f1 and f2 are total functions that satisfy the primitive recursion equations, i.e.,

f1 (0 , x ) = g(x )

f1 (n + 1 , x ) = h(n, x , f1 (n, x ))

and

f2 (0 , x ) = g(x )

f2 (n + 1 , x ) = h(n, x , f2 (n, x ))

We will prove, for all n and x, that f1(n, x) = f2(n, x), using the Principle of Induction on N.

Base Case Consider the case n = 0 :

f1 (0 , x ) = g(x )

= f2 (0 , x )

Therefore, the functions are equal at 0 .

Induction Step As induction hypothesis, suppose that

f1 (n, x ) = f2 (n, x ) (IH)

and consider the values of f1 (n + 1 , x ) and f2 (n + 1 , x ). Now

f1(n + 1, x) = h(n, x, f1(n, x)) (by the definition of f1)

= h(n, x, f2(n, x)) (by the induction hypothesis (IH))

= f2(n + 1, x) (by the definition of f2)

Therefore, if the functions are equal at n, we have shown that they are also equal at n + 1.

By the Principle of Induction, we conclude that

f1 (n, x ) = f2 (n, x )

for all n and x . 2


Examples

Consider the basic functions of arithmetic.

Addition is primitive recursive over successor. It is easy to see that add : N × N → N satisfies

add(0 ,m) = m

add(n + 1 ,m) = succ(add(n,m)).

Multiplication is primitive recursive over addition. The function mult : N× N→ N satisfies

mult(0 ,m) = 0

mult(n + 1 ,m) = add(m,mult(n,m)).

Other functions appear in the exercises.
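The equations for add and mult translate directly into code; the following sketch (ours, with succ written as + 1) follows them literally:

```python
# add(0, m) = m;  add(n + 1, m) = succ(add(n, m))
def add(n, m):
    return m if n == 0 else add(n - 1, m) + 1

# mult(0, m) = 0;  mult(n + 1, m) = add(m, mult(n, m))
def mult(n, m):
    return 0 if n == 0 else add(m, mult(n - 1, m))
```

Each call strips one application of successor from the first argument, so the recursion terminates for all natural number inputs.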

7.5 Naturals as an Abstract Data Type

Let us reconsider the algebras of natural numbers in Section 7.1. First, we focus on the simple operation of counting named in the following signature ΣNaturals.

signature Naturals

sorts nat

constants zero : → nat

operations succ : nat → nat

As we noted earlier, there is a wide range of algebras that have this signature and that perform a type of counting. The algebras can be infinite or finite, and possess unexpected properties. In fact, the class Alg(ΣNaturals) is very diverse.

We will classify those ΣNaturals-algebras that are isomorphic to the standard algebra for counting

Ndec = ({0, 1, 2, . . .}; 0; n + 1).

Hence we will show that all the infinite algebras of natural numbers, represented using different bases, are isomorphic. To classify the standard algebras we need just three axioms concerning the properties of the constant zero and operation succ in ΣNaturals:

axioms Dedekind

1 (∀x)[succ(x) ≠ zero]

2 (∀x )(∀y)[succ(x ) = succ(y)⇒ x = y ]

Induction (∀X)[(zero ∈ X and (∀x)[x ∈ X ⇒ succ(x) ∈ X]) ⇒ (∀x)[x ∈ X]]

Theorem (Dedekind)


1. The ΣNaturals-algebras Ndec and Nbin satisfy Dedekind’s Axioms.

2. Let A and B be any ΣNaturals-algebras satisfying Dedekind’s Axioms. Then

A ∼= B .

Corollary All algebras satisfying Dedekind’s Axioms are isomorphic with Ndec.

First we explore the axioms and then we prove the Theorem. The signature ΣNaturals and Dedekind's Axioms 1 and 2 express, in a very abstract way, the essence of a counting system: there is a first element zero and an operation that is injective, returning a next element succ(x) that is unique to x (Axiom 2) and does not return the first element (Axiom 1).

Here are some Σ-algebras of integers (in decimal notation) that satisfy the axioms.

({0, 1, 2, . . .}; 0; n + 1)
({1, 2, 3, . . .}; 1; n + 1)
({11, 12, 13, . . .}; 11; n + 1)
({−1, 0, 1, 2, . . .}; −1; n + 1)
({−19, −18, . . . , 0, 1, 2, . . .}; −19; n + 1)

In particular, for any integer k, positive or negative, the algebra

({k, k + 1, k + 2, . . .}; k; n + 1)

satisfies the axioms.
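The isomorphism promised by the coming Theorem can be made concrete for any two such algebras: match the n-th element generated in one with the n-th element generated in the other. A Python sketch of ours, over finite samples of the algebras (N; 0; n + 1) and ({11, 12, . . .}; 11; n + 1):

```python
def iterate(succ, first, n):
    """Compute succ^n(first) by n-fold application."""
    x = first
    for _ in range(n):
        x = succ(x)
    return x

# Two counting algebras satisfying Dedekind's Axioms.
a, succ_A = 0, (lambda n: n + 1)
b, succ_B = 11, (lambda n: n + 1)

def phi(x):
    """Map succ_A^n(a) to succ_B^n(b): count how often succ_A reaches x from a."""
    n, y = 0, a
    while y != x:
        y, n = succ_A(y), n + 1
    return iterate(succ_B, b, n)
```

On samples, phi(0) = 11 and phi(5) = 16, and phi(succ_A(x)) = succ_B(phi(x)) throughout, which is exactly the homomorphism condition verified in the proof below.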

Lemma (Minimality of Induction Axioms) Let A = (N; a; succA) be any ΣNaturals-algebra satisfying Dedekind's Axioms. Then the carrier of A is the set

N = {a, succA(a), succA^2(a), . . . , succA^n(a), . . .}.

In particular, the carrier can be enumerated, without repetitions, by applying the successor function to the first element, and is an infinite set.

Proof Let E = {a, succA(a), succA^2(a), . . . , succA^n(a), . . .} be the set of elements of N enumerated by succA. Clearly,

E ⊆ N.

That N ⊆ E follows from the Induction Axiom. To see that every element of N is in the enumeration we note that:

• a ∈ E; and

• if succA^n(a) ∈ E then succA^(n+1)(a) ∈ E.

By Dedekind's Induction Axiom, E = N.

To see that there are no repetitions, suppose i ≥ j and the i th and j th elements are the same:

succA^i(a) = succA^j(a).

Then, by applying Dedekind's Axiom 2, j times, we get

succA^(i−j)(a) = a.

By Dedekind's Axiom 1, this holds only when i − j = 0 and i = j. Thus, all elements in the enumeration are distinct. In particular, A is infinite. 2


We will now use this lemma to prove Dedekind's Theorem. The first statement, that Ndec and Nbin satisfy the axioms, we will not prove. It is tempting to say that the statement is obvious, since the axioms are taken from the standard arithmetical structures. However, that these algebras satisfy the axioms is actually a formal property that can be verified using a precise representation of the numbers.

Proof of Dedekind’s Theorem

Let

A = (N; a; succA) and B = (M; b; succB)

be any two ΣNaturals-algebras satisfying Dedekind's Axioms. By the Minimality of Induction Axioms Lemma, we know the carriers of A and B can be enumerated thus:

N = {a, succA(a), succA^2(a), . . . , succA^n(a), . . .}

and

M = {b, succB(b), succB^2(b), . . . , succB^n(b), . . .}.

We define a map φ : N → M by

φ(a) = b

and, for n > 0,

φ(succA^n(a)) = succB^n(b).

We will show that the map is a ΣNaturals-isomorphism φ : A → B.

Now it is easy to check that φ is a bijection. The map φ is clearly surjective because each element succB^n(b) of M is the image of an element succA^n(a) of N. To see that it is injective, we suppose

φ(succA^i(a)) = φ(succA^j(a)).

Thus, by definition of φ,

succB^i(b) = succB^j(b).

This is the case if, and only if, i = j.

Let us show it preserves operations and is a Σ-homomorphism: clearly it preserves constants, since by definition,

φ(a) = b.

Next, we show that it also preserves successor, i.e., for any x ∈ N,

φ(succA(x)) = succB(φ(x)).

Now for any x ∈ N there is one, and only one, n such that x = succA^n(a). Hence

φ(succA(x)) = φ(succA(succA^n(a)))
= φ(succA^(n+1)(a))
= succB^(n+1)(b)
= succB(succB^n(b))
= succB(φ(succA^n(a)))
= succB(φ(x)).


2

Definition (Abstract Data Type of Natural Numbers) We may define the abstract data type of the natural numbers to be the class

Alg(ΣNaturals ,EDedekind)

of all ΣNaturals -algebras satisfying Dedekind’s Axioms (see Figure 7.11).

Figure 7.11: The class of all ΣNaturals-algebras, with Ndec and Nbin lying inside the subclass Alg(ΣNaturals, EDedekind) of Alg(ΣNaturals).

7.6 Digital Data Types and Computable Data Types

In practice, abstract ideas about data are essential in the analysis, development and documentation of systems. However, ultimately, they are reduced to low-level executable representations of data made from bits and machine words. Thus, the data types that are implementable are modelled by algebras of data that also possess bit and machine word representations.

We will use our algebraic theory of data, and especially homomorphisms, to answer the following question:

What abstract data types can be implemented on a digital computer?

Now, to determine which abstract data types can be implemented on a computer, we must abstract away from the seemingly infinite variety of machine architectures. We shall see more of this idea later in Chapter 19, but here we will focus on the data types that machines process.

Today, we think of digital machines as processing (electronically) data types consisting of

bits and n-bit words,

for different n, such as the convenient byte (n = 8) and the common n = 32 or 64 bit words. The operations on bits and n-bit words are many and varied, depending as they do on the many and varied designs for machine architectures. In practice, however, these data types implement mainly arithmetic processes that represent data and govern the operation of machines.

For a theoretical investigation, we will ignore bounds on the size of memory, words, etc. Instead, we might devote some thought to the questions:


What is a digital datum? What is a digital computation?

What makes data digital is its representation. A digital datum can be represented by a finite amount of information, e.g., by finitely many symbols. Today, the symbols 0, 1 are pre-eminent. Digital computation is a process that transforms the finite representations.

Indeed, these ideas apply equally to digital communication.

In history, digital computation, by hand and by machine, has not been based on bits and n-bit words, but on many different forms of arithmetic notations. Indeed, at a slightly higher level of abstraction, a more convenient data type to describe digital computation is the natural numbers.

First, we will make the slight data abstraction:

We suppose that the basic data type of a digital machine is the natural numbers. Data is coded using natural numbers.

This simplification allows us to avoid worrying about the parameter of word size, for example. We will investigate the following mathematically precise version of the implementation question for data types:

What algebras can be represented by sets of natural numbers and functions on the natural numbers?

7.6.1 Representing an Algebra using Natural Numbers

Let Σ be any signature. Suppose, for simplicity, that Σ is single-sorted and has the form

(s ; c1 , . . . , cr ; f1 , . . . , fq).

Let the data type implementation be modelled by a Σ -algebra A which has the form:

A = (A; c1^A, . . . , cr^A; f1^A, . . . , fq^A)

where the ci^A's are constants and the fj^A's are operations. This algebra models some representation of a data type with interface Σ. If the data is digital, then we assume it can be represented using natural numbers.

To represent or code the data in this algebra A using natural numbers, we need the following:

1. A map α : N → A to code all the data in A; if a ∈ A and n ∈ N and α(n) = a, we say that n is an α-code for a. Every a ∈ A must have at least one code n ∈ N.

2. The relation ≡α, which is used to detect duplication of codes for data, is defined for m, n ∈ N by

m ≡α n if, and only if, α(m) = α(n).

This relation is an equivalence relation.

Definition (Numbering) A numbering of A is a surjective mapping

α : N→ A.

The equivalence relation ≡α is called the kernel of the numbering.


In theory, a numbering α is a digital representation of the data in A. Having coded the data by numbers, we must be able to simulate the operations on data by operations on their codes. To represent the constants and operations of the algebra A on numbers, we need the following:

3. For each constant ci in Σ naming ci^A ∈ A, we need to choose a number ci^N ∈ N such that

α(ci^N) = ci^A

for 1 ≤ i ≤ r.

4. For each basic operation fi in Σ naming a function fi^A : A^ni → A, we need a function

fi^N : N^ni → N

called a tracking function, that simulates the operations of A in the number codes.

This idea of simulation we make precise in the equation

fi^A(α(x1), . . . , α(xni)) = α(fi^N(x1, . . . , xni))

for 1 ≤ i ≤ q, such that the diagram shown in Figure 7.12 commutes.

Figure 7.12: Tracking functions simulating the operations of A: the maps fi^A : A^ni → A, fi^N : N^ni → N and the coding maps α^ni and α form a commuting square.
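As a toy illustration of a numbering (our example, not the text's), take A to be the three-element cyclic counter ({0, 1, 2}; 0; x + 1 mod 3) and code it by all of N:

```python
def alpha(n):
    """Surjective coding map alpha : N -> A; each element of A has infinitely many codes."""
    return n % 3

def f_A(a):
    """The operation of A: cyclic successor on {0, 1, 2}."""
    return (a + 1) % 3

def f_N(n):
    """Tracking function on the codes: ordinary successor on N."""
    return n + 1
```

The simulation equation f_A(alpha(n)) = alpha(f_N(n)) holds for every code n, and the kernel m ≡α n is just m mod 3 = n mod 3, which is clearly decidable.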

This analysis leads to the following ideas.

Definition (Numbered Algebra 1) An algebra A is said to be numbered if there exist

(i) a surjective coding map α;

(ii) for each constant ci ∈ Σ, a number ci^N ∈ N; and

(iii) for each function fi in Σ, a tracking function fi^N.

Hypothesis If a data type represented by an algebra A is implementable on a digital computer, then A is numbered.


7.6.2 Algebraic Definition of Digital Data Types

Let us look more closely at the details of this idea of digital representation. The components of a numbered algebra are:

(i) a coding map; and

(ii) some tracking machinery.

These have a natural and familiar structure. In our definition of a numbered algebra above, we have the following components:

• The codes for constants and tracking functions can be combined together to form the algebra

Ω = (N; c1^N, . . . , cr^N; f1^N, . . . , fq^N).

This is an algebra with the same signature Σ as A. In particular, this is an algebra of numbers with number-theoretic operations.

• The conditions on α as a coding map for data are equivalent to the statements that:

(i) α is surjective;

(ii) the function α : Ω → A is a Σ-homomorphism; and

(iii) ≡α is the kernel of α.

Thus, the notion of homomorphism is exactly the idea we need to clarify and make precise the idea of digital representation!

Definition (Numbered Algebra 2) A Σ -algebra A is said to be numbered if there exists aΣ -algebra Ω of natural numbers and a surjective Σ -homomorphism α : Ω→ A.

So far, the analysis of the implementation question has resulted in the observation:

Hypothesis If a data type can be implemented on a digital computer then it is modelled by a Σ-algebra A that possesses a surjective Σ-homomorphism

α : Ω→ A

from a Σ -algebra Ω of natural numbers.

7.6.3 Computable Algebras

Having represented the Σ -algebra A of data by a Σ -algebra Ω of numbers, the question arises:

What Σ -algebras of numbers can be implemented on a computer?

More specifically, for a representation α : Ω → A, we need to know that the code set, kernel, constants and tracking functions are actually computable on natural numbers.

For functions, relations and sets to be computable, there must exist algorithms to compute them. The code set is N, and the constants are just numbers, so they are computable. Thus, to perform digital computation, we need to postulate algorithms to


(i) compute the tracking functions fi^N, and

(ii) decide the kernel ≡α.

Computable functions and sets on N are the subject matter of Computability Theory. This was started in the 1930s by A Church, A M Turing, E Post and S C Kleene. One of its many early achievements was to capture the concept of a

function definable by an algorithm on N.

Many different ways to design algorithms and write specifications and programs have been invented and analysed. These mathematical models of computation have all defined the same class of functions, often called the

partial recursive functions on N.

Such philosophical investigations and mathematical results are the basis of the

Church-Turing Thesis (1936) The set of functions on N definable by algorithmsis the set of partial recursive functions on N.

Definition A Σ-algebra A is computable if, and only if, there is a Σ-algebra of numbers Ω with partial recursive operations on N and a Σ-epimorphism α : Ω → A such that the kernel ≡α is recursively decidable.

Assuming this thesis, one simple answer to the implementation question is:

Hypothesis The abstract data types implementable on a digital computer are those possessing an implementation modelled by a computable Σ-algebra.

7.7 Properties of Homomorphisms

We have spent quite some time on the idea of a homomorphism, motivating it, illustrating it, defining it. We have used it in Dedekind's Theorem, and in an analysis of data type representation and digital computation. It ought to be clear that homomorphisms are an extremely important technical notion, with interesting and useful applications. We conclude this chapter with a taste of the general theory that rounds off our picture of homomorphisms, and has some applications later on. Our goal is to prove the Homomorphism Theorem in Section 7.9.

If A and B are Σ-algebras and φ : A → B is a Σ-homomorphism, then we know that the data in A corresponds with at least some of the data in B. This set of data in B is called the image of φ, as shown in Figure 7.13.

Definition If A and B are S-sorted Σ-algebras and φ = 〈φs : As → Bs | s ∈ S〉 is a Σ-homomorphism from A to B, we define the image of A under φ by

im(φ) = 〈ims(φ) | s ∈ S〉

where

ims(φ) = {b ∈ Bs | b = φs(a) for some a ∈ As}.


Figure 7.13: The image im(φ) of a homomorphism φ : A → B; an element a ∈ A is sent to φ(a) ∈ im(φ) ⊆ B.

We may also use the notation

φ(A) = 〈φs(As) | s ∈ S〉

for the image of A under φ. Shortly we will show that φ(A) is a Σ-subalgebra of B.

Lemma Let A and B be S-sorted Σ-algebras and φ : A → B a Σ-homomorphism. Then im(φ) is a Σ-subalgebra of B.

Proof Clearly, im(φ) contains the constants of B, since for each constant symbol c :→ s ∈ Σ,

cB = φs(cA)

as φ is a homomorphism. We must show that im(φ) is closed under the operations of B. Let f : s(1) × · · · × s(n) → s ∈ Σ be any function symbol. For any elements b1 ∈ ims(1)(φ), . . . , bn ∈ ims(n)(φ) of the image of A under φ, there exist elements a1 ∈ As(1), . . . , an ∈ As(n), such that

φs(1)(a1) = b1, . . . , φs(n)(an) = bn.

Thus, on substituting,

fB(b1, . . . , bn) = fB(φs(1)(a1), . . . , φs(n)(an))
= φs(fA(a1, . . . , an))

since φ is a Σ-homomorphism. Thus, fB(b1, . . . , bn) ∈ ims(φ) because fA(a1, . . . , an) ∈ As. 2

Another simple property of homomorphisms is that we can compose them.

Lemma (Composing Homomorphisms) If A, B and C are S-sorted Σ-algebras and φ : A → B and ψ : B → C are Σ-homomorphisms, then the composition ψ ◦ φ : A → C is a Σ-homomorphism.

Proof For each constant symbol c :→ s ∈ Σ,

cC = ψ(cB) and cB = φ(cA)

as ψ and φ are homomorphisms; substituting for cB, we have

cC = ψ(φ(cA))
= ψ ◦ φ(cA).


For each function symbol f : s(1) × · · · × s(n) → s ∈ Σ, and any elements a1 ∈ As(1), . . . , an ∈ As(n),

ψ ◦ φ(fA(a1, . . . , an)) = ψ(φ(fA(a1, . . . , an))) (by definition)

= ψ(fB(φs(1)(a1), . . . , φs(n)(an))) (since φ is a Σ-homomorphism)

= fC(ψs(1)(φs(1)(a1)), . . . , ψs(n)(φs(n)(an))) (since ψ is a Σ-homomorphism)

= fC(ψs(1) ◦ φs(1)(a1), . . . , ψs(n) ◦ φs(n)(an)) (by definition).

Therefore, ψ ◦ φ is a homomorphism. 2
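A small sketch (our example) of composing homomorphisms between counting algebras: φ : (N; 0; +1) → ({5, 6, . . .}; 5; +1) and ψ : ({5, 6, . . .}; 5; +1) → ({12, 13, . . .}; 12; +1):

```python
def phi(n):           # phi(0) = 5 and phi(n + 1) = phi(n) + 1
    return n + 5

def psi(m):           # psi(5) = 12 and psi(m + 1) = psi(m) + 1
    return m + 7

def comp(n):          # the composition psi o phi
    return psi(phi(n))
```

The composition sends the zero 0 to the zero 12 and commutes with successor, exactly as the lemma predicts.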

7.8 Congruences and Quotient Algebras

There are plenty of equivalence relations in modelling computing systems. They arise, for example, where different notations represent the same data, or different programs implement the same specification. In this section, we show how an equivalence relation on an algebra may allow us to construct a new algebra, called the quotient or factor algebra, which turns out to be an invaluable tool in modelling.

7.8.1 Equivalence Relations and Congruences

Recall the definition of an equivalence relation on a set S .

Definition (Equivalence Relations) Let S be any non-empty set. An equivalence relation ≡ on S is a binary, reflexive, symmetric and transitive relation on S. This means that ≡ ⊆ S × S and, writing ≡ in infix notation, it satisfies the following properties: the relation is

Reflexive (∀x ∈ S )[x ≡ x ]

Symmetric (∀x ∈ S )(∀y ∈ S )[x ≡ y implies y ≡ x ]

Transitive (∀x ∈ S )(∀y ∈ S )(∀z ∈ S )[x ≡ y and y ≡ z implies x ≡ z ]

Given an equivalence relation ≡ on a set S, we can define for each element x ∈ S the equivalence class

[x] ⊆ S

of x with respect to ≡ to be the set

[x] = {y ∈ S | y ≡ x}

of all elements in S equivalent with x.

It is easily shown that the equivalence classes of S with respect to ≡ form a partition of the set S, i.e., a collection of disjoint subsets whose union is the set S. We let S/≡ denote the set of all equivalence classes of members of S with respect to ≡,

S/≡ = {[x] | x ∈ S}.

We call S/≡ the quotient or factor set of S with respect to the equivalence relation ≡.

Given a quotient set S/≡, we can choose an element from each equivalence class in S/≡ as a representative of that class.


Definition A subset T ⊆ S of elements, such that every equivalence class [x] has some representative r ∈ T and no equivalence class has more than one representative r ∈ T, is known as a transversal or, in certain circumstances, a set of canonical representatives or normal forms for S/≡.

The special kind of equivalence relation on the carrier set of an algebra A which is “compatible” with the operations of A is known as a congruence.

Definition (Congruence) Let A be an S-sorted Σ-algebra and

≡ = 〈≡s ⊆ As × As | s ∈ S〉

an S-indexed family of binary relations on A, such that for each s ∈ S the relation ≡s is an equivalence relation on the carrier set As. Then ≡ is a Σ-congruence on A if, and only if, the relations ≡s satisfy the following substitutivity condition: for each function symbol f : s(1) × · · · × s(n) → s and any a1, a′1 ∈ As(1), . . . , an, a′n ∈ As(n),

a1 ≡s(1) a′1 and · · · and an ≡s(n) a′n implies fA(a1, . . . , an) ≡s fA(a′1, . . . , a′n).

We let Con(A) denote the set of all Σ-congruences on A. If ≡ is a Σ-congruence on A, then for any a ∈ As, [a] denotes the equivalence class of a with respect to ≡s,

[a] = {a′ ∈ As | a′ ≡s a}.

Sometimes we index congruences ≡θ, ≡ψ, etc., in which case we write [a]θ for the equivalence class of a with respect to ≡θ.

7.8.2 Quotient Algebras

The substitutivity condition of a congruence is precisely what is required in order to perform the following construction.

Definition (Quotient Algebra) Let ≡ be a Σ-congruence on an S-sorted Σ-algebra A. The quotient algebra A/≡ of A by the congruence ≡ is the Σ-algebra with S-indexed family of carrier sets

A/≡ = 〈(A/≡)s | s ∈ S〉,

where for each sort s ∈ S,

(A/≡)s = (As/≡s),

and the constants and operations of the quotient algebra are defined as follows. For each constant symbol c :→ s ∈ Σ, we interpret

cA/≡ = [cA].

For each function symbol f : s(1) × · · · × s(n) → s, we interpret

fA/≡([a1], . . . , [an]) = [fA(a1, . . . , an)].

We must check that what we have called the quotient algebra is indeed a Σ-algebra. The point is that the definition of fA/≡ we have given depends on the choice of representatives.


Lemma Let A be an S-sorted Σ-algebra and ≡ a Σ-congruence on A. Then the quotient algebra A/≡ is an S-sorted Σ-algebra.

Proof Since the carriers of A are non-empty and we have defined the constants, we need only check that for each function symbol f : s(1) × · · · × s(n) → s, the corresponding operation f A/≡ is well-defined as a function

f A/≡ : (A/≡)s(1) × · · · × (A/≡)s(n) → (A/≡)s.

Consider any ai, a′i ∈ As(i) for 1 ≤ i ≤ n and suppose that ai ≡s(i) a′i for each 1 ≤ i ≤ n. We must show

f A/≡([a1], . . . , [an]) = f A/≡([a′1], . . . , [a′n]),

i.e., that f A/≡([a1], . . . , [an]) does not depend upon the choice of representatives for the equivalence classes [a1], . . . , [an]. By assumption, ≡ is a Σ-congruence. So by definition, if ai ≡s(i) a′i for each 1 ≤ i ≤ n, then

f A(a1, . . . , an) ≡s f A(a′1, . . . , a′n),

i.e.,

[f A(a1, . . . , an)] = [f A(a′1, . . . , a′n)].

Then, by definition of f A/≡,

f A/≡([a1], . . . , [an]) = f A/≡([a′1], . . . , [a′n]). □

Examples

1. For any S-sorted Σ-algebra A, the family 〈=s | s ∈ S〉, where =s is the equality relation on As, is a Σ-congruence on A, known as the equality, null or zero congruence =.

What is the relationship between A and A/ =?

2. The S-indexed family A² = 〈A²s | s ∈ S〉, where A²s = As × As, is also a Σ-congruence on A, known as the unit congruence.

As an exercise, we leave it to the reader to check that A/A² is a unit algebra.

3. Consider the commutative ring of integers. For any n ∈ N, we define a relation for x, y ∈ Z by

x ≡n y if, and only if, (x mod n) = (y mod n).

Equivalently, x ≡n y means that

• if x ≥ y, then x − y = kn for some k ∈ N, and

• if y ≥ x, then y − x = kn for some k ∈ N.

It is easy to check that ≡n is an equivalence relation and, indeed, a congruence on the algebra.
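The substitutivity condition for ≡n can also be checked mechanically on a finite sample of integers. The following is a minimal sketch in Python (our own illustration; the function name `cong`, the modulus and the sample range are assumptions, not the book's notation):

```python
# Check that x ≡n y (equality of remainders mod n) is substitutive for the
# ring operations + and . on a sample of integers.

def cong(n, x, y):
    """x ≡n y: x and y leave the same remainder on division by n."""
    return x % n == y % n

n = 6
sample = range(-12, 13)
for a in sample:
    for a1 in sample:
        for b in sample:
            for b1 in sample:
                if cong(n, a, a1) and cong(n, b, b1):
                    # substitutivity for addition and multiplication
                    assert cong(n, a + b, a1 + b1)
                    assert cong(n, a * b, a1 * b1)
print("substitutivity holds on the sample")
```

Of course, a finite check is no proof; it merely illustrates what the substitutivity condition demands of the two ring operations.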


7.9 Homomorphism Theorem

We will now gather together the technical ideas of Sections 7.7 and 7.8.

Given any Σ-homomorphism φ : A → B, we can construct a Σ-congruence in a canonical way.

Definition (Kernel) Let A and B be Σ-algebras and φ : A → B be a Σ-homomorphism. The kernel of φ is the binary relation

≡φ = 〈≡φs | s ∈ S〉

on A defined by

a ≡φs a′ if, and only if, φs(a) = φs(a′)

for all a, a′ ∈ As.

Lemma Let φ : A → B be a Σ-homomorphism. The kernel ≡φ of φ is a Σ-congruence on A.

Proof Since equality on B is an equivalence relation on B, it is easy to see that ≡φ is an equivalence relation on A. To check that the substitutivity condition holds, consider any function symbol f : s(1) × · · · × s(n) → s ∈ Σ and any a1, a′1 ∈ As(1), . . . , an, a′n ∈ As(n). Suppose that we have

a1 ≡φs(1) a′1, . . . , an ≡φs(n) a′n.

Since φ is a Σ-homomorphism, then

φs(f A(a1, . . . , an)) = f B(φs(1)(a1), . . . , φs(n)(an))
                       = f B(φs(1)(a′1), . . . , φs(n)(a′n)) (by hypothesis)
                       = φs(f A(a′1, . . . , a′n)).

So, by definition,

f A(a1, . . . , an) ≡φs f A(a′1, . . . , a′n).

Therefore, ≡φ satisfies the substitutivity condition. So ≡φ is a Σ-congruence on A. □

Conversely, given any Σ-congruence ≡ on a Σ-algebra A, we can construct a Σ-homomorphism φ : A → (A/≡) in a canonical way.

Definition (Natural map) Let A be a Σ-algebra and ≡ a Σ-congruence on A. The natural map or quotient map

nat : A → (A/≡)

of the congruence is a family 〈nat s | s ∈ S〉 of maps, defined for any a ∈ As by

nat s(a) = [a].

For a kernel congruence ≡φ, we let natφ denote the corresponding natural mapping.

Lemma Let A be a Σ -algebra and ≡ any Σ -congruence on A. The natural map

nat : A→ (A/ ≡)

of the congruence ≡ is a Σ -epimorphism.


Proof To check that nat is a Σ-homomorphism, consider any constant symbol c : → s ∈ Σ. Then

nat s(cA) = [cA]
          = cA/≡

by definition of A/≡.

For any function symbol f : s(1) × · · · × s(n) → s ∈ Σ, and any a1 ∈ As(1), . . . , an ∈ As(n), we have

nat(f A(a1, . . . , an)) = [f A(a1, . . . , an)]
                        = f A/≡([a1], . . . , [an])   by definition of A/≡
                        = f A/≡(nat(a1), . . . , nat(an)).

So the natural mapping nat is a Σ-homomorphism and clearly nat is surjective, i.e., nat is a Σ-epimorphism. □

The First Homomorphism Theorem asserts that for any Σ-epimorphism φ : A → B, the homomorphic image φ(A) and the quotient algebra A/≡φ are isomorphic, and hence, for the purposes of algebra, identical.

Theorem (First Homomorphism Theorem) If φ : A → B is a Σ-epimorphism, then there exists a Σ-isomorphism

ψ : (A/≡φ) → B

such that for all a ∈ A,

φ(a) = ψ(natφ(a)),

i.e., the diagram shown in Figure 7.14 commutes, where natφ : A → (A/≡φ) is the natural mapping associated with the kernel ≡φ of φ.

Figure 7.14: First Homomorphism Theorem. (The triangle with natφ : A → A/≡φ, ψ : A/≡φ → B and φ : A → B commutes.)

Proof Define ψ by

ψs([a]φs) = φs(a).

If [a]φs = [b]φs then φs(a) = φs(b). So ψs([a]φs) = ψs([b]φs) and therefore, ψs([a]φs) is uniquely defined.


To check that ψ is a Σ-homomorphism, consider any constant symbol c : → s ∈ Σ. Then

cB = φs(cA)
   = ψs([cA]φs)
   = ψs(cA/≡φ).

Consider any function symbol f : s(1) × · · · × s(n) → s ∈ Σ, and any a1 ∈ As(1), . . . , an ∈ As(n). Then

f B(ψs(1)([a1]φs(1)), . . . , ψs(n)([an]φs(n))) = f B(φs(1)(a1), . . . , φs(n)(an))   by definition of ψs;
    = φs(f A(a1, . . . , an))   since φ is a Σ-homomorphism;
    = ψs([f A(a1, . . . , an)]φs)   by definition of ψs;
    = ψs(f A/≡φ([a1]φs(1), . . . , [an]φs(n)))   by definition of A/≡φ.

Therefore, ψ is a Σ-homomorphism.

Since φ is surjective, ψ is surjective. For any a, a′ ∈ A, if [a]φ ≠ [a′]φ then φ(a) ≠ φ(a′), and so

ψ([a]φ) ≠ ψ([a′]φ).

Therefore, ψ is also injective, and hence bijective. So ψ is a Σ-isomorphism. □

Example

In Section 7.6, we modelled the implementation of a data type on a digital computer using a numbering. We showed that:

Suppose a data type is modelled by a Σ-algebra A. If the data type is implementable on a digital computer, then A can be coded using a Σ-algebra Ω of natural numbers and a Σ-epimorphism

α : Ω → A.

Now, applying the First Homomorphism Theorem to this situation, we conclude immediately that:

Suppose a data type is modelled by a Σ-algebra A. If the data type is implementable on a digital computer, then A is Σ-isomorphic to a quotient algebra Ω/≡α of natural numbers, i.e.,

A ∼= Ω/≡α.
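The theorem can also be seen at work in a tiny concrete case (our own illustration, not the book's): take the epimorphism φ : Z → Zn given by φ(x) = x mod n, whose kernel is the congruence ≡n of Section 7.8. Representing each class [x] by its least non-negative member, the factorisation φ = ψ ∘ natφ can be tested directly:

```python
# The First Homomorphism Theorem for φ : Z → Z_n with φ(x) = x mod n.
# The kernel of φ is ≡n; we represent the class [x] in Z/≡φ by x % n,
# so that φ factors as ψ ∘ natφ through the quotient.

n = 5

def phi(x):        # the epimorphism Z → Z_n
    return x % n

def nat(x):        # natural map Z → Z/≡φ, [x] represented by x % n
    return x % n

def psi(cls):      # the isomorphism ψ : Z/≡φ → Z_n, ψ([x]) = φ(x)
    return cls

for x in range(-20, 21):
    assert phi(x) == psi(nat(x))   # the triangle commutes
print("φ = ψ ∘ nat on the sample")
```

The choice of canonical representatives is ours; any system of representatives for the classes of ≡n would do, since ψ is well defined on classes.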


Exercises for Chapter 7

1. The translations of natural number representations from decimal to another radix b > 0, and back again, are part of the input and output procedures of many computations. Find algorithms that accomplish these transformations.

2. Let G1 = (R; 0, +, −) and G2 = (R+; 1, ., ⁻¹). Which of the following are ΣGroup-homomorphisms from G1 to G2:

(a) f(x) = log2(x^k) for any fixed k;

(b) f(x) = 2^(x−k) for any fixed k;

(c) f(x) = sin(x); and

(d) f(x) = x²?

3. Show that there is an isomorphism between the algebra

({tt, ff}; tt, ff; Or, Not)

and the algebra

({0, 1}; 0, 1; Vor, Vnot)

with Vnot(0) = 1 and Vnot(1) = 0, and

Vor(0, 0) = 0   Vor(1, 1) = 1
Vor(1, 0) = 1   Vor(0, 1) = 1.

4. Show that the relation ∼= of isomorphism is an equivalence relation, i.e., isomorphism is reflexive, symmetric and transitive on the class Alg(Σ) of all Σ-algebras.

5. Let ΣRing be the signature of rings. Let

R1 = (Z; 0, 1; +, −, .)

be the ring of integers and

R2 = (Zn; 0, 1; +, −, .)

be the ring of cyclic integers for n ≥ 2. Show that

φ : Z → Zn

defined for x ∈ Z by φ(x) = (x mod n) is a ΣRing-homomorphism.

6. Let ΣGroup be the signature of groups. Let G1 and G2 be any groups. Let φ : G1 → G2 be any map. Show that if for all x, y ∈ G1,

φ(x.y) = φ(x).φ(y)

then

φ(x⁻¹) = φ(x)⁻¹.

Hence, φ is a ΣGroup-homomorphism if, and only if, φ preserves the multiplication operation.


7. Let G1 = (G1; ·1, e1) and G2 = (G2; ·2, e2) be groups. Let φ : G1 → G2 be a group homomorphism. Define the kernel of φ by

ker(φ) = {g ∈ G1 | φ(g) = e2}.

Show that

φ is an injection if, and only if, ker(φ) = {e1}.

8. Prove the lemma in Section 7.3.2.

9. Prove the following results using the Principle of Induction:

(a) For any natural number n > 0,

1² + 2² + · · · + n² = (1/6) n(n + 1)(2n + 1).

(b) For any natural number n > 3,

2ⁿ < n!

(c) For any natural number n > 0,

n³ − n is divisible by 6.

10. Which of the following equations are true of the natural numbers?

(a) For any natural number n > 0,

1³ + 2³ + · · · + n³ = (1 + 2 + · · · + n)².

(b) For any natural number n ≥ 0,

n² + n + 41 is a prime number.

In each case give a proof using the Principle of Induction or give a counter-example.

11. Prove the following using the Principle of Induction. For any r ≥ 1 and all n ∈ N,

∑ᵢ₌₁ⁿ i(i + 1)(i + 2) · · · (i + r − 1) = (1/(r + 1)) n(n + 1)(n + 2) · · · (n + r).

12. Prove the Principle of Course of Values Induction from the Principle of Induction on the natural numbers.

13. Using the primitive recursive equations, calculate by substitution the values of add(4, 7) and mult(4, 7).

14. Show that exp : N × N → N defined by exp(n, m) = mⁿ is primitive recursive over multiplication.


15. Let g : Nᵏ → Nᵏ be a total function. A function f : N × Nᵏ → Nᵏ is defined by iteration over g if

f(0, x) = x
f(n + 1, x) = g(f(n, x)).

Show that

f(n, x) = g(g(· · · g(x) · · ·)) (n times) = gⁿ(x).

Check that if f is definable by iteration over g then it is definable by primitive recursion over the identity function i : Nᵏ → Nᵏ and g.

16. Let Σ be the signature

signature

sorts nat

constants zero : → nat

operations f : nat → nat

and let

A1 = (N; 0; n + 1) and A2 = (N; k; g)

where g : N → N is any function. Show that any Σ-homomorphism φ : A1 → A2 is primitive recursive in g.

17. Give a definition that generalises the concept of primitive recursion on N to functions of the form

f : N × A → B

where A and B are non-empty sets. Show that f is uniquely defined by adapting the proof of Lemma 7.4.3.

18. Give a definition that generalises the concept of iteration on N (in Question 15) to functions of the form

f : N × A → B

where A and B are non-empty sets. Show that iterations are primitive recursions. Under what circumstances can primitive recursions be defined using iterations?


Chapter 8

Terms and Equations

The operations of an algebra are basic operations on the data of the algebra. The composition of the basic operations named in a signature results in an algebraic formula which is called a

term over a signature.

Equations are made from basic operations by asserting that two terms are equal. Algorithms are made from basic operations by scheduling terms in some order. In an imperative programming language, expressions in assignments and tests in control statements are terms made from basic operations declared in a data type interface.

In this chapter we begin the study of the amazing ideas of the term and the equation. We will introduce the idea of term and equation rather gently with many examples, keeping in mind their wide applicability. Like a signature, a term over a signature is syntax.

We will model algebraically the data type of terms and apply our general algebraic theory of data types to create a theory of terms. Terms are a fundamental data type for doing any kind of algebraic manipulation.

To reason about terms, and to define functions on terms, we formulate

principles of structural induction or recursion

on terms. We show how these induction principles are used in transforming terms, in

• mapping terms to trees, and

• evaluating terms in data types.

Armed with the concept of a term, in Section 8.5, we analyse the idea of an

equation.

An equation is made by setting two terms to be equal. An equation is a syntactic idea that is immensely general, for wherever there are terms, there are equations.

With the key ideas about terms explained it will be obvious that terms are an important kind of data, as important to the computer scientist as strings and numbers. Therefore, it is natural to use some simple ideas about modelling data types by algebras, to model the


data type of terms.

We will show that the set of terms over a signature forms an extremely important algebra with a truly inexhaustible supply of applications in Computer Science. By equipping the set of terms with operations we create algebras of terms, called

term algebras.

By simply applying our general theory of data we obtain a general algebraic theory of the data type of terms.

With this algebraic approach to terms, we are able to show that the ideas of

• a function defined by induction or recursion on terms;

• a term evaluation map; and

• a homomorphism on a term algebra

are equivalent concepts.

Finally, using the Homomorphism Theorem, we are able to prove quite easily that:

Theorem Every algebra is isomorphic to some algebra of terms divided by a congruence relation on terms.

In Section 8.1 we introduce terms. In Section 8.2 we show how to define functions on terms by recursion. In Section 8.3 and Section 8.4 we use structural induction to define the parse trees of a term and how terms are evaluated on data.

In Section 8.5 we study equations. The algebraic methods begin to assert themselves only in Section 8.6 where the term algebras are defined. Section 8.7 shows the equivalence of structural induction with homomorphism and term evaluation and proves the general representation theorem.

8.1 Terms

First we look at examples of terms and examine their form. Then we introduce the technical definitions in two stages, first for single-sorted signatures and then for many-sorted signatures.

8.1.1 What is a Term?

A term is the result of composing names for operations and applying them to names for constants and variables. For example, using the familiar operations of succ and add, here are some terms for calculating with natural numbers using prefix notation:

succ(succ(0 )), add(x , succ(y)), succ(add(x , y));

and here are “equivalent” terms that use infix notations:

(0 + 1 ) + 1 , x + (y + 1 ), (x + y) + 1 .

More precisely, given a signature Σ we will define the Σ-terms.

Intuitively, a term specifies a computation in which a finite sequence of operations is applied to data that are the values of the constants and variables. A given collection of names for constants and operations is called a


signature.

Thus, a term is a syntactic object made from a signature. Indeed, before defining terms it is essential that we declare the signature we will be using. See Chapter 4 for a full introduction to signatures.

Now, the computation of the value of a term on given data is called term evaluation. In particular, the Σ-terms are evaluated in Σ-algebras.

Another idea close to that of term, common in programming languages, is

expression or program term.

Expressions appear on the right hand side of assignments, e.g.,

x := √(y² + z²),

and in tests, e.g.,

if y² + z² < x² then . . . else . . . fi

Expressions compute with the operations of a data type. However, sometimes the set of expressions is allowed to be larger than that of terms because of programming constructs, e.g.,

x = choose(y1, y2).

Terms and algebraic terms are determined by a signature. Expressions and program terms are determined by a programming language.

When computing with number systems, such as the integers, rationals or real numbers, with the specific operations of

+, −, .

the common name for a term is polynomial, e.g.,

(x + 1) + (x − 1)
(x + 1).(x − 1)
x³ + x² + x + 1
x² − 2√2 x + 2

New operations are commonly added to +, −, . to create more complex algebraic expressions that are also terms over the new signature.

A term, at least when applied to numbers, is close to the intuitive idea of algebraic formula. The formulae

2πr and πr²

for the circumference and area of a circle of radius r are terms based on multiplication. Some other basic formulae are shown in Figure 8.1.

Thus, the question

“What is an algebraic formula?”

has the answer

term


Analytical Engine formula    √((a + b)/cd)

Hero's formula               √(s(s − a)(s − b)(s − c))

Quadratic formula            (−b + √(b² − 4ac)) / 2a
                             (ax² + bx + c) / (dx² + ex + f)

k-th root                    √√· · ·√x (k times)

Polynomial                   aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀

Extended polynomial          ax² + bx√x + cx + d√x + e

Conic section formula        ax² + by² + cxy + dx + ey + f

Pendulum formula             2π√(l/g)

Inverse square formula       kqq′/r²

Figure 8.1: Examples of basic formulae.

over some signature.

Now, the notion of a term is basic for that of

equation.

Simple algebraic equations are made from polynomials, e.g.,

(x + 1) = (x − 1)
(x + 1).(x − 1) = 0
x³ + x² = x + 1
x² − 2√2 x + 2 = 0

Indeed, in Section 8.5, we will define an equation to be anything of the form

t = t′

where t and t′ are terms.

Because these mathematical examples are familiar they help prepare us to study the theory of terms. However, they do not suggest the great generality and universal applicability of terms:


Working Principle Wherever you find operations on data you find

(i) terms and

(ii) equations between terms.

We will now define the notion of term properly.

8.1.2 Single-Sorted Terms

To make the notion of a term clearer, we shall consider the idea when we only have one sort in our signature.

Definition (Single-sorted terms) Let Σ be a single-sorted signature

signature

sorts s

constants . . . , c : → s , . . .

operations . . . , f : sⁿ → s , . . .

Let X be a non-empty set of variables of sort s.

We inductively define the set T(Σ, X) of terms by:

(i) each constant symbol c :→ s in Σ is a term

c ∈ T (Σ , X);

(ii) each variable symbol x ∈ X is a term

x ∈ T (Σ , X);

(iii) for each n-ary function symbol f : sⁿ → s in Σ and any terms t1, . . . , tn,

f (t1 , . . . , tn) ∈ T (Σ ,X )

is a term; and

(iv) nothing else is a term.

Note that without clause (iv), we would not know that clauses (i)–(iii) are the only ways of making terms. This property is needed for induction on terms.

The definition places no restrictions on the set X of variables. Usually, we will assume that

X is finite or countably infinite.

Sometimes we will enumerate explicitly what the variables in X are, e.g.,

X = {x1, x2, . . . , xn} or X = {x1, x2, . . .}.

Sometimes we will choose X and use familiar notations to denote individual variables, e.g.,

x , y , z ∈ X .

We will illustrate the general idea of a term by giving examples of Σ-terms for a number of choices of signature Σ, starting with basic data types.


Example: Boolean terms

Consider a signature for a standard set of Boolean operations in prefix notation:

signature Booleans1

sorts bool

constants true, false : → bool

operations not : bool → bool
           and : bool × bool → bool
           or : bool × bool → bool
           implies : bool × bool → bool

endsig

Let

X = {b1, b2, . . .}

be a set of variables of sort bool. Here are some terms in T(ΣBooleans1, X):

true, false,
b1, b2, b3, . . .
and(b1, b2), and(b1, b3), . . .
and(b1, and(b2, b3)), and(and(b1, b2), b3),
or(b1, not(b1)),
implies(implies(not(b2), not(b1)), implies(b1, b2)),
implies(implies(b1, b2), implies(not(b2), not(b1))),
implies(and(implies(b1, b2), implies(b2, b3)), implies(b1, b3))

Consider the signature ΣBooleans2 which we define as follows:

signature Booleans2

sorts bool

constants true, false : → bool

operations ¬ : bool → bool
           ∧ : bool × bool → bool
           ∨ : bool × bool → bool
           ⇒ : bool × bool → bool

endsig

and the set X = {b1, b2, b3, . . .} of bool-sorted variables.


We can form terms in T(ΣBooleans2, X):

true, false,
b1, b2, b3, . . .
∧(b1, b2), ∧(b1, b3),
∧(b1, ∧(b2, b3)), ∧(∧(b1, b2), b3),
∨(b1, ¬(b1)),
⇒(⇒(¬(b2), ¬(b1)), ⇒(b1, b2)),
⇒(⇒(b1, b2), ⇒(¬(b2), ¬(b1))),
⇒(∧(⇒(b1, b2), ⇒(b2, b3)), ⇒(b1, b3)).

Finally, in their more familiar infix form:

true, false,
b1, b2, b3, . . .
b1 ∧ b2, b1 ∧ b3,
(b1 ∧ b2) ∧ b3, b1 ∧ (b2 ∧ b3),
b1 ∨ ¬b1,
(¬b2 ⇒ ¬b1) ⇒ (b1 ⇒ b2),
(b1 ⇒ b2) ⇒ (¬b2 ⇒ ¬b1),
((b1 ⇒ b2) ∧ (b2 ⇒ b3)) ⇒ (b1 ⇒ b3).

These syntactic expressions are also known as

propositional expressions or propositional formulae.
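Looking ahead to term evaluation, propositional terms over ΣBooleans1 can be evaluated in the algebra of truth values once each variable is assigned a value. A minimal sketch in Python, representing a term as a nested tuple (the representation and names are our own convention, not the book's):

```python
# Evaluate terms over ΣBooleans1 in the algebra of truth values.
# A term is the string "true"/"false", a variable name "b1", "b2", ...,
# or a tuple (op, arg1, ...) for a function application.
from itertools import product

OPS = {
    "not":     lambda p: not p,
    "and":     lambda p, q: p and q,
    "or":      lambda p, q: p or q,
    "implies": lambda p, q: (not p) or q,
}

def evaluate(t, env):
    if t == "true":
        return True
    if t == "false":
        return False
    if isinstance(t, str):          # a variable b1, b2, ...
        return env[t]
    op, *args = t                   # f(t1, ..., tn)
    return OPS[op](*(evaluate(a, env) for a in args))

# implies(and(implies(b1,b2), implies(b2,b3)), implies(b1,b3)) is a tautology
t = ("implies",
     ("and", ("implies", "b1", "b2"), ("implies", "b2", "b3")),
     ("implies", "b1", "b3"))
assert all(evaluate(t, dict(zip(("b1", "b2", "b3"), v)))
           for v in product([True, False], repeat=3))
```

Checking all assignments of truth values to the variables of a term is exactly how one verifies that a propositional formula is a tautology.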

Example: Natural number terms

Consider the signature ΣNaturals1 which we define as follows:

signature Naturals1

sorts nat

constants zero : → nat

operations succ : nat → nat
           add : nat × nat → nat
           mult : nat × nat → nat

endsig

Let

X = {x1, x2, . . .}

be a set of variables of sort nat. Then the set T(ΣNaturals1, X) of terms includes:

x1, succ(x1), succ²(x1), · · ·, succⁿ(x1), · · ·
add(zero, x1), add(x1, x2), add(succ(x1), succ²(x2)),
mult(x1, succ(x2)), mult(zero, add(x1, zero)), add(x1, mult(x2, succ(zero))),
mult(succ(x1), mult(succ(x2), succ(x3))).


The following signature ΣNaturals2 uses the standard infix notation for successor, addition and multiplication:

signature Naturals2

sorts nat

constants 0 : → nat

operations +1 : nat → nat
           + : nat × nat → nat
           . : nat × nat → nat

endsig

The corresponding terms in T(ΣNaturals2, X) are

x1, x1 + 1, (x1 + 1) + 1, (· · · ((x1 + 1) + 1) + · · · + 1) + 1,
0 + x1, x1 + x2, (x1 + 1) + ((x2 + 1) + 1),
x1.(x2 + 1), 0.(x1 + 0), x1 + (x2.(0 + 1)),
(x1 + 1).((x2 + 1).(x3 + 1)).

Example: Real Number Terms

The algebra we extracted from Babbage's letter on the Analytical Engine in Section 2.2 can be given the following single-sorted signature ΣAE:

signature AE

sorts real

constants 0 , 1 ,Max ,Min : → real

operations + : real × real → real
           − : real × real → real
           . : real × real → real
           / : real × real → real
           √ : real → real

endsig

Composing these functions in various ways leads to a range of familiar useful formulae or expressions, which are all examples of terms over ΣAE. Algebraic expressions concerning real numbers use standard conventions in infix notation and brackets that simplify the appearance of terms. Let X be any set of variables of sort real. Suppose x, y, z ∈ X. For example, here are some terms over ΣAE:

x + (y + z)   (x + y) + z
x.(y.z)       (x.y).z


In working with real numbers, these pairs of terms are to evaluate to the same values and, because addition and multiplication of reals satisfy the associativity law, we write

x + y + z
x.y.z

respectively. However, in the case of ΣAE, we must be careful with the constants Max and Min.

A quadratic expression

a.x² + b.x + c

is a simplification of some term such as

(a.(x.x) + (b.x)) + c

or

((a.x).x) + ((b.x) + c).

There is a great deal of algebraic analysis behind these familiar and seemingly obvious simplifications. □

Most terms have variables, though not all. We will define precisely for any term t ∈ T(Σ, X), the set

var(t)

of all variables from X appearing in t. For example,

var(x1 + (x2 + x3)) = {x1, x2, x3}
var(succ²(zero)) = ∅.

Definition (Term Variables) Let t ∈ T(Σ, X). The set var(t) of all variables in t is defined by induction on the structure of t.

Constant If t ≡ c then:

var(t) = ∅

Variable If t ≡ xi then:

var(t) = {xi}

Operation If t ≡ f(t1, . . . , tn) then:

var(t) = var(t1) ∪ · · · ∪ var(tn)
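The three clauses of this definition form a structural recursion and can be transcribed directly. A sketch in Python (the nested-tuple representation of terms and the set CONSTANTS of constant symbols are our own conventions, not the book's):

```python
# var(t) by induction on the structure of t.  Terms are nested tuples
# ("f", t1, ..., tn); leaves are strings, which are constant symbols when
# they occur in CONSTANTS and variable symbols otherwise.

CONSTANTS = {"zero"}

def var(t):
    if isinstance(t, str):
        # Constant case: var(c) = ∅; Variable case: var(xi) = {xi}
        return set() if t in CONSTANTS else {t}
    _f, *args = t
    # Operation case: var(f(t1, ..., tn)) = var(t1) ∪ ... ∪ var(tn)
    return set().union(*(var(a) for a in args))

# var(add(x1, mult(x2, succ(zero)))) = {x1, x2}
t = ("add", "x1", ("mult", "x2", ("succ", "zero")))
assert var(t) == {"x1", "x2"}
```

The same recursion scheme, with different base and step cases, yields any function defined by induction on the structure of terms.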

Definition (Closed and Open Terms) A term t is closed if it does not contain any variables, i.e.,

var(t) = ∅;

otherwise it is called open, when

var(t) ≠ ∅.

The set T(Σ, ∅) of closed terms over a signature Σ and the empty set of variables is denoted by

T(Σ).


Example The set T(ΣNaturals1) of closed terms includes:

zero, succ(zero), succ²(zero), · · ·, succⁿ(zero), · · ·
add(zero, zero), add(zero, succ(zero)), add(succᵐ(zero), succⁿ(zero)),
mult(zero, zero), mult(zero, succ(zero)), mult(succ(zero), zero),
add(zero, mult(succᵐ(zero), succⁿ(zero))), mult(succᵐ(zero), succⁿ(zero)),
mult(add(succᵐ(zero), succⁿ(zero)), succᵖ(zero)).

8.1.3 Many-Sorted Terms

Signatures are naturally many-sorted because operations take arguments of different sorts. This complicates the notion of terms somewhat: all constants, variables and terms have various sorts.

Definition (Many-sorted terms) Let Σ be the signature

signature

sorts . . . , s , . . .

constants . . . , c : → s , . . .

operations . . . , f : s(1 )× · · · × s(n)→ s , . . .

For each sort s ∈ S, the set

Xs

is a non-empty set of variable symbols of sort s. Let

X = 〈Xs | s ∈ S〉

be a family of non-empty sets of variables indexed by the sort set S of Σ. For each sort s ∈ S, we must define the set

T(Σ, X)s

of all terms of sort s. Let

T(Σ, X) = 〈T(Σ, X)s | s ∈ S〉

be the family of all Σ-terms indexed by the sort set S.

We will have to define all the T(Σ, X)s for all s ∈ S at the same time. We need to do this because terms of sort s typically have sub-terms of other sorts. We say this is an induction that is simultaneous for all sorts s.

The set T (Σ ,X )s of all terms of sort s is defined by simultaneous or mutual induction by:

(i) each constant symbol c :→ s is a term of sort s , i.e.,

c ∈ T (Σ ,X )s ;

(ii) each variable symbol x ∈ Xs is a term of sort s , i.e.,

x ∈ T (Σ ,X )s ;


(iii) for each function symbol

f : s(1) × · · · × s(n) → s

and any terms

t1 of sort s(1), . . . , tn of sort s(n),

i.e.,

t1 ∈ T(Σ, X)s(1), . . . , tn ∈ T(Σ, X)s(n),

the expression f(t1, . . . , tn) is a term of sort s, i.e.,

f(t1, . . . , tn) ∈ T(Σ, X)s;

and

(iv) nothing else is a term.

Example: Terms over Natural Numbers and Booleans

Let us now consider two-sorted terms over the following signature ΣNat+Tests:

signature Nat + Tests

sorts nat , bool

constants zero : → nattrue : → boolfalse : → bool

operations succ : nat → nat
           pred : nat → nat
           add : nat × nat → nat
           mult : nat × nat → nat
           equals : nat × nat → bool
           less : nat × nat → bool
           not : bool → bool
           and : bool × bool → bool

endsig

This is a two-sorted signature so we define

X = 〈Xnat, Xbool〉

to be two sets of variables where

Xnat = {x1, x2, . . .}

is a nat-sorted set of variable symbols and

Xbool = {b1, b2, . . .}


is a bool-sorted set of variable symbols.

We can form the terms of sort nat:

x2
succ(x2)
succ(succ(x2))
succⁿ(x2)
add(x1, x2)
mult(pred(x1), succ(x2))
add(succ(add(zero, x1)), mult(x2, x3))
. . .

Notice that all terms of sort nat have sub-terms of sort nat. This is because all the operations that return values of sort nat have arguments of sort nat only.

Now consider the terms of sort bool :

b1

truefalse

and(true, true)not(b1 )

or(b1 , and(false, b2 ))equals(x1 , zero)

less(x1 , x2 )and(equals(x2 , succ(zero)), not(less(x2 , succ(zero))))

. . .

Here, terms of sort nat are sub-terms of some terms of sort bool. Thus, to define terms of sort bool, we also need to define terms of sort nat at the same time, so we need a simultaneous or mutual induction.

Example: Terms over Real Numbers and Booleans

Here is a signature ΣReals with lots of basic operations and containing some infix and some postfix notations.


signature Reals

sorts real , bool

constants 0, 1, π : → real
          true, false : → bool

operations + : real × real → real
           − : real → real
           . : real × real → real
           ⁻¹ : real → real
           √ : real → real
           | | : real → real
           sin : real → real
           cos : real → real
           tan : real → real
           exp : real → real
           log : real → real
           = : real × real → bool
           < : real × real → bool
           not : bool → bool
           and : bool × bool → bool

endsig

We define

X = 〈Xreal, Xbool〉

to be two sets of variables where

Xreal

is a real-sorted set of variable symbols and

Xbool

is a bool-sorted set of variable symbols.

Typical terms in T(ΣReals, X) of sort real include terms we have seen earlier (e.g., polynomials etc.). Recalling our remarks on brackets and associativity, we have new terms such as these: for x, y, z ∈ Xreal,

sin(x + y)
sin(x).cos(y) + cos(x).sin(y)
sin(x + y + z)
sin(x).cos(y).cos(z) + cos(x).sin(y).cos(z) + cos(x).cos(y).sin(z) − sin(x).sin(y).sin(z)
sin²(x) − cos²(y)
−x.log(|x|)
log(log(log(x)))
x² + x.log(x) + x + log(x) + 1


Terms of sort bool include the earlier terms, and tests involving variables of sort real such as these: for x, y, z, r, ε ∈ Xreal,

x = y
¬(x = y)
x < y ∨ x = y
x² + y² < r² ⇒ z > 1000
|x − y| < ε

Example: Storage Media

Recall the following signature ΣStorage that models a simple interface for abstract storage media:

signature Storage

sorts store, address , data

constants

operations in : data × address × store → store
           out : address × store → data

endsig

This is a three-sorted signature. We define

X = 〈Xstore, Xaddress, Xdata〉

to be three sets of variables where

Xstore = {s1, s2, . . .}

is a store-sorted set of variable symbols,

Xaddress = {a1, a2, . . .}

is an address-sorted set of variable symbols, and

Xdata = {d1, d2, . . .}

is a data-sorted set of variable symbols.

Typical ΣStorage terms in T(ΣStorage, X) of sort store are the variables

s1 , s2 , . . .

and terms with outermost operation symbol in:

in(d1, a1, s1)
in(out(a1, s1), a2, s2)
in(d2, a2, in(d1, a1, s1))
in(out(a3, s2), a2, in(d1, a1, s1))


The operation in is the only operation symbol with codomain store.

The terms of sort address are precisely the variables:

a1, a2, · · ·

since there are no constants of sort address nor operations with codomain address.

The terms of sort data are the variables

d1, d2, . . .

and terms with outermost operation symbol out:

out(a1, s1),
out(a1, in(d1, a1, s1)),
out(a3, in(out(a2, s2), a1, s1)).

The operation out is the only operation symbol with codomain data.
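These observations amount to reading off the sort of a term from its outermost symbol, while checking that the argument sorts match the declarations of in and out. A sketch in Python (the tuple representation, the helper names, and the convention that variables are recognised by their leading letter are our own assumptions):

```python
# Infer the sort of a Σ_Storage term and check it is well-sorted against
# in : data × address × store → store and out : address × store → data.

DECLS = {
    "in":  (("data", "address", "store"), "store"),
    "out": (("address", "store"), "data"),
}
VAR_SORT = {"s": "store", "a": "address", "d": "data"}  # s1, a1, d1, ...

def sort_of(t):
    if isinstance(t, str):                  # a variable: sort from its letter
        return VAR_SORT[t[0]]
    op, *args = t                           # an application op(t1, ..., tn)
    arg_sorts, result = DECLS[op]
    assert tuple(map(sort_of, args)) == arg_sorts, "ill-sorted term"
    return result

assert sort_of(("in", "d1", "a1", "s1")) == "store"
assert sort_of(("out", "a3", ("in", ("out", "a2", "s2"), "a1", "s1"))) == "data"
```

The mutual dependence of the sorts is visible in the recursion: computing the sort of a store term may require computing the sorts of data sub-terms, and vice versa.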

8.2 Induction on Terms

Terms are syntactic objects which are built from constant symbols and variables by the application of function symbols. Starting from constants . . . , c, . . . from signature Σ and variables . . . , x, . . . from X, the operation symbols are applied again and again: given an operation symbol f and terms t1, . . . , tn, construct the new term

f (t1 , . . . , tn).

All the terms are made in this way. Each term is the result of a unique process of choosing constants and variables and applying function symbols. Different choices or orders of application lead to different terms.

The set T(Σ, X) of all terms is similar to that of the natural numbers N, in which each number is built from the constant 0 by applications of the function n + 1 in a unique way.

In analogy with the natural numbers, we may use this way of generating terms to formulatemethods of reasoning and define transformations by

induction or recursion on terms.

First, we explain these ideas in the case of single-sorted terms. Then, we repeat the explanation in the case of many-sorted terms.

8.2.1 Principle of Induction for Single-Sorted Terms

Suppose Σ is single sorted. Now given any Σ -term t , either t is some constant symbol c, or t is some variable, or t is a term built up by function application and has the form f (t1 , . . . , tn). Induction exploits this simple fact.


Principle of Induction for Single-Sorted Terms

Let Σ be a single-sorted signature, with sort s, and T (Σ ,X ) the set of all terms over Σ . Let P be any property of Σ -terms, i.e.,

P ⊆ T (Σ ,X )

is a set of terms having that property; we write P(t) for t ∈ P. If the following two statements hold:

Base Case P(c) is true for each constant symbol c ∈ Σ ;

P(x ) is true for each variable x ∈ X .

Induction Step Let f : sn → s ∈ Σ be any function symbol. Let t1 , . . . , tn ∈ T (Σ ,X ) be any terms of sort s. If

P(t1 ), . . . ,P(tn)

are all true then

P(f (t1 , . . . , tn))

is also true.

Then we may conclude that

P(t) is true for all terms t ∈ T (Σ ,X ).

Thus, one may say that, if property P holds for all constants and variables, and property P is preserved after applying each operation symbol to terms satisfying P , then P holds for all terms.

8.2.2 Principle of Induction for Many-Sorted Terms

Now, suppose Σ is many sorted with set S of sorts. For each sort s ∈ S we have to think about the Σ -terms of sort s . Given any s-sorted Σ -term t , either t is some s-sorted constant symbol c :→ s , or t is some s-sorted variable, or t is a term built up by the application of some function f : s(1 )× · · · × s(n)→ s and has the form f (t1 , . . . , tn).


Principle of Induction for Many-Sorted Terms

Let Σ be an S-sorted signature and T (Σ ,X ) the set of all terms over Σ . Let P be any property of all Σ -terms, i.e.,

P ⊆ T (Σ ,X )

is a set of terms having that property; we write P(t) for t ∈ P. If the following two statements hold:

Base Case P(c) is true for each constant symbol c :→ s ∈ Σ and for each sort s ∈ S;

P(x ) is true for each variable x ∈ Xs and for each sort s ∈ S.

Induction Step Let f : s(1 )× · · · × s(n)→ s ∈ Σ be any function symbol. Let

t1 ∈ T (Σ ,X )s(1 ), . . . , tn ∈ T (Σ ,X )s(n)

be any terms of sorts s(1 ), . . . , s(n). If

P(t1 ), . . . ,P(tn)

are all true then

P(f (t1 , . . . , tn))

is also true.

Then we may conclude that

P(t) is true for all terms t ∈ T (Σ ,X ).

We will use these principles shortly.

8.2.3 Functions on Terms

The fact that 0 and n + 1 can construct all natural numbers is used to define functions on all natural numbers (for example, by primitive recursion, see Section 7.4.3). Similarly, the fact that the constants, variables and function symbols can construct all terms is used to define functions on all terms.

Definition (Structural Induction or Recursion for Single-Sorted Terms) Let Σ be a single-sorted signature and T (Σ ,X ) the set of all Σ -terms with variables from X . Let A be any non-empty set.

A function

φ : T (Σ ,X )→ A

is defined by structural induction or structural recursion if for each:

(i) constant c ∈ Σ , there exists some element cA ∈ A,

(ii) variable x ∈ X , there exists some element xA ∈ A,


(iii) function symbol f ∈ Σ , there exists some function f A : An → A,

such that the following equations are satisfied for all terms t1 , . . . , tn ∈ T (Σ ,X ):

φ(c) = cA

φ(x ) = xA

φ(f (t1 , . . . , tn)) = f A(φ(t1 ), . . . , φ(tn))

Before exploring this general method for defining functions on terms, we record an important fact. It seems clear that the equations describe how to evaluate φ on every term, and that only one function φ could satisfy these equations.

Lemma (Uniqueness) The function φ : T (Σ ,X ) → A satisfying the equations is uniquely defined, i.e., if φ1 and φ2 are two functions satisfying the equations then φ1 = φ2 .

Proof See Exercise 11. 2

Example (Measuring single-sorted terms) Consider ways of defining the size of a term. First, one can measure a term by the height or depth of application or nesting of its operations, or, simply, by the number of operations in the term.

Height Let

Height : T (Σ ,X )→ N

be the function that measures the height Height(t) of a term t ∈ T (Σ ,X ). This function is defined by structural induction as follows:

Height(c) = 1

Height(x ) = 1

Height(f (t1 , . . . , tn)) = max (Height(t1 ), . . . ,Height(tn)) + 1

Comparing these equations with the general definition by structural induction, we see that A = N and:

(i) for each constant c ∈ Σ , the element cA is

1 ∈ N;

(ii) for each variable x ∈ X , the element xA is

1 ∈ N;

(iii) for each n-ary operation symbol f ∈ Σ , the function f A : Nn → N is such that

f A(m1 , . . . ,mn) = max (m1 , . . . ,mn) + 1 .

Size Let

Size : T (Σ ,X )→ N


be the function that measures the size Size(t) of a term t ∈ T (Σ ,X ). This function is defined by structural induction as follows:

Size(c) = 1

Size(x ) = 1

Size(f (t1 , . . . , tn)) = Size(t1 ) + · · ·+ Size(tn) + 1

Comparing these equations to the general definition by structural induction, we see that A = N and

(i) for each constant c ∈ Σ , the element cA is

1 ∈ N;

(ii) for each variable x ∈ X , the element xA is

1 ∈ N;

(iii) for each n-ary operation symbol f ∈ Σ , the function f A : Nn → N is such that

f A(m1 , . . . ,mn) = (m1 + · · ·+ mn) + 1 .
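As an illustration (not from the book), the two measures can be sketched in Python, representing a term as either a string (a constant or variable symbol) or a nested tuple whose head is an operation symbol; the representation and names are our own assumptions.

```python
# A term is either a string (a constant or variable symbol) or a tuple
# ("f", t1, ..., tn) applying an operation symbol "f" to subterms.

def height(t):
    # Base case: constants and variables have height 1.
    if isinstance(t, str):
        return 1
    # Induction step: one more than the height of the tallest subterm.
    f, *args = t
    return max(height(ti) for ti in args) + 1

def size(t):
    # Base case: constants and variables have size 1.
    if isinstance(t, str):
        return 1
    # Induction step: one more than the sum of the sizes of the subterms.
    f, *args = t
    return sum(size(ti) for ti in args) + 1

t = ("add", "x", ("mult", "y", ("succ", "zero")))  # add(x, mult(y, succ(zero)))
print(height(t))  # 4
print(size(t))    # 6
```

Both functions are instances of the definition above, with A = N, cA = xA = 1, and f A taking the maximum (respectively, the sum) of its arguments plus one.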

Now, let us see the idea in the many-sorted case.

Definition (Structural Induction or Recursion for Many-Sorted Terms) Let Σ be an S -sorted signature and T (Σ ,X ) = 〈T (Σ ,X )s | s ∈ S 〉 the set of all S -sorted Σ -terms with variables from an S -sorted set X = 〈Xs | s ∈ S 〉. Let A = 〈As | s ∈ S 〉 be any S -sorted family of non-empty sets.

An S -sorted family

φ = 〈φs : T (Σ ,X )s → As | s ∈ S 〉

of functions is defined by structural induction or structural recursion if:

(i) for each sort s and constant c :→ s ∈ Σ , there exists some element cA ∈ As ,

(ii) for each sort s and variable x ∈ Xs , there exists some element xA ∈ As ,

(iii) for every sort s , s(1 ), . . . , s(n) and function symbol f : s(1 )× · · · × s(n)→ s ∈ Σ , there exists some function f A : As(1 ) × · · · × As(n) → As ,

such that the following equations are satisfied for all terms t1 ∈ T (Σ ,X )s(1 ), . . . , tn ∈ T (Σ ,X )s(n):

φs(c) = cA

φs(x ) = xA

φs(f (t1 , . . . , tn)) = f A(φs(1 )(t1 ), . . . , φs(n)(tn))

Lemma (Uniqueness) The family φ = 〈φs : T (Σ ,X )s → As | s ∈ S 〉 of functions satisfying the equations is uniquely defined, i.e., if φ1 and φ2 are two families satisfying the equations then φ1 = φ2 .


Proof See Exercise 11. 2

Example (Measuring Many-Sorted Terms)

Height Let

Height = 〈Height s : T (Σ ,X )s → N | s ∈ S 〉

be an S -sorted family of functions that measure the height Height s(t) of an s-sorted term t ∈ T (Σ ,X )s . These functions are defined by structural induction as follows: for each sort s ∈ S ,

Height s(c) = 1

Height s(x ) = 1

Height s(f (t1 , . . . , tn)) = max (Heights(1 )(t1 ), . . . ,Heights(n)(tn)) + 1 .

Size Let

Size = 〈Sizes : T (Σ ,X )s → N | s ∈ S 〉

be an S -sorted family of functions that measure the size Sizes(t) of an s-sorted term t ∈ T (Σ ,X )s . These functions are defined by structural induction as follows: for each sort s ∈ S ,

Sizes(c) = 1

Sizes(x ) = 1

Sizes(f (t1 , . . . , tn)) = Sizes(1 )(t1 ) + · · ·+ Sizes(n)(tn) + 1 .

8.2.4 Comparing Induction on Natural Numbers and Terms

The principles of structural induction have been presented as methods for proving properties and defining functions. They are plausible, even intuitive, methods firmly based on how terms are made. How do we know they are true?

Are these principles of structural induction axioms that we can assume or can they be proved?

The answer is that they can be proved from a simple principle, namely, the principle of induction on natural numbers. Specifically, if induction on N is true then structural induction on T (Σ ,X ) is true.

Theorem (Induction on Terms) The following are equivalent:

1. The Principle of Induction on N.

2. The Principle of Induction on T (Σ ,X ) for any non-empty signature Σ .

Proof. First, we prove that (1) implies (2). Suppose (1) holds. We have to prove that the statement of structural induction holds for terms, using statement (1).

Let Height : T (Σ ,X )→ N be the height function defined above. Let P ⊆ T (Σ ,X ) be any property of terms.

Suppose


(i) P(c) and P(x ) are true; and

(ii) if P(t1 ), . . . , P(tn) are true, then P(f (t1 , . . . , tn)) is true.

We have to show that P(t) is true for all t ∈ T (Σ ,X ); we will do this using natural number induction on the height of terms.

Basis In the basis case of this natural number induction all terms t are of height 1 : Height(t) = 1 . These terms are precisely the constants and variables. By assumption (i), we know that P is true for all terms of height 1 .

Induction Step The induction hypothesis is: Let k > 1 . Suppose P(t) is true for all t such that Height(t) < k .

Now, any term t with Height(t) = k has the form

t ≡ f (t1 , . . . , tn)

where Height(t1 ), . . . ,Height(tn) < k . Thus, by the Induction Hypothesis on N, we know that

P(t1 ), . . . ,P(tn) are true.

By assumption (ii), we know that

P(f (t1 , . . . , tn)) is true.

So, by the Principle of Induction on N, for all terms t ,

P(t) is true.

For a proof that (2) implies (1), see the exercises at the end of this chapter.

8.3 Terms and Trees

The structure of a term can be displayed using a tree. In such a tree, the root and internal nodes are labelled by operation symbols, and the leaves are labelled by constants and variables. After looking at some examples, we will define the process of building trees for terms in general, first in the simple single-sorted case, and then in the many-sorted case. The general process is defined using structural induction on terms. Trees of terms are widely used in giving semantics to programming languages and in compiling them.

8.3.1 Examples of Trees for Terms

To start with a single-sorted example, recall the signature ΣNaturals1 for natural numbers and the ΣNaturals1 terms in Examples 8.1.1. Some of the closed terms t in T (ΣNaturals1 , ∅) and their tree representations Tr(t) are illustrated in Figure 8.2. With variables from X , some of the terms in T (ΣNaturals1 ,X ) and their tree representations are illustrated in Figure 8.3.

For some many-sorted examples, recall the {nat , bool}-sorted term algebra T (ΣNat+Tests ,X ) of Example 8.1.3. Figure 8.4 shows some terms and their tree representations.


[Trees for Tr(succ(succ(zero))) and Tr(add(zero, succ(zero))) omitted.]

Figure 8.2: Tree representation of some terms of the algebra T (ΣNaturals1 , ∅).

[Trees for Tr(add(x , mult(y , succ(zero)))) and Tr(succn(x )) omitted.]

Figure 8.3: Tree representation of some terms of the algebra T (ΣNaturals1 ,X ).

[Trees for TrBool(or(b1 , and(false, b2 ))) and TrBool(equals(x , pred(succ(x )))) omitted.]

Figure 8.4: Tree representation of some terms of the algebra T (ΣNat+Tests ,X ).

8.3.2 General Definitions by Structural Induction

Consider the single-sorted signature Σ . Let X be a set of variable symbols of sort s . Let Tree(Σ ,X ) be a set of trees whose nodes are labelled by constants, variables or function symbols, i.e., by elements of Σ ∪ X .

Definition (Single-Sorted Trees for Terms) We map the terms of the term algebra into trees by the map

Tr : T (Σ ,X )→ Tree(Σ ,X )

which is defined by structural induction as follows:

(i) for each constant symbol c :→ s in Σ , the tree

Tr(c) = ·c

i.e., a single node labelled by c;

(ii) for each variable symbol x ∈ X , the tree

Tr(x ) = ·x


i.e., a single node labelled by x ; and

(iii) for each function symbol f : sn → s in Σ and any terms

t1 , . . . , tn ∈ T (Σ , X),

the tree

Tr(f (t1 , . . . , tn))

consists of a node labelled by f , and n edges, to which the sub-trees

Tr(t1 ), . . . ,Tr(tn)

are attached by their roots. This is shown in Figure 8.5.

[Tree with root f and subtrees Tr(t1 ), . . . ,Tr(tn) omitted.]

Figure 8.5: The tree Tr(f (t1 , . . . , tn)).
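As a sketch of how the map Tr might be realised in a program (our own illustration; the tuple representation of terms and the function name are assumptions, not the book's), a term can be rendered as an indented tree, one node per line:

```python
# A term is a string (a constant or variable symbol) or a tuple
# ("f", t1, ..., tn) applying an operation symbol "f" to subterms.

def tree_lines(t, indent=""):
    # A constant or variable becomes a single leaf node.
    if isinstance(t, str):
        return [indent + t]
    # An application f(t1, ..., tn) becomes a node labelled f
    # with the subtrees of t1, ..., tn attached below it.
    f, *args = t
    lines = [indent + f]
    for ti in args:
        lines += tree_lines(ti, indent + "  ")
    return lines

t = ("add", "zero", ("succ", "zero"))  # add(zero, succ(zero))
print("\n".join(tree_lines(t)))
# add
#   zero
#   succ
#     zero
```

The definition follows the structural induction above exactly: one clause for leaves, one clause attaching the subtrees of the arguments under a root labelled by the operation symbol.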

We now consider the many-sorted case.

Let Σ be an S -sorted signature and X an S -sorted set of variable symbols. Let Tree(Σ ,X )

be a set of trees.

Definition (Many-Sorted Trees for Terms) We map the terms of the term algebra into trees using the S -sorted family

Tr = 〈Tr s : T (Σ ,X )s → Tree(Σ ,X )s | s ∈ S 〉

of maps which are defined by structural induction simultaneously on terms of all sorts s as follows:

(i) for each constant symbol c :→ s in Σ , the tree

Tr s(c) = ·c

i.e., a single node labelled by c;

(ii) for each variable symbol x ∈ Xs , the tree

Tr s(x ) = ·x

i.e., a single node labelled by x ; and

(iii) for each function symbol f : s(1 )× · · · × s(n)→ s , and any terms

t1 ∈ T (Σ ,X )s(1 ), . . . , tn ∈ T (Σ ,X )s(n),

the tree

Tr s(f (t1 , . . . , tn))

consists of a node labelled by f , and n edges, to which the sub-trees

Tr s(1 )(t1 ), . . . ,Tr s(n)(tn)

are attached by their roots. This is shown in Figure 8.6.


[Tree with root f and subtrees Tr s(1 )(t1 ), . . . ,Tr s(n)(tn) omitted.]

Figure 8.6: The many-sorted tree Tr s(f (t1 , . . . , tn)).

8.4 Term Evaluation

Terms define calculations by specifying the order in which a number of operations are applied to data that are either constants or held in variables. We will now describe how these calculations are performed, i.e.,

how terms are evaluated in a set of data.

We again need to use structural induction to define term evaluation.

Terms are syntactic objects. Term evaluation gives terms their semantics. To evaluate a term t we choose a set A of data, and choose an operation f A on A to interpret the operation name f in t . We also need to choose what data cA in A interpret the constant symbol c in t and to choose the values of all the variables in t . This is formalised as follows.

Thus, the semantics of terms is given by a set A and a map

v : T (Σ ,X )→ A,

where v(t) is the semantics or value of term t . To calculate v we must

(i) interpret the constant symbols by elements of A;

(ii) interpret the operation symbols by functions on A; and

(iii) assign elements of A to all the variables using a map v : X → A, which we call an assignment of the variables.

First, we describe the simplest case where Σ is a single-sorted signature.

Let c be a constant symbol. We interpret c by an element cA ∈ A. Let f be an n-ary operation symbol. We interpret f by an operation f A : An → A. Clearly we have an algebra:

algebra A

carriers A

constants . . . , cA : → A, . . .

operations . . . , f A : An → A, . . .

With respect to these fixed interpretations, term evaluation (in a call-by-value style) is given by the following.


Definition (Single-Sorted Term Evaluation) Given an assignment

v : X → A

of an element v(x ) ∈ A to each variable x ∈ X , we define the term evaluation map

v : T (Σ ,X )→ A

derived from v by induction on the structure of terms: for all constants c ∈ Σ , variables x ∈ X , operations f ∈ Σ and Σ -terms t1 , . . . , tn , we have

v(c) = cA

v(x ) = v(x )

v(f (t1 , . . . , tn)) = f A(v(t1 ), . . . , v(tn)).

Term evaluation may be compared with call-by-value in imperative programming, where v represents a state and v is expression evaluation.
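A minimal sketch of single-sorted term evaluation in Python, interpreting a small signature in the algebra of natural numbers (the tuple representation of terms and the dictionaries constants and operations are our own assumptions, not the book's):

```python
# Interpret the signature in an algebra A: here the naturals with
# zero, succ and add as the interpreting constants and operations.
constants = {"zero": 0}
operations = {"succ": lambda n: n + 1,
              "add": lambda m, n: m + n}

def evaluate(t, v):
    """Term evaluation: extend the assignment v : X -> A to all terms."""
    if isinstance(t, str):
        # A symbol is either an interpreted constant, or a variable
        # whose value the assignment v supplies.
        return constants[t] if t in constants else v[t]
    f, *args = t
    # Call-by-value: evaluate the subterms first, then apply the
    # operation interpreting the outermost symbol.
    return operations[f](*(evaluate(ti, v) for ti in args))

t = ("add", "x", ("succ", "zero"))  # add(x, succ(zero))
print(evaluate(t, {"x": 3}))        # 4
```

The three branches correspond exactly to the three defining equations for v above: constants, variables, and applications.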

We now extend this definition to a many-sorted signature Σ . Let S be a non-empty set and let Σ be an S -sorted signature.

Let c :→ s be a constant symbol in Σ . We interpret c by an element cA ∈ As . Let f : s(1 ) × · · · × s(n) → s be an operation symbol of Σ . We interpret f by an operation f A : As(1 ) × · · · × As(n) → As . Clearly we have a Σ -algebra:

algebra A

carriers . . . ,As , . . .

constants . . . , cA : → As , . . .

operations . . . , f A : Aw → As , . . .

Definition (Many-Sorted Term Evaluation) Given an S -sorted family

v = 〈vs : Xs → As | s ∈ S 〉

of assignments of elements vs(x ) ∈ As to variables x ∈ Xs , we define the family

v = 〈v s : T (Σ ,X )s → As | s ∈ S 〉

of term evaluation maps derived from v by induction on the structure of terms: for every sort s , s(1 ), . . . , s(n) ∈ S , constant c :→ s ∈ Σ , variable x ∈ Xs , operation f : s(1 )× · · · × s(n)→ s ∈ Σ and terms t1 ∈ T (Σ ,X )s(1 ), . . . , tn ∈ T (Σ ,X )s(n), we have

v s(c) = cA

v s(x ) = vs(x )

v s(f (t1 , . . . , tn)) = f A(v s(1 )(t1 ), . . . , v s(n)(tn)).

The construction of v from v is an important idea. For the moment let us refer to it as an

extension property,

since the map v : X → A is extended from variables to terms by v : T (Σ ,X ) → A. Later we will see that it generalises the Principle of Induction to a general algebraic setting through the concepts of

freeness and initiality.


Semantics of Terms

Having defined the evaluation v(t) of a term t given an assignment v : X → A, we will define the semantics of a term over all assignments on A. We must uniformise the construction of v(t) from t and v .

Let [X → A] be the set of all assignments to variables in X of data in A. There are two equivalent definitions.

1. The map

TE : [X → A]× T (Σ ,X )→ A

is defined by

TE (v , t) = value of term t with variables having values given in v : X → A

= v(t).

2. The map

J KA : T (Σ ,X )→ ([X → A]→ A)

is defined for t ∈ T (Σ ,X ) by

JtKA : [X → A]→ A

where

JtKA(v) = value of term t with variables having values given in v : X → A,

= v(t).

We also write JtKA(v) as JtKA,v .

Example (Term Substitution) Another operation on T (Σ ,X ) worth noting is that of term substitution.

Let t be a term, x = x1 , . . . , xn be a sequence of variable symbols and t = t1 , . . . , tn be a sequence of terms; we wish to define the term obtained by substituting the term ti for the variable symbol xi , for each i = 1 , . . . , n, throughout t . It is customarily denoted

t(x/t) or t(x1/t1 , . . . , xn/tn).

This is trivially done by the extension property.

Given x = x1 , . . . , xn and t = t1 , . . . , tn we define an assignment

v = v(x , t) : X → T (Σ ,X )

by

v(x ) =

    x    if x 6∈ {x1 , . . . , xn};
    ti   if x = xi .

Then by the method of extending v to v we obtain the term evaluation map

v : T (Σ ,X )→ T (Σ ,X )


derived from v which carries out the required substitution of ti for xi , for i = 1 , . . . , n, for all terms. Thus,

v(t) = t(x1/t1 , . . . , xn/tn).

This can be refined further by a new map

subn : T (Σ ,X )× X n × T (Σ ,X )→ T (Σ ,X )

which substitutes any n-tuple t = t1 , . . . , tn of terms for any n-tuple x = x1 , . . . , xn of variables into a term t . This is defined as follows: given x = x1 , . . . , xn and t = t1 , . . . , tn we define the assignment v(x , t) : X → T (Σ ,X ) and uniformising v we define

subn(t , x , t) = v(t).
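The substitution map can be sketched as term evaluation in the term algebra itself, exactly as the extension property suggests (a Python illustration of our own, with terms as nested tuples):

```python
# A term is a string (a constant or variable symbol) or a tuple
# ("f", t1, ..., tn) applying an operation symbol "f" to subterms.

def substitute(t, binding):
    """Substitute binding[x] for each variable x throughout term t.

    This is evaluation in the term algebra: the assignment sends each
    variable to a term, and each operation symbol rebuilds a term."""
    if isinstance(t, str):
        # A variable covered by the binding is replaced by its term;
        # any other symbol is left unchanged.
        return binding.get(t, t)
    f, *args = t
    return (f, *(substitute(ti, binding) for ti in args))

t = ("add", "x", ("succ", "x"))                 # add(x, succ(x))
print(substitute(t, {"x": ("succ", "zero")}))
# ('add', ('succ', 'zero'), ('succ', ('succ', 'zero')))
```

Here the target algebra A is T (Σ ,X ) itself; nothing new has to be proved, since substitution is just the extension of the assignment v(x , t) to all terms.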

Example (Change of Variables) The extension property can also be used to define a change in variable names. Let X and Y be sets of variables of the same cardinality. Suppose X = {xi | i ∈ I } and Y = {yi | i ∈ I }. We can define the effect on terms of the transformation of variables from X to Y as follows.

Consider the term algebras T (Σ ,X ) and T (Σ ,Y ). On choosing a variable transformation as an assignment v : X → Y which is a bijection, say v(xi) = yi , we obtain, by the extension property, a map v : T (Σ ,X )→ T (Σ ,Y ) that transforms all the terms. It is possible to prove that v is a bijection.

8.5 Equations

Armed with our knowledge of terms we can begin to analyse the idea of an equation in great generality.

8.5.1 What is an equation?

An equation is a pair of algebraic formulae separated by an equality sign =. Equations arose after Robert Recorde (c1510–1558) invented the equality sign in his book The Whetstone of Witte (1557). The equality symbol must rank among the most widely used of all mathematical symbols.

Typical equations from basic mathematics involving real numbers are:


circumference       C = 2πr
area                A = πr²
Golden Ratio        a/b = (a + b)/a, g = 1 + 1/g
straight line       y = mx + c
quadratic           ax² + bx + c = 0
circle              x² + y² = r²
parabola            y² = 4ax
ellipse             x²/a² + y²/b² = 1
hyperbola           x²/a² − y²/b² = 1
conics              ax² + 2hxy + by² + 2gx + 2fy + c = 0
squares' formula    sin²(x ) + cos²(x ) = 1
addition formulae   sin(x + y) = sin(x ). cos(y) + cos(x ). sin(y)
                    cos(x + y) = cos(x ). cos(y) − sin(x ). sin(y)
product formulae    sin(2x ) = 2 . sin(x ). cos(x )
                    cos(2x ) = cos²(x ) − sin²(x )

To these may be added thousands more equations, derived over millennia, in geometry, analysis, calculus, number theory, combinatorics, logic and, of course, algebra.

In Physics, among thousands of equations, perhaps the most well-known are the formulae:

Newton's Law         F = ma
Einstein's Equation  E = mc²

To which may be added mathematically elementary equations such as the Inverse Square Law, Boyle's Law, Snell's Law, Ohm's Law, Kinetic and Potential Energy Equations, . . . and mathematically advanced equations such as the Wave Equation, the Heat Equation, Maxwell's Equations, Boltzmann's Equations, Schrödinger's Equation, . . . .

In Engineering, whole fields rely on equations such as the Navier-Stokes Equation for fluid flow.

Closer to Computer Science, in communications, we have:

Shannon's formulae  I = −p. log2 (p)
                    C = W . log2 (1 + S/N ).

Other subjects, such as Chemistry, Biology and Economics, have begun to accumulate large numbers of equations.

The equations above sometimes define quantities and operations; for example the formulae

for potential and kinetic energy define these physical concepts. Or sometimes they specify precise relationships between quantities and operations; for example, the formulae for the conics hold just for the points in 2 dimensions that lie on these types of curves. Or sometimes they express laws true of operations and all data of a certain kind; for example, the addition and product formulae for sin and cos are valid for all real numbers. Thus, depending upon the situation, the equations either have

(i) one solution, or

(ii) more than one solution, or

(iii) are valid for all data.


Much of our knowledge of the world is expressed in equations.

Of course, the question arises

Where are the equations in computer science?

The short answer is: Everywhere!

Equations are syntactic objects. They are used to define data and operations or to specify laws that data and operations should satisfy. To see this, simply consider the ubiquitous ideas of

recursion and specifications.

Recursion

Throughout computer science, we write recursive definitions in which a function, procedure, method etc. is defined “in terms of itself”. Such definitions are fundamentally equations. For example, the function that returns the greatest common divisor of two natural numbers is often given as a recursive definition as follows:

gcd(x , 0 ) = x

gcd(x , y) = gcd(y , (x mod y)) if y > 0 .

What we have written is nothing other than two equations for which the function

gcd : N× N→ N

is the solution! In fact, the equations for gcd are a little complicated because they involve

(i) variables for both natural numbers (x and y) and for a function (gcd); and

(ii) conditions on when an equation is to be used (y > 0 ).
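Read as computation rules, the two equations for gcd translate directly into a recursive program (a Python sketch of the standard Euclidean algorithm):

```python
def gcd(x, y):
    # The two defining equations, used as rewrite rules:
    #   gcd(x, 0) = x
    #   gcd(x, y) = gcd(y, x mod y)   if y > 0
    return x if y == 0 else gcd(y, x % y)

print(gcd(48, 18))  # 6
```

The program is nothing but the pair of equations oriented left to right; the condition y > 0 selects which equation applies.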

Specification

Throughout computer science we write specifications in which appear laws that operations on data must satisfy. In Part I, on Data, we have seen several examples of specifications that used equations — recall the laws of commutative rings, fields, groups and Boolean algebra in Chapter 5. Indeed, we gave a concrete syntax for equational specifications in Section 11.4.3.

In the integers, we expect that addition

+ : int × int → int

is associative and commutative, and so satisfies the equations

(x + y) + z = x + (y + z )

x + y = y + x

for all numbers x , y , and z .

In a storage medium, we expect that the operation

in : data × address × store → store


that puts data in store and the operation

out : address × store → data

that retrieves data from store satisfy the equation

out(a, in(d , a, s)) = d

for all data d , addresses a, and stores s . In the particular case of the array, the equation might read

read(update(a, i , x ), i) = x

and, in that of the stack, the equation might read

pop(push(x , s)) = x .
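A minimal model of the storage specification, with the store as a Python dictionary (our own illustration; the names store_in and store_out are ours, since `in` is a Python keyword), lets us check the law out(a, in(d , a, s)) = d:

```python
# A minimal model of the store: a dictionary from addresses to data.

def store_in(d, a, s):
    # in : data x address x store -> store  (a functional update,
    # returning a new store rather than mutating s)
    return {**s, a: d}

def store_out(a, s):
    # out : address x store -> data
    return s[a]

# The law out(a, in(d, a, s)) = d holds in this model for all d, a, s:
s = {0: "old"}
assert store_out(0, store_in("new", 0, s)) == "new"
print("law holds")
```

Any implementation of stores, arrays or stacks satisfying the corresponding equation is an acceptable model of the specification; the dictionary is just one such model.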

More demanding problems in computer science lead to more complicated equations. In programming we used equations such as

repeat S until b = S ; while not b do S od

to express laws about semantics. We will see many equations in our study of semantics in Part III: equations to define the behaviour of whole languages and equations that state a compiler is correct.

8.5.2 Equations, Satisfiability and Validity

Now we will define the general idea of an equation and explain what it means for an equation to be true.

Definition (Equations) Let Σ be a signature. An equation e over Σ , or simply a Σ -equation, is a pair e ≡ (t , t ′) of Σ -terms where t , t ′ are of the same sort s , which we always write

t = t ′.

Let Eqn(Σ ,X ) denote the set of all Σ -equations with variables from X .

The set var(e) of variables of an equation e ≡ t = t ′ is the set of variables contained in its terms, i.e.,

var(e) = var(t) ∪ var(t ′).

It should be clear that all the examples we gave in the previous section are equations in the sense of the definition above. To see this, in each case, we must define a signature Σ over which the left and right hand sides of the equality sign are Σ -terms.

Example Consider the two-sorted signature ΣReals for real numbers in Section 8.1.3. A simple example is the equation

x² + x + 1 = 0 .

The left hand side is the term x² + x + 1 and the right hand side is the constant 0 which is also a term. Recall that this is a simplification since we have dropped brackets. Indeed, according to the rules of term formation, x² + x + 1 is either

((x .x ) + (x + 1)) or ((x .x ) + x ) + 1.


Clearly var(e) = {x}.

Since ΣReals contains sin and cos (and again dropping brackets) the following is also a ΣReals equation:

sin(x + y) = sin(x ). cos(y) + cos(x ). sin(y)

The mixture of infix and postfix is familiar to users of trigonometry. Clearly, var(e) = {x , y}.

We noted earlier that some equations such as

x² + x + 1 = 0

have one or more solutions, while other equations such as

sin(x + y) = sin(x ). cos(y) + cos(x ). sin(y)

are true for all data. What does it mean to have a solution of an equation or for an equation to be valid?

Definition (Satisfaction of Equations) Let Σ be a signature. Let

t = t ′

be a Σ -equation with variables from X . We say that the equation is satisfied by, or is valid at, an assignment v : X → A if

[[t ]]A(v) = [[t ′]]A(v)

The assignment v to variables can be called a solution to the equation. If the equation has a solution then it may be said to be satisfiable.

These equations are used to specify and calculate particular data. Let

Soln(e) = {v | e is satisfied by v}.

If Soln(e) contains one and only one element then we say that e has a unique solution.

The idea of a solution can be expressed in a more familiar way by replacing the valuation v : X → A with a list a1 , . . . , ak of data from A. Each term and, hence, each equation contains only finitely many variables. When working with particular terms and equations it is helpful to use this fact to simplify notation as follows.

Let X = {x1 , . . . , xk}. Then an assignment v : X → A is a finite association of the form

v(x1 ) = a1 , . . . , v(xk) = ak .

Thus, we can replace the assignment v by its representation as a vector

(a1 , . . . , ak)

and say that the vector (a1 , . . . , ak ) is a solution of the equation if

[[t ]]A(a1 , . . . , ak ) = [[t ′]]A(a1 , . . . , ak).

Example


1. The Golden Ratio is the equation e,

a/b = (a + b)/a

with variables var(e) = {a, b}. Setting g = a/b, we obtain the equation

g = 1 + 1/g

and hence the quadratic equation e ′

g² − g − 1 = 0 ,

with variables var(e ′) = {g}. This is satisfied by two valuations

v(g) = (1 + √5 )/2 and v(g) = (1 − √5 )/2 .

Since the Golden Ratio is positive, the required solution is

(√5 + 1 )/2 .

2. The equation x = x + 1 does not have any solution in number systems such as the natural numbers, integers, rationals and reals. The equation is not satisfiable.
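As a sketch (our own, using floating-point numbers, so equality of the two sides is checked only up to a tolerance), we can test which assignments satisfy the Golden Ratio equation, written in the form g·g = g + 1:

```python
import math

# An equation t = t' is satisfied by an assignment v when both sides
# evaluate to the same value. Here the left side is g*g and the
# right side is g + 1.
def satisfied(v, tol=1e-9):
    g = v["g"]
    return abs(g * g - (g + 1)) < tol  # compare the two sides numerically

phi = (1 + math.sqrt(5)) / 2
psi = (1 - math.sqrt(5)) / 2
print(satisfied({"g": phi}), satisfied({"g": psi}))  # True True
print(satisfied({"g": 2.0}))                          # False
```

Both roots of g² − g − 1 = 0 satisfy the equation, while an arbitrary assignment such as g = 2 does not; this is exactly the distinction between satisfiability and validity drawn below.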

Definition (Validity of Equations) Let Σ be a signature. Let

t = t ′

be a Σ -equation with variables from X . We say that the equation is universally valid, or simply valid, if for all assignments v : X → A

[[t ]]A(v) = [[t ′]]A(v).

These equations are used to define axioms or laws that all data in A must satisfy. Once again we will often restrict attention to finitely many variables to simplify notation.

8.5.3 Equations are Preserved by Homomorphisms

A Σ -homomorphism is, by definition, a mapping between two Σ -algebras that preserves the values of the operations named in Σ . Here we will prove that a Σ -homomorphism also preserves the

(i) values of Σ -terms, and

(ii) satisfiability of Σ -equations.

The word “preserve” is suggestive but vague. A Σ -homomorphism between two Σ -algebras A and B is a map φ : A→ B which satisfies the following equations:

(i) for each constant c ∈ Σ ,

φ(cA) = cB

and


(ii) for each operation f ∈ Σ ,

φ(f A(a1 , . . . , an)) = f B(φ(a1 ), . . . , φ(an)).

This property extends to all Σ -terms as we see in the next theorem.

Theorem Let Σ be a signature and let A and B be any Σ -algebras. Suppose φ : A → B is a Σ -homomorphism. Then, for any Σ -term t ∈ T (Σ ,X ), we have

φ(JtKA(a1 , . . . , an)) = JtKB(φ(a1 ), . . . , φ(an)).

Proof This is proved by induction on terms.

Basis

(i) Constants t ≡ c. Then

φ(JcKA) = φ(cA) by definition of JcKA;

= cB since φ is a homomorphism;

= JcKB by definition of JcKB .

(ii) Variables t ≡ xi . Then

φ(JxiKA(a1 , . . . , an)) = φ(ai) by definition of JxiKA;

= JxiKB(φ(a1 ), . . . , φ(an)) by definition of JxiKB .

Induction Step t ≡ f (t1 , . . . , tk).

As Induction Hypothesis, suppose the equation is true for each sub-term t1 , . . . , tk , i.e.,

φ(JtiKA(a1 , . . . , an)) = JtiKB(φ(a1 ), . . . , φ(an))

for 1 ≤ i ≤ k .

We show the equation is true for t . To shorten notation, let

a = (a1 , . . . , an) and φ(a) = (φ(a1 ), . . . , φ(an)).

Then,

φ(JtKA(a)) = φ(Jf (t1 , . . . , tk)KA(a)) by definition of t ;

= φ(f A(Jt1 KA(a), . . . , JtkKA(a))) by definition of J KA ;

= f B(φ(Jt1 KA(a)), . . . , φ(JtkKA(a))) since φ is a homomorphism;

= f B(Jt1 KB(φ(a)), . . . , JtkKB(φ(a))) by induction hypothesis;

= Jf (t1 , . . . , tk)KB(φ(a)) by definition of J KB ;

= JtKB(φ(a)).

Hence the equation is true for t if it is true for all sub-terms. By the Principle of Structural Induction on terms, the equation is true for all terms. 2
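The theorem can be checked on a small concrete case. The following sketch is our own illustration, not the book's notation: terms are nested tuples, algebras are dictionaries of Python callables, A is the naturals with zero, successor and addition, B is the same signature interpreted modulo 5, and φ(n) = n mod 5; all of the names (`evaluate`, `A`, `B`, `phi`) are assumptions of the sketch.

```python
# Hedged sketch: terms as nested tuples, algebras as dicts of callables,
# and a check of phi([[t]]_A(a)) = [[t]]_B(phi(a)) on one concrete term.

def evaluate(t, algebra, assignment):
    """Evaluate term t in the given algebra under an assignment to variables."""
    if isinstance(t, str):                # a variable
        return assignment[t]
    op, *args = t                         # a term f(t1, ..., tn)
    return algebra[op](*(evaluate(s, algebra, assignment) for s in args))

# A: the naturals with zero, successor and addition.
A = {"zero": lambda: 0, "succ": lambda n: n + 1, "add": lambda n, m: n + m}
# B: the same signature interpreted modulo 5.
B = {"zero": lambda: 0, "succ": lambda n: (n + 1) % 5, "add": lambda n, m: (n + m) % 5}

phi = lambda n: n % 5                     # a Sigma-homomorphism from A to B

t = ("add", ("succ", "x"), ("succ", ("succ", "y")))    # add(succ(x), succ(succ(y)))
a = {"x": 7, "y": 11}

lhs = phi(evaluate(t, A, a))                                  # phi([[t]]_A(a))
rhs = evaluate(t, B, {x: phi(n) for x, n in a.items()})       # [[t]]_B(phi(a))
assert lhs == rhs == 1
```

Evaluating first in A and then applying φ gives the same result as applying φ to the arguments and evaluating in B, exactly as the theorem asserts.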


Since equations are made from terms, the theorem allows us to derive the next result.

Theorem Let Σ be a signature and let A and B be any Σ-algebras. Suppose φ : A → B is a Σ-homomorphism. Let im(φ) be the image of φ in B. Then, for any equation

t(X ) = t ′(X )

where X = X1, . . . ,Xn are variables, we have:

(i) t(X) = t′(X) is satisfiable in A implies that t(X) = t′(X) is satisfiable in im(φ); and

(ii) t(X) = t′(X) is valid in A implies that t(X) = t′(X) is valid in im(φ).

In particular, if φ is surjective then

(iii) t(X) = t′(X) is valid in A implies that t(X) = t′(X) is valid in B.

Proof Consider (i). Suppose t(X) = t′(X) is satisfied in A by a = a1, . . . , an, i.e.,

JtKA(a1 , . . . , an) = Jt ′KA(a1 , . . . , an).

Applying φ : A→ B to both sides, we have

φ(JtKA(a1 , . . . , an)) = φ(Jt ′KA(a1 , . . . , an)).

Applying the theorem to both sides, we have

JtKB(φ(a1 ), . . . , φ(an)) = Jt ′KB(φ(a1 ), . . . , φ(an)).

This means that t(X) = t′(X) is satisfied in im(φ). Cases (ii) and (iii) are easy to deduce from case (i). 2
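As a concrete check of case (i), here is a small sketch (our own, with assumed names): the equation x.x = x + x is satisfied in the integers by the witness x = 2, and pushing that witness through φ(n) = n mod 5 yields a satisfying assignment in the image algebra.

```python
# Hedged sketch of case (i): a witness for an equation in A maps under phi
# to a witness in the image algebra im(phi).

def evaluate(t, algebra, assignment):
    """Evaluate a term (nested tuple, variables as strings) under an assignment."""
    if isinstance(t, str):
        return assignment[t]
    op, *args = t
    return algebra[op](*(evaluate(s, algebra, assignment) for s in args))

A = {"add": lambda n, m: n + m, "mult": lambda n, m: n * m}              # the integers
B = {"add": lambda n, m: (n + m) % 5, "mult": lambda n, m: (n * m) % 5}  # arithmetic mod 5

phi = lambda n: n % 5                               # a homomorphism A -> B

lhs, rhs = ("mult", "x", "x"), ("add", "x", "x")    # the equation x.x = x + x
a = {"x": 2}                                        # a witness in A: 4 = 4
assert evaluate(lhs, A, a) == evaluate(rhs, A, a) == 4

b = {x: phi(n) for x, n in a.items()}               # the pushed-forward witness
assert evaluate(lhs, B, b) == evaluate(rhs, B, b) == 4
```

Note the equation is satisfiable but not valid in A (x = 1 fails), which is why the theorem speaks of satisfiability in case (i) and reserves validity for cases (ii) and (iii).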

8.6 Term Algebras

We have introduced the general idea of a term and shown how to use structural induction

(i) to parse terms using trees, and

(ii) to evaluate terms, and hence compute with them, on any algebra of data.

We have also discussed how the general idea of a term is the basis of the general idea of an equation.

Terms and equations are the very foundation of algebra: they are both tools and objects of study. In this and the next section, we will establish a few of the basic algebraic properties of terms using the algebraic ideas we have explained in Part I. Now, to students of the algebraic theory of data in Part I, this step ought to seem a natural thing to do. Why? Because syntax such as strings, terms and, indeed, programs, form

data types of syntax

which are modelled by


algebras of syntax.

The algebraic concepts in Part I we will use are

algebra and homomorphism

and, for one theorem,

factor algebra.

Our first task is to make T(Σ,X) into a Σ-algebra.

If terms form a data type, what are the basic operations on terms? There are many, but we will start with the simplest. The process of building terms is algebraic. For example, for each operation symbol f, given terms t1, . . . , tn we construct the new term f(t1, . . . , tn). Actually, we are using a term constructor operation F that creates the new term from the given terms, i.e.,

t1 , . . . , tn 7→ f (t1 , . . . , tn)

or, simply,

F(t1, . . . , tn) = f(t1, . . . , tn).

This observation leads us to create an

algebra of terms

by adding term constructor operations to the set T(Σ,X) of terms. The idea is that we can build all the Σ-terms by applying the term constructor operations of the algebra.

We will go through the construction twice, for both single- and many-sorted terms. In each case, for any signature Σ, we turn the set T(Σ,X) of terms into a Σ-algebra.

Definition (Single-Sorted Term Algebras) Let Σ be a single-sorted signature with sort s. The algebra T(Σ,X) is defined to have:

(i) the carrier set T (Σ ,X );

(ii) for each constant symbol c :→ s in the signature Σ , there is a constant term

cT (Σ ,X ) :→ T (Σ ,X )

which is defined by

cT(Σ,X) = c;

(iii) and for each n-ary function symbol f : sn → s in the signature Σ and any terms t1, . . . , tn ∈ T(Σ,X), there is a term constructor function

f T (Σ ,X ) : (T (Σ ,X ))n → T (Σ ,X )

defined by

f T(Σ,X)(t1, . . . , tn) = f(t1, . . . , tn).

In the usual way, we may display the algebra as follows:


algebra Σ terms

carriers T (Σ ,X )

constants cT(Σ,X) : → T(Σ,X) (one for each constant symbol c of Σ)

operations f T(Σ,X) : T(Σ,X)n → T(Σ,X) (one for each operation symbol f of Σ)

Example Consider again the signature ΣNaturals1 of the natural numbers example of Section 8.1.2 and its ΣNaturals1-algebra of terms:

algebra ΣNaturals1 terms

carriers T (ΣNaturals1,X )

constants 0T (ΣNaturals1,X ) : → T (ΣNaturals1,X )

operations succT (ΣNaturals1,X ) : T (ΣNaturals1,X )→ T (ΣNaturals1,X )

addT (ΣNaturals1,X ) : T (ΣNaturals1,X )× T (ΣNaturals1,X )→ T (ΣNaturals1,X )

multT (ΣNaturals1,X ) : T (ΣNaturals1,X )× T (ΣNaturals1,X )→ T (ΣNaturals1,X )

Here, given any terms t, t1, t2 ∈ T(ΣNaturals1,X), for example we can take:

0T (ΣNaturals1 ,X ) = 0

succT (ΣNaturals1 ,X )(t) = succ(t)

addT (ΣNaturals1 ,X )(t1 , t2 ) = add(t1 , t2 )

multT (ΣNaturals1 ,X )(t1 , t2 ) = mult(t1 , t2 )

Thus,

addT(ΣNaturals1,X)(mult(x, succ(y)), succ(succ(y))) = add(mult(x, succ(y)), succ(succ(y))).
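The point that the term constructors compute nothing, but only build larger pieces of syntax, can be sketched as follows (our own illustration; terms are modelled as nested tuples and the function names are assumptions of the sketch):

```python
# Hedged sketch: in T(Sigma, X) the constructor operations do no arithmetic;
# each one simply builds a bigger piece of syntax (a nested tuple here).

zero_T = ("zero",)

def succ_T(t):
    return ("succ", t)

def add_T(t1, t2):
    return ("add", t1, t2)

def mult_T(t1, t2):
    return ("mult", t1, t2)

# add^T(mult(x, succ(y)), succ(succ(y))) is just the term it names:
t = add_T(mult_T("x", succ_T("y")), succ_T(succ_T("y")))
assert t == ("add", ("mult", "x", ("succ", "y")), ("succ", ("succ", "y")))
```

Nothing is evaluated: applying `add_T` to two terms yields the syntax tree of the larger term, which is exactly the equation f T(Σ,X)(t1, . . . , tn) = f(t1, . . . , tn).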

Now we extend our single-sorted definition to many sorts.

Definition (Many-Sorted Term Algebras) Let Σ be a many-sorted signature with sort set S. The algebra T(Σ,X) is defined to have:

(i) for each sort s ∈ S in the signature Σ , the carrier set T (Σ ,X )s of all s-sorted terms;

(ii) for each constant symbol c :→ s in the signature Σ , there is a constant term

cT (Σ ,X ) :→ T (Σ ,X )s


which is defined by

cT(Σ,X) = c;

(iii) and for each function symbol f : s(1) × · · · × s(n) → s in the signature Σ there is a term constructor function

f T (Σ ,X ) : (T (Σ ,X ))s(1 ) × · · · × (T (Σ ,X ))s(n) → T (Σ ,X )s

which given any terms

t1 ∈ T (Σ ,X )s(1 ), . . . , tn ∈ T (Σ ,X )s(n),

is defined by

f T(Σ,X)(t1, . . . , tn) = f(t1, . . . , tn).

We may display the algebra as follows:

algebra Σ terms

carriers . . . ,T (Σ ,X )s , . . .

constants cT(Σ,X) : → T(Σ,X)s (one for each constant symbol c : → s of Σ)

operations f T(Σ,X) : T(Σ,X)s(1) × · · · × T(Σ,X)s(n) → T(Σ,X)s (one for each operation symbol f of Σ)

Example Consider again the signature ΣNat+Tests of Section 8.1.3 and its ΣNat+Tests-algebra of terms:


algebra ΣNat+Tests terms

carriers T (ΣNat+Tests ,X )nat

T (ΣNat+Tests ,X )bool

constants zeroT (ΣNat+Tests ,X ) : → T (ΣNat+Tests ,X )nat

trueT(ΣNat+Tests,X) : → T(ΣNat+Tests,X)bool
falseT(ΣNat+Tests,X) : → T(ΣNat+Tests,X)bool

operations succT (ΣNat+Tests ,X ) : T (ΣNat+Tests ,X )nat → T (ΣNat+Tests ,X )nat

predT (ΣNat+Tests ,X ) : T (ΣNat+Tests ,X )nat → T (ΣNat+Tests ,X )nat

addT (ΣNat+Tests ,X ) : (T (ΣNat+Tests ,X )nat)2 → T (ΣNat+Tests ,X )nat

multT (ΣNat+Tests ,X ) : (T (ΣNat+Tests ,X )nat)2 → T (ΣNat+Tests ,X )nat

equalsT (ΣNat+Tests ,X ) : (T (ΣNat+Tests ,X )nat)2 → T (ΣNat+Tests ,X )bool

less thanT (ΣNat+Tests ,X ) : (T (ΣNat+Tests ,X )nat)2 → T (ΣNat+Tests ,X )bool

notT (ΣNat+Tests ,X ) : T (ΣNat+Tests ,X )bool → T (ΣNat+Tests ,X )bool

andT (ΣNat+Tests ,X ) : (T (ΣNat+Tests ,X )bool)2 → T (ΣNat+Tests ,X )bool

orT (ΣNat+Tests ,X ) : (T (ΣNat+Tests ,X )bool)2 → T (ΣNat+Tests ,X )bool

8.7 Homomorphisms and Terms

8.7.1 Structural Induction, Term Evaluation and Homomorphisms

The data type of Σ-terms is modelled by the term algebra T(Σ,X). We now reconsider structural induction on terms. Speaking roughly, we will show:

Theorem Let φ : T (Σ ,X )→ A be any function. The following are equivalent:

(i) φ is definable by structural induction on T (Σ ,X );

(ii) φ is a term evaluation map on T (Σ ,X ); and

(iii) φ is a homomorphism on T (Σ ,X ).

Proof First, we show (i) implies (ii). Recall the definition of structural induction in Section ??. Let T(Σ,X) be the algebra of Σ-terms. Let A be a set. We say the function

φ : T (Σ ,X )→ A

is defined by structural induction or recursion from elements

. . . , cA, . . . ,

operations

. . . , f A, . . .


on A and assignments

. . . , xA, . . . ,

if

φ(c) = cA

φ(x ) = xA

φ(f (t1 , . . . , tn)) = f A(φ(t1 ), . . . , φ(tn)).

This recursive definition is a form of term evaluation (see Section 8.4). To see this, let

v : X → A

be the assignment v(x) = xA. Then, using the equations for term evaluation, the extension

v : T(Σ,X) → A

satisfies the same equations as in the definition of φ, i.e., v = φ. Thus (i) implies (ii).

Next, to see that (ii) implies (iii), let us note that the important equations that are used in the definition of the extension v : T(Σ,X) → A of v : X → A in term evaluation can be rewritten so as to involve formally the algebraic structure of the algebra of terms T(Σ,X), as follows:

v(cT(Σ,X)) = v(c)
= cA

v(x) = xA

v(f T(Σ,X)(t1, . . . , tn)) = v(f(t1, . . . , tn))
= f A(v(t1), . . . , v(tn))

Similarly, the elements . . . , cA, . . . and functions . . . , f A, . . . make A into a Σ-algebra. We see that the map v preserves a relationship between the constants and operations on the term algebra T(Σ,X) and on the algebra A. This structure preserving mapping is a Σ-homomorphism from T(Σ,X) to A. 2

Thus, term evaluation provides another example of a homomorphism between algebras.
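A small sketch of this fact (our own illustration, with assumed names): representing terms as nested tuples, the recursively defined evaluation map satisfies the homomorphism equation, here checked for the operation symbol add.

```python
# Hedged sketch: term evaluation, defined by structural recursion, satisfies
# the homomorphism equations from the term algebra into A.

def vbar(t, v, algebra):
    """The extension of the assignment v to all terms (nested tuples)."""
    if isinstance(t, str):
        return v[t]
    op, *args = t
    return algebra[op](*(vbar(s, v, algebra) for s in args))

A = {"zero": lambda: 0, "succ": lambda n: n + 1, "add": lambda n, m: n + m}
v = {"x": 4, "y": 9}

t1, t2 = ("succ", "x"), ("add", "x", "y")
# In the term algebra, add^T(t1, t2) merely builds the term add(t1, t2):
add_T = ("add", t1, t2)
# The homomorphism equation: vbar(add^T(t1, t2)) = add^A(vbar(t1), vbar(t2)).
assert vbar(add_T, v, A) == A["add"](vbar(t1, v, A), vbar(t2, v, A)) == 18
```

Evaluating the constructed term agrees with applying the operation of A to the evaluated sub-terms, which is precisely the homomorphism property.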

Lemma (Term Evaluation is a Homomorphism) Let A be a Σ-algebra. The term evaluation function

v : T (Σ ,X )→ A

is a Σ -homomorphism.


To complete the cycle of equivalences, we must check that the homomorphism equations imply the structural induction equations, and hence that (iii) implies (i). We leave this as an easy exercise.

8.7.2 Initiality

There are, in fact, some more properties hidden in this observation — properties that turn out to have profound consequences in the theory of syntax and semantics. We reformulate the lemma very carefully.

Theorem (Initiality of Term Algebras) Let A be any Σ-algebra. Let v : X → A be any map assigning values in A to the variables in X. Then there is one, and only one, Σ-homomorphism

v : T (Σ ,X )→ A

which extends v from X to T (Σ ,X ).

Proof The existence of a Σ-homomorphism is clear from the equations for term evaluation. We have to prove that there is only one Σ-homomorphism, i.e., that v is unique.

Suppose that φ, ψ : T(Σ,X) → A are two Σ-homomorphisms that both extend the assignment map v : X → A. This means that for every variable x ∈ X,

φ(x ) = ψ(x ) = v(x ).

We prove that for every Σ -term t ∈ T (Σ ,X ),

φ(t) = ψ(t) = v(t).

We do this by structural induction on terms.

Basis

(i) Constants, t ≡ c. Here, since φ and ψ are Σ -homomorphisms, they preserve constants:

φ(c) = cA and ψ(c) = cA

and φ(t) = ψ(t).

(ii) Variables, t ≡ x . Here, since φ and ψ extend v , we have:

φ(x ) = v(x ) and ψ(x ) = v(x )

and φ(t) = ψ(t).

Induction Step In the general case, we consider any Σ-term, say t ≡ f(t1, . . . , tn) where t1, . . . , tn are Σ-sub-terms. The Induction Hypothesis is that:

φ(t1 ) = ψ(t1 ), . . . , φ(tn) = ψ(tn).


We calculate:

φ(t) = φ(f (t1 , . . . , tn))

= f A(φ(t1 ), . . . , φ(tn)) since φ is a Σ -homomorphism;

= f A(ψ(t1 ), . . . , ψ(tn)) since φ = ψ on sub-terms by Induction Hypothesis;

= ψ(f (t1 , . . . , tn)) since ψ is a Σ -homomorphism;

= ψ(t).

By the Principle of Structural Induction, we have that φ(t) = ψ(t) for all t ∈ T(Σ,X). Since any two maps φ and ψ extending v are equal,

φ = ψ = v .

2

8.7.3 Representing Algebras using Terms

Let us review more closely the role of the term evaluation homomorphism. Given an assignment

v : X → A

of data from the Σ-algebra A to variables from X, each term t ∈ T(Σ,X) can be evaluated as v(t) ∈ A. The term t defines a sequence of operations from Σ that, applied in the Σ-algebra A, constructs the element v(t) ∈ A; we say that

the term t constructs or represents the element v(t), given the assignment v.

Definition (Generators) The assignment v : X → A generates the Σ-algebra A if, for all a ∈ A, there is some t ∈ T(Σ,X) such that

v(t) = a.

Equivalently, v generates A if the term evaluation Σ -homomorphism is surjective.

Two terms t1, t2 ∈ T(Σ,X) can represent different constructions of the same element if v(t1) = v(t2) in A.

Recall that the kernel of the mapping v is defined for terms t1 and t2 by

t1 ≡v t2 if, and only if, v(t1) = v(t2)

and is an equivalence relation on terms. Let T(Σ,X)/≡v be the set of all equivalence classes of terms under ≡v.

Since v is a Σ -homomorphism, we know that if

t1 , . . . , tn represent v(t1 ), . . . , v(tn)

then

f(t1, . . . , tn) represents v(f(t1, . . . , tn))

because of the Σ -homomorphism equation for the operation symbol f ∈ Σ :

v(f (t1 , . . . , tn)) = f A(v(t1 ), . . . , v(tn)).

Since v is a homomorphism, the kernel is a Σ-congruence and T(Σ,X)/≡v is a factor Σ-algebra.


Theorem Let A be any Σ-algebra. Let X be a set of variables and v : X → A an assignment. Suppose that v generates A. Then A is Σ-isomorphic to a factor algebra T(Σ,X)/≡v of the term algebra T(Σ,X). In particular, the congruence is the kernel ≡v of the term evaluation homomorphism, i.e.,

A ∼= T (Σ ,X )/ ≡v

where for t1 , t2 ∈ T (Σ ,X ),

t1 ≡v t2 if, and only if, v(t1) = v(t2).

Proof This follows immediately from an application of the Homomorphism Theorem in Section 7.9. 2
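The kernel can be illustrated concretely (our own sketch, with assumed names): two syntactically different terms that evaluate to the same element fall into the same congruence class of ≡v.

```python
# Hedged sketch: two syntactically distinct terms lying in the same class of
# the kernel ==_v, because they evaluate to the same element.

def vbar(t, v, algebra):
    """Evaluate the term t (a nested tuple, variables as strings) in algebra under v."""
    if isinstance(t, str):
        return v[t]
    op, *args = t
    return algebra[op](*(vbar(s, v, algebra) for s in args))

A = {"zero": lambda: 0, "succ": lambda n: n + 1,
     "add": lambda n, m: n + m, "mult": lambda n, m: n * m}
v = {"x": 3}

t1 = ("add", "x", "x")                               # the term add(x, x)
t2 = ("mult", ("succ", ("succ", ("zero",))), "x")    # the term mult(succ(succ(0)), x)

# Different terms, same value: t1 and t2 are congruent modulo the kernel of v.
assert t1 != t2
assert vbar(t1, v, A) == vbar(t2, v, A) == 6
```

The factor algebra identifies exactly such pairs of terms, which is how the syntactic algebra T(Σ,X)/≡v can be isomorphic to the generated algebra A.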


Exercises for Chapter 8

1. Here is a signature ΣIntegers1 using a prefix notation for addition and subtraction of integers.

signature Integers1

sorts int

constants zero, one : → int

operations add : int × int → int
minus : int → int

endsig

Show that every integer can be specified by a closed term. Is the term unique?

2. Here is a signature ΣIntegers2 for the integers that uses the standard infix notation for addition, subtraction and multiplication:

signature Integers2

sorts int

constants 0 , 1 : → int

operations + : int × int → int
− : int → int
. : int × int → int

endsig

Are all terms equivalent to a polynomial of the form

an.x^n + an−1.x^(n−1) + · · · + a1.x + a0

where for each 0 ≤ i ≤ n, ai ∈ Z?

3. Show that given any two polynomials p(x ) and q(x ) with rational coefficients the equation

p(x ) = q(x )

is a ΣReals equation. In what circumstances is this equation equivalent to

p(x) − q(x) = 0?

4. Consider this signature ΣMachine that models a simple interface for abstract machines:


signature Machine

sorts state, input , output

constants

operations next : state × input → state
write : state × input → output

endsig

Write down terms that define the following:

(a) the state, and

(b) the output

of the machine after receiving one, two and three inputs, denoted by variables x1, x2, x3, respectively.

Show how to list all the terms of sorts state, input and output.

5. Write down terms over signatures for the following data structures:

(a) the array;

(b) the stack.

6. Define by structural induction on terms the size function

Size : T (Σ ,X )→ N

for terms.

7. Write the signature ΣNaturals of the natural numbers example of Section 8.1.2 in set format.

8. Write the signature ΣNat+Tests of Examples 8.1.3 in set format.

9. List, with reasons, which of the following are terms over the signature ΣNat+Tests of Examples 8.1.3, and which are not:

(a) succ;

(b) and(x , not(false);

(c) equals(0 , succ(0 ));

(d) pred(succ(0 ), 0 );

(e) mult(succ(0 ), pred(x (mult(0 , succ(0 )))));

(f ) add(mult(x , x ),mult(mult(succ(succ(0 )), y), x ));

(g) less than(or(false, true), succ(add(x , y)));

(h) or(and(true, b), less than(x ,mult(x , 0 ))).

10. Give a tree representation for each of the terms of Exercise 9.


11. Using the Principle of Structural Induction on single-sorted terms, prove the Uniqueness Lemma of Section 8.2.3, that functions defined by structural induction are unique. (Compare with the Uniqueness Lemma of Section 7.4.3.)

12. Let Σ be a single-sorted signature. Show that the map v : T(Σ,X) → A is a homomorphism.

13. Let Σ be an S -sorted signature. Show that the S -indexed family

v = 〈v s : T (Σ ,X )s → As | s ∈ S 〉

of maps is a homomorphism.

14. Consider the closed term algebra T(Σ) = T(Σ, ∅). What information do we need to know in this case to define the evaluation map v from the term algebra to an algebra A?

15. Consider the signature ΣRing :

signature Ring

sorts ring

constants 0 : → ring
1 : → ring

operations + : ring × ring → ring
− : ring × ring → ring
. : ring × ring → ring

Let X = {x}. What is T(Σ,X)?

Let v : X → Z be defined by v(x) = 0. Evaluate v(t) where:

(a) t = ((x + (x .x )).x ) + (1 + 1 ); and

(b) t = ((x .x + 1 ) + ((x .x ).x ) + 1 ).

16. Show that for any t ∈ T(Σ, {x}) there exists t′ ∈ T(Σ, {x}) such that

t′ ≡ an.x^n + an−1.x^(n−1) + · · · + a1.x + a0,

where x^n is the multiplication of x by itself n times, and an is the addition of 1 to itself an times.

17. Generalise the term substitution function subn to the term evaluation function

ten : T (Σ ,X )× X n × An → A

which substitutes the values a = (a1, . . . , an) ∈ An for the variables x = (x1, . . . , xn) in t ∈ T(Σ,X) and computes the value

ten(t , a, x ) = t(a/x ).

(Warning: beware the case where t contains a variable not in the list x .)


18. Given the constant symbol zero, variables x , y , z and operation symbols:

succ : nat → nat
sum : nat × nat → nat
times : nat × nat → nat
muldiv : nat × nat × nat → nat

where muldiv(x , y , z ) is the function x ∗ y/z :

(a) give four terms in the term algebra using each operation;

(b) give four terms not in the term algebra.

19. Give a valuation map for the above term algebra into an algebra

(N; 0; add one, add, mult, md),

defining the operations carefully.

20. Let Σ be a signature. A conditional equation over Σ, or simply a conditional Σ-equation, is a formula made of Σ-terms, which we write

t1 = t ′1 ∧ . . . ∧ tk = t ′k → t = t ′.

Define the concept of satisfiability and universal validity for conditional equations. By adapting the proof of the Theorem in Section 8.5.3, show that Σ-homomorphisms preserve conditional Σ-equations.

21. Use your valuation map to evaluate (carefully) the value of the following expressions, given that v(x) = 5, v(y) = 10 and v(z) = 7:

(a) succ(sum(x , z ));

(b) muldiv(succ(x ), y , add(Succ(0 ), z ));

(c) times(x , add(y , succ(succ(0 ))));

(d) add(succ(0 ), times(succ(0 ), z ))).


Chapter 9

Abstract Data Type of Real Numbers

What are numbers? Numbers are very abstract data. Number systems are data types that are used to measure quantities and to calculate new quantities from old ones. The natural numbers, integers, rational numbers, real numbers and complex numbers are a sequence

N, Z, Q, R, C

of number systems, one extending the next, that either measure more, or calculate better, or both. This is a rather utilitarian view, and seems too simple to explain the depth of our interest and knowledge of numbers in mathematics. However, it is a good starting point for thinking about the origins of number systems, and, therefore, about the origins and foundations of data.

When we gave examples of algebras of real numbers in Section 3.5, we discussed briefly the real numbers and described them as a data type that models the line, allowing us to measure any line segment. Recall how the rational number system leaves many gaps in measurements.

The rational numbers are a number system designed to model the process of measuring quantities. Operationally, in order to measure, some unit is chosen and subdivided into subunits, corresponding with the whole and fractional parts of rational numbers. It is easy to construct geometrical figures that cannot be measured exactly. For example, even in measuring line segments a problem arises: the hypotenuse of a right-angled triangle with unit sides is √2 units. Or the circumference of a circle of diameter 1 is known to be π. Neither √2 nor π is a rational number. Thus, these lengths indicate fundamental gaps in the rational number model of measurements. Specifically, we think of the real numbers as a number system designed to assign a number to measure every length exactly.

In this chapter we will investigate the data type of real numbers in the same manner that we investigated the data type of natural numbers. We will begin by considering the problem:

To design and build a data type to measure the line.

As with the natural numbers, the key ideas about building representations of the real numbers were discovered only in the nineteenth century. We will introduce the different methods of Richard Dedekind and Georg Cantor, both of which construct approximations of real numbers from rational numbers.

Dedekind’s and Cantor’s are just two among many representations of the real numbers. The question arises: Are they equivalent? More generally, we ask:

When are two constructions or representations of the real numbers equivalent?


In our mathematical theory of data, this becomes the mathematical question:

When are two algebras of real numbers isomorphic?

We answer this question in stages. First, we prove a beautiful theorem that shows there is a set of axioms that is sufficient to characterise the real numbers up to isomorphism. The axioms define algebras called

complete ordered fields

and the theorem says that

all complete ordered fields are isomorphic.

The axioms define the properties of the basic algebraic operations + and . and the order relation < on the real numbers.

Next, we will give the full construction of the data type of real numbers using the method of Cantor, who represented a real number by an infinite process of approximate measurements by rational numbers. This is an intricate construction involving infinite sequences of rational numbers. We prove that the algebra of Cantor real numbers satisfies the axioms of a complete ordered field. Hence, we can conclude that

all algebras satisfying the axioms of a complete ordered field are isomorphic to thealgebra of Cantor real numbers.

In Sections 9.1 and 9.2, we explain these ideas in some detail, before devoting ourselves to the proofs of the theorems in Sections 9.3, 9.4 and 9.5. Finally, in Section 9.6, we look at computing with the real numbers using so-called “exact” representations and the familiar, but flawed, fixed and floating point representations.

In this chapter we will barely scratch the surface of the theory of the real numbers. There are many excellent books available; for example, Burrill [1967] gives a full and excellent mathematical account of the naturals, rationals and reals, which we have largely followed in Sections 9.3 and 9.4 of this chapter. For details and related results see Further Reading.

9.1 Representations of the Real Numbers

The development of a number system that meets the requirements of measuring exactly the line or continuum has proved to be a long and complex process. Our understanding of the real numbers has progressed slowly, influenced by the needs of mathematics and scientific calculations. Indeed, we can follow its development in some detail over 2500 years of history. Establishing theoretical foundations for mathematical developments such as the calculus has involved sorting out a number of subtle, conflicting and inconsistent ideas about the reals. For example, the development of the calculus introduced infinitesimals and raised the fundamental question:

To what extent are reals finite quantities, or can they be represented by finite objects?


The development of mathematics and its applications in physics has taken place with good techniques and intentions, but poor foundations and understanding. Only in the middle of the nineteenth century did mathematics achieve the level of precision, rigour and abstractness that is necessary to be able to settle the basic questions about the reals and other numerical data types. This conceptual maturity is exactly what is needed to understand data in computing. The concept of the continuum has been fundamental for millennia and is immensely rich. Research on the foundations of the continuum continues: for example, on models for computation with the reals, and on the uses of infinitesimals (see Further Reading).

We will discuss some of the methods of constructing representations of the reals from the rationals. As we will see in the next section, these constructions, representations or implementation methods for the real numbers can be proved to be equivalent. Let us review the problem in detail.

9.1.1 The Problem

The line or continuum is not adequately represented by the data type of rational numbers: there are gaps such as √2 and √3; indeed, there are infinitely many gaps, and there are several ways of formulating and proving properties that show that

The problem is:

To create new numbers which will allow us to represent faithfully the line, and which constitute a number system that measures and calculates exactly with all points and distances.

The required number system R is called the real number system or the data type of real numbers.

The rational numbers Q form the data type we use to make measurements. To create the real numbers R we will use rational numbers that measure the line approximately. The real numbers will be constructed from the rationals, just as the rationals were constructed from the integers, and the integers from the natural numbers. The reals must have certain algebraic properties. In fact the number system R must support a great deal more, namely many centuries’ worth of concepts, methods and theorems in geometry and analysis! Let us summarise the requirements for the data type as follows:


Informal Algebraic Requirements

1. R will be an extension of the rational number system Q.

2. R will have algebraic operations, including addition, subtraction, multiplication and division for calculation.

Informal Geometric Requirements

3. R will have an ordering relation < that corresponds with the ordering of the line.

4. Every element of R can be approximated by an element of Q.

5. The number system R must represent the line completely.

Analytical Requirements

6. The number system R must be able to represent geometric objects and methods in a faithful way, through the theory of coordinate geometry based on algebraic equations of curves and surfaces.

7. The number system R must be able to represent analytic objects and methods in a faithful way, through the theory of the differential and integral calculus.

Obviously, it is an important task to build this data type R and a huge enterprise to check all these requirements, some of which, such as 6 and 7, are not precisely defined. We will merely sketch two methods of data representation, due to Richard Dedekind and Georg Cantor, and comment on why they meet Requirements 1–5. At first sight Dedekind’s method seems more abstract, but it is based directly on a geometric intuition. It was discovered in 1858 and published rather later, in 1872.

9.1.2 Method of Richard Dedekind (1858)

The idea that a line is full of points, and that there are no gaps caused by missing points, is an axiom of geometry. The analysis of Dedekind focuses on a precise geometric characterisation of this idea, which is called the continuity or completeness of the line. The axiom by which he attributed to the line its continuity is this:

Axiom (Dedekind’s Continuity Axiom) If all points of the straight line fall into two classes such that every point of the first class lies to the left of every point of the second class, then there exists one and only one point which produces this division of all points into two classes, this severing of the straight line into two portions.

The property is illustrated in Figure 9.1.

The idea is adapted to build a representation of a point using two sets of rational numbers.

Definition A cut is a pair (A1 ,A2 ) of sets of rational numbers such that

(i) A1 ∪ A2 = Q; and


Figure 9.1: Dedekind’s continuity axiom; A1 is the first class and A2 the second class.

(ii) for any a1 ∈ A1 and a2 ∈ A2 , a1 < a2 .

Let DCut be the set of all cuts.

A cut consists of two sets of rational numbers, and so models the division of the line into two classes of measurable points. Dedekind’s axiom leads us to propose that:

Each cut represents a point on the line.

In particular,

a point on the line can be modelled by the sets of all measurable points to its left and right.

Let RDedekind be the set of all points represented by DCut.

First, let us look at some simple examples of cuts.

Example (Rational Cuts) Notice that any rational number p/q can define a cut in two simple ways: a right cut

A1 = {a ∈ Q | a < p/q} and A2 = {a ∈ Q | a ≥ p/q}

where the rational p/q is the least among the second class A2, or a left cut

A1 = {a ∈ Q | a ≤ p/q} and A2 = {a ∈ Q | a > p/q}

where the rational p/q is the greatest among the first class A1. For example, consider the cuts for a commonly used rational approximation to π:

A1 = {a ∈ Q | a < 22/7} and A2 = {a ∈ Q | a ≥ 22/7};

here 22/7 is the least number in A2. Or,

A1 = {a ∈ Q | a ≤ 22/7} and A2 = {a ∈ Q | a > 22/7};

here 22/7 is the greatest number in A1. Clearly, 22/7 has two different cuts representing it.

Example (Irrational Cuts) Of course, the purpose of cuts is to define numbers that are not rational. Here are cuts that are not determined by rational points:

A1 = {a ∈ Q | a2 < 2} and A2 = {a ∈ Q | a2 ≥ 2}

or

A1 = {a ∈ Q | a2 ≤ 2} and A2 = {a ∈ Q | a2 > 2}.

There does not exist a rational number which is either a least upper bound or a greatest lower bound for these sets; those bounds involve √2, which is not an element of either A1 or A2. The idea is that these cuts represent √2.

However, the cut itself is an object that represents either the least among the upper bounds for A1 or the greatest among the lower bounds for A2.

With reference to the requirements, we have seen that the cuts are made from rationals and that the rational numbers Q can be embedded into DCut. To proceed we must define:

(i) when two cuts are equivalent representations of the same point;

(ii) how to add, subtract, multiply and divide cuts; and

(iii) how to order two cuts.

We will omit this discussion and turn instead to the second method of representing real numbers; for further material on cuts see the exercises and Further Reading.
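Before moving on, the cut representation can be sketched computationally (our own illustration, not the book's development): a cut is determined by the membership test of its first class A1. Note that this sketch also puts the negative rationals into the first class of the √2 cut, so that A1 is downward closed; all function names are assumptions.

```python
# Hedged sketch: represent a cut by the membership test of its first class A1;
# the second class A2 is then the complement of A1 in Q.

from fractions import Fraction

def rational_right_cut(r):
    """A1 = { a in Q | a < r } for a rational r (a 'right cut')."""
    return lambda a: a < r

def sqrt_cut(n):
    """A1 for the cut representing sqrt(n); negative rationals are included
    so that the first class is downward closed (an assumption of this sketch)."""
    return lambda a: a < 0 or a * a < n

in_A1 = sqrt_cut(2)
assert in_A1(Fraction(7, 5))         # (7/5)^2 = 49/25 < 2, so 7/5 is in A1
assert not in_A1(Fraction(3, 2))     # (3/2)^2 = 9/4 >= 2, so 3/2 is in A2

# Cuts can be compared by inclusion of first classes; on a grid of sample
# points, every rational in the sqrt(2) cut also lies in the 22/7 cut:
below_22_7 = rational_right_cut(Fraction(22, 7))
samples = [Fraction(k, 100) for k in range(-300, 300)]
assert all(below_22_7(a) for a in samples if in_A1(a))
```

Defining equality, the ordering and the arithmetic of cuts along these lines is exactly the programme of items (i)–(iii) above.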

9.1.3 Method of Georg Cantor (1872)

Imagine a process that can measure a point approximately, with increasing and unlimited accuracy. The process generates an infinite sequence of points that can be measured by the rational numbers

a1, a2, . . . , an, . . . , am, . . . ∈ Q.

At each stage n, it is possible to improve on an ∈ Q at a later stage m > n with a closer measurement am ∈ Q.

The idea of Cantor’s method is to use an infinite sequence of rationals to approximate a point to an arbitrary degree of accuracy. A point will be represented by an infinite sequence

a1 , a2 , . . . , an , . . .

of rational numbers that get closer and closer as the sequence grows. This means that thedifference

an − am

becomes increasingly small as n and m become increasingly large. The type of sequences Cantorused are now known as Cauchy sequences. The real numbers will be represented using thesesequences.

Definition (Cauchy Sequence) A sequence of rationals a1 , a2 , . . . , an , . . . is a Cauchy sequence of rationals if for any rational number ε > 0 , there is a natural number n0 such that for all n,m > n0 , we have

|an − am | < ε.

Let CSeq be the set of all Cauchy sequences of rationals.
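To make the definition concrete, here is a small sketch in Python using exact rational arithmetic: Newton's iteration produces rational approximations to √2, and a finite prefix of the sequence can be checked against the Cauchy condition. This is a check on finitely many terms, of course, not a proof, and the function names are our own illustrations.

```python
from fractions import Fraction

def sqrt2_approximations(k):
    """First k terms of Newton's iteration x -> (x + 2/x)/2 over Q.

    The terms converge to sqrt(2), which is not itself rational."""
    x = Fraction(3, 2)
    seq = []
    for _ in range(k):
        seq.append(x)
        x = (x + Fraction(2) / x) / 2
    return seq

def is_cauchy_prefix(seq, eps, n0):
    """Check |a_n - a_m| < eps for all n, m > n0 within this finite prefix."""
    tail = seq[n0:]
    return all(abs(a - b) < eps for a in tail for b in tail)

seq = sqrt2_approximations(6)
print(is_cauchy_prefix(seq, Fraction(1, 10**6), 3))   # prints: True
```

Every term here is an element of Q, yet the sequence closes in on a point that no rational measures exactly, which is precisely the situation Cantor's method exploits.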


Each Cauchy sequence is a measuring process that represents a point on the line. Let RCantor be the set of points represented by CSeq . With each Cauchy sequence a1 , a2 , . . . the approximation process is this: given any measure of accuracy ε > 0 , however small, it is possible to choose a stage N in the sequence such that for all later stages n,m > N , the accuracy is within ε, i.e.,

|an − am | < ε.

9.1.4 Examples

(1) Now each rational number p/q measures itself. Define a sequence

ai = p/q for i = 1 , 2 , . . . .

Clearly, for any accuracy ε > 0 and N > 1 , we have for all n,m > N ,

|an − am | = |p/q − p/q| = 0 < ε.

(2) The idea of a Cauchy sequence is implicit in the standard decimal notation for real numbers.

Lemma The decimal representation of a number defines a Cauchy sequence of rational numbers.

Proof Let

d .d1d2 . . . dndn+1 . . .

be any decimal representation, where

d ∈ Z and dn ∈ {0 , 1 , . . . , 9} for n = 1 , 2 , . . . .

Note that d is the “whole number part” before the decimal point, and the dn form the “decimal part” after the decimal point.

Define a new sequence a = a1 , a2 , . . . , an , . . . of rational numbers by

a1 = d .d1

a2 = d .d1d2

...

an = d .d1d2 . . . dn

...

These finite decimals are rational numbers, e.g.,

an = d + d1/10 + d2/10^2 + · · · + dn/10^n .

We show that a is a Cauchy sequence.


Given any ε > 0 , choose N such that

10^−N = 0 .00 · · · 01 (N decimal places) < ε.

Now, for any m, n > N , if (say) n > m, then

|an − am | = 0 .00 · · · 0dm+1 . . . dn
           < 0 .00 · · · 01 (N decimal places)
           = 10^−N
           < ε.

Thus, a is a Cauchy sequence. 2
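The proof's sequence of truncated decimals can be computed exactly. The following sketch builds a_n = d.d1...dn as rationals and checks the proof's bound |a_n − a_m| < 10^−N on a finite prefix; the hard-coded digit list for √2 is our own illustration.

```python
from fractions import Fraction

def truncations(d, digits):
    """The proof's sequence a_n = d.d1...dn, i.e. a_n = d + sum_i d_i / 10^i."""
    seq = []
    a = Fraction(d)
    for i, di in enumerate(digits, start=1):
        a += Fraction(di, 10**i)
        seq.append(a)
    return seq

digits_of_sqrt2 = [4, 1, 4, 2, 1, 3, 5, 6]    # sqrt(2) = 1.41421356...
seq = truncations(1, digits_of_sqrt2)

# For n, m > N the proof gives |a_n - a_m| < 10^-N; check it for N = 3:
N = 3
assert all(abs(a - b) < Fraction(1, 10**N) for a in seq[N:] for b in seq[N:])
```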

Unlike the case of Dedekind cuts, the set CSeq contains infinitely many representations of the same point or real. To proceed we must define:

(i) when two Cauchy sequences are equivalent representations of the same point;

(ii) how to add, subtract, multiply and divide Cauchy sequences; and

(iii) how to order two Cauchy sequences.

These are not straightforward. We will explain each relation and operation in turn in Section 9.4.

9.2 The Real Numbers as an Abstract Data Type

A real number can be defined or represented in several ways. Standard methods, such as those of Cantor, Dedekind, or of infinite decimals, nested intervals and continued fractions, are based on approximating real numbers to any degree of accuracy using rational numbers. The rational numbers and integers also have a number of representation methods. Thus, we are led to the question

When are two representations of the real numbers equivalent?

In this section we will apply the line of thought concerning the naturals as an abstract data type, expressed in Section 7.5, to the real numbers. We expect to answer the question using

(i) algebras to model concrete representations of the reals, such as

ACantor and ADedekind

and

(ii) isomorphisms between algebras to model the equivalence of concrete representations of the reals, such as

φ : ACantor → ADedekind .


Furthermore, we will explain how the real numbers can be characterised axiomatically in terms of properties of their operations and ordering.

The reals are definable uniquely (up to isomorphism) as any algebra satisfying the axioms of a

complete ordered field,

a concept we will explain in this section. This means the following:

All the commonly used properties of the reals can be proved from the axioms of a complete ordered field; and so these common properties will hold in any algebra satisfying the axioms.

All complete ordered fields are isomorphic.

The standard constructions of the reals by Cantor's method, and so on, form complete ordered fields.

We will give a detailed account, with full proofs, of these facts.

9.2.1 Real Numbers as a Field

Let us begin our explanation of these concepts and theorems by considering the basic operations on reals of addition, subtraction, multiplication, and division. We will specify these operations using axioms, as we did in Chapter 5. First, we may name these operations in the signature:

signature Field

sorts field

constants 0 , 1 : → field

operations + : field × field → field
− : field → field
. : field × field → field
−1 : field → field

Among the wide range of algebras with this signature, we wish to classify those ΣField -algebras that have precisely the algebraic properties of the real numbers. The axioms of a field specify some very simple and familiar properties of the operations on reals and are given below.


axioms Field

Associativity of addition (∀x )(∀y)(∀z )[(x + y) + z = x + (y + z )]

Identity for addition (∀x )[x+ 0 = x ]

Inverse for addition (∀x )[x+ (−x ) = 0 ]

Commutativity of addition (∀x )(∀y)[x+ y = y + x ]

Associativity of multiplication (∀x )(∀y)(∀z )[(x .y).z = x .(y .z )]

Identity for multiplication (∀x )[x .1 = x ]

Inverse for multiplication (∀x )[x ≠ 0 ⇒ x .(x−1 ) = 1 ]

Commutativity of multiplication (∀x )(∀y)[x .y = y .x ]

Distribution (∀x )(∀y)(∀z )[x .(y + z ) = x .y + x .z ]

Distinctness 0 ≠ 1

Definition (Field) Let A be any ΣField -algebra satisfying the axioms of a field. Then A is said to be a field.

Let TField be the set of these ten axioms and let

Alg(ΣField ,TField)

be the class of all fields.

Remember that there are many algebras that satisfy these axioms, including infinite algebras such as the rational numbers, real numbers and complex numbers, and finite algebras such as modulo p arithmetic Zp , where p is a prime number.
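For a finite carrier, the axioms can be checked mechanically. The sketch below verifies a representative selection of the field axioms for Z5 by brute force; the helper functions are our own, and Fermat's little theorem supplies the multiplicative inverse.

```python
p = 5                      # a prime; Z_p is then a field
Zp = range(p)

def add(x, y): return (x + y) % p
def mul(x, y): return (x * y) % p
def neg(x): return (-x) % p
def inv(x): return pow(x, p - 2, p)   # Fermat: x^(p-2) = x^-1 mod prime p

# Spot-check the field axioms over the whole (finite) carrier:
assert all(add(add(x, y), z) == add(x, add(y, z))
           for x in Zp for y in Zp for z in Zp)        # associativity of +
assert all(add(x, neg(x)) == 0 for x in Zp)            # inverse for addition
assert all(mul(x, inv(x)) == 1 for x in Zp if x != 0)  # inverse for mult.
assert all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
           for x in Zp for y in Zp for z in Zp)        # distribution
assert 0 != 1                                          # distinctness
```

Running the same checks with a composite modulus, say p = 6, fails the multiplicative inverse axiom, which is why primality matters.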

9.2.2 Real Numbers as an Ordered Field

To the operations of a field we add the ordering relation. To do this we will create a new signature by combining the signatures of the field and the Booleans and adding a Boolean-valued operation to test elements in the ordering.

signature Ordered Field

import Field ,Booleans

sorts

constants

operations ≥: field × field → Bool

To the axioms of a field, we add the axioms that express the properties of the Booleans and, in particular, axioms that express the properties of the ordering. The operation x ≥ y is specified by these six axioms.


axioms Ordered Field

Reflexivity (∀x )[x ≥ x ]

Antisymmetry (∀x )(∀y)[x ≥ y and y ≥ x ⇒ x = y ]

Transitivity (∀x )(∀y)(∀z )[x ≥ y and y ≥ z ⇒ x ≥ z ]

Total Order (∀x )[x ≥ 0 or x = 0 or − x ≥ 0 ]

Addition (∀x )(∀y)(∀x ′)(∀y ′)[x ≥ y and x ′ ≥ y ′ ⇒ x + x ′ ≥ y + y ′]

Multiplication (∀x )(∀y)(∀z )[x ≥ y and z ≥ 0 ⇒ z .x ≥ z .y ]

Definition We derive from ≥ some new relations as follows:

(i) the function ≤: field × field → Bool which we define for all x , y by

x ≤ y if, and only if, y ≥ x ;

(ii) the function >: field × field → Bool which we define for all x , y by

x > y if, and only if, x ≥ y and x ≠ y ;

(iii) the function <: field × field → Bool which we define for all x , y by

x < y if, and only if, y > x .

An element x of an ordered field is non-negative if x ≥ 0 . An element x of an ordered field is positive if x > 0 .
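Read computationally, the definition derives three Boolean-valued operations from ≥ alone. A minimal sketch, using Python integers as a stand-in for the field carrier (an illustrative choice, not the book's notation):

```python
def ge(x, y):
    """The primitive >= : field x field -> Bool of the signature."""
    return x >= y

def le(x, y): return ge(y, x)               # x <= y  iff  y >= x
def gt(x, y): return ge(x, y) and x != y    # x >  y  iff  x >= y and x != y
def lt(x, y): return gt(y, x)               # x <  y  iff  y >  x

assert le(2, 3) and not le(3, 2)
assert gt(3, 2) and not gt(2, 2)
assert lt(2, 3) and not lt(3, 3)
```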

The connection between the positive elements and the ordering ≥ is simply:

Lemma (1) For all x , y,

(i) x ≥ y ⇔ x − y ≥ 0 ;

(ii) x > y ⇔ x − y > 0 .

Proof It is easy to deduce these properties from the axioms of an ordered field. Consider the implication ⇒ of (i). Suppose x ≥ y . Then if we add −y to both sides, we know from the addition axiom for ordered fields that

x + (−y) ≥ y + (−y).

By the additive inverse axiom for fields, we have

x − y ≥ 0 .

The reverse implication ⇐ of (i) is a similar argument (in which y is added to x − y ≥ 0 ). Statement (ii) follows from (i) thus:

x > y ⇔ x ≥ y and x ≠ y by definition of >;

⇔ x − y ≥ 0 and x ≠ y by statement (i);

⇔ x − y > 0 by definition of >.

2


Definition (Ordered Field with Booleans) Let A be any ΣOrdered Field -algebra satisfying the axioms of a field and an ordered field. Then A is said to be an ordered field with Booleans.

Let TOrdered Field be the set of these 16 field and ordering axioms and

Alg(ΣOrdered Field ,TOrdered Field)

the class of all ordered fields.

Again, there are many algebras that satisfy these axioms, including the rational numbers and the real numbers. However, the complex numbers do not form an ordered field! To prove this fact, we must get better acquainted with the ordering axioms by deducing some properties from them.

Lemma (3) Let A be an ordered field with Booleans. In FA we have

(i) 1 > 0 ;

(ii) for all n ∈ Z with n > 0 , n1 = 1 + 1 + · · · + 1 (n times) > 0 ;

(iii) for all x ∈ FA, if x ≠ 0 then x² > 0 .

Proof (i) Suppose, for a contradiction, that 1 < 0 . Then, by the total order axiom, and since 1 ≠ 0 , we have

−1 > 0 .

By the multiplication axiom,

(−1 ).(−1 ) > 0 .

But, by a lemma in Section 5.3.2,

(−1 ).(−1 ) = 1 .1 .

So

1 = 1 .1

> 0 ,

which contradicts our assumption. Hence, in fact, 1 > 0 .

(ii) Applying the addition axiom to 1 > 0 , we deduce immediately that

1 + 1 > 0 , 1 + 1 + 1 > 0 , . . . , 1 + 1 + · · ·+ 1 > 0

and hence for any n ∈ Z with n > 0 , n1 > 0 .

(iii) Let x ∈ FA and x ≠ 0 . If x > 0 then x² > 0 by the multiplication axiom. If x < 0 then −x > 0 by the total ordering axiom and

(−x ).(−x ) > 0

by the multiplication axiom. However,

x² = (−x ).(−x )


by the lemma in Section 5.3.2, and so if x < 0 we also have

x² > 0 .

2

This lemma enables us to prove the following:

Theorem There does not exist an ordering on the complex numbers C that makes them an ordered field.

Proof The complex numbers form a field containing the number i such that

i² = −1 .

Suppose, for a contradiction, there exists an ordering ≤C on C that satisfies the ordered field axioms.

By the Soundness of Deduction Principle (in Section 5.1.2), the properties of ordered fields proved in the above lemma are true of ≤C. In particular, we will examine the facts that

(i) 1C >C 0C;

(ii) for all z ∈ C, if z ≠ 0C then z² >C 0C.

Now i ≠ 0C, so by property (ii),

i² >C 0C.

But i² = −1 , and so

−1 >C 0C and hence 1C <C 0C,

which contradicts property (i). Thus, i can be neither positive nor negative, and no such ordering ≤C can exist.

2

9.2.3 Completeness of the Ordering

The ordering plays a crucial role in thinking about the continuum. Dedekind's continuity axiom in Section 9.1.2 is based on ordering.

Definition (Lower Bound and Greatest Lower Bound) Let F be an ordered field and let S be a subset of F .

(i) The element b ∈ F is a lower bound for S if, for all s ∈ S , b ≤ s .

(ii) The element b ∈ F is a greatest lower bound for S if,

(a) b is a lower bound; and

(b) there is no other lower bound c ∈ F for S , such that b < c.

The condition (b) is equivalent to the following:


(b ′) for all c ∈ F , if for all s ∈ S , c ≤ s , then c ≤ b.

We write glb(S ) for the greatest lower bound of S , if it exists. Another commonly used term for the greatest lower bound is infimum, which we denote inf (S ).

Example Here is an example of a set

S = {1 .5 , 1 .42 , 1 .415 , 1 .4143 , 1 .41422 , . . .}

of rational numbers that has lower bounds in the rationals Q but no greatest lower bound. However, if S is considered as a subset of the real numbers, then we have

√2 = glb{1 .5 , 1 .42 , 1 .415 , 1 .4143 , 1 .41422 , . . .}.

Definition (Upper Bound and Least Upper Bound) Let F be an ordered field and let S be a subset of F .

(i) The element b ∈ F is an upper bound for S if, for all s ∈ S , b ≥ s .

(ii) The element b ∈ F is a least upper bound for S if,

(a) b is an upper bound; and

(b) there is no other upper bound c ∈ F for S , such that c < b.

The condition (b) is equivalent to the following:

(b ′) for all c ∈ F , if for all s ∈ S , c ≥ s , then c ≥ b.

We write lub(S ) for the least upper bound of S , if it exists. Another commonly used term for the least upper bound is supremum, which we denote sup(S ).

Example The set

S = {1 .4 , 1 .41 , 1 .414 , 1 .4142 , 1 .41421 , . . .}

of rational numbers has upper bounds in the rationals Q but no least upper bound. However, if S is considered as a subset of the real numbers, then we have

√2 = lub{1 .4 , 1 .41 , 1 .414 , 1 .4142 , 1 .41421 , . . .}.

Definition (Completeness) An ordered field F is said to be complete if every non-empty subset S that has a lower bound also possesses a greatest lower bound.

Clearly, from the examples involving √2 , neither the greatest lower bound nor the least upper bound need be an element of S .

This condition is a little complicated to write out as an axiom in a specification. First, we make some formulae for the ideas of lower bound and greatest lower bound.

That x is a lower bound for a set S is defined by the formula

lower bound(x , S ) ≡ (∀s)[s ∈ S ⇒ x ≤ s ]

That x is a greatest lower bound for a set S is defined by the formula

greatest lower bound(x , S ) ≡ lower bound(x , S ) and (∀c)[lower bound(c, S )⇒ c ≤ x ].

Here is the specification:


axioms Completeness

Completeness (∀S )[∅ ≠ S ⊆ F and (∃c)[lower bound(c, S )] ⇒ (∃b)[greatest lower bound(b, S )]]

Definition (Complete Ordered Field with Booleans) Let A be any ΣOrdered Field -algebra satisfying the axioms of a field, an ordered field and a complete ordered field. Then A is said to be a complete ordered field with Booleans.

Let

TComplete Ordered Field = TField ∪ TOrdered Field ∪ TCompleteness

be the set of these seventeen axioms and

Alg(ΣOrdered Field ,TComplete Ordered Field)

be the class of all complete ordered fields.

9.3 Uniqueness of the Real Numbers

The axioms of a complete ordered field constitute an axiomatic specification of real numbers. Indeed, the properties of being a complete ordered field characterise the real numbers in the following way:

Theorem (Uniqueness) Let A and B be any algebras of signature ΣOrdered Field satisfying theaxioms of a complete ordered field. Then A ∼= B.

Later, we will use this Uniqueness Theorem to prove the following:

Theorem All ΣOrdered Field -algebras satisfying the axioms of a complete ordered field are isomorphic with ACantor .

Any algebra that is a complete ordered field is a representation of the real numbers.

9.3.1 Preparations and Overview of the Proof of the Uniqueness of the Reals

Consider, in detail, the concept of a ΣOrdered Field -isomorphism. Applying the general definition of a homomorphism, given for arbitrary algebras in the previous chapter, to ΣOrdered Field -algebras results in the following concept.

Definition (Ordered Field Homomorphism) Let A and B be any ΣOrdered Field -algebras. Let the carriers of sort field be FA and FB , respectively.

A ΣOrdered Field -homomorphism is a pair

φ = (φfield , φBoolean)

of mappings

φfield : FA → FB

and the identity map

φBool = idBool : B → B

that preserves the constants and operations of ΣOrdered Field as follows: for all a, b ∈ FA ,


(i) Constants φfield(0A) = 0B and φfield(1A) = 1B

(ii) Addition φfield(a + b) = φfield(a) + φfield(b)

(iii) Additive Inverse φfield(−a) = −φfield(a)

(iv) Multiplication φfield(a.b) = φfield(a).φfield(b)

(v) Multiplicative Inverse φfield(a−1 ) = (φfield(a))−1 for a ≠ 0

(vi) Order a < b implies φfield(a) < φfield(b)

Lemma (Injectivity of Order Preserving Maps) Let φ : A→ B be order-preserving, i.e.,

a < b implies φ(a) < φ(b).

Then φ is injective.

Proof If a ≠ b then either a < b or b < a, since < is a total order. Suppose a < b. Then φ(a) < φ(b), since φ is order-preserving, and so φ(a) ≠ φ(b). Similarly, if b < a we can deduce that φ(a) ≠ φ(b).

Hence, if a ≠ b then φ(a) ≠ φ(b). Thus, φ is injective. 2

The standard models of the real numbers are built from the rational numbers in different ways. In the proof, we will show that any algebra that satisfies the axioms has a similar structural property, and that this enables us to construct an isomorphism between any two algebras that satisfy the axioms. The proof is in three stages, as follows.

Let A and B be any ΣOrdered Field -algebras satisfying the axioms.

Stage 1: Recovering the Rationals in A and B

Using the field Q of rational numbers as our model, we construct subalgebras

ARat and BRat

of A and B using the ΣOrdered Field operations applied to the identity elements 1A and 1B of A and B respectively. Then we define a map

φ : ARat → BRat

and prove that φ is an ordered field isomorphism between these subalgebras, which are subfields of A and B .

Stage 2: Approximating A and B by ARat and BRat

Using the Completeness Property, we show that the elements of A can be approximated by the elements of ARat ; and similarly for B and BRat .


Stage 3: Constructing the Isomorphism between A and B

Using this approximation property of Stage 2, and the isomorphism φ : ARat → BRat , we define a map

Φ : A → B

and prove it is an ordered-field isomorphism. In particular, we say that Φ is a lifting of φ from ARat to A in the sense that, for a ∈ ARat ,

Φ(a) = φ(a).

9.3.2 Stage 1: Constructing The Rational Ordered Subfields

Let A be any ΣOrdered Field -algebra satisfying the axioms of an ordered field. We will build a special ΣOrdered Field -subalgebra ARat of A that also satisfies the axioms and is called the rational subfield of the ordered field A.

Let the carriers of A be

FA and B

of sorts field and Bool , respectively. The two carriers of ARat will be denoted

RA ⊆ FA and B

of sorts field and Bool , respectively.

Now, ARat is the subalgebra of A built from the identity elements 0A and 1A by repeatedly

applying all the operations to them. For example, starting with 0A and 1A we can build the following elements of RA:

0A, 1A, 1A + 1A, 1A + 1A + 1A, . . .

(1A)−1 , (1A + 1A)−1 , (1A + 1A + 1A)−1 , . . .

(1A).(1A + 1A)−1 , (1A + 1A).(1A + 1A)−1 , (1A + 1A + 1A).(1A + 1A)−1 , . . .

((1A).(1A + 1A + 1A)−1 ).((1A + 1A).(1A + 1A + 1A)−1 ), . . .

To better understand the subset RA of FA, we introduce some notation that simplifies these expressions.

Definition (Formal Sums) Let a ∈ FA and n ∈ Z.

If n > 0 define

na = a + a + · · · + a (n times).

If n = 0 define

0a = 0A.

If n < 0 define

na = −((−n)a).

For example,

3a = a + a + a


and

−3a = −((−(−3 ))a) = −(3a) = −(a + a + a).

Here is a useful set of properties, when we need to calculate using this notation.

Lemma (1) For any a, b ∈ FA and any integers n,m ∈ Z, we have:

(i) na + ma = (n + m)a;

(ii) n(a.b) = (na).b;

(iii) n(ma) = (nm)a; and

(iv) if n > 0 and a > 0A then na > 0A.

Proof We leave the proof as an exercise. The equations are proved by induction in the case n ≥ 0 , and the field and ordering axioms are used to complete the argument in the case n < 0 . 2
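The formal sum na is computable by the recursion in the definition. The sketch below (the function name is ours) uses Python's exact rationals as a stand-in carrier and spot-checks the equations of Lemma 1 on particular values.

```python
from fractions import Fraction

def formal_sum(n, a, zero=Fraction(0)):
    """na = a + ... + a (n times); 0a = 0_A; na = -((-n)a) for n < 0."""
    if n > 0:
        s = zero
        for _ in range(n):
            s = s + a
        return s
    if n == 0:
        return zero
    return -formal_sum(-n, a, zero)

a, b = Fraction(2, 3), Fraction(5, 7)
n, m = 4, -6
assert formal_sum(n, a) + formal_sum(m, a) == formal_sum(n + m, a)  # Lemma 1(i)
assert formal_sum(n, a * b) == formal_sum(n, a) * b                 # Lemma 1(ii)
assert formal_sum(n, formal_sum(m, a)) == formal_sum(n * m, a)      # Lemma 1(iii)
assert formal_sum(3, a) > 0                                         # Lemma 1(iv)
```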

In building RA, we use this sum notation na in the case a = 1A.

Definition (Formal Integers) Let 1A ∈ FA be the identity and n ∈ Z. Then the element n1A ∈ FA is called a formal integer.

Lemma (2) For any integers n,m ∈ Z, we have:

(i) (n1A).(m1A) = (nm)1A; and

(ii) (n1A) = (m1A) if, and only if, n = m.

Proof Equation (i) is proved using (ii) and (iii) of Lemma 1 as follows:

(n1A).(m1A) = n(1A.(m1A)) by Lemma 1(ii);
= n(m1A) by the identity axiom;
= (nm)1A by Lemma 1(iii).

Property (ii) uses Lemma 1(iv) and the axioms. 2

Next, we extend the notation to include division.

Definition (Formal Rationals) Let 1A ∈ FA be the identity and m, n ∈ Z. If n > 0 then define

m1A/n1A = (m1A).(n1A)−1

which exists because n1A ≠ 0A in an ordered field.


So

m1A/n1A = (1A + 1A + · · · + 1A [m times]).(1A + 1A + · · · + 1A [n times])−1 .

These elements we call formal rationals.

The set RA of elements of FA can now be defined by

RA = { m1A/n1A | m, n ∈ Z and n > 0 }.

The requirement that n > 0 is to ensure that the formal rationals have a standard representation.

We first consider order and equality for formal rationals.

Lemma (3) For any m, n, p, q ∈ Z with n > 0 and q > 0 , we have

(i) m1A/n1A < p1A/q1A in FA if, and only if, mq < np in Z;

(ii) m1A/n1A = p1A/q1A in FA if, and only if, mq = np in Z.

Proof We prove condition (i). By definition,

m1A/n1A < p1A/q1A ⇔ (m1A).(n1A)−1 < (p1A).(q1A)−1 .

Since n1A > 0A and q1A > 0A, we can multiply both sides by both elements, to get

⇔ (m1A).(q1A) < (p1A).(n1A).

By Lemma 2(i),

⇔ (mq)1A < (np)1A.

Rearranging, by Lemma 1 of Section 9.2.2,

⇔ 0A < (np)1A − (mq)1A.

By Lemma 1(i),

⇔ 0A < (np − mq)1A

⇔ 0 < np − mq

⇔ mq < np.

2
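Lemma 3 reduces order and equality of formal rationals to integer comparisons, and this makes an executable sketch possible: represent m1A/n1A as a pair (m, n) with n > 0 and compare via mq < np and mq = np. The class name is our own illustration.

```python
from fractions import Fraction

class FormalRational:
    """m1_A/n1_A represented as the pair (m, n) with n > 0."""
    def __init__(self, m, n):
        assert n > 0
        self.m, self.n = m, n
    def __lt__(self, other):                  # Lemma 3(i): mq < np
        return self.m * other.n < self.n * other.m
    def __eq__(self, other):                  # Lemma 3(ii): mq = np
        return self.m * other.n == self.n * other.m

# Two representations of the same point are identified:
assert FormalRational(2, 3) == FormalRational(4, 6)
assert FormalRational(2, 3) < FormalRational(3, 4)   # since 2*4 < 3*3
# The comparisons agree with ordinary rational arithmetic:
assert (FormalRational(2, 3) < FormalRational(3, 4)) == (Fraction(2, 3) < Fraction(3, 4))
```

The point of the reduction is that neither comparison ever needs division in the carrier; everything happens in Z.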

It is easy to deduce from Lemma 3 that:

Lemma (4) For any m, n ∈ Z with n > 0 we have


(i) m1A/n1A = 0A if, and only if, m = 0 ;

(ii) m1A/n1A = 1A if, and only if, m = n.

Proof Consider (i). Taking p = 0 , q = 1 in Lemma 3(ii), we have

m1A/n1A = 0A ⇔ m.1 = n.0 ⇔ m = 0 .

Consider (ii). Taking p = 1 , q = 1 in Lemma 3(ii), we have

m1A/n1A = 1A ⇔ m.1 = n.1 ⇔ m = n.

2

Next, we consider the operations on formal rationals.

Lemma (5) For any m, n, p, q ∈ Z with n > 0 and q > 0 , we have

(i) m1A/n1A + p1A/q1A = (mq + np)1A/(nq)1A ;

(ii) (m1A/n1A).(p1A/q1A) = (mp)1A/(nq)1A ;

(iii) −(m1A/n1A) = (−m)1A/n1A ;

(iv) if m > 0 then (m1A/n1A)−1 = n1A/m1A ; if m < 0 then (m1A/n1A)−1 = (−n)1A/(−m)1A .

Proof We use the various lemmas and axioms to calculate these identities.

(i) Addition By definition,

m1A/n1A + p1A/q1A = (m1A).(n1A)−1 + (p1A).(q1A)−1 .

Now multiplying the first and second terms by

1A = (q1A).(q1A)−1 and 1A = (n1A).(n1A)−1

respectively, and rearranging using the field axioms, we get

= [(m1A).(q1A)].[(n1A).(q1A)]−1 + [(n1A).(p1A)].[(n1A).(q1A)]−1 .

By Lemma 2(i),

= [(mq)1A].[(nq)1A]−1 + [(np)1A].[(nq)1A]−1 .

By the distribution law,

= [(mq)1A + (np)1A].[(nq)1A]−1 .

By Lemma 1(i),

= [(mq + np)1A].[(nq)1A]−1 .

Therefore, by definition,

= (mq + np)1A/(nq)1A .

(ii) Multiplication By definition,

(m1A/n1A).(p1A/q1A) = [(m1A).(n1A)−1 ].[(p1A).(q1A)−1 ].

By commutativity and inverse properties,

= [(m1A).(p1A)].[(n1A).(q1A)]−1 .

By Lemma 2(i),

= [(mp)1A].[(nq)1A]−1 .

Therefore, by definition,

= (mp)1A/(nq)1A .

(iii) Additive inverse. By definition,

m1A/n1A + (−m)1A/n1A = (mn + n(−m))1A/(nn)1A by case (i) of this Lemma
= (mn − nm)1A/(nn)1A
= 0 1A/(nn)1A .

By Lemma 4(i), this is 0A, and so

−(m1A/n1A) = (−m)1A/n1A .

(iv) The two cases for the multiplicative inverse we leave as exercises.

2

Theorem (1) Let A be any ΣOrdered Field -algebra that is an ordered field with Booleans. Then ARat with carriers RA and B constitutes a ΣOrdered Field -subalgebra of A that is also an ordered field with Booleans.


Proof First, we show that RA is closed under the operations of ΣOrdered Field . Clearly, B is closed under the operations and and not .

The constants 0A and 1A are in RA by Lemma 4. RA is closed under the operations +, ., −, −1 by Lemma 5(i)–(iv), respectively.

Obviously, ARat and B are closed under < since both truth values are in B. Now all the axioms of an ordered field are true of all elements of FA. Thus, they are true for all elements of the subfield RA of FA. For example, since

x + y = y + x

holds for all of FA, it must hold for all of the subset RA. 2

Theorem (2) Let A and B be ΣOrdered Field -algebras that are ordered fields with Booleans. Let ARat and BRat be the ΣOrdered Field -subalgebras of rational elements that are ordered subfields of A and B, respectively. Then the map

φ = (φfield , φBool) : ARat → BRat

that is defined on field elements by φfield : RA → RB , where

φfield(m1A/n1A) = m1B/n1B

and by the identity function φBool = idBool : B → B, is a ΣOrdered Field -isomorphism.

Proof We have to prove that φ preserves all the operations in ΣOrdered Field and is a bijection.

(i) Addition:

φfield(m1A/n1A + p1A/q1A) = φfield((mq + pn)1A/(nq)1A) by Lemma 5(i);
= (mq + pn)1B/(nq)1B by definition of φfield ;
= m1B/n1B + p1B/q1B by Lemma 5(i);
= φfield(m1A/n1A) + φfield(p1A/q1A) by definition of φfield .

(ii) Multiplication:

φfield((m1A/n1A).(p1A/q1A)) = φfield((mp)1A/(nq)1A) by Lemma 5(ii);
= (mp)1B/(nq)1B by definition of φfield ;
= (m1B/n1B).(p1B/q1B) by Lemma 5(ii);
= φfield(m1A/n1A).φfield(p1A/q1A) by definition of φfield .


The constants, and additive and multiplicative inverses, are easy exercises. Consider the order relation. By Lemma 3,

m1A/n1A < p1A/q1A ⇔ mq < np applying Lemma 3 to A ;
⇔ m1B/n1B < p1B/q1B applying Lemma 3 to B ;
⇔ φfield(m1A/n1A) < φfield(p1A/q1A) by definition of φfield .

This shows that φfield preserves the ordering.

Furthermore, we see that φfield is injective since it is order-preserving (Lemma of Section 9.3.1). Clearly, φfield is surjective by the definitions of RB and φfield . Since φBool is the identity, it is a bijection. Hence, φ = (φfield , φBool) is a ΣOrdered Field -isomorphism. 2

9.3.3 Stage 2: Approximation by the Rational Ordered Subfield

We will now show that all the elements of A can be approximated by the elements of ARat , i.e., the elements of FA can be approximated by the elements of RA. The approximation property we will prove is that

RA is dense in FA,

which means that

for any a, b ∈ FA with a < b, there exists some r ∈ RA such that a < r < b.

Lemma (Archimedes’ Property) Let A satisfy the complete ordered field axioms. Let a, b ∈ FA and a > 0A. Then there exists m ∈ Z, m > 0 , such that

ma > b.

Proof Suppose, for a contradiction, that the property does not hold, so that for all m ∈ Z with m > 0 ,

ma ≤ b.

Clearly,

S = {ma | m ∈ Z and m > 0}

is bounded above by b ∈ FA. Since A is complete, the set S has a least upper bound c and c ≤ b. For all m > 0 ,

(m + 1 )a ∈ S

and

(m + 1 )a ≤ c.

Thus, for all m > 0 ,

ma ≤ c − a.

This implies that c − a is also an upper bound of S and, since a > 0A, we have that

c − a < c.

This contradicts that c is the least upper bound, so the property holds. 2
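Over the rationals, Archimedes' Property is directly computable: the multiple m can be found by search. A sketch (the helper name is our own):

```python
from fractions import Fraction

def archimedes_witness(a, b):
    """Return the least m > 0 with m*a > b; it exists because a > 0."""
    assert a > 0
    m = 1
    while m * a <= b:
        m += 1
    return m

a, b = Fraction(1, 100), Fraction(7, 2)
m = archimedes_witness(a, b)
assert m * a > b and (m - 1) * a <= b     # m = 351 here
```

The abstract lemma guarantees only existence; in a complete ordered field the element a may be arbitrarily small relative to b, but some finite multiple always overtakes b.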


Corollary For all a ∈ FA and n ∈ Z with n > 0 , there exists m ∈ Z such that

a < m1A/n1A .

Proof Let b = 1 1A/n1A . By Archimedes’ Property, since b > 0A there exists m ∈ Z such that

a < mb.

Now

mb = m[(1 1A).(n1A)−1 ] by definition of b;
= (m(1 1A)).(n1A)−1 by Lemma 1(ii);
= (m1A).(n1A)−1 by Lemma 1(iii);
= m1A/n1A by definition.

2

Theorem (Density) Let A be a complete ordered field with Booleans, and ARat be its subfield of formal rationals with Booleans. For any a, b ∈ FA with a < b, there exists r ∈ RA such that

a < r < b.

Proof Now b − a > 0A so, by Archimedes’ Property, there exists an n > 0 such that

n(b − a) > 1A.

Hence, multiplying by (n1A)−1 , we have that

b − a > 1 1A/n1A . (∗)

Let

S = { m | b ≤ m1A/n1A }.

By the Corollary to Archimedes’ Property, S ≠ ∅. Now, again by the Corollary, S is bounded below because there is some k such that

−b < k1A/n1A

and, multiplying by −1 , for all m ∈ S ,

(−k)1A/n1A < b ≤ m1A/n1A .

By Lemma 3(i) of Section 9.3.2,

−k < m


for all m ∈ S . Since S is a set of integers that is bounded below, it has a minimum, m0 + 1 say. Thus,

m01A/n1A < b ≤ (m0 + 1 )1A/n1A

since m0 < m0 + 1 and m0 ∉ S . Now

a < m01A/n1A

follows from (∗), for rearranging gives

b − 1 1A/n1A > a

so

(m0 + 1 )1A/n1A − 1 1A/n1A = m01A/n1A ≥ b − 1 1A/n1A > a.

Thus, the required element r ∈ RA is m01A/n1A . 2
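The proof of the Density Theorem is constructive and can be run over the rationals: choose n with n(b − a) > 1 by Archimedes' Property, then take the least m with b ≤ m1A/n1A; the proof's element m0/n = (m − 1)/n lies strictly between a and b. A sketch with Python's Fraction as the carrier (the function name is ours):

```python
from fractions import Fraction
from math import ceil

def dense_rational_between(a, b):
    """Return r with a < r < b, following the Density Theorem's proof."""
    assert a < b
    n = 1
    while n * (b - a) <= 1:       # Archimedes' Property: find n(b - a) > 1
        n += 1
    m = ceil(b * n)               # the least integer m with b <= m/n
    return Fraction(m - 1, n)     # the proof's m0/n

a, b = Fraction(139, 100), Fraction(141, 100)
r = dense_rational_between(a, b)
assert a < r < b
```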

Corollary For any a ∈ FA, there exist r , r ′ ∈ RA such that

r < a < r ′.

Proof Now a − 1A < a < a + 1A, and applying the Density Theorem twice produces

a − 1A < r < a and a < r ′ < a + 1A,

as required. 2

Definition (Upper and Lower Cuts) In a complete ordered field A, we can define the lower cut for a ∈ FA,

La = {r ∈ RA | r < a}

and the upper cut for a ∈ FA,

Ua = {r ∈ RA | a ≤ r}.

Theorem (Cut Equivalence) For every a ∈ FA,

a = sup La = inf Ua .

Proof Now La is not empty, by the Corollary. Since La is bounded above by a, sup La exists and

sup La ≤ a.

Suppose, for a contradiction, that

sup La < a.

Then, by the Density Theorem, there exists r ∈ RA such that

sup La < r < a.

This means that r ∈ La but r is bigger than the supremum, which is a contradiction. Thus,

sup La = a.

The argument for a = inf Ua is similar. 2


9.3.4 Stage 3: Constructing the Isomorphism

We know that if A and B are ΣOrdered Field -algebras that are complete ordered fields with Booleans, then ARat and BRat are subalgebras of A and B , respectively, and that they are:

(i) ordered fields with Booleans (by Theorem 1);

(ii) ΣOrdered Field -isomorphic under φ : ARat → BRat (by Theorem 2); and

(iii) dense in A and B , respectively (by Density Theorem).

We will prove that φ can be lifted to a ΣOrdered Field -isomorphism Φ : A → B , i.e., for r ∈ ARat , Φ(r) = φ(r).

Theorem (4) There is a ΣOrdered Field isomorphism

Φ = (Φ, id) : A→ B .

Proof First, we lift φ to a mapping φ between all subsets of RA and RB . Let P(RA) and P(RB) be the power sets of RA and RB , respectively. For any S ∈ P(RA), we define

φ(S ) = {φ(r) | r ∈ S}.

Clearly, φ(S ) ∈ P(RB).

Lemma

φ : P(RA) → P(RB)

is a bijection.

Proof Exercise. 2

Now, for any a ∈ FA, let La and Ua be the lower and upper cuts of a. So by the Cut Equivalence Theorem in Section 9.3.3,

a = sup La = inf Ua .

Clearly, every element of φ(La) is less than or equal to every element of φ(Ua), since φ is order-preserving, and so we can deduce that

sup φ(La) ≤ inf φ(Ua).

Lemma

sup φ(La) = inf φ(Ua).

Proof For a contradiction, suppose that

sup φ(La) < inf φ(Ua).

By applying the Density Theorem twice, we know that there exist φ(r1 ) and φ(r2 ) in RB such that

sup φ(La) < φ(r1 ) < φ(r2 ) < inf φ(Ua).


For any s ∈ La , φ(s) ∈ φ(La), so φ(s) < φ(r1 ) and, hence, s < r1 . It follows that

a = sup La ≤ r1 .

Similarly,

r2 ≤ inf Ua = a.

Combining, we get r2 ≤ r1 , which contradicts the fact that r1 < r2 (which holds since φ(r1 ) < φ(r2 ) and φ is order-preserving). Hence, the lemma holds. 2

Here is the definition of the extension Φ of φ. For any a ∈ FA,

Φ(a) = sup φ(La) = inf φ(Ua).

We will now check that Φ has the necessary ordering and algebraic properties.

Lemma Φ is an order-preserving bijection.

Proof Suppose a, b ∈ FA and a < b. First, we show that Φ(a) < Φ(b). Let r1, r2 ∈ RA satisfy

a < r1 < r2 < b.

Then r1 ∈ Ua and φ(r1 ) ∈ φ(Ua) and, hence,

Φ(a) = inf φ(Ua) ≤ φ(r1 ).

Similarly, r2 ∈ Lb and φ(r2 ) ∈ φ(Lb) and

Φ(b) = sup φ(Lb) ≥ φ(r2 ).

Together we have

Φ(a) ≤ φ(r1) < φ(r2) ≤ Φ(b)

since φ is order-preserving.

Next, we prove that Φ is a bijection. Since Φ is order-preserving, it is injective. We show that the range of Φ is FB. Let c ∈ FB and define

T = {a | a ∈ FA and Φ(a) ≥ c}.

By the Corollary to the Density Theorem, there are φ(r1) and φ(r2) in RB with

φ(r1 ) < c < φ(r2 ).

Hence, r2 ∈ T and, since r1 is a lower bound of T, we know an infimum of T exists. Let this be b.

Suppose Φ(b) < c. Then some φ(r) ∈ RB exists with

Φ(b) < φ(r) < c.

This implies r is a lower bound of T and that b < r, which is not possible. Now suppose c < Φ(b), so that for some φ(s) ∈ RB,

c < φ(s) < Φ(b).

Then s ∈ T and s < b, which is not possible. Therefore, Φ(b) = c and the range of Φ is FB. □


Lemma For all a, b ∈ FA,

(i) Φ(a + b) = Φ(a) + Φ(b);

(ii) Φ(−a) = −Φ(a); and

(iii) Φ(a.b) = Φ(a).Φ(b).

Proof (i) Addition Given any a, b ∈ FA, for any r, s ∈ RA with r < a and s < b, we have r + s ≤ a + b (by the ordered field axioms). Applying the order-preserving map Φ, we get

Φ(r + s) ≤ Φ(a + b)

and

φ(r + s) ≤ Φ(a + b)

since Φ is φ on RA. Now φ : RA → RB is a ΣOrdered Field -homomorphism, so

φ(r + s) = φ(r) + φ(s).

Substituting,

φ(r) + φ(s) ≤ Φ(a + b)

and rearranging,

φ(r) ≤ Φ(a + b) − φ(s)

for all r ∈ La. This implies that

Φ(a) = sup φ(La) ≤ Φ(a + b) − φ(s).

Rearranging,

φ(s) ≤ Φ(a + b) − Φ(a)

for all s < b. Hence,

Φ(b) = sup φ(Lb) ≤ Φ(a + b) − Φ(a).

Hence, rearranging,

Φ(a) + Φ(b) ≤ Φ(a + b).

By a similar argument, using inf φ(Ua) and inf φ(Ub), we can deduce that

Φ(a + b) ≤ Φ(a) + Φ(b).

Hence, we deduce they are equal and (i) is true.

(ii) Additive Inverse Clearly, because of (i),

Φ(a) + Φ(−a) = Φ(a − a)

= Φ(0A)

= 0B .

So Φ(−a) = −Φ(a).


(iii) Multiplication Let a, b ∈ FA and suppose a, b > 0A. For any r, s ∈ RA with

0A < r ≤ a and 0A < s ≤ b,

we have that r.s ≤ a.b and that

Φ(r.s) ≤ Φ(a.b)

since Φ is order-preserving. Now Φ = φ on RA and φ is a homomorphism so

Φ(r .s) = φ(r .s)

= φ(r).φ(s)

= Φ(r).Φ(s),

and we deduce that

Φ(a).Φ(b) ≤ Φ(a.b).

Similarly, we can obtain

Φ(a.b) ≤ Φ(a).Φ(b)

and hence

Φ(a).Φ(b) = Φ(a.b)

for a, b > 0 .

Suppose a = 0 or b = 0 . Then trivially,

Φ(a.b) = Φ(0A)

= 0B

= Φ(a).Φ(b).

If a < 0 , b > 0 , then we argue as follows:

Φ(a.b) = −Φ(−a.b)

= −(Φ(−a).Φ(b))

= Φ(a).Φ(b).

The cases of a > 0, b < 0 and a < 0, b < 0 are treated similarly. □

This completes the proof of the lemma and the proof of the Uniqueness Theorem. □

9.4 Cantor’s Construction of the Real Numbers

We will build a representation of the data type

(R; 0 , 1 ; + ,− , . , −1 , < )

of real numbers using Cauchy sequences of rational numbers. Now the set CSeq of Cauchy sequences contains infinitely many representations of the same point, and this fact will require constant attention: when we operate on different representations of the same real number, we need to ensure that the results do not represent different answers, i.e., that operations on representations are “well-defined”, or extend to what they represent. We begin with equality.


9.4.1 Equivalence of Cauchy Sequences

Definition (Equality of Representations of Reals) Consider two Cauchy sequences

a = a1 , a2 , . . . , an , . . . and b = b1 , b2 , . . . , bn , . . .

of rationals. We say that the sequences a and b are equivalent if for any rational number ε > 0, there is a natural number n0 such that for all n > n0, we have |an − bn| < ε. Let a ≡ b denote the equivalence of Cauchy sequences a and b.

Two Cauchy sequences are equivalent if, for any degree of accuracy ε > 0, there is some point n0 after which the elements of the sequences are within ε of each other.

For example, the sequence

1, 1/2, 1/3, 1/4, . . . , 1/n, . . .

is equivalent to

0, 0, 0, 0, . . . ,

because for any rational number ε = α/β > 0, there is a natural number n0 = ⌈1 + β/α⌉ such that |1/n − 0| < ε for all n > n0.
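The calculation above is easy to check mechanically. A minimal sketch in Python, using exact rational arithmetic from the standard `fractions` module (the names `a`, `b` and `stage` are ours, chosen for this example):

```python
from fractions import Fraction
import math

def a(n): return Fraction(1, n)   # the sequence 1, 1/2, 1/3, ...
def b(n): return Fraction(0)      # the zero sequence 0, 0, 0, ...

def stage(eps: Fraction) -> int:
    """The stage n0 = ceil(1 + beta/alpha) from the text, for eps = alpha/beta."""
    return math.ceil(1 + 1 / eps)

eps = Fraction(1, 1000)
n0 = stage(eps)
# beyond n0, every term of a is within eps of the corresponding term of b
assert all(abs(a(n) - b(n)) < eps for n in range(n0 + 1, n0 + 200))
```

Working with `Fraction` rather than floating point keeps the arithmetic exact, which is in the spirit of the construction: every object manipulated is a genuine rational number.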

Lemma The relation ≡ is an equivalence relation on CSeq.

Proof We will show that ≡ satisfies the three necessary properties. Let

a = a1, a2, . . . , b = b1, b2, . . . and c = c1, c2, . . .

be any three Cauchy sequences.

(i) ≡ is reflexive. Clearly, for any ε > 0 and all n = 1, 2, . . . ,

|an − an| = |0| = 0 < ε

and so a ≡ a.

(ii) ≡ is symmetric.

Suppose a ≡ b. Then, by definition, for any ε > 0, there exists a stage n0 such that for all later stages n > n0,

|an − bn| < ε.

Clearly, we have that for any ε > 0, there exists n0 such that for all n > n0,

|bn − an| = |an − bn| < ε

and, by definition, b ≡ a.


(iii) ≡ is transitive. Suppose a ≡ b and b ≡ c. Then, by definition, for any ε > 0, there exists a stage n0 such that for all later stages n > n0,

|an − bn| < ε/2 and |bn − cn| < ε/2.

For such n, adding the inequalities, we deduce that

|an − cn| ≤ |an − bn| + |bn − cn| < ε/2 + ε/2 = ε

and so, by definition, a ≡ c. □

Lemma Let a = a1 , a2 , . . . be a Cauchy sequence satisfying the following condition:

For any δ > 0 and any n, there exists some m > n such that |am | < δ. (∗)

Then a ≡ 0 , the zero Cauchy sequence.

Proof Since a is a Cauchy sequence, for any δ > 0, there is an n0 such that for all m, n > n0,

|an − am| < δ/2.

If a satisfies the condition (∗), then given n0, there exists some m > n0 such that

|am| < δ/2.

To prove the Cauchy sequence is equivalent to 0, we calculate for any n > m,

|an − 0| = |an| = |an − am + am| ≤ |an − am| + |am| < δ/2 + δ/2 = δ.

Hence, a ≡ 0. □

Corollary Let a = a1, a2, . . . be a Cauchy sequence such that a ≢ 0. Then there exist some δ > 0 and k such that for all n > k,

|an| ≥ δ.

Each real number has a unique representation by means of an equivalence class of Cauchy sequences.


Definition A Cantor real number is an equivalence class of Cauchy sequences under the equivalence relation ≡. If a = a1, a2, . . . is a Cauchy sequence, then the equivalence class

[a] = {b ∈ CSeq | b ≡ a}

is the Cantor real number it represents. The set of Cantor real numbers is the set

RCantor = CSeq/≡ = {[a] | a ∈ CSeq}.

We will build the data type of real numbers from the data type of Cauchy sequences. We will define operations on Cauchy sequences that “lift” to equivalence classes, i.e., to Cantor real numbers. Typically, to define, say, a binary operation, we define a function f on Cauchy sequences and check that the following two conditions apply:

Operations on Representations The operation f maps Cauchy sequences to Cauchy sequences, i.e.,

a, b ∈ CSeq implies f(a, b) ∈ CSeq.

Lifting Operations and Congruences The operation f preserves equivalent Cauchy sequences, i.e.,

a ≡ a′ and b ≡ b′ implies f(a, b) ≡ f(a′, b′).

We say that ≡ is a congruence with respect to the operation f. We begin with the first task.

9.4.2 Algebra of Cauchy Sequences

We begin by equipping CSeq with operations to form the algebra

(CSeq ; 0 , 1 ; + , − , . , −1 , < ).

The operations on Cauchy sequences are point-wise extensions of the corresponding operations on the rationals, although some care is needed in the case of the multiplicative inverse.

First, we give an important property of Cauchy sequences that we often use.

Lemma (Boundedness) Each Cauchy sequence a = a1, a2, . . . is bounded in the following sense: there is an M ∈ Q such that for all n,

|an| ≤ M.

Proof Given a Cauchy sequence a = a1, a2, . . ., we choose ε = 1 and let n0 be such that for all n, m > n0,

|an − am| < 1.

For any n > n0 + 1, we have

|an| = |an0+1 + (an − an0+1)| ≤ |an0+1| + |an − an0+1| < |an0+1| + 1.


Take M to be the maximum of

|a1|, |a2|, . . . , |an0|, |an0+1| + 1;

then |an| ≤ M for all n = 1, 2, . . .. □

Definition (Addition) Let

a = a1 , a2 , . . . , an , . . . and b = b1 , b2 , . . . , bn , . . .

be sequences of rationals. We add two sequences of rational numbers by applying the addition operation of rational numbers in a point-wise fashion: we define the sum a + b of these sequences to be the sequence

a + b = a1 + b1 , a2 + b2 , . . . , an + bn , . . .

of rationals.

Lemma (Addition) If a and b are Cauchy sequences then a + b is a Cauchy sequence.

Proof If a and b are Cauchy sequences then, by the Boundedness Lemma, we know they are bounded by rationals Ma > 1 and Mb > 1. Take

M = max(Ma, Mb).

Then we have that for all n,

|an| < M and |bn| < M.

Since they are Cauchy sequences, for any ε > 0, there are n0(a) and n0(b) such that for all m, n > n0 = max(n0(a), n0(b)), we have

|an − am| < ε/(2M) and |bn − bm| < ε/(2M).

For such m, n it follows that

|(an + bn) − (am + bm)| ≤ |an − am| + |bn − bm| < ε/(2M) + ε/(2M) = ε/M < ε,

since M > 1. □

Definition (Multiplication) Let

a = a1 , a2 , . . . , an , . . . and b = b1 , b2 , . . . , bn , . . .

be sequences of rationals. We multiply two sequences of rational numbers by applying the multiplication operation of rational numbers in a point-wise fashion: we define the product a.b of these sequences to be the sequence

a.b = a1 .b1 , a2 .b2 , . . . , an .bn , . . .

of rationals.


Lemma (Multiplication) If a and b are Cauchy sequences then a.b is a Cauchy sequence.

Proof If a and b are Cauchy sequences then, by the Boundedness Lemma, we can find a common bound M such that

|an| ≤ M and |bn| ≤ M

for all n = 1, 2, . . .. And since they are Cauchy sequences, for any ε > 0, there is a common stage n0 such that for m, n > n0, we have

|an − am| < ε/(2M) and |bn − bm| < ε/(2M).

Then, for such m, n it follows that

|an.bn − am.bm| = |an(bn − bm) + bm(an − am)| ≤ |an||bn − bm| + |bm||an − am| < M(ε/(2M)) + M(ε/(2M)) = ε/2 + ε/2 = ε. □

Definition (Subtraction) Let

a = a1 , a2 , . . . , an , . . . and b = b1 , b2 , . . . , bn , . . .

be sequences of rationals. We define the additive inverse of a sequence of rational numbers by

−a = −a1 ,−a2 , . . . ,−an , . . . .

We define the subtraction a − b of the two sequences to be the sequence

a − b = a + (−b)

= a1 − b1 , a2 − b2 , . . . , an − bn , . . .

of rationals.
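The point-wise definitions of addition, additive inverse, subtraction (and multiplication) translate directly into code. A hedged sketch, modelling a sequence of rationals as a Python function from index to `Fraction` (this representation and the helper names are ours):

```python
from fractions import Fraction

# A sequence of rationals is modelled as a function n -> Fraction (n >= 1).
def add(a, b):  return lambda n: a(n) + b(n)   # point-wise sum
def neg(a):     return lambda n: -a(n)         # additive inverse
def mul(a, b):  return lambda n: a(n) * b(n)   # point-wise product
def sub(a, b):  return add(a, neg(b))          # a - b = a + (-b), as in the text

x = lambda n: Fraction(1, n)        # 1, 1/2, 1/3, ...
y = lambda n: Fraction(n, n + 1)    # 1/2, 2/3, 3/4, ...

assert add(x, y)(2) == Fraction(1, 2) + Fraction(2, 3)
assert sub(x, y)(3) == Fraction(1, 3) - Fraction(3, 4)
```

Note that the code operates on arbitrary sequences; the lemmas in this section are what guarantee that these operations send Cauchy sequences to Cauchy sequences.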

Lemma (Subtraction) If a and b are Cauchy sequences then −a and a − b are Cauchy sequences.

Proof These properties follow easily from the lemmas above. Note that

c = −1 ,−1 , . . .

is a Cauchy sequence. By the Multiplication Lemma,

c.a = −1 .a1 ,−1 .a2 , . . . ,−1 .an , . . .

= −a1 ,−a2 , . . . ,−an , . . .

= −a


is a Cauchy sequence. Similarly, by the Addition Lemma,

a + (c.b) = a1 + (−b1), a2 + (−b2), . . . , an + (−bn), . . . = a1 − b1, a2 − b2, . . . , an − bn, . . . = a − b

is a Cauchy sequence. □

Definition (Division) Let

a = a1 , a2 , . . . , an , . . . and b = b1 , b2 , . . . , bn , . . .

be sequences of rationals. If an ≠ 0 for all n, then we define the multiplicative inverse of a by

(a)−1 = a1−1, a2−1, . . . , an−1, . . . .

If bn ≠ 0 for all n, then we define the division a/b of the two sequences to be the sequence

a/b = a.(b)−1 = a1.b1−1, a2.b2−1, . . . , an.bn−1, . . .

of rationals.

For the definitions to work, the operation of inverse must preserve Cauchy sequences. In general, it does not. However, we can add a further hypothesis:

Lemma (Division) Let a = a1, a2, . . . , an, . . . be a Cauchy sequence satisfying these two properties:

(i) an ≠ 0 for all n; and

(ii) there exist δ > 0 and a stage k such that for all n > k,

|an| ≥ δ,

i.e., a ≢ 0.

Then the sequence a−1 is a Cauchy sequence.

Proof Condition (i) enables us to define the sequence a−1. Let δ and k be as in Condition (ii). For any ε > 0, since a is a Cauchy sequence, there exists n0 such that

|an − am| < δ²ε


for all m, n > n0. If N = max(k, n0), then for all m, n > N, we have

|an−1 − am−1| = |(am − an)/(an.am)| = |am − an| / (|an||am|) < δ²ε/(δ.δ) = ε.

Hence, a−1 is a Cauchy sequence. □

The case of division a/b is left as an exercise. Finally, we consider the ordering.

Definition (Non-negative Cauchy Sequence) A Cauchy sequence a ∈ CSeq is said to be non-negative if for any ε > 0 there is a stage N such that

−ε < an

for all n > N. Let NNCSeq be the set of all non-negative Cauchy sequences.

From this we can define an ordering on Cauchy sequences.

Definition (Ordering) Let a, b ∈ CSeq . Define

a ≤ b if, and only if, b − a ∈ NNCSeq.

We will establish a number of basic properties of non-negative Cauchy sequences that will be useful later.

Lemma If a is a Cauchy sequence, then either a or −a is a non-negative Cauchy sequence.

Proof Suppose, for a contradiction, that neither a nor −a is non-negative. Then there is some ε > 0 such that for each n0, there are m, n > n0 for which

an ≤ −ε and − am ≤ −ε.

Adding these inequalities gives

an − am ≤ −2 ε and |an − am | ≥ 2 ε.

This contradicts the fact that a is a Cauchy sequence. □

Lemma If a and −a are non-negative Cauchy sequences then a ≡ 0 .

Proof Suppose a and −a are non-negative. Then, by definition, for any ε > 0, there is an n0 such that

−ε < an and −ε < −an

or, equivalently,

|an| < ε

for n > n0. This is the condition for a ≡ 0. □


Lemma (Operations on Non-Negative Cauchy Sequences) If a and b are non-negative Cauchy sequences, then

(i) a + b is a non-negative Cauchy sequence, and

(ii) a.b is a non-negative Cauchy sequence.

Proof (i) By the Boundedness Lemma, there is a bound M > 0 such that

|an | ≤ M and |bn | ≤ M

for all n. Without loss of generality, we may assume M ≥ 2 .

Since a and b are non-negative, for any ε > 0 there is a stage n0 such that

−ε/M < an and −ε/M < bn

for all n > n0. Hence, for such n,

an + bn > −2ε/M ≥ −ε,

since M ≥ 2.

This proves that the sum is non-negative.

(ii) We need to show that an.bn > −ε for all n > n0, where n0 is as in (i). This breaks down into four sub-cases.

(a) an ≥ 0 and bn ≥ 0. Clearly an.bn ≥ 0 > −ε.

(b) an < 0 and bn < 0. Clearly an.bn ≥ 0 > −ε.

(c) an < 0 and bn > 0. Then, since an > −ε/M and 0 < bn ≤ M,

an.bn > (−ε/M).bn ≥ −ε.

(d) an > 0 and bn < 0. Similarly.

In each case, we see that an.bn > −ε, and this proves that the product a.b is non-negative. □

9.4.3 Algebra of Equivalence Classes of Cauchy Sequences

The Cantor reals are equivalence classes of Cauchy sequences. To define the algebraic operations on Cantor reals, we have to lift the operations on Cauchy sequences to equivalence classes of Cauchy sequences. We will consider the operations in turn, giving a particularly full explanation in the first case of addition.


Addition

Let [a] and [b] be any Cantor reals, given by representatives of the equivalence classes a, b ∈ CSeq. We wish to define the addition of the Cantor reals by

[a] + [b] = [a + b]

where a + b is defined in Section 9.4.2. However, there is a problem with the idea of defining operations on equivalence classes using representations. How do we know that different choices of representations, say a′, b′ ∈ CSeq for [a] and [b], yield the same result? The question is this:

Is the operation + on equivalence classes properly defined? That is, given a, a′, b, b′ ∈ CSeq, do we know that

[a] = [a ′] and [b] = [b ′] implies [a] + [b] = [a ′] + [b ′]?

Equivalently, by definition, this condition can be written as

[a] = [a ′] and [b] = [b ′] implies [a + b] = [a ′ + b ′],

or, simply, as the congruence condition

a ≡ a ′ and b ≡ b ′ implies a + b ≡ a ′ + b ′.

We will verify this last property.

Lemma (Congruence) For any a, a ′, b, b ′ ∈ CSeq,

a ≡ a ′ and b ≡ b ′ implies a + b ≡ a ′ + b ′.

Proof Suppose a ≡ a ′ and b ≡ b ′. For any ε > 0 , there exists a common stage N such that

|an − a′n| < ε/2 and |bn − b′n| < ε/2

for all n > N. For such n, we can deduce that

|(an + bn) − (a′n + b′n)| = |(an − a′n) + (bn − b′n)| ≤ |an − a′n| + |bn − b′n| < ε/2 + ε/2 = ε.

Hence, a + b ≡ a′ + b′. □

Thanks to the Congruence Lemma for addition on CSeq, we may now conclude that the following idea works:

Definition Given any Cantor reals [a], [b] with any representations a, b ∈ CSeq, we define their addition by


Additive Inverse

We wish to define the additive inverse of a Cantor real [a], represented by a ∈ CSeq , by

−[a] = [−a].

To ensure this definition works, we must prove that for any a, a ′ ∈ CSeq ,

[a] = [a ′] implies [−a] = [−a ′].

Equivalently, we must prove a congruence lemma for − on CSeq .

Lemma (Congruence for −) For any a, a ′ ∈ CSeq,

a ≡ a ′ implies − a ≡ −a ′.

Proof If a ≡ a′ then, by definition, for any ε > 0 there is an N such that

|an − a′n| < ε

for all n > N. Now, rearranging the expression, we have

|(−an) − (−a′n)| = |a′n − an| = |an − a′n| < ε.

Hence −a ≡ −a′. □

Definition Given any Cantor real [a] with any representation a ∈ CSeq, we define the additive inverse by

−[a] = [−a].

Multiplication

We wish to define the multiplication of two Cantor reals [a] and [b], represented by a, b ∈ CSeq respectively, by

[a].[b] = [a.b].

To ensure that this definition works, we must prove a congruence lemma for . on CSeq .

Lemma (Congruence for .) For any a, a ′, b, b ′ ∈ CSeq,

a ≡ a ′ and b ≡ b ′ implies a.b ≡ a ′.b ′.

Proof Since Cauchy sequences are bounded (Boundedness Lemma), there exists a common bound M > 0 for a and b′, i.e., for all n,

|an| ≤ M and |b′n| ≤ M.

Since a ≡ a′ and b ≡ b′, for any ε > 0 there is a common stage N such that for n > N,

|an − a′n| < ε/(2M) and |bn − b′n| < ε/(2M).


Hence, for such n,

|an.bn − a′n.b′n| = |an(bn − b′n) + b′n(an − a′n)| ≤ |an||bn − b′n| + |b′n||an − a′n| < M(ε/(2M)) + M(ε/(2M)) = ε.

Hence, a.b ≡ a′.b′. □

Definition Given any Cantor reals [a] and [b], with representations a, b ∈ CSeq, we define their multiplication by

[a].[b] = [a.b].

Multiplicative Inverse

We wish to define the multiplicative inverse of a Cantor real [a] ≠ [0], represented by a ∈ CSeq. This is not as straightforward as the previous cases. Now if [a] ≠ [0], then

a ≢ 0.

By the Corollary in Section 9.4.1, there exist some ε > 0 and an n0 such that for all n > n0,

|an | ≥ ε.

We define a new sequence c by

cn = ε, if n ≤ n0, and cn = an, if n > n0.

It is easy to check the following:

Lemma c is a Cauchy sequence and c ≡ a, and so

[c] = [a].

Proof Exercise. □

Now for all n, cn ≠ 0 and so, by the Division Lemma in Section 9.4.2, we have c−1 ∈ CSeq. We wish to define

[a]−1 = [c−1]

but must first prove a congruence lemma.

Lemma (Congruence for −1) For any a, a′ ∈ CSeq such that

a ≢ 0 and a′ ≢ 0,

and such that for all n,

an ≠ 0 and a′n ≠ 0,

we have that

a ≡ a′ implies a−1 ≡ a′−1.


Proof Since a ≢ 0 and a′ ≢ 0, by the Corollary in Section 9.4.1, there are a δ > 0 and a stage k such that

|an| ≥ δ and |a′n| ≥ δ

for all n > k. Since a ≡ a′, for any ε > 0, there is an n0 such that

|an − a′n| < δ²ε

for all n > n0. If N = max(k, n0) then for all n > N,

|an−1 − a′n−1| = |(a′n − an)/(an.a′n)| = |a′n − an| / (|an||a′n|) < δ²ε/(δ.δ) = ε.

Hence a−1 ≡ a′−1. □

Definition Given any Cantor real [a] ≠ [0], represented by a ∈ CSeq, we define the multiplicative inverse by

[a]−1 = [c−1]

where c ∈ CSeq is the transformation of a defined above.
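The patching construction of c can be sketched in code. In the proof, ε and n0 come from the Corollary in Section 9.4.1; in this illustrative Python fragment they are simply supplied as parameters (all names are ours):

```python
from fractions import Fraction

def patch(a, n0: int, eps: Fraction):
    """Replace the first n0 terms of a by eps, as in the definition of c.
    Assumes |a(n)| >= eps for all n > n0, so the result is never zero."""
    return lambda n: eps if n <= n0 else a(n)

def inverse(c):
    """Point-wise multiplicative inverse of a never-zero sequence."""
    return lambda n: 1 / c(n)

a = lambda n: Fraction(n - 3, n)        # vanishes at n = 3, tends to 1
c = patch(a, n0=6, eps=Fraction(1, 2))  # |a(n)| >= 1/2 for all n > 6
assert c(3) == Fraction(1, 2)           # the patched term, no longer zero
assert inverse(c)(8) == Fraction(8, 5)  # 1 / (5/8)
```

Since c ≡ a, patching finitely many terms does not change the Cantor real represented, which is exactly why the definition of [a]−1 is legitimate.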

We have defined +, −, . and −1 on CSeq/≡ and what remains is the ordering. We wish to define the ordering of two Cantor reals [a] and [b], represented by a, b ∈ CSeq, by

[a] ≤ [b] if, and only if, a ≤ b

or, equivalently,

[a] ≤ [b] if, and only if, b − a is a non-negative Cauchy sequence.

We need the following fact.

Lemma (Congruence for ≤) For any a, a ′, b, b ′ ∈ CSeq,

a ≡ a ′, b ≡ b ′ and a ≤ b implies a ′ ≤ b ′.

Proof By the Congruence Lemma for subtraction,

b − a ≡ b′ − a′.

By the following lemma, we know that b′ − a′ is non-negative, and hence that a′ ≤ b′. □

Lemma If a is a non-negative Cauchy sequence and a ≡ a′, then a′ is a non-negative Cauchy sequence.


Proof Suppose a is non-negative and a ≡ a′. For any ε > 0 there is a stage n0 such that

−ε/2 < an and |an − a′n| < ε/2

for all n > n0. Thus, for such n,

−ε/2 < a′n − an

and

a′n = an + (a′n − an) > −ε/2 − ε/2 = −ε. □

9.4.4 Cantor Reals are a Complete Ordered Field

A complete ordered field is an algebra with five operations named in the signature ΣOrdered Field that satisfies the fourteen axioms in the set TComplete Ordered Field . The Uniqueness Theorem in Section 9.3 says that

A, B ∈ Alg(ΣOrdered Field , TComplete Ordered Field ) ⇒ A ≅ B.

Now, in Section 9.4.2 we constructed a ΣOrdered Field -algebra

(CSeq ; 0 , 1 ; + ,− , . , −1 , ≤ )

of Cauchy sequences of rational numbers to model and implement the real numbers, guided by ideas about measuring the line. As we saw in Section 9.4.2, this algebra is not a complete ordered field, although it is a commutative ring with multiplicative identity.

Then in Section 9.4.3, we used the equivalence relation ≡ to construct the ΣOrdered Field -algebra

RCantor = (CSeq/ ≡; [0 ], [1 ]; + ,− , . , −1 , ≤ )

of equivalence classes of Cauchy sequences of rationals. This required us to prove that ≡ is a ΣOrdered Field -congruence.

What we have to prove is this:

Theorem The ΣOrdered Field -algebra RCantor is a complete ordered field, i.e.,

RCantor ∈ Alg(ΣOrdered Field ,TComplete Ordered Field ).

An immediate corollary of the Uniqueness Theorem is this:

Corollary Any complete ordered field is isomorphic with RCantor .


Proof of the Theorem

We will prove that the ΣOrdered Field -algebra RCantor satisfies

(a) the field axioms;

(b) the ordering axioms; and

(c) the completeness axiom.

We begin with (a), which follows from the following:

Lemma For all a, b, c ∈ CSeq,

Associativity of addition ([a] + [b]) + [c] = [a] + ([b] + [c])

Identity for addition [a] + [0 ] = [a]

Inverse for addition [a] + (−[a]) = [0 ]

Commutativity for addition [a] + [b] = [b] + [a]

Associativity for multiplication ([a].[b]).[c] = [a].([b].[c])

Identity for multiplication [a].[1 ] = [a]

Inverse for multiplication [a] ≠ [0] ⇒ [a].([a]−1) = [1]

Commutativity for multiplication [a].[b] = [b].[a]

Distribution [a].([b] + [c]) = [a].[b] + [a].[c]

Distinctness [0] ≠ [1]

Proof Each of these properties follows from the corresponding property of the rational numbers.

Associativity of addition We calculate

([a] + [b]) + [c] = [a + b] + [c] by definition;

= [(a + b) + c] by definition;

= [a + (b + c)] associativity of rationals addition;

= [a] + [b + c] by definition;

= [a] + ([b] + [c]) by definition.

Identity for addition We calculate

[a] + [0 ] = [a + 0 ] by definition;

= [0 + a] commutativity of rationals addition;

= [a] rationals identity.


Inverse for addition We calculate

[a] + (−[a]) = [a] + [−a] by definition;

= [a − a] by definition;

= [0 ] rationals addition.

The remaining axioms are verified similarly. We tackle the inverse for multiplication and leave the rest as exercises.

Inverse for Multiplication Given [a] ≠ [0], we choose a Cauchy sequence a′ such that

[a′] = [a] and a′n ≠ 0 for all n.

Then we can calculate

[a].([a])−1 = [a ′].([a ′])−1 replacing a by a ′;

= [a ′].[a ′−1] by definition of −1 ;

= [a ′.a ′−1] by definition of .;

= [1 ] by inverse property of the rationals.

□

The next task (b) is to show that RCantor is an ordered field. This follows from the following lemma.

Lemma For all a, b, c ∈ CSeq,

Reflexivity [a] ≤ [a]

Antisymmetry [a] ≤ [b] and [b] ≤ [a]⇒ [a] = [b]

Transitivity [a] ≤ [b] and [b] ≤ [c]⇒ [a] ≤ [c]

Total Order Either [a] ≤ [b] or [b] ≤ [a]

Addition [a] ≤ [b]⇒ [a] + [c] ≤ [b] + [c]

Multiplication [a] ≤ [b] and [0 ] ≤ [c]⇒ [a].[c] ≤ [b].[c]

Proof

Reflexivity Now [a] − [a] = [a − a] = [0], which is non-negative. Thus, [a] ≤ [a] by definition of ≤ on Cauchy sequences.

Antisymmetry Suppose [a] ≤ [b] and [b] ≤ [a]. Then b − a and a − b are both non-negative. By the lemma in Section 9.4.2,

a − b ≡ 0 and b − a ≡ 0.

In particular, [a] = [b].


Transitivity Suppose [a] ≤ [b] and [b] ≤ [c]. Then b − a and c − b are non-negative and, by the lemma in Section 9.4.2,

c − a = (b − a) + (c − b)

is non-negative. Thus, [a] ≤ [c].

Total Order Given any a, b, consider b − a. By the lemma in Section 9.4.2, either b − a is non-negative, or −(b − a) = a − b is non-negative. Hence, either

[a] ≤ [b] or [b] ≤ [a].

Addition If [a] ≤ [b] then b − a is non-negative. Now

(b + c) − (a + c) = b − a

is non-negative, and so

[a + c] ≤ [b + c], i.e., [a] + [c] ≤ [b] + [c].

Multiplication If [a] ≤ [b] and [0] ≤ [c] then b − a and c − 0 = c are non-negative. Multiplying,

(b − a).c = b.c − a.c

is non-negative, and so

[a.c] ≤ [b.c], i.e., [a].[c] ≤ [b].[c]. □

The final task (c) is to prove that the completeness axiom holds for RCantor . To do this, some preparations are necessary.

The rational numbers play an important role throughout, of course, but here we are to prove that the approximation process is complete in a precise sense. Consider how the rational numbers are embedded in RCantor .

If p/q ∈ Q, we write

(p/q) = p/q, p/q, . . . , p/q, . . .

for the constant sequence where every element is p/q. Clearly, (p/q) ∈ CSeq. Further, we write

[p/q] = {a ∈ CSeq | a ≡ (p/q)}

for the equivalence class of all Cauchy sequences equivalent to (p/q). Strictly speaking, we have a mapping

φ : Q → RCantor


defined for p/q ∈ Q by

φ(p/q) = [(p/q)]

that embeds Q inside RCantor . Recalling the properties of a ΣOrdered Field -homomorphism from Section 9.3.1, we show that:

Theorem The mapping φ : Q → RCantor is a ΣOrdered Field -homomorphism that is injective.

We leave the proof of this theorem to Exercise 10. Of particular use is the order-preserving property

p/q < p′/q′ implies φ(p/q) < φ(p′/q′)

or

p/q < p′/q′ implies [(p/q)] < [(p′/q′)].

Lemma For any [a] ∈ RCantor , there exist c, d ∈ Z such that

[(c)] < [a] < [(d)].

Proof Since a is a Cauchy sequence, choosing ε = 1, there exists N such that

|an − am| < 1

for all n, m > N. Let k > N and c, d ∈ Z be such that

c ≤ ak − 1 and d ≥ ak + 1.

Then for all n > k,

c < an < d.

It follows that

(an − c) and (d − an)

are non-negative, and hence,

[(c)] < [a] and [a] < [(d)].

□

Lemma Let c, d ∈ CSeq and suppose c ≡ d. For any a, b ∈ CSeq, if

[(cn)] ≤ [a] ≤ [b] ≤ [(dn)]

for all n, then a ≡ b and [a] = [b].

Proof Since c ≡ d, for any ε > 0, there is some n0 such that

|dn0 − cn0| < ε/3.

By hypothesis, [(cn0)] ≤ [(dn0)], so cn0 ≤ dn0 and

0 ≤ dn0 − cn0 < ε/3.


By hypothesis, [(cn0)] ≤ [a] and [b] ≤ [(dn0)], and thus,

(an − cn0) and (dn0 − bn)

are non-negative sequences. Therefore, there exists some n1 such that for all n > n1,

−ε/3 ≤ an − cn0 and −ε/3 ≤ dn0 − bn.

Rearranging,

−an ≤ ε/3 − cn0 and bn ≤ ε/3 + dn0

and adding,

bn − an ≤ 2ε/3 + (dn0 − cn0) < 2ε/3 + ε/3 = ε.

Finally, we note from the hypothesis that

[a] ≤ [b]

that the sequence (bn − an) is non-negative. This means there is some n2 such that for all n > n2,

−ε < bn − an.

Setting N = max(n1, n2) we have that

|an − bn| < ε

for all n > N. This means that a ≡ b. □

Lemma (Completeness) Any non-empty subset B of RCantor that is bounded below has a greatest lower bound.

Proof Given any lower bound [a] of B , we construct a Cauchy sequence

c = c0 , c1 , c2 , . . . , ct , . . .

of rational numbers such that [c] is a greatest lower bound of B . We define c inductively on t .

Basis t = 0. The first element c0 is defined as follows. Suppose B is bounded below by [a]. Then, for all [b] ∈ B, [a] ≤ [b]. By the lemma above, there exist integers m and n such that

[(m)] ≤ [a] ≤ [b] ≤ [(n)].

In particular, m determines a class that is a lower bound for B, and n determines a class that is not a lower bound. Therefore, there must exist a largest integer c0 such that m ≤ c0 < n and

[(c0)] is a lower bound for B but [(c0 + 1)] is not a lower bound for B.

This c0 is the first element of c.


Induction Step From the element ct, we define ct+1 as follows. Suppose ct is a rational number such that

[(ct)] is a lower bound for B but [(ct + 1/10^t)] is not a lower bound for B.

Consider the expression

ct + n/10^(t+1).

We know this determines a lower bound for n = 0 and that it does not determine a lower bound for n = 10. Therefore, there exists a largest integer nt with 0 ≤ nt < 10 such that

ct+1 = ct + nt/10^(t+1)

determines a lower bound. This defines the element ct+1 from ct.
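The inductive construction of c amounts to computing the greatest lower bound digit by digit. Here is an illustrative Python sketch; in the proof, the lower-bound test is decided against the set B, whereas here it is passed in as a predicate on rationals (the function and parameter names are ours):

```python
from fractions import Fraction

def glb_digits(is_lower_bound, m: int, n: int, stages: int) -> Fraction:
    """Follow the proof: c0 is the largest integer in [m, n) that determines a
    lower bound; then c_{t+1} = c_t + nt/10^(t+1), where nt is the largest
    digit in 0..9 that still determines a lower bound."""
    c = Fraction(m)
    for k in range(m, n):                  # basis: largest integer lower bound
        if is_lower_bound(Fraction(k)):
            c = Fraction(k)
    for t in range(1, stages + 1):         # induction step: one digit per stage
        step = Fraction(1, 10 ** t)
        for d in range(9, -1, -1):         # largest digit keeping a lower bound
            if is_lower_bound(c + d * step):
                c = c + d * step
                break
    return c

# Example: the glb of {x in Q : x > 0 and x^2 >= 2} is sqrt(2); a rational q
# is a lower bound iff q <= 0 or q^2 <= 2.
approx = glb_digits(lambda q: q <= 0 or q * q <= 2, m=-4, n=4, stages=4)
assert approx == Fraction(14142, 10 ** 4)
```

Four stages already produce the decimal approximation 1.4142, one digit per stage, exactly as the proof's sequence c0, c1, c2, . . . does.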

Lemma c = c0, c1, c2, . . . ∈ CSeq.

Proof Given any ε > 0, choose some t such that

1/10^t < ε.

Now, if t < m < n, then we may show, by induction on the formula defining ct, that

ct ≤ cm ≤ cn < ct + 1/10^t.

Hence, for all m, n > t, we have

|cm − cn| < 1/10^t < ε

and c is a Cauchy sequence. □

Consider [c]. Now for each t, ct ≤ ct+1. For any choice of k, the constant sequence (ck) satisfies

[(ck)] ≤ [c]   (1)

because

[c] − [(ck)] = [c − (ck)] = [(ct − ck)]

and the sequence (ct − ck) is non-negative. Now, for any t > k we have, by definition,

ct < ck + 1/10^k

and, hence,

[c] ≤ [(ck + 1/10^k)].   (2)

Combining inequalities (1) and (2) gives, for all k,

[(ck)] ≤ [c] ≤ [(ck + 1/10^k)].   (3)


Claim [c] is a lower bound of B, i.e., for all [b] ∈ B,

[c] ≤ [b].

Proof of Claim Suppose, for a contradiction, that there is [b] ∈ B such that [b] < [c]. Hence, using (3), we have for all k,

[(ck)] ≤ [b] < [c] ≤ [(ck + 1/10^k)].

However, it is easy to see that

(ck) ≡ (ck + 1/10^k).

By the lemma above, [b] = [c], which is a contradiction.

Claim [c] is the greatest lower bound of B.

Proof of Claim Suppose, for a contradiction, that there exists a lower bound [d] for B such that

[c] < [d].

Since for every k, the constant sequence class

[(ck + 1/10^k)]

is not a lower bound, we have from (3) above that for all k,

[(ck)] ≤ [c] < [d] ≤ [(ck + 1/10^k)].

By Lemma..., this again implies [c] = [d], which is a contradiction. Hence, [c] is the greatest lower bound. □
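The digit-by-digit construction above can be sketched in code. The following Python fragment is our own illustration, not from the text: it runs the construction for the concrete set B = { x : x·x ≥ 2 }, whose greatest lower bound is √2, and the predicate is_lower_bound stands in for the abstract test against B.

```python
from fractions import Fraction

def is_lower_bound(q: Fraction) -> bool:
    # q is a lower bound of B iff q <= sqrt(2), i.e. q <= 0 or q*q <= 2.
    return q <= 0 or q * q <= 2

def c(t: int) -> Fraction:
    # Base case: c0 is the largest integer that is a lower bound.
    ct = Fraction(0)
    while is_lower_bound(ct + 1):
        ct += 1
    # Induction step: add the largest digit n/10^(s+1) that keeps a lower bound.
    for s in range(t):
        step = Fraction(1, 10 ** (s + 1))
        n = 0
        while n < 10 and is_lower_bound(ct + (n + 1) * step):
            n += 1
        ct += n * step
    return ct

print(c(5))  # 141421/100000, i.e. the decimal lower bound 1.41421
```

Each c(t) is a rational lower bound within 1/10^t of the greatest lower bound, exactly as in the induction step above.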

9.5 Computable Real Numbers

In each method for constructing the real number representation of the line, a point is represented by an infinite object; for example, the number √2 is represented by two infinite sets of rationals (Dedekind), or by one of infinitely many infinite sequences of rationals (Cantor).

How can we compute with these infinite objects that represent real numbers?

To give the representation of a real number, such as √2, we need an algorithm. To add two real numbers a and b we need an algorithm that inputs algorithms for the representations of a and b and returns an algorithm for the representation of a + b. A real number is a computable real number if there is an algorithm that allows us to compute a representation of the number, i.e., a rational number approximation to the real number to any given degree of accuracy. This idea must be made exact by applying it to a definition of a representation of real numbers that spells out the sense of approximation. Consider Dedekind and Cantor's ways of making the idea of approximation exact.


Definition (Computable numbers following Dedekind) A real number is computable if there exists a computable Dedekind cut that represents it. This means that there exists a pair (A1, A2) of sets of rational numbers and a decision algorithm that determines, for any p/q ∈ Q, whether p/q ∈ A1 or p/q ∈ A2.

Definition (Computable numbers following Cantor) A real number is computable if there exists a computable Cauchy sequence that represents it. This means:

(i) there is an algorithm that computes a function a : N → Q that enumerates a sequence of rationals a1, a2, . . . , an, . . ., that is, for n ∈ N, a(n) = an; and

(ii) there is an algorithm that computes a function N : Q → N such that for any rational number ε > 0, the natural number N(ε) has the property that

for all n, m > N(ε), we have |an − am| < ε.

A number of questions arise concerning the algebraic properties of these computable numbers, for example:

If a and b are computable real numbers, then are a + b and a.b computable real numbers?

The answer is yes, and in fact the familiar simple functions applied to computable reals return computable real numbers:

Theorem The set of computable real numbers forms an ordered subfield of the ordered field of real numbers.

The field of computable real numbers has many important properties but it is not a complete ordered field. Of primary interest are questions about computing with computable numbers, such as:

Are there algorithms that transform the algorithms that enumerate approximations of a and b into algorithms that enumerate approximations of a + b, a.b, −a, and a^(-1)?

This leads to questions about the computability of the set of reals. The field of reals is not a computable algebra in the sense of Section 7.6, though it has algorithms for its operations. The whole subject demands a careful study that will look at the computability of data representations and the calculus.

Most of the real numbers we know and use come from solving equations (e.g., the algebraic numbers) and evaluating equationally defined sequences (e.g., e and π), and are computable.

However, most real numbers are non-computable. This is because the set of computable reals is infinite and countable, since the set of algorithms is countable. However, the set of reals is infinite and not countable, by Cantor's diagonal method. The infiniteness of the set of reals is strictly bigger than the infiniteness of the set of computable reals. The fact that the set of reals is uncountable implies there does not exist a method of making finite representations or codings for all real numbers. Real numbers are inherently infinite objects.


9.6 Representations of Real Numbers and Practical Computation

Our requirements analysis for the data type of real numbers focused on the problem of measuring the line. We have developed a lot of theory which has given us

1. Insight into the fundamental roles of measurement, approximations and the rational numbers.

2. An axiomatic specification of the data type of real numbers, i.e.,

(ΣOrdered Field, TComplete Ordered Field).

3. A proof that all representations or implementations of the axiomatic specification are equivalent, i.e., the uniqueness of algebras satisfying the axioms up to isomorphism.

4. A particular representation based on Cauchy sequences of rational numbers.

5. Questions about the scope and limits of computing with real numbers.

We will conclude this chapter by addressing rather briefly the following questions:

Can we perform practical computations with Cauchy representations of real numbers?

and:

What is the relationship between Cauchy representations and floating point representations?

9.6.1 Practical Computation

Now a Cantor real number is represented by two functions

(a, M)

in which

a : N → Q

generates the Cauchy sequence

a(1), a(2), . . .

of rational numbers and

M : Q → N

defines the accuracy of the sequence by means of the Cauchy property

(∀ε)(∀n)(∀m)[n > M(ε) and m > M(ε) ⇒ |a(n) − a(m)| < ε].

The function M is called the modulus of convergence for the sequence a.


Given a Cantor real number with representation (a, M), we can define an approximation function

approx : Q → Q

such that for any error margin ε ∈ Q,

approx(ε) = some rational number within ε of the real number represented by (a, M)

by

approx(ε) = a(M(ε)).

More formally, in terms of equivalence classes of Cauchy sequences, if [approx(ε)] and [ε] are the classes of the constant sequences approx(ε) and ε, then

[a] − [ε] < [approx(ε)] < [a] + [ε].
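The pair (a, M) and the composed approximation map can be sketched in Python. The functions a, M and approx below are our own illustrative choices, not the book's definitions: they represent √2 by its decimal truncations from below.

```python
from fractions import Fraction
from math import isqrt

def a(n: int) -> Fraction:
    # n-th decimal truncation of sqrt(2) from below.
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def M(eps: Fraction) -> int:
    # Smallest n with 10^(-n) < eps; then |a(p) - a(q)| < eps for p, q > M(eps).
    n = 0
    while Fraction(1, 10 ** n) >= eps:
        n += 1
    return n

def approx(eps: Fraction) -> Fraction:
    # Sequential composition of M and a: a rational within eps of the real.
    return a(M(eps))

q = approx(Fraction(1, 1000))
assert abs(q * q - 2) < Fraction(1, 100)  # q is a good approximation of sqrt(2)
```

Exact rational arithmetic (`Fraction`) is used throughout, so the error bounds hold exactly, with none of the rounding of machine floating point.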

Definition (Fast Cauchy Sequences) A Cauchy sequence a is said to be a fast converging sequence if

(∀n)(∀m)[m > n ⇒ |a(n) − a(m)| < 2^(-n)].

This means that the rational a(n) approximates the real to within 2^(-n). More formally, we have that the modulus function has the property

M(2^(-n)) = n

and

approx(2^(-n)) = a(n).

Theorem For every Cauchy sequence a, there is a fast Cauchy sequence b that represents the same Cantor real number, i.e.,

[a] = [b].

A computable Cantor real number is represented by two algorithms

(αa, αM)

for computing these functions a and M, where αa computes the sequence a and αM computes the modulus of convergence M. By sequentially composing these algorithms αa and αM we get an algorithm

αapprox = αM ; αa

for computing approx. Furthermore, there is an algorithm that transforms the algorithms αa, αM and αapprox into algorithms for a fast Cauchy sequence that represents the same real number.

To represent computable Cantor real numbers and program their operations in practice, however, fast Cauchy sequences are not enough. Space and time resources demand several refinements. For example, the operations on rational numbers quickly lead to rationals p/q with large numerators p and denominators q. The space needed to store a rational number p/q is

log2(|p|) + log2(|q|) + c

where c is a constant. Indeed, some iterative processes can yield an exponential growth in the lengths of p and q. Squaring p/q to get p^2/q^2 can double the space required to

2(log2(|p|) + log2(|q|)) + c.

This is a problem that can be ameliorated by using dyadic rationals.
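The growth in the lengths of p and q is easy to observe. A small illustration of our own: repeatedly squaring a rational roughly doubles the number of digits in its numerator and denominator at each step.

```python
from fractions import Fraction

# Repeated squaring of 3/7: record the total digit count of numerator
# and denominator after each squaring.
x = Fraction(3, 7)
sizes = []
for _ in range(5):
    sizes.append(len(str(x.numerator)) + len(str(x.denominator)))
    x = x * x

print(sizes)  # [2, 3, 6, 11, 22] -- the digit count roughly doubles each time
```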


Definition (Dyadic rationals) A rational number r ∈ Q is a dyadic rational if it is of the form

r = b/2^k

where b ∈ Z and k ∈ Z. Let QDyadic be the set of all dyadic rationals.

A dyadic number is a rational number with a finite binary expansion. The space needed to store a dyadic rational b.2^(-k) is

log2(|b|) + log2(|k|) + c

where c is a constant. In practice, k will often be smaller than b. A first refinement is to restrict the representations to Cauchy sequences of dyadic rationals. To accomplish this, we need the following facts.

Theorem The set QDyadic of dyadic rationals is closed under

+, − and .

and hence forms a sub-ring of the commutative ring Q of rational numbers. Furthermore, the set QDyadic is a countably infinite set that is dense in Q, i.e., for any r1, r2 ∈ Q with r1 < r2, there is b.2^(-k) ∈ QDyadic such that

r1 < b/2^k < r2.

It is essential that b and k are not bounded for these properties. Clearly, QDyadic is not closed under multiplicative inverse since

3 = 3/2^0 ∈ QDyadic

but

1/3 ∉ QDyadic.

Another refinement is to restrict further the set of Cauchy sequences to be used, e.g., by adding extra error information.
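A sketch of these closure properties in Python, with a dyadic rational represented as a pair (b, k) standing for b/2^k with k ≥ 0; the representation and the helper names are our own.

```python
from fractions import Fraction

def add(x, y):
    # (b1/2^k1) + (b2/2^k2), brought to the common denominator 2^max(k1,k2).
    (b1, k1), (b2, k2) = x, y
    k = max(k1, k2)
    return (b1 * 2 ** (k - k1) + b2 * 2 ** (k - k2), k)

def mul(x, y):
    # (b1/2^k1) * (b2/2^k2) = b1*b2 / 2^(k1+k2).
    (b1, k1), (b2, k2) = x, y
    return (b1 * b2, k1 + k2)

def to_fraction(x):
    b, k = x
    return Fraction(b, 2 ** k)

x, y = (3, 1), (5, 3)  # the dyadic rationals 3/2 and 5/8
assert to_fraction(add(x, y)) == Fraction(3, 2) + Fraction(5, 8)
assert to_fraction(mul(x, y)) == Fraction(3, 2) * Fraction(5, 8)
# No closure under inverse: 1/3 is not of the form b/2^k for any integers b, k.
```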

9.6.2 Floating Point Representations

Floating point representations of real numbers are designed to use standard memory structures of computers in a straightforward way. In practice, there are many variations and refinements, but one can think of them as a form of dyadic representation.

The connection with floating point representations is made by

(i) placing bounds on b and k; and

(ii) replacing an infinite sequence of rationals by a single rational

b/2^k.


The floating point representation based on b and k defines a subset Float(b, k) of the real numbers. The subset has two problems:

(i) the distribution of the real numbers leaves serious gaps; and

(ii) the field axioms fail to hold for the basic operations.

The fact that Float(b, k) is not a subfield of R is particularly disappointing: the user's ideas and expectations of real number algebra collapse. Indeed, creating the algebra of Float(b, k) is a mathematical challenge that seems to offer little mathematical reward, but considerable technological reward.
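The failure of the field axioms is easy to witness in any language with IEEE 754 doubles; this small demonstration of our own uses Python floats, where even associativity of addition is lost.

```python
# With machine floating point, (a + b) + c and a + (b + c) can differ,
# so Float(b, k) does not satisfy the field axioms.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
assert left != right       # associativity of + fails
assert 0.1 + 0.2 != 0.3    # even simple sums are not exact
print(left, right)
```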


Exercises for Chapter 9

1. Prove the following. Let m, n ∈ N and suppose m ≠ k^n for any k ∈ N. Then the n-th root n√m is an irrational number.

2. Prove that for all rational numbers x , y ∈ Q,

(a) |x + y| = |y + x|;
(b) |x − y| = |y − x|;
(c) |x.y| = |y.x|;
(d) |x.y| = |−x.y| = |x.−y| = |−x.−y|;
(e) |x^2| = |x|^2 = x^2;
(f) −|x| ≤ x ≤ |x|;
(g) |x + y| ≤ |x| + |y|;
(h) |x − y| ≤ |x| + |y|;
(i) ||x| − |y|| ≤ |x − y|.

3. Prove that the set Q of all rational numbers is countably infinite.

4. Show that for any rational numbers x, y ∈ Q with x < y,

x < (1/2)(x + y) < y.

Hence, deduce that for any x, y ∈ Q, if x ≠ y then there are infinitely many rational numbers between x and y. Deduce that

{r ∈ Q | x < r < y}

is countably infinite.

5. Give a geometric construction that uses an (unmarked) ruler and compass that dividesany line segment into n equal parts, for any n ∈ N and n ≥ 1 .

6. Which of the following statements are true in any field? If the statement is true, deduce it from the axioms. If it is false, give an example of a field in which it fails.

(a) 1.0 = 0;
(b) (∀x)(∀y)(∀z)[(x + y).z = x.z + y.z];
(c) (∀x)(∀y)[(x + y)^2 = x^2 + (1 + 1).x.y + y^2];
(d) (∀x)(∀y)[(x + y).(x − y) = x^2 − y^2];
(e) (∀x)(∃y)[x = y^2];
(f) (∀x)(∀y)[x.y = 0 ⇔ x = 0 or y = 0];
(g) all fields are infinite.


7. Show that the set Seq of all sequences of rational numbers

Seq = {a = (a1, a2, . . .) | an ∈ Q, n ≥ 1}

forms a commutative ring with multiplicative identity under the point-wise operations. Show that Seq is not a field.

8. Show that the set CSeq of all Cauchy sequences of rational numbers forms a commutative ring with multiplicative identity under the point-wise operations. Show that CSeq is not a field.

9. Define a map φ : CSeq → CSeq/≡ by

φ(a) = [a]

for a ∈ CSeq. Show φ is a ring homomorphism that is surjective.

10. Prove that the mapping

φ : Q → RCantor

is an injective field homomorphism.

11. A sequence c = c1, c2, . . . is a fast Cauchy sequence if, for all m, n with m < n,

|cn − cm| < 2^(-m).

Show that every fast Cauchy sequence is a Cauchy sequence.

Show that for every Cauchy sequence a, there exists a fast Cauchy sequence c such that

c ≡ a.


Part II

Syntax



Introduction

There are countless types of written texts that are defined by special notations and rules that govern their composition. The symbols, notations, layout and other graphic conventions, and the rules that characterise a category of text together form the syntax of that category of text.

English, like most natural languages, has many examples of particular types of text: novels, essays, poems, prayers, letters, postal addresses, shopping lists, forms, laws, contracts, last wills and testaments, examination papers, school reports, etc. It is rarely easy to define such categories of text in natural languages. Often, it is easy to give a set of basic rules and refine them by contemplating examples. However, to give a list of rules that completely defines a category of natural language texts is difficult, if not impossible.

Texts are not confined to natural languages. In artificial languages, classes of texts that are of great interest are written in notation systems. Among the oldest and most important are the notation systems of mathematics, in which are written mathematical formulae, deductions and proofs. Other familiar examples of notation systems are chemical formulae and musical notation. These texts — along with poems and postal addresses — are two dimensional.

Languages for programming also have many categories. Examples are programs, procedures, data types, objects, interfaces, specifications, proofs, files, hyper-texts, identifiers, names, and addresses. There are many tools for processing these texts, such as compilers, interpreters, word processors, communication protocols, editors, and browsers. Amazingly, it is possible to give complete sets of rules for the description of artificial languages, and often it is not terribly difficult.

In Part II we will study the elements of the theory of syntax with an emphasis on applications to programming languages. Among the topics we meet are: the method of defining concrete syntax using regular and context-free grammars, the limitations of purely regular and context-free definitions, kernel languages and their extensions, and the use of algebraic methods to specify abstract syntax.

We will begin with the problem of defining precisely the visible form of texts. In Chapter 10, we introduce grammars, derivations and formal languages, and consider some simple examples of syntactic specifications such as addresses (both mail and internet). In Chapter 11, we look at the specification of interface definition languages and imperative programming languages. Thus, with a great stack of interesting examples of grammars and languages, we embark on the theory of grammars. In Chapter 12, we define and study the regular grammars, which have extremely simple rules. In Chapter 13, we explain the idea of a finite automaton and its connection with regular grammars. Regular grammars are too simple to define a whole programming language. In Chapter 14, we develop some of the mathematical theory of context-free grammars and apply results, such as the Pumping Lemma, to establish an adequate theoretical framework for the syntax of programming languages. Finally, in Chapter 15, we introduce a more abstract approach to syntax based on term algebras.


Chapter 10

Syntax and Grammars

Languages are systems in which we express ourselves and communicate with one another. The languages we speak are called natural languages and the majority have a written form. In computing we also communicate with machines. To instruct a machine we must express our intentions with great precision. Since the early days of computing, thousands of written languages have been developed to do just this. The programming languages in which we write programs to control machines are defined formally by strict rules.

A program is a text. It is made from different kinds of notations and linguistic notions — such as types, identifiers, variables, expressions, constructs, procedures and so on. The notations and notions obey rules that specify how to form texts that are valid programs in the language. In an analogy with natural languages, we say that the notations, notions and rules form the grammar of the language. The grammar and the texts it defines together form the syntax of the programming language.

There are other kinds of languages of interest in computer science, in addition to programming languages. For example, there are languages for writing formal specifications, describing system architectures, and reasoning about components, in both software and hardware. Logical languages, like the language of first order logic, are used extensively in computing. In fact, logical languages were the first languages to be defined formally with a syntax and a semantics. They were invented to analyse the foundations of mathematical reasoning, to answer questions such as: “What are mathematical statements, calculations and proofs?” Our theoretical understanding of programming languages originates with concepts developed in logic.

There are also languages consisting of special types of texts with restricted uses that must be precisely defined. For example, addresses are small pieces of text that are always governed by strict formation rules.

We will study syntax in order to understand how to specify programming languages, and other languages of interest. Our first theory of syntax views a program as a

string of symbols

and a programming language as a set of such strings. It is based on the mathematical ideas of a

formal language,

which is a set of strings made from an arbitrary alphabet, and a


grammar,

which is a set of rules for forming strings. The theory has proven to be of immense practical value.

In this chapter and the next, we will develop just enough of the theory of grammars to understand the basic ideas involved in the practical definition of the syntax of programming languages. Specifically, in this chapter, in Section 10.1, we introduce alphabets and formal languages and show just how widely applicable they are. In Section 10.2, we define grammars and explain how to use their rules to make formal derivations of strings that generate formal languages. In Section 10.3, we introduce some practical techniques used to define the syntax of languages, including a modular method of writing grammars, and the BNF notation. Programming languages are neither small nor simple, and the modular grammars allow us to introduce syntactic categories in stages, while the BNF provides a notation that is easy to read and remember. We illustrate the process of formally defining languages with these tools by explaining two case studies, both of which are examples of addresses. In Section 10.4, we define, in some detail, the following:

• postal addresses; and

• World Wide Web addresses.

The next chapter is devoted to examples of the formal definition of programming languages.

10.1 Alphabets, Strings and Languages

We begin by collecting lots of examples of languages with different alphabets. As usual, we define some general concepts first.

10.1.1 Formal Languages

Definition (Alphabet) An alphabet is a finite non-empty set T. We shall consider the elements of T to be symbols.

Simple familiar examples are the alphabet

TDigit = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

of decimal digits and the lower-case alphabet

TEnglish Lowercase Alphabet = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}

of the English language.

Definition (String) Given any symbols t1, t2, . . . , tn ∈ T, we can form the string, or word,

w = t1 t2 · · · tn

over the alphabet T. The set of all strings or words over the alphabet T is

T* = {t1 t2 · · · tn | n ≥ 0, t1, t2, . . . , tn ∈ T}.


For example, the numbers

1984 and 2000

are strings or words over the alphabet TDigit of decimal digits, and

forwards and sdrawkcab

are strings or words over the lower-case English alphabet TEnglish Lowercase Alphabet.

Length is a function

| | : T* → N.

The length |w| of a string w ∈ T* is the number of symbols from the alphabet that it contains. Thus, if t1, . . . , tn ∈ T and w = t1 · · · tn then |w| = n. For example,

|2000| = 4 and |forwards| = 8.

The empty string uses no symbols from T and is represented by the symbol ε. The length |ε| = 0. In the definition of T*, the empty string is t1 · · · tn for n = 0.

We refer to the set of all non-empty strings over an alphabet T as T+. Thus,

T+ = T* \ {ε} = {t1 t2 · · · tn | n ≥ 1, t1, t2, . . . , tn ∈ T}.

Definition (Concatenation) Given two strings

u = u1 · · · um and v = v1 · · · vn

over T, the concatenation uv of u and v is the string

uv = u1 · · · um v1 · · · vn.

For example, concatenating the numbers 1984 and 2000 gives the string

19842000.

Concatenation is a function

. : T* × T* → T*.

The length function satisfies

|uv| = |u| + |v|.

We write w^n for the string made by concatenating n copies of w, i.e.,

w^n = w · · · w   (n times).

For example:

(forwards)^3 = forwardsforwardsforwards.
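These string operations correspond directly to built-in operations on Python strings; a small illustration of our own:

```python
# Length, concatenation, powers and the empty string, as defined above.
u, v = "forwards", "sdrawkcab"
assert len(u) == 8                       # |forwards| = 8
assert len(u + v) == len(u) + len(v)     # |uv| = |u| + |v|
assert u + v == "forwardssdrawkcab"      # concatenation uv
assert "forwards" * 3 == "forwardsforwardsforwards"   # w^3
assert len("") == 0                      # the empty string epsilon
```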

A language is made of strings over an alphabet. How can we define exactly those strings that are in the language? We can define a formal language L by defining a dictionary that lists all the strings of L. Alternatively, we can describe a formal language L by describing the general form or pattern of the strings that are in L. Mathematically, we can model both of these methods of describing a formal language with the idea of forming a set of strings.


Definition (Formal Language) A formal language L over an alphabet T is simply some subset

L ⊆ T*

of the set T* of all possible strings over T. For brevity, we usually say language for formal language.

With these general ideas defined, let us look at many example languages.

10.1.2 Simple Examples

Let us start with a very simple alphabet Tab consisting of just two letters:

Tab = {a, b}.

Here is an enumeration

T*ab = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, . . .}

of the set of all possible strings over Tab. Notice that there are 2^(n+1) − 1 words of length at most n. (Exercise.)

At first sight, this alphabet may seem to be very limited. However, if we had chosen the symbols 0 and 1 instead of a and b, then it is obvious strings of two letters have unlimited use since they include all binary numbers.

Let us consider just a few examples of languages over Tab:

L = ∅                           L = {a, b}*
L = {ε}                         L = {a}
L = {a, b}                      L = {abba}
L = {a^n | n is even}           L = {a^p | p is prime}
L = {a^n b^n | n ≥ 1}           L = {a^n b^(n+1) | n ≥ 1}
L = {(ab)^n | n ≥ 0}            L = {(ba)^n | n ≥ 0}

There are many languages over Tab which have useful practical properties. For instance, if we considered the symbols a and b to represent some form of left and right parentheses, then this leads us to examples of bracketing methods, all of the form {a^n b^n | n ≥ 1}:

Alphabet                                        Bracketing Language
T() = {(, )}                                    L() = {(^n )^n | n ≥ 1}
Tbegin end = {begin, end}                       Lbegin end = {begin^n end^n | n ≥ 1}
Tplease thank-you = {please, thank-you}         Lplease thank-you = {please^n thank-you^n | n ≥ 1}
Thello goodbye = {hello, goodbye}               Lhello goodbye = {hello^n goodbye^n | n ≥ 1}
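A membership test for the bracketing language {a^n b^n | n ≥ 1} can be sketched directly; the function name is our own:

```python
# Decide whether a string over {a, b} lies in { a^n b^n | n >= 1 }.
def in_anbn(w: str) -> bool:
    n = len(w) // 2
    return len(w) >= 2 and len(w) % 2 == 0 and w == "a" * n + "b" * n

assert in_anbn("ab") and in_anbn("aaabbb")
assert not in_anbn("") and not in_anbn("abab") and not in_anbn("ba")
```

As we shall see later, no finite automaton can decide this language, although the short program above can: this is one of the limitations of regular definitions.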

10.1.3 Natural Language Examples

The concept of a formal language was originally developed by Noam Chomsky in 1956 (Chomsky [1956]) to model aspects of natural languages, though strings had been studied mathematically much earlier. Here, we look at some examples of natural languages.


English words If we take the alphabet

TEnglish Lowercase Alphabet = {a, b, . . . , z},

then we can consider the language

LEnglish Words = {w ∈ T*English Lowercase Alphabet | w is an English word}

consisting of all English words over the alphabet TEnglish Lowercase Alphabet. So,

island ∈ LEnglish Words but ynys ∉ LEnglish Words.

Consonant words If we remove the vowels from the alphabet TEnglish Lowercase Alphabet, to give

TEnglish Lowercase Consonants = {a, b, . . . , z} − {a, e, i, o, u},

and consider the language

LEnglish Consonant Words = {w ∈ T*English Lowercase Consonants | w is an English word},

then we have a much smaller language. It is non-empty though; for example,

rhythm ∈ LEnglish Consonant Words.

Welsh The lower-case alphabet for Welsh is

TWelsh Lowercase Alphabet = {a, b, c, ch, d, dd, e, f, ff, g, ng, h, i, l, ll, m, n, o, p, ph, r, rh, s, t, th, u, w, y}.

Notice it does not have j, k, q, v, x or z. Considering the language

LWelsh Words = {w ∈ T*Welsh Lowercase Alphabet | w is a Welsh word},

then, we find that the word

ynys ∈ LWelsh Words,

which means “island” in English. If we also add the space symbol ␣ to the alphabet of Welsh letters, we can form the language

LWelsh Phrases = {w ∈ (TWelsh Lowercase Alphabet ∪ {␣})* | w is a Welsh phrase}

of Welsh phrases. Then the string

gwyddor cyfrifiadur ∈ LWelsh Phrases,

which means “computer science” in English.


Lower Case   Upper Case   Name
α            Α            Alpha
β            Β            Beta
γ            Γ            Gamma
δ            Δ            Delta
ε            Ε            Epsilon
ζ            Ζ            Zeta
η            Η            Eta
θ            Θ            Theta
ι            Ι            Iota
κ            Κ            Kappa
λ            Λ            Lambda
μ            Μ            Mu
ν            Ν            Nu
ξ            Ξ            Xi
ο            Ο            Omicron
π            Π            Pi
ρ            Ρ            Rho
σ or ς       Σ            Sigma
τ            Τ            Tau
υ            Υ            Upsilon
φ            Φ            Phi
χ            Χ            Khi/Chi
ψ            Ψ            Psi
ω            Ω            Omega

Figure 10.1: The upper-case (majuscule) alphabet of Attic Greek of the Classical Period (5th–4th century BC). Interestingly, the earliest lower-case (minuscule) alphabet of Greek is to be found in the Byzantine Period (8th century AD).

Ancient Greek The lower-case alphabet for ancient Greek is

TAncient Greek Lower Case Alphabet = {α, β, γ, δ, ε, ζ, η, θ, ι, κ, λ, μ, ν, ξ, ο, π, ρ, σ, ς, τ, υ, φ, χ, ψ, ω}

and the upper-case alphabet for ancient Greek is

TAncient Greek Upper Case Alphabet = {Α, Β, Γ, Δ, Ε, Ζ, Η, Θ, Ι, Κ, Λ, Μ, Ν, Ξ, Ο, Π, Ρ, Σ, Τ, Υ, Φ, Χ, Ψ, Ω}.

(See Figure 10.1.) If we define the language

LAncient Greek Upper Case Phrases = {w ∈ (TAncient Greek Upper Case Alphabet ∪ {␣})* | w is a phrase of ancient Greek},

then the phrase

ΓΝΩΘΙ ΣΑΥΤΟΝ ∈ LAncient Greek Upper Case Phrases,

which was one of the inscriptions around the door of the Temple of Apollo at Delphi. It means “know thyself”. It is written in upper-case as the lower-case Greek alphabet developed much later. The equivalent lower-case form of this phrase is:

γνῶθι σαυτόν ∈ LAncient Greek Lower Case Phrases.

Speech Processing Restrictions of natural languages are also used for special announcements such as weather forecasts, train announcements at stations, telephone enquiries, etc. For example, consider the alphabet

TShipping = {Dover, Fisher, White, Portland, Plymouth, . . . ,
North, South, East, West, Northerly, Southerly, Easterly, Westerly,
Northeast, Northwest, Southeast, Southwest,
Northeasterly, Northwesterly, Southeasterly, Southwesterly,
calm, light, fresh, strong, violent, gale, storm, force,
increasing, decreasing, variable, veering, backing, cyclonic,
low, lows, high, complex, rising, falling, moving, change,
good, poor, moderate, haze, mist, fog, patches, smoke,
precipitation, rain, showers, squally, wintry, sleet, snow, icing,
slow, slowly, quickly, steadily, spreading, little,
one, two, . . . , thousand, metre, metres, mile, miles, sight, centre,
today, yesterday, tomorrow, now, then, hours, expected, soon, imminent,
becoming, occasionally, mainly, possibly, otherwise, times,
weather, forecast, shipping, issued, reports, warnings, coastal, stations,
met., office, general, synopsis, area, forecasts,
Monday, Tuesday, . . . , Sunday, January, February, . . . , December,
with, within, in, and, or, but, for, at, to, of, the, ,, .}.

It is easy to imagine that this could be used to give the language

LShipping Weather Forecasts = {w ∈ T*Shipping | w is a “legal” shipping weather forecast}

of shipping weather forecasts issued by the UK meteorological office.

10.1.4 Languages of Addresses

We examine some examples of languages that we can form by considering the concept of anaddress.

Postal Addresses Let

TPostal Address = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z} ∪ {0, 1, . . . , 9, –, ’, ”, “, ␣, !, .}.


Let

LPostal Address = {w ∈ T*Postal Address | w is a postal address in the United Kingdom}.

Then both

Department of Computer Science
University of Wales Swansea
Singleton Park
SWANSEA
SA2 8PP

and, since Welsh is an official language in the UK,

Adran Gwyddor Cyfrifiadur
Prifysgol Cymru Abertawe
Parc Singleton
ABERTAWE
SA2 8PP

are strings in LPostal Address.

URLs Continuing the idea of addresses, uniform resource locators (URLs) of the internet provide some interesting examples of languages.

Let

T = {a, b, . . . , z, A, B, . . . , Z, 0, 1, . . . , 9, -, _, /, ., :}.

Let

L = {w ∈ T* | w is a hypertext transfer protocol address}.

Now, simple addresses such as

http://www.w3.org and http://www.swansea.ac.uk

are in L, but the alphabet needs extending to capture all hypertext transfer protocol addresses. More complex addresses, such as

http://www.google.com/search?q=swansea+university+computer+science

and

http://www.google.com/search?as_q=&num=10&btnG=Google+Search&as_oq=&as_epq=Swansea+University&as_eq=&as_occt=title&lr=lang_en&as_dt=i&as_sitesearch=&safe=off

are not in L, because, for example, + and ? are not in T.

10.1.5 Programming Language Examples

Now we consider some examples concerning data and programming languages.


Infix Numbers Let

TInfix Arithmetic = {0, 1, +, ., (, )}.

The expressions of type natural number are strings such as:

(0+1)+1          0+(1+1)
0+1              1+0
0+((0+1)+1)      (0+(0+1)).(0+1)

However,

1+2 ∉ T*Infix Arithmetic and x.1 ∉ T*Infix Arithmetic

since 2 ∉ TInfix Arithmetic and x ∉ TInfix Arithmetic.

Prefix Numbers Let us consider some more forms of arithmetic expressions. Let

TPrefix Arithmetic = {zero, succ, add, mult, ,, (, )}.

The expressions of type natural number are strings such as

succ(succ(zero))                            add(zero,succ(succ(zero)))
mult(add(zero,succ(zero)),succ(zero))       add(succ^n(zero),succ^m(zero)).

Let

LPrefix Arithmetic = {w ∈ T*Prefix Arithmetic | w is an arithmetic expression}.

Identifiers Let

TIdentifier = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, _}.

Identifiers are strings over this alphabet that start with a letter. Thus, examples are

x and Count_0

but

2 ∉ LIdentifier, 1Count ∉ LIdentifier and _Flag ∉ LIdentifier.

Let

LIdentifier = {w ∈ T*Identifier | w is a program identifier}.
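The informal rule for identifiers can be rendered as a regular expression. This sketch is our own, and it assumes the underscore is the extra symbol in TIdentifier alongside the letters and digits:

```python
import re

# Recognition problem for L_Identifier: a letter followed by any number of
# letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

assert IDENTIFIER.match("x") and IDENTIFIER.match("Count_0")
assert not IDENTIFIER.match("2")        # does not start with a letter
assert not IDENTIFIER.match("1Count")   # starts with a digit
assert not IDENTIFIER.match("_Flag")    # starts with an underscore
```

That a single regular expression suffices is no accident: as we shall see, identifiers form a regular language.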

Arithmetic Expressions with Variables Let

T_{Arithmetic Expression} = T_{Infix Arithmetic} ∪ T_{Identifier}.

The infix arithmetic expressions of type natural number are strings such as

(x+0) and Count_0+1.

Let L_{Arithmetic Expression} = {w ∈ T∗_{Arithmetic Expression} | w is an arithmetic expression}.


Interfaces Consider the signatures that name the data and operations in algebras. Each signature is a string. Let

T_{Signature} = {signature, sorts, constants, operations, endsig, “,”, ;, :, ×, →, a, b, . . . , z, A, B, . . . , Z, _, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

Let L_{Signature} = {w ∈ T∗_{Signature} | w is a “legal” signature}. Then the string

signature count
sorts nat
constants zero : → nat
operations successor : nat → nat
endsig

is in LSignature .

Imperative Programs Consider the while programs we want to define exactly. Each program is a string over an alphabet. Let

T = {program, var, :, nat, bool, begin, end, skip, :=, ;, read, write, if, then, else, fi, while, do, od, +, -, *, /, mod, =, >, <, “,”, (, ), true, false, not, and, or, 0, 1, . . . , 9, a, b, . . . , z, A, B, . . . , Z}

and

L = {w ∈ T∗ | w is a “legal” while program}.

10.1.6 Designing Syntax using Formal Languages

There are countless examples of languages, natural and artificial, large and small, complicated and trivial. To specify a syntax for a language, we must think of

(i) a written representation of the words, phrases, formulae, programs, etc., of the language; and

(ii) the words, phrases, formulae, programs, etc., as strings over an appropriate alphabet.

Technically, the first step in specifying a language whose words, phrases, etc., form a set L is simply this:

Choose an alphabet T such that L ⊆ T ∗.

This is shown in Figure 10.2. The second step is to provide a good explanation of how L is formed.

An essential question for any language is this:

Definition (Recognition Problem) Let L ⊆ T∗ be a formal language over T. The recognition problem for L is: given any w ∈ T∗, to decide whether or not

w ∈ L or w ∉ L.

Along with an understanding of how L is formed, we need algorithms for solving the recognition problem for L.
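As a concrete illustration, the recognition problem for the identifier language of Section 10.1.5 can be solved by a single scan of the string. The sketch below is our own, not the text's; it takes the identifier alphabet to be letters, digits and the underscore, with identifiers required to start with a letter.

```python
# Recognition-problem solver for L_Identifier (a sketch; the function name
# and the choice of Python are ours).  A string is accepted iff it is a
# non-empty string over the identifier alphabet that starts with a letter.

LETTERS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
T_IDENTIFIER = LETTERS | set("0123456789") | {"_"}

def recognise_identifier(w: str) -> bool:
    """Decide w ∈ L_Identifier for an arbitrary string w."""
    if w == "" or w[0] not in LETTERS:
        return False
    return all(c in T_IDENTIFIER for c in w)
```

For example, recognise_identifier accepts x and Count_0 but rejects 2, 1Count and _Flag, matching the examples above.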


Figure 10.2: Designing a language (L shown as a subset of T∗).

10.2 Grammars and Derivations

The examples illustrate the idea that many kinds of syntax are formed by making strings of symbols and notations. But some of the formal languages we have seen in Section 10.1, such as the set of English words, are difficult to define exactly. The problem is to find methods of defining formal languages. Grammars are a specification method for formal languages.

10.2.1 Grammars

A grammar is a mathematical idea designed to specify formal languages. Essentially, it is a collection of rules to generate the strings of a language. The rules of a grammar define how we can form a string by means of step by step substitutions or rewrites. A grammar G has four components:

Definition (Grammar) A grammar G = (T ,N , S ,P) consists of:

(i) a finite set T called the alphabet of terminal symbols ;

(ii) a finite set N of non-terminal symbols, or variable symbols, with N ∩ T = ∅;

(iii) a special non-terminal symbol S ∈ N called the start symbol ; and

(iv) a finite set P of substitution or rewrite rules, or productions, each of which has the form:

u → v

where

• the left-hand string u ∈ (T ∪ N )+ is non-empty, and

• the right-hand string v ∈ (T ∪ N )∗.

Note that both the left- and right-hand strings can contain terminals and/or non-terminals.

We present a grammar as a 4-tuple

G = (T ,N , S ,P)

and also use a displayed version, particularly for examples:


grammar G

alphabet T

nonterminals N

start S

rules P

The purpose of a grammar G is to define a formal language

L(G) ⊆ T ∗

over the alphabet T by applying the production rules of P beginning with start symbol S.

A sequence of applications of the production rules is called a derivation. The non-terminals in the rules permit substitutions in strings. When all non-terminals have been eliminated, the derivation ends and we are left with a string of symbols from the alphabet. Hence the alternate word “terminal” for elements of the alphabet: terminals remain when a derivation terminates and all the non-terminals have been removed. We will explain how this works in general shortly: first we look at some examples, then we present the formal definitions.

10.2.2 Examples of Grammars and Strings

1. Consider a grammar: G_{0∗1∗} = ({0, 1}, {S}, S, {S → 1, S → 0S, S → 1S}). This is displayed as:

grammar G0∗1∗

terminals 0, 1

nonterminals S

start symbol S

productions S → 1
   S → 0S
   S → 1S

We show how the rules generate strings such as:

1001 and 011.

First, we number the production rules for ease of reference:

Rule 1. S → 1

Rule 2. S → 0S

Rule 3. S → 1S

Then, applying Rule 3 to the start symbol S , we get:

S → 1S


Applying Rule 2 twice to S:

→ 10S

→ 100S

Finally, applying Rule 1 to S :

→ 1001

Another alternative is to apply Rule 2 to the start symbol S :

S → 0S

Then to apply Rule 3 to S :

→ 01S

And finally to apply Rule 1 to S :

→ 011

So a derivation begins with S and proceeds by applying the rules in any order, for any number of times; a derivation is a path in the tree illustrated in Figure 10.3.

Figure 10.3: Possible derivations from the grammar G_{0∗1∗}.

What are the strings over {0, 1} that these production rules generate? The rules generate the set of all strings over 0 and 1 which end in a 1:

L(G_{0∗1∗}) = {w1 | w ∈ {0, 1}∗}.
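This claim can be checked mechanically for short strings. The following experiment is our own: it applies the three rules exhaustively to the start symbol and collects every terminal string of length at most 4, all of which end in 1.

```python
# Exhaustively derive the terminal strings of G_{0*1*} up to a length bound.
# (Our own illustration; the three rules are as numbered above.)

RULES = [("S", "1"), ("S", "0S"), ("S", "1S")]  # Rules 1, 2 and 3

def derive(bound):
    """All terminal strings of length <= bound derivable from S."""
    results, frontier, seen = set(), ["S"], {"S"}
    while frontier:
        word = frontier.pop()
        for lhs, rhs in RULES:
            i = word.find(lhs)      # every left-hand side here is the symbol S
            if i < 0:
                continue
            new = word[:i] + rhs + word[i + 1:]
            if "S" not in new:
                results.add(new)    # a terminal string: no S remains
            elif len(new) <= bound and new not in seen:
                seen.add(new)
                frontier.append(new)
    return results
```

With bound 4 this yields exactly the fifteen strings over {0, 1} of length at most 4 that end in a 1, including 1001 and 011.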

2. Let G_{a^n b^n} = ({a, b}, {S}, S, {S → ab, S → aSb}). This is displayed as:

grammar Ganbn

terminals a, b

nonterminals S

start symbol S

productions S → ab
   S → aSb


Figure 10.4: Possible derivations from the grammar G_{a^n b^n}.

Here derivations proceed as a path in the tree in Figure 10.4.

What are the strings over {a, b} that these rules generate? The productions generate the set

L(G_{a^n b^n}) = {a^n b^n | n ≥ 1}

of strings.

3. Let G_{a^{2n}} = ({a}, {S}, S, {S → ε, S → aSa}). This is displayed as:

grammar Ga2n

terminals a

nonterminals S

start symbol S

productions S → ε
   S → aSa

Here derivations begin with S and proceed as a path in the tree in Figure 10.5.

Figure 10.5: Possible derivations from the grammar G_{a^{2n}}.

Notice that we suppress any empty string ε generated as part of a non-empty string; this is a convention which is observed for clarity, as concatenating the empty string ε has no effect.

Thus, the productions generate the set

L(G_{a^{2n}}) = {a^{2n} | n ≥ 0}

of strings over a.


10.2.3 Derivations

Now let us consider the substitution processes that make up derivations. First, we examine a single derivation in which we substitute the string v for the string u using a single production rule u → v. Then we examine a sequence of such substitutions.

Definition (One-Step Derivation) Given a grammar G = (T, N, S, P), we derive a (possibly empty) word w′ ∈ (T ∪ N)∗ from a non-empty word w ∈ (T ∪ N)+:

w ⇒G w ′

in one step, or, equivalently, we have a one-step derivation, if, and only if, there is a production u → v ∈ P and we can write

w = sut and w ′ = svt

where s , t ∈ (T ∪N)∗ are possibly empty strings.
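The definition translates directly into a computable operation on strings. A sketch in Python (our own rendering, with words as strings and productions as pairs (u, v)):

```python
# All one-step derivations w => w' under a set of productions: w' arises by
# replacing one occurrence of a left-hand side u in w = s u t by v, giving s v t.

def one_step(w, productions):
    """Return the set of words w' with w => w' in one step."""
    successors = set()
    for u, v in productions:
        start = 0
        while True:
            i = w.find(u, start)                       # next occurrence of u
            if i < 0:
                break
            successors.add(w[:i] + v + w[i + len(u):])  # the word s v t
            start = i + 1
    return successors
```

For the grammar of example 2 above, one_step("aSb", [("S", "ab"), ("S", "aSb")]) returns {"aabb", "aaSbb"}.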

We also use the phrases immediately generated or one-step generated in place of one-step derived. And we simply write

w ⇒ w ′

in place of w ⇒_G w′, when there is no danger of ambiguity arising.

We want to consider the effect of performing a number of derivations. We define a relation

w ⇒∗ w ′,

to say that a word w ′ can be produced from a word w by a sequence of derivations:

Definition (Derivation) Given a grammar G = (T, N, S, P), we derive a (possibly empty) word w′ ∈ (T ∪ N)∗ from a non-empty word w ∈ (T ∪ N)+:

w ⇒∗G w ′

if, either

(i) w = w ′, or

(ii) there is a sequence of non-empty words w0, w1, . . . , wn−1 ∈ (T ∪ N)+ and a final (possibly empty) word wn ∈ (T ∪ N)∗, such that

(a) w0 = w ;

(b) for 0 ≤ i ≤ n − 1, wi ⇒_G wi+1; and

(c) wn = w ′.

Again, we shall also use the terminology that the string w generates the string w′, or alternatively that w′ is derived from w, and we simply write

w ⇒∗ w ′

in place of w ⇒∗G w ′, when there is no danger of ambiguity arising.

Thus, ⇒ and ⇒∗ are relations on strings. In particular, they tell us how we can rewrite a non-empty string to some string, provided that there is one production rule, or a sequence of production rules, that we can apply, respectively.


So, more formally, we define relations

⇒_G ⊆ (T ∪ N)+ × (T ∪ N)∗ and ⇒∗_G ⊆ (T ∪ N)+ × (T ∪ N)∗

with respect to a grammar G = (T, N, S, P). The relation ⇒∗_G is the reflexive and transitive closure of the relation ⇒_G.

10.2.4 Language Generation

We now have all the components that we need to explain how we generate a language from agrammar. Recall the form of a typical grammar:

grammar G

terminals T

nonterminals N

start symbol S

productions P

We generate a language L(G) from a grammar G by considering all the possible strings w ∈ T∗ of terminal symbols that we can generate from the start symbol S. In such derivations we may make use of non-terminal symbols in the substitutions that arise from the use of production rules.

Definition (Language) Let G = (T, N, S, P) be a grammar. The language L(G) ⊆ T∗ generated by the grammar G is defined by

L(G) = {w ∈ T∗ | S ⇒∗_G w}.

Of course, there is not a single way of defining a language: if we can find one grammar to define a language, there will be infinitely many other grammars to define it.

Definition (Grammar Equivalence) We will say that two grammars G1 and G2 are equivalent if, and only if,

L(G1 ) = L(G2 ).

Of these different grammars for describing how we can construct a language, some will lead to clearer descriptions, whilst others may lead to more efficient implementations. We shall see more of this idea in later chapters.
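Grammar equivalence cannot, in general, be decided by an algorithm, but comparing the strings two grammars generate up to a length bound is a useful sanity check. In this sketch (ours, not the text's) we compare the grammar of example 2 with a second grammar for the same language that we have invented for the purpose.

```python
# Compare two grammars by generating their languages up to a length bound.
# Productions are (lhs, rhs) pairs; single-character non-terminals assumed,
# and the non-terminals are taken to be the left-hand sides.

def language_upto(productions, start, bound):
    """Terminal strings of length <= bound derivable from start."""
    nonterminals = {u for u, _ in productions}
    results, frontier, seen = set(), [start], {start}
    while frontier:
        w = frontier.pop()
        for u, v in productions:
            i = w.find(u)
            while i >= 0:
                new = w[:i] + v + w[i + len(u):]
                if not any(n in new for n in nonterminals):
                    if len(new) <= bound:
                        results.add(new)
                elif len(new) <= bound + 2 and new not in seen:
                    seen.add(new)       # slack: ε-rules may shrink words later
                    frontier.append(new)
                i = w.find(u, i + 1)
    return results

G1 = [("S", "ab"), ("S", "aSb")]               # the grammar of example 2
G2 = [("S", "aTb"), ("T", ""), ("T", "aTb")]   # our alternative grammar
```

Both give {ab, aabb, aaabbb, aaaabbbb} up to length 8, so they agree on all strings of that length; a proof of full equivalence would still need an induction.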

Example Consider the grammar G_{a^n b^n} of Examples 10.2.2(ii). A typical derivation is:

S ⇒ aSb ⇒ aaSbb ⇒ · · · ⇒ a^{n−1} S b^{n−1} ⇒ a^n b^n.

This involved the use of n one-step derivations: n − 1 applications of production rule 2 and one application of rule 1 to eliminate the non-terminal symbol S. This demonstrates that

S ⇒∗_G a^n b^n


and hence,

{a^n b^n | n ≥ 1} ⊆ L(G_{a^n b^n}).

It remains to show that L(G_{a^n b^n}) ⊆ {a^n b^n | n ≥ 1} in order to prove

L(G_{a^n b^n}) = {a^n b^n | n ≥ 1}.

10.2.5 Designing Syntax using Grammars

We may now expand on our preliminary remarks in Section 10.1.6. The procedure for specifying the syntax of a language involves three steps and is roughly as follows.

Let L be some language in need of a precise specification.

Step 1 First, one chooses an alphabet T such that

L ⊆ T ∗.

Step 2 Secondly, one creates a simple grammar G with alphabet T such that

L ⊆ L(G) ⊆ T ∗

as shown in Figure 10.6. A grammar should give a good explanation of how the language is formed.

Figure 10.6: Specifying a language using a grammar (L ⊆ L(G) ⊆ T∗).

Step 3 Thirdly, one addresses the problem of removing the undesirable strings in the set

L(G)− L

by refining the rules of the grammar G, or by other means. The problem of unwanted strings is common in language design in practice.

10.3 Specifying Syntax using Grammars: Modularity and BNF Notation

In this section we introduce two major ways of improving the readability of grammars. The first method is the use of modularity, a powerful and unexpectedly complicated idea that we have already seen in action in specifying data and interfaces with algebras and signatures. The second method is simply a question of presentation, and leads just to an alternative notation.


10.3.1 A Simple Modular Grammar for a Programming Language

We will show how the concept of a modular grammar allows us to build definitions for a simple programming language from definitions of its components. The main components of a while programming language for computing with natural numbers are:

(i) identifiers;

(ii) natural numbers;

(iii) arithmetic expressions;

(iv) Boolean expressions; and

(v) programs.

We will give a series of grammars that defines each component in turn and we will put them together to form a grammar for the language.

We start with a grammar G Identifier for generating identifiers:

grammar G Identifier

alphabet a, b, . . . , z, A, B, . . . , Z

nonterminals Letter , Id

start Id

rules Id → Letter
   Id → Letter Id
   Letter → a
   Letter → b
   ...
   Letter → z
   Letter → A
   Letter → B
   ...
   Letter → Z

Now we give a grammar GNumber to generate numbers (in decimal notation):


grammar GNumber

alphabet 0, 1, . . . , 9

nonterminals Number ,Digit

start Number

rules Number → Digit
   Number → Digit Number
   Digit → 0
   Digit → 1
   ...
   Digit → 9

To make arithmetic expressions we need both identifiers and numbers. Next, we construct a grammar G_{Arithmetic Expression} for generating arithmetic expressions from the two previous grammars.

grammar GArithmetic Expression

import G Identifier ,GNumber

alphabet +, -

nonterminals AExp,AOp

start AExp

rules AExp → Id
   AExp → Number
   AExp → AExp AOp AExp
   AOp → +
   AOp → -

The alphabets, start symbols and rules of the grammars G_{Identifier} and G_{Number} are included in the grammar G_{Arithmetic Expression} for arithmetic expressions by means of the new construct import. Essentially this import notation names one or more grammars to be combined and extended. We will explain the meaning of import shortly.

To make Boolean expressions we need arithmetic expressions. We define a grammar G_{Boolean Expression} for generating Boolean expressions by:


grammar GBoolean Expression

import GArithmetic Expression

alphabet true, false, not, and, or,=, <

nonterminals BExp,BOp1 ,BOp2 ,RelOp

start BExp

rules BExp → BOp1 BExp
   BExp → BExp BOp2 BExp
   BExp → AExp RelOp AExp
   BExp → true
   BExp → false

   BOp1 → not
   BOp2 → and
   BOp2 → or

   RelOp → =
   RelOp → <

The alphabet, start symbol and production rules of the grammar G_{Arithmetic Expression} are included using import.

Finally, we define a grammar G_{while} for generating while programs, using the grammars we defined for generating arithmetic and Boolean expressions, by:

grammar Gwhile

import GArithmetic Expression ,GBoolean Expression

alphabet skip, if, then, else,fi,while,do, od, :=, ;

nonterminals Program

start Program

rules Program → skip
   Program → Id := AExp
   Program → Program ; Program
   Program → if BExp then Program else Program fi
   Program → while BExp do Program od

We built the grammar G_{while} for programs from grammars for its subsidiary components, arithmetic expressions and Boolean expressions. The dependencies are illustrated in Figure 10.7. The picture suggests a modular structure or architecture for the grammar G_{while} and hence the language. The grammar G_{while} combines and extends all the grammars in the tree. Putting all the alphabets and rules together, we obtain a grammar for the programming language.

Actually, importing the grammar of arithmetic expressions into the grammar for while programs is redundant since it is already a part of the grammar for Boolean expressions.


Figure 10.7: Component grammars used to construct the grammar G_{while}.

10.3.2 The Import Construct and Modular Grammars

The grammars are combined via the import construct. Let us define what it means in stages. First, we take a simple general case where one grammar imports another.

Definition (Importing Grammars) Let H be a grammar

grammar H

import G

alphabet B

nonterminals M

start R

rules Q

that imports the grammar G :

grammar G

alphabet A

nonterminals N

start S

rules P

Then H denotes the grammar F that is the union of the alphabets, non-terminals and rules of G with those in H. This grammar F we call the flattened form of H. The flattened form F of H is defined to be:


grammar Flattened H

alphabet A ∪ B

nonterminals N ∪M

start R

rules Q ∪ P

This simple case is enough to convey the idea of modularity and explain the notational loose ends of the example in Section 10.3.1. However, technically, it is superficial and it is necessary to put the idea of modularity on a firmer foundation. What we have done is to propose a new and more general form of grammar called a

modular grammar,

which extends a grammar of the form

G = (T ,N , S ,P)

with a set I of imported grammars to form

G = (I ,T ,N , S ,P).

There are several ways to define the concept more precisely.
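One such way is operational: flattening a single import is just a union of components. A minimal sketch (ours), representing a grammar as a dictionary:

```python
# Flattening one import: union the alphabets, non-terminals and rules of the
# imported grammar G with those of the importing grammar H, keeping H's start.

def flatten(h, g):
    """h imports g; both are dicts with 'alphabet', 'nonterminals',
    'start' and 'rules' (a set of (lhs, rhs) pairs)."""
    return {
        "alphabet": h["alphabet"] | g["alphabet"],
        "nonterminals": h["nonterminals"] | g["nonterminals"],
        "start": h["start"],                 # the start symbol R of H
        "rules": h["rules"] | g["rules"],
    }

# Example: G_Number imported into a fragment of G_Arithmetic Expression.
g_number = {"alphabet": set("0123456789"),
            "nonterminals": {"Number", "Digit"},
            "start": "Number",
            "rules": {("Number", "Digit"), ("Number", "Digit Number")}}
h_aexp = {"alphabet": {"+", "-"},
          "nonterminals": {"AExp", "AOp"},
          "start": "AExp",
          "rules": {("AExp", "Number"), ("AExp", "AExp AOp AExp")}}

flat = flatten(h_aexp, g_number)
```

The flattened grammar keeps the importing grammar's start symbol AExp while containing every rule of both components.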

10.3.3 User Friendly Grammars and BNF Notation

Programming languages are rarely simple and the grammars used to define them are rarely small. The concise mathematical notation that is suitable for analysis needs to be exchanged for a notation that is suitable for writing and reading large grammars. We will use this notation for all our large examples.

Backus-Naur Form (BNF)

A popular method of presenting a special kind of grammar used for defining a programming language is that of

Backus-Naur Form or BNF

which is based on the following conventions.

1. The terminal symbols of the grammar are often written in bold font.

2. The non-terminal symbols of the grammar are familiar terms for syntactic components enclosed in angle brackets, e.g., 〈statement〉, 〈expression〉, 〈identifier〉.

3. The start symbol is the non-terminal that is presented first.


4. The symbol ::= replaces and extends → by listing the productions possible for a non-terminal; alternative possibilities for the right-hand sides of a particular production are separated with the symbol |.

For example, we abbreviate the five rules:

〈BExp〉 ::= true
〈BExp〉 ::= false
〈BExp〉 ::= 〈BOp1〉 〈BExp〉
〈BExp〉 ::= 〈BExp〉 〈BOp2〉 〈BExp〉
〈BExp〉 ::= 〈AExp〉 〈RelOp〉 〈AExp〉

by the “composite” rule:

〈BExp〉 ::= true | false |
   〈BOp1〉 〈BExp〉 | 〈BExp〉 〈BOp2〉 〈BExp〉 |
   〈AExp〉 〈RelOp〉 〈AExp〉

Elementary Examples

We shall find use for some basic grammars in many of the grammars that we consider. The first gives us the letters of the English alphabet:

bnf Letter

rules 〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

And the second gives us numbers and sequences of digits: numbers differ from sequences of digits by disallowing leading zeroes.

bnf Number

rules 〈Number〉 ::= 〈Digit〉 | 〈NonZeroDigit〉 〈Digits〉
〈Digit〉 ::= 0 | 〈NonZeroDigit〉
〈NonZeroDigit〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
〈Digits〉 ::= 〈Digit〉 | 〈Digit〉 〈Digits〉
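The no-leading-zeroes condition is easy to check directly; this recogniser (our own rendering of the Number grammar, not the text's) accepts exactly the strings the grammar generates:

```python
# w is a Number iff it is a single digit, or a non-zero digit followed by
# one or more digits -- i.e. a non-empty digit string with no leading zero.

def is_number(w: str) -> bool:
    digits = "0123456789"
    if w == "" or any(c not in digits for c in w):
        return False
    return len(w) == 1 or w[0] != "0"
```

So is_number accepts 0 and 42, but rejects 007 and the empty string.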

Extended BNF

A common variation of BNF is

extended BNF or EBNF


which adds to BNF the following conventions.

5. An optional occurrence of some portion of a production rule choice is enclosed in square brackets [ ]. Thus, [u] means zero or one occurrence of u, where u ∈ (T ∪ N)+ is some string of terminals or non-terminals.

6. An arbitrary number of occurrences of some portion of a production rule choice is enclosed in braces { }. Thus, {u} means zero or more occurrences of u, where u ∈ (T ∪ N)+ is some string of terminals or non-terminals.

For example, the rule

〈Digits〉 ::= 〈Digit〉 | 〈Digit〉 〈Digits〉

in the BNF Number, which generates all possible strings of digits, could be expressed in EBNF as:

〈Digits〉 ::= 〈Digit〉 [ 〈Digits〉 ]

Thus, we have collapsed the two separate rules into one EBNF rule that says 〈Digits〉 is a 〈Digit〉 followed by zero or one 〈Digits〉.

Alternatively, we could express this same rule

〈Digits〉 ::= 〈Digit〉 | 〈Digit〉 〈Digits〉

in EBNF as:

〈Digits〉 ::= 〈Digit〉 { 〈Digit〉 }

This time we have collapsed the two separate rules into one EBNF rule that says 〈Digits〉 is a 〈Digit〉 followed by zero or more 〈Digit〉s.

We can use this second alternative to produce a stream-lined EBNF for numbers that removes all reference to 〈Digits〉:

ebnf Number

rules 〈Number〉 ::= 〈Digit〉 |
   〈NonZeroDigit〉 { 〈Digit〉 }
〈Digit〉 ::= 0 | 〈NonZeroDigit〉
〈NonZeroDigit〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Here we have succinctly defined a number to be either

• a single digit, or

• a non-zero digit followed by an arbitrary number of single digits.

EBNF is a notational extension of BNF, designed for convenience. It does not extend the expressive power of BNF.

Lemma (EBNF has same power as BNF) Let L ⊆ T∗ be a language. L is definable in EBNF if, and only if, it is definable in BNF.

Proof Exercise. 2

Extended BNF is widely used in giving the large grammars needed for large languages; see Exercise 13.
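One direction of the lemma can be seen constructively: every EBNF repetition { u } can be compiled away by introducing a fresh non-terminal U with the plain BNF rules U ::= ε and U ::= u U. A sketch of the translation (our own, with rules represented as (lhs, list-of-symbols) pairs):

```python
# Replace the EBNF rule  lhs ::= prefix { repeated }  by plain BNF rules,
# using a caller-supplied fresh non-terminal for the repetition.

def expand_repetition(lhs, prefix, repeated, fresh):
    return [
        (lhs, prefix + [fresh]),        # lhs ::= prefix fresh
        (fresh, []),                    # fresh ::= ε
        (fresh, repeated + [fresh]),    # fresh ::= repeated fresh
    ]

# <Digits> ::= <Digit> { <Digit> }  becomes three BNF rules:
rules = expand_repetition("Digits", ["Digit"], ["Digit"], "DigitStar")
```

The fresh non-terminal generates any number of copies of the repeated portion, which is exactly the effect of the braces.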


10.4 What is an Address?

Addresses are everywhere in computing and communication. Obvious high-level examples are:

(i) variable names for memory locations;

(ii) file names; and

(iii) uniform resource locators.

Obvious low-level examples are

(i) IP addresses for computers;

(ii) numbers and other codes for mobile phones; and

(iii) bar-codes for goods.

Addresses are often defined to be inputs to protocols such as HTTP. The idea of an address has enormous scope and, hence, is complex. Addresses provide excellent examples of syntax in need of formal definition by grammars.

10.4.1 Postal Addresses

Let us examine how we can construct British postal addresses for mail that is posted within the United Kingdom.

Strings versus Displays

Consider the address:

Department of Computer Science, University of Wales Swansea, Singleton Park, Swansea, SA2 8PP

This string is a postal address in the United Kingdom (as discussed in Example 2 of Section 10.1.4).

When addressing an envelope, we could display this string as:

Department of Computer Science,
University of Wales Swansea,
Singleton Park,
Swansea,
SA2 8PP

or:

Department of Computer Science,
University of Wales, Swansea,
Singleton Park,
Swansea, SA2 8PP


Furthermore, in official recommended practice the commas are removed, the postal town is capitalised, all the lines are aligned with each other, and each syntactic component is on a separate line, as follows:

Department of Computer Science
University of Wales Swansea
Singleton Park
SWANSEA
SA2 8PP

In some ways, the display of the address is a part of the address. As we break down the properties of addresses we will build up a modular grammar to define them.

UK Postal Addresses

A UK postal address can normally be categorised as

(i) a rural address;

(ii) a town address; or

(iii) a business or organisation address.

We will devise grammars to construct addresses of each type so that we can define:

bnf PostalAddresses

import RuralAddress ,TownAddress ,CorporateAddress

rules〈Address〉 ::= 〈RuralAddress〉 | 〈TownAddress〉 | 〈CorporateAddress〉

A picture of the complete construction is shown in Figure 10.8.

Figure 10.8: Structure of addresses.

First, we consider elements that are common to all three types of address, namely:


(i) postcodes;

(ii) phrases; and

(iii) line-breaks.

Postcodes

In the UK, the combination of a postcode and a house name or number uniquely identifies any postal address. Currently, a postcode has one of three forms, as described in the rules below. Thus, a postcode may have seven or eight characters, with a separating space before the final three characters. An example of a postcode is:

SA2 8PP

bnf Postcode

import Letter ,Number

rules 〈Postcode〉 ::= 〈Letter〉 〈Letter〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉 |
   〈Letter〉 〈Digit〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉 |
   〈Letter〉 〈Letter〉 〈Digit〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉

〈Space〉 ::=

Note that the non-terminal 〈Space〉 derives the single space character.
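The three forms can equivalently be written as a regular expression; the check below is our own, with 〈Letter〉 read as A–Z or a–z as in the Letter grammar:

```python
# Match the three postcode shapes above: LLD, LDD or LLDD (letters L,
# digits D), then a space, then DLL.
import re

L, D = "[A-Za-z]", "[0-9]"
POSTCODE = re.compile(f"({L}{L}{D}|{L}{D}{D}|{L}{L}{D}{D}) {D}{L}{L}")

def is_postcode(w: str) -> bool:
    return POSTCODE.fullmatch(w) is not None
```

Thus SA2 8PP, S42 8PP and SA22 8PP are accepted, while a real postcode such as SW1A 1AA falls outside the three forms given here.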

Phrases

In order to write an address, we shall want to form strings of characters. Here, we shall define a phrase as being formed either entirely from letters, or from a mixture of letters and certain punctuation marks. For example, we want to be able to have phrases such as the following:

John O’Groats    Westward Ho!    Weston-super-mare    “Suncrest”

We will also want to have a restricted form of such phrases whereby any letters used are in capitals.

bnf Phrase

import Letter

rules 〈Phrase〉 ::= 〈Letter〉 | 〈Letter〉 〈Phrase〉 | 〈Punctuation〉 〈Letter〉 〈Phrase〉 |
   〈Phrase〉 〈Punctuation〉
〈UpperCasePhrase〉 ::= 〈UpperCase〉 | 〈UpperCase〉 〈UpperCasePhrase〉 |
   〈Punctuation〉 〈UpperCase〉 〈UpperCasePhrase〉 |
   〈UpperCasePhrase〉 〈Punctuation〉
〈Punctuation〉 ::= - | ’ | “ | ” |  | ! | , | .


We shall have newline characters as a separate type of punctuation:

bnf Newline

rules 〈Newline〉 ::=

Rural Addresses

Properties in the UK which are in very rural locations are typically identified with a house name, the nearest village, and the postal town which sorts the mail for the surrounding district.

bnf RuralAddress

import Postcode,Phrase,Newline

rules 〈RuralAddress〉 ::= 〈BuildingName〉 〈Newline〉 〈Village〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉
〈BuildingName〉 ::= 〈Phrase〉
〈Village〉 ::= 〈Phrase〉
〈PostalTown〉 ::= 〈UpperCasePhrase〉

Town Addresses

In less rural locations, single-residence property identifiers may be a house name, number, range of numbers, or a combination of name and number(s). Multi-residence properties, such as flats, are typically identified by a room number and the name of the property.

bnf Building

import Phrase,Number

rules 〈Building〉 ::= 〈Number〉 |
   〈Number〉 – 〈Number〉 |
   〈Number〉 〈Letter〉 |
   〈Number〉 〈BuildingName〉 |
   〈Number〉 〈Letter〉 〈BuildingName〉 |
   〈Number〉 – 〈Number〉 〈BuildingName〉 |
   〈BuildingName〉
〈BuildingName〉 ::= 〈Phrase〉

In addition, these less rural properties are typically located on some named road, within a district of a town; properties that are located in the centre of a town frequently omit the district.


bnf TownAddress

import Postcode,Building ,Newline

rules 〈TownAddress〉 ::= 〈Building〉 〈Road〉 〈Newline〉 〈District〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Building〉 〈Road〉 〈Newline〉 〈PostalTown〉 〈Newline〉 〈Postcode〉
〈Road〉 ::= 〈Phrase〉
〈District〉 ::= 〈Phrase〉
〈PostalTown〉 ::= 〈UpperCasePhrase〉

Corporate Addresses

Corporate addresses are typically addressed by department, organisation, building, road and postal town. However, omitting the department or the building is a frequent variation.

bnf CorporateAddress

import Postcode,Building ,Newline

rules 〈CorporateAddress〉 ::= 〈Department〉 〈Newline〉 〈Organisation〉 〈Newline〉
   〈Building〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Department〉 〈Newline〉 〈Organisation〉 〈Newline〉
   〈Road〉 〈Newline〉 〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Organisation〉 〈Newline〉 〈Building〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Organisation〉 〈Newline〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉
〈Department〉 ::= 〈Phrase〉
〈Organisation〉 ::= 〈Phrase〉
〈PostalTown〉 ::= 〈UpperCasePhrase〉

Flattened Grammar

Now, if we flatten all the components out into a single entity, we get:


bnf Flattened Postal Addresses

rules 〈Address〉 ::= 〈RuralAddress〉 | 〈TownAddress〉 | 〈CorporateAddress〉
〈RuralAddress〉 ::= 〈BuildingName〉 〈Newline〉 〈Village〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉
〈TownAddress〉 ::= 〈Building〉 〈Road〉 〈Newline〉 〈District〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Building〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉
〈CorporateAddress〉 ::= 〈Department〉 〈Newline〉 〈Organisation〉 〈Newline〉
   〈Building〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Department〉 〈Newline〉 〈Organisation〉 〈Newline〉
   〈Road〉 〈Newline〉 〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Organisation〉 〈Newline〉 〈Building〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉 |
   〈Organisation〉 〈Newline〉 〈Road〉 〈Newline〉
   〈PostalTown〉 〈Newline〉 〈Postcode〉
〈Village〉 ::= 〈Phrase〉
〈Road〉 ::= 〈Phrase〉
〈District〉 ::= 〈Phrase〉
〈Department〉 ::= 〈Phrase〉
〈Organisation〉 ::= 〈Phrase〉
〈Building〉 ::= 〈Number〉 |
   〈Number〉 – 〈Number〉 |
   〈Number〉 〈Letter〉 |
   〈Number〉 〈BuildingName〉 |
   〈Number〉 〈Letter〉 〈BuildingName〉 |
   〈Number〉 – 〈Number〉 〈BuildingName〉 |
   〈BuildingName〉
〈BuildingName〉 ::= 〈Phrase〉
〈PostalTown〉 ::= 〈UpperCasePhrase〉
〈Postcode〉 ::= 〈Letter〉 〈Letter〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉 |
   〈Letter〉 〈Digit〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉 |
   〈Letter〉 〈Letter〉 〈Digit〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉


bnf Flattened Postal Addresses (continued)

rules 〈Phrase〉 ::= 〈Letter〉 | 〈Letter〉 〈Phrase〉 |
   〈Punctuation〉 〈Letter〉 〈Phrase〉 |
   〈Phrase〉 〈Punctuation〉
〈UpperCasePhrase〉 ::= 〈UpperCase〉 | 〈UpperCase〉 〈UpperCasePhrase〉 |
   〈Punctuation〉 〈UpperCase〉 〈UpperCasePhrase〉 |
   〈UpperCasePhrase〉 〈Punctuation〉
〈Punctuation〉 ::= - | ’ | “ | ” |  | ! | , | .
〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
〈Space〉 ::=
〈Newline〉 ::=
〈Number〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 〈Number〉 〈Digits〉
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
〈Digits〉 ::= 〈Digit〉 | 〈Digit〉 〈Digits〉

Derivations

Using these rules we can derive the postcode SA2 8PP by the following sequence of one-step derivations:

〈Postcode〉 ⇒ 〈Letter〉 〈Letter〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉
⇒ S 〈Letter〉 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉
⇒ SA 〈Digit〉 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉
⇒ SA2 〈Space〉 〈Digit〉 〈Letter〉 〈Letter〉
⇒ SA2 〈Digit〉 〈Letter〉 〈Letter〉
⇒ SA2 8 〈Letter〉 〈Letter〉
⇒ SA2 8P 〈Letter〉
⇒ SA2 8PP

We can use this derivation in deriving a whole address. For example,

〈Address〉 ⇒ 〈CorporateAddress〉
⇒ 〈Department〉 〈Newline〉 〈Organisation〉 〈Newline〉
   〈Road〉 〈Newline〉 〈PostalTown〉 〈Newline〉 〈Postcode〉
⇒∗ 〈Department〉
   〈Organisation〉


   〈Road〉
   〈PostalTown〉
   〈Postcode〉

⇒∗ Department of Computer Science
   University of Wales Swansea
   Singleton Park
   SWANSEA
   SA2 8PP

Non-Standard Formats

As we have seen, these rules allow us to generate an address, but they do restrict the format to the official postal template, which may or may not be desirable.

The precise details of a postal address matter, unfortunately. Clearly, different countries have different formats. Indeed, layouts can be dictated by the manufacturers of automatic sorting machines. For example, the once elegant

Department of Computer Science,
University of Wales, Swansea,
Singleton Park,
Swansea, SA2 8PP,
Wales

with the use of punctuation marks as information separators is no longer in fashion.

10.4.2 World Wide Web Addresses

World Wide Web pages are identified on the internet by an address that is used by the Hypertext Transfer Protocol (HTTP). One of the strengths of HTTP (as opposed to other internet protocols such as FTP) is that hyperlinks allow addresses to be abstracted away from systems towards the user. Addresses, though, are the means of accessing web pages, either indirectly by following a hyperlink, or directly by entering the text.

There are three kinds of web address that are familiar to users. A typical home page of a business or organisation is constructed by:

http://www.apublisher.co.uk
(host = www.apublisher.co.uk)

This is the name of the store or server on which the file is stored. There are, of course, normally many web pages located on a particular host. These files are again distinguished from each other by names. The storage (and hence retrieval) of these files may be organised by directories. Thus, this gives rise to another form of web address:

http://www.apublisher.co.uk/books
(host = www.apublisher.co.uk, path = /books)


and:

http://www.apublisher.co.uk/books/computing.html
(host = www.apublisher.co.uk, path = /books/computing.html)

The third type of web address is one designed to allow data to be passed back and forth from the user to the server. This is used, for example, in searching for information or form filling, particularly when performed over a series of different web pages:

http://www.apublisher.co.uk/search?data+syntax+semantics
(host = www.apublisher.co.uk, path = /search?data+syntax+semantics)

Again, as we break down the structure of a web address, we will build up a modular grammar to define them.

Thus, in defining a grammar for HTTP addresses, we need to combine descriptions of hosts, paths and query strings:

bnf HTTP Addresses

import Host ,Path,QueryString

rules
〈HTTP Address〉 ::= http:// 〈Host〉 |
                   http:// 〈Host〉 / 〈Path〉 |
                   http:// 〈Host〉 / 〈Path〉 ? 〈QueryString〉

A picture of the full construction is shown in Figure 10.9.

Figure 10.9: Structure of http addresses.
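These three components are exactly what a standard URL parser recovers. For instance, Python's urllib.parse library splits an HTTP address into the parts the grammar names:

```python
from urllib.parse import urlsplit

# Split an HTTP address into the components named by the grammar:
# <Host>, <Path> and <QueryString>.
url = "http://www.apublisher.co.uk/search?data+syntax+semantics"
parts = urlsplit(url)

print(parts.scheme)  # http
print(parts.netloc)  # www.apublisher.co.uk   -- the <Host>
print(parts.path)    # /search                -- the <Path>
print(parts.query)   # data+syntax+semantics  -- the <QueryString>
```

Note that the library treats the `?` as a separator, so the query string is returned separately from the path, matching the third alternative of the 〈HTTP Address〉 rule.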

Hosts

Hosts are usually identified by a name, though there are other means. When a host address is given as a name, it is resolved to a numerical address known as an Internet Protocol (IP) address, by the internet domain name service. In fact, an alternative means of identifying a host is to enter the IP address of the host directly.

The retrieval of web pages is just one service that may be provided by a host. Therefore, when requesting a web page, the host needs to be informed that this is the service required. This is achieved through the use of port numbers. Web services are associated with a particular default port number, so it does not need to be explicitly stated. However, in circumstances where a web service is using a non-standard port number, it needs to be included as part of the address; see, for example, Comer [1995] for further details.

bnf Host

import String

rules
〈Host〉 ::= 〈HostName〉 | 〈HostName〉 : 〈Port〉 |
           〈IPAddress〉 | 〈IPAddress〉 : 〈Port〉
〈HostName〉 ::= 〈Letter〉 〈String〉 |
               〈Letter〉 〈String〉 . 〈HostName〉
〈IPAddress〉 ::= 〈Digits〉 . 〈Digits〉 . 〈Digits〉 . 〈Digits〉
〈Port〉 ::= 〈Number〉

Paths

A path determines the location of a file on a host.

bnf Path

import String

rules
〈Path〉 ::= 〈Segment〉 |
           〈Segment〉 / |
           〈Segment〉 / 〈Path〉
〈Segment〉 ::= 〈String〉

If the path given determines a directory on the host, rather than a specific file, then a default file for that directory is returned. In fact, corporate home page addresses are simply an alternative for specifying the default file located on the host. For example, an address

http://www.apublisher.co.uk/books

actually locates the file:

http://www.apublisher.co.uk/books/index.html

or

http://www.apublisher.co.uk/books/default.htm

depending on the server used.


Data Transfer

A web page is simply a file that is located on a server. However, some interactivity between a user and a site can be created by the host creating files in response to a user’s actions. For example, a host may create a one-off temporary file that is removed from the host, or may assemble a response “on-the-fly” in reply to a user query.

For example, suppose we want to locate web pages containing the words “data”, “syntax”, and “semantics”. Using a search engine located at http://www.apublisher.co.uk, the act of submitting a query may be implemented by the browser sending the string formed by appending

search?data+syntax+semantics

to

http://www.apublisher.co.uk.

When the server receives this string, it executes the program search on the input parameters data, syntax and semantics. The output of the program, in the form of a list of hypertext references, is then returned directly to the user.

bnf QueryString

import String

rules
〈QueryString〉 ::= 〈String〉 | 〈String〉 + 〈QueryString〉

Strings

Finally, we need to define the text that can be written in an address. An address is composed of a string of symbols which can be letters, digits or certain punctuation marks. Some symbols, such as spaces and letters from non-English alphabets, cannot appear directly in a string, but instead are encoded numerically in hexadecimal notation. For example, a file

Data Syntax and Semantics

located on a server http://www-compsci.swan.ac.uk is addressed as

http://www-compsci.swan.ac.uk/Data%20Syntax%20and%20Semantics

where % indicates that the next number is to be interpreted as the hexadecimal notation of a character, and 20 is the hexadecimal notation for the space character.

bnf String

import Letter ,Digit

rules
〈String〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Punctuation〉 |
             〈Letter〉 〈String〉 | 〈Digit〉 〈String〉 |
             〈Punctuation〉 〈String〉
〈Punctuation〉 ::= $ | - | | | . | & | + | - | ! | * | ” | ’ | ( | ) | , | % 〈Hex〉 〈Hex〉
〈Hex〉 ::= 〈Digit〉 | a | b | c | d | e | f | A | B | C | D | E | F
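This percent-encoding is exposed directly by Python's urllib.parse library, whose quote and unquote functions convert between the two forms described above:

```python
from urllib.parse import quote, unquote

# Spaces cannot appear directly in an address, so they are encoded as %20,
# the hexadecimal notation for the space character.
encoded = quote("Data Syntax and Semantics")
print(encoded)           # Data%20Syntax%20and%20Semantics
print(unquote(encoded))  # Data Syntax and Semantics
```

Here the pair of 〈Hex〉 symbols after the % in the grammar corresponds to the two hexadecimal digits produced for each encoded character.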


Flattened Grammar for HTTP addresses

If we take the individual component grammars that we constructed for HTTP addresses and flatten them into a single entity, we get:

bnf Flattened HTTP Addresses

rules
〈HTTP Address〉 ::= http:// 〈Host〉 |
                   http:// 〈Host〉 / 〈Path〉 |
                   http:// 〈Host〉 / 〈Path〉 ? 〈QueryString〉
〈Host〉 ::= 〈HostName〉 | 〈HostName〉 : 〈Port〉 |
           〈IPAddress〉 | 〈IPAddress〉 : 〈Port〉
〈HostName〉 ::= 〈Letter〉 〈String〉 |
               〈Letter〉 〈String〉 . 〈HostName〉
〈IPAddress〉 ::= 〈Digits〉 . 〈Digits〉 . 〈Digits〉 . 〈Digits〉
〈Port〉 ::= 〈Number〉
〈Path〉 ::= 〈Segment〉 |
           〈Segment〉 / |
           〈Segment〉 / 〈Path〉
〈Segment〉 ::= 〈String〉
〈QueryString〉 ::= 〈String〉 | 〈String〉 + 〈QueryString〉
〈String〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Punctuation〉 |
             〈Letter〉 〈String〉 | 〈Digit〉 〈String〉 |
             〈Punctuation〉 〈String〉
〈Punctuation〉 ::= $ | - | | | . | & | + | - | ! | * | ” | ’ | ( | ) | , | % 〈Hex〉 〈Hex〉
〈Hex〉 ::= 〈Digit〉 | a | b | c | d | e | f | A | B | C | D | E | F
〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s |
                t | u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |
                Q | R | S | T | U | V | W | X | Y | Z
〈Number〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 〈Number〉 〈Digits〉
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
〈Digits〉 ::= 〈Digit〉 | 〈Digit〉 〈Digits〉


Exercises for Chapter 10

1. Show that the set

   { w ∈ {a, b}∗ | |w| ≤ n }

   has 2^(n+1) − 1 elements. How many strings of length 8, 16 and 32 can be made from two symbols?

2. Give a formula for the number of words of length at most n with an alphabet with m elements.

3. For each of the following languages L find an alphabet T such that L ⊆ T ∗:

(a) L is the set of all Swedish words.

(b) L is the set of all file names in an operating system of your choice.

(c) L is the set of all IP addresses.

4. Write down some examples of the binary numbers produced by the grammar of Examples 10.2.2(1). What kind of binary numbers are they?

5. Give a grammar to define the language

{ please^n thank-you^n | n ≥ 1 }.

6. Give a grammar to define the language L = { a^n b^(n+1) | n ≥ 1 }.

7. Using the grammar G_{a^n b^n} of Examples 10.2.2(2), prove that L(G_{a^n b^n}) = { a^n b^n | n ≥ 1 }.

8. Write out the grammar GArithmetic Expression of Section 10.3.1 using the display format but without importing any other grammars.

9. Write out the full flattened form of the grammar Gwhile in Section 10.3.1. How many non-terminals, or variables, and production rules are there in the grammar?

10. Derive the following program:

begin
  a := 1
end

11. Derive the following program:

begin
  a := 1;
  while not a = 0 do
    a := a + 1
  od
end

How many steps are involved in the derivation?


12. Prove the lemma in Section 10.3.3 which states that EBNF has the same power as BNF.

13. Find grammars written in EBNF that define some programming languages and estimate the number of rewrite rules used. Construct a modular grammar for a simpler subset of the language.

For example, choose an early language like Algol 60 or a contemporary language like Java.

14. Let G1 = (T1, N1, S1, P1) and G2 = (T2, N2, S2, P2) be two grammars. Define G2 to be a subgrammar of G1 if

T2 ⊆ T1 , N2 ⊆ N1 , and P2 ⊆ P1 .

Show that if G2 is a subgrammar of G1 then

L(G2 ) ⊆ L(G1 ).

15. HTTP is but one form of internet protocol which uses addresses that are examples of Uniform Resource Locators (URLs). Other examples of protocols that use URLs are File Transfer Protocols (FTP) and the Gopher Protocol. The World Wide Web (W3) Consortium publish a formal definition of URLs, using an extended form of BNF, for various protocols at

http://www.w3.org.

Develop a modular grammar for URLs.

16. Develop a modular grammar for file names and paths for an operating system of your choice, such as UNIX, MacOS or Windows.


Historical Notes and Comments for Chapter 10

In our chapter on the history of programming languages (Chapter 2) we explained the origins of the notions of formal language and grammar. The ideas come from the problem of defining syntax, in both natural and computer languages, in the 1950s.

The concept of a formal language was given in Chomsky [1956] and that of a grammar in Chomsky [1959]. The mathematics of strings and words had been studied much earlier, e.g., by Axel Thue and Max Dehn on word problems in groups associated with topological spaces. In particular, Emil Post had used strings and production rules in his studies of computability. Chomsky was inspired by Post’s work.

The use of modular grammars, although simple, seems to be new.

The use of BNF started with the definition of Algol 60 and was proposed in Backus [1959] and ?. BNF was an independent idea and its equivalence with context-free languages was soon recognised in Ginsburg and Rice [1962].

The examples reflect origins in linguistics, programming languages, the mathematical theory, and items of contemporary interest.

Standard textbooks on formal language and grammars include Hopcroft and Ullman [1979], Harrison [1978] and Salomaa [1981].



Chapter 11

Languages for Interfaces, Specifications and Programs

In the previous chapter we introduced the basic ideas of formal language and grammar and applied them in the specification of the syntax of addressing systems. In this chapter we will consider applications of these basic ideas to programming languages. The methods are largely the same — modular grammars written in BNF — but the examples are much more complicated. One new method we introduce is the technique of defining an extended syntax for a large language by a syntactic transformation that reduces it to kernel syntax for a small language. We will specify two interface definition languages, two specification languages, and three programming languages.

In Section 11.1, we begin by specifying an interface definition language for data types by defining a grammar for the language of

signatures,

a concept introduced in Chapter 4.

The formal definition enables us to examine more closely some syntactic properties of signatures, which are typical of many forms of declaration in programming languages. In Section 11.2, the interface declaration language is made modular by adding a construct import that allows a signature to refer to other signatures in its definition. We have seen examples of the use of this construct in defining examples of signatures earlier on; now we will analyse and specify the construct in general. We give a second interface definition language for data types by defining the grammar for the language

signatures with imports.


In Section 11.3, we examine the import construct. In simple terms, it is a convenient notation for defining the standard signatures of algebras. Thus, we view the second signature language as an extension of the first signature language. To define the meaning of the import construct we will define a syntactic transformation that tries to reduce the extended syntax (signatures with imports) to the kernel syntax (signatures without imports). To do this we assume there is a library of signatures, called a repository, and an operation on the signatures in the library that substitutes actual signatures in place of references to signatures. This syntactic transformation is called flattening.

In Section 11.4, we define languages for the specification of data types by adding axiomatic specifications to the language of signatures. First-order and equational axioms are considered.

In Section 11.5, we begin the specification of the syntax of the while language with the first of three steps. We give a definition of the

while language for computing on a chosen, fixed data type of natural numbers.

In Section 11.6, we generalise this grammar to define the

while language for computing on any chosen, fixed data type.

From this we can derive grammars for the definition of while programs on other choices of data types, e.g., strings and real numbers. Each data type has its own grammar for while programs.

In the third stage we create a single grammar that defines while programs for all data types. In Section 11.7, by adapting and combining the grammar in Section 11.6 with the grammar for signatures in Section 11.2, we define the

while language for computing on all data types.

Here the grammar generates a signature and then a while program. We answer questions of how to extend the while language with other features such as arrays, variable declarations, and so on.

In this part we are developing the essentials of the theory of syntax in stages. We have chosen to accumulate a large number of examples, before beginning the mathematical theory in the next chapter. The grammars used in Chapter 10 and here are of a special kind, called

context-free grammars.

In the next three chapters, we introduce the mathematical theory of regular and context-free grammars. This elegant subject enables us to understand the uses and limitations of these practically important grammars. Later, in Chapter 15, we give a second theory of syntax that views syntax more abstractly. It is based on using algebras to model data types of syntax. A programming language is specified by an algebra whose operations create new programs from old.

11.1 Interface Definition Languages

A signature is a syntactic interface to a data type. It is purely a piece of syntax, and hence we can form the language of all

signatures

and use it as

a simple interface definition language for data types.

We use this language in Sections 11.4 and 11.7 to consider languages for specifying and programming with arbitrary data types, respectively.


11.1.1 Target Syntax: Mathematical Definition of Signatures

In Chapter 4, we gave a mathematical definition of a signature and illustrated the idea with plenty of examples. These examples are the raw material for designing a language. First, we scrutinise the mathematical definition of a signature Σ from Section 4.1.2 in order to gain a precise idea of what we seek to specify in our language. A typical signature has this form:

signature Σ

sorts . . . , s , . . .

constants . . . , c :→ s , . . .

operations . . . , f : s(1 )× · · · × s(n)→ s , . . .

endsig

In practice, we usually require two properties: the sort declaration property and the unique naming property.

Definition (Sort Declaration Property) A signature has the sort declaration property if:

(i) each sort s in a constant declaration c :→ s appears in the sort declaration; and

(ii) each sort s(1), . . . , s(n), s in a function declaration f : s(1) × · · · × s(n) → s appears in the sort declaration.

Definition (Unique Naming Property) A signature has the unique naming property if each constant or function name appears at most once in the signature.

The point of making a declaration, such as a signature, is to present information explicitly and conveniently. So conditions that remove inessential inferences and their ambiguities are desirable. Perhaps these conditions must be enforced just as in some programming languages, where identifiers must be declared before they are used.

All our earlier examples satisfy each of these properties. Shortly, we will see examples of “signatures” — strings accepted by our grammar — where the properties fail to hold.

A more concrete type of definition is needed, based on appropriate identifiers for sorts, elements and sets.

Thus, in defining a syntax for signatures a number of choices and decisions will be made.

11.1.2 A Simple Interface Definition Language for Data Types

We will construct a modular grammar to specify a language for signatures without imports. Then we will extend our language in a simple manner to include imports.

Structure

We define a signature without imports in terms of the syntactic categories of identifier, sort, constant and operation symbols, each of which we will define using a grammar. We will include a convention about its display.


bnf Signature

import Names , Sorts ,Constants ,Operations

rules
〈Signature〉 ::= signature 〈Name〉 〈Newline〉 〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig

Names

We choose, somewhat arbitrarily, to define a name as a string composed of letters or digits with underscores anywhere other than at the start. Note that we define both individual names and non-empty lists of names.

bnf Names

import Letter ,Number

rules
〈Names〉 ::= 〈Name〉 | 〈Name〉, 〈Names〉
〈Name〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Letter〉 〈Name〉 | 〈Digit〉 〈Name〉 |
           〈Letter〉 _ 〈Name〉 | 〈Digit〉 _ 〈Name〉

The imported BNF for Letter is the standard definition as found in Section 10.3.3. The BNF for Digit is simply:

bnf Digit

rules
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The categories of sorts, constants and operations all use names.

Sorts

We define the set of sort symbols to be lists of names. Note that we ensure the sort set is non-empty.

bnf Sorts

import Names

rules
〈Sorts〉 ::= sorts 〈SortList〉
〈SortList〉 ::= 〈Sort〉 | 〈Sort〉, 〈SortList〉 | 〈Sort〉 〈Newline〉 〈SortList〉
〈Sort〉 ::= 〈Name〉


Constants

We declare a set of constant symbols as a list of names and type declarations. Thus, each constant may have an individual type declaration, or we may group together constants that are of the same sort.

bnf Constants

import Sorts ,Names

rules
〈Constants〉 ::= constants 〈ConstantList〉 |
                constants
〈ConstantList〉 ::= 〈Constant〉 | 〈Constant〉 〈Newline〉 〈ConstantList〉
〈Constant〉 ::= 〈Names〉 : 〈Sort〉

Operations

Similarly with operation symbols, each operation is a name or names and a type declaration. Note that an operation must have at least one sort in its domain.

bnf Operations

import Sorts ,Names

rules
〈Operations〉 ::= operations 〈OperationList〉 |
                 operations
〈OperationList〉 ::= 〈Operation〉 | 〈Operation〉 〈Newline〉 〈OperationList〉
〈Operation〉 ::= 〈Names〉 : 〈DomainSort〉 → 〈Sort〉
〈DomainSort〉 ::= 〈Sort〉 | 〈Sort〉 × 〈DomainSort〉

Flattened Version

If we map out the dependencies between the grammars that we have defined to construct our language of signatures, we get the architecture shown in Figure 11.1.

If we unfold this architecture by substituting the appropriate grammars that we imported, we get the single flattened grammar:


Figure 11.1: Architecture of grammar for signatures without imports.

bnf Flattened Signature

rules
〈Signature〉 ::= signature 〈Name〉 〈Newline〉 〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig
〈Names〉 ::= 〈Name〉 | 〈Name〉, 〈Names〉
〈Name〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Letter〉 〈Name〉 | 〈Digit〉 〈Name〉 |
           〈Letter〉 _ 〈Name〉 | 〈Digit〉 _ 〈Name〉
〈Sorts〉 ::= sorts 〈SortList〉 〈Newline〉
〈SortList〉 ::= 〈Sort〉 | 〈Sort〉, 〈SortList〉 | 〈Sort〉 〈Newline〉 〈SortList〉
〈Sort〉 ::= 〈Name〉
〈Constants〉 ::= constants 〈ConstantList〉 〈Newline〉 |
                constants 〈Newline〉
〈ConstantList〉 ::= 〈Constant〉 | 〈Constant〉 〈Newline〉 〈ConstantList〉
〈Constant〉 ::= 〈Names〉 : 〈Sort〉
〈Operations〉 ::= operations 〈OperationList〉 〈Newline〉 |
                 operations 〈Newline〉
〈OperationList〉 ::= 〈Operation〉 | 〈Operation〉 〈Newline〉 〈OperationList〉
〈Operation〉 ::= 〈Names〉 : 〈DomainSort〉 → 〈Sort〉
〈DomainSort〉 ::= 〈Sort〉 | 〈Sort〉 × 〈DomainSort〉
〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t |
                u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
                R | S | T | U | V | W | X | Y | Z
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


11.1.3 Comparisons with Target Syntax

A typical example of a signature in this language is:

signature Naturals with Tests

sorts nat, bool

constants 0, 1 : → nat
          true, false : → bool

operations plus : nat × nat → nat
           minus : nat × nat → nat
           times : nat × nat → nat
           divide : nat × nat → nat
           mod : nat × nat → nat
           eq : nat × nat → bool
           gt : nat × nat → bool
           lt : nat × nat → bool
           not : bool → bool
           or : bool × bool → bool
           and : bool × bool → bool

endsig

However, we can also derive signatures from the grammar with features that are not desirable. It is easy to derive signatures with the unwanted property that a sort (e.g., bool) appears in a function declaration but does not appear in the sort declaration, such as in the following case:

signature nat

sorts nat

constants zero : → nat

operations succ : nat → nat
           eq : nat × nat → bool

endsig

The question arises:

Can we adapt the grammar to restore the sort declaration property?

Later we will meet a similar problem concerning variable declarations in while programs. We will develop techniques to show that we have reached a limitation of these simple (context-free) grammars: the sort declaration property cannot be guaranteed or enforced by context-free rules.
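Although the property cannot be enforced by context-free rules, it is easy to check in a separate pass over a parsed signature. A minimal sketch in Python, assuming a hypothetical record layout for parsed signatures (the grammar itself does not prescribe one):

```python
# Post-parse check of the sort declaration property.
# The dictionary layout used for a parsed signature is a hypothetical choice:
# sorts: list of names; constants: (name, sort) pairs;
# operations: (name, domain sorts, codomain sort) triples.
def has_sort_declaration_property(sig):
    declared = set(sig["sorts"])
    for (_, sort) in sig["constants"]:
        if sort not in declared:
            return False
    for (_, domain, codomain) in sig["operations"]:
        if any(s not in declared for s in domain) or codomain not in declared:
            return False
    return True

# The undesirable signature above: bool is used but never declared.
bad = {
    "sorts": ["nat"],
    "constants": [("zero", "nat")],
    "operations": [("succ", ["nat"], "nat"), ("eq", ["nat", "nat"], "bool")],
}
assert not has_sort_declaration_property(bad)
```

The check is context-sensitive in exactly the sense discussed above: it compares one part of the string (the declarations) against another (the sort list), which is what context-free rules cannot do.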


11.2 A Modular Interface Definition Language for Data Types

In Section 11.1, we considered a language for the design of simple interfaces. Our language had no imports. Here, we extend this simple language to produce a language that we can use for defining interfaces in a modular way.

The meaning of an interface defined with import cannot be explained without reference to the interfaces which it imports. Thus, we shall also extend our language ideas to include the notion of a library or repository of interfaces. The meaning of an interface with respect to a given repository can then be explained by regarding the import mechanism as a form of abbreviation if we expand out or flatten the imports of an interface with respect to a repository of interfaces.

We have seen examples of signatures with imports which serve as the raw material for the target syntax.

11.2.1 Signatures with Imports

It is a simple operation to add imports to our language for signatures.

Imports

First we define a syntactic category of imports which consist of a non-empty list of names.

bnf Imports

import Names

rules
〈Imports〉 ::= imports 〈ImportList〉 〈Newline〉 |
              imports 〈Newline〉
〈ImportList〉 ::= 〈Name〉 |
                 〈Name〉, 〈ImportList〉 |
                 〈Name〉 〈Newline〉 〈ImportList〉

Signatures with Optional Imports

Then we define a signature with imports to be either

(i) a non-importing signature of the standard kind, or else

(ii) a signature with identifier, non-empty import list, sorts, constants and operations.


bnf Signature with Imports

import Imports , Signature

rules
〈Signature〉 ::= signature 〈Name〉 〈Newline〉
                〈Imports〉 〈Newline〉 〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig

Remember that this grammar contains the rules of the grammar Signature and has also the earlier rule for 〈Signature〉.

Flattened Version

Thus, the architecture of our language of signatures with imports can be represented by Figure 11.2.

Figure 11.2: Architecture of signatures with imports.

Flattening out this architecture by substituting the grammars that are imported yields:


bnf Flattened Signature with Imports

rules
〈Signature〉 ::= signature 〈Name〉 〈Newline〉
                〈Imports〉 〈Newline〉 〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig |
                signature 〈Name〉 〈Newline〉
                〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig
〈Names〉 ::= 〈Name〉 | 〈Name〉, 〈Names〉
〈Name〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Letter〉 〈Name〉 | 〈Digit〉 〈Name〉 |
           〈Letter〉 _ 〈Name〉 | 〈Digit〉 _ 〈Name〉
〈Imports〉 ::= imports 〈ImportList〉 〈Newline〉 |
              imports 〈Newline〉
〈ImportList〉 ::= 〈Name〉 | 〈Name〉, 〈ImportList〉 |
                 〈Name〉 〈Newline〉 〈ImportList〉
〈Sorts〉 ::= sorts 〈SortList〉 〈Newline〉
〈SortList〉 ::= 〈Sort〉 | 〈Sort〉, 〈SortList〉 | 〈Sort〉 〈Newline〉 〈SortList〉
〈Sort〉 ::= 〈Name〉
〈Constants〉 ::= constants 〈ConstantList〉 〈Newline〉 |
                constants 〈Newline〉
〈ConstantList〉 ::= 〈Constant〉 | 〈Constant〉 〈Newline〉 〈ConstantList〉
〈Constant〉 ::= 〈Names〉 : 〈Sort〉
〈Operations〉 ::= operations 〈OperationList〉 〈Newline〉 |
                 operations 〈Newline〉
〈OperationList〉 ::= 〈Operation〉 | 〈Operation〉 〈Newline〉 〈OperationList〉
〈Operation〉 ::= 〈Names〉 : 〈DomainSort〉 → 〈Sort〉
〈DomainSort〉 ::= 〈Sort〉 | 〈Sort〉 × 〈DomainSort〉
〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t |
                u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
                R | S | T | U | V | W | X | Y | Z
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


11.3 Extensions of a Kernel and Flattening

Now we have a language of signatures and a modular language of signatures. The modular language is based on a simple notion of how a signature can extend or import another signature. Modular signatures are notations that describe how a signature is constructed. We consider here how we can define the assembly process on signatures.

Technically, in the modular language, we have two types of signature, one with imports, which we generate using the rule

〈Signature〉 ::= signature 〈Name〉 〈Newline〉
                〈Imports〉 〈Newline〉 〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig

and one without, which we generate using the rule:

〈Signature〉 ::= signature 〈Name〉 〈Newline〉
                〈Sorts〉 〈Newline〉
                〈Constants〉 〈Newline〉 〈Operations〉 〈Newline〉 endsig

Thus, signatures are contained within modular signatures, as shown in Figure 11.3.

Figure 11.3: Signatures and modular signatures.

When we flatten a signature, we try to transform or “reduce” a signature with imports to a signature without imports. Thus, the language of signatures with imports is an extension of the kernel language of signatures without imports. The assembly process is called flattening because it eliminates the hierarchical structure.

11.3.1 Repositories

Suppose we have a signature Σ which imports the signatures:

. . . , Σ Import , . . .

Then, we need each of the signatures . . . , Σ Import , . . . and all the signatures that they import to be able to flatten Σ. We need to make precise the collection of signatures we have available.

Definition (Repository) A collection of signatures is called a signature repository.

So, to attempt to flatten a signature, we need a repository of signatures. It is easy to extend our language of modular signatures to include a syntax for finite repositories:


bnf Repository

import Signature with Imports

rules
〈Repository〉 ::= repository 〈Name〉 〈Contents〉 endrepository
〈Contents〉 ::= contains 〈RepositoryList〉 |
               contains
〈RepositoryList〉 ::= 〈Signature〉 |
                     〈Signature〉 , 〈RepositoryList〉

Given a repository, we shall need a partial function

Extract : Name × Repository ⇀ Signature with Imports

so that Extract(n, R) will pick out an interface with name n from a repository R, if it is present.

Of course, there is no guarantee that the repository will be suitable for flattening a particular signature. There could be many problems, the most obvious of which is that the repository does not contain all of the signatures that are needed to flatten a modular signature. So flattening is a function

Flatten : Signature with Imports × Repository → Signature with Imports .

If all goes well, and we can flatten a signature Σ, then we will be left with a signature

Flatten(Σ) ∈ Signature ⊆ Signature with Imports;

but if there are problems, then we will be left with a signature containing imports:

Flatten(Σ) ∉ Signature.

11.3.2 Dependency Graphs

Given a repository of signatures, we can determine the import relationship (if any) between them.

Definition (Dependency Graph) A dependency graph of a repository is a graph where:

(i) the nodes of the graph are the signatures of the repository;

(ii) there is an edge in the graph I → J if, and only if, there is a signature I in the repository which imports a signature J, and J is in the repository.

When we build a dependency graph for a repository, we may find other problems that will arise when we try to flatten a signature using the repository. For example, we may find that the repository does not contain a signature that is listed as an import in another. We may find that two interfaces are mutually dependent on one another, which will show up as a cycle in the graph.
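These checks can be sketched in code. The following Python fragment is our own illustration, not the book's: the representation of a repository as a dictionary from signature names to lists of imported names is an assumption. It builds the dependency graph and detects the two problems just mentioned, missing imports and cycles.

```python
# A repository is modelled as: {signature name -> list of imported names}.
def dependency_graph(repository):
    """Edges I -> J where signature I imports J and J is in the repository."""
    return {name: [imp for imp in imports if imp in repository]
            for name, imports in repository.items()}

def missing_imports(repository):
    """Imports listed in some signature but absent from the repository."""
    return {imp for imports in repository.values()
            for imp in imports if imp not in repository}

def has_cycle(repository):
    """Detect mutual (or longer) import cycles by depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {name: WHITE for name in repository}

    def visit(name):
        colour[name] = GREY
        for imp in repository.get(name, []):
            if imp in colour:
                if colour[imp] == GREY:          # back edge: a cycle
                    return True
                if colour[imp] == WHITE and visit(imp):
                    return True
        colour[name] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in repository)

repo = {"Integers": [], "Booleans": [],
        "Integer Group": ["Integers"],
        "Integer Group with Booleans": ["Integer Group", "Booleans"]}
```

For the small repository `repo` above there are no missing imports and no cycles; a repository in which two signatures import each other is reported as cyclic.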


11.3.3 Flattening Algorithm

Given a repository and a signature with imports, we can attempt to trace its dependencies by traversing the dependency graph. In order to flatten a signature, we need to combine the signatures that it depends on. So how do we combine signatures? Let us suppose we have an operation

Expand : Signature with Imports × Signature with Imports → Signature with Imports

so that

Expand(Σ ,Σ ′) = a signature with the same name as Σ , and the union of the imports, sorts, constants and operations of the signatures Σ and Σ ′.

If we represent signatures using records, then an algorithm for performing flattening is:

Informal algorithm to perform flattening

(* Given a signature Σ as input, create its flattened version Flattened Σ . *)

(* Create a copy Flattened of Σ, and rename it as Flattened Σ. *)
Flattened := Σ;
Flattened.Name := concat(“Flattened ”, Σ.Name);

(* Pick out the imports I of Σ. *)
I := Σ.Imports;

(* Whilst there is an import in Flattened *)
while I ≠ ∅ do

    (* Pick an import i. *)
    choose i ∈ I

        (* Replace i with the imports that i depends on. *)
        I := (I − i) ∪ Extract(i, R).Imports;

        (* Update Flattened’s import list. *)
        Flattened.Imports := I;

        (* Flatten Flattened with the signature named i in the repository R. *)
        Flattened := Expand(Flattened, Extract(i, R))

    esoohc

od
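As a sanity check, the informal algorithm can be rendered directly in Python. This is a sketch, not the book's code, under the assumptions that signatures are records (here dictionaries) with name, imports, sorts, constants and operations fields, and that the repository R maps names to such records. A done-set, which the informal algorithm does not have, is added so that the loop terminates even on cyclic imports.

```python
def expand(sig1, sig2):
    """Expand(S, S'): same name as S; union of imports, sorts, constants, operations."""
    return {"name": sig1["name"],
            "imports": sorted(set(sig1["imports"]) | set(sig2["imports"])),
            "sorts": sorted(set(sig1["sorts"]) | set(sig2["sorts"])),
            "constants": sorted(set(sig1["constants"]) | set(sig2["constants"])),
            "operations": sorted(set(sig1["operations"]) | set(sig2["operations"]))}

def flatten(sigma, R):
    """Flatten sigma against repository R; on failure the result keeps an import."""
    flattened = dict(sigma)
    flattened["name"] = "Flattened " + sigma["name"]
    remaining = set(sigma["imports"])
    done = set()                                   # guards against import cycles
    while remaining:
        i = remaining.pop()                        # choose an import i
        done.add(i)
        if i not in R:                             # repository cannot supply i
            flattened["imports"] = sorted(remaining | {i})
            return flattened                       # flattening failed: imports remain
        remaining |= set(R[i]["imports"]) - done   # replace i by i's own imports
        flattened = expand(flattened, R[i])
    flattened["imports"] = []
    return flattened
```

If the repository lacks a needed signature, the result is returned with that import still listed, matching the case Flatten(Σ) ∉ Signature above.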

11.3.4 Example

Suppose we have a repository R with the six signatures named

Integers, Booleans, Integer Group, Integer Ring,Integer Group with Booleans, Integer Ring with Booleans,

whose dependency graph is shown in Figure 11.4.


[Figure: the six signature declarations Integers, Booleans, Integer Group, Integer Ring, Integer Group with Booleans and Integer Ring with Booleans, drawn as boxes joined by import arrows.]

Figure 11.4: Dependency graph.


If we flatten the signature Integer Group with Booleans, we get the signature:

signature Flattened Integer Group with Booleans

sorts int, bool

constants true, false: → bool
          zero: → int

operations plus one: int → int
           unary minus: int → int
           plus: int × int → int
           minus: int × int → int
           eq: int × int → bool
           not: bool → bool
           and: bool × bool → bool
           or: bool × bool → bool

endsig

11.3.5 Comparison with Target Syntax

11.4 Languages for Data Type Specifications

A data type can be specified using

(i) an interface; and

(ii) a list of properties of the constants and operations declared in the interface.

In Chapter 5, and later chapters, we explained this method and considered many examples. We will extend our interface definition language into a language for writing axiomatic specifications for data types. As an exercise in the design and definition of syntax, it is informative and not too difficult.

11.4.1 Target Syntax: Data Type Specifications

An interface is modelled by a signature, and the list of properties by an axiomatic theory. Thus, a specification might take the form:


spec Name1

signature Name2

sorts . . . ,s , . . .

constants
    ...
    c: → s
    ...

operations
    ...
    f : s(1) × · · · × s(n) → s
    ...

endsig

axioms Name3
    ...
    a
    ...

endaxioms

endspec

For example, the axiomatic theory of commutative rings from Chapter 5 would become:


spec CRing

signature CRing

sorts ring

constants 0,1: → ring

operations +: ring × ring → ring
           -: ring → ring
           .: ring × ring → ring

endsig

axioms CRing

(∀x)(∀y)(∀z)[(x+y)+z = x+(y+z)]
(∀x)(∀y)[x+y = y+x]
(∀x)[x+0 = x]
(∀x)[x+(-x) = 0]
(∀x)(∀y)(∀z)[(x.y).z = x.(y.z)]
(∀x)(∀y)[x.y = y.x]
(∀x)[x.1 = x]
(∀x)(∀y)(∀z)[x.(y+z) = x.y+x.z]

endaxioms

endspec

We wish to build a grammar from which we can derive specifications. We have already developed a grammar for signatures in Section 11.1 (we have even extended it with import in Section 11.2). Thus, the problem is to extend the signature grammar with rules to specify axioms.

What are these axioms? They are properties of the constants and operations that appear in the signature. Looking back over examples in Part I, we see some variety.

In the specification of commutative rings, the axioms are all very similar; the commutativity axiom

(∀x)(∀y)[x+y=y+x]

is typical. Each axiom is an equation that is forced to be true of all elements. However, we did meet, in Chapter 5, the integral domain property for certain commutative rings, which was a little more complicated:

(∀x)(∀y)[x.y=0⇒ x=0 ∨ y=0].

Here, the axiom involves the logical connectives

⇒ and ∨

of implication and disjunction, respectively. Another important property is the division axiom in the specification of fields:

(∀x)[x ≠ 0 ⇒ x.x⁻¹ = x⁻¹.x = 1].


This involves the logical connectives

¬ and ⇒

of negation and implication, respectively.

Later, in Chapter 7, we used the induction axiom in Dedekind’s characterisation of the natural numbers, which is much more complicated again:

(∀X)[0 ∈ X ∧ (∀x)[x ∈ X ⇒ succ(x) ∈ X] ⇒ (∀x)[x ∈ X]]

This axiom involves the logical connectives

∧ and ⇒

of conjunction and implication, respectively, and two different types of universal quantification:

∀x, quantification over data; and
∀X, quantification over sets of data.

Note, too, that these logical operators are nested. The completeness property for ordered fields is even more complex (see the exercises).

These examples suggest that the axioms in our specifications are expressed using

(i) variables for data, and even sets of data;

(ii) constants and operations from a signature;

(iii) logical connectives such as ∧,∨,¬ and ⇒; and

(iv) quantifiers such as the universal quantifier ∀ and the existential quantifier ∃.

What is their structure? Very roughly speaking, at the heart of each axiom are simple expressions that are made from the constants, operations and tests in a signature and are either true or false. These expressions will be called

atomic expressions.

An axiom is made by putting together atomic expressions using the logical connectives and constraining atomic expressions using quantifiers.
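This structure can be captured directly as an abstract syntax. The following Python sketch is ours, not the book's notation; the class and field names are assumptions. It represents an axiom as atomic formulae combined by connectives and quantifiers, and builds the commutativity axiom as an example.

```python
from dataclasses import dataclass

@dataclass
class Atomic:                  # a relation applied to terms, e.g. eq(x, 0)
    relation: str
    terms: tuple

@dataclass
class Connective:              # 'not', 'and', 'or', 'implies'
    op: str
    args: tuple                # one argument for 'not', two otherwise

@dataclass
class Quantified:              # 'forall' or 'exists' over a data variable
    quantifier: str
    variable: str
    body: object

# (forall x)(forall y)[ eq(plus(x, y), plus(y, x)) ]  -- commutativity
commutativity = Quantified("forall", "x",
                  Quantified("forall", "y",
                    Atomic("eq", (("plus", ("x", "y")),
                                  ("plus", ("y", "x"))))))
```

Note how the nesting of quantifiers and connectives remarked on above becomes literal nesting of objects.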

We will now proceed to develop a formal language for axioms. For simplicity, we will not allow quantification over sets, only quantification over data. Because of this restriction of quantification to elements of the data type, the language we develop will contain what will be called

first-order axiomatic specifications.

If quantification over sets were allowed, then it would contain what would be called

second-order axiomatic specifications.


[Figure: modules Specification, Signature, Axioms, FirstOrderFormula, AtomicFormula and Term, with arrows showing their imports.]

Figure 11.5: Modular grammar (and hence algebraic) structure for first-order formulae specifications.

11.4.2 Languages for First-Order Specifications

We will develop the grammar for our language of first-order specifications in a top-down and modular way. The structure of the grammar is depicted in Figure 11.5.

A specification consists of a signature and some axioms.

bnf Specification

import Signature

rules

〈Specification〉 ::= spec 〈Signature〉 〈Newline〉 〈Axioms〉 endspec

Next, we define lists of axioms.

bnf Axioms

import FirstOrderFormula

rules

〈Axioms〉 ::= axioms 〈AxiomList〉 endaxioms |
             axioms endaxioms

〈AxiomList〉 ::= 〈Axiom〉 〈Newline〉 |
                〈Axiom〉 〈Newline〉 〈AxiomList〉

〈Axiom〉 ::= 〈Formula〉

Each axiom is a first-order formula. We classify a first-order formula as either

(i) an atomic formula,

(ii) a quantified formula, or


(iii) a connective formula,

according to its structure. A connective formula is a formula to which one of the operations of

(i) negation,

(ii) disjunction,

(iii) conjunction, or

(iv) implication

is outermost. A quantified formula is a formula which begins with some variables which are

(i) universally, or

(ii) existentially quantified.

bnf FirstOrderFormula

import AtomicFormula

rules

〈Formula〉 ::= 〈AtomicFormula〉 |
              〈QuantifiedFormula〉 |
              〈ConnectiveFormula〉

〈ConnectiveFormula〉 ::= ¬ 〈Formula〉 |
                        〈Formula〉 ∧ 〈Formula〉 |
                        〈Formula〉 ∨ 〈Formula〉 |
                        〈Formula〉 ⇒ 〈Formula〉

〈QuantifiedFormula〉 ::= (〈Quantifier〉 〈Variable〉)[ 〈Formula〉 ]

〈Quantifier〉 ::= ∀ | ∃

An atomic formula is

(i) a truth value of true or false; or

(ii) formed from the application of a Boolean-valued operation from the signature to a list of terms.

bnf AtomicFormula

import Term

rules

〈AtomicFormula〉 ::= 〈TruthValues〉 |
                    〈RelationApplication〉

〈TruthValues〉 ::= true | false

〈RelationApplication〉 ::= 〈BooleanOperation〉 (〈TermList〉)


A term is

(i) a constant;

(ii) a variable; or

(iii) formed from the application of a function to a list of terms.

bnf Term

import Signature

rules

〈TermList〉 ::= 〈Term〉 |
               〈Term〉 , 〈TermList〉

〈Term〉 ::= 〈Constant〉 |
           〈Variable〉 |
           〈FunctionApplication〉

〈Variable〉 ::= 〈Name〉

〈FunctionApplication〉 ::= 〈Function〉 (〈TermList〉)

Flattened Version If we now assemble the components of the modular grammar, we get this grammar:


bnf FlattenedSpecification

import Signature

rules

〈Specification〉 ::= 〈Signature〉 〈Newline〉 〈Axioms〉

〈Axioms〉 ::= axioms 〈AxiomList〉 endaxioms |
             axioms endaxioms

〈AxiomList〉 ::= 〈Axiom〉 〈Newline〉 |
                〈Axiom〉 〈Newline〉 〈AxiomList〉

〈Axiom〉 ::= 〈Formula〉

〈Formula〉 ::= 〈AtomicFormula〉 |
              〈QuantifiedFormula〉 |
              〈ConnectiveFormula〉

〈ConnectiveFormula〉 ::= ¬ 〈Formula〉 |
                        〈Formula〉 ∧ 〈Formula〉 |
                        〈Formula〉 ∨ 〈Formula〉 |
                        〈Formula〉 ⇒ 〈Formula〉

〈QuantifiedFormula〉 ::= (〈Quantifier〉 〈Variable〉)[ 〈Formula〉 ]

〈Quantifier〉 ::= ∀ | ∃

〈AtomicFormula〉 ::= 〈TruthValues〉 |
                    〈RelationApplication〉

〈TruthValues〉 ::= true | false

〈RelationApplication〉 ::= 〈BooleanOperation〉 (〈TermList〉)

〈TermList〉 ::= 〈Term〉 |
               〈Term〉 , 〈TermList〉

〈Term〉 ::= 〈Constant〉 |
           〈Variable〉 |
           〈FunctionApplication〉

〈FunctionApplication〉 ::= 〈Function〉 (〈TermList〉)

11.4.3 Languages for Equational Specifications

The first-order formulae can be classified according to their logical complexity. Perhaps the simplest kind of first-order formula is an equation. For example,

(x+y)+z=x+(y+z)

x+y=y+x

are equations and, in fact, all the axioms of a commutative ring are equations. In Part I, Chapter 8 was devoted to equations over arbitrary signatures.


We can classify specifications by classifying the formulae allowed as axioms. An equational specification is a specification with only equational formulae.

We define a grammar for equational specifications. The structure of the grammar is depicted in Figure 11.6.

[Figure: modules EquationalSpecification, Signature, EquationalAxioms, EquationalFormula and Term, with arrows showing their imports.]

Figure 11.6: Modular grammar (and hence algebraic) structure for equational logic specifications.

bnf EquationalSpecification

import Signature

rules

〈EqnlSpecification〉 ::= spec 〈Signature〉 〈Newline〉 〈EqnlAxioms〉 endspec

An equational axiom is an equational formula.

bnf EquationalAxioms

import EquationalFormula

rules

〈EqnlAxioms〉 ::= axioms 〈EqnlAxiomList〉 endaxioms |
                 axioms endaxioms

〈EqnlAxiomList〉 ::= 〈EqnlAxiom〉 〈Newline〉 |
                    〈EqnlAxiom〉 〈Newline〉 〈EqnlAxiomList〉

〈EqnlAxiom〉 ::= 〈EqnlFormula〉

An equational formula is a universally quantified equation. An equation is formed by applying an equality relation to terms.


bnf EquationalFormula

import Term

rules

〈EqnlFormula〉 ::= (〈Quantifier〉 〈Variable〉)[ 〈Equation〉 ]

〈Equation〉 ::= 〈Term〉 = 〈Term〉

〈Quantifier〉 ::= ∀

Flattened Version Assembling the components, we have:

bnf EquationalSpecification

import Signature

rules

〈EqnlSpecification〉 ::= spec 〈Signature〉 〈Newline〉 〈EqnlAxioms〉 endspec

〈EqnlAxioms〉 ::= axioms 〈EqnlAxiomList〉 endaxioms |
                 axioms endaxioms

〈EqnlAxiomList〉 ::= 〈EqnlAxiom〉 〈Newline〉 |
                    〈EqnlAxiom〉 〈Newline〉 〈EqnlAxiomList〉

〈EqnlAxiom〉 ::= 〈EqnlFormula〉

〈EqnlFormula〉 ::= (〈Quantifier〉 〈Variable〉)[ 〈Equation〉 ]

〈Equation〉 ::= 〈Term〉 = 〈Term〉

〈Quantifier〉 ::= ∀

〈TermList〉 ::= 〈Term〉 |
               〈Term〉 , 〈TermList〉

〈Term〉 ::= 〈Constant〉 |
           〈Variable〉 |
           〈FunctionApplication〉

〈FunctionApplication〉 ::= 〈Function〉 (〈TermList〉)

11.4.4 Comparison with Target Syntax

In this language, we can indeed form the equational specification CRing given in Section 11.4.1. However, we can also derive equational specifications from the grammar which are not desirable, such as:


spec Malformed

signature CRing

sorts ring

constants 0,1: → ring

operations +: ring × ring → ring
           -: ring → ring
           .: ring × ring → ring

endsig

axioms Malformed

(∃x)[x-1 = 1]
(∀x)[x. = x]
(∃+)[x+y = y+x]

endaxioms

endspec

It is easy to derive equational specifications with a variety of unwanted properties. We have unwanted properties arising from axioms not using the declarations given in the signature, such as the following:

• An axiom uses a constant or function that does not appear in the signature.

• An axiom uses a function from the signature, but with the wrong number of arguments,or arguments of the wrong type.

• A constant or function name in the signature is used in an axiom as a variable.

Ensuring that these problems do not arise is similar to the problem of generating only signatures with the sort declaration property.
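Such conditions are context-sensitive and must be checked outside the grammar. Here is a minimal sketch of our own (the representation of terms as nested (function, arguments) tuples, with variables and constants as plain strings, is an assumption) of a well-formedness check of a term against a signature:

```python
def well_formed_term(term, signature, variables):
    """signature maps each declared symbol to its arity; variables is a set of names."""
    if isinstance(term, str):                        # a variable or a constant
        return term in variables or signature.get(term) == 0
    f, args = term                                   # a function application
    if signature.get(f) != len(args):                # undeclared, or wrong arity
        return False
    return all(well_formed_term(a, signature, variables) for a in args)

sig = {"zero": 0, "plus": 2, "minus": 1}             # a hypothetical signature
```

The check rejects exactly the three kinds of unwanted axiom listed above: symbols not in the signature, symbols applied to the wrong number of arguments, and declared names used as variables.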

11.5 A While Programming Language over the Natural Numbers

We will now develop a series of grammars that will show how we can pin down certain aspects of the syntax of programs and interfaces.

11.5.1 Target Syntax: Simple Imperative Programs

In Chapter 1, we discussed at some length the idea of imperative programming with stores or memories containing any kind of data. In Section 1.3, we introduced the simple imperative constructs of


Null            skip
Assignment      x := e
Sequencing      S1; S2
Conditional     if b then S1 else S2 fi
Iteration       while b do S0 od

that make up the so-called while language WP. We also gave a simple WP program for Euclid’s Algorithm and reflected on its general structure, as shown in Figure 11.7. This introductory

[Figure: a program split into a data type interface above an imperative program body.]

Figure 11.7: General structure of Euclid’s Algorithm.

material illustrates desirable syntax for WP and is the starting point for a specification using a grammar.

We will do this in three stages. In this section, we fix a particular data type and concentrate on giving rules for the primary constructions in the special case of the natural numbers. In Section 11.6, we upgrade our grammar and specify a while language based on any fixed, but arbitrarily chosen, data type interface. Finally, in Section 11.7, we complete the task by removing the fixed arbitrary interface and importing the grammar of our interface definition language of Section 11.1.

11.5.2 A Grammar for while Programs over Natural Numbers

We shall present a grammar for a while programming language (WP) for computation on natural numbers. We have already given a simple grammar in Section 10.3 when illustrating modular grammars. Here we will refine the definition of the language given earlier: we introduce extra features to the language, and enforce precedence between the operations on the natural numbers.

Underlying Data Type

The language we are designing in this section will only be for computing over the natural numbers. In particular, we fix the underlying data type to have a signature of the form:


signature Naturals with Tests

sorts nat, Bool

constants 0, 1: → nat
          true, false: → Bool

operations +: nat × nat → nat
           -: nat × nat → nat
           *: nat × nat → nat
           /: nat × nat → nat
           mod: nat × nat → nat
           =: nat × nat → Bool
           >: nat × nat → Bool
           <: nat × nat → Bool
           not: Bool → Bool
           or: Bool × Bool → Bool
           and: Bool × Bool → Bool

endsig

Programs

Now let us consider the structure of while programs in a top-down manner, as shown in Figure 11.8.

[Figure: modules While Programs over Natural Numbers, Naturals with Tests Signature, Body, Declarations, Commands, Statements, I/O, Types, Expressions, BooleanExpressions, Identifiers, Letter, Digit and Number, with arrows showing their imports.]

Figure 11.8: Architecture of while programs over the natural numbers.

Programs have a name and are built from the signature Naturals with Tests and a body.


bnf While Programs over Natural Numbers

import Body

rules

〈WhileProgram〉 ::= program 〈Name〉
                   〈Signature Naturals with Tests〉
                   〈Body〉

Signature

The signature for the programs will be fixed as that for Naturals with Tests and so the data type interface is just one rule.

bnf Naturals with Tests Signature

rules

〈Signature Naturals with Tests〉 ::=
    signature Naturals with Tests
    sorts nat, Bool
    constants 0: → nat; true, false: → Bool
    operations +: nat × nat → nat; -: nat × nat → nat; *: nat × nat → nat;
               /: nat × nat → nat; mod: nat × nat → nat;
               =: nat × nat → bool; >: nat × nat → bool; <: nat × nat → bool;
               not: bool → bool; or: bool × bool → bool; and: bool × bool → bool
    endsig

Program Bodies

The body of a program consists of a list of variable declarations and a list of commands.

bnf Body

import Declarations, Commands

rules

〈Body〉 ::= begin 〈Declarations〉 〈CommandList〉 end

Declarations

The declarations for a while program will be a list of declarations, which may be empty. The intention is that the variables that appear in the program should be listed in the declaration, but as we will see later, the BNF is not able to enforce this.


bnf Declarations

import Identifiers, Types

rules

〈Declarations〉 ::= var 〈DeclarationList〉 ; |
                   var ;

〈DeclarationList〉 ::= 〈Declaration〉 ; 〈DeclarationList〉 |
                      〈Declaration〉

〈Declaration〉 ::= 〈IdentifierList〉 : 〈Type〉

Types

The types of the variables that we declare in the program are the sorts of the signature Naturals with Tests, namely the Booleans and the natural numbers:

bnf Types

rules

〈Type〉 ::= bool | nat

Commands

We define the commands of a program to consist of a list of individual commands separated by semi-colons. A command will either be concerned with input and output, or it will not.

bnf Commands

import Statements, I/O

rules

〈CommandList〉 ::= 〈Command〉 | 〈CommandList〉 ; 〈Command〉

〈Command〉 ::= 〈Statement〉 | 〈I/O〉

Input-Output Statements

We have simple input-output mechanisms that will allow us to read values into identifiers and to output the values of identifiers.

bnf I/O

import Identifiers

rules

〈I/O〉 ::= 〈Read〉 | 〈Write〉

〈Read〉 ::= read (〈IdentifierList〉)

〈Write〉 ::= write (〈IdentifierList〉)


Statements

Alternatively, statements may be:

(i) a null statement

    skip

which has no effect;

(ii) an assignment

    x := e

which assigns the value of an expression e to an identifier x;

(iii) a conditional statement

    if b then S1 else S2 fi

which directs the flow of control of the program to the statement S1 if the Boolean expression b evaluates to true, and to the statement S2 otherwise; or

(iv) an iterative statement

    while b do S0 od

which repeatedly executes the statement S0 whilst the Boolean expression b evaluates to true.

bnf Statements

import Expressions, BooleanExpressions

rules

〈Statement〉 ::= 〈Null〉 | 〈Assignment〉 | 〈Conditional〉 | 〈Iteration〉

〈Null〉 ::= skip

〈Assignment〉 ::= 〈Identifier〉 := 〈Expression〉

〈Conditional〉 ::= if 〈Comparison〉 then 〈CommandList〉 else 〈CommandList〉 fi

〈Iteration〉 ::= while 〈Comparison〉 do 〈CommandList〉 od
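Although this chapter concerns syntax only, it may help to see how the four statement forms are intended to behave. The following Python sketch is entirely ours, not the book's formal semantics (which comes later): the tagged-tuple representation of statements and the eval_expr/eval_bool parameters are assumptions. It executes a statement against a state mapping identifiers to values.

```python
def execute(stmt, state, eval_expr, eval_bool):
    """stmt is a tuple tagged 'skip', 'assign', 'seq', 'if' or 'while'."""
    tag = stmt[0]
    if tag == "skip":                                 # skip: no effect
        return state
    if tag == "assign":                               # x := e
        _, x, e = stmt
        new_state = dict(state)
        new_state[x] = eval_expr(e, state)
        return new_state
    if tag == "seq":                                  # S1 ; S2
        mid = execute(stmt[1], state, eval_expr, eval_bool)
        return execute(stmt[2], mid, eval_expr, eval_bool)
    if tag == "if":                                   # if b then S1 else S2 fi
        _, b, s1, s2 = stmt
        chosen = s1 if eval_bool(b, state) else s2
        return execute(chosen, state, eval_expr, eval_bool)
    if tag == "while":                                # while b do S0 od
        _, b, s0 = stmt
        while eval_bool(b, state):
            state = execute(s0, state, eval_expr, eval_bool)
        return state
    raise ValueError("unknown statement: " + tag)

# Euclid's Algorithm, with expressions modelled crudely as functions on the state.
euclid = ("while", lambda s: s["y"] != 0,
          ("seq", ("assign", "r", lambda s: s["x"] % s["y"]),
           ("seq", ("assign", "x", lambda s: s["y"]),
                   ("assign", "y", lambda s: s["r"]))))
```

Running `euclid` from a state with x = 12 and y = 8 terminates with x holding their greatest common divisor.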

Boolean Expressions

Statements are dependent on the tests that we can perform. A comparison is either

(i) true or false;

(ii) a comparison of two expressions; or

(iii) a Boolean combination of comparisons.

We can compare expressions using the relational operators =, < or >. Note that these are the operations of the underlying data type Naturals with Tests that have a domain type involving nat and return type of Bool.

We can combine comparisons with the Boolean operations not, and, and or. Again, note that these are the operations of the underlying data type that have a domain type involving only Bool and return type of Bool.


bnf Boolean Expressions

import Identifiers

rules

〈BooleanExpression〉 ::= true | false |
                        not 〈BooleanExpression〉 |
                        〈BooleanExpression〉 and 〈BooleanExpression〉 |
                        〈BooleanExpression〉 or 〈BooleanExpression〉 |
                        〈Expression〉 〈Relation〉 〈Expression〉

〈Relation〉 ::= = | < | >

Expressions

Both statements and conditional tests are directly dependent on expressions over the signature of the underlying data type Naturals with Tests. We can form an expression as

(i) an identifier,

(ii) a number, or

(iii) a combination of expressions using +, -, *, / or mod.

Again, the operations to combine expressions are those operations of Naturals with Tests which have return type nat.

bnf Expressions

import Identifiers

rules

〈Expression〉 ::= 〈Identifier〉 | 〈Number〉 |
                 〈Expression〉 + 〈Expression〉 |
                 〈Expression〉 - 〈Expression〉 |
                 〈Expression〉 * 〈Expression〉 |
                 〈Expression〉 / 〈Expression〉 |
                 〈Expression〉 mod 〈Expression〉

Identifiers

We construct variables, or identifiers, as strings of letters, digits or underscores that start with a letter.

bnf Identifiers

import Letter, Number

rules

〈IdentifierList〉 ::= 〈Identifier〉 | 〈IdentifierList〉 , 〈Identifier〉

〈Identifier〉 ::= 〈Letter〉 | 〈Identifier〉 〈Letter〉 | 〈Identifier〉 〈Digit〉 | 〈Identifier〉 _
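For comparison (a sketch of ours, not part of the book's grammar formalism), this description of identifiers, a letter followed by any string of letters, digits and underscores, corresponds to a regular expression:

```python
import re

# A letter, then any mix of letters, digits and underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

def is_identifier(s):
    """True when s is a legal identifier of the while language sketched here."""
    return IDENTIFIER.match(s) is not None
```

So, for example, gcd_step1 is accepted, while 1st (starting with a digit) and x-y (containing a hyphen) are rejected.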


Flattened Version

If we take all the component grammars we have constructed, we can produce the flattened grammar:

bnf Flattened While Programs over Natural Numbers

rules

〈WhileProgram〉 ::= program 〈Name〉
                   〈Signature Naturals with Tests〉
                   〈Body〉

〈Signature Naturals with Tests〉 ::=
    signature Naturals with Tests
    sorts nat, bool
    constants 0: → nat; true, false: → bool
    operations +: nat × nat → nat; -: nat × nat → nat; *: nat × nat → nat;
               /: nat × nat → nat; mod: nat × nat → nat;
               =: nat × nat → bool; >: nat × nat → bool; <: nat × nat → bool;
               not: bool → bool; or: bool × bool → bool; and: bool × bool → bool
    endsig

〈Body〉 ::= begin 〈Declarations〉 〈CommandList〉 end

〈Declarations〉 ::= var 〈DeclarationList〉 ; | var ;

〈DeclarationList〉 ::= 〈Declaration〉 ; 〈DeclarationList〉 | 〈Declaration〉

〈Declaration〉 ::= 〈IdentifierList〉 : 〈Type〉

〈Type〉 ::= bool | nat

〈CommandList〉 ::= 〈Command〉 | 〈CommandList〉 ; 〈Command〉

〈Command〉 ::= 〈Statement〉 | 〈I/O〉

〈I/O〉 ::= 〈Read〉 | 〈Write〉

〈Read〉 ::= read (〈IdentifierList〉)

〈Write〉 ::= write (〈IdentifierList〉)


bnf Flattened While Programs over Natural Numbers (continued)

rules

〈Statement〉 ::= 〈Null〉 | 〈Assignment〉 | 〈Conditional〉 | 〈Iteration〉

〈Null〉 ::= skip

〈Assignment〉 ::= 〈Identifier〉 := 〈Expression〉

〈Conditional〉 ::= if 〈Comparison〉 then 〈CommandList〉 else 〈CommandList〉 fi

〈Iteration〉 ::= while 〈Comparison〉 do 〈CommandList〉 od

〈BooleanExpression〉 ::= true | false |
                        not 〈BooleanExpression〉 |
                        〈BooleanExpression〉 and 〈BooleanExpression〉 |
                        〈BooleanExpression〉 or 〈BooleanExpression〉 |
                        〈Expression〉 〈Relation〉 〈Expression〉

〈Relation〉 ::= = | < | >

〈Expression〉 ::= 〈Identifier〉 | 〈Number〉 |
                 〈Expression〉 + 〈Expression〉 |
                 〈Expression〉 - 〈Expression〉 |
                 〈Expression〉 * 〈Expression〉 |
                 〈Expression〉 / 〈Expression〉 |
                 〈Expression〉 mod 〈Expression〉

〈IdentifierList〉 ::= 〈Identifier〉 | 〈IdentifierList〉 , 〈Identifier〉

〈Identifier〉 ::= 〈Letter〉 | 〈Identifier〉 〈Letter〉 | 〈Identifier〉 〈Digit〉 | 〈Identifier〉 _

〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉

〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m |
                n | o | p | q | r | s | t | u | v | w | x | y | z

〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M |
                N | O | P | Q | R | S | T | U | V | W | X | Y | Z

〈Number〉 ::= 〈Digit〉 | 〈NonZeroDigit〉 〈Digits〉

〈Digit〉 ::= 0 | 〈NonZeroDigit〉

〈NonZeroDigit〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

〈Digits〉 ::= 〈Digit〉 | 〈Digit〉 〈Digits〉

The grammar produces basic while programs over natural numbers. Several extensions come to mind, such as adding control constructs (e.g., for-loops, repeat statements, declarations) and further data types. Recall the discussion in Chapter 1, and see the exercises at the end of this chapter.

Example

We can derive the program statement


x := y; y := r; r := x mod y

from the BNF While Programs Over Natural Numbers as follows:

〈CommandList〉
⇒ 〈CommandList〉 ; 〈Command〉
⇒ 〈CommandList〉 ; 〈Statement〉
⇒ 〈CommandList〉 ; 〈Assignment〉
⇒ 〈CommandList〉 ; 〈Identifier〉 := 〈Expression〉
⇒ 〈CommandList〉 ; 〈Letter〉 := 〈Expression〉
⇒ 〈CommandList〉 ; r := 〈Expression〉
⇒ 〈CommandList〉 ; r := 〈Expression〉 mod 〈Expression〉
⇒ 〈CommandList〉 ; r := 〈Identifier〉 mod 〈Expression〉
⇒ 〈CommandList〉 ; r := 〈Letter〉 mod 〈Expression〉
⇒ 〈CommandList〉 ; r := x mod 〈Expression〉
⇒ 〈CommandList〉 ; r := x mod 〈Identifier〉
⇒ 〈CommandList〉 ; r := x mod 〈Letter〉
⇒ 〈CommandList〉 ; r := x mod y
⇒ 〈CommandList〉 ; 〈Command〉 ; r := x mod y
⇒ 〈CommandList〉 ; 〈Statement〉 ; r := x mod y
⇒ 〈CommandList〉 ; 〈Assignment〉 ; r := x mod y
⇒ 〈CommandList〉 ; 〈Identifier〉 := 〈Expression〉 ; r := x mod y
⇒ 〈CommandList〉 ; 〈Letter〉 := 〈Expression〉 ; r := x mod y
⇒ 〈CommandList〉 ; y := 〈Expression〉 ; r := x mod y
⇒ 〈CommandList〉 ; y := 〈Identifier〉 ; r := x mod y
⇒ 〈CommandList〉 ; y := 〈Letter〉 ; r := x mod y
⇒ 〈CommandList〉 ; y := r ; r := x mod y
⇒ 〈Command〉 ; y := r ; r := x mod y
⇒ 〈Statement〉 ; y := r ; r := x mod y
⇒ 〈Assignment〉 ; y := r ; r := x mod y
⇒ 〈Identifier〉 := 〈Expression〉 ; y := r ; r := x mod y
⇒ 〈Letter〉 := 〈Expression〉 ; y := r ; r := x mod y
⇒ x := 〈Expression〉 ; y := r ; r := x mod y
⇒ x := 〈Identifier〉 ; y := r ; r := x mod y
⇒ x := 〈Letter〉 ; y := r ; r := x mod y
⇒ x := y ; y := r ; r := x mod y

11.5.3 Operator Precedence

In this section, we adapt the original version to enforce precedence between the operators.

Boolean Expressions

We shall ensure that not binds more tightly than and, which in turn binds more tightly than or. Thus, a Boolean expression

b1 and not b2 or b3


can only have the interpretation

(b1 and (not b2 )) or b3 .

Any other interpretation must be bracketed suitably. This has the advantage that brackets can be omitted in some circumstances.

bnf Boolean Expressions

import Identifiers

rules

〈Comparison〉 ::= 〈BooleanExpression〉 |
                 〈Expression〉 〈Relation〉 〈Expression〉

〈BooleanExpression〉 ::= 〈BooleanTerm〉 |
                        〈BooleanExpression〉 or 〈BooleanTerm〉

〈BooleanTerm〉 ::= 〈BooleanFactor〉 |
                  〈BooleanTerm〉 and 〈BooleanFactor〉

〈BooleanFactor〉 ::= 〈BooleanAtom〉 | (〈Comparison〉) |
                    not 〈BooleanFactor〉

〈Relation〉 ::= = | < | >

〈BooleanAtom〉 ::= true | false

Expressions

The precedence of operators encoded by these rules is that *, / and mod bind more tightly than + and -.

bnf Expressions

import Identifiers

rules

〈Expression〉 ::= 〈Term〉 | 〈Expression〉 〈AddingOperator〉 〈Term〉

〈Term〉 ::= 〈Factor〉 | 〈Term〉 〈MultiplyingOperator〉 〈Factor〉

〈Factor〉 ::= 〈Atom〉 | (〈Expression〉)

〈AddingOperator〉 ::= + | -

〈MultiplyingOperator〉 ::= * | / | mod

〈Atom〉 ::= 〈Identifier〉 | 〈Number〉

Example

We can derive the program statement

x := x * y + z


from the BNF While Programs Over Natural Numbers as follows:

〈Assignment〉
⇒ 〈Identifier〉 := 〈Expression〉
⇒ 〈Letter〉 := 〈Expression〉
⇒ x := 〈Expression〉
⇒ x := 〈Expression〉 〈AddingOperator〉 〈Term〉
⇒ x := 〈Expression〉 + 〈Term〉
⇒ x := 〈Term〉 + 〈Term〉
⇒ x := 〈Term〉 〈MultiplyingOperator〉 〈Factor〉 + 〈Term〉
⇒ x := 〈Term〉 * 〈Factor〉 + 〈Term〉
⇒ x := 〈Factor〉 * 〈Factor〉 + 〈Term〉
⇒ x := 〈Atom〉 * 〈Factor〉 + 〈Term〉
⇒ x := 〈Identifier〉 * 〈Factor〉 + 〈Term〉
⇒ x := 〈Letter〉 * 〈Factor〉 + 〈Term〉
⇒ x := x * 〈Factor〉 + 〈Term〉
⇒ x := x * 〈Atom〉 + 〈Term〉
⇒ x := x * 〈Identifier〉 + 〈Term〉
⇒ x := x * 〈Letter〉 + 〈Term〉
⇒ x := x * y + 〈Term〉
⇒ x := x * y + 〈Factor〉
⇒ x := x * y + 〈Atom〉
⇒ x := x * y + 〈Identifier〉
⇒ x := x * y + 〈Letter〉
⇒ x := x * y + z

11.5.4 Comparison with Target Syntax

We have seen how we can generate some sample while program fragments over the natural numbers data type from this grammar. However, we can also generate programs which have undesirable features, such as the program fragment:

var x, y: nat;
begin
    x := z;
    z := not(x)
end

In this program, we have not declared the variable z. Also, we have problems with the variable types. First, the variable x has been declared to be of sort nat, yet the function not has been declared in the signature to take an argument of sort bool. Second, the function not has been declared to return a result of sort bool, which would imply that the variable z should have been declared to be of sort bool. This contradicts the implication of the first assignment that x and z are of the same sort, yet x has been declared to be of sort nat.

In general, in this grammar, we cannot enforce such conditions as:

(i) every variable used in the program is declared;


(ii) expressions are assigned to variables of the same sort;

(iii) functions are applied to variables of the right sort; and

(iv) a variable is declared only once.
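Conditions (i) and (iv) are simple enough to sketch as a separate static check. The fragment below is our illustration, not the book's; the representation of the declaration list as (identifier, type) pairs and of the body's usage as a list of identifiers is an assumption.

```python
def check_declarations(declared, used):
    """declared: list of (identifier, type) pairs; used: identifiers in the body."""
    errors = []
    seen = set()
    for name, _ in declared:
        if name in seen:                              # condition (iv)
            errors.append(f"'{name}' declared more than once")
        seen.add(name)
    for name in used:
        if name not in seen:                          # condition (i)
            errors.append(f"'{name}' used but not declared")
    return errors
```

Conditions (ii) and (iii) need the declared types and the sorts of the signature's operations as well, but have the same flavour: they are checks on derived strings, not rules of the grammar.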

11.6 A While Programming Language over a Data Type

In the last section we gave grammars that modelled the syntax of

a while language WP(Σ Naturals with Tests) for computing on a data type of natural numbers with fixed signature Σ Naturals with Tests.

To compute on another data type, we need to adapt the grammars to model the syntax of

a while language WP(Σ) for computing on an arbitrary data type of fixed signature Σ.

This adaptation is informative for, in abstracting from the natural numbers to any data type, we clarify how data and the flow of control are combined in imperative programming. The adaptation can be used to define all sorts of special languages, such as a while language for real number computation, or for computational geometry, or for syntax processing, simply by choosing appropriate signatures.

11.6.1 While Programs for an Arbitrary, Fixed Signature

We can construct a grammar for while programs that compute over the types and operations specified by an arbitrary fixed signature Σ. First, we choose and fix any signature Σ containing the Booleans, as shown below:


signature

sorts       ..., s, ..., Bool

constants
            ...
            c : → s
            ...
            true, false : → Bool

operations
            ...
            f : s(1) × ··· × s(n) → s
            ...
            r : t(1) × ··· × t(m) → Bool
            ...
            not : Bool → Bool
            and : Bool × Bool → Bool
            or : Bool × Bool → Bool

endsig

We use the notation ..., r, ... to highlight the tests on the data. Next, we choose the alphabet and variables, and we remove reference to the operations on natural numbers in Section 11.5.

What changes will we need to make to the grammar of Section 11.5? Obviously, the signature has changed and so we need to change the form of expressions and Boolean expressions that we can construct. Also, variable declarations must now be typed. But, surprisingly, much of the first grammar is reusable.

We illustrate the structure of the grammar that we produce in Figure 11.9.

Signature

We fix the signature in the same way as we did for while programs computing over Naturals with Tests:

bnf FixedSignature

rules
〈FixedSignature〉 ::= signature Fixed
                     sorts ..., s, ..., bool
                     constants ...; c: → s; ...
                     true, false: → bool
                     operations ...; f: s(1) × ··· × s(n) → s; ...
                     ...; r: t(1) × ··· × t(m) → bool; ...
                     not: bool → bool; or: bool × bool → bool; and: bool × bool → bool
                     endsig


Figure 11.9: Architecture of while programs over a fixed signature. Grammars that differ from those for while programs over the natural numbers are indicated by rectangles with non-rounded corners.

Types

The types of the variables that we can use in programs will be the sorts of the fixed signature Σ, namely:

(i) . . . , s , . . . or

(ii) Boolean.

Thus, we define:

bnf Types

rules
〈Type〉 ::= ... | s | ... | bool

Boolean Expressions

As before, a Boolean expression can be

(i) true or false, or

(ii) a Boolean combination of comparisons.

But instead of forming comparisons using the relational operators of Naturals with Tests, we now form comparisons using the relational operators

. . . , r , . . .


of the signature Σ . In addition, we allow identifiers to represent truth values.

bnf BooleanExpressions

import Expressions

rules
〈BooleanExpression〉 ::= 〈Identifier〉 | true | false |
                        not(〈BooleanExpression〉) |
                        and(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        or(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        ··· | r(〈t(1)Expression〉, ..., 〈t(m)Expression〉) | ···

Note that any comparison we build up has to be done in prefix form. For example, we write

not(b)       instead of   not b
and(b1, b2)  instead of   b1 and b2
or(b1, b2)   instead of   b1 or b2.

This might seem a retrograde step, but it fits in with the way in which we are forced to write comparisons involving expressions. The problem is that although we have a syntax for the relations

. . . , r , . . .

on expressions, we do not have any information on the relative priorities of all of these relations, because we are working with a completely general signature and any assumption about priorities could prove invalid.

Although the syntax is clumsier to write, it does mean that the structure of the grammar is more transparent.
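One payoff of the prefix form is that a reader for these expressions needs no precedence table at all: every construct announces itself with its leading symbol. The following Python sketch is our own illustration, not part of the book's language; the token list and tree representation are invented for the example.

```python
# Sketch of a recursive-descent reader for prefix Boolean expressions
# of the forms: identifier, true, false, not(b), and(b1,b2), or(b1,b2).
# Input is a pre-tokenised list of strings (hypothetical format).

def parse_bool(tokens, i=0):
    """Parse one Boolean expression starting at position i.
    Return (tree, position of the next unread token)."""
    t = tokens[i]
    if t in ("true", "false") or (t.isidentifier() and t not in ("not", "and", "or")):
        return t, i + 1
    if t == "not":
        assert tokens[i + 1] == "("
        b, j = parse_bool(tokens, i + 2)
        assert tokens[j] == ")"
        return ("not", b), j + 1
    if t in ("and", "or"):
        assert tokens[i + 1] == "("
        b1, j = parse_bool(tokens, i + 2)
        assert tokens[j] == ","
        b2, k = parse_bool(tokens, j + 1)
        assert tokens[k] == ")"
        return (t, b1, b2), k + 1
    raise SyntaxError(t)

# and(not(b1), or(b2, true)) — no precedence information needed
toks = ["and", "(", "not", "(", "b1", ")", ",",
        "or", "(", "b2", ",", "true", ")", ")"]
tree, _ = parse_bool(toks)
print(tree)
```

Each call dispatches on a single leading token, which is exactly why no knowledge of relative priorities is required.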

Expressions

The way in which we form expressions over the signature Σ is analogous to the manner in which we formed expressions over the signature Naturals with Tests.

As before, an expression can be

(i) an identifier.

But, instead of numbers, we can have

(ii) constants ..., c, ...

from the signature Σ .

And instead of combining expressions with the operations of Naturals with Tests, we

(iii) combine expressions with the operations

. . . , f , . . .

from the signature Σ


At this point, our model of expressions has an additional level of complexity not so evident with forming expressions over the two-sorted Naturals with Tests. The reason why we had two categories of expressions (expressions and Boolean expressions) when dealing with Naturals with Tests is because this signature had two sorts: nat (which yielded expressions) and Bool (which yielded Boolean expressions).

The signature Σ, though, has one or more sorts. We require Σ to have a sort Bool, and optionally to have additional sorts. Each of these different sorts

. . . , s , . . .

corresponds to an expression of sort s or

s-expression.

So for each constant

c : → s

of the signature Σ of sort s, we can form an s-expression:

c

And for each operation

f : s(1) × ··· × s(n) → s

of the signature Σ of range sort s , we can form an s-expression

f (e1 , . . . ,en)

provided that we have s(i)-expressions e_i for i = 1, ..., n.

Thus, the sequence of rules for the nonterminals

. . . , 〈sExpression〉, . . .

try to ensure that the expressions are correctly sorted in accepted strings, but the identifiers are not sorted.

In general, an expression can be an expression of any type; i.e., an expression is a Boolean expression or any of ..., 〈sExpression〉, ....

bnf Expressions

import Identifiers ,Boolean Expressions

rules
〈Expression〉 ::= 〈BooleanExpression〉 |
                 ··· | 〈sExpression〉 | ···
...
〈sExpression〉 ::= 〈Identifier〉 |
                  ··· | c | ··· |
                  ··· | f(〈s(1)Expression〉, ..., 〈s(n)Expression〉) | ···
...

Again, we are forced into abandoning encoding any precedence rules between operators as we do not know what they should be. As with Boolean expressions though, the structure of expressions is more transparent as a result.
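The sorting discipline described above can be phrased as a simple recursive computation of the sort of an expression from a signature. A Python sketch follows; the dictionary representation of signatures and the tuple representation of expressions are our own invention, not the book's notation. Identifiers carry their declared sorts in a separate table, mirroring the fact that the grammar itself cannot sort them.

```python
# Sketch: computing the sort of an expression over a signature Σ.
# A signature maps each constant/operation name to a pair
# (argument sorts, result sort). All names here are illustrative.

def sort_of(expr, signature, var_sorts):
    """expr is a variable name or (operation, [subexpressions])."""
    if isinstance(expr, str):
        return var_sorts[expr]
    op, args = expr
    arg_sorts, result = signature[op]
    actual = [sort_of(a, signature, var_sorts) for a in args]
    if actual != list(arg_sorts):
        raise TypeError(f"{op} expects {arg_sorts}, got {actual}")
    return result

# A small signature with sorts nat and bool
sig = {
    "0":    ((), "nat"),
    "succ": (("nat",), "nat"),
    "eq":   (("nat", "nat"), "bool"),
}
vars_ = {"x": "nat"}
print(sort_of(("eq", [("succ", ["x"]), ("0", [])]), sig, vars_))
```

The recursion follows the shape of the 〈sExpression〉 rules exactly: an operation's expression is well-sorted only if each argument expression has the matching argument sort.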


Flattened Version

We combine the amended grammars for expressions and Boolean expressions with the component grammars for programs, statements, i/o statements, declarations, identifiers, letters and numbers from Section 11.5, to give:

bnf Flattened While Programs with Fixed Interface

rules
〈WhileProgram〉 ::= program 〈Name〉
                   〈FixedSignature〉
                   〈Body〉

〈FixedSignature〉 ::= signature Fixed
                     sorts ..., s, ..., bool
                     constants ...; c: → s; ... true, false: → bool;
                     operations ...; f: s(1) × ··· × s(n) → s; ...
                     ...; r: t(1) × ··· × t(m) → bool; ...
                     not: bool → bool; or: bool × bool → bool;
                     and: bool × bool → bool
                     endsig

〈Body〉 ::= begin 〈Declarations〉 〈CommandList〉 end

〈Declarations〉 ::= var 〈DeclarationList〉 ; | var ;

〈DeclarationList〉 ::= 〈Declaration〉 ; 〈DeclarationList〉 | 〈Declaration〉

〈Declaration〉 ::= 〈IdentifierList〉 : 〈Type〉
〈Type〉 ::= ... | s | ... | bool

〈CommandList〉 ::= 〈Command〉 | 〈CommandList〉 ; 〈Command〉
〈Command〉 ::= 〈Statement〉 | 〈I/O〉
〈I/O〉 ::= 〈Read〉 | 〈Write〉
〈Read〉 ::= read (〈IdentifierList〉)
〈Write〉 ::= write (〈IdentifierList〉)


bnf Flattened While Programs with Fixed Interface (continued)

rules
〈Statement〉 ::= 〈Null〉 | 〈Assignment〉 | 〈Conditional〉 | 〈Iteration〉
〈Null〉 ::= skip
〈Assignment〉 ::= 〈Identifier〉 := 〈Expression〉
〈Conditional〉 ::= if 〈BooleanExpression〉 then 〈CommandList〉 else 〈CommandList〉 fi
〈Iteration〉 ::= while 〈BooleanExpression〉 do 〈CommandList〉 od

〈Expression〉 ::= 〈sExpression〉 | 〈BooleanExpression〉
〈BooleanExpression〉 ::= 〈Identifier〉 | true | false |
                        not(〈BooleanExpression〉) |
                        and(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        or(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        ··· | r(〈t(1)Expression〉, ..., 〈t(m)Expression〉) | ···
...
〈sExpression〉 ::= 〈Identifier〉 |
                  ··· | c | ··· |
                  ··· | f(〈s(1)Expression〉, ..., 〈s(n)Expression〉) | ···
...
〈IdentifierList〉 ::= 〈Identifier〉 | 〈IdentifierList〉 , 〈Identifier〉
〈Identifier〉 ::= 〈Letter〉 | 〈Identifier〉 〈Letter〉 | 〈Identifier〉 〈Digit〉
〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t |
                u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
                R | S | T | U | V | W | X | Y | Z
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

11.6.2 Comparison with Target Syntax

The grammar for this language allows us to form while programs over a fixed data type. However, we still cannot enforce the conditions that we wanted to in the grammar for computing over the natural numbers:

(i) every variable used in the program is declared;

(ii) expressions are assigned to variables of the same sort;

(iii) functions are applied to variables of the right sort; and

(iv) a variable is declared only once.


11.7 A While Programming Language over all Data Types

In Section 11.6.1, we introduced a technique for producing while programming languages which compute over a fixed but arbitrary signature. The expressions in the resulting language are constrained to those which compute using the constants and functions declared in the fixed signature, which contains the built-in type Bool. In this section we consider a single while programming language that can compute over all possible declared signatures.

11.7.1 A Grammar for While Programs over all Data Types

Now we will see the purpose of our approach of designing a grammar which structures strings in its language in two sections. The first section consists of an interface definition, and the second section of the imperative program itself. The intention is that the program section is constrained to compute over the signature declared in the first section of the string. For example, to compute over a data type with signature Σ, we can use a while body S. This program is essentially a pair

(Σ , S ).

In the case of the natural numbers, and of any fixed data type, the signature Σ was given by a single rule and was somewhat redundant as a declaration, as it was common to all programs in the language. However, here we need it declared. The set of while programs is essentially

WP = { (Σ, S) | Σ ∈ Sig and S is a while body }.

The program (Σ , S ) is displayed as:

program signature Σ endsig; begin S end

Programs

Programs now use an interface definition.

bnf While Programs over Any Interface

import Signature,Body

rules
〈WhileProgram〉 ::= program 〈Name〉
                   〈Signature〉
                   〈Body〉

We illustrate the structure of the grammar for while programs over any data type in Figure 11.10.

Interface Declaration

The interface declaration is defined by the grammar Signature given in Sections 11.1 and 11.2. But for ease of reference we reproduce its flattened version here.


Figure 11.10: Architecture of while programs over any data type. Grammars that differ from those for while programs over a fixed signature are indicated by rectangles with non-rounded corners.

bnf Flattened Signature

rules
〈Signature〉 ::= signature 〈Name〉 〈Newline〉 〈Sorts〉 〈Constants〉
                〈Operations〉 endsig

〈Names〉 ::= 〈Name〉 | 〈Name〉 , 〈Names〉
〈Name〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Letter〉 〈Name〉 |
           〈Digit〉 〈Name〉 | 〈Letter〉 〈Name〉 〈Digit〉 〈Name〉
〈Sorts〉 ::= sorts 〈SortList〉 〈Newline〉
〈SortList〉 ::= 〈Sort〉 | 〈Sort〉 , 〈SortList〉 | 〈Sort〉 〈Newline〉 〈SortList〉
〈Sort〉 ::= 〈Name〉
〈Constants〉 ::= constants 〈ConstantList〉 〈Newline〉 |
                constants 〈Newline〉
〈ConstantList〉 ::= 〈Constant〉 | 〈Constant〉 〈Newline〉 〈ConstantList〉
〈Constant〉 ::= 〈Names〉 : 〈Sort〉
〈Operations〉 ::= operations 〈OperationList〉 〈Newline〉 |
                 operations 〈Newline〉
〈OperationList〉 ::= 〈Operation〉 | 〈Operation〉 〈Newline〉 〈OperationList〉
〈Operation〉 ::= 〈Names〉 : 〈DomainSort〉 → 〈Sort〉
〈DomainSort〉 ::= 〈Sort〉 | 〈Sort〉 × 〈DomainSort〉
〈Letter〉 ::= 〈LowerCase〉 | 〈UpperCase〉
〈LowerCase〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t |
                u | v | w | x | y | z
〈UpperCase〉 ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
                R | S | T | U | V | W | X | Y | Z


Declarations

The variable declarations themselves are unchanged:

bnf Declarations

import Identifiers ,Types

rules
〈Declarations〉 ::= var 〈DeclarationList〉 ; |
                   var ;

〈DeclarationList〉 ::= 〈Declaration〉 ; 〈DeclarationList〉 |
                      〈Declaration〉

〈Declaration〉 ::= 〈IdentifierList〉 : 〈Type〉

However, we shall have different types.

Types

The types of the variables that we can use in programs are the sorts of the signature in the interface declaration:

bnf Types

import Sorts

rules
〈Type〉 ::= 〈Sort〉

Here, the sorts are taken from the unflattened definition of signatures defined by the grammar Signature.

Input/Output Statements

The statements concerned with input and output processing remain unaltered.

bnf I /O

import Identifiers

rules
〈I/O〉 ::= 〈Read〉 | 〈Write〉
〈Read〉 ::= read (〈IdentifierList〉)
〈Write〉 ::= write (〈IdentifierList〉)
〈IdentifierList〉 ::= 〈Identifier〉 | 〈IdentifierList〉 , 〈Identifier〉


Statements

The other forms of statements are also unaffected.

bnf Statements

import Expressions ,Boolean Expressions

rules
〈Statement〉 ::= 〈Null〉 | 〈Assignment〉 |
                〈Conditional〉 | 〈Iteration〉
〈Null〉 ::= skip
〈Assignment〉 ::= 〈Identifier〉 := 〈Expression〉
〈Conditional〉 ::= if 〈BooleanExpression〉 then 〈CommandList〉 else 〈CommandList〉 fi
〈Iteration〉 ::= while 〈BooleanExpression〉 do 〈CommandList〉 od

Boolean Expressions

Boolean expressions are similar to those we formed over the signature Fixed. The difference here is that the Boolean expressions we can form using the interface definition depend on what signature is declared. So we cannot say at this point

(i) what the names of any tests will be,

(ii) nor what their arguments will look like, either in terms of their number or type.

We can only give the amorphous description that a Boolean expression can be formed as the application of some relation to some list of expressions.

bnf Boolean Expressions

import Identifiers , Interfaces

rules
〈BooleanExpression〉 ::= 〈Identifier〉 | true | false |
                        not(〈BooleanExpression〉) |
                        and(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        or(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        〈OpName〉 (〈ExpressionList〉)

Expressions

Similar remarks apply to expressions. The expressions that we will be able to form depend on the constants and operations defined in the interface. We cannot know at this point what the names of any constants or operations will be. And we cannot know how many arguments an operation will take, nor what the types of any of those arguments will be.


bnf Expressions

import Identifiers , Interfaces

rules
〈Expression〉 ::= 〈Identifier〉 | 〈ConstName〉 |
                 〈OpName〉 (〈ExpressionList〉) | ···

〈ExpressionList〉 ::= 〈Expression〉 | 〈Expression〉 , 〈ExpressionList〉

Identifiers

The identifiers that we can form remain unchanged.

bnf Identifiers

import Letter ,Number

rules
〈Identifier〉 ::= 〈Letter〉 | 〈Identifier〉 〈Letter〉 |
                 〈Identifier〉 〈Digit〉

Flattened Version

By substituting in the appropriate grammars that are imported, we can form the flattened grammar:


bnf Flattened While Programs over any Interface

rules
〈WhileProgram〉 ::= program 〈Signature〉 ; 〈Body〉
〈Signature〉 ::= signature 〈Name〉 〈Newline〉
                〈Sorts〉 〈Constants〉 〈Operations〉
〈Names〉 ::= 〈Name〉 | 〈Name〉 , 〈Names〉
〈Name〉 ::= 〈Letter〉 | 〈Digit〉 | 〈Letter〉 〈Name〉 | 〈Digit〉 〈Name〉 |
           〈Letter〉 〈Name〉 〈Digit〉 〈Name〉
〈Sorts〉 ::= sorts 〈SortList〉 〈Newline〉
〈SortList〉 ::= 〈Sort〉 | 〈Sort〉 , 〈SortList〉 |
               〈Sort〉 〈Newline〉 〈SortList〉
〈Sort〉 ::= 〈Name〉
〈Constants〉 ::= constants 〈ConstantList〉 〈Newline〉 | constants 〈Newline〉
〈ConstantList〉 ::= 〈Constant〉 | 〈Constant〉 〈Newline〉 〈ConstantList〉
〈Constant〉 ::= 〈Names〉 : 〈Sort〉
〈Operations〉 ::= operations 〈OperationList〉 〈Newline〉 | operations 〈Newline〉
〈OperationList〉 ::= 〈Operation〉 | 〈Operation〉 〈Newline〉 〈OperationList〉
〈Operation〉 ::= 〈Names〉 : 〈DomainSort〉 → 〈Sort〉
〈DomainSort〉 ::= 〈Sort〉 | 〈Sort〉 × 〈DomainSort〉
〈Body〉 ::= 〈Declarations〉 begin 〈CommandList〉 end

〈Declarations〉 ::= var 〈DeclarationList〉 ; | var ;

〈DeclarationList〉 ::= 〈Declaration〉 ; 〈DeclarationList〉 | 〈Declaration〉
〈Declaration〉 ::= 〈IdentifierList〉 : 〈Type〉
〈Type〉 ::= 〈Sort〉
〈CommandList〉 ::= 〈Command〉 | 〈CommandList〉 ; 〈Command〉
〈Command〉 ::= 〈Statement〉 | 〈I/O〉
〈I/O〉 ::= 〈Read〉 | 〈Write〉
〈Read〉 ::= read (〈IdentifierList〉)
〈Write〉 ::= write (〈IdentifierList〉)


bnf Flattened While Programs over any Interface (continued)

rules
〈Statement〉 ::= 〈Null〉 | 〈Assignment〉 | 〈Conditional〉 | 〈Iteration〉
〈Null〉 ::= skip
〈Assignment〉 ::= 〈Identifier〉 := 〈Expression〉
〈Conditional〉 ::= if 〈BooleanExpression〉 then 〈CommandList〉 else 〈CommandList〉 fi
〈Iteration〉 ::= while 〈BooleanExpression〉 do 〈CommandList〉 od

〈BooleanExpression〉 ::= 〈Identifier〉 | true | false |
                        not(〈BooleanExpression〉) |
                        and(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        or(〈BooleanExpression〉, 〈BooleanExpression〉) |
                        〈OpName〉 (〈ExpressionList〉)

〈Expression〉 ::= 〈Identifier〉 | 〈ConstName〉 | 〈OpName〉 (〈ExpressionList〉)

〈ExpressionList〉 ::= 〈Expression〉 | 〈Expression〉 , 〈ExpressionList〉
〈Identifier〉 ::= 〈Letter〉 | 〈Identifier〉 〈Letter〉 | 〈Identifier〉 〈Digit〉
〈Number〉 ::= 〈Digit〉 | 〈Number〉 〈Digit〉
〈Letter〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q |
             r | s | t | u | v | w | x | y | z |
             A | B | C | D | E | F | G | H | I | J | K | L | M | N |
             O | P | Q | R | S | T | U | V | W | X | Y | Z
〈Digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

11.7.2 Comparison with Target Syntax

Our aim was to define a syntax for imperative programs that compute with stores carrying any form of data.

The grammar G_{While Programs over any Interface} produces programs of the desired form. However, it also produces strings that are defective in a number of ways. For example,

(i) we can define interfaces in which sorts are used but not declared;

(ii) we can produce strings in which operations used in the body have not been declared in the signature;

(iii) we can form expressions in which an operation does not have the right number of arguments;

(iv) we can form expressions in which the arguments of an operation are not of the correct types.

These are in addition to the problems with interface and variable declarations that we have already noted.


Exercises for Chapter 11

1. Add rules to the grammar for first-order language specification in Section 11.4.2 to enable variables that appear in the axioms to be declared.

2. Design two metalanguages for writing grammars in BNF:

(a) a kernel language for grammars without import, and

(b) an extended language for grammars with import,

and define them formally using grammars. Recalling the intended meaning of the import mechanism from Section 4.9, give a formal definition using the two languages for grammars by constructing a syntactic transformation that reduces the extended language to the kernel language. (Hint: Since flattening signatures is analogous to flattening grammars, adapt the flattening algorithm for signatures to produce a flattening algorithm for the grammars in the extended grammar language.)

3. Sketch the derivation of a program for Euclid’s algorithm.

4. Using the grammar of Section 11.6 for any fixed signature, derive grammars for while languages with the following data types:

(a) the integers;

(b) the real numbers;

(c) the integers and the real numbers; and

(d) characters and strings.

5. Add terminal symbols, non-terminal symbols and production rules to the grammar of the language WP(Σ) over an arbitrary fixed signature Σ, in order to extend it with:

(a) repeat statements;

(b) for statements;

(c) concurrent assignments; and

(d) case statements.

Which additional constructs can be reduced to the constructs of WP(Σ )?

6. Consider the grammar for the while programming language for all signatures. Sketch how to derive the following programs:

(a) a program over the real numbers whose signature has the sort declaration property and whose body has the variable declaration property, but whose body contains sorts not declared in the signature; and

(b) a program over the real numbers whose signature has the sort declaration property but whose body fails to have the variable declaration property.


Historical Notes and Comments for Chapter 11

To appear.


Chapter 12

Chomsky Hierarchy and Regular Languages

Since we introduced the concepts of formal language, rule, and derivation at the start of Chapter 10 (in Sections 10.1 and 10.2), we have shown how to specify many examples of languages, theoretical and practical, using grammars, small and large. With this simple know-how, and an impression of the vast scope of its application, it is time to investigate grammars and their general properties in depth. It is time to make a mathematical theory of formal languages and grammars.

What do we expect such a theory to analyse and explain? For example, here are some general questions:

Can we find special types of grammar with properties that make them especially useful, interesting or beautiful?

Can we discover algorithms that solve the recognition problem?

Can we understand why certain properties cannot be specified by such a natural notation as BNF?

In this and the next chapter we begin the development of the theory of formal languages and grammars. We will be guided by our needs to understand the syntax of languages, and we will not progress beyond the basics. The theory of formal languages in its general form is vast, deep and the source of many special topics designed to meet other needs. The theory is, after all, about the mathematics of processing strings, which are ubiquitous in Mathematics and Computer Science and their applications, from Business to Biology.

We will concentrate on just two kinds of grammar that are simple because their rules have a simple form.

Now in a grammar G = (T, N, S, P) a rule has the form

u → v

where u ∈ (T ∪ N)+ and v ∈ (T ∪ N)∗.

What are the simplest rules we can use? What makes a rule simple? Now, there are lots of conditions we can place on

(i) u to control where the rule may be used; and

(ii) v to control what is possible after it is used.


First, one of the simplest forms of rule u → v occurs when u is not a complex string but a single non-terminal A ∈ N, i.e.,

A→ v .

Substitutions are simple because wherever one finds A one can replace it by v. Grammars whose rules all have this property are called

context-free grammars.

Secondly, such a context-free rule A → v is further simplified by placing conditions on v. Suppose v is not a complex string of terminals and non-terminals but either

(i) a string u ∈ T∗ of terminals

A → u

after which there can be no more rewrites; or

(ii) a single non-terminal B ∈ N together with a string u ∈ T ∗ of terminals:

A→ uB or A→ Bu

after which there can be one nonterminal to rewrite.

Grammars whose rules all have this property are called

regular grammars.

The classification of grammars using such conditions on rules was started by Noam Chomsky, and is technically and historically one of the first steps in the theory of grammars: see Chapter 2.

In this chapter we will introduce Chomsky's classification of grammars by their rules and start explaining the theory of the simplest type of grammar, the regular grammar.

12.1 Chomsky Hierarchy

The complexity of a grammar arises from the nature and, to a lesser extent, the number of its rules. The grammar for while programs in Section 11.5.2 contained over a hundred production rules but cannot be considered complicated to use. One reason is that its rules have a simple form.

Let G be a grammar with a set T of terminals and a set N of non-terminals. The following classification of G, by means of the four properties of its production rules P, is called the Chomsky Hierarchy.

Definition (Unrestricted) A grammar G is said to be of Type 0 or unrestricted if, and only if, it is any grammar of the kind already defined, i.e., we place no restrictions on the production rules, which all have the form

u→ v

where u ∈ (T ∪ N )+ is a non-empty string and v ∈ (T ∪ N )∗ is any string.


Definition (Context-Sensitive) A grammar G is said to be of Type 1 or context-sensitive if, and only if, all its production rules have the form

uAv → uwv

where A ∈ N is a non-terminal which rewrites to a non-empty string w ∈ (T ∪ N)+, but only where A is in the context of the strings u, v ∈ (T ∪ N)∗.

Definition (Context-Free) A grammar G is said to be of Type 2 or context-free if, and only if, all its production rules have the form

A→ w

where A ∈ N is a non-terminal that can always be used to rewrite to a string w ∈ (T ∪ N )∗.

Definition (Regular) A grammar G is said to be of Type 3 or regular if, and only if, either all its production rules have the form:

A→ aB or A→ a

or else all its production rules have the form:

A→ Ba or A→ a

where A,B ∈ N are non-terminals and a ∈ T is a terminal.

Each of the four types of grammar defines a type of language as follows.

Definition (Languages) A language L ⊆ T∗ is unrestricted, context-sensitive, context-free or regular if, and only if, there exists a grammar G of the relevant type, such that L(G) = L.

The four conditions on grammars are related as follows: for any L,

L regular ⇒ L context-free ⇒ L context-sensitive ⇒ L unrestricted

and are illustrated in Figure 12.1.

Figure 12.1: The Chomsky hierarchy.
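The definitions above translate directly into checks on the shapes of rules. The following Python sketch is our own illustration, not from the book: it encodes rules as pairs of strings with uppercase letters for non-terminals and lowercase letters for terminals, and glosses over boundary cases such as ε-rules.

```python
# Sketch: classifying a grammar's rules against the Chomsky hierarchy.
# Rules are (lhs, rhs) pairs of strings; uppercase = non-terminal,
# lowercase = terminal. Epsilon-rules and multi-character symbols are
# deliberately ignored in this illustration.

def is_context_free(rules):
    # Type 2: every rule is A -> w with A a single non-terminal
    return all(len(l) == 1 and l.isupper() for l, r in rules)

def is_regular(rules):
    # Type 3: A -> a or A -> aB for every rule (right-linear) ...
    right = all(len(l) == 1 and l.isupper() and
                (len(r) == 1 and r.islower() or
                 len(r) == 2 and r[0].islower() and r[1].isupper())
                for l, r in rules)
    # ... or A -> a or A -> Ba for every rule (left-linear);
    # a mixture of the two forms does not count as regular
    left = all(len(l) == 1 and l.isupper() and
               (len(r) == 1 and r.islower() or
                len(r) == 2 and r[0].isupper() and r[1].islower())
               for l, r in rules)
    return right or left

an = [("S", "a"), ("S", "aS")]        # a^n, n >= 1: regular
anbn = [("S", "ab"), ("S", "aSb")]    # a^n b^n: context-free, not regular
print(is_regular(an), is_context_free(an))
print(is_regular(anbn), is_context_free(anbn))
```

The rule S → aSb fails both linear templates because its right-hand side buries the non-terminal between terminals, which is precisely what regular rules forbid.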


12.1.1 Examples of Equivalent Grammars

Consider the language

L_{a^2n} = { a^i | i is even }.

We will give four different grammars that define this language, one from each of the four types in the Chomsky Hierarchy. We can generate this language using an unrestricted grammar:

grammar       G_{unrestricted a^2n}

terminals     a

nonterminals  S

start symbol  S

productions   S → ε
              S → aa
              a → aaa

or using a context-sensitive grammar:

grammar       G_{context-sensitive a^2n}

terminals     a

nonterminals  S

start symbol  S

productions   S → ε
              S → aaS
              aS → aSaa

or using the context-free grammar of Examples 10.2.2(3):

grammar       G_{context-free a^2n}

terminals     a

nonterminals  S

start symbol  S

productions   S → ε
              S → aSa

or using the regular grammar:

Page 483: Data, Syntax and Semantics - Better Together Toolkit

12.1. CHOMSKY HIERARCHY 453

grammar       G_{regular a^2n}

terminals     a

nonterminals  S, A

start symbol  S

productions   S → ε
              S → aA
              A → aS

Note that, as illustrated in Figure 12.2,

(i) the grammar G_{unrestricted a^2n} is unrestricted and is not context-sensitive (and hence, is not context-free or regular) because of the production rule a → aaa;

(ii) the grammar G_{context-sensitive a^2n} is context-sensitive (and hence, also unrestricted), and is not context-free (and hence, not regular) because of the production rule aS → aSaa;

(iii) G_{context-free a^2n} is context-free (and hence, also context-sensitive and unrestricted), and is not regular because of the production rule S → aSa; and

(iv) G_{regular a^2n} is regular (and hence, also context-free, context-sensitive and unrestricted).

Each of these grammars generates the language L_{a^2n}:

L(G_{unrestricted a^2n}) = L(G_{context-sensitive a^2n}) = L(G_{context-free a^2n}) = L(G_{regular a^2n}) = L_{a^2n}.

Finally, we conclude that the language L_{a^2n} is regular, as we can find a regular grammar G_{regular a^2n} such that L_{a^2n} = L(G_{regular a^2n}).
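One way to convince ourselves that the regular grammar really does generate L_{a^2n} is to enumerate its derivations mechanically. The following Python sketch (our own illustration, not the book's) rewrites sentential forms breadth-first, collecting every terminal string up to a length bound; uppercase characters play the non-terminals S and A.

```python
# Sketch: enumerating short strings of the regular grammar
# S -> ε | aA, A -> aS by breadth-first rewriting, to check that it
# generates exactly the even-length strings of a's.

from collections import deque

def generate(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    seen, results = set(), set()
    queue = deque([start])
    while queue:
        s = queue.popleft()
        # prune sentential forms too long to yield a short enough string
        if s in seen or len(s) > max_len + 1:
            continue
        seen.add(s)
        if s == "" or s.islower():          # no non-terminals left
            results.add(s)
            continue
        for lhs, rhs in rules:
            i = s.find(lhs)
            while i != -1:                  # rewrite every occurrence of lhs
                queue.append(s[:i] + rhs + s[i + len(lhs):])
                i = s.find(lhs, i + 1)
    return results

rules = [("S", ""), ("S", "aA"), ("A", "aS")]
strings = generate(rules, "S", 6)
print(sorted(strings, key=len))
```

Up to length 6 the grammar yields exactly the empty string, aa, aaaa and aaaaaa, i.e., the even-length strings of a's.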

12.1.2 Examples of Grammar Types

1. All the grammars of Chapter 10 are context-free.

2. The grammars developed to generate the while languages in Sections 11.5.2–11.7 are context-free.

3. The grammars for languages for interface definition and data type specifications in Sections 11.1 and 11.4 are context-free.

4. The grammar


Figure 12.2: Equivalent grammars to generate the language L_{a^2n} = { a^i | i is even }.

grammar       G

terminals     a, b

nonterminals  S

start symbol  S

productions   S → aSa
              S → bSb
              S → ε

is context-free. The language generated by this grammar is

L = { xx^R | x ∈ {a, b}∗ }

where x^R is the reverse of x.

5. The grammar


grammar       G_{a^n}

terminals     a

nonterminals  S, A, B

start symbol  S

productions   S → a
              S → aS

is regular. The language defined by this grammar is

L(G_{a^n}) = { a^n | n ≥ 1 }.

6. The grammar

grammar       G_{a^n b^n}

terminals     a, b

nonterminals  S

start symbol  S

productions   S → ab
              S → aSb

is context-free. The language defined by this grammar is

L(G_{a^n b^n}) = { a^n b^n | n ≥ 1 }.
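The matched count of a's and b's in this language is exactly the kind of property that, as the text goes on to show, lies beyond regular grammars. A small Python recogniser (our own illustration, not from the book) makes the counting explicit:

```python
# Sketch: a direct recogniser for L(G_{a^n b^n}) = { a^n b^n | n >= 1 }.
# A member must be a block of a's followed by an equally long block of
# b's; it is this matched count that no regular grammar can express.

def in_anbn(s):
    n = len(s) // 2
    return len(s) >= 2 and len(s) % 2 == 0 and s == "a" * n + "b" * n

print([w for w in ("ab", "aabb", "aab", "ba", "") if in_anbn(w)])
```

The recogniser must, in effect, remember how many a's it has seen before checking the b's, which is the intuition behind the later proof that the language is not regular.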

7. The grammar

grammar       G_{a^n b^n c^n}

terminals     a, b, c

nonterminals  S, A, B

start symbol  S

productions   S → abc       Ab → bA      bB → Bb
              S → aAbc      Ac → Bbcc    aB → aaA
              aB → aa

is context-sensitive. The language defined by this grammar is

L(G_{a^n b^n c^n}) = { a^n b^n c^n | n ≥ 1 }.

A sample derivation using G_{a^n b^n c^n} is given in Figure 12.3. To add clarity, the string that has been matched against the left-hand side of a production rule has been highlighted. Later, in Chapter 14, we will see that this language is not context-free.


S ⇒ aAbc     (S → aAbc)
  ⇒ abAc     (Ab → bA)
  ⇒ abBbcc   (Ac → Bbcc)
  ⇒ aBbbcc   (bB → Bb)
  ⇒ aabbcc   (aB → aa)

Figure 12.3: Derivation of aabbcc in the context-sensitive grammar G_{a^n b^n c^n}.
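The behaviour of this grammar can also be explored mechanically. The Python sketch below is our own illustration (the function name sentences is ours): it enumerates sentential forms breadth-first, and since no rule of G_{a^n b^n c^n} shortens a sentential form, any form longer than the target length can safely be discarded.

```python
from collections import deque

# Productions of the context-sensitive grammar for {a^n b^n c^n | n >= 1};
# upper-case letters are non-terminals, lower-case letters are terminals.
RULES = [
    ("S", "abc"), ("S", "aAbc"),
    ("Ab", "bA"), ("Ac", "Bbcc"),
    ("bB", "Bb"), ("aB", "aaA"), ("aB", "aa"),
]

def sentences(max_len):
    """Breadth-first search over sentential forms. Because no rule shortens
    a form, forms longer than max_len can never yield a short sentence."""
    seen = {"S"}
    queue = deque(["S"])
    found = set()
    while queue:
        form = queue.popleft()
        if form.islower():              # no non-terminals left: a sentence
            found.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:          # rewrite every occurrence of lhs
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return found

print(sorted(sentences(9)))
```

For max_len = 9 the search finds exactly abc, aabbcc and aaabbbccc: the members of the language of length at most 9.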

In this, and subsequent chapters, we shall prove the following.

• There does not exist a regular grammar G such that L(G) = L(G_{a^n b^n}). This, combined with the fact that G_{a^n b^n} is context-free, determines that the language L(G_{a^n b^n}) is context-free but not regular.

• There does not exist a context-free grammar G such that L(G) = L(G_{a^n b^n c^n}). This, combined with the fact that G_{a^n b^n c^n} is context-sensitive, determines that the language L(G_{a^n b^n c^n}) is context-sensitive but not context-free.

This is illustrated in Figure 12.4.

Figure 12.4: Hierarchy of the languages L(G_{a^n}), L(G_{a^n b^n}) and L(G_{a^n b^n c^n}).


The form of the rules has a profound effect on the derivations. We will consider some special properties of the derivations that are possible with regular and with context-free grammars.

12.2 Regular Languages

First, we look at a very simple set of languages, the

regular languages.

These languages allow very rudimentary strings to be built up. For example, they can describe how a program identifier, program keyword, or a number may be constructed. But, as we will see, they cannot describe languages that require the ability to nest an arbitrary number of matching pairs of symbols, such as parentheses or begin-end pairs.

Regular languages are generated by grammars whose production rules all conform to a very limited template. Whilst this imposes great limitations on the languages that can be produced, it does mean that these languages can be processed very efficiently.

12.2.1 Regular Grammars

We divide the regular grammars into two categories:

left-linear grammars and right-linear grammars

based on the form of their production rules.Left-linear grammars build strings by expanding possibilities on the left and generating

symbols from the right, and right-linear grammars build strings by expanding possibilities onthe right and generating symbols from the left. There is no difference in the set of languagesthat left-linear and right-linear grammars can produce (exercise). It is purely a matter ofstylistic preference in their theoretical study; practical implementations, of course, may favourone over another.

Definition (Left-Linear Grammars) A grammar is left-linear if all its production rules have the form:

A → Bu or A → u

where A and B are non-terminal symbols and u is a string of terminal symbols.

Definition (Right-Linear Grammars) A grammar is right-linear if all its production rules have the form:

A → uB or A → u

where A and B are non-terminal symbols and u is a string of terminal symbols.
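The two rule shapes can be checked mechanically. In the Python sketch below (our own illustration: grammars are encoded as lists of (A, rhs) pairs, with non-terminals written as upper-case letters and terminals as lower-case letters), the predicates test whether every rule fits the required template.

```python
def terminals_only(u):
    """True if u is a (possibly empty) string of terminals (lower case)."""
    return all(ch.islower() for ch in u)

def is_left_linear(rules):
    """Every rule must have the shape A -> Bu or A -> u."""
    return all(len(a) == 1 and a.isupper() and
               (terminals_only(rhs) or
                (rhs[:1].isupper() and terminals_only(rhs[1:])))
               for a, rhs in rules)

def is_right_linear(rules):
    """Every rule must have the shape A -> uB or A -> u."""
    return all(len(a) == 1 and a.isupper() and
               (terminals_only(rhs) or
                (rhs[-1:].isupper() and terminals_only(rhs[:-1])))
               for a, rhs in rules)

# G_{a^n b^n} (S -> ab, S -> aSb) satisfies neither shape:
print(is_left_linear([("S", "ab"), ("S", "aSb")]),
      is_right_linear([("S", "ab"), ("S", "aSb")]))   # False False
```

The rule S → aSb fails both tests because its non-terminal sits in the middle of the right-hand side, which is exactly what the two definitions forbid.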

12.2.2 Examples of Regular Grammars

Here are some examples of regular grammars and the languages they generate.

Finite Languages The following is an example of a grammar that happens to satisfy the conditions for being both left-linear and right-linear, as there are no non-terminals in the right-hand side of any of the production rules.


grammar G_{ab,aabb,aaabbb}

terminals a, b

nonterminals S

start symbol S

productions S → ab
            S → aabb
            S → aaabbb

As there are no non-terminal symbols on the right-hand side of any of the production rules, the language that is generated is finite. In this particular case, the language L(G_{ab,aabb,aaabbb}) generated by the grammar G_{ab,aabb,aaabbb} contains precisely three strings, namely ab, aabb and aaabbb.

We note the following fact, leaving its proof as an exercise:

Lemma (Finite Languages are Regular) Any finite language L ⊆ T ∗ is a regular language.
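The proof idea is simply to list the strings: give one production S → w for each word w of the finite language. A minimal sketch of this construction (the function name is ours):

```python
def grammar_for_finite_language(words):
    """One production S -> w per word w; no right-hand side contains a
    non-terminal, so every rule is both left-linear and right-linear."""
    return [("S", w) for w in sorted(words)]

print(grammar_for_finite_language({"ab", "aabb", "aaabbb"}))
# → [('S', 'aaabbb'), ('S', 'aabb'), ('S', 'ab')]
```

Every right-hand side is a string of terminals, so the resulting rules satisfy both the left-linear and the right-linear templates, which is the substance of the lemma.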

Infinite Languages The following two regular grammars

G^{left-linear}_{a^m b^n} and G^{right-linear}_{a^m b^n}

both generate the language

{a^m b^n | m, n ≥ 1}.

This is a set containing an infinite number of strings. The first of these grammars is left-linear:

grammar G^{left-linear}_{a^m b^n}

terminals a, b

nonterminals S, T

start symbol S

productions S → Sb
            S → T
            T → Ta
            T → a

The other is right-linear:


grammar G^{right-linear}_{a^m b^n}

terminals a, b

nonterminals S, T

start symbol S

productions S → aS
            S → T
            T → bT
            T → b

Numbers The following is a right-linear grammar for generating natural numbers in base 10 notation. Note that the grammar disallows numbers with leading zeros, so it generates the language

L = {0, 1, 2, . . .}.

grammar Regular Number

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals digit, digits, number

start symbol number

productions number → digit
            number → 1 digits
            number → 2 digits
            ...
            number → 9 digits
            digits → 0 digits
            digits → 1 digits
            ...
            digits → 9 digits
            digits → digit
            digit → 0
            digit → 1
            ...
            digit → 9


As an example of a derivation using this grammar, we generate the string 1984:

number ⇒ 1 digits

⇒ 19 digits

⇒ 198 digits

⇒ 198 digit

⇒ 1984
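Looking ahead to Chapter 13, where the correspondence between regular grammars and regular expressions is made precise, the language of this grammar can also be described by a short regular expression. The transcription below is our own (a single digit, or a non-zero digit followed by further digits):

```python
import re

# Our regular-expression transcription of the grammar Regular Number:
# the single digit 0..9, or a non-zero digit followed by any further digits.
NUMBER = re.compile(r"0|[1-9][0-9]*")

def is_number(s):
    return NUMBER.fullmatch(s) is not None

print([is_number(s) for s in ["1984", "0", "7", "0123", ""]])
# → [True, True, True, False, False]
```

As with the grammar, strings with leading zeros, such as 0123, are rejected.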

Program Identifiers The following is a left-linear grammar for generating program identifiers, which we define to be strings that start with a letter which can then be followed by any mix of letters, digits or underscores.

grammar Regular Identifier

terminals a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, _, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals identifier, letter

start symbol identifier

productions identifier → identifier a
            identifier → identifier b
            ...
            identifier → identifier z
            identifier → identifier A
            identifier → identifier B
            ...
            identifier → identifier Z
            identifier → identifier _
            identifier → identifier 0
            identifier → identifier 1
            ...
            identifier → identifier 9
            identifier → letter
            letter → a
            letter → b
            ...
            letter → z
            letter → A
            letter → B
            ...
            letter → Z
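The identifier grammar, too, has a compact regular-expression transcription (our own sketch): a letter, followed by any mix of letters, digits and underscores.

```python
import re

# Our regular-expression transcription of the grammar Regular Identifier.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_identifier(s):
    return IDENTIFIER.fullmatch(s) is not None

print([is_identifier(s) for s in ["x", "total_2", "Swansea", "_x", "2nd"]])
# → [True, True, True, False, False]
```

The last two examples fail because, as in the grammar, the first symbol must be a letter.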


Postcodes

We can give the language for postcodes from Section 10.4.1 as a regular grammar. Recall that a postcode has one of three forms:

Letter1 Letter2 Digit1 Digit2 Digit3 Letter3 Letter4,
Letter1 Digit1 Digit2 Digit3 Letter3 Letter4, or
Letter1 Letter2 Digit1 Digit3 Letter3 Letter4.

Thus, the first symbol is always a letter, the second is a letter or a digit, the third is always a digit, and so on.


grammar Regular Postcodes

terminals A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals postcode, symbol2, digit1, digit2, digit3, letter3, letter4

start symbol postcode

productions postcode → A symbol2
            ...
            postcode → Z symbol2
            symbol2 → A digit1
            ...
            symbol2 → Z digit1
            symbol2 → 0 digit2
            ...
            symbol2 → 9 digit2
            digit1 → 0 digit2
            ...
            digit1 → 9 digit2
            digit1 → 0 digit3
            ...
            digit1 → 9 digit3
            digit2 → 0 digit3
            ...
            digit2 → 9 digit3
            digit3 → 0 letter3
            ...
            digit3 → 9 letter3
            letter3 → A letter4
            ...
            letter3 → Z letter4
            letter4 → A
            ...
            letter4 → Z


As an example of a derivation using this grammar, we generate the postcode SA2 8PP:

postcode ⇒ S symbol2

⇒ SA digit1

⇒ SA2 digit3

⇒ SA2 8 letter3

⇒ SA2 8P letter4

⇒ SA2 8PP
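The three postcode shapes can be transcribed into a regular expression (our own sketch; since the grammar's terminals include no space symbol, the conventional space inside a postcode is stripped before matching):

```python
import re

# Our regex transcription of the three postcode shapes of the grammar.
POSTCODE = re.compile(
    r"[A-Z]{2}[0-9]{3}[A-Z]{2}"     # Letter Letter Digit Digit Digit Letter Letter
    r"|[A-Z][0-9]{3}[A-Z]{2}"       # Letter Digit Digit Digit Letter Letter
    r"|[A-Z]{2}[0-9]{2}[A-Z]{2}"    # Letter Letter Digit Digit Letter Letter
)

def is_postcode(s):
    # The grammar has no space terminal, so strip the conventional space.
    return POSTCODE.fullmatch(s.replace(" ", "")) is not None

print([is_postcode(s) for s in ["SA2 8PP", "WR14 3PS", "M1 1AA"]])
# → [True, True, False]
```

The last example fails because the simplified grammar of Section 10.4.1 has no Letter Digit Digit Letter Letter form; it is not a complete model of real postcodes.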

Non-regular grammar

A grammar G = (T, N, S, P) whose set P of production rules contains both left-linear and right-linear rules can define languages that are not regular. Thus, mixing the two types of rule does not lead to a regular grammar. To see the reason for this, consider the following grammar G with precisely this property:

grammar G

terminals a, b

nonterminals S, A

start symbol S

productions S → aA
            S → ab
            A → Sb

The language generated by G is

L(G) = {a^n b^n | n ≥ 1}.

Clearly, L(G) is context-free because G is context-free. Shortly, we will show that L(G) is not regular.

12.3 Languages Generated by Regular Grammars

We can take advantage of the simple structure of regular grammars to

• predict how a string will be built up from a given grammar, and

• help design a grammar for a regular language.

12.3.1 Examples of Regular Derivations

We take the two grammars

G^{left-linear}_{a^m b^n} and G^{right-linear}_{a^m b^n}


from Section 12.2.2 which generate the same language

{a^m b^n | m, n ≥ 1}

but

• G^{left-linear}_{a^m b^n} is left-linear, and

• G^{right-linear}_{a^m b^n} is right-linear.

We consider the derivation of the string:

aabbb

In the left-linear grammar G^{left-linear}_{a^m b^n}, we derive aabbb by:

S ⇒ Sb

⇒ Sbb

⇒ Sbbb

⇒ Tbbb

⇒ Tabbb

⇒ aabbb

There are three things to note in this derivation:

(i) at each step whilst we are still building it up, there is exactly one non-terminal in the string;

(ii) this non-terminal always appears at the left end of the string being created; and

(iii) the string is built from the last terminal symbol, from right to left, through to the first terminal.

Now, we compare this derivation to that which results from the right-linear grammar G^{right-linear}_{a^m b^n} for aabbb.

S ⇒ aS

⇒ aaS

⇒ aaT

⇒ aabT

⇒ aabbT

⇒ aabbb

Again, we note three things in this derivation:

(i) at each step whilst we are still building it up, there is exactly one non-terminal in the string;

(ii) this non-terminal always appears at the right end of the string being created; and

(iii) the string is built from the first terminal, from left to right, through to the last terminal.

In fact, the observations we have made about these two derivations are general to regular languages. We spell this out in the next section.
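The two derivation patterns can be replayed mechanically. In the Python sketch below (our own; derive is a hypothetical helper), each sentential form contains exactly one upper-case non-terminal, and that is the symbol replaced at every step:

```python
def derive(start, steps):
    """Replay a linear derivation: at each step the unique upper-case
    non-terminal in the current form is replaced via the given production."""
    forms = [start]
    for lhs, rhs in steps:
        form = forms[-1]
        i = next(j for j, ch in enumerate(form) if ch.isupper())
        assert form[i] == lhs            # the rule must match that non-terminal
        forms.append(form[:i] + rhs + form[i + 1:])
    return forms

# Left-linear derivation of aabbb: the non-terminal stays at the left end.
left = derive("S", [("S", "Sb"), ("S", "Sb"), ("S", "Sb"),
                    ("S", "T"), ("T", "Ta"), ("T", "a")])
# Right-linear derivation of aabbb: the non-terminal stays at the right end.
right = derive("S", [("S", "aS"), ("S", "aS"), ("S", "T"),
                     ("T", "bT"), ("T", "bT"), ("T", "b")])
print(left)    # ['S', 'Sb', 'Sbb', 'Sbbb', 'Tbbb', 'Tabbb', 'aabbb']
print(right)   # ['S', 'aS', 'aaS', 'aaT', 'aabT', 'aabbT', 'aabbb']
```

The printed intermediate forms reproduce the two derivations of aabbb above, making the left-end versus right-end position of the non-terminal easy to see.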


12.3.2 Structure of Regular Derivations

The very restricted form of production rules in regular grammars results in derivations that follow a predictable fashion. In a left-linear grammar, strings are built up by repeatedly replacing the non-terminal symbol that is present as the first symbol of a string. Thus, given a string

a1a2 · · · an

generated by a left-linear grammar, the first production rule applied that has a terminal symbol appearing in its right-hand side will produce the last symbol(s) of the string: the last terminal present in the production rule will be an. If there is more than one terminal present in the rule, these will be ai · · · an. This pattern is repeated, with the string being created from right to left, and with only ever one non-terminal present, at the left end of the string, until this single non-terminal is replaced by a string of terminals.

In a right-linear grammar, there is also exactly one non-terminal present in a string whilst it is being built up. However, this non-terminal is at the right end of the string, until it is replaced by a string of terminals. The string is created from terminal symbols from left to right.

This pattern of the way in which strings are derived in regular grammars is captured in the following two lemmas. The first lemma deals with derivations whose final string still contains a non-terminal, and the second considers complete derivations, whose final string is a sentence of terminal symbols.

Lemma (Regular Derivations) Let G = (T, N, S, P) be a regular grammar.

A string w ∈ (N ∪ T)* involving at least one non-terminal can be derived in G via the derivation sequence

w0 ⇒ w1 ⇒ w2 ⇒ · · · ⇒ wn

where

w0 = S and w = wn

if, and only if, there exist strings of terminals u1, . . . , un ∈ T* and non-terminals A0, A1, . . . , An ∈ N, such that:

(a) for each of the stages 0 ≤ i ≤ n, the string wi has the form

wi = Ai ui · · · u1 if G is left-linear, or
wi = u1 · · · ui Ai if G is right-linear;

(b) for each of the stages 0 ≤ i ≤ n − 1, we continue growing the string with the production rule

Ai → Ai+1 ui+1 ∈ P if G is left-linear, or
Ai → ui+1 Ai+1 ∈ P if G is right-linear.

Proof We prove this by induction on the length of derivation sequences.


Base Case n = 1 :

By definition of grammar derivations,

S ⇒ w

if, and only if, there exists some production rule

S → w ∈ P .

But the lemma requires us to have at least one non-terminal in w, so, given the possible production rules in a regular grammar, the possible derivations are

S ⇒ A1 u1 if G is left-linear, or
S ⇒ u1 A1 if G is right-linear.

This is possible precisely when

S → A1 u1 ∈ P if G is left-linear, or
S → u1 A1 ∈ P if G is right-linear.

Induction Step Suppose that the Lemma holds for all values of n ≤ k .

First, we consider the “if” side of the lemma.

Then for derivation sequences of length n = k + 1 , we have:

S ⇒k wk ⇒ wk+1

Now, by the Induction Hypothesis, we know that wk has the form

wk = Ak uk · · · u1 if G is left-linear, or
wk = u1 · · · uk Ak if G is right-linear,

and that for 0 ≤ i ≤ k − 1,

Ai → Ai+1 ui+1 ∈ P if G is left-linear, or
Ai → ui+1 Ai+1 ∈ P if G is right-linear.

So now let us consider the last derivation step from wk to wk+1 :

S ⇒k Akuk · · · u1 ⇒ wk+1 if G is left-linear, or

S ⇒k u1 · · · ukAk ⇒ wk+1 if G is right-linear.

The lemma requires wk+1 to have at least one non-terminal, so this restricts the possibility to rules of the form:

Ak → Ak+1 uk+1 if G is left-linear, or
Ak → uk+1 Ak+1 if G is right-linear.


So, this will produce the derivation sequence

S ⇒k Akuk · · · u1 ⇒ Ak+1uk+1uk · · · u1 if G is left-linear, or

S ⇒k u1 · · · ukAk ⇒ u1 · · · ukuk+1Ak+1 if G is right-linear.

Now, consider the “only if” side of the lemma. This follows immediately from the definition of derivation sequences. □

Now we use this lemma to consider what happens when the derivation is complete, ending with a string consisting entirely of terminal symbols.

Lemma (Regular Languages Sentence Derivations) Let G = (T, N, S, P) be a regular grammar.

A string w ∈ L(G) can be derived in G via the derivation sequence

w0 ⇒ w1 ⇒ w2 ⇒ · · · ⇒ wn

where

w0 = S and wn = w ∈ L(G)

if, and only if, there exist strings of terminals u1, . . . , un ∈ T* and non-terminals A0, A1, . . . , An−1 ∈ N, such that the following hold.

(a) For each of the non-final stages 0 ≤ i ≤ n − 1, the string wi has the form

wi = Ai ui · · · u1 if G is left-linear, or
wi = u1 · · · ui Ai if G is right-linear;

(b) and at the final stage n,

wn = un un−1 · · · u1 if G is left-linear, or
wn = u1 · · · un−1 un if G is right-linear.

(c) For each of the non-final stages 0 ≤ i < n − 1, we have a production rule

Ai → Ai+1 ui+1 ∈ P if G is left-linear, or
Ai → ui+1 Ai+1 ∈ P if G is right-linear;

(d) and the production at the final stage n,

An−1 → un ∈ P

produces only terminal symbols.

Proof The intermediate stages (cases (a) and (c)) follow immediately from the Regular Derivations Lemma.

For cases (b) and (d), we again split the proof into “if” and “only if” parts.


For the “if” case, we know from (a) that

wn−1 = An−1 un−1 · · · u1 if G is left-linear, or
wn−1 = u1 · · · un−1 An−1 if G is right-linear.

We are also required to produce a string of terminals, so wn ∈ T*. Thus, by the nature of regular grammars, the final production rule applied will be of the form

An−1 → un ∈ P

where An−1 ∈ N and un ∈ T*, so we have a derivation of the form

S ⇒∗ An−1 un−1 · · · u1 ⇒ un un−1 · · · u1 if G is left-linear, or
S ⇒∗ u1 · · · un−1 An−1 ⇒ u1 · · · un−1 un if G is right-linear.

The “only if” part of the proof follows directly from the definition of derivation sequences. □

This lemma tells us how to go about creating a grammar for a regular language, but not all languages are regular. If we cannot design a regular grammar for a language, how do we know whether we have not tried hard enough to create the grammar, or whether the language is not regular in the first place? The following section gives us a tool we can use to answer this question.

12.4 The Pumping Lemma for Regular Languages

The Pumping Lemma for Regular Languages takes advantage of the form of strings created by regular grammars, and, as we shall see in Chapter 14, the Pumping Lemma for Context-Free Languages takes advantage of the form of strings created by context-free grammars.

The principle used in these Pumping Lemmas is the dichotomy between

• the finiteness of a grammar: it has only a finite number of terminal symbols, non-terminal symbols and production rules, and a single start symbol; and

• the infinite number of strings that a grammar can generate.

The form of a grammar restricts the way we can grow strings to an arbitrary length. Roughly speaking,

• a regular grammar can allow for repetition of one segment of a string (e.g., {a^n | n ≥ 1}), whereas

• a context-free grammar can allow for repetition of two segments of a string (e.g., {a^n b^n | n ≥ 1}), whereas

• a context-sensitive grammar can allow for repetition of three or more segments of a string (e.g., {a^n b^n c^n | n ≥ 1}).

So, for example,


• the language {u a^n w | n ≥ 1} is regular,

• the language {u a^n w b^n y | n ≥ 1} is context-free,

• the languages {u a^n w b^n y c^n z | n ≥ 1} and {u a^n w b^n y c^n d^n z | n ≥ 1} are context-sensitive.

Theorem (Pumping Lemma for Regular Languages) Let G be a regular grammar with a set T of terminals. Then there exists a number k = k(G) ∈ N, depending on the grammar G, such that any string z ∈ L(G) with length |z| > k can be written as

z = uvw

the concatenation of strings u, v, w ∈ T*, where:

(a) the length of the string |v| ≥ 1 (i.e., v is not the empty string);

(b) the length of the portion |uv| ≤ k; and

(c) for all i ≥ 0, the string uv^i w ∈ L(G).

Before proving the result, let us explore what it means. The theorem says that for any regular grammar G there is a number k = k(G) that depends on the grammar G such that for any string z that is longer than length k, i.e.,

|z| > k,

it is possible to split up the string z into three segments u, v and w, i.e.,

z = uvw

with the property that v is non-trivial, the segment uv is no longer than k, and, in particular, removing v, or copying v, gives us infinitely many strings that the rules of the grammar will also generate, i.e.,

remove v:            uw
original string z:   uvw
two copies of v:     uvvw
three copies of v:   uvvvw
...
i copies of v:       uv^i w
...

are all in L(G).
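As a concrete illustration, with the language, string and split chosen by us: take the regular language {a^m b^n | m, n ≥ 1}, the string z = aabbb, and the split u = a, v = a, w = bbb. Every pumped variant remains in the language:

```python
import re

L = re.compile(r"a+b+")            # the regular language {a^m b^n | m, n >= 1}
u, v, w = "a", "a", "bbb"          # one split of z = aabbb with |v| >= 1

# Removing v (i = 0) or repeating it (i >= 2) always yields a member of L.
pumped = [u + v * i + w for i in range(5)]
print(pumped)                                 # abbb, aabbb, aaabbb, ...
print(all(L.fullmatch(s) for s in pumped))    # → True
```

Pumping v only changes the number of a's, and the language places no constraint relating the number of a's to the number of b's, which is why every variant is accepted.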


Proof of the Pumping Lemma for Regular Languages using Grammars

We show that if a word z ∈ L(G) is long enough then z has a derivation of the form:

S ⇒∗ Aw ⇒∗ Avw ⇒∗ uvw if G is left-linear, or
S ⇒∗ uA ⇒∗ uvA ⇒∗ uvw if G is right-linear,

in which the non-terminal A can be repeated in the following manner

A ⇒∗ Av if G is left-linear, or
A ⇒∗ vA if G is right-linear,     (1)

before it is eliminated:

A ⇒∗ u if G is left-linear, or
A ⇒∗ w if G is right-linear.     (2)

Having found derivations of the appropriate form, it is possible to “pump” using G as follows. If G is left-linear, we have:

S ⇒∗ Aw

⇒∗ Avw

⇒∗ Av^2 w by (1)

... i times

⇒∗ Av^i w by (1)

⇒∗ uv^i w by (2).

If G is right-linear, we have:

S ⇒∗ uA

⇒∗ uvA

⇒∗ uv^2 A by (1)

... i times

⇒∗ uv^i A by (1)

⇒∗ uv^i w by (2).

An observation is that

long words need long derivations.

Let

l = max{ |α| : A → α ∈ P } + 1,

i.e., one more than the length of the longest right-hand side of any production in G. In any derivation of a word z ∈ L(G), the replacement of a non-terminal A by the string α, by applying A → α, can introduce |α| < l symbols.

Lemma If the height of the derivation tree for a word z is h then |z| ≤ l^h. Conversely, if |z| > l^h then the height of its derivation tree is greater than h.


Let m = |N| be the number of non-terminals in the grammar. We choose k = l^{m+1}. By the above Lemma, the height of the derivation tree Tr for the string z with |z| > k is greater than m + 1. So at least one path p through Tr is longer than m + 1 and some non-terminal in this path must be repeated. Travel the path p upwards from the leaf, searching for repeated non-terminals. Let A be the first such non-terminal encountered. We can picture Tr as shown in Figure 12.5.

Figure 12.5: Derivation tree Tr for the string z = uvw, where the height of Tr is such that there must be at least one repetition of a non-terminal A on the path p. This is enforced through the constant k of the Pumping Lemma. The value k = l^{m+1} is calculated from the length l of the longest right-hand side of the production rules and the number m = |N| of non-terminals in the grammar G.

Take v to be the part of the terminal string of the subtree rooted at the upper A that is not derived from the lower A. If G is left-linear, take u to be the terminal string of the subtree whose root is the lower A, and w the remainder of z; the subtree rooted at the upper A then yields uv. If G is right-linear, take w to be the terminal string of the subtree whose root is the lower A, and u the remainder of z; the subtree rooted at the upper A then yields vw. Clearly we can also deduce S ⇒∗ uv^i w for i ≥ 0.

From our choice of A there can be no repeated non-terminals on the path p below the upper A, so the height of the subtree rooted at the upper A is less than m + 1. From the above Lemma, if G is left-linear then |uv| ≤ l^{m+1} = k, and if G is right-linear then |vw| ≤ l^{m+1} = k. Finally


we choose the shortest derivation sequence for z, so we cannot have

S ⇒∗ Aw ⇒+ uw = uvw if G is left-linear, or
S ⇒∗ uA ⇒+ uw = uvw if G is right-linear.

Thus |v| ≥ 1. □

12.5 Limitations of Regular Grammars

In this section, we use the Pumping Lemma for regular languages to illustrate just how simple the regular languages are. We shall consider a very simple language which can be used as a basis for establishing that a language involving any of the following is not regular:

• begin and end markers where these can be nested within each other, but need to be matched in pairs;

• expressions with brackets that are balanced (equal numbers of left and right brackets);

• matching quotes where further matching quotes can appear within a quote;

• iteration statements of the form while-do-od where other iteration statements can be nested inside the body of the loop; or

• conditional statements of the form if-then-else-fi where other conditional statements can be nested inside the then statements or the else statements.

12.5.1 Applications of the Pumping Lemma for Regular Languages

An application of the Pumping Lemma is to demonstrate the following results.

Lemma The language

L = {a^i b^i | i ≥ 1}

is not regular.

Proof. Suppose L = L(G) for some regular grammar G. Let k be the constant from the Pumping Lemma. We can choose n > k/2 and consider z = a^n b^n, as this satisfies |a^n b^n| > k. By the Pumping Lemma we can rewrite z as the concatenation z = uvw of three strings u, v, w, with v not empty and such that for all i ≥ 0, uv^i w ∈ L.

Consider v in z. We explore two possible options for the form that v could take; we show that neither is possible, and hence that {a^n b^n | n ≥ 1} is not regular. The two options we look at are as follows.

(i) We have a mix of symbols in v :

a · · · a · · · ab · · · b · · · b

(ii) We do not have a mix of symbols in v, i.e., either we only have a's or only have b's:

a · · · a · · · a · · · ab · · · b or a · · · ab · · · b · · · b · · · b


Case (i): v has a mix of symbols So, for the first case, suppose v contains a break, i.e., both an a and a b. The Pumping Lemma tells us that the string uv^2 w has to be in the language. However, in the string v^2 = vv there would be some instance of a b that would precede some instance of an a.

For example, if v = ab then v^2 = abab; if v = aabb then v^2 = aabbaabb, and so on. It does not matter what the strings u or w are, as the string uv^2 w would not be in the language {a^n b^n | n ≥ 1} because the order of the symbols is not right.

Thus, v cannot contain a break, as the Pumping Lemma destroys the requirement on the ordering of the symbols, that all the occurrences of a must come before all the occurrences of b.

Case (ii): v does not have a mix of symbols So now we consider that v contains only instances of a single character, i.e., v is some non-empty string of either a symbols or b symbols. The Pumping Lemma tells us that the string uv^2 w has to be in the language. However, in the string v^2 = vv we would end up with a different number of a symbols to b symbols.

For example, suppose uvw ∈ L. Then it has an equal number of a and b symbols. But looking at the string uv^2 w = uvvw, if v = a then v^2 = aa, so uv^2 w will have one more a symbol than the number of b symbols. Similarly, if v = aa, then uv^2 w will have two more a symbols, and so on.

Thus, v cannot contain only instances of a single character, as the Pumping Lemma destroys the requirement that the two symbols are present in equal numbers.

Hence, v must be the empty string. But this contradicts the requirements of the Pumping Lemma, so there can be no regular grammar for L. □
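For a fixed string, this case analysis can be checked exhaustively by machine. The sketch below (our own illustration) tries every split z = uvw of a^5 b^5 with v non-empty, and confirms that pumping with i = 0 or i = 2 always produces a string outside L:

```python
import re

def in_L(s):
    """Membership in L = {a^n b^n | n >= 1}."""
    return (re.fullmatch(r"a+b+", s) is not None
            and s.count("a") == s.count("b"))

def no_split_pumps(z):
    """True if every split z = uvw with |v| >= 1 fails to pump: some
    pumped string uv^iw (here i = 0 or i = 2) falls outside L."""
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n + 1):      # v = z[i:j] is non-empty
            u, v, w = z[:i], z[i:j], z[j:]
            if all(in_L(u + v * p + w) for p in (0, 2)):
                return False               # this split survives pumping
    return True

print(no_split_pumps("aaaaabbbbb"))        # → True: no split of a^5b^5 pumps
```

This is only evidence for one string, of course; the Pumping Lemma argument above is what turns the observation into a proof for all sufficiently long strings.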

Recall that if x = t1 · · · tn then the reverse of x is

x^R = tn · · · t1.

For example, (abab)^R = baba and (abba)^R = abba.

Lemma The language

L = {xx^R | x ∈ {a, b}*}

is not regular.

Proof. Suppose L = L(G) for some regular grammar G. Let k be the constant from the Pumping Lemma. Consider the string

x = a · · · a b · · · b

consisting of a block of a's of length k followed by a block of b's of length k. Then

xx^R = a · · · a b · · · b b · · · b a · · · a

is a string with |xx^R| = 4k. Since the string is longer than k, by the Pumping Lemma, we can split up the string

z = xx^R = uvw

with |v| ≥ 1 and |uv| ≤ k.


Now, because |uv| ≤ k, we know that the substring v of xx^R must contain only a's, since x begins with a block of a's of length k. Furthermore, we know that

uv^i w ∈ L

for i = 0, 1, 2, . . .. In the case i = 0, we have uw ∈ L. But in this string uw there are fewer a's at the beginning of uw than there are a's at the end. Hence, uw cannot be written in the form yy^R for any y ∈ {a, b}*. This contradicts uw ∈ L and so we conclude that there is no regular grammar G such that L = L(G). □

The Pumping Lemma is not always used to give negative results regarding the non-existence of regular grammars for languages; it can also deliver positive results, such as the following theorem.

Theorem Let G be a regular grammar. Let k = k(G) ∈ N be the constant of the Pumping Lemma. Then

1. L(G) is non-empty if, and only if, there is a z ∈ L(G) with |z | < k.

2. L(G) is infinite if, and only if, there is a z ∈ L(G) with k ≤ |z | < 2k.

Proof. Consider Statement 1. Trivially, if z ∈ L(G) with |z| < k then L(G) ≠ ∅. Conversely, suppose L(G) ≠ ∅. Let z ∈ L(G) have minimal length, so that for any other z′ ∈ L(G),

|z| ≤ |z′|;

such shortest strings exist but are not, of course, necessarily unique. Now we claim that |z| < k.

Suppose for a contradiction that |z| ≥ k. Then, by the Pumping Lemma, we can write z = uvw and know that uv^i w ∈ L(G) for all i = 0, 1, . . .. In particular, uw ∈ L(G) and |uw| < |z| since we have removed the non-empty string v. This contradicts the choice of z as a word in L(G) of smallest possible length. Therefore, |z| < k.

Consider Statement 2. If z ∈ L(G) with k ≤ |z| < 2k then, by the Pumping Lemma, we can write z = uvw and know that uv^i w ∈ L(G) for all i = 0, 1, . . .. Thus, L(G) is infinite.

Conversely, suppose L(G) is infinite. Then there must exist z ∈ L(G) such that |z | ≥ k . If|z | < 2k then we have proved Statement 2.

Suppose that no string z ∈ L(G) has k ≤ |z| ≤ 2k − 1. Let z ∈ L(G) have minimal length outside this range, so that, at least, |z| ≥ 2k. By the Pumping Lemma, we can write z = uvw with

1 ≤ |v| ≤ k

since |uv| ≤ k and, by pumping with i = 0, deduce that

uw ∈ L(G).

Now |uw| < |z|. Thus, either z was not of shortest length ≥ 2k, or k ≤ |uw| < 2k, which is a contradiction in both cases. □
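These two statements turn the Pumping Lemma into a decision procedure, provided we have a membership test and a valid constant k. The brute-force sketch below is our own illustration; in it, k = 3 is assumed to be a valid pumping constant for the example language {a^n | n ≥ 1}:

```python
import re
from itertools import product

def strings_upto(alphabet, max_len):
    """All strings over the alphabet of length 0, 1, ..., max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def is_empty(member, alphabet, k):
    # Statement 1: L is non-empty iff it contains a string of length < k.
    return not any(member(z) for z in strings_upto(alphabet, k - 1))

def is_infinite(member, alphabet, k):
    # Statement 2: L is infinite iff it contains z with k <= |z| < 2k.
    return any(member(z) for z in strings_upto(alphabet, 2 * k - 1)
               if len(z) >= k)

# Example: L = {a^n | n >= 1} over the alphabet {a}, assumed constant k = 3.
member = lambda z: re.fullmatch(r"a+", z) is not None
print(is_empty(member, "a", 3), is_infinite(member, "a", 3))   # False True
```

The enumeration is exponential in 2k, so this is only practical for tiny alphabets and constants; its point is that both questions reduce to finitely many membership tests.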


Exercises for Chapter 12

1. Produce grammars

(a) G that is context-free but not regular, and

(b) G′ that is regular

such that

L(G) = L(G′) = {ab, abab, ababab, . . .} = {(ab)^n | n ≥ 1}.

2. Extend the grammar G_{a^n b^n c^n} of Section 12.1.2 that generates the language

L(G_{a^n b^n c^n}) = {a^n b^n c^n | n ≥ 1}

to produce a grammar G_{a^n b^n c^n d^n} that generates the language

L(G_{a^n b^n c^n d^n}) = {a^n b^n c^n d^n | n ≥ 1}.

3. Consider the context-free grammar:

grammar G

terminals a, b

nonterminals S

start symbol S

productions S → aSb
            S → SS
            S → ε

What is the language L(G) ⊆ {a, b}* defined by the grammar G? Suppose we rewrite this grammar with parentheses ( and ) replacing a and b, respectively. What does the language now represent?

4. Prove that every finite language is regular.

5. Produce a left-linear grammar for numbers that produces the same language as is generated by the grammar G_{Regular Number} in Section 12.2.2.

6. Construct an algorithm to convert an arbitrary right-linear grammar G into a left-linear grammar G′ such that they generate the same language L(G) = L(G′). Apply your algorithm to the grammars:

(a) G_{Regular Number}; and

(b) G^{right-linear}_{a^m b^n}

from Section 12.2.2.

7. Give a regular grammar for HTTP addresses that determines the same language as the context-free grammar in Section 10.4.2.


8. The reverse x^R of a string x ∈ T* is defined inductively by

Base Case x = t is a terminal: t^R = t.

Induction Step (w.t)^R = t.w^R where t ∈ T and w ∈ T*.

Prove the following for any x, y ∈ T+:

(a) (x.y)^R = y^R.x^R; and

(b) (x^R)^R = x.

9. Use the Pumping Lemma to show that the following languages L over {a}* are not regular:

(a) L = {a^p | p is a prime number}; and

(b) L = {a^n | n is a composite number}.

10. Use the Pumping Lemma to show that the following languages L over {a, b}* are not regular:

(a) L = {ww | w ∈ {a, b}*};

(b) L = {w | na(w) = nb(w)};

(c) L = {w | na(w) < nb(w)}; and

(d) L = {w | na(w) ≠ nb(w)}.

Recall, na(w) gives the number of occurrences of the terminal a in the string w.


Historical Notes and Comments for Chapter 12


Chapter 13

Finite State Automata and Regular Expressions

In the previous chapter we began the mathematical theory of syntax by classifying grammars by means of the forms of their rules, and then investigating the simplest grammars. Technically, we focused on the way regular grammars generated strings via derivations. We found that placing conditions on the rules allowed us to prove properties of derivations and, hence, of languages defined by the grammars. We now consider the problem of recognising the strings of regular languages.

The recognition problem for grammars in general is very complicated. However, for theregular grammars the problem is completely and efficiently solvable. In this chapter, we shallstudy a method for solving the recognition problem that is based on a class of models of simplemachines or automata. These machines provide an equivalent way of defining regular languages:

A language L is a regular language if, and only if, there is a finite state automaton that, given a string w, recognises whether w ∈ L or not.

The notion of finite state automaton transforms the theory of regular languages, for it brings many new concepts, techniques, and results of theoretical interest and practical application.

The idea of a finite state automaton first made its scientific mark through its use in syntax processing. However, the idea originates in the 1950s with the problem of modelling, in an abstract, mathematical way, machines and devices with finitely many states. Such machines are everywhere, for instance in control systems for machines and instruments, such as traffic lights and phones.

First, we consider automata whose behaviour is nondeterministic. This is a nice class of automata to reason about and to apply to syntax processing. We devote Sections 13.1 to 13.5 to this model, giving plenty of examples. Then we consider a more restricted form of automaton whose behaviour is deterministic and which we can always predict. This is a nice class of automata to implement. Fortunately, we can always translate a nondeterministic automaton into a deterministic automaton. This process is explained in Section 13.6.

Next, in Section 13.7, we show how, given a regular grammar G, we can construct a nondeterministic automaton to recognise the language of G; and, conversely, we show how to derive a regular grammar from a nondeterministic automaton.

Finally, in Section 13.8, we introduce a third method of defining regular languages based on a class of algebraic formulae called regular expressions. These formulae provide an invaluable algebraic notation that specifies all and only the regular languages. Like finite automata, regular expressions are of general computational interest and practical use.

13.1 String Recognition

We consider the problem of string recognition where we are restricted to a very simple set of rules. As these rules are simple, they allow for very efficient implementations. The downside, as we shall see, is that they will only allow us to recognise simple languages. More complex languages can be recognised by extending the rules, but at the cost of performance.

13.1.1 Rules of the Recognition Process

We shall consider string recognition processes that are based on the ability to examine characters and remember what characters have already been seen. We have three rules that define the limitations of these recognisers.

First, we impose the restriction that we can only read one input symbol at a time.

Second, we allow ourselves the ability to distinguish between all the symbols of the alphabet of which the input strings are composed. Thus, we can recognise any particular input symbol.

Third, we also allow ourselves an extremely limited form of memory. Effectively, we are allowed to say under what circumstances we want to check for a certain symbol.

The memory comes in the form of a finite set of states. We are not allowed to store information in a state as such: it is the act of being in a certain state that gives us information. Thus, we can use one state to represent that we have read a symbol a, and another to represent that we have read a symbol b. Just as importantly, we can use a state to represent that we have read a symbol b after having read a symbol a. This allows us to distinguish between strings of the form bb and ab.

13.1.2 Example

Let us consider how we can recognise the two strings

‘start’ and ‘stop’

just by following the restrictions laid out above.

We need to check if the first symbol of the input string is the letter ‘s’. So we have two tasks: First, we need to know that we are looking at the first letter of the input string. Second, we need to be able to recognise that this letter is an ‘s’.

We introduce two different states to store information about our progress in processing the input string. As shown in Figure 13.1, we have:

• a state q0 to represent that we have not yet looked at any of the input string;

• a state q1 to represent that we have looked at the first letter of the input string and we have recognised the letter ‘s’.


Figure 13.1: Recognising the letter ‘s’.

Having processed the first letter of the input string, we look at the second letter. Now, we need to check that this letter is a ‘t’. So, if we are in the state q1 we check that the second letter is a ‘t’. Again, we introduce a state to record whether we have processed the second letter. Thus, as shown in Figure 13.2, we have:

• a state q2 to represent that we have looked at the second letter of the input string and we have recognised the letter ‘t’.

Figure 13.2: Recognising the letter ‘t’.

Now, because of the way we are building up our recognition algorithm, we know that:

• the state q2 represents that we have looked at the first two letters of the input string and we have recognised the string ‘st’.

This is illustrated in Figure 13.3.

Figure 13.3: Recognising the string ‘st’.

For the third letter, we have three tasks to perform. First, we need to know that we are looking at the third letter of the input string, having already checked the first two letters. Second, we need to be able to recognise that this letter is either an ‘a’ or an ‘o’. Third, we need to be able to distinguish between these two letters. Thus, we introduce a state for each of the two cases; we have:

• a state q3 to represent that we have looked at the third letter of the input string and we have recognised the letter ‘a’;

• a state q6 to represent that we have looked at the third letter of the input string and we have recognised the letter ‘o’.

Hence,


• the state q3 represents that we have looked at the first three letters of the input string and we have recognised the string ‘sta’, and

• the state q6 represents that we have looked at the first three letters of the input string and we have recognised the string ‘sto’.

This is illustrated in Figure 13.4.

Figure 13.4: Recognising and distinguishing between the strings ‘sta’ and ‘sto’.

For the fourth letter, we need to check that we have either got an ‘r’ if we have already read ‘sta’, or a ‘p’ if we have already read ‘sto’. Here, we can use the state memory to determine which test we should be doing on the fourth letter. If we are in the state q3, then we check that the fourth letter is an ‘r’. If we are in the state q6, then we check that the fourth letter is a ‘p’. Again, we introduce states to record whether we have processed the fourth letter, and if so, which word we are checking against. Thus, we have:

• a state q4 to represent that we have looked at the fourth letter of the input string and we have recognised the letter ‘r’; and

• a state q7 to represent that we have looked at the fourth letter of the input string and we have recognised the letter ‘p’.

Hence,

• the state q4 represents that we have looked at the first four letters of the input string and we have recognised the string ‘star’; and

• the state q7 represents that we have looked at the first four letters of the input string and we have recognised the string ‘stop’.

This is illustrated in Figure 13.5.

Finally, we introduce a memory state q5 to recognise the letter ‘t’ if we have previously read the string ‘star’. Hence,

• the state q5 represents that we have looked at the first five letters of the input string and we have recognised the string ‘start’.

This is illustrated in Figure 13.6. Thus, we have


Figure 13.5: Recognising and distinguishing between the strings ‘star’ and ‘stop’. The double circle around the state q7 indicates that the letter ‘p’ is the last letter of the string ‘stop’.

Figure 13.6: Recognising and distinguishing between the strings ‘start’ and ‘stop’. The double circle around the states q5 and q7 indicates that the letters ‘t’ and ‘p’ are the last letters of the strings ‘start’ and ‘stop’, respectively.

state   processing memory
q0      the empty string
q1      the string ‘s’
q2      the string ‘st’
. . . . . . . . . . . . . . .
q3      the string ‘sta’
q4      the string ‘star’
q5      the string ‘start’
. . . . . . . . . . . . . . .
q6      the string ‘sto’
q7      the string ‘stop’

We need to know that if we reach either the state q5 or q7 and we have consumed all the input string, then we have successfully recognised the string. If we are in any other state, or if we have more input, then the string is not ‘start’ or ‘stop’.

Note that if we do not care which string we have recognised, we could use a single state q′ instead of q5 and q7.
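The recognition process just described can be written down as a short program. Here is a minimal Python sketch of the ‘start’/‘stop’ recogniser (the state names follow the text; the table and function names are ours, not the book’s):

```python
# Transition table of the recogniser: maps (state, symbol) to next state.
DELTA = {
    ("q0", "s"): "q1",
    ("q1", "t"): "q2",
    ("q2", "a"): "q3",
    ("q3", "r"): "q4",
    ("q4", "t"): "q5",
    ("q2", "o"): "q6",
    ("q6", "p"): "q7",
}
FINAL = {"q5", "q7"}  # reaching q5 or q7 with no input left means success

def recognise(word: str) -> bool:
    """Return True iff the recogniser accepts `word`."""
    state = "q0"
    for symbol in word:
        if (state, symbol) not in DELTA:
            return False                 # no applicable rule: reject
        state = DELTA[(state, symbol)]
    return state in FINAL                # accept only if all input consumed
```

For example, `recognise("start")` and `recognise("stop")` hold, while any other string, including a prefix such as ‘sta’ or an extension such as ‘starts’, is rejected.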


13.1.3 Algorithm

We can extrapolate general principles from this example to give an algorithm to recognise a language consisting of a finite set of finite strings.

Suppose we want to recognise a string

w = a1a2 · · · an

containing n symbols. We introduce n new states

q1 , q2 , . . . , qn

to represent that we have read the strings

a1 , a1a2 , . . . , a1a2 · · · an ,

respectively. If all the input string has been read when the state qn is reached, then we recognise that input.

This is illustrated in Figure 13.7.

Figure 13.7: Recognising an arbitrary string a1a2 · · · an.
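The general construction can be sketched directly in code. A minimal Python version (ours): given a string w of length n, we build states 0, 1, . . . , n (playing the role of q0, q1, . . . , qn), with a transition from state i−1 to state i on the i-th symbol, and final state n.

```python
def chain_automaton(w):
    """Build the chain automaton for the single string w.

    States are the integers 0..len(w); delta maps (state, symbol)
    to the next state; the only final state is len(w).
    """
    delta = {(i, ch): i + 1 for i, ch in enumerate(w)}
    return delta, {len(w)}

def accepts(automaton, word):
    """Run the automaton on `word` from the initial state 0."""
    delta, final = automaton
    state = 0
    for ch in word:
        if (state, ch) not in delta:
            return False                 # no rule applies: reject
        state = delta[(state, ch)]
    return state in final                # accept iff we end in state n
```

For instance, `accepts(chain_automaton("start"), "start")` holds, while ‘star’ and ‘stop’ are rejected.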

Notice that the example of ‘start’ and ‘stop’ does not quite fit this technique as we optimised the recognition of the first two letters ‘s’ and ‘t’ that are common to both strings. Without this optimisation, we get the situation illustrated in Figure 13.8.

Figure 13.8: Recognising and distinguishing between the strings ‘start’ and ‘stop’, following the general technique.

13.1.4 Generalising

Clearly, we can recognise any finite string using this method, but how do we recognise an infinite set of strings without introducing an infinite number of states? Whilst this may not immediately seem to be a practical concern, the solution to this problem can be adapted to produce more space-efficient solutions in general.

If we want to recognise a string that has some form of repetitive pattern, then we may be able to take advantage of it. For example, if we have a state q that remembers that we have read a symbol a, and we can have an arbitrary number of a symbols in the string, then we can re-use the state q to represent that we have read the strings a, aa, aaa, etc. The situations where we can and cannot exploit this state memory re-use to recognise a language are precisely defined in Section 12.4, as we shall see in Section 13.7.4.

First, we consider how we can embody the recognition rules in an abstract hardware model called a finite state automaton.

13.2 Nondeterministic Finite State Automata

An automaton is an abstract machine. Its input consists of a string of symbols. The machine can read one symbol at a time from its input. It has a finite set of rules that determine how it should react to its input. This program is hardwired into the machine and is fixed for that automaton. Thus, given the example in Section 13.1, our automaton would only ever be able to recognise the strings ‘start’ and ‘stop’. Changing the rules gives a different automaton.

The automaton has a very simple form of memory; at any moment in time it is considered to be in a particular state. In the case of the example of Section 13.1, the machine can only be in the states q0, q1, . . ., q7.

Its program rules determine how it should behave when it is in some state reading some symbol. These rules determine whether it should proceed to read another symbol, and what state it will be in when it does so. For the example of Section 13.1, the rules tell it how it goes about recognising the strings ‘start’ and ‘stop’. The rules have the form:

• When in state q0 reading symbol ‘s’ change to state q1 and look at the next input symbol.

• When in state q1 reading symbol ‘t’ change to state q2 and look at the next input symbol.

• When in state q2 reading symbol ‘a’ change to state q3 and look at the next input symbol.

• When in state q3 reading symbol ‘r’ change to state q4 and look at the next input symbol.

• When in state q4 reading symbol ‘t’ change to state q5 and look at the next input symbol.

• When in state q2 reading symbol ‘o’ change to state q6 and look at the next input symbol.

• When in state q6 reading symbol ‘p’ change to state q7 and look at the next input symbol.

The set of possible states the automaton may ever be in is finite. Whilst the rules that determine what the automaton may do are fixed, in a nondeterministic automaton the rules are allowed to provide a choice of possible actions to take at any given point.

The machine starts in some given initial state. In the case of the example of Section 13.1, the initial state is q0.

The automaton’s rules define how it may consume its input; if all the input has been consumed when it reaches a state that has been designated as a final accepting state, then that input string is said to be recognised by the automaton. In Section 13.1 we designated the states q5 and q7 as being final.

Figure 13.9 illustrates a typical finite state automaton. From its initial state it has already

Figure 13.9: A typical finite state automaton with finite set of control states q0, q1, . . ., qi, . . ., qn that can read one input symbol at a time. The symbols that it can read are determined by its finite set of instructions.

consumed some of its input (a1 · · · al) by following its rules. It is now in a state qi and is reading the symbol al+1. If it has a rule that says that, when in a state qi, it may read the symbol al+1 and change its state to qj, then it is allowed to progress to reading the next symbol al+2 of its input.

If it is a nondeterministic finite state automaton, then it is allowed to have rules that give a choice of possible next states to change to when it is in a state qi reading symbol al+1. For such a nondeterministic finite state automaton, it may choose any of the rules that say what state it should change to given its current situation. For example, in recognising the strings ‘start’ and ‘stop’, instead of the single state q1 to recognise the letter ‘s’, we could have had two states q1 and q′1 to recognise this letter, as shown in Figure 13.8. Then the state q1 could be used when attempting to read the string ‘start’ and the state q′1 when attempting to read the string ‘stop’.

This is a very abstract view of finite state automata that is designed to classify languages according to their complexity by varying different aspects of the definition. We shall not pursue these differences here and shall just consider the form of finite state automata that we have described. Accordingly, we present the finite state automata that we are interested in using a form that is specifically designed for their study.

13.2.1 Definition of Finite State Automata

A finite state automaton has:

• a finite set Q of states,

• a single initial state q0 ∈ Q ,

• a set F ⊆ Q of final accepting states,

• a finite set T of terminal symbols,


• and a finite set of rules, given by a function δ : Q × T → P(Q), that describes how the machine can move from one state to another by reading terminal symbols: δ(q, a) gives the set of possible next states that the automaton can be in after reading the terminal symbol a ∈ T when in a state q.

The most important feature of a finite state automaton is the set of rules that describes its behaviour.
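The five components above can be written down directly as data. Here is a minimal Python sketch (representation and names are ours, not the book’s): δ maps (state, terminal) pairs to *sets* of next states, and nondeterminism is simulated by tracking every state some run of the automaton could currently be in.

```python
def nfa_accepts(delta, q0, final, word):
    """Return True iff some run of the automaton over `word` ends in a
    final accepting state with all input consumed."""
    current = {q0}                       # all states a run could be in
    for symbol in word:
        current = {q for state in current
                     for q in delta.get((state, symbol), set())}
        if not current:                  # every run is stuck: reject early
            return False
    return bool(current & final)         # accept if some run ends in F

# A small nondeterministic example (ours, not from the text): strings
# over {a, b} that end in 'a'. On reading 'a' in q0 the automaton may
# either stay in q0 or guess that this 'a' is the last symbol.
delta = {("q0", "a"): {"q0", "q1"},
         ("q0", "b"): {"q0"}}
```

Here `nfa_accepts(delta, "q0", {"q1"}, "ba")` holds, but `"ab"` is rejected: the run that guessed wrongly dies, and no surviving run ends in q1.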

13.2.2 Example

Consider our example from Section 13.1 to recognise and distinguish between the strings ‘start’ and ‘stop’.

We constructed an automaton that had eight states

Q = {q0, q1, q2, . . . , q7}.

We used the state q0 to represent that we had not read any of the input string: this was our initial state

q0.

We used the states q5 and q7 to represent that we had finished reading the strings ‘start’ and ‘stop’, respectively. We designated these states

q5, q7

as being our accepting states.

We used the following rules to describe its behaviour:

When in state q0 reading input ‘s’, change to state q1 and read next input.
When in state q1 reading input ‘t’, change to state q2 and read next input.
When in state q2 reading input ‘a’, change to state q3 and read next input.
When in state q3 reading input ‘r’, change to state q4 and read next input.
When in state q4 reading input ‘t’, change to state q5 and read next input.
When in state q2 reading input ‘o’, change to state q6 and read next input.
When in state q6 reading input ‘p’, change to state q7 and read next input.

Mathematically, we can formalise these rules as the function

δ : Q × T → P(Q)

by

δ(q0, s) = {q1}
δ(q1, t) = {q2}
δ(q2, a) = {q3}
δ(q3, r) = {q4}
δ(q4, t) = {q5}
δ(q2, o) = {q6}
δ(q6, p) = {q7}.

It is easier to visualise the automaton’s behaviour using the pictorial representation shown in Figure 13.6.


13.2.3 Different Representations

We shall represent finite state automata in different ways depending on the situation in which we are using an automaton.

Pictorial Representation

When dealing with real examples of finite state automata, we shall normally use a pictorial representation of all the internal workings.

Each of the automaton’s finite set of states is shown enclosed in a circle. The rules that describe how the machine may consume input are represented as arrows connecting one state to another, labelled with input symbols, such that:

qj —ai→ qk

means that if the machine is in some state qj, then it may read the symbol ai and move into the new state qk.

Final accepting states are denoted with a double circle.

For example, Figure 13.10 (reprinted here for ease of reference) shows a finite state automaton with eight states (q0, q1, . . ., q7) where q0 is the initial state, and q5 and q7 are final accepting states. The automaton can move out of the initial state q0 if the first letter of the input string is the letter ‘s’. In this case it moves into the state q1. It then moves into the states q2, . . . , q5 in turn if the next letters of the input string are ‘t’, ‘a’, ‘r’ and ‘t’. At this point, the automaton is said to accept the string ‘start’ as q5 is a final accepting state.

Similarly, it can accept the string ‘stop’ by progressing from the initial state q0 through q1 (read ‘s’), q2 (read ‘t’), q6 (read ‘o’), and the final accepting state q7 (read ‘p’).

Figure 13.10: Finite state automaton that accepts the strings ‘start’ and ‘stop’, in pictorial form.

Display Representation

Sometimes, when we wish to give a more formal description of a finite state automaton, we shall present it in displayed form.


We may display a finite state automaton by:

automaton

states q0 , q1 , . . . , qn

terminals a0 , a1 , . . . , al

start q0

final qf(1), . . . , qf(m)

transitions · · ·
            δ(qj, ai) = qk
            · · ·

This is just a more user-friendly version of the definition of a finite state automaton as a 5-tuple

({q0, q1, . . . , qn}, {a0, a1, . . . , al}, q0, {qf(1), . . . , qf(m)}, δ).

For example, we can display the finite state automaton illustrated in Figure 13.10 by:

automaton

states q0 , q1 , q2 , q3 , q4 , q5 , q6 , q7

terminals a, o, p, r, s, t

start q0

final q5 , q7

transitions δ(q0, s) = q1
            δ(q1, t) = q2
            δ(q2, a) = q3
            δ(q3, r) = q4
            δ(q4, t) = q5
            δ(q2, o) = q6
            δ(q6, p) = q7

This is another representation of:

({q0, q1, q2, q3, q4, q5, q6, q7}, {a, o, p, r, s, t}, q0, {q5, q7}, δ),

where δ : Q × T → P(Q) is the transition function

δ(q0, s) = {q1}, δ(q1, t) = {q2}, δ(q2, a) = {q3}, δ(q3, r) = {q4}, δ(q4, t) = {q5}, δ(q2, o) = {q6}, δ(q6, p) = {q7}.


13.3 Examples of Automata

13.3.1 Automata to Recognise Numbers

We devote this section to trying to answer the question:

Can we build an automaton to recognise numbers?

What does this question mean? As a first attempt at answering this question, let us try:

Can we make an automaton M such that

L(M) = { n | n is a number }.

We shall construct a series of increasingly refined automata to answer this question by increasingly refining what the question means.

Recognising each number separately

Let us start by trying to make an automaton that recognises each number separately. We would need a separate transition for each number. So, to recognise the numbers 1 through to 3, we would have the automaton shown in Figure 13.11.

Figure 13.11: Automaton to recognise the numbers 1 to 3.

We can represent this automaton in display format by:

automaton

states qI , q1 , q2 , q3

terminals 1, 2, 3

start qI

final q1 , q2 , q3

transitions δ(qI, 1) = q1   δ(qI, 2) = q2   δ(qI, 3) = q3

Thus, we would recognise

(i) the number 1 by moving from the initial state qI to the final state q1 ;


(ii) the number 2 by moving from the initial state qI to the final state q2 ;

(iii) the number 3 by moving from the initial state qI to the final state q3 .

Clearly, this is a very extravagant way of doing things. Let us start to make some economies. We do not need separate final states q1, q2 and q3: provided we get to a final state after having read a number we shall be happy. This is also something we need to do because we are only allowed to have a finite set of states, so this would be a sticking point if we tried to use this technique to recognise any number. The adapted automaton for recognising just the numbers 1 to 3 is shown in Figure 13.12.

Figure 13.12: Automaton to recognise the numbers 1 to 3 with the economy of a single final state qF.

We can represent this automaton in display format by:

automaton

states qI , qF

terminals 1, 2, 3

start qI

final qF

transitions δ(qI, 1) = qF   δ(qI, 2) = qF   δ(qI, 3) = qF

In our pictorial representation of the automaton it will become difficult to draw lots of separate transitions, so we will abbreviate Figure 13.12 to Figure 13.13.

Figure 13.13: Automaton to recognise the numbers 1 to 3 with the three separate transitions from qI to qF drawn as a single transition for clarity.

So can we extend our example for recognising the numbers 1 to 3 to all numbers? We would have just the two states qI and qF, where qI is the initial state and qF the final state. The set of rules δ : Q × T → P(Q) would be defined by

δ(qI, n) = {qF}


for every number n.

Unfortunately, this is not a valid automaton: it breaks the requirement in the definition of a finite state automaton of only having a finite set of terminal symbols.

Recognising numbers one digit at a time

We need to represent numbers using a finite set of terminal symbols. If we restrict our attention to just the natural numbers then we can think of a number as a sequence of digits, where a digit is 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Our question becomes:

Can we make an automaton M such that

L(M) = { n | n is a sequence of digits }.

We can produce an automaton that will recognise numbers by checking for one digit at a time in the style shown in Figure 13.14. (Note that we use the abbreviation of a single arrow labelled with ten possible transitions in place of ten separate transitions, as we did for Figure 13.13.)

Figure 13.14: Automaton to recognise strings of up to three digits: the numbers 0, 1, . . . , 999.

We can represent this automaton in display format by:

automaton

states qI , q1 , q2 , q3

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start qI

final q1 , q2 , q3

transitions δ(qI, 0) = q1   δ(qI, 1) = q1   · · ·   δ(qI, 9) = q1
            δ(q1, 0) = q2   δ(q1, 1) = q2   · · ·   δ(q1, 9) = q2
            δ(q2, 0) = q3   δ(q2, 1) = q3   · · ·   δ(q2, 9) = q3

The limitation of this automaton is that we can only recognise numbers with at most three digits, i.e., numbers in the range 0, 1, 2, . . . , 999. This would prevent us from recognising the number 1066, for example. We could add another state q4, which would solve that problem, but we cannot keep on doing this indefinitely to recognise any number, as our definition of a finite state automaton requires that we only have a finite set of states.

So does our finite state automaton definition only allow us to recognise finite languages? Fortunately, the answer is no: there is a way to recognise languages that have an infinite number of strings. We just need to have transitions that loop back to an earlier state.


Recognising sequences of digits

Consider the example automaton given in Figure 13.15. It has a transition that loops back to an earlier state, namely itself.

Figure 13.15: Automaton M^D* to recognise strings of digits of length zero or more.

We can represent this automaton in display format by:

automaton M^D*

states qI^D*

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start qI^D*

final qI^D*

transitions δ(qI^D*, 0) = qI^D*   δ(qI^D*, 1) = qI^D*   · · ·   δ(qI^D*, 9) = qI^D*

How does this automaton work? We start in the state qI^D*, as this has an arrow pointing to it with no source. We can move from the state qI^D* back to the state qI^D* if we read a digit. We can repeat this as many times as we like, and at every repetition we end in a (potentially) final state. So, for example, we can perform the following sequence of transitions:

start in state qI^D*
move to state qI^D* by reading the digit 1
move to state qI^D* by reading the digit 0
move to state qI^D* by reading the digit 6
move to state qI^D* by reading the digit 6

As qI^D* is a final state, this means that we can recognise the string

1066

but also that we can recognise the strings

1   10   106

by stopping after 1, 2 and 3 transitions, respectively.

However, we can also stop after 0 transitions: we can start in state qI^D* and do nothing. This means that we can also recognise the empty string

λ.
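The looping behaviour of M^D* is easy to simulate. A minimal Python sketch (ours): the single state is both initial and final, and every digit loops back to it, so the automaton accepts exactly the strings of digits, including the empty string.

```python
DIGITS = set("0123456789")

def accepts_digit_star(word: str) -> bool:
    """Simulate M_D*: accept every string of digits, including λ."""
    for ch in word:
        if ch not in DIGITS:   # no rule for a non-digit: reject
            return False
        # reading a digit loops back to the same (final) state
    return True                # the single state is final, so accept
```

So `accepts_digit_star("1066")` and `accepts_digit_star("")` both hold, while a string containing a non-digit is rejected.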


Recognising sequences of at least one digit

Now, depending on what we mean by a “number”, we may or may not want to allow the empty string. Suppose we want an automaton that does not include the empty string as a valid number. Thus, we shall refine our original question to:

Can we make an automaton M such that

L(M) = { n | n is a sequence of at least one digit }.

We know now that we have to ensure that the automaton we build forces the string to have at least one digit. So we need a transition between the start state and the final state. If we do this to Figure 13.15 we get Figure 13.16.

Figure 13.16: Automaton M^D+ to recognise strings of digits.

We can represent this automaton in display format by:

automaton M^D+

states qI^D+, qF^D+

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start qI^D+

final qF^D+

transitions δ(qI^D+, 0) = qF^D+   δ(qI^D+, 1) = qF^D+   · · ·   δ(qI^D+, 9) = qF^D+
            δ(qF^D+, 0) = qF^D+   δ(qF^D+, 1) = qF^D+   · · ·   δ(qF^D+, 9) = qF^D+

How does this automaton work on our test string 1066? We can perform the following sequence of transitions:

start in state qI^D+
move to state qF^D+ by reading the digit 1
move to state qF^D+ by reading the digit 0
move to state qF^D+ by reading the digit 6
move to state qF^D+ by reading the digit 6

As qF^D+ is a final state, this means that we can recognise the string

1066

and the strings

1   10   106


by stopping after 1, 2 and 3 transitions, respectively. This time we cannot stop after 0 transitions, as qI^D+ is not a final state. This means that we do not recognise the empty string as a valid number.

Are there any other strings that we do not want? Unfortunately, the answer is probably yes, depending on what we mean by a “number”. Using Figure 13.16 we can recognise strings such as 00, 01 and 007.

Recognising numbers without leading zeros

If we want to disallow numbers with leading zeros, we shall have to make another modification. We shall need to make sure that the first transition is not a 0. Making this modification to Figure 13.16 gives Figure 13.17. We can still recognise our test string 1066 as before, and it

Figure 13.17: Automaton M^N+ recognises non-zero natural numbers.

does not recognise strings with leading zeros, as there is no way of getting out of the start state by reading a zero. Unfortunately, this also means that we cannot read a single zero. So this attempt is overly restrictive because we have now ruled out too many strings.

We need to add a route out of the start state that will allow us to read a single zero, but does not allow us to read strings of digits with leading zeros. We need to add a separate route out from the start state that recognises just the string 0. Thus, we add a new transition out of qI into a new final state qF′ as shown in Figure 13.18.

Figure 13.18: Automaton M^N to recognise numbers.

We can represent this automaton in display format by:


automaton M^N

states qI^N, qF^N, qF′^N

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start qI^N

final qF^N, qF′^N

transitions δ(qI^N, 0) = qF′^N   δ(qI^N, 1) = qF^N   · · ·   δ(qI^N, 9) = qF^N
            δ(qF^N, 0) = qF^N   δ(qF^N, 1) = qF^N   · · ·   δ(qF^N, 9) = qF^N

Now this automaton satisfies the question:

Can we make an automaton M such that

L(M) = { n | n is a sequence of at least one digit without leading zeros }.

This English definition is now a rather complex requirement. We can express it mathematically: a number is

{ d1d2 · · · dn | d1, d2, . . . , dn ∈ {0, 1, 2, . . . , 9}, n ≥ 1, and (d1 = 0, d2 = 0, . . . , di = 0, i ≥ 1) ⇒ n = 1 }.

This definition says that a number

(i) is composed of a sequence of digits d1, d2, . . ., dn, where a digit is a symbol 0, 1, . . ., 8 or 9;

(ii) there must be at least one digit; and

(iii) if the number starts with zeros, then the only way it can do this is if the number is just a single zero.
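A minimal Python sketch of M^N (the state names follow Figure 13.18; the code itself is ours): a 0 out of the start state goes to the dead-end final state qF′ (the single number 0), while 1–9 go to qF, which then loops on any digit.

```python
def accepts_number(word: str) -> bool:
    """Simulate M_N: sequences of at least one digit, no leading zeros."""
    state = "qI"
    for ch in word:
        if state == "qI" and ch == "0":
            state = "qF'"                # the single number 0; no way out
        elif state == "qI" and ch in "123456789":
            state = "qF"
        elif state == "qF" and ch in "0123456789":
            state = "qF"                 # loop on any further digit
        else:
            return False                 # no applicable rule: reject
    return state in {"qF", "qF'"}
```

For example, `accepts_number("1066")` and `accepts_number("0")` hold, while the empty string and ‘007’ are rejected.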

13.3.2 Automata to Recognise Finite Matches

We have seen in Section 13.3.1 that it is possible to recognise languages that contain an infinite number of strings. In this section we shall explore a limitation on the types of strings that we can recognise when the language is infinite. The limitation concerns a form of matching, where the number of occurrences of one string of symbols must equal the number of occurrences of another string of symbols. These strings of symbols might be

begin and end

or

start and stop

or parentheses such as


( and ).

Such requirements abound in programming languages.

Instead of considering each type of matching problem separately, we suppose that we want to match the number of occurrences of the symbols

a and b.

The details below also work with arbitrary finite strings in place of a and b.

Matching on finite strings

Consider the example automaton M given in Figure 13.19.

Figure 13.19: Automaton to recognise strings with matching numbers of a's and b's, up to four occurrences.

The automaton can recognise four strings:

(i) ab by moving through the states q_0, q_1 and q'_1;

(ii) aabb by moving through the states q_0, q_1, q_2, q'_2 and q'_1;

(iii) aaabbb by moving through the states q_0, q_1, q_2, q_3, q'_3, q'_2 and q'_1; and

(iv) aaaabbbb by moving through the states q_0, q_1, q_2, q_3, q_4, q'_4, q'_3, q'_2 and q'_1.

Thus, M recognises the language

L(M) = { ab, aabb, aaabbb, aaaabbbb } = { a^i b^i | 1 ≤ i ≤ 4 }.

We can represent M in display format by:


automaton

states      q_0, q_1, q_2, q_3, q_4, q'_1, q'_2, q'_3, q'_4

terminals   a, b

start       q_0

final       q'_1

transitions δ(q_0, a) = q_1    δ(q_1, a) = q_2    δ(q_2, a) = q_3    δ(q_3, a) = q_4
            δ(q_1, b) = q'_1   δ(q_2, b) = q'_2   δ(q_3, b) = q'_3   δ(q_4, b) = q'_4
            δ(q'_2, b) = q'_1  δ(q'_3, b) = q'_2  δ(q'_4, b) = q'_3

The automaton M recognises a language with four matching pairs of symbols by employing 1 + 2 × 4 = 9 states. We can build new automata on the same principles as M to recognise languages which have a finite number l of matching pairs of symbols by employing 1 + 2l states.

However, we cannot use this automaton as a basis for recognising a language L which has an infinite number of matching pairs of symbols, for example L = { a^n b^n | n ≥ 1 }. This is because automata can only have a finite number of states. In fact, the situation is much worse than this:

We cannot build any automaton to recognise the language L = { a^n b^n | n ≥ 1 }. We cannot build any automaton to recognise a language which requires infinite matching.

This is a fundamental limitation of finite state automata. Let us explore what goes wrong if we try to achieve infinite matching.

Infinite strings but no matching

We know from Section 13.3.1 how to generate an infinite set of strings: we just introduce a loop back to an earlier state. Using this technique, we can produce the automaton shown in Figure 13.20.

Figure 13.20: Automaton recognises infinite strings but without matching.

We can represent this automaton in display format by:


automaton

states      q_0, q_1, q_2

terminals   a, b

start       q_0

final       q_2

transitions δ(q_0, a) = q_1
            δ(q_1, a) = q_1
            δ(q_1, b) = q_2
            δ(q_2, b) = q_2

This automaton recognises strings composed of at least one a followed by at least one b. However, it imposes no matching criteria: it recognises the strings

ab, aab, aaab, . . . , abb, abbb, . . .

The language recognised by this automaton is

{ a^i b^j | i, j ≥ 1 }.

Infinite strings with matching but incorrect ordering

If we try to impose the matching requirement whilst maintaining an infinite set of strings, we encounter another problem: we cannot preserve the relative ordering of all the a's appearing before all the b's.

Figure 13.21 shows an automaton that matches every a move with a b move. Thus, we are guaranteed to have the same number of a's as b's. However, the cost of preserving matching is to disturb the ordering by mixing the a's and b's together.

Figure 13.21: Automaton recognises infinite strings with matching but with the ordering disturbed.

We can represent this automaton in display format by:


automaton

states      q_0, q_1, q_2

terminals   a, b

start       q_0

final       q_2

transitions δ(q_0, a) = q_1
            δ(q_1, b) = q_2
            δ(q_2, a) = q_1

This automaton recognises strings composed of at least one repetition of the string ab. It recognises the strings:

ab, abab, ababab, abababab, . . .

The language recognised by this automaton is

{ (ab)^i | i ≥ 1 }.
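As a small illustration of ours (not code from the book), all of these automata can be run with one generic simulator that represents the transition relation δ as a dictionary from (state, symbol) pairs to sets of states. Here it is instantiated for the automaton of Figure 13.21:

```python
# Transition relation of the automaton of Figure 13.21.
delta = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q1"},
}

def accepts(word, start="q0", final={"q2"}):
    """Track the set of states reachable after reading each symbol."""
    states = {start}
    for c in word:
        # follow every possible move on c from every current state
        states = set().union(*(delta.get((q, c), set()) for q in states))
    return bool(states & final)   # accepted iff some path ends in a final state
```

Running accepts confirms that ab and abab are recognised, while aabb, which has the right counts but the wrong ordering for this automaton, is not.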

We have seen three examples of automata that do not recognise the required language { a^n b^n | n ≥ 1 }, but this does not establish that such an automaton cannot exist. Later, in Section 13.7.4, we will prove that there cannot exist such an automaton.

13.4 Automata with Empty Move Transitions

Currently, we can only move from one state of an automaton to another by reading some terminal symbol. We shall relax this definition to allow for the possibility of empty moves, whereby no terminal symbol needs to be read to move from one state to another. This will allow us to create automata on a modular basis by linking pre-existing automata to create new automata.

We shall show that this extension of the basic definition is only a means of convenience and does not buy us any more power.

13.4.1 Description

In our basic definition of a finite state automaton, it is the state transition relation

δ : Q × T → P(Q)

acting on individual terminal symbols that determines when we can move from one state to another. We now consider an extension of this definition that will also allow us to define empty moves between states. We shall use a state transition relation

δ_λ : Q × (T ∪ {λ}) → P(Q)

to determine when we can move from a state q ∈ Q to a state q' ∈ δ_λ(q, c) by either reading a terminal symbol c ∈ T, or by performing an empty move and reading the empty string λ.


Definition (Automata with Empty Moves) A finite state automaton with empty moves has:

• a finite set Q of states,

• a single initial state q0 ∈ Q ,

• a set F ⊆ Q of final accepting states,

• a finite set T of terminal symbols, and

• a finite set δ : Q × (T ∪ {λ}) → P(Q) of rules that describes how the machine can move from one state to another by reading either a terminal symbol or the empty string.

Thus, it is the transition relation δ that determines whether or not an empty move between states is permissible.

Example

Figure 13.22 is an example of an automaton that has an empty move. The automaton recognises the language

{ a^i b^j | i, j ≥ 0 }.

Figure 13.22: Automaton with empty moves to recognise the language { a^i b^j | i, j ≥ 0 }.

Formally, the automaton is described by:

automaton

states q0 , q1

terminals a, b

start q0

final q1

transitions δ(q0 , a) = q0

δ(q0 , λ) = q1

δ(q1 , b) = q1


13.4.2 Empty Move Reachable

Having automata with empty moves makes the theoretical machinery a little more complex to manage. Every time we are considering a state q in an automaton, we need to remember to take into account that there could be

• an empty move from q to some other state q ′, or

• an empty move to q from some other state q ′.

Often, we shall need to think of not just some state q, but all those states which are reachable to and from q via empty moves. With this motivation in mind, we shall introduce a definition that tells us whether a state q' can be reached from q by following zero or more empty moves from either q' to q, or from q to q'.

Definition (Empty Move Reachable) The set of empty move reachable states from a state q consists of all those states

δ*_λ(q, λ)

which can be reached from q by following zero or more empty moves.
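Computing δ*_λ(q, λ) is just a graph search over the empty moves. Here is a minimal Python sketch of ours (not from the book); the dictionary `eps`, mapping each state to the targets of its single empty moves, is an assumed encoding:

```python
def empty_move_reachable(q, eps):
    """All states reachable from q by zero or more empty moves."""
    reachable = {q}          # zero empty moves: q itself
    frontier = [q]
    while frontier:
        state = frontier.pop()
        for nxt in eps.get(state, set()):
            if nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return reachable
```

For instance, if q0 has an empty move to q1 and q1 one to q2, then the empty move reachable set of q0 is {q0, q1, q2}.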

13.4.3 Simulating Empty Move Automata

Adding empty moves to our definition of finite state automata does not increase their power. We show that this is the case in two steps. First, we translate automata with empty moves into automata without empty moves. Then we show that the simulating automata and the original automata recognise the same languages.

Algorithm to construct automata without empty moves to simulate automata with empty moves

Let

M_λ = (Q, T, δ_λ, q^initial, F_λ)

be some finite state automaton with empty moves. We construct a simulating finite state automaton without empty moves:

automaton M

states      Q

terminals   T

start       q^initial

final       F = { q | q_F ∈ δ*_λ(q, λ) for some q_F ∈ F_λ }

transitions δ(q, t) = δ*_λ(q, t)

We define: the set T of terminals of M to be precisely those of M_λ; the set Q of states of M to be precisely those of M_λ; and the initial state q^initial of M to be precisely that of M_λ.


We define the set F of final states of M to be those states

{ q | q_F ∈ δ*_λ(q, λ) for some q_F ∈ F_λ }

from which some final state q_F ∈ F_λ of M_λ is empty move reachable.

We define the transition function δ : Q × T → P(Q) of M from the transition relation δ_λ : Q × (T ∪ {λ}) → P(Q) of M_λ by

δ(q, t) = δ*_λ(q, t).

This gives us a move in M via t for every path in M_λ that we can follow using t and empty moves.

Example

We apply the algorithm to remove empty moves to the automaton from Figure 13.22. The set of terminals, set of states and initial state of our simulating automaton are all the same as in the original. The set of final states of the simulating automaton consists of those of the original (q_1), to which we also add the state q_0 because we can go from it via an empty move to a final state (q_1) of the original. The transition function for the simulation has all the non-empty moves of the original

q0 ∈ δ(q0 , a)

q1 ∈ δ(q1 , b)

together with all those paths in the original that involve empty moves and consume a single terminal symbol; in this case we can move from state q_0 to state q_1 by either reading a then performing an empty move, or by performing an empty move then reading b:

q1 ∈ δ(q0 , a)

q1 ∈ δ(q0 , b)

The automaton that results is given below and is illustrated in Figure 13.23.

Figure 13.23: Automaton without empty moves to recognise the language { a^i b^j | i, j ≥ 0 }.


automaton

states      q_0, q_1

terminals   a, b

start       q_0

final       q_0, q_1

transitions δ(q_0, a) = {q_0, q_1}
            δ(q_0, b) = q_1
            δ(q_1, b) = q_1

This automaton has no empty moves and also recognises the language

{ a^i b^j | i, j ≥ 0 }.
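To make the construction concrete, here is a small Python sketch of the empty-move elimination algorithm (our own illustration, not code from the book); the dictionary encodings of δ_λ (`delta` for the non-empty moves, `eps` for the empty moves) are assumptions of ours:

```python
def remove_empty_moves(states, terminals, delta, eps, final):
    """Build the transitions and final states of a simulating automaton
    without empty moves, following the construction in the text."""
    def closure(q):
        # all states reachable from q by zero or more empty moves
        seen, stack = {q}, [q]
        while stack:
            s = stack.pop()
            for n in eps.get(s, set()):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    # final: states from which some original final state is empty move reachable
    new_final = {q for q in states if closure(q) & final}

    # delta(q, t): states reachable by empty moves, then t, then empty moves
    new_delta = {}
    for q in states:
        for t in terminals:
            target = set()
            for p in closure(q):
                for r in delta.get((p, t), set()):
                    target |= closure(r)
            if target:
                new_delta[(q, t)] = target
    return new_delta, new_final
```

On the automaton of Figure 13.22 (non-empty moves δ(q0, a) = q0 and δ(q1, b) = q1, plus the empty move from q0 to q1), this yields exactly the transitions and final states of Figure 13.23.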

Theorem (Simulation of finite state automata with empty moves by ones without empty moves) For any finite state automaton M_λ with empty moves, there exists a finite state automaton M without empty moves such that they recognise the same language

L(Mλ) = L(M ).

Proof. Let

M_λ = (Q, T, δ_λ, q^initial, F_λ)

be some finite state automaton with empty moves that recognises a language L. We construct a finite state automaton

M = (Q, T, δ, q^initial, F)

without empty moves from M_λ to recognise L, according to the simulation construction algorithm above.

We shall shortly show that for any state q ∈ Q and any string w ∈ T*,

δ*(q, w) = δ*_λ(q, w).    (1)

Given this result, the languages

L(M) = { w | δ*(q^initial, w) ∩ F ≠ ∅ }

recognised by M, and

L(M_λ) = { w | δ*_λ(q^initial, w) ∩ F_λ ≠ ∅ }

recognised by M_λ, will be the same.

We prove the postponed claim (1) in two phases. First we consider whether the empty string is

in the language, and then we consider all other strings by induction on the length of w.

For |w| = 0, so w = λ: λ ∈ L(M_λ) if either the start state is a final state or there is a path involving only empty moves from the start state to some final state. The initial state of the simulating automaton is the same as that of the empty move automaton, and the set of final states


of the simulating automaton is defined to contain those states from which a final state is empty move reachable. Thus, the simulating automaton will have its initial state as a final state exactly when the empty string is recognised by the empty move automaton, and so it too will recognise the empty string.

For |w | = 1, let w = a for a ∈ T , then

δ*(q, a) = δ(q, a)        by definition of δ*,
         = δ*_λ(q, a)     by definition of δ.

For |w | > 1, let w = au for a ∈ T , u ∈ T ∗, then

δ*(q, w) = δ*(q, au)                            by definition of w,
         = ∪_{q' ∈ δ(q,a)} δ*(q', u)            by definition of δ*,
         = ∪_{q' ∈ δ*_λ(q,a)} δ*(q', u)         by definition of δ,
         = ∪_{q' ∈ δ*_λ(q,a)} δ*_λ(q', u)       by the Induction Hypothesis,
         = δ*_λ(q, au)                          by definition of δ*_λ.

13.5 Modular Nondeterministic Finite State Automata

13.5.1 Recognising Integers

Recall from Section 13.3.1 that we can recognise the set of natural numbers using the automaton M^N in Figure 13.18. Can we re-use this automaton in building a new automaton M^Z to recognise the set of integers?

Structure

The integers Z consist of

• the naturals N = {0, 1, 2, . . .}, together with

• the set {−1, −2, . . .} of negative numbers.

The first observation is that we want to be able to plug in the automaton M^N to recognise natural numbers as part of the definition of the integers. The second observation is that the negative numbers are similar in syntax to the naturals, so we should be able to use an automaton similar in structure to M^N. In fact, we need to be able to adjust M^N so that we recognise a negative sign followed by a non-zero natural number. Fortunately, we even produced an automaton M^N+ (Figure 13.17) to recognise non-zero natural numbers whilst trying to construct M^N.

Modular Automaton

We want to construct an automaton M^Z by re-using the automata M^N and M^N+, as shown in Figure 13.24.

We can represent this automaton M^Z in display format by:


Figure 13.24: Automaton M^Z to recognise the set Z of integers by re-using the automata M^N and M^N+.

automaton M^Z

import      M^N as M^N1, M^N+ as M^N+1

states

terminals   −

start       q^N1_I

final       q^N1_F, q^N1_F', q^N+1_F

transitions δ(q^N1_I, −) = q^N+1_I

Flattened Automaton

When flattened, this produces the automaton M^Flattened Z shown in Figure 13.25. In display format, this flattened automaton M^Flattened Z is:


Figure 13.25: Flattened automaton M^Flattened Z to recognise the set Z of integers by re-using the automata M^N and M^N+.

automaton M^Flattened Z

states      q^N1_I, q^N1_F, q^N1_F', q^N+1_I, q^N+1_F

terminals   0, 1, . . . , 9, −

start       q^N1_I

final       q^N1_F, q^N1_F', q^N+1_F

transitions δ(q^N1_I, 0) = q^N1_F'    δ(q^N1_I, 1) = q^N1_F    · · ·   δ(q^N1_I, 9) = q^N1_F
            δ(q^N1_F, 0) = q^N1_F     δ(q^N1_F, 1) = q^N1_F    · · ·   δ(q^N1_F, 9) = q^N1_F
            δ(q^N1_I, −) = q^N+1_I    δ(q^N+1_I, 1) = q^N+1_F  · · ·   δ(q^N+1_I, 9) = q^N+1_F
            δ(q^N+1_F, 0) = q^N+1_F   δ(q^N+1_F, 1) = q^N+1_F  · · ·   δ(q^N+1_F, 9) = q^N+1_F

13.5.2 Program Identifiers

Suppose we want to recognise program identifiers of the form

• a letter, followed by

• any number of letters or digits.

We shall see how we can perform this task by creating two automata: one to recognise digits, and one to recognise letters.

Digits

We can very easily construct an automaton M^D that will recognise a single digit.


automaton M^D

states      q^D_I, q^D_F

terminals   0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start       q^D_I

final       q^D_F

transitions δ(q^D_I, 0) = q^D_F   δ(q^D_I, 1) = q^D_F   · · ·   δ(q^D_I, 9) = q^D_F

Letters

We can define an automaton to recognise a single letter in the same manner as we have just done for single digits.

automaton M^L

states      q^L_I, q^L_F

terminals   a, b, . . . , z, A, B, . . . , Z

start       q^L_I

final       q^L_F

transitions δ(q^L_I, a) = q^L_F   δ(q^L_I, b) = q^L_F   · · ·   δ(q^L_I, z) = q^L_F
            δ(q^L_I, A) = q^L_F   δ(q^L_I, B) = q^L_F   · · ·   δ(q^L_I, Z) = q^L_F

Building Program Identifiers

We now need to plug together the automata for recognising single digits and letters in a way that will create an automaton for recognising program identifiers. We start by using a copy M^L1 of the automaton M^L to read a single letter. We connect this automaton up to the start state q_I of the system with an empty move,

δ_λ(q_I, λ) = q^L1_I.

After reading a letter, we can have a zero length string of digits or letters, or we can have a non-zero length string of digits or letters.

We can have a zero length string of digits or letters by simply moving to the end state q_F of the system with an empty move:

δ_λ(q^L1_F, λ) = q_F.

Alternatively, we can generate a non-zero length string of digits or letters by repeatedly choosing to either read a single digit, or read a single letter. So, we introduce a copy M^D1 of the automaton M^D to read a single digit, and a second copy M^L2 of the automaton M^L to read a single letter. We shall use M^D1 and M^L2 to allow us to read a string of digits or letters by introducing appropriate connections.


We use an empty-move link

δ_λ(q^L1_F, λ) = {q^L2_I, q^D1_I}

to join the output of M^L1 to the inputs of M^L2 and M^D1. This allows us to read either a single letter or a single digit. Then we connect the outputs of M^L2 and M^D1 back to q^L1_F, where we have a choice of finishing, or repeating the process of reading a single digit or letter and coming back to this point of choosing whether to extend the string or finish. We do this with the empty link connections

δ_λ(q^L2_F, λ) = q^L1_F

between the outputs of M^L2 and M^L1, and

δ_λ(q^D1_F, λ) = q^L1_F

between the outputs of M^D1 and M^L1.

The whole automaton is shown in Figure 13.26.

Figure 13.26: Modular automaton to recognise program identifiers formed as a letter followed by a string of digits or letters.

In display format, this automaton is:


automaton M^Identifier_λ

import      M^D as M^D1, M^L as M^L1, M^L2

states      q_I, q_F

terminals

start       q_I

final       q_F

transitions δ(q_I, λ) = q^L1_I
            δ(q^L1_F, λ) = {q_F, q^L2_I, q^D1_I}
            δ(q^L2_F, λ) = q^L1_F
            δ(q^D1_F, λ) = q^L1_F

Flattened

If we flatten the definition in Figure 13.26, we get the automaton shown in Figure 13.27.

Figure 13.27: Flattened automaton M^Flattened Identifier to recognise program identifiers.

In display format, M^Flattened Identifier_λ is:


automaton M^Flattened Identifier_λ

states      q_I, q_F, q^L1_I, q^L1_F, q^L2_I, q^L2_F, q^D1_I, q^D1_F

terminals   a, b, . . . , z, A, B, . . . , Z, 0, 1, . . . , 9

start       q_I

final       q_F

transitions δ(q_I, λ) = q^L1_I
            δ(q^L1_I, a) = q^L1_F   δ(q^L1_I, b) = q^L1_F   · · ·   δ(q^L1_I, z) = q^L1_F
            δ(q^L1_I, A) = q^L1_F   δ(q^L1_I, B) = q^L1_F   · · ·   δ(q^L1_I, Z) = q^L1_F
            δ(q^L1_F, λ) = {q_F, q^L2_I, q^D1_I}
            δ(q^L2_I, a) = q^L2_F   δ(q^L2_I, b) = q^L2_F   · · ·   δ(q^L2_I, z) = q^L2_F
            δ(q^L2_I, A) = q^L2_F   δ(q^L2_I, B) = q^L2_F   · · ·   δ(q^L2_I, Z) = q^L2_F
            δ(q^L2_F, λ) = q^L1_F
            δ(q^D1_I, 0) = q^D1_F   δ(q^D1_I, 1) = q^D1_F   · · ·   δ(q^D1_I, 9) = q^D1_F
            δ(q^D1_F, λ) = q^L1_F
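The net effect of the flattened automaton can be checked directly: an identifier is one letter followed by any string of letters and digits. Here is a small Python sketch of this membership test (our own illustration, not code from the book):

```python
import string

LETTERS = set(string.ascii_letters)   # a..z and A..Z
DIGITS = set(string.digits)           # 0..9

def is_identifier(s: str) -> bool:
    """One letter, then any number of letters or digits."""
    if not s or s[0] not in LETTERS:
        return False
    return all(c in LETTERS or c in DIGITS for c in s[1:])
```

For example, x1 and Zmax are identifiers, while 9x (leading digit) and the empty string are not.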

13.6 Deterministic Finite State Automata

13.6.1 The Equivalence of Deterministic and Nondeterministic Finite State Automata

The languages that can be recognised by deterministic finite state automata are precisely those that can be recognised by nondeterministic finite state automata. Thus,

L is a regular language ⇔ L is generated by a regular grammar G
                        ⇔ L is recognised by a nondeterministic finite state automaton
                        ⇔ L is recognised by a deterministic finite state automaton.

For any nondeterministic finite state automaton, we shall describe how we can construct a deterministic version that will simulate it.

Constructing a Deterministic Simulation

A nondeterministic finite state automaton is potentially able to choose from a number of possible next moves at any given time. The deterministic simulation that we will construct expands out all the nondeterministic choices.

Algorithm to Construct a Deterministic Simulation Let M = (Q, T, δ, q^initial, F) be some nondeterministic finite state automaton that recognises a language L. We construct a deterministic finite state automaton M_det = (Q_det, T, δ_det, q^initial_det, F_det) from M to recognise L as follows.

We define the set T of terminals of M_det to be precisely those of M.


We define the set Q_det of states of M_det to be the set P(Q) of all possible subsets of states of M:

Q_det = P(Q).

We will use these states to track every possible nondeterministic move that M can make.

We define the initial state q^initial_det of M_det to be the set

{ q^initial }

containing precisely the initial state q^initial of M.

We define a state

{ q_1, . . . , q_n }

of Q_det to be accepting if it contains an accepting state of M, i.e.,

{ q_1, q_2, . . . , q_n } ∈ F_det ⇔ q_1 ∈ F or q_2 ∈ F or · · · or q_n ∈ F.

Thus, the set F_det of accepting states of the deterministic simulation M_det represents the fact that there is a possible path through the nondeterministic model M to an accepting state.

We define the transition function

δ_det : Q_det × T → Q_det

by

δ_det({q_1, q_2, . . . , q_n}, u) = δ(q_1, u) ∪ δ(q_2, u) ∪ · · · ∪ δ(q_n, u).

Thus, suppose we are in a state {q_1, q_2, . . . , q_n} of the simulation. This represents that there are n possible paths through the nondeterministic machine from the start state q^initial to each of the states q_1, q_2, . . . , q_n when given some particular input. So, in the deterministic simulator, we represent the choices in the nondeterministic automaton, on reading the input u, of moving to

• the state δ(q1 , u) or

• the state δ(q2 , u) or... or

• the state δ(qn , u).
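The subset construction can be sketched in Python (our own illustration, not code from the book). For practicality the sketch builds only the subsets reachable from {q^initial}, rather than all of P(Q); this recognises the same language:

```python
from collections import deque

def determinise(delta, start, final, terminals):
    """Subset construction: states of the simulation are frozensets of
    the original states; only reachable subsets are constructed."""
    start_det = frozenset({start})
    trans, seen, todo = {}, {start_det}, deque([start_det])
    while todo:
        S = todo.popleft()
        for t in terminals:
            # union of the nondeterministic moves on t from every state in S
            target = frozenset(set().union(*(delta.get((q, t), set()) for q in S)))
            trans[(S, t)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # a subset is accepting iff it contains an accepting state of M
    final_det = {S for S in seen if S & final}
    return trans, start_det, final_det
```

Applied to the nondeterministic automaton of Figure 13.23, this produces the transitions and final states worked out in the example below.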

Example

Consider the example automaton shown in Figure 13.23 in Section 13.4.3. This is nondeterministic because there are two possible moves out of state q_0 upon reading the terminal symbol a; one path goes back to q_0, and the other goes to q_1.

If we apply the algorithm to construct a deterministic simulation of this automaton, we get:

• the same set {a, b} of terminal symbols;

• the set

P({q_0, q_1}) = { ∅, {q_0}, {q_1}, {q_0, q_1} }

of states;


• the initial state {q_0};

• the set

{ {q_0}, {q_1}, {q_0, q_1} }

of final states; and

• the transition function

δ(∅, a) = ∅                     δ(∅, b) = ∅
δ({q_0}, a) = {q_0, q_1}        δ({q_0}, b) = {q_1}
δ({q_1}, a) = ∅                 δ({q_1}, b) = {q_1}
δ({q_0, q_1}, a) = {q_0, q_1}   δ({q_0, q_1}, b) = {q_1}.

This automaton is represented in Figure 13.28 and is summarised below.

Figure 13.28: Deterministic automaton to recognise the language { a^i b^j | i, j ≥ 0 }.

automaton

states      ∅, {q_0}, {q_1}, {q_0, q_1}

terminals   a, b

start       {q_0}

final       {q_0}, {q_1}, {q_0, q_1}

transitions δ(∅, a) = ∅
            δ(∅, b) = ∅
            δ({q_0}, a) = {q_0, q_1}
            δ({q_0}, b) = {q_1}
            δ({q_1}, a) = ∅
            δ({q_1}, b) = {q_1}
            δ({q_0, q_1}, a) = {q_0, q_1}
            δ({q_0, q_1}, b) = {q_1}


Lemma (NFSA and its DFA simulation recognise the same language) If a nondeterministic finite state automaton recognises a language L, then the deterministic simulation constructed using the algorithm described above recognises precisely the language L.

Proof. Let M = (Q, T, δ, q^initial, F) be some nondeterministic finite state automaton and let M_det = (Q_det, T, δ_det, q^initial_det, F_det) be the simulating deterministic finite state automaton constructed using the algorithm above.

We prove the lemma by induction on the length of input strings u ∈ T*.

Base Case |u| = 0.

As |u| = 0, u = λ.

In the nondeterministic finite state automaton M, if u ∈ L, then the initial state q^initial must be a final accepting state of M. Thus, q^initial ∈ F.

In the deterministic simulator, the initial state by definition is {q^initial}. If q^initial is a final accepting state in M, then by definition, {q^initial} is a final accepting state in M_det. And, if {q^initial} is a final accepting state, then M_det accepts the string λ = u ∈ L.

Alternatively, if u ∉ L, then the initial state of the nondeterministic model M cannot be a final accepting state.

If q^initial is not a final accepting state in M, then by definition, {q^initial} is not a final accepting state in M_det. And so M_det does not accept the string λ = u ∉ L.

Induction Step |u| = n + 1.

Suppose that the lemma holds for all strings up to length n. Now suppose we have a string ua of length n + 1, built up from a string u ∈ T* of length n and a single character a ∈ T. Then, we can break the behaviour δ*(q^initial, ua) of the nondeterministic machine M on the string ua into two parts. First, we process the string u to give the possible states δ*(q^initial, u), and then we process the character a on each of these possibilities:

δ*(q^initial, ua) = ∪ { δ(q, a) | q ∈ δ*(q^initial, u) }.

By the induction hypothesis,

δ*(q^initial, u) = {q_1, q_2, . . . , q_n} ⇔ δ*_det(q^initial_det, u) = {q_1, q_2, . . . , q_n}.

And, by definition of δ_det,

δ({q_1, q_2, . . . , q_n}, a) = {q'_1, q'_2, . . . , q'_n} ⇔ δ_det({q_1, q_2, . . . , q_n}, a) = {q'_1, q'_2, . . . , q'_n}.

Thus,

δ*(q^initial, ua) = {q'_1, q'_2, . . . , q'_n} ⇔ δ*_det(q^initial_det, ua) = {q'_1, q'_2, . . . , q'_n}.

So, finally, if ua ∈ L then ua is accepted by M if, and only if,


there exists q ∈ δ*(q^initial, ua) such that q ∈ F.

And, by definition, if there is a final accepting state in {q'_1, q'_2, . . . , q'_n}, then {q'_1, q'_2, . . . , q'_n} is a final accepting state of M_det. Thus, M_det will also recognise ua.

And, similarly, if ua ∉ L, then ua is not accepted by M if, and only if,

there does not exist a state q ∈ δ*(q^initial, ua) such that q ∈ F.

And, by definition, if none of q'_1, q'_2, . . . , q'_n are accepting states in M, then {q'_1, q'_2, . . . , q'_n} is not an accepting state in M_det. Thus, M_det will also not recognise ua.

13.7 Regular Grammars and Nondeterministic Finite State Automata

We shall see in this section how we can easily translate between

• regular grammars and nondeterministic finite state automata; and

• nondeterministic finite state automata and regular grammars.

In a grammar, we have terminals, nonterminals, a start symbol and rules. In a finite state automaton, we have terminals, states, an initial state, final states and transitions. We shall relate the two by thinking of the nonterminals as states, and the rules as transitions.

13.7.1 Regular Grammars to Nondeterministic Finite State Automata

We shall translate a regular grammar into a finite state automaton in two stages. The first stage is to pre-process the grammar so that it is in a particular form, ready for the second stage that maps the grammar into an automaton.

Single Step Normal Form Regular Grammars

The translation from a regular grammar to a finite state automaton is a simple process provided a small amount of work is done up-front to restructure the grammar. We shall manipulate the grammar so that each of its production rules is placed into a form that is even more restricted than it was originally for it to have qualified as a regular grammar. We call a grammar with these formatted rules a single step normal form regular grammar. This is because the productions will be restricted so that they can only grow a string by one terminal symbol at a time.

In a regular grammar G = (T, N, S, P), we require that either all of its rules p ∈ P have the form

A → uB or A → u

or else they are all of the form

A → Bu or A → u


where u ∈ T* is some string of terminal symbols and A, B ∈ N are non-terminals.

In a single step normal form regular grammar G_SSNF = (T, N_SSNF, S, P_SSNF) we restrict the string of terminal symbols to a single terminal symbol. Thus, we require that either all of its rules p ∈ P_SSNF are of the form

A → aB or A → a

or else they are all of the form

A → Ba or A → a

where a ∈ T is a single terminal symbol, and A, B ∈ N_SSNF are non-terminals.

It is a simple two-step process to transform an arbitrary regular grammar G = (T, N, S, P)

into one that is in single step normal form G_SSNF = (T, N_SSNF, S, P_SSNF).

The first step is to remove all chain rules of the form

A→ B

from the grammar. This is done by replacing the right-hand-side B of any such chain rule with the expansions of B, i.e., the right-hand sides of any rules where B occurs on the left-hand-side. If after the recursive application of this algorithm there remain any chain rules, these must be useless productions in the sense that they cannot be used to derive any string of terminal symbols. Thus, they may be safely removed from the grammar.

The second step of the transformation expands out all non-chain rules so that they only generate a string one terminal symbol at a time.

For each production rule p ∈ P of the form

A → u where u = t_1 t_2 · · · t_n

we expand it in G_SSNF into n production rules

A → t_1 A_1, A_1 → t_2 A_2, A_2 → t_3 A_3, . . . , A_{n−1} → t_n

where A_1, A_2, . . . , A_{n−1} ∈ N_SSNF are new nonterminal symbols that do not appear in N.

The transformation of the remaining rules of a right-linear grammar is similar. For each production rule p ∈ P of the form

A → uB where u = t_1 t_2 · · · t_n

we expand it in G_SSNF into n production rules

A → t_1 A_1, A_1 → t_2 A_2, A_2 → t_3 A_3, . . . , A_{n−1} → t_n B

where A_1, A_2, . . . , A_{n−1} ∈ N_SSNF are new nonterminal symbols that do not appear in N.

The transformation of the remaining rules of a left-linear grammar looks different simply because strings are built up backwards from their last symbol to their first. For each production rule p ∈ P of the form

A → Bu where u = t_1 t_2 · · · t_n

we expand it in G_SSNF into n production rules

A → A_1 t_n, A_1 → A_2 t_{n−1}, A_2 → A_3 t_{n−2}, . . . , A_{n−2} → A_{n−1} t_2, A_{n−1} → B t_1.
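The right-linear case of this expansion can be sketched in Python (our own illustration, not code from the book). The fresh-name scheme A_1, A_2, . . . is an assumption of ours, and rules are encoded as (left, terminal, right) triples with right = None for rules of the form A → a:

```python
def expand_right_linear(A, u, B=None):
    """Expand A -> t1 t2 ... tn B (or A -> t1 ... tn when B is None)
    into single-step rules, one terminal symbol per rule."""
    if len(u) == 1:
        return [(A, u, B)]                 # already single step
    rules, left = [], A
    for i, t in enumerate(u[:-1], start=1):
        fresh = f"{A}_{i}"                 # hypothetical fresh nonterminal name
        rules.append((left, t, fresh))
        left = fresh
    rules.append((left, u[-1], B))         # last symbol keeps the original tail B
    return rules
```

For example, expand_right_linear("A", "abc", "B") yields the triples for A → aA_1, A_1 → bA_2 and A_2 → cB.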


Informal Automaton Transformation

We consider first the production rules of a regular grammar in single-step normal form that look like

A→ a

where A ∈ N is a non-terminal and a ∈ T is a single terminal symbol.

We can model this type of rule with a nondeterministic finite state automaton that has a state A ∈ Q, a final state q_F ∈ F, and a transition relation δ : Q × T → P(Q) that defines a possible move q_F ∈ δ(A, a) from state A to final state q_F provided the terminal symbol a ∈ T is read. This is shown in Figure 13.29.

Figure 13.29: Translation of rules of the form A → a.

The second type of production rule will depend on whether the grammar is left-linear or right-linear. In a left-linear grammar, the remaining rules will be of the form

A→ Ba

and in a right-linear grammar, the remaining rules are of the form

A→ aB

where A, B ∈ N are non-terminals and a ∈ T is a single terminal symbol.

We transform such left-linear and right-linear rules in the same manner. We define the transition function δ : Q × T → P(Q) to give a possible move B ∈ δ(A, a) from state A to state B provided the terminal symbol a ∈ T is read. This is shown in Figure 13.30.

Figure 13.30: Translation of rules of the form A → Ba or A → aB.

Thus, given a grammar in single-step normal form:

grammar G

terminals . . . , a, . . .

nonterminals A,B , . . . , S

start symbol S

productions A → aB
A → a

...


we translate it into the nondeterministic finite state automaton:

automaton MG

states A,B , . . . , S , qF

terminals . . . , a, . . .

start S

final qF

transitions δ(A, a) = B
δ(A, a) = qF

...

We now give this algorithm more formally.

Construction Algorithm for Automata Simulator

Given a grammar G = (T ,N , S ,P) in single-step normal form, we translate G into a nondeterministic finite state automaton M = (Q ,T , S , δ : Q × T → P(Q),F ) where

• The set Q of states of the automaton is given by the set N of non-terminals of the grammar G , together with a new state qF where qF 6∈ N . Thus,

Q = N ∪ {qF}.

• The set T of terminals of the machine M is precisely that of the grammar G .

• The transition function δ is defined by

δ(A, a) = {B1 , . . . ,Bn} if, and only if, A→ aB1 , . . . ,A→ aBn ∈ P

or A→ B1a, . . . ,A→ Bna ∈ P ;

and, in addition, qF ∈ δ(A, a) if, and only if, A→ a ∈ P .

• The initial state S of the machine M is precisely that of the grammar G .

• The set F ⊆ Q of final states contains precisely the new state qF ∈ Q :

F = {qF}.
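A sketch of this construction in Python, using a hypothetical encoding of productions as triples: (A, a, B) for A → aB and (A, a, None) for A → a.

```python
def grammar_to_nfa(nonterminals, start, productions):
    """Translate a right-linear grammar in single-step normal form into a
    nondeterministic finite state automaton (states, delta, start, finals)."""
    qF = "qF"                              # the new final state; assumed fresh
    assert qF not in nonterminals
    delta = {}
    for (A, a, B) in productions:
        target = qF if B is None else B    # A -> a moves into the final state
        delta.setdefault((A, a), set()).add(target)
    return nonterminals | {qF}, delta, start, {qF}

def accepts(nfa, word):
    """Run the automaton on a word, tracking the set of reachable states."""
    states, delta, start, finals = nfa
    current = {start}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & finals)

# Grammar S -> aS | b, generating the language a*b
nfa = grammar_to_nfa({"S"}, "S", [("S", "a", "S"), ("S", "b", None)])
print(accepts(nfa, "aaab"), accepts(nfa, "aba"))
```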

Example

Consider the grammar


grammar Regular Number

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals digit , digits , number

start symbol number

productions number → digit
number → 1 digits
number → 2 digits
...
number → 9 digits
digits → 0 digits
digits → 1 digits
...
digits → 9 digits
digits → 0
digits → 1
...
digits → 9
digit → 0
digit → 1
...
digit → 9

from Section 12.2.2. First, we convert this into single-step normal form by removing the chain rules.


grammar Regular Number

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals number , digits

start symbol number

productions number → 0
number → 1
...
number → 9
number → 1 digits
number → 2 digits
...
number → 9 digits
digits → 0 digits
digits → 1 digits
...
digits → 9 digits
digits → 0
digits → 1
...
digits → 9

Then we convert it into a finite automaton:


automaton

states number , digits , qF

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start number

final qF

transitions δ(number , 0) = qF
δ(number , 1) = qF
...
δ(number , 9) = qF
δ(number , 1) = digits
δ(number , 2) = digits
...
δ(number , 9) = digits
δ(digits , 0) = digits
δ(digits , 1) = digits
...
δ(digits , 9) = digits
δ(digits , 0) = qF
δ(digits , 1) = qF
...
δ(digits , 9) = qF

This automaton is represented in Figure 13.31.

Figure 13.31: Simulation automaton for GRegular Number .


13.7.2 Nondeterministic Finite State Automata to Regular Grammars

We can also convert a finite state automaton into a regular grammar. The translation in this direction is not as neat as that in the other direction, in the sense that we may end up with surplus production rules. Such rules are surplus because they cannot be used in the derivation of any string, and hence are innocuous.

Construction Algorithm for Grammar Simulator

Given a nondeterministic finite automaton M = (Q ,T , qI , δ : Q × T → P(Q),F ) we translate M into a grammar G = (T ,N , S ,P), where:

• The set T of terminals of the grammar G is precisely that of the machine M .

• The set N of non-terminals of the grammar G is given precisely by the set Q of states of the automaton M . Thus,

N = Q .

• The start symbol S is the initial state qI of the machine M .

• The production rules P of the grammar G are defined from the transition function δ of the machine M . We can choose to produce either a left-linear grammar or a right-linear grammar. We illustrate here how to produce a right-linear grammar:

A→ uB ∈ P if, and only if, B ∈ δ(A, u);

A→ u ∈ P and A→ uB ∈ P if, and only if, B ∈ δ(A, u) and B ∈ F .
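The right-linear read-off can be sketched similarly (same hypothetical triple encoding as before: (A, a, B) for A → aB, and (A, a, None) for A → a):

```python
def nfa_to_grammar(delta, finals):
    """Read a right-linear grammar off an NFA transition relation.
    delta maps (state, symbol) to a set of target states."""
    productions = []
    for (A, a), targets in sorted(delta.items()):
        for B in sorted(targets):
            productions.append((A, a, B))         # A -> aB whenever B in delta(A, a)
            if B in finals:
                productions.append((A, a, None))  # and A -> a when B is final
    return productions

delta = {("S", "a"): {"S"}, ("S", "b"): {"F"}}
print(nfa_to_grammar(delta, {"F"}))
```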

Example

Consider the automaton from Section 13.3.1 for recognising numbers:

automaton M N

states qNI , qNF , qNF ′

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

start qNI

final qNF , qNF ′

transitions δ(qNI , 0) = qNF ′   δ(qNI , 1) = qNF   · · ·   δ(qNI , 9) = qNF
δ(qNF , 0) = qNF   δ(qNF , 1) = qNF   · · ·   δ(qNF , 9) = qNF

Applying the grammar construction algorithm to this automaton gives the regular grammar:


grammar

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals qNI , qNF , qNF ′

start symbol qNI

productions qNI → 0
qNI → 0 qNF ′
qNI → 1
qNI → 1 qNF
...
qNI → 9
qNI → 9 qNF
qNF → 0
qNF → 0 qNF
qNF → 1
qNF → 1 qNF
...
qNF → 9
qNF → 9 qNF

In this grammar we can derive the strings

0, 1, 2, 3, 4, 5, 6, 7, 8 and 9

from the start symbol qNI .

Lemma (Simulating Derivations by Automata Transitions) Let G be a regular grammar in single-step normal form. For any string w ∈ L(G), we can derive w from G via a derivation sequence

w0 ⇒ w1 ⇒ w2 ⇒ · · · ⇒ wn

where
w0 = S and wn = w ∈ L(G)

if, and only if, there exist t0 , t1 , . . . , tn ∈ T ∗ and q0 , q1 , . . . , qn ∈ Q , such that

(i) for stages 0 ≤ i ≤ n − 1 ,

wi = qi ti · · · t1 if G is left-linear,

wi = t1 · · · ti qi if G is right-linear;

(ii) and for stage n,
wn = t1 · · · tn ;


(iii) for stages 0 ≤ i < n − 1 ,
qi+1 ∈ δ(qi , ti+1 );

(iv) and for stage n,
qF ∈ δ(qn−1 , tn), where F = {qF}.

Proof. Let G = (T ,N , S ,P) be a regular grammar in single-step normal form. By the Regular Languages Sentence Derivations Lemma, we can derive a string w ∈ L(G) from G via the derivation sequence

w0 ⇒ w1 ⇒ w2 ⇒ · · · ⇒ wn

where
w0 = S and wn = w ∈ L(G)

if, and only if, there exist strings t0 , t1 , . . . , tn ∈ T ∗ and states q0 , q1 , . . . , qn ∈ Q , such that

(a) for stages 0 ≤ i ≤ n − 1 ,

wi = qi ti · · · t1 if G is left-linear,

wi = t1 · · · ti qi if G is right-linear;

(b) and for stage n,
wn = t1 · · · tn ;

(c) for stages 0 ≤ i < n − 1 ,
qi → ti+1qi+1 ∈ P ;

(d) and for stage n,
qn−1 → tn ∈ P .

Thus, requirement (i) is given by (a) and requirement (ii) is given by (b). Requirements (iii) and (iv) are given by (c) and (d), respectively, under the Simulator Construction Algorithm. 2

Theorem (Nondeterministic Finite State Automata Recognise Regular Languages) For any regular grammar G, there exists an algorithm to transform G into a nondeterministic finite state automaton M such that they generate the same language: L(G) = L(M ).

Proof. Using the Simulator Construction Algorithm, we can construct a nondeterministic finite state automaton M , and using the Simulating Derivations by Automata Transitions Lemma, we can show that we can derive a string w ∈ L(G) from G if, and only if, the automaton M recognises w . 2

13.7.3 Modular Regular Grammars and Modular Finite State Automata

We can translate between a modular regular grammar and a finite state automaton, or a modular finite state automaton and a regular grammar, by flattening the modular structure and applying the transformation algorithm. In doing so, we lose the modular structure in the process. We can preserve modular structure by making just a few modifications to the transformations described in Sections 13.7.1 and 13.7.2.


Chain Rules

It is helpful to have chain rules to define a modular grammar. These are production rules of the form

A→ B .

A regular grammar is allowed to have chain rules, as these are of the form

A→ uB

in the case of a right-linear grammar, or

A→ Bu

in the case of a left-linear grammar, with u = λ ∈ T ∗. These chain rules are useful in a modular grammar because they allow us to split off separate components easily.

Single and Empty Step Normal Form Regular Grammars

If we relax the definition of single-step normal form regular grammars to allow chain rules, then we need to consider their automata simulation transformation.

Chain rules correspond to empty moves. A chain rule of the form

A→ B

is mapped into a move
B ∈ δλ(A, λ)

in an automaton which allows empty moves. Similarly, empty moves in a finite state automaton of the form

B ∈ δλ(A, λ)

are mapped into chain rules
A→ B .

Example

We can convert the modular automaton M Identifier of Section 13.5.2 into the modular grammar:


grammar Identifier

import D as D1 ,L as L1 ,L2

terminals

nonterminals qI , qF

start symbol qI

productions qI → qL1I
qL1F → qF
qL1F → qL2I
qL1F → qD1I
qL2F → qL1F
qD1F → qL1F

where the imported grammar for letters is:

grammar L

terminals a, b, . . . , z,A,B, . . . ,Z

nonterminals qLI , qLF

start symbol qLI

productions qLI → a qLF
qLI → b qLF
...
qLI → z qLF
qLI → A qLF
qLI → B qLF
...
qLI → Z qLF

and the imported grammar for digits is:

grammar D

terminals 0, 1, . . . , 9

nonterminals qDI , qDF

start symbol qDI

productions qDI → 0 qDF
qDI → 1 qDF
...
qDI → 9 qDF


13.7.4 Pumping Lemma (Nondeterministic Finite State Automata)

Let M be a nondeterministic finite state automaton with a set T of terminals. Then there exists a number k = k(M ) ∈ N, depending on the automaton M , such that if a string z ∈ L(M ) has length |z | > k then we can write

z = uvw

as the concatenation of strings u, v ,w ∈ T ∗ and

(a) the length of the string |v | ≥ 1 (i.e., v is not an empty string);

(b) the length of the portion |uv | ≤ k; and

(c) for all i ≥ 0 , the string
uv iw ∈ L(M ).

Proof As M is a finite state automaton, it has a finite number of states. Suppose M has k distinct states

q1 , . . . , qk .

Then the longest path through the machine will be k steps long if there are no cycles in the path. Thus, the automaton will be unable to recognise z as |z | > k unless there is some cyclic path through the machine. Let u be the initial portion of the string z before the cycle is encountered, v be the portion in the cycle, and w the portion after the cycle. As we only have k states available, the length |uv | ≤ k ; and as the cycle must exist and is non-empty, |v | ≥ 1 .

The path through the automaton M to recognise z is shown in Figure 13.32.

Figure 13.32: Path chosen through the automaton to recognise the string z = uvw ∈ L.

So, in recognising z we travel once through the cycle in the path. This cycle corresponds to the non-empty string v . But if there is a cycle, we cannot limit the number of times that we traverse the cycle. So, if we travel through the cycle

• no times, we will be able to recognise the string uw ;


• exactly once, we will be able to recognise the string uvw ;

• exactly twice, we will be able to recognise the string uv 2w ;

...

• exactly i times, we will be able to recognise the string uv iw ;

...

2
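We can watch the pumping argument work on a concrete language. A sketch, using Python's re module as a stand-in recogniser for {a ibj | i , j ≥ 0} and one decomposition z = uvw chosen by hand:

```python
import re

# a*b* recognises {a^i b^j | i, j >= 0}; the automaton for it has few
# states, so any accepted string longer than that bound can be pumped.
lang = lambda s: re.fullmatch(r"a*b*", s) is not None

z = "aaabb"
u, v, w = "", "a", "aabb"   # |v| >= 1 and |uv| small, as the lemma requires
assert u + v + w == z

for i in range(6):          # uw, uvw, uv^2 w, ... all stay in the language
    assert lang(u + v * i + w)
print("all pumped strings accepted")
```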

13.8 Regular Expressions

We shall now consider another technique for describing regular languages. This will provide us with a method for designing machines (and, hence, algorithms) to recognise regular languages.

13.8.1 Operators

When we come to define regular expressions, we will make use of three operations on languages. These operations are:

(i) the concatenation
L1 .L2
of two languages L1 and L2 ;

(ii) the union
L1 |L2
of two languages L1 and L2 ; and

(iii) the iteration
L∗
of a language L.

Concatenation

We define an operation
. : P(T ∗)× P(T ∗)→ P(T ∗)

to concatenate two languages L1 ,L2 ⊆ T ∗ together to form a new language

L1 .L2 .

This language
L1 .L2 = {w1 .w2 | w1 ∈ L1 ,w2 ∈ L2}

consists of every combination that can be formed by taking a string of L1 followed by a string of L2 .


For example, if L1 = {1 , 2 , . . . , 9} and L2 = {0 , 1 , . . . , 9} then their concatenation is the set

L1 .L2 = {10 , 11 , . . . , 98 , 99}

of all numbers between 10 and 99 .

As another example, consider the concatenation of three languages where

(i) the first language {h, p} consists of the two letters h and p,

(ii) the second language {a, e, i, o, u} consists of all the vowels of the English alphabet, and

(iii) the third language {t} consists of just the letter t.

The language that results is:

(({h, p}.{a, e, i, o, u}).{t}) = {hat, het, hit, hot, hut, pat, pet, pit, pot, put}

(As concatenation of strings is associative, we would get the same answer if we had bracketed the inputs differently.)

Union

We define an operation
| : P(T ∗)× P(T ∗)→ P(T ∗)

to create the union of two languages L1 ,L2 ⊆ T ∗ to form a new language

L1 |L2 .

This language
L1 |L2 = L1 ∪ L2

consists simply of every string of L1 and every string of L2 . For example, if L1 = {1 , 2 , . . . , 9} and L2 = {0 , 1 , . . . , 9} then their union is the set

L1 |L2 = {0 , 1 , . . . , 9}

of numbers that are either in L1 or in L2 , which in this case are those that are between 0 and 9 .

Iteration

We define an operation
∗ : P(T ∗)→ P(T ∗)

(called Kleene star) to create the iteration of a single language L ⊆ T ∗ to form a new language

L∗.

This language

L∗ = {w1w2 · · ·wn | n ≥ 0 and w1 ,w2 , . . . ,wn ∈ L}


consists of every combination of strings concatenated from L. For example, if L1 = {1 , 2 , . . . , 9} then its iteration is the set

L∗1 = {1 , 2 , . . . , 9 , 11 , 12 , . . . , 99 , 111 , 112 , . . . , 999 , . . .}.

Similarly, if L2 = {0 , 1 , . . . , 9} then its iteration is the set

L∗2 = {0 , 1 , . . . , 9 , 00 , 01 , . . . , 99 , 000 , 001 , . . . , 999 , . . .}.
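These three operations are easy to experiment with on finite languages, modelled as Python sets of strings; a sketch (the full Kleene star is infinite, so the iteration below is truncated at a chosen depth):

```python
def concatenation(L1, L2):
    """L1.L2: every string of L1 followed by every string of L2."""
    return {w1 + w2 for w1 in L1 for w2 in L2}

def union(L1, L2):
    """L1|L2: plain set union."""
    return L1 | L2

def iteration(L, depth):
    """A finite approximation of L*: all concatenations of at most
    `depth` strings of L (including the empty concatenation)."""
    result, layer = {""}, {""}
    for _ in range(depth):
        layer = concatenation(layer, L)
        result |= layer
    return result

L1 = {str(d) for d in range(1, 10)}   # {1, ..., 9}
L2 = {str(d) for d in range(10)}      # {0, ..., 9}
print(sorted(concatenation(L1, L2))[:3])   # the start of {10, 11, ..., 99}
```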

13.8.2 Building Regular Expressions

We use an iterative algorithm to build up the set of regular expressions. We start from a set of regular expression constants and then construct additional regular expressions by iteratively applying the regular expression operators.

The regular expression constants consist of the empty set ∅ and a set of constants which vary according to the set of terminal symbols that we want to consider. Different sets of terminal symbols produce different sets of regular expressions.

Definition (Regular Expressions) The set of regular expressions consists of

(i) the empty set ∅;

(ii) the singleton languages {t} for t ∈ T , each of which consists of a single terminal symbol;

(iii) the concatenation (R1 .R2 ) of two languages that are defined by the regular expressions R1 and R2 ;

(iv) the union (R1 |R2 ) of two languages that are defined by the regular expressions R1 and R2 ;

(v) the iteration (R0∗) of a language that is defined by the regular expression R0 .

Examples

We can define the set

Digit = 0|1|2|3|4|5|6|7|8|9

of digits as the union of ten separate languages, and the set

Digit+ = 1|2|3|4|5|6|7|8|9

of non-zero digits as the union of nine separate languages. Similarly, we can define the set

Letter = a|b|c| · · · |z|A|B|C| · · · |Z

of letters as the union of 52 separate languages. We can generate the language

Identifier = Letter .((Letter |Digit |_)∗).


of programming identifiers, which are formed as a letter followed by any number of letters, digits or underscores.

We have already observed that

Digit∗ = {0 , 1 , . . . , 9 , 00 , 01 , . . . , 99 , 000 , 001 , . . . , 999 , . . .}

and
(Digit+)∗ = {1 , 2 , . . . , 9 , 11 , 12 , . . . , 99 , 111 , 112 , . . . , 999 , . . .}.

We can generate the set

Digit+.(Digit∗) = {10 , 11 , . . . , 99 , 100 , 101 , . . . , 999 , 1000 , 1001 , . . . , 9999 , . . .}
= {n | n ≥ 10}

of numbers that are 10 or greater by concatenating the languages Digit+ and Digit∗. If we also union this with the language Digit of numbers up to 9 , then we get the complete set

Number = Digit |(Digit+.(Digit∗))

of natural numbers.
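These definitions carry over directly to the regular expression notation of practical tools. A sketch using Python's re module, where the character classes are our shorthand for the unions above:

```python
import re

identifier = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")  # Letter.((Letter|Digit|_)*)
number = re.compile(r"[0-9]|[1-9][0-9]*")          # Digit|(Digit+.(Digit*))

assert identifier.fullmatch("total_2")
assert not identifier.fullmatch("2total")   # must start with a letter
assert number.fullmatch("907")
assert not number.fullmatch("007")          # no leading zeros
print("patterns behave as expected")
```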

13.9 Relationship between Regular Expressions and Automata

In this section we show that we can

translate any regular expression into a finite state automaton

and we can

translate any finite state automaton into a regular expression

such that the regular expressions and finite state automata define the same languages. As a consequence of this equivalence, we shall see how we can give finite state automata additional structure that is present in regular expressions.

13.9.1 Translating Regular Expressions into Finite State Automata

We take each of the elements that make up the set of regular expressions and translate each into finite state automata.

The Empty Set

We can translate the empty set constant ∅ :→ P(T ∗) of regular expressions into a finite state automaton M∅ defined in the manner indicated in Figure 13.33.

More formally, we create the automaton M∅ which has


Figure 13.33: Empty set finite state automata.

• the set {q init} of states;

• the empty set ∅ of final states;

• the initial state q init ;

• the empty transition relationδ = ∅

that connects nothing together.

This gives us the automaton

automaton M∅

states q init

terminals ∅

start q init

final ∅

transitions

The Empty String

We can translate the empty string constant λ :→ P(T ∗) of regular expressions into a finite state automaton Mλ defined in the manner indicated in Figure 13.34.

Figure 13.34: Empty string finite state automata.

More formally, we create the automaton Mλ which has

• the empty set ∅ of terminal symbols;

• the set {q init} of states;

• the set {q init} of final states;

• the initial state q init ;


• the empty transition relationδ = ∅

that connects nothing together.

This gives us the automaton

automaton Mλ

states q init

terminals ∅

start q init

final q init

transitions

Single Terminal Symbols

We can translate the single element sets {t}, t ∈ T , of regular expressions into finite state automata defined in the manner indicated in Figure 13.35.

Figure 13.35: Single terminal symbol finite state automata.

More formally, for each symbol t ∈ T we create the automaton Mt which has

• the set {t} of terminal symbols;

• the set {q init , qF} of states;

• the set {qF} of final states;

• the initial state q init ;

• the transition relation
δ(q init , t) = qF

that joins the initial state q init to the final state qF by reading the letter t ∈ T .

This gives us the automaton


automaton Mt

states q init , qF

terminals t

start q init

final qF

transitions δ(q init , t) = qF

The Concatenation Operator

We can translate the concatenation operator . : P(T ∗)×P(T ∗)→ P(T ∗) on regular expressions into a concatenation operator on finite state automata defined in the manner indicated in Figure 13.36.

Figure 13.36: Concatenation operation on finite state automata.

More formally, when we concatenate two automata M1 = (T1 ,Q1 ,F1 , q init1 , δ1 ) and M2 = (T2 ,Q2 ,F2 , q init2 , δ2 ) we create the automaton M1 .2 which has

• the set T1 ∪ T2 of terminal symbols from both machines;

• the set Q1 ∪Q2 of states from both machines;

• the set F2 of final states from just M2 ;

• the initial state q init1 from just M1 ; and

• the transition relation δ that is created from the union of δ1 , δ2 and an empty move joining the final states of M1 to the initial state of M2 :

δ = δ1 ∪ δ2 ∪ {(λ, qF1 , q init2 ) | qF1 ∈ F1}

In modular automata terms, we get:


automaton M1 .2

import M1 ,M2

states

terminals

start q init1

final F2

transitions . . . , δ(λ, qF1 ) = q init2 , . . .

The Union Operator

We can translate the union operator | : P(T ∗) × P(T ∗) → P(T ∗) on regular expressions into a union operator on finite state automata defined in the manner indicated in Figure 13.37.

Figure 13.37: Union operation on finite state automata.

More formally, when we take the union of two automata M1 = (T1 ,Q1 ,F1 , q init1 , δ1 ) and M2 = (T2 ,Q2 ,F2 , q init2 , δ2 ) we create the automaton M1 |2 which has

• the set T1 ∪ T2 of terminal symbols from both machines;

• the set Q1 ∪Q2 of states from both machines;

• the set F1 ∪ F2 of final states from both machines;

• a new initial state q init ;

• the transition relation δ that is created from the union of δ1 , δ2 and empty moves joining q init to the initial states of both M1 and M2 :

δ = δ1 ∪ δ2 ∪ {(λ, q init , q init1 ), (λ, q init , q init2 )}

In modular automata terms, we get:


automaton M1 |2

import M1 ,M2

states q init

terminals

start q init

final F1 ,F2

transitions δ(λ, q init) = q init1 , q init2

The Iteration Operator

We can translate the iteration operator ∗ : P(T ∗) → P(T ∗) on regular expressions into an iteration operator on finite state automata defined in the manner indicated in Figure 13.38.

Figure 13.38: Iteration operation on finite state automata.

More formally, when we take the iteration of the automaton M1 = (T1 ,Q1 ,F1 , q init1 , δ1 ) we create the automaton M1∗ which has

• the set T1 of terminal symbols from M1 ;

• the set Q1 ∪ {q init1∗ , qF1∗} of states from M1 together with two new states;

• the set F1 ∪ {qF1∗} of final states from M1 together with a new final state;

• a new initial state q init1∗ ;

• the transition relation δ that adds to δ1 empty moves joining:

(i) the new initial state q init1∗ to the initial state q init1 of M1 ;

(ii) the new initial state q init1∗ to the new final state qF1∗ ;

(iii) each of the final states of M1 to the initial state q init1 of M1 ; and

(iv) each of the final states of M1 to the new final state qF1∗ .


We can define this mathematically by

δ = δ1 ∪ {(λ, q init1∗ , q init1 ), (λ, q init1∗ , qF1∗)} ∪ {(λ, qF1 , q init1 ), (λ, qF1 , qF1∗) | qF1 ∈ F1}.

In modular automata terms, we get:

automaton M1∗

import M1

states q init1∗ , qF1∗

terminals

start q init1∗

final F1 , qF1∗

transitions δ(λ, q init1∗ ) = q init1
δ(λ, q init1∗ ) = qF1∗
. . . , δ(λ, qF1 ) = q init1 , . . .
. . . , δ(λ, qF1 ) = qF1∗ , . . .
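Taken together, the three operator translations amount to Thompson's construction. A compact sketch, with a hypothetical representation in which an NFA is a triple (start, finals, moves) and the symbol None plays the role of the empty move λ:

```python
import itertools

fresh = itertools.count()   # supplier of new state names

def symbol(t):
    """Automaton M_t recognising the single-symbol language {t}."""
    q0, qf = next(fresh), next(fresh)
    return q0, {qf}, {(q0, t): {qf}}

def concat(m1, m2):
    """M_1.2: lambda-moves from the final states of M1 to the start of M2."""
    s1, f1, d1 = m1
    s2, f2, d2 = m2
    d = {**d1, **d2}
    for q in f1:
        d.setdefault((q, None), set()).add(s2)
    return s1, f2, d

def union(m1, m2):
    """M_1|2: a new start state with lambda-moves to both machines."""
    s1, f1, d1 = m1
    s2, f2, d2 = m2
    s = next(fresh)
    d = {**d1, **d2, (s, None): {s1, s2}}
    return s, f1 | f2, d

def star(m1):
    """M_1*: new start and final states; finals of M1 loop back to its start."""
    s1, f1, d1 = m1
    s, f = next(fresh), next(fresh)
    d = {**d1, (s, None): {s1, f}}
    for q in f1:
        d.setdefault((q, None), set()).update({s1, f})
    return s, f1 | {f}, d

def accepts(m, word):
    """Run the NFA, taking lambda-closures before and after each symbol."""
    start, finals, d = m

    def closure(qs):
        qs, stack = set(qs), list(qs)
        while stack:
            q = stack.pop()
            for q2 in d.get((q, None), set()):
                if q2 not in qs:
                    qs.add(q2)
                    stack.append(q2)
        return qs

    current = closure({start})
    for a in word:
        current = closure(set().union(*(d.get((q, a), set()) for q in current)))
    return bool(current & finals)

m = concat(union(symbol("h"), symbol("p")), symbol("i"))  # (h|p).i
print(accepts(m, "hi"), accepts(m, "pp"))
```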

Theorem Any regular expression r can be modelled by a finite state automaton M where M recognises the language L(r).

Proof We use structural induction on the form of regular expressions. The three base cases are as follows.

• The empty set regular expression ∅ is modelled by the empty set automaton M∅ which recognises the language ∅ as the automaton has no final states.

• The empty string regular expression λ is modelled by the empty string automaton Mλ which recognises the language {λ} as the initial state is the final state and there are no other transitions.

• The single terminal symbol regular expressions {t}, t ∈ T , are modelled by the automata Mt , each of which recognises the language {t} as this is the only transition to a final state.

The Induction Hypothesis is that the regular expressions r1 and r2 are modelled by automata M1 and M2 which recognise the languages L(r1 ) and L(r2 ). The three structural induction cases are as follows.

• The concatenation r1 .r2 is modelled by the automaton M1 .2 . The initial state of this machine is the initial state of M1 . Thus, we can recognise L(r1 ) and we will end up in a final state of M1 . This is not a final state of M1 .2 and we have an empty move transition from it to the initial state of M2 . So, it is not until we have also recognised L(r2 ) that we end up in a final state of M2 which is also a final state of M1 .2 . Hence, we recognise the language L(r1 .r2 ).


• The union r1 |r2 is modelled by the automaton M1 |2 . The initial state of this machine is a new state q init . There are two possible moves from this state: both are empty moves and one goes to the initial state of M1 , and the other to the initial state of M2 . Thus, following the one route we recognise L(r1 ) and we will end up in a final state of M1 which is a final state of M1 |2 . And, following the other route, we recognise L(r2 ) and we will end up in a final state of M2 which is a final state of M1 |2 . Hence, we recognise the language L(r1 |r2 ).

• The iteration r1∗ is modelled by the automaton M1∗ . The initial state of this machine is a new state q init . There are two possible moves from this state: both are empty moves and one goes to a new final state qF , and the other to the initial state of M1 . Thus, following the one route we recognise the empty string λ. Following the other route, we recognise L(r1 ) and we will end up in a final state of M1 . From here there are two possible moves: both are empty moves and one goes to the final state qF , and the other back to the initial state of M1 . Hence, we recognise the language L(r1∗).

2

13.10 Translating Finite State Automata into Regular Expressions

We can take any finite state automaton and translate it into a regular expression defining the language that it recognises. We perform this translation in a step-wise manner as indicated by the following theorem.

Theorem Any finite state automaton M can be modelled by a regular expression r where rdefines the language L(M ).

Proof. The proof is by structural induction on the states used in progressing between any pair of states q and q ′. The base case is that no intermediate state is used. There are three possibilities. In the case that

q ′ ∈ δ(q , t)

then t can be recognised. The second possibility is that q = q ′ in which case the empty string λ can be recognised. The third possibility is that neither of the first two cases can happen, in which case the empty set ∅ can be recognised.

The Induction Hypothesis is that in progressing between any pair of states q and q ′ using any of the intermediate states q1 , . . . , qk , the language that is recognised is definable by a regular expression.

For the Induction Step, we now consider paths that progress through any of the intermediate states q1 , . . . , qk+1 . Any such path will either not use the intermediate state qk+1 , in which case it will immediately satisfy the Induction Hypothesis, or it will use the intermediate state qk+1 . For any such path, we can split it into portions at the state qk+1 so that each portion starts or ends a path at this point. Each of the individual portions will satisfy the Induction Hypothesis. We now want to join the portions back together again and consider the string generated by the path. The very first and last portions of the string need to be glued together to the potentially looped string at the point qk+1 . Joining the portions back together is equivalent to applying


the concatenation and iteration operators to the regular expressions describing the portions, so the longer string will also be definable by a regular expression.

Finally, when we consider all the paths through the automaton, we have the union of regular expressions for travelling through the sets

∅, {q1}, . . . , {q1 , . . . , qk}, . . . , {q1 , . . . , qn}

of intermediate states, where the automaton has n states. Thus, every path through the automaton can be described by a regular expression.
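The proof is itself an algorithm, often called Kleene's construction. A sketch that maintains, for every pair of states, a regular expression for the paths between them, admitting intermediate states one at a time (the representation is our own; '@' is a printable stand-in for λ):

```python
import re

def nfa_to_regex(states, delta, start, finals):
    """Build a regular expression for L(M) by the inductive argument:
    expr[(i, j)] describes the paths i -> j using only the intermediate
    states admitted so far."""
    R = {(i, j): set() for i in states for j in states}
    for (q, t), targets in delta.items():
        for q2 in targets:
            R[(q, q2)].add(t)

    def base(i, j):  # no intermediate states: direct moves, plus lambda if i == j
        alts = sorted(R[(i, j)]) + (["@"] if i == j else [])
        return "|".join(alts) if alts else None

    expr = {(i, j): base(i, j) for i in states for j in states}
    for k in states:  # now also allow paths passing through k
        new = {}
        for i in states:
            for j in states:
                ik, kk, kj, ij = expr[(i, k)], expr[(k, k)], expr[(k, j)], expr[(i, j)]
                via = None
                if ik and kj:  # i -> k, loop at k, k -> j
                    loop = f"({kk})*" if kk and kk != "@" else ""
                    via = f"({ik}){loop}({kj})"
                if ij and via:
                    new[(i, j)] = f"({ij})|({via})"
                else:
                    new[(i, j)] = ij or via
        expr = new
    return "|".join(expr[(start, f)] for f in finals if expr[(start, f)])

# automaton for a*b+: 0 --a--> 0, 0 --b--> 1, 1 --b--> 1; final state 1
delta = {("0", "a"): {"0"}, ("0", "b"): {"1"}, ("1", "b"): {"1"}}
regex = nfa_to_regex(["0", "1"], delta, "0", {"1"}).replace("@", "")
print(regex)
```

Replacing '@' with the empty string turns the result into a pattern that Python's re module can check against the automaton's language.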

Example

We take the example from Figure 13.28 in Section 13.6.1, but with the states relabelled for ease of reference. The resulting automaton is shown in Figure 13.39 and is defined formally by:

Figure 13.39: Automaton to recognise the language {a ibj | i , j ≥ 0}.

automaton

states q0 , q1 , q2 , q3

terminals a, b

start q0

final q0 , q1 , q2

transitions δ(q0 , a) = q1

δ(q0 , b) = q2

δ(q1 , a) = q1

δ(q1 , b) = q2

δ(q2 , a) = q3

δ(q2 , b) = q2

δ(q3 , a) = q3

δ(q3 , b) = q3


Figure 13.40: Regular expressions associated with paths through the automaton.

What does this tell us about automata?

The mapping from regular expressions to finite state automata tells us that we can build automata from

• the basic blocks of automata that consist of a single transition and recognise a single terminal symbol, and

• operations that join together existing automata.

The automata-constructing operations are those of

• concatenation, which essentially sequentialises two automata;

• union, which essentially parallelises two automata; and

• iteration, which essentially places feedback on an automaton.

Note that it is convenient for us to define these operations using empty moves; we can rewrite the resulting automata so that they have no empty moves.

These operations give us a method of both building and structuring finite state automata.

Theorem (Closure Properties) The regular languages are closed under the operations of

(i) concatenation;

(ii) union;

(iii) Kleene closure;

(iv) complementation; and

(v) intersection.


Proof. The theorems in Sections 13.9.1 and 13.10 showed that the languages defined by regular expressions are precisely those recognised by finite state automata. The regular expressions are closed under the operations of concatenation, union and Kleene closure, and so therefore are the regular languages.

For complementation, suppose we have a finite state automaton M = (Q ,T , q init ,QF , δ) that recognises a language L(M ); we may assume that M is deterministic and total, i.e., that every state has exactly one move for every terminal symbol. We can form its complement as the automaton M = (Q ,T , q init ,Q − QF , δ), which has final and non-final states swapped over. This recognises the language T ∗ − L(M ). Thus, regular languages are closed under complementation.

For intersection, we use the fact that

L1 ∩ L2 = ((L1 )c ∪ (L2 )c)c.

Thus, as regular languages are closed under union and complementation, they are also closed under intersection. 2
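The complementation step can be sketched directly; note that the final-state swap is only correct for a deterministic automaton in which every state has a move for every terminal symbol:

```python
def complement(states, alphabet, delta, start, finals):
    """Swap final and non-final states of a deterministic, total automaton.
    delta maps (state, symbol) to a single state."""
    return states, alphabet, delta, start, states - finals

def run(dfa, word):
    states, alphabet, delta, start, finals = dfa
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in finals

# DFA over {a, b} accepting exactly the strings containing no b
delta = {("ok", "a"): "ok", ("ok", "b"): "bad",
         ("bad", "a"): "bad", ("bad", "b"): "bad"}
dfa = ({"ok", "bad"}, {"a", "b"}, delta, "ok", {"ok"})
codfa = complement(*dfa)
print(run(dfa, "aaa"), run(codfa, "aaa"))
```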


Exercises for Chapter 13

1. Prove that the automata in Figures 13.22 and 13.23 in Section 13.4 both recognise the language

{a ibj | i , j ≥ 0}.

2. Construct an automaton without empty moves to simulate M Flattened Identifier λ from Section 13.5.2.


Historical Notes and Comments for Chapter 13

To appear.


Chapter 14

Context-Free Grammars andProgramming Languages

We began the theory of syntax by introducing the concepts of formal language and grammar and using them to define languages for interfaces, specifications and imperative programs. The grammars we used were large but fairly easy to create and apply. However, we encountered undesirable syntactic properties that seemed awkward to rule out.

The grammars we used in Chapters 10 and 11 were of a special kind, named context-free grammars in the Chomsky Hierarchy. Recall that in a general grammar with terminals T and non-terminals N, a production rule

u → v

defines how one string u ∈ (N ∪ T)+ is transformed into another v ∈ (N ∪ T)∗. Importantly, in a general grammar, the strings u and v on both sides of the rule can contain both non-terminals and terminals. In the context-free grammars we actually used, the production rules all had a simpler form. A production rule

A→ v

in a context-free grammar defines how a non-terminal A ∈ N is transformed into a string v ∈ (N ∪ T)∗.

Context-free grammars are an immensely important class of grammars since they are eminently useful in practice — as our earlier examples demonstrate vividly.

The regular grammars are also context-free, though they are too simple to define programming languages. In Chapters 12 and 13 we introduced the mathematical theory of regular languages. It is a theory that is

(i) conceptually rich and illuminating,

(ii) mathematically attractive, and

(iii) hugely practical and useful.

One wonders: are the context-free languages also blessed with such a theory? Given the practical role of context-free grammars in the definition of syntax, we need a comprehensive mathematical theory of context-free grammars and languages that answers the sorts of questions we were able to answer for the regular grammars and languages.

In this chapter we introduce the mathematical theory of context-free grammars. We will explain the basic theory of



(a) derivation trees to better understand derivations,

(b) normal forms to analyse and simplify the form of rules in grammars, and

(c) recognition algorithms for parsing.

We will also prove a Pumping Lemma for context-free grammars, rather like the Pumping Lemma for regular languages, which enables us to understand in detail the limitations of defining syntax using these grammars. We have met several of these limitations in the course of earlier chapters, especially in the interface, specification and programming languages of Chapter 11. Using the Pumping Lemma, we can prove that certain desirable features concerning

(i) variable declarations in programs,

(ii) concurrent assignments in programs, and

(iii) type declarations in interfaces

cannot be defined by any context-free language.

14.1 Derivation Trees for Context-Free Grammars

As we have seen in our examples, it is often useful to picture a derivation from a context-free grammar as a tree. For example, we can represent the derivation

S ⇒ aSb

⇒ aaSbb

⇒ aaabbb

from the grammar Ganbn of Section 10.2.2, with the tree in Figure 14.1.

Figure 14.1: Tree for the derivation of the string aaabbb from the grammar Ganbn.

What is the structure of this tree? Every node of the tree is labelled with either a non-terminal or a terminal symbol of the grammar. An interior node has label A and n children labelled (from left to right) X1 to Xn if, and only if, there exists a production rule A → X1X2 · · · Xn ∈ P, as shown in Figure 14.2.

Suppose we have a grammar G with terminals T, non-terminals N, start symbol S and productions P. A string w ∈ L(G) in the language defined by the grammar always has a derivation tree with root S and terminal symbols at the leaves, such that the left-to-right concatenation of the terminal strings at the leaves gives the string w.


Figure 14.2: Representation of the application of a production rule A → X1X2 · · · Xn.

Definition (Derivation Tree) Consider a derivation

w ⇒∗ w ′

of a string w ′ from a string w consisting of n steps

w = w0 ⇒ w1

...

⇒ wi−1

⇒ wi

...

⇒ wn−1

⇒ wn = w ′.

At each step i a production rule is applied to produce wi from wi−1. There are two cases, depending on whether the rule has any non-terminals on the right-hand side or not.

1. If a production rule has only terminal symbols on its right-hand side, then it is of the form

A → a1 · · · an

and a derivation

A ⇒ a1 · · · an

is possible from the grammar. This gives a derivation sub-tree with root A and a single leaf child a1 · · · an as shown in Figure 14.3.

Figure 14.3: Derivation sub-tree for production rule A → a1 · · · an.

2. If a production rule has at least one non-terminal symbol on the right-hand side, then it is of the form

A → u0A1u1 · · · un−1Anun

where n ≥ 1, A, A1, . . ., An ∈ N are non-terminals, and u0, . . ., un ∈ T∗ are (possibly empty) strings of terminals. Then a derivation

A ⇒ u0A1u1 · · · un−1Anun


is possible from the grammar. This gives a derivation sub-tree with root A and children whose concatenation gives the string u0A1u1 · · · un−1Anun. Further derivations from the non-terminal symbols A1, . . . , An will, in later steps, give sub-trees with roots A1, . . . , An, as shown in Figure 14.4.

Figure 14.4: Derivation sub-tree for production rule A → u0A1u1 · · · un−1Anun.

Given a derivation tree D ∈ Tree(N, T) we can define a function

frontier : Tree(N, T) → (N ∪ T)∗

to recover the string frontier(D) ∈ (N ∪ T)∗ produced by D by:

frontier(D) = D, if leaf(D) = tt;
frontier(D) = frontier(D1) · frontier(D2) · · · · · frontier(Dn), if leaf(D) = ff and D1, . . . , Dn are the subtrees of D;

where leaf : Tree(N, T) → B is a predicate that determines whether a tree is a leaf or not.
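The frontier function can be rendered directly in Python. This is our own sketch: derivation trees are encoded as either a bare symbol (a leaf) or a pair (label, subtrees), an assumption rather than the book's Tree(N, T) data type.

```python
# Sketch of the frontier function; the tree encoding (a symbol for a leaf,
# a (label, subtrees) pair for an interior node) is our own assumption.

def leaf(D):
    # A leaf carries just its symbol; interior nodes are (label, subtrees).
    return not isinstance(D, tuple)

def frontier(D):
    if leaf(D):
        return D
    _, subtrees = D
    # Concatenate the frontiers of the subtrees, left to right.
    return "".join(frontier(Di) for Di in subtrees)

# The derivation tree of Figure 14.1 for aaabbb from G_anbn:
D = ("S", ["a", ("S", ["a", ("S", ["a", "b"]), "b"]), "b"])
```

Here frontier(D) returns "aaabbb", the string derived in the example above.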

Lemma (Derivation Trees and Language Generation) Given a context-free grammar G = (T, N, S, P), we can derive a string w ∈ (N ∪ T)∗,

S ⇒∗ w

if, and only if, there exists a derivation tree D with frontier(D) = w.

Proof. We show a more general statement: that

A ⇒∗ w

for any A ∈ N and w ∈ (N ∪ T)∗ if, and only if, there exists a derivation tree D with root A and frontier(D) = w. We split the proof into the if case and the only if case.

if Suppose that there is a derivation tree D with root A and frontier(D) = w for some string w ∈ (N ∪ T)∗. We show that

A⇒∗ w

for some A ∈ N by induction on the number of internal nodes of the derivation tree D. The base case is that the derivation tree D has one internal node, i.e., just the root node A and

its children. Suppose

frontier(D) = X1 · · ·Xn

= w .

Page 579: Data, Syntax and Semantics - Better Together Toolkit

14.1. DERIVATION TREES FOR CONTEXT-FREE GRAMMARS 549

Then there exists a production

A → X1 · · · Xn

in P and a derivation

A ⇒G X1 · · · Xn.

The Induction Hypothesis is that for any derivation tree D with at most k internal nodes, root A and frontier(D) = w, we have

A ⇒∗G w.

For the Induction Step, we consider a derivation tree D with k + 1 internal nodes, root A and frontier(D) = w. We consider the children of the root A. Suppose they are X1, . . . , Xn, with n ≤ k as there are k + 1 internal nodes. There will be a production rule A → X1 · · · Xn that generates the children of A. Each of the nodes will either be a terminal or a non-terminal. If the node Xi ∈ T is a terminal then it has no children. Otherwise, the node Xi ∈ N is a non-terminal with children of its own. It will be the root of some sub-tree Di which has at most k internal nodes, so we can apply the Induction Hypothesis:

Xi ⇒∗G frontier(Di).

This gives us a derivation

A⇒G X1 · · ·Xi · · ·Xn

⇒∗G X1 · · · frontier(Di) · · ·Xn

...

⇒∗G frontier(D1 ) · · · frontier(Di) · · · frontier(Dn)

Note that here there are many possible derivations, depending on the order in which we choose to expand each of the non-terminals among X1, . . . , Xn, and the expansion order within each of these derivations.

Only if Suppose that

A ⇒∗ w

for some A ∈ N and w ∈ (N ∪ T)∗. We show that there is a derivation tree D with root A and frontier(D) = w by induction on the length of derivation sequences.

The base case is that A ⇒1G w. Then there exists a production rule A → w in P and a derivation tree D with root A and frontier(D) = w.

The Induction Hypothesis is that for all derivation sequences A ⇒iG w with i ≤ k, there exists a derivation tree D with root A and frontier(D) = w.

For the Induction Step, we consider a derivation sequence

A ⇒k+1G w.

We split this derivation into two parts, the first step and the remaining steps:

A⇒1G X1 · · ·Xn

⇒kG w1 · · ·wn

= w .


Each Xi is either a terminal or a non-terminal. If Xi ∈ T is a terminal then we have already derived an element of w: Xi = wi. If Xi ∈ N is a non-terminal then we have yet to derive an element of w: Xi ⇒∗G wi. As the whole derivation sequence is of length k + 1, each of the derivation sequences from Xi will be of length at most k, and we can apply the Induction Hypothesis to each. Thus, there will exist a derivation tree Di with root Xi and frontier(Di) = wi.

Now, we construct a derivation tree D from components: the root of D is A and its children are X1, . . . , Xn. The terminals amongst the children are leaves. The non-terminals are the roots of the derivation sub-trees D1, . . . , Dn.
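The construction in this only if direction can be sketched in Python: given the rule applications of a left-most derivation, the tree is built recursively, consuming the applications in left-most order. The encoding (a symbol for a leaf, a (label, subtrees) pair for an interior node) is our own assumption.

```python
# Our own sketch of the tree construction: consume the rule applications of a
# left-most derivation while expanding symbols depth-first, left to right.

NONTERMINALS = {"S"}

def build_tree(start, steps):
    """steps: (non-terminal, right-hand side) pairs in left-most order."""
    remaining = list(steps)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol                        # a terminal becomes a leaf
        A, rhs = remaining.pop(0)                # next left-most application
        assert A == symbol
        return (A, [expand(X) for X in rhs])     # children, left to right

    return expand(start)

def frontier(D):
    return D if isinstance(D, str) else "".join(frontier(t) for t in D[1])

# The left-most derivation S => aSb => aaSbb => aaabbb from G_anbn:
D = build_tree("S", [("S", "aSb"), ("S", "aSb"), ("S", "ab")])
```

Reading back the frontier of the constructed tree recovers the derived string aaabbb.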

14.1.1 Examples of Derivation Trees

Recall that ⇒ indicates a 1-step reduction in a derivation by applying 1 production rule, and that ⇒n indicates n reductions in a derivation by applying n production rules.

1. Consider the grammar

grammar G0∗1∗

terminals 0, 1

nonterminals S

start symbol S

productions S → 1
S → 0S
S → 1S

of Examples 10.2.2(1). Some sample derivations and their corresponding trees are shown in Figure 14.5.

Sample derivations: S ⇒ 1; S ⇒ 0S ⇒ 01; S ⇒ 1S ⇒ 10S ⇒ 101.

Figure 14.5: Derivation trees for Examples 10.2.2(1).

2. Consider the grammar


grammar Ganbn

terminals a, b

nonterminals S

start symbol S

productions S → ab
S → aSb

of Examples 10.2.2(2). Some sample derivations and their corresponding trees are shown in Figure 14.6.

Sample derivations: S ⇒ ε; S ⇒ aSb ⇒ ab; S ⇒n anSbn ⇒ anbn.

Figure 14.6: Derivation trees for Examples 10.2.2(2).

3. Consider the grammar

grammar Ga2n

terminals a

nonterminals S

start symbol S

productions S → ε
S → aSa

of Examples 10.2.2(3). Some sample derivations and their corresponding trees are shown in Figure 14.7.

4. Consider the grammar


Sample derivations: S ⇒ ε; S ⇒ aSa ⇒ aa; S ⇒n anSan ⇒ a2n.

Figure 14.7: Derivation trees for Examples 10.2.2(3).

grammar GBoolean Expression

import GArithmetic Expression

alphabet true, false, not, and, or, =, <

nonterminals BExp,BOp1 ,BOp2 ,RelOp

start BExp

rules BExp → BOp1 BExp
BExp → BExp BOp2 BExp
BExp → AExp RelOp AExp
BExp → true
BExp → false

BOp1 → not
BOp2 → and
BOp2 → or

RelOp → =
RelOp → <

of Section 10.3.1 for generating Boolean expressions. Some sample derivations and their corresponding trees are shown in Figure 14.8.

Sample derivations: BExp ⇒ true; BExp ⇒ BOp1 BExp ⇒2 not false; BExp ⇒ BExp BOp2 BExp ⇒3 true and false.

Figure 14.8: Derivation trees for generating Boolean expressions.

Page 583: Data, Syntax and Semantics - Better Together Toolkit

14.1. DERIVATION TREES FOR CONTEXT-FREE GRAMMARS 553

5. Consider the WP Language, whose syntax was given by the BNF in Section 11.5.2. The derivation tree for the derivation of the program statement

x := y;
y := r;
r := x mod y

in Example 11.5.2 is given in Figure 14.9.

Figure 14.9: Derivation tree for the program statement x := y; y := r; r := x mod y.


Notice that one important feature of derivation trees is that they may factor out or hide the precise order in which different production rules are applied to a string. Consider, for example, the grammar:

grammar Gab

terminals a, b

nonterminals S ,A,B

start symbol S

productions S → AB
A → a
B → b

Then, there are two derivations for the string ab:

S ⇒ AB ⇒ aB ⇒ ab

and

S ⇒ AB ⇒ Ab ⇒ ab.

The difference is in the order of applying the three production rules. However, these derivations are considered equivalent when drawn as trees, as there is only

one derivation tree for ab, namely the tree with root S and children A and B, where A has the single child a and B has the single child b.

14.1.2 Ambiguity

As we have just seen, in many grammars it is possible to derive the same string in more than one way by varying the order of some production rules. In some grammars it is possible to derive the same string using different choices of production rules: in this case, there is more than one derivation tree for a string that can be produced by the grammar. Such grammars are called ambiguous:

Definition (Ambiguous Grammar) A grammar G = (T, N, S, P) is ambiguous if there is a string w ∈ L(G) in the language generated by G which has more than one derivation tree.

An example of an ambiguous grammar is


grammar Gambiguous ab

terminals a, b

nonterminals S

start symbol S

productions S → aS
S → b
S → ab

The grammar Gambiguous ab is ambiguous, as the string ab can be derived with two different choices of rules:

S ⇒ aS ⇒ ab

and

S ⇒ ab

which have two different derivation trees as shown in Figure 14.10.

Figure 14.10: Two derivation trees for the string ab.
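Ambiguity can be witnessed mechanically. The following Python sketch is our own illustrative check, not from the book: it enumerates all derivation trees of bounded depth for Gambiguous ab and counts those whose frontier is ab, finding exactly the two trees of Figure 14.10.

```python
# Hypothetical brute-force check (not from the book): enumerate derivation
# trees of bounded depth for G_ambiguous_ab and count those with frontier ab.

RULES = {"S": [["a", "S"], ["b"], ["a", "b"]]}
TERMINALS = {"a", "b"}

def trees(symbol, depth):
    if symbol in TERMINALS:
        yield symbol                 # terminals are leaves
        return
    if depth == 0:
        return
    for rhs in RULES[symbol]:
        # All combinations of subtrees for the symbols on the right-hand side.
        partial = [[]]
        for X in rhs:
            partial = [p + [t] for p in partial for t in trees(X, depth - 1)]
        for subtrees in partial:
            yield (symbol, subtrees)

def frontier(D):
    return D if isinstance(D, str) else "".join(frontier(t) for t in D[1])

ab_trees = [D for D in trees("S", 3) if frontier(D) == "ab"]
```

ab_trees holds the two trees of Figure 14.10: one from S → ab directly, and one from S → aS followed by S → b.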

Another example of an ambiguous grammar involves programming languages with conditional statements where the else clause is optional. Consider the grammar that results from the BNF:

bnf DanglingElse

import Command ,BooleanExpression

rules 〈Command〉 ::= if 〈BooleanExpression〉 then 〈Command〉 |
if 〈BooleanExpression〉 then 〈Command〉 else 〈Command〉

This arguably produces a nicer user syntax for conditional statements which have no else clauses than the syntax for while programs that we had in Section 11.5.2: instead of writing

if b then S else skip fi

we can now simply write

if b then S

However, with this grammar, there is a problem in knowing which conditional the else clause fits. Thus, the program

Page 587: Data, Syntax and Semantics - Better Together Toolkit

14.1. DERIVATION TREES FOR CONTEXT-FREE GRAMMARS 557

if b1 then if b2 then S1 else S2

could be read so that the else clause matches the second conditional:

if b1 then (if b2 then S1 else S2)

or so that the else clause matches the first conditional:

if b1 then (if b2 then S1) else S2

(The brackets are not part of the language; they only indicate the grouping.)

The two possible derivations are illustrated in Figure 14.11.

Figure 14.11: Derivation trees produced by the ambiguous BNF DanglingElse.

In practice, programmers use layout and indentation to enhance the legibility of a program, so the two interpretations would be written as:

if b1 then

if b2 then

S1

else

S2

and:


if b1 then

if b2 then

S1

else

S2

However, this simply exposes a program to the danger that the intention of the programmer may not match the implementation. There are four possible solutions.

1. One solution to the problem is to incorporate layout and indentation into the syntax and semantics of a language.

2. A dangerous solution is to adopt a convention that one of the interpretations always applies. This can be enforced by changing the grammar to remove the ambiguity. However, this is dangerous because it contradicts human behaviour: it requires that a programmer is aware of the convention, and remembers the convention for every conditional statement that is written.

3. A theoretically satisfactory solution is to require all conditionals to have an else statement. Practically, though, this is not ideal because it is cumbersome and can detract from the legibility of a program.

4. A solution that is both technically and practically acceptable is to simply require that a conditional is closed in some way, for example with fi or end-if.

Adapting the problematic BNF DanglingElse in the manner of solution (4) gives:

bnf UnambiguousPracticalConditionals

import Command ,BooleanExpression

rules 〈Command〉 ::= if 〈BooleanExpression〉 then 〈Command〉 fi |
if 〈BooleanExpression〉 then 〈Command〉 else 〈Command〉 fi

Now the problem is eliminated as the two cases simply produce different strings:

if b1 then if b2 then S1 else S2 fi fi

or

if b1 then if b2 then S1 fi else S2 fi

Layout and indentation can be used to enhance legibility, for example:

if b1 then

if b2 then

S1

else

S2

fi

fi


and:

if b1 then

if b2 then

S1

fi

else

S2

fi

Now, though, a programmer is free to adopt any layout convention, safe in the knowledge of how the statement will be interpreted.

Is it always possible to find a solution for any language, though? We would like to ensure that any derivation of a string is unique so that there is no confusion over how a string should be read.

Can we always remove such ambiguities or not?

Clearly, in the case of Gambiguous ab, we can. For example, the grammar Gab of Section 14.1.1 generates the same language,

L(Gab) = L(Gambiguous ab).

In the case of the DanglingElse BNF, we have explored solutions that change the language to remove a possible source of danger and confusion from human programmers. If, for some reason, we did want such strings in the language (solution (2)), the rules can be altered to allow this. (Thus, at least the solution is specified, and not open to interpretation by different compiler writers.)

To produce a language where an else branch is matched against the nearest unmatched then, we can alter the BNF DanglingElse to produce the unambiguous grammar:

bnf MatchElseToClosestUnmatchedIf

import Command ,BooleanExpression

rules 〈Command〉 ::= 〈UnmatchedIf〉 |
〈MatchedIf〉
〈UnmatchedIf〉 ::= if 〈BooleanExpression〉 then 〈Command〉
〈MatchedIf〉 ::= if 〈BooleanExpression〉 then 〈MatchedIf〉 else 〈Command〉

Now, the string

if b1 then if b2 then S1 else S2

has exactly one interpretation:

if b1 then (if b2 then S1 else S2)

This is because we can only generate this string with the derivation tree in Figure 14.12. There are, though, languages where we cannot remove such ambiguities.


Figure 14.12: Derivation tree produced by the unambiguous BNF MatchElseToClosestUnmatchedIf.

Definition (Inherently Ambiguous Languages) A context-free language L is said to be inherently ambiguous if every possible context-free grammar G that generates L = L(G) contains an ambiguity.

There exist inherently ambiguous languages.

Theorem (Existence of Inherently Ambiguous Languages) The language

Linherently ambiguous = {anbncmdm | n ≥ 1, m ≥ 1} ∪ {anbmcmdn | n ≥ 1, m ≥ 1}

is inherently ambiguous.

Proof. See Hopcroft and Ullman [1979]. (Note that this is hard.)

This demonstrates that whilst for some particular context-free languages we may be able to produce an unambiguous grammar, this will not be true in general.

Fortunately, the problem of inherent ambiguity does not seem to arise in programming languages. However:

Theorem There does not exist an algorithm that can decide whether or not a given context-free grammar is ambiguous.

For a proof of this, see Hopcroft and Ullman [1979].


14.1.3 Leftmost and Rightmost Derivations

We have seen that we can abstract away from the order in which we apply production rules in a derivation of a string by expressing the derivation as a derivation tree. When a grammar G = (T, N, S, P) is unambiguous, we know that any string w ∈ L(G) of the language generated by G has exactly one derivation tree.

Sometimes, though, just having an unambiguous grammar is insufficient. Although a derivation tree gives a unique representation of a derivation, we want to have the same power over derivations themselves. How can we ensure that there is only one derivation sequence of any string of a language for which there exists some unambiguous grammar?

Let us examine the problem. In an unambiguous grammar, we are guaranteed to have a unique derivation tree for any string of the language. The derivation tree abstracts away from the order in which we apply production rules. In a context-free grammar, we get a choice of production rules we can apply in a derivation whenever a derivation produces a string with more than one non-terminal:

uAy ⇒ uvBwCxy

for non-terminals A, B, C ∈ N and strings u, v, w, x, y ∈ (N ∪ T)∗. As a derivation sequence starts from a single non-terminal, the only way we can reach this point is that there is a production

A → vBwCx

with more than one non-terminal on the right-hand side.

If we have a regular grammar, then we can guarantee that all the production rules have at most one non-terminal on the right-hand side. Hence, we can ensure that there is a unique derivation for any string of a regular language.

So what can we do for context-free languages? We shall see in Section 14.2 that we can restrict the production rules of context-free grammars in various ways whilst retaining their power. However, all these restrictions still require context-free grammars to have more than one non-terminal on the right-hand side of production rules.

The only option that we have available is to impose additional restrictions on derivations. We shall simply require that we apply production rules in a specific order when deriving a string. There are two methods we can use to guarantee that, if the grammar is unambiguous, then we can only ever have one derivation for a string:

(i) we always apply a production rule to the left-most non-terminal in a string; or

(ii) we always apply a production rule to the right-most non-terminal in a string.

First, we formalise the definitions for left-most derivations.

Definition (Left-Most Derivation) Given a context-free grammar G = (T, N, S, P), a derivation w ⇒ w′ of a string w′ ∈ (N ∪ T)∗ from a string w ∈ (N ∪ T)+ is a left-most derivation if,

(i) w = sAt ,

(ii) w ′ = svt ,

(iii) A→ v ∈ P , and


(iv) s ∈ T ∗ is a possibly empty string of terminals,

where t , v ∈ (N ∪ T )+, and A ∈ N .

Definition (Left-Most Derivation Sequence) Given a context-free grammar G = (T, N, S, P), a derivation sequence w1 ⇒ · · · ⇒ wn is a left-most derivation sequence if each of the derivations w1 ⇒ w2, . . ., wn−1 ⇒ wn is left-most.

The definition of right-most derivations is analogous.

Definition (Right-Most Derivation) Given a context-free grammar G = (T, N, S, P), a derivation w ⇒ w′ of a string w′ ∈ (N ∪ T)∗ from a string w ∈ (N ∪ T)+ is a right-most derivation if,

(i) w = sAt ,

(ii) w ′ = svt ,

(iii) A→ v ∈ P , and

(iv) t ∈ T ∗ is a possibly empty string of terminals,

where s , v ∈ (N ∪ T )+, and A ∈ N .

Definition (Right-Most Derivation Sequence) Given a context-free grammar G = (T, N, S, P), a derivation sequence w1 ⇒ · · · ⇒ wn is a right-most derivation sequence if each of the derivations w1 ⇒ w2, . . ., wn−1 ⇒ wn is right-most.
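Both kinds of derivation step can be sketched as functions on sentential forms, here using the grammar Gab of Section 14.1.1; the string encoding of sentential forms and the function names are our own assumptions.

```python
# Our own sketch: one left-most / right-most derivation step on a sentential
# form, encoded as a string of single-character symbols (grammar G_ab).

RULES = {"S": ["AB"], "A": ["a"], "B": ["b"]}
NONTERMINALS = set(RULES)

def leftmost_step(w, A, v):
    # Apply A -> v at the left-most non-terminal occurrence in w.
    i = next(j for j, X in enumerate(w) if X in NONTERMINALS)
    assert w[i] == A and v in RULES[A]
    return w[:i] + v + w[i + 1:]

def rightmost_step(w, A, v):
    # Apply A -> v at the right-most non-terminal occurrence in w.
    i = max(j for j, X in enumerate(w) if X in NONTERMINALS)
    assert w[i] == A and v in RULES[A]
    return w[:i] + v + w[i + 1:]
```

The two derivations of ab given earlier correspond to expanding A first (left-most) or B first (right-most); both derivation sequences end at the same string ab.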

Lemma (Unique Derivations from Left-Most Derivations) The left-most derivation of any string w ∈ (N ∪ T)∗ from an unambiguous grammar G = (T, N, S, P) is unique.

Proof The Derivation Trees and Language Generation Lemma in Section 14.1 tells us that there is a derivation tree for any derivation S ⇒∗G w. Furthermore, within the Derivation Trees and Language Generation Lemma, we can choose the derivation that is left-most. In addition, for any two left-most derivations

v1 ⇒G · · · ⇒G vn

and

w1 ⇒G · · · ⇒G wn

if v1, . . . , vn differs from w1, . . . , wn, then the derivation trees will be different.

Finally, as the grammar is unambiguous, there will be a unique derivation tree for any string. □

14.2 Normal and Simplified Forms for Context-Free Grammars

We have seen that a language can be defined by different grammars. In practice, it is easy to find different grammars that define the same language. In fact, for any language there are infinitely many grammars available to define it. This raises the possibility of developing methods for simplifying grammars by transforming their rules.


Simplification and transformation methods allow a grammar to be designed for clarity, but then manipulated into another form that is suited for some other purpose, such as parsing efficiency or theoretical analysis of its properties.

Any context-free grammar can be transformed into an equivalent grammar that has all of its production rules in certain simple formats called normal forms. There are two important normal forms for context-free grammars:

Chomsky Normal Form and Greibach Normal Form.

14.2.1 Removing Null Productions

Production rules of the form A → ε can make the design of a language clearer. Often, though, it is desirable to remove such productions, for example to improve the efficiency of parsing, or to prove properties of the language.

We start with a definition of null productions.

Definition (Null Production) Given a grammar G = (T ,N , S ,P), a production

A→ ε

of P where A ∈ N is a non-terminal is a null production of G .

We can transform a context-free grammar G with null productions to an equivalent context-free grammar without, except if the language L(G) contains the empty string. In such cases, we also require a production rule S → ε, in order to ensure that ε ∈ L(G).

We shall perform this transformation in a two-stage process. First, we calculate those non-terminals from which it is possible to derive ε. Then, we shall modify the production rules to take account of these nullable non-terminals. We shall then show that this modification does not alter the language that is produced.

Definition (Nullable Non-Terminal) Given a grammar G = (T, N, S, P), a non-terminal A ∈ N is nullable if, and only if,

(i) there is a production rule A→ ε in P , or

(ii) there is a production rule A → B1 · · · Bn in P and each of the non-terminals B1, . . . , Bn is nullable.

We can easily translate this definition into an algorithm to calculate the set of nullable non-terminals of a grammar.

Algorithm to find nullables

We use the set

nulls(i) = {A ∈ N | A ⇒iG ε}

Page 594: Data, Syntax and Semantics - Better Together Toolkit

564CHAPTER 14. CONTEXT-FREE GRAMMARS AND PROGRAMMING LANGUAGES

to store the set of non-terminals that can derive the string ε in i steps.

for i := 1 to |N| do
  nulls(i) := ∅
od;
for each A → ε ∈ P do
  nulls(1) := nulls(1) ∪ {A}
od;
i := 2;
nulls(2) := nulls(1);
for each A → B1 · · · Bn ∈ P with B1, . . . , Bn ∈ nulls(2) do
  nulls(2) := nulls(2) ∪ {A}
od;
while nulls(i) ≠ nulls(i-1) do
  i := i + 1;
  nulls(i) := nulls(i-1);
  for each A → B1 · · · Bn ∈ P with B1, . . . , Bn ∈ nulls(i) do
    nulls(i) := nulls(i) ∪ {A}
  od
od;
nullable := nulls(i)
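The same fixed-point computation is easily rendered in Python. This is an illustrative sketch; the encoding of productions as (left-hand side, right-hand side) pairs, with ε as the empty tuple, is our own assumption.

```python
# Our own sketch of the nullable computation: keep adding non-terminals that
# satisfy clause (i) or (ii) of the definition until nothing changes.

def nullables(productions):
    """productions: (A, rhs) pairs; rhs is a tuple of symbols, () for ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for A, rhs in productions:
            # Covers both clauses: rhs == () is clause (i); otherwise all
            # symbols on the right-hand side must already be nullable.
            if A not in nullable and all(X in nullable for X in rhs):
                nullable.add(A)
                changed = True
    return nullable

# The grammar of the second example in this section:
# S -> AB, A -> a | ε, B -> b | ε.
G = [("S", ("A", "B")), ("A", ("a",)), ("A", ()), ("B", ("b",)), ("B", ())]
```

On G, nullables returns {S, A, B}: A and B by clause (i), then S by clause (ii).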

Algorithm to remove nullables

The next stage in the process of removing null productions is to add to the set of productions all variations of the rules that are produced by considering that a nullable variable is the empty string. Then, we remove the null productions. Algorithmically, this gives:

Q := P;
while Q ≠ ∅ do
  choose p ∈ Q, say p = B → x1 · · · xn;
  for each i such that xi ∈ nullable do
    q := B → x1 · · · xi-1 xi+1 · · · xn;
    Q := Q ∪ {q};
    P := P ∪ {q}
  od;
  Q := Q − {p}
od
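The removal stage can likewise be sketched in Python (our own illustration; the tuple encoding of rules is an assumption). For every rule, each nullable occurrence on the right-hand side may be kept or erased, and the resulting null productions are dropped. As the text notes, when ε ∈ L(G) the special rule S → ε must be re-added separately.

```python
# Our own sketch of null-production removal: for each rule, generate every
# variant obtained by erasing some subset of nullable occurrences, then
# discard the null productions themselves.
from itertools import product

def remove_null_productions(productions, nullable):
    new = set()
    for A, rhs in productions:
        # Each nullable occurrence may be kept ((X,)) or erased (()).
        options = [((X,), ()) if X in nullable else ((X,),) for X in rhs]
        for choice in product(*options):
            v = tuple(s for part in choice for s in part)
            if v != ():               # drop the null productions
                new.add((A, v))
    return new

# The grammar with S -> AB, A -> a | ε, B -> b | ε, and nullable {S, A, B}:
G = [("S", ("A", "B")), ("A", ("a",)), ("A", ()), ("B", ("b",)), ("B", ())]
G2 = remove_null_productions(G, {"S", "A", "B"})
```

G2 contains S → AB, S → A, S → B, A → a and B → b, matching the transformed grammar G′ of the example below, minus the rule S → ε, which must be re-added since ε ∈ L(G).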

Example Consider the grammar


grammar Declarations0

import Type, IdentifierList

terminals var, ;, :

nonterminals Declarations ,Declaration,DeclarationList

start symbol Declarations

productions Declarations → var DeclarationList ;
DeclarationList → ε
DeclarationList → Declaration ; DeclarationList
Declaration → IdentifierList : Type

The non-terminal DeclarationList is nullable because of the rule DeclarationList → ε. Neither Declarations nor Declaration is nullable, because all their productions have at least one terminal on the right-hand side and hence cannot satisfy the condition that the whole of the right-hand side of some production consists of nullable non-terminals.

When we convert Declarations0 to have no null productions, we convert the rules that involve DeclarationList on their right-hand sides; all the other rules are not affected. Thus, removing the null production

DeclarationList → ε

makes

Declarations → var DeclarationList ;

become two rules, the original:

Declarations → var DeclarationList ;

and that which results from treating the nullable non-terminal DeclarationList as though it is the empty string:

Declarations → var ;

The grammar that results is:

grammar Declarations

import Type, IdentifierList

terminals var, ;, :

nonterminals Declarations ,Declaration,DeclarationList

start symbol Declarations

productions Declarations → var DeclarationList ;
Declarations → var ;
DeclarationList → Declaration ; DeclarationList
Declaration → IdentifierList : Type


Example Consider the grammar:

grammar G

terminals a, b

nonterminals S ,A,B

start symbol S

productions S → AB
A → a
A → ε
B → b
B → ε

This has two immediate null productions A → ε and B → ε, so A and B are nullable. Thus, the rule S → AB causes S to be nullable.

Dealing with A being nullable, we remove the rule A → ε and we compensate the rule S → AB by adding the production S → B. Similarly, with B being nullable, we remove the rule B → ε and add the production S → A. If there were any rules that involved S on the right-hand side we would also have to deal with them, as S is nullable. However, there are not, but S is the start symbol, so ε is in the language L(G) generated by G. In this case, we need to add a rule S → ε to our transformed grammar to enable ε to be generated. The grammar that results is:

grammar G ′

terminals a, b

nonterminals S ,A,B

start symbol S

productions S → AB
S → A
S → B
S → ε
A → a
B → b
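As a sketch, the nullable computation and the null-production removal just performed can be written in Python. The representation of productions as pairs of a non-terminal and a tuple of right-hand-side symbols is our own choice for illustration, not the book's notation:

```python
from itertools import combinations

def nullable_nonterminals(productions):
    """Fixpoint computation: A is nullable if some rule A -> w has a
    right-hand side w consisting entirely of nullable non-terminals
    (in particular, if w is empty)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

def remove_null_productions(productions, start):
    """Compensate for every nullable symbol on a right-hand side,
    then drop all null productions (re-adding S -> ε if S is nullable)."""
    nullable = nullable_nonterminals(productions)
    new = set()
    for lhs, rhs in productions:
        positions = [i for i, s in enumerate(rhs) if s in nullable]
        for r in range(len(positions) + 1):
            for drop in combinations(positions, r):
                reduced = tuple(s for i, s in enumerate(rhs) if i not in drop)
                if reduced:  # never add a null production
                    new.add((lhs, reduced))
    if start in nullable:
        new.add((start, ()))  # keep ε in the language when S is nullable
    return new
```

On the grammar G above, the nullable set is {A, B, S} and the removal yields exactly the rules of G′.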

Now, we prove that the transformed grammar is equivalent to the original grammar.

Theorem (Null-Free Equivalence) A grammar G = (T, N, S, P) that is transformed using the above algorithm into a grammar G′ = (T, N′, S, P′) produces the same language L(G) = L(G′).

Proof We show that for any non-terminal A ∈ N and any non-empty string w ∈ T∗, w ≠ ε, of terminals,

A ⇒∗_G w if, and only if, A ⇒∗_{G′} w.

We split the proof into the if-case and the only-if-case.

Page 597: Data, Syntax and Semantics - Better Together Toolkit

14.2. NORMAL AND SIMPLIFIED FORMS FOR CONTEXT-FREE GRAMMARS 567

if Suppose A ⇒∗_G w for some non-terminal A ∈ N and some non-empty string w ∈ T∗, w ≠ ε, of terminals. We prove by induction on the length of derivation sequences that A ⇒∗_{G′} w.

For the base case, we can assume that A ⇒1_G w in order to show that A ⇒∗_{G′} w. Now as w ≠ ε, the derivation A ⇒ w in G is produced by the production rule A → w in G. Since w ∈ T+, this is not a null production. Thus, this production A → w is in G′. We can use this G′ production to generate w: A ⇒1_{G′} w.

For the Induction Hypothesis, we assume that if A ⇒i_G w then A ⇒∗_{G′} w for any value of i ≤ k.

For the induction case, we suppose that A ⇒k+1_G w. We split this derivation sequence apart into its first step and its remaining k steps:

A ⇒1_G X1 ⋯ Xn ⇒k_G w

where each X1, …, Xn ∈ (N ∪ T) is either a terminal or a non-terminal. We can consider that each substring wi ∈ T∗, possibly wi = ε, of the string

w = w1 ⋯ wn ∈ T∗

is derived

Xi ⇒∗_G wi

from one of the symbols Xi in at most k steps. (In the case that Xi is a terminal, this uses zero steps.) However, G has null productions, so some of w1, …, wn may be ε. We remove any wi = ε and its corresponding Xi. As w ≠ ε, there must be some symbols remaining from w1, …, wn and some symbols remaining from X1, …, Xn. So we have

Y1 ⋯ Ym ⇒k_G v1 ⋯ vm = w

where Y1, …, Ym ∈ (N ∪ T) are the remaining symbols of X1, …, Xn and v1, …, vm are the remaining non-empty strings of w1, …, wn. Furthermore, each Xi with Xi ⇒∗_G ε that we have removed is a nullable symbol, so A → Y1 ⋯ Ym is a production in G′. Thus,

A ⇒1_{G′} Y1 ⋯ Ym ⇒k_G v1 ⋯ vm = w.

We know that each of v1, …, vm is derivable from Y1, …, Ym in at most k steps, so we can use the Induction Hypothesis. This tells us that we can perform these derivations in G′, so we get:

A ⇒1_{G′} Y1 ⋯ Ym ⇒∗_{G′} v1 ⋯ vm = w.

Hence,

A ⇒∗_{G′} w.


only if Suppose A ⇒∗_{G′} w. As G′ has no null productions, w ≠ ε. We prove by induction on the length of derivation sequences that A ⇒∗_G w.

For the base case, suppose A ⇒1_{G′} w. There are two ways in which w can be produced in G. The simplest case is that there is a production rule A → w in G, so A ⇒1_G w. The alternative is that there is a production rule A → α in G with nullable non-terminals in α, and when we generate ε from each of them we are left with w:

A ⇒_G α ⇒∗_G w.

For the Induction Hypothesis, we assume that if A ⇒i_{G′} w then A ⇒∗_G w for any value of i ≤ k.

For the induction case, we suppose that A ⇒k+1_{G′} w. We split this derivation sequence apart into its first step and its remaining k steps:

A ⇒1_{G′} X1 ⋯ Xn ⇒k_{G′} w

where each X1, …, Xn ∈ (N ∪ T) is either a terminal or a non-terminal. We can consider that each substring wi of the string

w = w1 ⋯ wn ∈ T∗

is derived

Xi ⇒∗_{G′} wi

from one of the symbols Xi in at most k steps. (In the case that Xi is a terminal, this uses zero steps.)

Note that as G′ does not have null productions, none of w1, …, wn is the empty string ε.

We know that each of w1, …, wn is derivable from X1, …, Xn in at most k steps, so we can use the Induction Hypothesis. This tells us that we can perform these derivations in G, so we get:

X1 ⋯ Xn ⇒∗_G w1 ⋯ wn = w.

We can deal with the first step A ⇒1_{G′} X1 ⋯ Xn of the derivation in the same way that we handled the base case: either there is already a production A → X1 ⋯ Xn in G, or else there is a production A → α involving nullable non-terminals so that we have the derivation sequence A ⇒1_G α ⇒∗_G X1 ⋯ Xn. Hence,

A ⇒∗_G X1 ⋯ Xn ⇒∗_G w1 ⋯ wn = w


and so

A ⇒∗_G w.

Putting both halves of the proof together, we have shown that

A ⇒∗_G w if, and only if, A ⇒∗_{G′} w.

Thus, in the case that A = S,

S ⇒∗_G w if, and only if, S ⇒∗_{G′} w.

Hence, L(G) = L(G′). □

14.2.2 Removing Unit Productions

Just as it can help to make the definition of a language clearer to use a grammar with null productions, it can also be helpful to have productions of the form A → B for non-terminals A, B ∈ N in a grammar G = (T, N, S, P). Such unit productions can be removed to make, for example, parsing more efficient, or proofs of language properties easier.

We start with a definition.

Definition (Unit Production) Given a grammar G = (T, N, S, P), a unit production is a rule of the form

A→ B ∈ P

where A,B ∈ N are both non-terminals.

Unit productions in a grammar are superfluous; we can create an equivalent grammar which has no unit productions. First, we introduce some helpful definitions, then we give an algorithm to perform the transformation based on these definitions, and finally we prove that the transformation preserves the generated language.

We start with a definition of unit derivation.

Definition (Unit Derivation) Given a grammar G = (T, N, S, P), a unit derivation is a derivation

w ⇒_G w′

of the string w′ ∈ (N ∪ T)+ from w ∈ (N ∪ T)+ using a unit production.

Now we extend this to the notion of unit derivable, which we simplify by assuming that a grammar has already had any null productions removed.

Definition (Unit Derivable) Given a grammar G = (T, N, S, P) with no null productions, we define the non-terminals that are unit derivable as follows:

(i) if there is a production rule A → B in P, then B ∈ N is unit derivable from A ∈ N; and

(ii) if B is unit derivable from A and C is unit derivable from B, then C is unit derivable from A.

We can easily translate this definition into an algorithm to calculate the set of unit derivable non-terminals of a grammar.


Algorithm to find unit derivables

We use the sets

units(i, A) = { A′ ∈ N | A ⇒ A1 ⇒ ⋯ ⇒ A′ by unit productions }

to store the set of non-terminals that are unit derivable from A ∈ N in i steps.

unit-derivables := ∅;
for each A ∈ N do
    units(0, A) := ∅;
    units(1, A) := ∅
od;
for each A → B ∈ P do
    units(1, A) := units(1, A) ∪ {B}
od;
for each A ∈ N do
    i := 1;
    while units(i, A) ≠ units(i−1, A) do
        i := i + 1;
        units(i, A) := units(i−1, A);
        for each B → C ∈ P with B ∈ units(i−1, A) do
            units(i, A) := units(i, A) ∪ {C}
        od
    od;
    unit-derivable(A) := units(i, A);
    for each B ∈ unit-derivable(A) do
        unit-derivables := unit-derivables ∪ {A → B}
    od
od

Next, we give an algorithm to remove unit productions.

Algorithm to remove unit productions

Again, we assume that the input grammar has already had any null productions removed.

P′ := P;
for each A ∈ N do
    for each B ∈ unit-derivable(A) do
        for each non-unit production B → w ∈ P do
            P′ := P′ ∪ {A → w}
        od
    od
od;
P′ := P′ − unit-derivables
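Both algorithms can be sketched in Python. The representation of productions as (left-hand side, right-hand-side tuple) pairs is our own illustrative choice:

```python
def unit_derivable(productions, nonterminals):
    """Transitive closure of the unit-production relation: reach[A] is
    the set of non-terminals unit derivable from A."""
    reach = {A: set() for A in nonterminals}
    for lhs, rhs in productions:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            reach[lhs].add(rhs[0])
    changed = True
    while changed:
        changed = False
        for A in nonterminals:
            for B in set(reach[A]):
                if not reach[B] <= reach[A]:
                    reach[A] |= reach[B]
                    changed = True
    return reach

def remove_unit_productions(productions, nonterminals):
    """Copy each non-unit rule of every B unit derivable from A up to A,
    then drop the unit productions themselves."""
    reach = unit_derivable(productions, nonterminals)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    new = {(lhs, rhs) for lhs, rhs in productions if not is_unit(rhs)}
    for A in nonterminals:
        for B in reach[A]:
            for lhs, rhs in productions:
                if lhs == B and not is_unit(rhs):
                    new.add((A, rhs))
    return new
```

Running this on the Number grammar of the following example reproduces the unit-derivable sets computed there.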

Example Consider the grammar


grammar Number

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals Number, Digit, Digits, NonZero, Zero

start symbol Number

productions Zero → 0
NonZero → 1
⋮
NonZero → 9
Digit → Zero
Digit → NonZero
Digits → Digit
Digits → Digit Digits
Number → Digit
Number → NonZero Digits

There are four unit productions: Digit → Zero, Digit → NonZero, Digits → Digit and Number → Digit. Thus, Zero and NonZero are unit derivable from Digit:

units(1, Digit) = { Zero, NonZero },

and Digit is unit derivable from Digits and from Number:

units(1, Digits) = { Digit }
units(1, Number) = { Digit }.

We can have a derivation of length 2 that just involves unit productions. As Digits → Digit and units(1, Digit) = { Zero, NonZero },

units(2, Digits) = { Zero, NonZero }.

Similarly, as Number → Digit and units(1, Digit) = { Zero, NonZero },

units(2, Number) = { Zero, NonZero }.

There are no derivations of length 3 that just involve unit productions, so

unit-derivable(Digit) = { Zero, NonZero }
unit-derivable(Digits) = { Digit, Zero, NonZero }
unit-derivable(Number) = { Digit, Zero, NonZero }.

Now, we transform the grammar to remove its unit productions. We add rules

Digit → 0, . . . ,Digit → 9

to compensate for the set unit-derivable(Digit);

Digits → 0, . . . ,Digits → 9


to compensate for the set unit-derivable(Digits); and

Number → 0, . . . ,Number → 9

to compensate for the set unit-derivable(Number). Finally, we remove the unit productions to produce the grammar:

grammar Number ′

terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

nonterminals Number, Digit, Digits, NonZero, Zero

start symbol Number

productions Zero → 0
NonZero → 1
⋮
NonZero → 9
Digit → 0
⋮
Digit → 9
Digits → 0
⋮
Digits → 9
Digits → Digit Digits
Number → 0
⋮
Number → 9
Number → NonZero Digits

Theorem (Unit-Free Equivalence) A grammar G = (T, N, S, P) that is transformed using the above algorithm into a grammar G′ = (T, N′, S, P′) without unit productions generates the same language L(G) = L(G′).

Proof The grammar G′ is constructed to have no unit productions. Now, suppose that w ∈ L(G). Then there is a left-most derivation sequence

S = X0 ⇒_G X1 ⇒_G ⋯ ⇒_G Xn = w.

Page 603: Data, Syntax and Semantics - Better Together Toolkit

14.2. NORMAL AND SIMPLIFIED FORMS FOR CONTEXT-FREE GRAMMARS 573

If each Xi ⇒_G Xi+1 is by a non-unit production, then these rules are also in G′ and so

S = X0 ⇒_{G′} X1 ⇒_{G′} ⋯ ⇒_{G′} Xn = w.

Suppose, then, that there is at least one unit production used in the derivation of w ∈ T∗. Consider all such sequences without cycles of unit derivations (as we can always detect and remove any such cycles uAv ⇒ ⋯ ⇒ uAv):

Xi ⇒_G Xi+1 ⇒_G ⋯ ⇒_G Xj

of unit derivations present in the derivation of w ∈ L(G). To ensure that we have the full sequence of unit derivations, we require that

(i) either Xi−1 ⇒G Xi is a non-unit derivation or i = 0 ; and

(ii) Xj ⇒G Xj+1 is a non-unit derivation.

Now, because we are using left-most derivations, we are always replacing a symbol that is in the same position within the string. But that means we can have the same effect by a single production in P′ and we can replace the sequence of unit derivations with a single derivation

Xi ⇒_{G′} Xj+1

in G′. □

14.2.3 Chomsky Normal Form

Recall, from Chapter 12, that the rules in regular grammars are very simple: all the production rules P in a regular grammar G = (T, N, S, P) have either the form

A → a or A → aB

or the form

A → a or A → Ba

where a ∈ T is a terminal and A, B ∈ N are non-terminals. In terms of syntax, the regular grammars define very simple languages, such as identifiers and other building blocks of programs.

In contrast, context-free grammars define very complicated languages, such as programming languages. In this section, we shall show that we can capture all of this complexity simply by


allowing two non-terminals on the right-hand side of rules. In particular, all the production rules P in a Chomsky Normal Form grammar G = (T, N, S, P) have the form

A→ a or A→ BC

where a ∈ T is a terminal and A, B, C ∈ N are non-terminals. We shall see that this simple extension will allow us to define any context-free language.

We start with a formal definition of the grammars we shall study in this section.

Definition (Chomsky Normal Form) A context-free grammar G = (T, N, S, P) is said to be in Chomsky Normal Form if all of its productions are either of the form

A→ BC or A→ a

where a ∈ T is a terminal symbol and A,B ,C ∈ N are non-terminal symbols.
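The definition translates directly into a simple check. A minimal sketch in Python, using our own (lhs, rhs-tuple) representation of productions:

```python
def is_cnf(productions, terminals):
    """Check that every rule is A -> a or A -> BC (Chomsky Normal Form)."""
    return all(
        (len(rhs) == 1 and rhs[0] in terminals) or
        (len(rhs) == 2 and all(s not in terminals for s in rhs))
        for _, rhs in productions
    )
```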

In this section, we shall work our way up to the Chomsky Normal Form Theorem, which tells us that for any context-free grammar we can always find some equivalent Chomsky Normal Form grammar. We prove this theorem by defining an algorithm to perform this transformation in a way that preserves the language defined by the grammar.

Transformation Algorithm

The algorithm that transforms a context-free grammar without null productions and unit productions into a Chomsky Normal Form grammar has two stages.

1. Deal with productions that involve terminals by transforming rules that are not in the form A → a where A ∈ N is a non-terminal and a ∈ T is a terminal.

2. Deal with productions that involve non-terminals by transforming rules that are not in the form A → BC for non-terminals A, B, C ∈ N.

Note that we simplify the algorithm by assuming that we have already removed any null productions and any unit productions.

Step 1 The first step of the algorithm deals with terminal symbols. A production of the form A → a where A ∈ N and a ∈ T is already in the correct form, so such productions are not altered. This step in the algorithm deals with production rules that have at least two symbols on the right-hand side. Consider such a production p ∈ P where p is of the form

A→ X1 · · ·Xm

where m ≥ 2, A ∈ N and X1, …, Xm ∈ (T ∪ N). For each Xi ∈ T that is a terminal, say a ∈ T, introduce a new non-terminal Ba, replace that occurrence of a on the right-hand side by Ba, and add a new production Ba → a (which is in the allowed form).

Let the new set of non-terminals be N′ and the new set of productions be P′. This gives us a grammar G′ = (T, N′, S, P′).


Step 2 In the second stage, we modify the grammar G′ = (T, N′, S, P′) to deal with rules that have more than two non-terminals on the right-hand side of production rules. Note that, as a result of the previous step, we only have to examine rules that have more than two symbols on the right-hand side.

For each production

A → B1 ⋯ Bm

where m > 2, we create new non-terminals D1, …, Dm−2 and replace A → B1 ⋯ Bm by the set of productions:

A → B1D1

D1 → B2D2

...

Dm−3 → Bm−2Dm−2

Dm−2 → Bm−1Bm

Let

fresh : Grammar × ℕ → NonTerminal

be such that fresh(G, n) returns a non-terminal that is not present in G, and is different from any fresh(G, i) for i < n. We can give an algorithm for the whole transformation process.

N′ := N; P′ := P; count := 0;
for each A → X1 ⋯ Xn ∈ P with n ≥ 2 do
    for each Xi ∈ T do
        count := count + 1;
        P′ := P′ ∪ {fresh(G, count) → Xi};
        N′ := N′ ∪ {fresh(G, count)};
        replace Xi by fresh(G, count) in the rule A → X1 ⋯ Xn
    od
od;
N′′ := N′; P′′ := P′;
for each A → B1 ⋯ Bm ∈ P′ with m > 2 do
    P′′ := P′′ ∪ {A → B1 fresh(G, count+1),
                  fresh(G, count+1) → B2 fresh(G, count+2),
                  ⋮
                  fresh(G, count+m−2) → Bm−1 Bm};
    N′′ := N′′ ∪ {fresh(G, count+1), …, fresh(G, count+m−2)};
    count := count + m − 2;
    P′′ := P′′ − {A → B1 ⋯ Bm}
od
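A sketch of the two steps in Python, assuming (as above) that null and unit productions have already been removed. The fresh non-terminals are generated here as `_X1`, `_X2`, … purely for illustration:

```python
def to_cnf(productions, terminals):
    """Convert an ε-free, unit-free rule list to Chomsky Normal Form."""
    fresh = 0
    def new_nt():
        nonlocal fresh
        fresh += 1
        return f"_X{fresh}"

    # Step 1: replace terminals inside right-hand sides of length >= 2,
    # reusing one fresh non-terminal per terminal.
    step1 = []
    term_nt = {}
    for lhs, rhs in productions:
        if len(rhs) >= 2:
            new_rhs = []
            for s in rhs:
                if s in terminals:
                    if s not in term_nt:
                        term_nt[s] = new_nt()
                        step1.append((term_nt[s], (s,)))
                    new_rhs.append(term_nt[s])
                else:
                    new_rhs.append(s)
            step1.append((lhs, tuple(new_rhs)))
        else:
            step1.append((lhs, rhs))

    # Step 2: break right-hand sides longer than two into chains
    # A -> B1 D1, D1 -> B2 D2, ..., D(m-2) -> B(m-1) Bm.
    cnf = []
    for lhs, rhs in step1:
        while len(rhs) > 2:
            d = new_nt()
            cnf.append((lhs, (rhs[0], d)))
            lhs, rhs = d, rhs[1:]
        cnf.append((lhs, rhs))
    return cnf
```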


Note that the function fresh is actually stronger than is required for Step 1; whenever we replace a terminal t ∈ T within a production rule with more than one symbol on the right-hand side, we can use the same non-terminal each time. In Step 2 we require the full strength that fresh provides, except in those cases where the right-hand side of a production rule occurs as the right-hand side of another production.

Theorem (Chomsky Normal Form) Any ε-free context-free grammar G can be transformed to a grammar GCNF, which is in Chomsky Normal Form, such that their languages are equivalent, i.e., L(G) = L(GCNF).

Proof. Let G = (T, N, S, P) be any context-free grammar, with null productions and unit productions already removed. Then we can find an equivalent grammar GCNF which is in Chomsky Normal Form.

Consider the grammar G′ = (T, N′, S, P′) that results from the first step of the algorithm. We prove that L(G′) = L(G).

It is easy to see from the construction of G′ that for any derivation w ⇒_G w′, where w, w′ ∈ (N ∪ T)+, we have w ⇒∗_{G′} w′. Thus, L(G) ⊆ L(G′).

Now we demonstrate by induction on the length of derivation sequences that if A ⇒∗_{G′} w then A ⇒∗_G w, for A ∈ N and w ∈ T∗. By the construction of G′, the result is trivial for one-step derivations. For the Induction Step, we consider a derivation A ⇒k+1_{G′} w. We split this derivation sequence into two parts: the first derivation step, and the other k steps of the derivation. By the construction of G′, this gives us:

A ⇒_{G′} B1 ⋯ Bm ⇒k_{G′} w1 ⋯ wm = w

for m > 1. By construction of P′ there is a production A → X1 ⋯ Xm of P where

Xi = Bi if Bi ∈ N, and Xi = a if Bi = Ba is a new non-terminal that was introduced for the terminal a ∈ T.

For those Bi ∈ N, we know Bi ⇒∗_{G′} wi takes at most k steps. So, by the Induction Hypothesis, Xi ⇒∗_G wi. (For those Bi = Ba, we have wi = a = Xi.) Hence, A ⇒∗_G w.

We have now proved that any context-free language can be generated by a grammar with rules of the form

A → a or A → B1 ⋯ Bm

for m ≥ 2, where each Bi ∈ N is a non-terminal and a ∈ T is a terminal symbol.

Now for the second stage of the algorithm. If NCNF is the new set of non-terminals and PCNF is the new set of productions, then GCNF = (T, NCNF, S, PCNF) is in Chomsky Normal Form and it is clear that L(G′) = L(GCNF), by essentially the same proof as above, and hence L(G) = L(GCNF). □

Example Consider the grammar:


grammar Basic Boolean Expressions

terminals true, false, not, and, or

nonterminals B

start symbol B

productions B → true
B → false
B → not B
B → B and B
B → B or B

The grammar Basic Boolean Expressions has no null productions and no unit productions. Applying the first step of the algorithm to convert Basic Boolean Expressions into Chomsky Normal Form produces:

grammar Basic Boolean Expressions ′

terminals true, false, not, and, or

nonterminals B ,N ,A,O

start symbol B

productions B → true
B → false
B → NB
B → BAB
B → BOB
N → not
A → and
O → or

Applying the second step of the algorithm produces:


grammar CNF Basic Boolean Expressions

terminals true, false, not, and, or

nonterminals B ,N ,A,O ,C ,D

start symbol B

productions B → true
B → false
B → NB
B → BC
B → BD
N → not
A → and
O → or
C → AB
D → OB

Let us compare the generation of the string

true and false and not true

using the original grammar Basic Boolean Expressions and its Chomsky Normal Form variant CNF Basic Boolean Expressions. In the original grammar, we can derive this string by:

B ⇒ B and B

⇒ B and B and B

⇒ B and B and not B

⇒ true and B and not B

⇒ true and false and not B

⇒ true and false and not true

In the Chomsky Normal Form variant, we can derive the same string by:

B ⇒ BC

⇒ BAB

⇒ B and B

⇒ B and BC

⇒ B and BAB

⇒ B and B and B

⇒ B and B and NB

⇒ B and B and not B

⇒ true and B and not B

⇒ true and false and not B

⇒ true and false and not true


In the transformed grammar, in any derivation step we either replace a single non-terminal with a terminal, or we replace a single non-terminal with two non-terminals. This is even more evident in Figure 14.13, which illustrates the derivation trees for this string.

Figure 14.13: Derivation trees for the string true and false and not true from the original grammar and its CNF variant.

14.2.4 Greibach Normal Form

Another important form of context-free grammar is an apparently modest generalisation of regular grammars.

Definition (Greibach Normal Form) A context-free grammar is said to be in Greibach Normal Form if all of its productions are either of the form

A→ a or A→ aA1 · · ·An

where a ∈ T is a terminal and A1 , . . . ,An ∈ N are non-terminals.

Thus, the right-hand side of any production rule consists of a terminal symbol, followed by zero or more occurrences of non-terminal symbols.
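As with Chomsky Normal Form, the condition is easy to check mechanically. A minimal sketch in Python, using our own (lhs, rhs-tuple) representation of productions:

```python
def is_gnf(productions, terminals):
    """Check that every rule is A -> a A1 ... An: a terminal first,
    then only non-terminals (Greibach Normal Form)."""
    return all(
        len(rhs) >= 1 and rhs[0] in terminals
        and all(s not in terminals for s in rhs[1:])
        for _, rhs in productions
    )
```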

Transformation algorithm

The first stage in the transformation is to separate out those productions which are responsible for causing cyclic behaviour in a left-most derivation. These productions are either:

• directly responsible, because they are immediately left recursive, having the form A → Aw for non-terminals A ∈ N and w ∈ (T ∪ N)∗; or


• indirectly responsible, because they have the form A → Bv when there also exist rules of the form B → Cw for non-terminals A, B, C ∈ N and v, w ∈ (T ∪ N)∗.

Algorithmically, the first stage in detecting and dealing with these productions is to re-write the grammar to have non-terminals

N = { A1, A2, …, Am }.

We suppose that this has already been done. The first stage of the algorithm categorises productions into one of three different forms.

1. Production rules of the form

Ai → Aj w

for i < j are added to the new set of production rules, as they are not the cause of cyclic behaviour in left-most derivations.

2. Production rules of the form

Ak → Aj w

for k > j result in intermediate productions

Ak → v1 w, …, Ak → vn w

for the productions Aj → v1, …, Aj → vn. This step is repeated up to k − 1 times to deal with productions of the form Ak → Aj′ w′ for k > j′ that result. Finally, the productions

Ak → v

that result at the end of this iteration are added to the new set of production rules. At this point, v will either be of the form

• a y for some a ∈ T and y ∈ (T ∪ N)∗, or

• Al y with k ≤ l for some Al ∈ N and y ∈ (T ∪ N)∗.

3. Production rules of the form

Ak → Ak w

that are either from the original production set, or have resulted from the previous step, result in productions

Bk → w and Bk → w Bk

where Bk is a new, fresh non-terminal that is added to the set of non-terminals to create the new set N′ of non-terminals; each remaining production Ak → v, where v does not begin with Ak, also yields a production Ak → v Bk.

At the end of this first stage, the new production rules are each of one of three forms:

• Ai → Aj w for i < j, Ai, Aj ∈ N, w ∈ (T ∪ N)∗,

• Ai → a w for a ∈ T, Ai ∈ N, w ∈ (T ∪ N)∗, and


• Bi → v for Bi ∈ N′, v ∈ (N ∪ {B1, …, Bi−1})∗. (The form of v results from the algorithm and the original grammar being in Chomsky Normal Form.)

The second stage of the algorithm transforms rules of the form Ai → Aj w to rules of the form Ai → a w′ for a ∈ T, Ai, Aj ∈ N, and w, w′ ∈ (T ∪ N)∗. Given the form of the production rules after the first stage, we know that any production rule with Ai on the left-hand side can either have a terminal symbol or a higher numbered Aj for j > i as the first symbol. Thus, Am must have a terminal as the left-most symbol of its production rules' right-hand sides. We systematically work our way down from Am to A1 to ensure all the remaining rules with Ai on the left-hand side also start with a terminal. We do this by replacing the rule Ai → Aj w with the rules Ai → v1 w, …, Ai → vn w, where Aj → v1, …, Aj → vn. Note that each of v1, …, vn already starts with a terminal symbol, so the replacements we make will also start with a terminal symbol.

At the end of this second stage, the new production rules are each of one of two forms:

• Ai → a w for a ∈ T, Ai ∈ N, w ∈ (T ∪ N)∗, and

• Bi → v for Bi ∈ N′, v ∈ (N ∪ {B1, …, Bi−1})∗.

The third stage of the algorithm applies an analogous process to the productions of the form Bi → v. The lowest numbered, B1, can only refer to the original non-terminals A1, …, Am. Each production rule for these non-terminals is in the correct form after stage 2. Thus, we can replace the rule B1 → Ai w with rules B1 → y1 w, …, B1 → yn w, where Ai → y1, …, Ai → yn, and furthermore, we know that each of y1, …, yn starts with a terminal symbol. We repeat the process with the next non-terminal symbol B2, which can only refer to the original non-terminals, or to B1, all of whose productions are already in the right form. We continue in this fashion for each of the Bi.

At the end of this third stage, the new production rules are all of the form

• X → a w for a ∈ T, X ∈ N′, w ∈ (N′)∗,

as required for Greibach Normal Form.


The complete algorithm is:

N′ := N; P′ := P;
for k := 1 to m do
    for j := 1 to k−1 do
        for each Ak → Aj w ∈ P′ do
            for each Aj → v ∈ P′ do
                P′ := P′ ∪ {Ak → v w}
            od;
            P′ := P′ − {Ak → Aj w}
        od
    od;
    if some production Ak → Ak w exists in P′ then
        N′ := N′ ∪ {Bk};
        for each Ak → v ∈ P′ do
            if v = Ak w then
                P′ := P′ ∪ {Bk → w, Bk → w Bk};
                P′ := P′ − {Ak → Ak w}
            else
                P′ := P′ ∪ {Ak → v Bk}
            fi
        od
    fi
od;

P′′ := P′;
for k := m−1 downto 1 do
    for each Ak → Al w ∈ P′ with l > k do
        for each Al → v ∈ P′′ do
            P′′ := P′′ ∪ {Ak → v w}
        od;
        P′′ := P′′ − {Ak → Al w}
    od
od;

P′′′ := P′′;
for k := 1 to m do
    for each Bk → Ai w ∈ P′′ do
        for each Ai → v ∈ P′′′ do
            P′′′ := P′′′ ∪ {Bk → v w}
        od;
        P′′′ := P′′′ − {Bk → Ai w}
    od
od
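The heart of stage 1, step 3, is the removal of immediate left recursion. This one step can be sketched in Python; the helper name and the (lhs, rhs-tuple) representation are ours, for illustration:

```python
def remove_immediate_left_recursion(nt, productions, fresh_nt):
    """Replace rules nt -> nt w by fresh_nt -> w | w fresh_nt, and
    append fresh_nt to the other nt-rules (stage 1, step 3 sketch)."""
    recursive = [rhs[1:] for lhs, rhs in productions
                 if lhs == nt and rhs[:1] == (nt,)]
    others = [rhs for lhs, rhs in productions
              if lhs == nt and rhs[:1] != (nt,)]
    rest = [(lhs, rhs) for lhs, rhs in productions if lhs != nt]
    if not recursive:
        return productions  # nothing to do
    new = rest
    for v in others:
        new.append((nt, v))                      # Ak -> v
        new.append((nt, v + (fresh_nt,)))        # Ak -> v Bk
    for w in recursive:
        new.append((fresh_nt, w))                # Bk -> w
        new.append((fresh_nt, w + (fresh_nt,)))  # Bk -> w Bk
    return new
```

Applied to A1 → A1 A5 | true with fresh non-terminal A7, it produces the A7 rules seen in the example grammar G1 below.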

Theorem (Greibach Normal Form) Any ε-free context-free grammar G can be transformed to a grammar GGNF, which is in Greibach Normal Form, such that their languages are equivalent, i.e., L(G) = L(GGNF).


Example Consider the CNF grammar CNF Basic Boolean Expressions that we generated in the example in Section 14.2.3:

grammar CNF Basic Boolean Expressions

terminals true, false, not, and, or

nonterminals B ,N ,A,O ,C ,D

start symbol B

productions B → true
B → false
B → NB
B → BC
B → BD
N → not
A → and
O → or
C → AB
D → OB

The grammar CNF Basic Boolean Expressions is in Chomsky Normal Form, so we can now transform it into Greibach Normal Form. First, we re-write the grammar using suitable labels for the non-terminals.

grammar G0

terminals true, false, not, and, or

nonterminals A1 ,A2 ,A3 ,A4 ,A5 ,A6

start symbol A1

productions A1 → true
A1 → false
A1 → A2A1
A1 → A1A5
A1 → A1A6
A2 → not
A3 → and
A4 → or
A5 → A3A1
A6 → A4A1

Applying the first step of the algorithm to convert G0 into Greibach Normal Form produces:

grammar G1

terminals true, false, not, and, or

nonterminals A1, A2, A3, A4, A5, A6, A7, A8

start symbol A1

productions A1 → true
A1 → false
A1 → A2A1
A1 → true A7
A1 → true A8
A1 → false A7
A1 → false A8
A2 → not
A3 → and
A4 → or
A5 → and A1
A6 → or A1
A7 → A5
A7 → A5A7
A8 → A6
A8 → A6A8

Page 614: Data, Syntax and Semantics - Better Together Toolkit

584CHAPTER 14. CONTEXT-FREE GRAMMARS AND PROGRAMMING LANGUAGES

Applying the second step of the algorithm produces:

grammar G2

terminals true, false, not, and, or

nonterminals A1, A2, A3, A4, A5, A6, A7, A8

start symbol A1

productions A1 → true
A1 → false
A1 → not A1
A1 → true A7
A1 → true A8
A1 → false A7
A1 → false A8
A2 → not
A3 → and
A4 → or
A5 → and A1
A6 → or A1
A7 → A5
A7 → A5A7
A8 → A6
A8 → A6A8

Applying the third, and final, step of the transformation gives:

grammar G3

terminals true, false, not, and, or

nonterminals A1, A2, A3, A4, A5, A6, A7, A8

start symbol A1

productions A1 → true
A1 → false
A1 → not A1
A1 → true A7
A1 → true A8
A1 → false A7
A1 → false A8
A2 → not
A3 → and
A4 → or
A5 → and A1
A6 → or A1
A7 → and A1
A7 → and A1A7
A8 → or A1
A8 → or A1A8

Let us compare the generation of the string

true and false and not true

using the original grammar Basic Boolean Expressions and its Greibach Normal Form variant G3. In the original grammar, we can derive this string by:

B ⇒ B and B

⇒ B and B and B

⇒ B and B and not B

⇒ true and B and not B

⇒ true and false and not B

⇒ true and false and not true


In the Greibach Normal Form variant, we can derive the same string by:

A1 ⇒ true A7

⇒ true and A1A7

⇒ true and A1 and A1

⇒ true and A1 and not A1

⇒ true and false and not A1

⇒ true and false and not true

In the transformed grammar, in any derivation step we replace a single non-terminal with a terminal, or we replace a single non-terminal with a terminal followed by non-terminals. This is even more evident in Figure 14.14, which illustrates the derivation trees for this string.

Figure 14.14: Derivation trees for the string true and false and not true from the original grammar and its GNF variant.

Notice that, in fact, there is more than one derivation tree possible for this string using G3. This is a result of the original grammar Basic Boolean Expressions being ambiguous. In fact, if a context-free grammar is unambiguous, then the algorithms to convert it to Chomsky Normal Form and to Greibach Normal Form will produce unambiguous grammars.

14.3 Parsing Algorithms for Context-Free Grammars

The purpose of a grammar G is to define a language L(G) by means of rules for rewriting strings. Recall that the basic problem is this:

Definition (Recognition Problem) Let L(G) ⊆ T ∗ be a language defined by a grammar Gwith alphabet T . How can we decide, for any given word w ∈ T ∗, whether or not w ∈ L(G)?


An algorithm AG that decides membership for L(G) may be called a recogniser for G. The difficulty of deriving a recogniser will depend on the complexity of the production rules for the grammar.

Does every grammar have a recogniser?
If a grammar has a recogniser, then what is its complexity?

14.3.1 Grammars and Machines

The analysis of the recognition problem for the grammars of the Chomsky Hierarchy was an important early achievement in the theory of grammars. The problem was solved by characterising the grammars and their languages in terms of simple machine models. The prototype result was the following:

Theorem A language L is definable by an unrestricted grammar if, and only if, it is semicomputable by a Turing Machine.

This means that for any language L ⊆ T ∗, the following are equivalent:

(i) there is a type 0 grammar G with alphabet T and start symbol S such that

w ∈ L ⇔ S ⇒∗_G w

and

(ii) there is a Turing Machine M with set T of symbols such that

w ∈ L ⇔ M halts on input string w , i.e., M (w) ↓.

The sets of strings over an alphabet recognised by the halting of a Turing Machine are called

semidecidable or computably enumerable

sets of strings. Such sets of strings are precisely the sets of strings that can be listed or enumerated by algorithms.

Corollary A formal language is definable by a grammar if, and only if, it can be enumerated by an algorithm.

The theorem came naturally from the equivalence between Emil Post's production systems and Turing Machines discovered earlier in Computability Theory. A basic discovery of Computability Theory is that

there exist semicomputable (or semidecidable, or recursively enumerable) sets that are not computable (or decidable, or recursive).

Corollary There exist grammars for which no recognition algorithm exists, i.e., the recognition problem is algorithmically undecidable.


The classification of the recognition problem for the other grammars of the Chomsky Hierarchy led to some new machine models and algorithms. We studied the finite state automata in Chapter 13, which characterise the regular languages. The context-free languages have an equivalent relationship with a class of machines called push-down automata. These machines use a stack as a memory mechanism; the finitely many states of finite state automata are inadequate for context-free languages. Just as finite state automata implement a recognition algorithm for regular languages, push-down automata implement a recognition algorithm for context-free languages.

Unlike the regular languages, though, non-deterministic push-down stack automata are more powerful than deterministic push-down stack automata, and the conversion of a context-free grammar produces, in general, a non-deterministic push-down stack automaton even when there exists a deterministic variant. The conversion algorithm, therefore, has limited practical application unless the class of grammars considered is reduced. Thus, a study of push-down automata leads either to practical issues concerning compilation, or to theoretical issues concerning computability theory.

Although the majority of programming languages are context-sensitive, their theoretical foundations are less pleasing and are much less developed. Context-sensitive languages can be recognised by a class of machines called linear bounded automata. In practice, programming languages are dealt with by treating them as context-free languages which also have to satisfy extra constraints. In fact, techniques from both context-free and regular language theory are plundered for dealing with programming languages.

We can measure, or estimate, the complexity of parsing with AG (and hence that of G) in terms of resources, such as the space and time needed to recognise, or fail to recognise, a word. This can be taken as a function f of word length

f : N→ N

The algorithm AG is of order f(n) if, to decide for any w ∈ T∗ with |w| = n whether or not w ∈ L(G), takes order f(n) steps; as usual, we will write: AG is O(f(n)). The number of steps in the computation AG(w) to decide if w ∈ L(G) is bounded by f(|w|).

The languages and grammars of the Chomsky Hierarchy can be characterised by the complexity of the recognition problem, as measured by machine models.

Theorem (Language Recognition) A language Li is definable by a grammar Gi of type i, for i = 0, 1, 2 and 3, if, and only if, Li is recognisable by a machine Mi of the type given in Table 14.1.

Language type Li        Machine model equivalent Mi     Complexity of recognition of word w

0: Unrestricted         Turing machine                  Undecidable
1: Context-sensitive    Linear bounded automaton        Decidable
2: Context-free         Push-down stack automaton       Cubic time O(|w|^3)
3: Regular              Finite state automaton          Linear time O(|w|)

Table 14.1: Machine characterisations of languages.


This type of classification motivated the development of new grammars with practical recognition algorithms.

14.3.2 A Recognition Algorithm for Context-Free Grammars

Language recognition is central to any grammar, so there are standard techniques for developing recognition algorithms for classes of grammars. As practical techniques for dealing with programming languages are based around context-free grammars, there exist many parsing algorithms for context-free grammars. These general-purpose algorithms make use of standard forms into which context-free grammars can be manipulated.

We shall consider the Cocke-Younger-Kasami (CYK) algorithm. This is a general parser for any Chomsky Normal Form grammar. For a word w of length n, the algorithm requires O(n^3) steps in the worst case. There exist more efficient algorithms, particularly when more restrictions are placed upon the grammar G, but the CYK algorithm can be used on any context-free grammar (as we can always express a context-free grammar in Chomsky Normal Form), and is relatively simple.

Let G be a context-free grammar in Chomsky Normal Form. Let

w = t1 · · · ti · · · ti+j−1 · · · tn ∈ T∗

with |w| = n. Let wij = ti · · · ti+j−1 be the substring of w starting at the i-th character and having length j. So, for example, w1n = w. We consider the general problem of deciding, for any i, j ∈ N and non-terminal A ∈ N, whether or not

A⇒∗ wij .

Clearly w ∈ L(G) if, and only if, S ⇒∗ w1n. We will then solve the more general problem by induction on the length j of substrings.

Base Case These are unit segments. Here A ⇒∗ wi1 if, and only if, there is a production rule A → wi1, which can be decided trivially by inspection of the set P.

Induction Step Here A ⇒∗ wij if, and only if, there is some production A → BC and some breakpoint k with 1 ≤ k < j such that B ⇒∗ wik and C ⇒∗ w(i+k)(j−k). Since the sizes of the two new segments wik and w(i+k)(j−k) are less than j, being of length k and j − k respectively, we can exploit the induction hypothesis (this gives us an inductive algorithm) to decide whether such derivations exist.

To formalise the algorithm, we define for any word w ∈ T∗ and i, j ∈ N the set

Nij = { A ∈ N | A ⇒∗ wij }.

Notice that 1 ≤ i ≤ n − j + 1, since starting a substring at i of length j means that j ≤ n − i + 1. Thus, w ∈ L(G) if, and only if, S ∈ N1n.

begin (* bottom-up construction of N1n *)
  for i := 1 to n do
    N(i, 1) := { A | A → a ∈ P and wi = a }
  od;
  for j := 2 to n do
    for i := 1 to n - j + 1 do
      N(i, j) := ∅;
      for k := 1 to j - 1 do
        N(i, j) := N(i, j) ∪ { A | A → BC ∈ P, B ∈ N(i, k) and C ∈ N(i+k, j-k) }
      od
    od
  od
end

14.3.3 Efficient Parsing Algorithms

In situations where parsing efficiency is of concern, the cost of converting a context-free grammar into even more tailored formats can be justified. The most commonly employed methods in practice operate in linear time. The performance is gained by dealing with the current symbol of the string whilst using information from symbols further ahead; these are termed look-ahead symbols.

There are two main classes of practical parsing algorithms, termed LL and LR. LL parsing reads a string from left to right, based on left-most derivations; LR parsing also reads a string from left to right, but based on right-most derivations. In LL parsing, a parse tree is built from the top down, whereas LR parsing builds a parse tree from the bottom up.

If i ∈ N is the number of look-ahead characters that a parser uses, the method of parsing is referred to as LL(i) or LR(i). Only a single look-ahead symbol is needed for the vast majority of programming languages. Thus, most practical parsers are either LL(1) or LR(1).

There do not exist transformation algorithms to convert an arbitrary context-free grammar into the appropriate form for these very efficient parsing algorithms, but conversions are usually made by a combination of algorithms and heuristics.

A grammar in Greibach Normal Form is LL(1) if, for every non-terminal A, its production rules are of the form

A→ t1A1

...

A→ tnAn

and ti = tj implies that i = j.

In general, though, a GNF grammar does not have this form, and further work has to be done to factorise the common prefixes on the right-hand sides of production rules, so that there never needs to be a choice made about which rule to apply. Thus, the production rules

A→ αβ1

...

A→ αβn


are factorised into the production rules

A → αB

B → β1

...

B → βn.

In the original version, when looking at a string αβi, we need to be able to see this string in its entirety to know that it matches against the production rule A → αβi; if we can only see some of this string, there are n production rules that are potential matches. In the factorised version, we can make an immediate match against the single production rule A → αB by looking only at the first part, α, of the string; the choice between the βi is postponed until after α has been read.

Performing such factorisations can introduce other problems, and although these can be dealt with, an algorithm to perform factorisations will not, in general, terminate.
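One round of factorisation can be sketched as follows. This is a hypothetical helper of our own, assuming all the alternatives for A share a non-empty common prefix α, and using the common A → αB, B → β1 | ... | βn presentation of left factoring:

```python
def left_factor(A, alternatives, fresh="B"):
    """Replace A -> alpha beta_1 | ... | alpha beta_n by
    A -> alpha fresh and fresh -> beta_1 | ... | beta_n.
    Each alternative is a tuple of grammar symbols."""
    prefix = []
    for column in zip(*alternatives):        # longest common prefix alpha
        if len(set(column)) == 1:
            prefix.append(column[0])
        else:
            break
    if not prefix:
        raise ValueError("alternatives share no common prefix")
    tails = [alt[len(prefix):] for alt in alternatives]
    return {A: [tuple(prefix) + (fresh,)],
            fresh: [tail if tail else ("ε",) for tail in tails]}

rules = left_factor("A", [("a", "b", "c"), ("a", "b", "d")])
print(rules)
# {'A': [('a', 'b', 'B')], 'B': [('c',), ('d',)]}
```

Note that when one alternative equals the common prefix exactly, the fresh non-terminal acquires an ε-production, one example of the "other problems" that such factorisations can introduce.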

14.4 The Pumping Lemma for Context-Free Languages

The study of derivations for a class of grammars is an important task for theoretical understanding and practical applications.

The derivations of context-free grammars have several properties that enhance considerably the usefulness of context-free grammars. One such property is the Pumping Lemma. Originally discovered by Bar-Hillel et al. [1961], it is a fundamental theorem in the study of formal languages.

14.4.1 The Pumping Lemma for Context-Free Languages

Theorem (Pumping Lemma for Context-Free Languages) Let G be a context-free grammar with alphabet T. Then there exists a number k = k(G) ∈ N that will depend on the grammar G such that, if a string z ∈ L(G) has length |z| > k, then we can write

z = uvwxy

as the concatenation of strings u, v, w, x , y ∈ T ∗ and

(a) the length of the string |vx| ≥ 1 (i.e., v and x are not both empty strings);

(b) the length of the mid-portion string |vwx | ≤ k; and

(c) for all i ≥ 0, the string uv^iwx^iy ∈ L(G).

Before proving the result, let us explore what it means and consider an application.

The theorem says that for any context-free grammar G there is a number k that depends on the grammar G such that for any string z that is longer than length k, i.e.,

|z| > k,


it is possible to split up the string z into five segments u, v , w , x and y , i.e.,

z = uvwxy

with the property that either v or x is non-trivial, the middle segment vwx is of length at most k, and, in particular, removing v and x, or copying v and x, gives us infinitely many strings that the rules of the grammar will also accept, i.e.,

i = 0 :   uwy
i = 1 :   z = uvwxy
i = 2 :   uvvwxxy
i = 3 :   uvvvwxxxy
  ...
          uv^iwx^iy
  ...

are all in L(G).
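To make this concrete, take the context-free language { a^n b^n | n ≥ 1 } and the string z = aaabbb with the decomposition u = aa, v = a, w = ε, x = b, y = bb; every pumped string uv^iwx^iy stays in the language. A small sketch of our own (the membership test is hand-written for this one language):

```python
def in_anbn(s):
    """Membership test for { a^n b^n | n >= 1 }."""
    n = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s == "a" * n + "b" * n

u, v, w, x, y = "aa", "a", "", "b", "bb"   # z = uvwxy = aaabbb
for i in range(6):
    assert in_anbn(u + v * i + w + x * i + y)   # uwy, uvwxy, uvvwxxy, ...
```

Pumping down (i = 0) gives aabb and pumping up (i = 2, 3, ...) gives aaaabbbb, aaaaabbbbb, and so on, all of the required form a^n b^n.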

Proof. We show that if a word z ∈ L(G) is long enough then z has a derivation of the form:

S ⇒∗ uAy ⇒∗ uvAxy ⇒∗ uvwxy

in which the non-terminal A can be repeated in the following manner

A⇒∗ vAx (1)

before it is eliminated

A⇒∗ w (2)

Having found derivations of the appropriate form it is possible to “pump” using G as follows

S ⇒∗ uAy ⇒∗ uvAxy        by (1)

     ⇒∗ uv^2Ax^2y        by (1)

     ...                 (i times)

     ⇒∗ uv^iAx^iy        by (1)

     ⇒∗ uv^iwx^iy        by (2).

An observation is that

long words need long derivations.

Let

l = max{ |α| : A → α ∈ P } + 1,

i.e., l is one more than the length of the longest right-hand side of any production in G. In any derivation of a word z ∈ L(G), the replacement of the non-terminal A by the string α, by applying A → α, can introduce |α| < l symbols.


Lemma If the height of the derivation tree for a word z is h then |z| ≤ l^h. Conversely, if |z| > l^h then the height of its derivation tree is greater than h.

Let m = |N| be the number of non-terminals in the grammar. We choose k = l^(m+1). By the above Lemma, the height of the derivation tree Tr for the string z with |z| > k is greater than m + 1. So at least one path p through Tr is longer than m + 1, and some non-terminal on this path must be repeated. Travel the path p upwards from the leaf, searching for repeated non-terminals. Let A be the first such non-terminal encountered. We can picture Tr as shown in Figure 14.15:

[Figure 14.15 depicts the derivation tree Tr for z = uvwxy: the root S; a path p on which the non-terminal A is repeated, with its penultimate and last occurrences marked; the yield split into the segments u, v, w, x, y; the subtree below the upper A of height at most m + 1 and yield vwx of length at most l^(m+1); and the whole tree of height greater than m + 1 and yield of length greater than l^(m+1).]

Figure 14.15: Derivation tree Tr for the string z = uvwxy, where the height of Tr is such that there must be at least one repetition of a non-terminal A. This is enforced through the constant k of the Pumping Lemma. The value k = l^(m+1) is calculated from the bound l on the length of the right-hand sides of the production rules and the number m = |N| of non-terminals in the grammar G.

Take w to be the terminal string of the subtree whose root is the lower A, vwx to be the terminal string of the subtree whose root is the upper A, and u and y to be the remains of z. Clearly, we can also deduce S ⇒∗ uv^iwx^iy for i ≥ 0.

From our choice of A, there can be no repeated non-terminals on the path p below the upper A, so the height of the subtree rooted at the upper A is less than m + 1. From the above Lemma, |vwx| ≤ l^(m+1) = k. Finally, we choose the shortest derivation sequence for z, so we cannot have

S ⇒∗ uAy ⇒+ uAy ⇒∗ uwy = uvwxy.

Thus |vx| ≥ 1.

14.4.2 Applications of the Pumping Lemma for Context-Free Lan-guages

An application of the Pumping Lemma is to demonstrate that certain languages are not context-free.

Theorem (a^nb^nc^n is not context-free) The language

L = { a^ib^ic^i | i ≥ 0 }

is not context-free.


Proof. Suppose L = L(G) for some context-free grammar G. Let k be the constant from the Pumping Lemma. We can choose n > k/3 and consider z = a^nb^nc^n, as this satisfies |a^nb^nc^n| = n + n + n > k. By the Pumping Lemma, we can rewrite z as

z = uvwxy ,

with v and x not both empty and such that for i ≥ 0, uv^iwx^iy ∈ L.

Consider v and x in z. Suppose v contained a break, i.e., two distinct symbols from {a, b, c}. Then

uv^2wx^2y ∈ L, but in v^2 either b would precede a, or c would precede b, so v cannot contain a break; similarly, x cannot contain a break. Thus v and x must each contain occurrences of only one symbol (otherwise, from the Pumping Lemma, the symbol order would be disturbed). However, in this case the Pumping Lemma destroys the requirement that all three symbols be present in equal numbers, since, for example, if v contains a and x contains c, then pumping increases the number of a's and c's but not b's. Hence, v and x must both be empty, contradicting the requirements, so there is no context-free grammar for L.
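The case analysis of this proof can be checked mechanically for a small instance: enumerate every decomposition z = uvwxy with |vx| ≥ 1 and |vwx| ≤ k and confirm that none of them pumps within the language. This is our own illustrative sketch; the choice z = a^4b^4c^4 with k = 4 is an assumption made for the demonstration, whereas the real proof must handle every possible k:

```python
def in_anbncn(s):
    """Membership test for { a^i b^i c^i | i >= 0 }."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

def has_pumpable_split(z, k, in_lang, exponents=(0, 2)):
    """Is there a split z = uvwxy with |vx| >= 1 and |vwx| <= k such that
    u v^i w x^i y stays in the language for every tested exponent i?"""
    n = len(z)
    for p in range(n + 1):
        for q in range(p, n + 1):
            for r in range(q, n + 1):
                for s in range(r, min(n, p + k) + 1):
                    u, v, w, x, y = z[:p], z[p:q], z[q:r], z[r:s], z[s:]
                    if len(v) + len(x) == 0 or len(v) + len(w) + len(x) > k:
                        continue
                    if all(in_lang(u + v * i + w + x * i + y)
                           for i in exponents):
                        return True
    return False

z = "a" * 4 + "b" * 4 + "c" * 4
print(has_pumpable_split(z, 4, in_anbncn))   # False: no split pumps
```

Since |vwx| ≤ 4 cannot straddle all three blocks of length 4, every candidate split unbalances the symbol counts when pumped, exactly as in the proof.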

The Pumping Lemma is not only used to give negative results about the non-existence of context-free grammars for languages.

Theorem Let G be a context-free grammar. Let k = k(G) ∈ N be the constant of the Pumping Lemma. Then

1. L(G) is non-empty if, and only if, there is a z ∈ L(G) with |z | < k.

2. L(G) is infinite if, and only if, there is a z ∈ L(G) with k ≤ |z | < 2k.

Proof. Consider Statement 1. Trivially, if z ∈ L(G) with |z| < k then L(G) ≠ ∅. Conversely, suppose L(G) ≠ ∅. Let z ∈ L(G) have minimal length, so that for any other z′ ∈ L(G),

|z| ≤ |z′|;

such shortest strings exist but are not, of course, necessarily unique. Now we claim that |z| < k.

Suppose, for a contradiction, that |z| ≥ k. Then, by the Pumping Lemma for context-free languages, we can write z = uvwxy and know that uv^iwx^iy ∈ L(G) for all i = 0, 1, . . .. In particular, uwy ∈ L(G) and |uwy| < |z|, since we have removed both v and x, and at least one of v or x is non-empty. This contradicts the choice of z as a word in L(G) of smallest possible length. Therefore, |z| < k.

Consider Statement 2. If z ∈ L(G) with k ≤ |z| < 2k then, by the Pumping Lemma for context-free languages, we can write z = uvwxy and know that uv^iwx^iy ∈ L(G) for all i = 0, 1, . . .. Thus, L(G) is infinite.

Conversely, suppose L(G) is infinite. Then there must exist z ∈ L(G) such that |z| ≥ k. If |z| < 2k, then we have proved Statement 2.

Suppose that no string z ∈ L(G) has k ≤ |z| ≤ 2k − 1. Let z ∈ L(G) have minimal length outside this range, so that, in particular, |z| ≥ 2k. By the Pumping Lemma for context-free languages, we can write z = uvwxy with

1 ≤ |vx | ≤ k

since |vwx | ≤ k and, by pumping with i = 0 , deduce that

uwy ∈ L(G).

Now |uwy| < |z| and, since |vx| ≤ k, also |uwy| ≥ |z| − k ≥ k. Thus, either |uwy| ≥ 2k, so that z was not of shortest length ≥ 2k, or k ≤ |uwy| < 2k, contradicting our supposition; a contradiction in both cases.
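Statement 1 shows that emptiness of L(G) is decidable: search the finitely many strings shorter than k. In practice, one decides emptiness directly, without the pumping constant, by computing the productive non-terminals as a fixed point. A sketch under our own grammar encoding (each non-terminal maps to a list of right-hand sides, given as tuples of symbols):

```python
def language_nonempty(productions, start):
    """Decide whether L(G) is non-empty by computing the set of productive
    non-terminals: A is productive if some rule A -> alpha has every symbol
    of alpha either a terminal or an already-productive non-terminal."""
    nonterminals = set(productions)
    productive = set()
    changed = True
    while changed:
        changed = False
        for A, alternatives in productions.items():
            if A in productive:
                continue
            for rhs in alternatives:
                if all(sym in productive or sym not in nonterminals
                       for sym in rhs):
                    productive.add(A)
                    changed = True
                    break
    return start in productive

# S derives the terminal string ab; T only ever rewrites to itself.
G = {"S": [("a", "S", "b"), ("a", "b")], "T": [("T",)]}
print(language_nonempty(G, "S"))   # True
print(language_nonempty(G, "T"))   # False
```

Each pass over the rules either adds a non-terminal or stops, so the loop terminates after at most |N| passes.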


14.5 Limitations of context-free grammars

In this and the following section, we will consider two programming features that cannot be defined by any context-free grammar. In Exercises for Chapter 10, we saw that we can introduce

(i) variable declarations, and

(ii) concurrent assignments

into our while language by context-free grammars, but that in each case the language possessed some undesirable features. Suppose we wish to remove these features by adding the restrictions that:

(a) all identifiers in a program appear in the declaration; and

(b) variables are not duplicated in a concurrent assignment.

We will show that these restrictions cannot be imposed on a programming language if we want the language to remain definable by a context-free grammar. The theorem for variable declarations is known as Floyd's Theorem.

We first met the problem of undesirable features in a language defined by a context-free grammar when studying the interface definition language for signatures. The Sort Declaration Property also escapes context-free grammars. We see this in the assignment for this chapter.

Roughly speaking, these examples illustrate that context-free grammars cannot guarantee that:

a list of syntactic items is “complete”

or that

a list of syntactic items contains “no repetitions”.

Context-free grammars define a set of strings on which to base the definition of a programming language, but cannot include even simple restrictions necessary to prepare for its intended semantics.

14.5.1 Variable Declarations

Consider the simple language for while programs over the natural numbers. We can extend WP to include the declaration of identifiers by altering the BNF for programs of Section 11.5.2:

bnf While Programs over Natural Numbers with Variable Declarations

import Statements, I/O, Declarations

rules

〈while program〉 ::= begin 〈declaration〉 〈command list〉 end |
                    begin 〈command list〉 end
〈command list〉 ::= 〈command〉 | 〈command list〉 ; 〈command〉
〈command〉 ::= 〈statement〉 | 〈i/o statement〉


and by adding a BNF for variable declarations:

bnf Declarations

import Identifiers

rules

〈declaration〉 ::= nat 〈identifier list〉 ;
〈identifier list〉 ::= 〈identifier〉 | 〈identifier〉 , 〈identifier list〉

Thus, the overall structure of while programs is shown in Figure 14.16.

[Figure 14.16 shows the architecture as a hierarchy of BNF modules: While Programs over Natural Numbers with Variable Declarations, built from Declarations, I/O and Statements, which in turn use Expressions, Boolean Expressions and Identifiers, with Letter and Number at the base.]

Figure 14.16: Architecture of while programs over the natural numbers with declarations.

And the flattened grammar that results is:


bnf Flattened While Programs over Natural Numbers with Variable Declarations

rules

〈while program〉 ::= begin 〈declaration〉 〈command list〉 end |
                    begin 〈command list〉 end
〈declaration〉 ::= nat 〈identifier list〉 ;
〈identifier list〉 ::= 〈identifier〉 | 〈identifier〉 , 〈identifier list〉
〈command list〉 ::= 〈command〉 | 〈command list〉 ; 〈command〉
〈command〉 ::= 〈statement〉 | 〈read statement〉 | 〈write statement〉
〈statement〉 ::= 〈assignment statement〉 | 〈conditional statement〉 |
                〈iterative statement〉 | 〈null statement〉
〈assignment statement〉 ::= 〈identifier〉 := 〈expression〉
〈conditional statement〉 ::= if 〈comparison〉 then 〈command list〉
                            else 〈command list〉 fi
〈iterative statement〉 ::= while 〈comparison〉 do 〈command list〉 od
〈null statement〉 ::= skip
〈read statement〉 ::= read ( 〈identifier list〉 )
〈write statement〉 ::= write ( 〈identifier list〉 )
〈identifier list〉 ::= 〈identifier〉 | 〈identifier list〉 , 〈identifier〉
〈comparison〉 ::= 〈Boolean expression〉 |
                 〈expression〉 〈relational operator〉 〈expression〉
〈Boolean expression〉 ::= 〈Boolean term〉 |
                         〈Boolean expression〉 or 〈Boolean term〉
〈Boolean term〉 ::= 〈Boolean factor〉 |
                   〈Boolean term〉 and 〈Boolean factor〉
〈Boolean factor〉 ::= 〈Boolean atom〉 | ( 〈comparison〉 ) |
                     not 〈Boolean factor〉
〈expression〉 ::= 〈term〉 | 〈expression〉 〈adding operator〉 〈term〉
〈term〉 ::= 〈factor〉 | 〈term〉 〈multiplying operator〉 〈factor〉
〈factor〉 ::= 〈atom〉 | ( 〈expression〉 )
〈adding operator〉 ::= + | -
〈multiplying operator〉 ::= * | / | mod
〈relational operator〉 ::= = | < | > | <= | >= | <>
〈Boolean atom〉 ::= true | false
〈atom〉 ::= 〈identifier〉 | 〈number〉
〈identifier〉 ::= 〈letter〉 | 〈identifier〉 〈letter〉 | 〈identifier〉 〈digit〉
〈number〉 ::= 〈digit〉 | 〈number〉 〈digit〉
〈letter〉 ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q |
             r | s | t | u | v | w | x | y | z |
             A | B | C | D | E | F | G | H | I | J | K | L | M | N |
             O | P | Q | R | S | T | U | V | W | X | Y | Z
〈digit〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

But now consider the condition on WP that:


Declaration Condition

In each program all the identifiers of the program are declared.

This property is not guaranteed by the given grammar. Can we define a context-free grammar that does meet this requirement? Floyd's Theorem tells us that this is impossible.
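Since no context-free grammar can enforce the Declaration Condition, in practice it is checked after parsing, by a separate pass over a symbol table. A minimal sketch of our own, assuming the parser has already extracted the list of declared identifiers and the list of identifiers occurring in the command list:

```python
def undeclared_identifiers(declared, used):
    """Return the identifiers used in the command list but missing from the
    declaration; the Declaration Condition holds iff the result is empty."""
    return sorted(set(used) - set(declared))

# begin nat x, y; x := 0; y := x end  -- condition holds
print(undeclared_identifiers(["x", "y"], ["x", "y", "x"]))   # []
# begin nat x; x := 0; y := x end    -- y is undeclared
print(undeclared_identifiers(["x"], ["x", "y", "x"]))        # ['y']
```

The set difference is exactly the "completeness of a list of syntactic items" that, by the theorems of this section, lies beyond context-free grammars.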

14.5.2 Floyd’s Theorem

Theorem (Floyd's) Let

Decl_{n,m} = begin nat a^nb^m ; a^nb^m:=0 end

and let

L ⊇ { Decl_{n,m} | n, m ≥ 0, n + m > 0 }

be a programming language containing the set of all such declarations. Suppose that

(1) in any program of L all the identifiers of the program are declared; and

(2) identifiers may be of any length.

Then L cannot be defined by a context-free grammar.

Proof. Suppose, by way of contradiction, that there did exist a context-free grammar G that defined the language L, i.e., L = L(G). Let k be the constant for G provided by the Pumping Lemma. For n, m > k, we have Decl_{n,m} ∈ L and |Decl_{n,m}| > k. Thus, all the conditions for the Pumping Lemma are satisfied, so we can pump on the string Decl_{n,m}.

Hence, we can write

Decl_{n,m} = uvwxy

with

(a) v and x not both empty, i.e., |vx | ≥ 1 ;

(b) |vwx | < k ; and

(c) for any i ≥ 0, Pi = uv^iwx^iy ∈ L(G).

In particular, each Pi is also a valid program of L.

Consider the structure of

P1 = Decl_{n,m} = begin nat a^nb^m ; a^nb^m:=0 end.

Since n, m > k and condition (b) must be satisfied, we deduce that one of three cases must hold.

First, suppose the symbol ; is not in vwx. Then there are two cases, as ; is either in u or in y:

(i) if ; is in u, then vwx is contained in the segment a^nb^m:=0 end, or


(ii) if ; is in y, then vwx is contained in the segment begin nat a^nb^m.

Otherwise, suppose ; is in vwx . Then

(iii) vwx is contained in the segment b^m;a^n, because |vwx| < k.

We consider these three cases in turn:

(i) Consider P0 = uwy. This is in L by the Pumping Lemma. If begin were in v or x, then P0 would invalidate the property that begin and end match. If nat were in v or x, then P0 would invalidate the property that identifiers must be declared. Thus, v and x are substrings of a^nb^m. However, since |vx| ≥ 1, the program P0 must contain an assignment to an identifier which is not the identifier that was declared, because the identifier in the assignment has been made shorter than the declared identifier. This contradicts our assumption, so this case cannot hold.

(ii) Consider P2 = uv^2wx^2y. This is in L by the Pumping Lemma. If :=, 0 or end were in v or x, then P2 would not be syntactically correct, for various reasons (exercise). However, if v and x are substrings of a^nb^m, then the program P2 would involve an assignment in which the identifier had not been declared, because the declared identifier has been made longer than the identifier in the assignment. Again, therefore, this case cannot hold.

(iii) Consider P0 = uwy and the location of ;. This cannot lie in v or x, since then P0 would not be syntactically correct. Thus either

(a) vwx is a substring of the segment b^m; , or

(b) vwx is a substring of the segment ;a^n , or

(c) v is a substring of b^m, w is a substring of b^m;a^n, and x is a substring of a^n.

Since |vx| ≥ 1, P0 would involve an undeclared identifier, reducing the b's of the declaration, or the a's of the assignment, or both. Thus, this final case cannot hold.

14.5.3 Sort Declaration Property

In Sections 11.1 and 11.2, we presented an interface definition language for data types, namely

a language for signatures.

The desirable Sort Declaration Property is not definable by context-free grammars.

Theorem (Sort Declaration Property is not context-free) The set of signatures with the Sort Declaration Property is not context-free.

Proof. The basic stages are as follows:

(i) Assume, for a contradiction, that there is a context-free grammar G that defines the language

L = { Σ ∈ Sig | Σ has the Sort Declaration Property }.


(ii) Consider the simple single-sorted signature

Σ_{n,m} = signature sorts a^nb^m ; operations c : → a^nb^m endsig ∈ L

where n, m ≥ 0 and n + m > 0, which contains only one constant c of sort a^nb^m. Use the Pumping Lemma for context-free languages to derive a contradiction with L(G) = L.

14.5.4 The Concurrent Assignment Construct

Consider the while language, whose syntax was given by the BNF in Section 11.5.2. We now augment this language with a deterministic parallel construct, the concurrent assignment, to form the concurrent while language.

The concurrent assignment allows us to perform multiple assignments in parallel. Its syntax is given by:

〈concurrent assign〉 ::= 〈assignment statement〉 |
                        〈identifier〉 , 〈concurrent assign〉 , 〈expression〉

The intention is that a concurrent assignment of the form

x1, . . . ,xn := e1, . . . ,en

will assign, in parallel, the value of the expression e1 to x1, . . . , and the value of en to xn. With this motivation in mind, it would seem natural to apply the restriction:

Definition (Distinctness Condition) All the variables on the left-hand side of a concurrent assignment must be distinct.

Clearly, this is just a syntactic restriction we are placing on our language, so we might expect to be able to specify the condition using a grammar. Without the restriction, our language is context-free, by inspection of the production rules above. However, we show that adding this restriction takes the language outside the class of context-free languages.

Theorem (Concurrent Assignments are not Context-Free) Let

ConcAssign_n = begin a,a^2, . . . ,a^n := 0,0, . . . ,0 end

and let

L ⊇ { ConcAssign_n | n ≥ 1 }

be a programming language containing the set of all such concurrent assignment programs. Suppose that

(1) all the variables on the left-hand side of the assignment are distinct; and

(2) there are the same number of expressions on the right-hand side of the assignment as there are variables on the left-hand side; and

(3) variables may be of any length.


Then L cannot be defined by a context-free grammar.

Proof. Assume, for a contradiction, that L can be generated by a context-free grammar G. Let k be the constant for G postulated by the Pumping Lemma. For n > k/2, we have ConcAssign_n ∈ L and |ConcAssign_n| > k. Thus, all the conditions for the Pumping Lemma are satisfied, so we can pump on ConcAssign_n.

Hence, we can write ConcAssign_n = uvwxy with

(a) |vx | ≥ 1 ;

(b) |vwx | < k ; and

(c) for any i ≥ 0, Pi = uv^iwx^iy ∈ L(G).

From the syntax of L, we cannot have any of the symbols begin, := or end present in either of the strings v or x, as they could then be removed by pumping. As we also require that |vx| ≥ 1, one of the following conditions must hold, depending on whether := is in w (case (iii)) or not (cases (i) and (ii)):

(i) vwx is contained in the segment 0, . . . ,0; or

(ii) vwx is contained in the segment a,a^2, . . . ,a^n; or

(iii) vwx is contained in the segment a,a^2, . . . ,a^n := 0, . . . ,0.

We consider each of these cases in turn.

(i) vwx is contained in the segment 0, . . . ,0. If a , appears in either v or x, then the program P0 = uwy would be syntactically incorrect, having more variables on the left-hand side than values on the right-hand side. If v or x were just a 0, though, then the program P0 = uwy would also be syntactically incorrect, having two adjacent commas.

(ii) vwx is contained in the segment a,a^2, . . . ,a^n. Again, if a , appears in either v or x, then the program P0 = uwy would be syntactically incorrect, this time having fewer variables on the left-hand side than values on the right-hand side. So v and x can contain no commas.

Suppose v contains no commas. Then v is part, or all, of an identifier, say a^q.

v = a^q: If v is all of an identifier a^q, then the program P0 = uwy would be syntactically incorrect, having two adjacent commas.

v = a^p: Suppose that v is part a^p of an identifier a^q, where there exists some r > 0 such that p + r = q. Then consider the program P0 = uwy. It will contain an identifier that has already appeared in the list, as it will be shorter than a^q. (The particular identifier that is repeated in P0 need not be a^r, as the string x could also be part of this same identifier.)

So v must be the empty string ε.

Similarly, the string x must also be the empty string ε. But this violates condition (a) of the Pumping Lemma, which requires that v and x cannot both be empty.


(iii) vwx is contained in the segment a, . . . ,a^n := 0, . . . ,0. We first note that the := must be contained in w, otherwise it could be removed in the program P0. Hence, v must be contained in the segment a, . . . ,a^n, and x in the segment 0, . . . ,0.

From the reasoning of (ii), we can exclude the possibility that v contains no comma. In addition, v cannot consist only of a comma, and cannot start and end with a comma, or the program P2 = uv^2wx^2y would be syntactically incorrect, having two adjacent commas. Thus, v must contain at least one comma and part, or all, of at least one identifier.

If v has more than one comma, then v contains at least all of one identifier, say a^q. But then the program P2 would repeat the variable a^q on the left-hand side of the assignment.

Therefore, v contains only one comma, and part(s) of either one or two identifiers. Thus, there are three possibilities for v:

v = a^p,   or   v = ,a^p   or   v = a^p,a^s

depending on whether v contains part of one or two variables.

v = a^p, or v = ,a^p: Consider that v contains one comma and part a^p of a variable a^q, where there exists some r ≥ 0 such that p + r = q. Then the program P2 would contain a repeated variable on the left-hand side of the assignment, namely a^p, as p ≤ q.

v = a^p,a^s: Consider that v contains parts of two identifiers which are separated by a comma. Let a^p be part of an identifier a^q and a^s part of an identifier a^t, where there exist some m, r ≥ 0 such that p + r = q and s + m = t.

Now consider the program P3 = uv^3wx^3y. It will have v repeated three times, so the string v^3 will have the form:

a^p,a^sa^p,a^sa^p,a^s

Thus, P3 has the variable a^(s+p) repeated on the left-hand side of the assignment.

Therefore v cannot contain part of an identifier. Thus, v = ε.

As v = ε, pumping can only affect the right-hand side of the assignment. But we know from (i) that we cannot pump only on some segment of 0, . . . ,0.

Hence, L is not context-free.
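Like the Declaration Condition, the Distinctness Condition (together with the matching of left- and right-hand-side lengths) is in practice enforced outside the context-free grammar, by a check after parsing. A minimal sketch of our own, assuming the parser has already split the concurrent assignment into its lists of variables and expressions:

```python
def check_concurrent_assignment(variables, expressions):
    """Return (distinct, balanced): the left-hand variables are pairwise
    distinct, and there are as many expressions as variables."""
    distinct = len(variables) == len(set(variables))
    balanced = len(variables) == len(expressions)
    return distinct, balanced

print(check_concurrent_assignment(["x", "y"], ["0", "x+y"]))   # (True, True)
print(check_concurrent_assignment(["x", "x"], ["0", "1"]))     # (False, True)
```

The check runs in linear time over the assignment, which is why the restriction costs nothing in practice even though it is beyond context-free syntax.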


Exercises for Chapter 14

1. Consider the grammars

grammar G_{context-free a^2n}

terminals a

nonterminals S

start symbol S

productions S → ε
            S → aSa

and

grammar G′_{regular a^2n}

terminals a

nonterminals S, A

start symbol S

productions S → ε
            S → aA
            A → aS

Show that these grammars generate the same language, L(G_{context-free a^2n}) = L(G′_{regular a^2n}).

2. Give the derivation tree for the derivation of the string aaabbb from the grammar

grammar G_{a^nb^n}

terminals a, b

nonterminals S

start symbol S

productions S → ab
            S → aSb

of Example 10.2.2(2). Convert this grammar first to Chomsky Normal Form, and then to Greibach Normal Form. Produce derivation trees for the same string using the Chomsky and Greibach Normal Form grammars.

3. Consider the grammar:


grammar Signature_ε

import Names, SortList, Constant, Operation, Newline

terminals signature, imports, sorts, constants, operations, endsig

nonterminals Signature, Imports, Sorts, Constants, Operations, ConstantList, OperationList

start symbol Signature

productions Signature → signature Name Newline
                        Imports Newline
                        Sorts Newline
                        Constants Newline
                        Operations Newline
                        endsig
            Imports → ε
            Imports → imports Name
            Sorts → sorts SortList
            Constants → constants ConstantList
            ConstantList → ε
            ConstantList → Constant ConstantList
            Operations → operations OperationList
            OperationList → ε
            OperationList → Operation OperationList

First, transform this grammar into an equivalent grammar which has no null productions.Then transform it again to remove any unit productions.

4. Prove that for any context-free grammar G = (T, N, S, P), the terminal symbols occur only at the leaves and nowhere else in the derivation tree of any string w ∈ (N ∪ T)∗ that can be derived from G.

5. Prove that for any context-free grammar G, the derivation tree of any string w ∈ L(G) of the language has only terminal symbols at its leaves.

6. Give another derivation that will correspond to the derivation tree given in 14.1.1(v).

7. Produce an unambiguous form of the BNF DanglingElse so that any else branch of a nested conditional matches against the outermost unmatched then branch. Draw the derivation tree for the statement:

if b1 then if b2 then S1 else S2

8. Consider the following sets of characters:

• B = {0, 1, &, ∨, ⊃};

• AE = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (, ), +, -, /, *}.

(a) Give four words that can be generated from each set.

(b) Give grammars:

(i) G1 to produce Boolean expressions using B as the terminal set; and

(ii) G2 to produce arithmetic expressions using AE as the terminal set.


(c) Give four examples of each of the following:

(i) strings that are in B ∗ but not in L(G1 ); and

(ii) strings that are in AE ∗ but not in L(G2 ).

(d) Give the productions of your grammar G1 in Greibach Normal Form, and the productions of G2 in Chomsky Normal Form.

(e) Does the Pumping Lemma apply to grammars G1 and G2 , and if so why?

9. Give derivation trees in G1 for the following:

(a) 11 & 01;

(b) 1 ⊃ 0 & 10; and

(c) 010 ∨ 1 & 111.

10. Give derivation trees in G2 for the following:

(a) 21*5;

(b) (3+4)*10; and

(c) (5-3)/(10+2).

11. Remove the null productions from the grammar for Expressions in BNF format:

bnf Expressions

import Identifiers

rules
〈Expression〉 ::= 〈Identifier〉 | 〈Number〉 |
                 〈Expression〉 + 〈Expression〉 |
                 〈Expression〉 - 〈Expression〉 |
                 〈Expression〉 * 〈Expression〉 |
                 〈Expression〉 / 〈Expression〉 |
                 〈Expression〉 mod 〈Expression〉

Using appropriate grammars from Section 11.5.2, flatten the grammar for Expressions and remove its unit productions.

12. Complete the details of the proof of the Chomsky Normal Form Theorem.

13. The parity of a binary number is defined to be the sum modulo 2 of all its digits, for example:

parity(11) = 0

parity(10) = 1

parity(111) = 1.

Consider the following language, where the bi range over the terminal set {0, 1}:

L = { p b1 b2 · · · bn | p = parity(b1 b2 · · · bn) }.


(a) Give a grammar that produces L.

(b) Is the grammar context-free?

(c) Use the Pumping Lemma to discover if there is a context-free grammar for L.

14. Show that the CYK algorithm is dominated by the inner loop and that its computational cost is O(n^3).
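For orientation, here is a minimal Python sketch of CYK recognition (the hard-coded grammar, a Chomsky Normal Form grammar for { a^n b^n | n ≥ 1 }, and all names are our own illustrative choices). The three nested loops over span length, start position and split point are the source of the cost that this exercise asks about:

```python
# CYK recognition for a grammar in Chomsky Normal Form.
# CNF grammar for { a^n b^n : n >= 1 }: S -> AB | AC, C -> SB, A -> a, B -> b.
unit = {"a": {"A"}, "b": {"B"}}                      # terminal rules X -> a
binary = {("A", "B"): {"S"}, ("A", "C"): {"S"},      # binary rules X -> YZ
          ("S", "B"): {"C"}}

def cyk(w: str) -> bool:
    n = len(w)
    if n == 0:
        return False
    # table[i][l]: the set of nonterminals deriving the substring w[i : i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(unit.get(ch, ()))
    for length in range(2, n + 1):        # O(n) span lengths
        for i in range(n - length + 1):   # O(n) start positions
            for k in range(1, length):    # O(n) split points: the inner loop
                for y in table[i][k]:
                    for z in table[i + k][length - k]:
                        table[i][length] |= binary.get((y, z), set())
    return "S" in table[0][n]
```

Running `cyk` on `"aabb"` succeeds while `"aab"` fails; the loop structure makes the O(n^3) bound (times a factor depending on the grammar size, the subject of Exercise 16) immediate.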

15. What assumptions about programming language constructs for manipulating sets are implicit in the CYK algorithm, and the analysis of its complexity?

16. What is the complexity of the CYK algorithm in terms of the input size of the grammar?

17. Provide a proof for the lemma used in the proof of the Pumping Lemma for context-free languages.

18. Prove that the concurrent assignment construction generates all strings of the form

x1 , . . . ,xn :=e1 , . . . ,en

for variables x1 , . . . , xn and expressions e1 , . . . , en .

19. Extend the WP grammar of Section 11.5.2 to define the alternative construct:

x1 , . . . ,xm :=e1 , . . . ,en

where m need not be equal to n.

20. Consider the proof of the theorem that concurrent assignments are not context-free in Section . There are many places where we can offer alternative proofs of the individual sub-cases. For example, in case (i), when we consider v = ,, we said that this case could not hold, as the program P2 would then have two adjacent commas. We could equally well have used the explanation that P0 would have two adjacent zeros, which would make it syntactically incorrect. Find some more places where we could use different arguments to prove that a sub-case could not hold. Give full explanations as to why your alternatives are equally valid.

21. Consider the proof of the theorem that concurrent assignments are not context-free in Section . Here, we set n > k/2. By the Pumping Lemma we also require that |vwx| < k, and hence |vwx| < 2n. But the longest identifier an has length n. Thus, |an−1,an| = 2n > k. What consequences result from this observation for the proof of the theorem?


Historical Notes and Comments for Chapter 14

As explained in the historical notes of the Introduction, the origins of formal language theory lie outside computer science, in theoretical linguistics and logic. In summary, the following describes the history of our notions concerning formal languages. The simple mathematical definitions are to be found in two publications by Noam Chomsky: Chomsky [1956] and Chomsky [1959].

In 1959, John Backus introduced the BNF notation (Backus [1959]) for the formal definition of Algol 60, with machine independence and compilation in mind.

Bar-Hillel et al. [1961] produced a study of context-free grammars and regular languages, including the Pumping Lemma.

Floyd [1962] contains Floyd's Theorem and the definition of ambiguity. This was followed by Cantor [1962], where it was shown that ambiguity is undecidable.

Knuth [1965] defined the class of LR(k) grammars, unambiguity and linear-time parsing. Cocke et al. [1966] gave a general-purpose algorithm for parsing context-free grammars with O(n^3) complexity. Valiant [1975] improved on this result with an O(n^2.81) parsing algorithm.


Chapter 15

Abstract Syntax and Algebras of Terms∗

The diverse examples of syntax we have met in earlier chapters have all been modelled as

strings of symbols.

The theory of grammars, which is based on the simple concept of a string, provides a theory of syntax that combines theoretical depth with ease of understanding, and practical breadth with ease of application.

However, from the examples we have met, it is clear that syntax has more structure than that of a string, i.e., a linear sequence of written symbols. In a simple program text, there are plenty of syntactic components playing different roles. There are identifiers, constants, operations and tests, built up from alpha-numeric symbols. These are combined to form expressions, assignments and control constructs. These syntactic components refer to the model of computation, i.e., they denote semantic components.

Indeed, some syntax can be justly called two-dimensional and so has a kind of geometric structure. Postal addresses are commonly displayed in two dimensions (recall that in Chapter 10, we used the newline token to code that feature). More dramatically, standard musical notation is two-dimensional.

Syntax has hierarchical structure based on

(i) how it is built up;

(ii) how one checks its validity;

and, of course, on

(iii) what it is intended to mean.

The structure of syntax is only poorly represented in the simple linear notion of a string. Commonly, we associate derivation trees to strings to better organise and reveal important structural aspects of the derivation of syntax.

The particular strength of grammars is that they provide tools for specifying in complete detail the actual syntax that is to be used by programmers, the physical form of their program

∗This is an advanced chapter that requires knowledge of algebras and terms from Part I: Data.


texts represented as strings. Invariably, to a user a syntax is a set of strings of particular symbols from a particular alphabet, though visual notations based on graphs are an exception. The form of syntax that we must specify for users is commonly called

concrete syntax.

So far, in Part II, we have only considered concrete syntax.

There is a more abstract approach to syntax, which aims to analyse syntactic structure based on (i)–(iii) above. It recognises the general computational ideas in the forms defined by the syntactic rules, and how syntax is built up and broken down. It is especially concerned with how to process syntax when giving it a meaning or semantics (iii). This form of syntax is commonly called

abstract syntax.

Abstract syntax is independent of the particular symbols and notations used. It represents programs abstractly and is easily translated into several different concrete syntaxes. While abstract syntax is simple and easy to process, concrete syntax is messier and clumsier.

In this chapter we will focus on modelling syntax with a more structured representation. We will complement the theory of syntax based on strings with a theory based on

terms.

A term is a composition of operation symbols applied to constants and variables. It allows both small and large syntactic components to be combined in many ways; each way is given by an operation. In contrast, the string-based approach allows syntactic components to be combined in essentially one way, namely some form of string concatenation. Terms have a natural tree structure.
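The tree reading of a term is easy to make concrete. The following Python sketch (our own encoding, not the book's notation) represents a term as an operation symbol applied to a tuple of subterms, and computes a function on terms by structural induction:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """An operation symbol applied to a tuple of subterms.
    Constants and variables are operations with no arguments."""
    op: str
    args: tuple = ()

def size(t: Term) -> int:
    # structural induction: one node for t, plus the sizes of its subterms
    return 1 + sum(size(a) for a in t.args)

# the expression (x + 1) * x as a term; each way of combining
# syntax is the application of one operation
x = Term("x")
t = Term("times", (Term("add", (x, Term("1"))), x))
```

Here `size(t)` counts the five nodes of the tree for `(x + 1) * x`, exactly the tree structure the text describes.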

Terms are general formulae. In Part I, we devoted Chapter 8 to the idea of a term and how it is transformed by functions defined by structural induction and evaluated in algebras. Terms are a fundamental data type for doing any kind of algebraic manipulation.

We will apply this abstract algebraic view of syntax to programming languages and model abstract syntax using the data type of terms. We begin by explaining the relationship between abstract and concrete syntax in Section 15.1. In Section 15.2, as an instructive and representative example, we develop an algebraic model of the abstract syntax of the while language. The approach is quite general and, in Section 15.5, we explain the general method. At the heart of the method is the use of term algebras to capture the abstract syntax of the language to be specified. This corresponds with the use of the context-free grammar to lay down the right syntactic components. In fact, we show that

context-free grammars are equivalent to term algebras

by studying the Rus-Goguen-Thatcher algorithm that transforms a context-free grammar G into a signature ΣG which determines a term algebra T(ΣG) that can be used to define the same language. This term algebra method is applied in case studies of programming language syntax definitions. In Section 15.6 we discuss the algebraic approach to languages that are not context-free.

This chapter is designated as advanced because of the algebraic ideas it employs. The idea of terms is based on the idea of a signature. To fully appreciate abstract syntactic objects the reader must understand signatures and terms. Signatures are the subject of Chapter 4. Terms are the subject of Chapter 8. In both these chapters we keep a gentle pace in our exposition to assist readers new to these simple algebraic ideas.


15.1 What is Abstract Syntax?

We have been studying the concrete syntax of programming languages and methods for making detailed specifications of how program texts should appear. We have a general theory of grammars that allows us to master what may be called the surface structure of languages. For example, recall the different forms or faces of while programs, as we saw in the gallery of Euclid's algorithms for gcd in Chapter 1. These programs are different in appearance but they have a common computational structure.

Now we will study abstract syntax and methods for making representations of texts that make explicit this common structure and are close to what the texts may mean computationally. We will examine something of the deep structure of languages.

To see what in concrete syntax may be ignored by a more abstract representation of a program, consider the strings generated by a context-free grammar. In Chapter 14, we defined derivation trees for strings based on derivations. A derivation tree had these features:

(i) each internal node was labelled with a non-terminal;

(ii) the children of each node were labelled with the symbols of the right-hand side of a production; and

(iii) each leaf node was labelled by a terminal of the string.

The derivation trees depict the structure of a string in terms of a derivation of how the string is built up from its parts. But such trees soon become large because every terminal and non-terminal in the derivation has a node, and trees contain punctuation and other notational devices that are not relevant to computation.

Concrete syntax specifies not just the style of programming notations; it also prescribes and allows all kinds of conventions, such as

(i) shorthands for operator precedence (e.g., x+y .z );

(ii) layouts based on spacing;

(iii) redundancies (e.g., ((x+y))).

Abstract syntax aims at representations that are compact and express the component constructs relevant to computation. In fact, abstract syntax was invented to simplify the definition of the behaviour of programs, i.e., to give semantics to programs. An abstract syntax aims at a representation that complements many concrete syntaxes: see Figure 15.1.

Definition The process of extracting abstract syntax from concrete syntax is called parsing. The process of evaluating or mapping abstract syntax to concrete syntax is called unparsing or (pretty-)printing.

Given these observations, let us consider the requirements we have in mind for an abstract syntax AS and one of its concrete syntaxes CS as shown in Figure 15.2.

There are two transformations implementing the mappings

parse : CS → AS and print : AS → CS .

In general, neither map need be an inverse of the other.


[Diagram: an abstract syntax AS linked to concrete syntaxes CS1, . . . , CSi, . . . , CSn by maps parse_i and unparse_i.]

Figure 15.1: Parsing abstract syntax and unparsing concrete syntax.

Figure 15.2: Possible relationships between abstract syntax, concrete syntax and semantics.

For example, different concrete syntaxes may have the same abstract syntax: for some c1, c2 ∈ CS,

c1 ≠ c2 but parse(c1) = parse(c2).

Perhaps c1 and c2 differed in their layout. Thus, parse is not injective.

Similarly, there may be abstract syntax that is used for some concrete syntaxes but not the particular CS: in this case, for some a ∈ AS and any c ∈ CS,

parse(c) ≠ a.

Thus, parse is not surjective.

Let AS0 be precisely the abstract syntax determined by parsing CS, i.e.,

AS0 = parse(CS).

So parse : CS → AS0 is surjective.

Let CS0 be precisely the concrete syntax determined by printing AS, i.e.,

CS0 = print(AS).

So print : AS → CS0 is surjective.

Then, we may hope that the transformations may be related as follows: for c ∈ CS0,

print(parse(c)) = c


and for a ∈ AS0,

parse(print(a)) = a.

Designing an abstract syntax is an experimental process. We can give some beautiful theory to guide this process.
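These relationships can be seen in a toy example, entirely our own construction: the concrete syntax CS is strings of variable names separated by ';' with arbitrary spacing, and the abstract syntax AS is tuples of names.

```python
def parse(c: str) -> tuple:
    # layout (spacing) is discarded, so parse is not injective
    return tuple(part.strip() for part in c.split(";"))

def print_(a: tuple) -> str:
    # printing chooses one canonical layout
    return "; ".join(a)

c1, c2 = "x;y;z", "x ; y ;   z"
assert c1 != c2 and parse(c1) == parse(c2)   # parse is not injective

# on the canonical concrete syntax CS0 = print_(AS), the maps invert each other
a = parse(c1)
assert parse(print_(a)) == a
assert print_(parse(print_(a))) == print_(a)
```

The asserted identities are exactly print(parse(c)) = c on CS0 and parse(print(a)) = a, while parse remains non-injective on the whole of CS.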

Example

Thus, as suggested by this example, we can expect to associate with abstract syntax so-called abstract syntax trees.

Figure 15.3: Abstract syntax tree for the program x:=y; y:=r; r:=x mod y.

Our strategy for developing abstract syntax is to represent programs as terms. The idea is to use the data type of terms and apply it to make a theory of abstract syntax. In Chapter 8, we introduced and studied at length

• algebras of terms;

• equations and laws that terms might satisfy;

• functions defined by structural induction on terms;

• trees to display terms;

• term evaluation; and

• how to represent any algebra using terms.

This general mathematical machinery can now be applied to

• make algebras of abstract syntax;

• propose equations and laws that programs in abstract syntax might satisfy;

• define functions by structural induction on abstract syntax;

• draw trees to display programs;


• unparse or print abstract syntax as concrete syntax; and

• provide an abstract syntax for any programming language

respectively!

15.2 An Algebraic Abstract Syntax for while Programs

We shall now introduce a more abstract method for specifying programming language syntax that focuses on the components in syntactical clauses that have a semantical meaning. This new method is based on operations that construct new syntax from given syntax, including operations that make new programs from old programs. Thus, the use of operations means that syntax is modelled using algebras rather than grammars.

Specifically, let us reconsider the while programming language WP from a more abstract point of view and develop an algebraic model of its syntax. In each of its many concrete syntaxes, WP can be displayed as a finite collection of sets of syntactic components. Section 11.5.2 gives a concrete syntax for the while programming language, taking a low-level approach to syntax, using grammars to specify which texts are valid programs. The grammar While Programs over Natural Numbers of Section 11.5.2 contains 27 syntactic objects, not all of which have a meaning. On reflection, it is easy to see that the language is based primarily on three types of syntactic object:

• commands;

• expressions; and

• Boolean expressions.

These are the key syntactic objects that have a meaning or semantics.

To explore these ideas further we will design an abstract syntax for a simplified version of the language WP by developing an algebra WP(Σ) based on natural numbers.

15.2.1 Algebraic Operations for Constructing while programs

For simplicity, suppose our while language has only two data types, the natural numbers and Booleans. Let Σ be any signature of natural numbers with the Booleans. What operations and tests Σ contains we will discuss later. Let WP(Σ) be any syntax for while programs computing with a fixed data type interface Σ. We shall determine the structure of the language by constructing an algebra WP(Σ) of while programs, by collecting together sets of syntactic components and defining operations on them that build up the while programs.

What are the primary syntactic categories that make up WP(Σ )?

The answer is: the statements or commands, expressions, and Boolean expressions or tests. Let

Cmd(Σ ) Exp(Σ ) BExp(Σ )

be sets of commands, expressions and Boolean expressions, respectively, of the arbitrarily chosen while language syntax WP(Σ).


We start by assuming these are simply non-empty sets. To build all the familiar syntactic forms of the language, we define suitable functions over these carriers. Ultimately, the assumptions we place on the sets of commands, expressions and Boolean expressions are that they are closed under the operations. These sets, together with the sets N and Var of natural numbers and variables, respectively, will be the carriers of the algebra WP(Σ) of the language of while programs over Σ.

First, we note that a while program consists of the following kinds of commands:

• the atomic commands which consist of

– empty commands; and

– assignments;

and are combined together with

• the command forming constructs of

– sequencing ;

– conditional branching ; and

– iteration.

These suggest the following constants and operations on the set Cmd(Σ ) of commands.

Atomic Commands

The “identity” or “empty” command is a constant

Null :→ Cmd(Σ ).

The assignments are constructed by an operation

Assign : Var × Exp(Σ )→ Cmd(Σ )

that from variables v ∈ Var and expressions e ∈ Exp(Σ ) makes an assignment command:

Assign(v , e) ∈ Cmd(Σ )

Command Operations

We will model the command-forming operations to build new commands from old.

For sequencing, we define an operation

Seq : Cmd(Σ )× Cmd(Σ )→ Cmd(Σ )

that will combine any two given commands S1 , S2 ∈ Cmd(Σ ) into a new command

Seq(S1 , S2 ) ∈ Cmd(Σ ).

For conditionals, we define an operation

Cond : BExp(Σ )× Cmd(Σ )× Cmd(Σ )→ Cmd(Σ )


such that for any test b ∈ BExp(Σ) and given commands S1, S2 ∈ Cmd(Σ), it will create the new command

Cond(b, S1 , S2 ) ∈ Cmd(Σ ).

And for iteration, we define an operation

Iter : BExp(Σ )× Cmd(Σ )→ Cmd(Σ )

such that for any test b ∈ BExp(Σ) and given command S ∈ Cmd(Σ), it will create the new command

Iter(b, S ) ∈ Cmd(Σ ).
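Taken together, the constants and operations above behave exactly like datatype constructors. A minimal Python sketch (class and field names are our own choice; the operation names follow the text):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cmd:
    """Base class standing in for the carrier Cmd(Σ)."""

@dataclass(frozen=True)
class Null(Cmd):                 # Null : -> Cmd(Σ)
    pass

@dataclass(frozen=True)
class Assign(Cmd):               # Assign : Var x Exp(Σ) -> Cmd(Σ)
    var: str
    exp: object

@dataclass(frozen=True)
class Seq(Cmd):                  # Seq : Cmd(Σ) x Cmd(Σ) -> Cmd(Σ)
    first: Cmd
    second: Cmd

@dataclass(frozen=True)
class Cond(Cmd):                 # Cond : BExp(Σ) x Cmd(Σ) x Cmd(Σ) -> Cmd(Σ)
    test: object
    if_true: Cmd
    if_false: Cmd

@dataclass(frozen=True)
class Iter(Cmd):                 # Iter : BExp(Σ) x Cmd(Σ) -> Cmd(Σ)
    test: object
    body: Cmd

# while b do (x := e; Null) od, built by applying the operations
prog = Iter("b", Seq(Assign("x", "e"), Null()))
```

Every value of class `Cmd` arises by a finite number of applications of these constructors, which is the closure assumption placed on the carrier.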

Building Expressions

To proceed further, and reveal the structure of expressions and Boolean expressions, we need to make assumptions about which arithmetic operations and tests are in Σ. Suppose we have

addition, multiplication, . . .

on the natural numbers.

An atomic expression is a number or an identifier. More complex expressions can be built up by applying the operations to other expressions. Thus, the set Exp(Σ) of expressions is given by the following operations.

First, we have a means of forming atomic expressions. An operation

Id : Var → Exp(Σ )

such that

Id(v) = v

ensures that each variable is also an expression.

Similarly, an operation

Num : N→ Exp(Σ )

such that

Num(n) = n

ensures that each natural number can be named in expressions.

Then, we have a means of forming new expressions from old expressions. Given expressions e1, e2 ∈ Exp(Σ), we have operations such as

Add : Exp(Σ )× Exp(Σ )→ Exp(Σ )

that creates an expression

Add(e1, e2) ∈ Exp(Σ)

and

Times : Exp(Σ) × Exp(Σ) → Exp(Σ)

that creates an expression

Times(e1, e2) ∈ Exp(Σ)

and similarly for whatever else is in Σ .


Building Tests

Finally, we need operations on the set BExp(Σ ) of tests. Suppose a Boolean expression can be

• true or false;

• a comparison of two numerical expressions; or

• some Boolean combination of less complex Boolean expressions.

We model the atomic tests with constants

true : → BExp(Σ)
false : → BExp(Σ)

We define the operation

Eq : Exp(Σ )× Exp(Σ )→ BExp(Σ )

to create a test from comparing expressions e1 , e2 ∈ Exp(Σ ) by

Eq(e1 , e2 ) ∈ BExp(Σ )

We create new tests from old with test-forming operations

Not : BExp(Σ) → BExp(Σ)
And : BExp(Σ) × BExp(Σ) → BExp(Σ)
Or : BExp(Σ) × BExp(Σ) → BExp(Σ)

which we define for any Boolean expressions b, b1 , b2 ∈ BExp(Σ ) by:

Not(b) ∈ BExp(Σ )

And(b1 , b2 ) ∈ BExp(Σ )

Or(b1 , b2 ) ∈ BExp(Σ )

respectively.

Thus, given any syntax WP(Σ) for the while commands on the natural numbers with operations listed in Σ, collecting these five sets

Cmd(Σ ), Exp(Σ ), BExp(Σ ), N, Var

together and adding the above constants and functions as operations, we have turned WP(Σ) into a five-sorted algebraic structure WP(Σ). Specifically,

Any syntax WP(Σ) for the while language can be organised as an algebraic structure WP(Σ) whose operations build these syntactic components.


15.2.2 A Signature for Algebras of while Commands

The sets Cmd(Σ), Exp(Σ), BExp(Σ) were any sets of notations for commands, expressions and tests. Therefore, we have just shown how any syntax for the while language over the natural numbers can be organised into an algebra WP(Σ). Furthermore, the choice of command operations is independent of the concrete nature of the syntax, though it is influenced by the language's choice of kinds of constructs and data type interface. To define this properly, we need a signature.

We can choose one signature to fix standard names for these operations. Therefore, we can choose one abstract algebraic syntax to describe the structure of a whole class of algebras of different concrete string syntaxes. Let us equip these algebras with the following signature ΣWP.

signature WP

import nat var

sorts cmd exp bexp

constants skip : → cmd
          true, false : → bexp

operations assign : var × exp → cmd
           seq : cmd × cmd → cmd
           cond : bexp × cmd × cmd → cmd
           iter : bexp × cmd → cmd
           id : var → exp
           num : nat → exp
           add : exp × exp → exp
           times : exp × exp → exp
           not : bexp → bexp
           or : bexp × bexp → bexp
           and : bexp × bexp → bexp
           eq : exp × exp → bexp

Thus, we say that all algebras of the form WP(Σ) are ΣWP-algebras.

Example

Here is an illustrative example of a while syntax defined as one such algebra.


algebra WP(Σ)

import N, Var

carriers Cmd(Σ), Exp(Σ), BExp(Σ)

constants skip : → Cmd(Σ)
          True, False : → BExp(Σ)

operations Assign : Var × Exp(Σ) → Cmd(Σ)
           Seq : Cmd(Σ) × Cmd(Σ) → Cmd(Σ)
           Cond : BExp(Σ) × Cmd(Σ) × Cmd(Σ) → Cmd(Σ)
           Iter : BExp(Σ) × Cmd(Σ) → Cmd(Σ)
           Id : Var → Exp(Σ)
           Num : N → Exp(Σ)
           Add : Exp(Σ) × Exp(Σ) → Exp(Σ)
           Times : Exp(Σ) × Exp(Σ) → Exp(Σ)
           Not : BExp(Σ) → BExp(Σ)
           Or : BExp(Σ) × BExp(Σ) → BExp(Σ)
           And : BExp(Σ) × BExp(Σ) → BExp(Σ)
           Eq : Exp(Σ) × Exp(Σ) → BExp(Σ)

definitions Assign(v, e) = v:=e
            Seq(S1, S2) = S1;S2
            Cond(b, S1, S2) = if b then S1 else S2 fi
            Iter(b, S) = while b do S od
            Id(v) = v
            Num(n) = n
            Add(e1, e2) = e1+e2
            Times(e1, e2) = e1*e2
            Not(b) = not b
            Or(b1, b2) = b1 or b2
            And(b1, b2) = b1 and b2
            Eq(e1, e2) = e1=e2

Figure 15.4: The algebra WP(Σ) of while commands and summary of definitions of operations.


The connection between the signature ΣWP and the specific algebra WP(Σ) is as follows. We interpret the sorts

cmd, exp and bexp

of the signature ΣWP as the sets of

commands Cmd(Σ), expressions Exp(Σ), and Boolean expressions BExp(Σ)

in the algebra WP(Σ). Furthermore, each of these carriers is a set of strings. The operations of the algebra define how these strings are constructed. For example, the operation assign in the signature is interpreted as the function Assign in the algebra that constructs an assignment command from a variable v and an expression e by placing the string := between the v and the e.

15.3 Representing while commands as terms

In Section 15.2, we showed how the idea of forming commands, expressions and Boolean expressions by applying "high-level" operations was modelled by an algebra of concrete syntax. In particular, we showed how one choice of such operations, whose notations were fixed in the signature ΣWP, could service an infinite class of (algebras of) concrete syntaxes for while commands.

To construct a command in one of these algebras we apply a finite sequence of operations named in ΣWP. Now the compositions of operation symbols in ΣWP define the set

T (ΣWP)

of ΣWP closed terms (see the definition in Section ??). T(ΣWP) is, in fact, a ΣWP-algebra itself (see Section 8.6). Now, with each choice of ΣWP term

t ∈ T (ΣWP),

we have an abstract and general description of a construction process for a command that can produce the text of a command in any chosen concrete syntax. Let us look at some examples.

15.3.1 Examples of Terms and Trees

Example Let x1, x2, x3 ∈ T(ΣWP)var be closed term representations for the command variables. Then typical examples of closed terms are

assign(x1, add(id(x1), num(1)))
assign(x1, add(id(x2), id(x3)))
seq(assign(x3, id(x1)), seq(assign(x1, id(x2)), assign(x2, id(x3))))
cond(true, assign(x1, id(x2)), assign(x2, id(x1)))
iter(not(eq(id(x1), num(0))), assign(x1, add(id(x1), num(1)))).

These terms correspond with the while commands:

x1:=x1+1
x1:=x2+x3
x3:=x1; x1:=x2; x2:=x3
if true then x1:=x2 else x2:=x1 fi
while not x1=0 do x1:=x1+1 od


Example As the algebra T(ΣWP) is a term algebra, we can represent command fragments as terms, which in turn we can represent as trees. For example:

1. We can represent the command fragment

x2 :=3;skip

by the term

seq(assign(x2, num(3)), skip)

whose tree is shown in Figure 15.5.

Figure 15.5: The tree representation of the term x2:=3; skip.

2. We can represent the command fragment

while not x0 =2 do x0 :=x0 + 1 od

by the term

iter(not(eq(id(x0 ), num(2 ))), assign(x0 , add(id(x0 ), num(1 ))))

whose tree is shown in Figure 15.6.

Figure 15.6: The tree representation of the term while not x0=2 do x0:=x0+1 od.


15.3.2 Abstract Syntax as Terms

We can see a natural correspondence in the examples between representations:

while command ⇔ term over ΣWP ⇔ tree over ΣWP

More abstractly, we have arrived at the general correspondence for programming languages:

abstract command text ⇔ abstract term ⇔ tree

Definition (Abstract Syntax) An abstract syntax of the while language is simply the term algebra

T(ΣWP).

A term t ∈ T(ΣWP)cmd is an abstract algebraic representation of a while command. The tree Tr(t) of the term t is an abstract syntax tree.

What other general algebraic properties of terms can become useful tools for processing abstract syntax?

The fundamental idea that an abstract syntax serves many concrete syntaxes is illustrated in Figure 15.1. Now that we have created the algebra T(ΣWP) to be an abstract syntax, we can define the relation between abstract and concrete syntax as a relation between algebras.

Definition (Pretty-Printing) Consider mapping abstract syntax to concrete syntax. Suppose the concrete syntax can be turned into an algebra WP(Σ). Then the mapping

T (ΣWP)→WP(Σ )

is called pretty-printing or unparsing.

If this mapping can be defined by structural induction then, equivalently, it qualifies as a homomorphism and term evaluation map.

The case of serving n different concrete syntaxes is illustrated in Figure 15.7.

[Diagram: the term algebra T(ΣWP) mapped to concrete syntax algebras WP1(Σ), . . . , WPi(Σ), . . . , WPn(Σ).]

Figure 15.7: Pretty-printing from abstract syntax to concrete syntax via term evaluation homomorphisms.

This general correspondence can be illustrated by pretty-printing the terms. We define a family

φ = 〈φs : T(ΣWP)s → WP(Σ)s | s ∈ {cmd, exp, bexp, var, nat}〉


of maps that interpret terms as while commands. In fact, φ is defined by structural induction on terms. For commands, we define φcmd by:

φcmd(skip) = skip

φcmd(assign(x, e)) = φvar(x):=φexp(e)

φcmd(seq(S1, S2)) = φcmd(S1);φcmd(S2)

φcmd(cond(b, S1, S2)) = if φbexp(b) then φcmd(S1) else φcmd(S2) fi

φcmd(iter(b, S)) = while φbexp(b) do φcmd(S) od

For expressions, we define φexp by:

φexp(id(x )) = φvar (x )

φexp(num(n)) = φnat(n)

φexp(add(x , y)) = φexp(x )+φexp(y)

φexp(times(x , y)) = φexp(x )*φexp(y)

and for Boolean expressions, we define φbexp by:

φbexp(not(b)) = not φbexp(b)

φbexp(or(b1 , b2 )) = φbexp(b1 ) or φbexp(b2 )

φbexp(and(b1 , b2 )) = φbexp(b1 ) and φbexp(b2 )

φbexp(eq(e1 , e2 )) = φexp(e1 )=φexp(e2 )

We will also need maps φvar and φnat for variables and natural numbers.

Example

φcmd(assign(x1 , add(id(x2 ), id(x3 )))) = φvar (x1 ):=φvar (x2 )+φvar (x3 ).
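The family φ can be implemented as a single structural recursion over terms. The following Python sketch (the tuple representation of terms is our own, not from the text) pretty-prints abstract syntax into concrete while syntax:

```python
# A term is a nested tuple whose head names the constructor, e.g.
# ("assign", ("var", "x1"), ("add", ("id", ("var", "x2")), ("id", ("var", "x3")))).
# pretty() realises the family of maps phi by structural induction on terms.

def pretty(t):
    op, *args = t
    if op == "skip":   return "skip"
    if op == "var":    return args[0]                       # phi_var
    if op == "num":    return str(args[0])                  # phi_nat
    if op == "assign": return pretty(args[0]) + ":=" + pretty(args[1])
    if op == "semi":   return pretty(args[0]) + ";" + pretty(args[1])
    if op == "cond":   return ("if " + pretty(args[0]) + " then " + pretty(args[1])
                               + " else " + pretty(args[2]) + " fi")
    if op == "iter":   return "while " + pretty(args[0]) + " do " + pretty(args[1]) + " od"
    if op == "id":     return pretty(args[0])
    if op == "add":    return pretty(args[0]) + "+" + pretty(args[1])
    if op == "times":  return pretty(args[0]) + "*" + pretty(args[1])
    if op == "not":    return "not " + pretty(args[0])
    if op == "and":    return pretty(args[0]) + " and " + pretty(args[1])
    if op == "or":     return pretty(args[0]) + " or " + pretty(args[1])
    if op == "eq":     return pretty(args[0]) + "=" + pretty(args[1])
    raise ValueError(op)

t = ("assign", ("var", "x1"),
     ("add", ("id", ("var", "x2")), ("id", ("var", "x3"))))
print(pretty(t))  # x1:=x2+x3
```

Note that each clause of pretty mirrors one defining equation of φ, which is exactly why the map is a homomorphism.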

Lemma (Tree map properties) The pretty-printing map

φ : T (ΣWP)→WP(Σ )

is a ΣWP -homomorphism.

Proof The map φ is an example of a term evaluation map φ = v defined from a trivial assignment map

v : ∅ →WP(Σ );

and hence, by the Theorem in Section 8.7.1, a ΣWP -homomorphism. □

Notice that if each syntactic component can be constructed in Cmd(Σ ), Exp(Σ ) and BExp(Σ ) by applying the basic operations of the algebra, then the algebra of while commands is minimal (see Section 7.4). In that case, φ is onto.

If the printing homomorphism φ has an inverse φ−1, it is the parsing homomorphism,

φ−1 : WP(Σ )→ T (ΣWP)

that extracts the abstract representation of the structure.


Theorem The algebra WP(Σ ) of concrete syntax for while commands over the signature Σ of naturals and Booleans is isomorphic with the closed term algebra T (ΣWP) of abstract syntax for the language.

Proof. Exercise.

15.4 Algebra and Grammar for while Programs

We will now investigate the relationship between term algebras and grammars. We take the algebra that we used in Section 15.2 to model while commands and construct a context-free grammar for this language. We deliberately construct the grammar to have the same structure as the algebraic model. Later, in Section 15.5, we will show how we can systematically relate the context-free grammar and term algebras.

15.4.1 Algebra and Grammar for while Program Commands

First, we consider while command statements.

Null Commands

When modelling the null command

skip

we need to ensure that the model

(i) gives a command

that is

(ii) independent of anything else.

In the algebra of Section 15.2, we modelled the null command with the constant

skip :→ Cmd(Σ ).

We modelled it to be

(i) of type Cmd(Σ ) so that it represents a command; and

(ii) a constant so that it is independent of any other while command element.

Alternatively, we can generate the null command using the grammar rule (in BNF format):

〈command〉::=skip

This gives a string which is

(i) a command because it is generated from the non-terminal 〈command〉; and

(ii) independent of any other while command element, as we cannot rewrite the right-hand side of the rule any further because it has no non-terminal symbols.


Assignment Commands

When modelling assignment commands,

x :=e

we need to ensure that the model

(i) gives a command

that is

(ii) dependent on precisely one variable and one expression.

In the algebra of Section 15.2, we modelled assignment commands with the operation

Assign : Var × Exp(Σ )→ Cmd(Σ )

defined by:

Assign(v , e) = v :=e

We modelled it to be

(i) of type Cmd(Σ ) so that it represents a command; and

(ii) an operation that takes one argument of type Var and one argument of type Exp(Σ ), so that it is dependent on precisely one variable and one expression.

Alternatively, we can generate assignment commands using the grammar rule (in BNF format):

〈command〉::=〈var〉:=〈exp〉

This gives a string which is

(i) a command because it is generated from the non-terminal 〈command〉; and

(ii) dependent on precisely one variable and one expression, which are generated from the non-terminal 〈var〉 and the non-terminal 〈exp〉 on the right-hand side of the rule.

Sequenced Commands

When modelling sequenced commands,

S1 ;S2

we need to ensure that the model

(i) gives a command

that is

(ii) dependent on precisely two other commands.


In the algebra of Section 15.2, we modelled sequenced commands with the operation

Seq : Cmd(Σ )× Cmd(Σ )→ Cmd(Σ )

defined by:

Seq(S1 , S2 ) = S1 ;S2

We modelled it to be

(i) of type Cmd(Σ ) so that it represents a command; and

(ii) an operation that takes two arguments each of type Cmd(Σ ), so that it is dependent on precisely two other commands.

Alternatively, we can generate sequenced commands using the grammar rule (in BNF format):

〈command〉::=〈command〉;〈command〉

This gives a string which is

(i) a command because it is generated from the non-terminal 〈command〉; and

(ii) dependent on precisely two other commands, which are generated from the two instances of the non-terminal 〈command〉 on the right-hand side of the rule.

Conditional Commands

When modelling conditional commands,

if b then S1 else S2 fi

we need to ensure that the model

(i) gives a command

that is

(ii) dependent on precisely one test and two other commands.

In the algebra of Section 15.2, we modelled conditional commands with the operation

Cond : BExp(Σ )× Cmd(Σ )× Cmd(Σ )→ Cmd(Σ )

defined by:

Cond(b, S1 , S2 ) = if b then S1 else S2 fi

We modelled it to be

(i) of type Cmd(Σ ) so that it represents a command; and

(ii) an operation that takes one argument of type BExp(Σ ) and two arguments each of type Cmd(Σ ), so that it is dependent on precisely one test and two other commands.


Alternatively, we can generate conditional commands using the grammar rule (in BNF format):

〈command〉::=if 〈Bool exp〉 then 〈command〉 else 〈command〉 fi

This gives a string which is

(i) a command because it is generated from the non-terminal 〈command〉; and

(ii) dependent on precisely one test and two other commands, which are generated from the non-terminal 〈Bool exp〉 and the two instances of the non-terminal 〈command〉 on the right-hand side of the rule.

Iterative Commands

When modelling iterative commands,

while b do S od

we need to ensure that the model

(i) gives a command

that is

(ii) dependent on precisely one test and one other command.

In the algebra of Section 15.2, we modelled iterative commands with the operation

Iter : BExp(Σ )× Cmd(Σ )→ Cmd(Σ )

defined by:

Iter(b, S ) = while b do S od

We modelled it to be

(i) of type Cmd(Σ ) so that it represents a command; and

(ii) an operation that takes one argument of type BExp(Σ ) and one argument of type Cmd(Σ ), so that it is dependent on precisely one test and one other command.

Alternatively, we can generate iterative commands using the grammar rule (in BNF format):

〈command〉::=while 〈Bool exp〉 do 〈command〉 od

This gives a string which is

(i) a command because it is generated from the non-terminal 〈command〉; and

(ii) dependent on precisely one test and one other command, which are generated from the non-terminal 〈Bool exp〉 and the non-terminal 〈command〉 on the right-hand side of the rule.


Commands

Thus, when we model the set of commands with the algebra given in Section 15.2, each operation records the structure a command depends on and defines a command:

skip :                                    → Cmd(Σ )
Assign : Var × Exp(Σ )                    → Cmd(Σ )
Seq : Cmd(Σ )× Cmd(Σ )                    → Cmd(Σ )
Cond : BExp(Σ )× Cmd(Σ )× Cmd(Σ )         → Cmd(Σ )
Iter : BExp(Σ )× Cmd(Σ )                  → Cmd(Σ )

and each operation creates a string of a particular form:

Assign(v , e)     = v :=e
Seq(S1 , S2 )     = S1 ;S2
Cond(b, S1 , S2 ) = if b then S1 else S2 fi
Iter(b, S )       = while b do S od

Alternatively, we can define the set of commands with the grammar defined by the BNF rules:

〈command〉 ::= skip
〈command〉 ::= 〈identifier〉:=〈expression〉
〈command〉 ::= 〈command〉;〈command〉
〈command〉 ::= if 〈Bool exp〉 then 〈command〉 else 〈command〉 fi
〈command〉 ::= while 〈Bool exp〉 do 〈command〉 od

15.4.2 Algebra and Grammar for while Program Expressions

Next, we consider while command expressions.

Variables as Expressions

When modelling variables as expressions

v

we need to ensure that the model

(i) gives an expression

that is

(ii) dependent on precisely one variable.

In the algebra of Section 15.2, we modelled variable expressions with the operation

Id : Var → Exp(Σ )

defined by:

Id(v) = v

We modelled it to be


(i) of type Exp(Σ ) so that it represents an expression; and

(ii) an operation that takes one argument of type Var , so that it is dependent on precisely one variable.

Alternatively, we can generate variables as expressions using the grammar rule (in BNF format):

〈expression〉::=〈identifier〉

This gives a string which is

(i) an expression because it is generated from the non-terminal 〈expression〉; and

(ii) dependent on precisely one variable which is generated from the non-terminal 〈identifier〉 on the right-hand side of the rule.

Number Expressions

When modelling natural numbers as expressions

n

we need to ensure that the model

(i) gives an expression

that is

(ii) dependent on precisely one natural number.

In the algebra of Section 15.2, we modelled natural number expressions with the operation

Num : N→ Exp(Σ )

defined by:

Num(n) = n

We modelled it to be

(i) of type Exp(Σ ) so that it represents an expression; and

(ii) an operation that takes one argument of type N, so that it is dependent on precisely onenatural number.

Alternatively, we can generate natural numbers as expressions using the grammar rule (in BNF format):

〈expression〉::=〈number〉

This gives a string which is

(i) an expression because it is generated from the non-terminal 〈expression〉; and

(ii) dependent on precisely one natural number which is generated from the non-terminal 〈number〉 on the right-hand side of the rule.


Added Expressions

When modelling added expressions

e1+e2

we need to ensure that the model

(i) gives an expression

that is

(ii) dependent on precisely two expressions.

In the algebra of Section 15.2, we modelled added expressions with the operation

Add : Exp(Σ )× Exp(Σ )→ Exp(Σ )

defined by:

Add(e1 , e2 ) = e1+e2

We modelled it to be

(i) of type Exp(Σ ) so that it represents an expression; and

(ii) an operation that takes two arguments, each of type Exp(Σ ), so that it is dependent on precisely two expressions.

Alternatively, we can generate added expressions using the grammar rule (in BNF format):

〈expression〉::=〈expression〉+〈expression〉

This gives a string which is

(i) an expression because it is generated from the non-terminal 〈expression〉; and

(ii) dependent on precisely two expressions which are each generated from the non-terminal 〈expression〉 on the right-hand side of the rule.

Multiplied Expressions

When modelling multiplied expressions

e1*e2

we need to ensure that the model

(i) gives an expression

that is

(ii) dependent on precisely two expressions.


In the algebra of Section 15.2, we modelled multiplied expressions with the operation

Times : Exp(Σ )× Exp(Σ )→ Exp(Σ )

defined by:

Times(e1 , e2 ) = e1*e2

We modelled it to be

(i) of type Exp(Σ ) so that it represents an expression; and

(ii) an operation that takes two arguments, each of type Exp(Σ ), so that it is dependent on precisely two expressions.

Alternatively, we can generate multiplied expressions using the grammar rule (in BNF format):

〈expression〉::=〈expression〉*〈expression〉

This gives a string which is

(i) an expression because it is generated from the non-terminal 〈expression〉; and

(ii) dependent on precisely two expressions which are each generated from the non-terminal 〈expression〉 on the right-hand side of the rule.

Expressions

Thus, when we model the set of expressions with the algebra given in Section 15.2, each operation records the structure an expression depends on and defines an expression:

Id : Var                    → Exp(Σ )
Num : N                     → Exp(Σ )
Add : Exp(Σ )× Exp(Σ )      → Exp(Σ )
Times : Exp(Σ )× Exp(Σ )    → Exp(Σ )

and each operation creates a string of a particular form:

Id(v)            = v
Num(n)           = n
Add(e1 , e2 )    = e1+e2
Times(e1 , e2 )  = e1*e2

Alternatively, we can define the set of expressions with the grammar defined by the BNF rules:

〈expression〉 ::= 〈identifier〉
〈expression〉 ::= 〈number〉
〈expression〉 ::= 〈expression〉+〈expression〉
〈expression〉 ::= 〈expression〉*〈expression〉


15.4.3 Algebra and Grammar for while Program Tests

Next, we consider while command tests.

Truth Values

We can model the set of Boolean expressions in a manner analogous to that for expressions:

true :                      → BExp(Σ )
false :                     → BExp(Σ )
Eq : Exp(Σ )× Exp(Σ )       → BExp(Σ )
Not : BExp(Σ )              → BExp(Σ )
And : BExp(Σ )× BExp(Σ )    → BExp(Σ )
Or : BExp(Σ )× BExp(Σ )     → BExp(Σ )

and each operation creates a string of a particular form:

Eq(e1 , e2 )  = e1=e2
Not(b)        = not b
And(b1 , b2 ) = b1 and b2
Or(b1 , b2 )  = b1 or b2

Alternatively, we can define the set of tests with the grammar defined by the BNF rules:

〈Bool exp〉 ::= true
〈Bool exp〉 ::= false
〈Bool exp〉 ::= 〈expression〉=〈expression〉
〈Bool exp〉 ::= not 〈Bool exp〉
〈Bool exp〉 ::= 〈Bool exp〉 and 〈Bool exp〉
〈Bool exp〉 ::= 〈Bool exp〉 or 〈Bool exp〉

15.5 Context Free Grammars and Terms

In Section 15.2, we produced an algebra to model the abstract syntax of while commands. It is simple to produce a grammar to define this same set of strings. Indeed, we can produce a grammar that has the same structure as the algebra. Furthermore, we can do this in a systematic manner, as this is an example of a general technique.

In this section we study an algorithm that transforms any context free grammar G into a signature ΣG , and hence into a term algebra T (ΣG) which can also be used to define the language L(G).

15.5.1 Outline of Algorithm

Recall that in a context free grammar G = (T ,N , S ,P), the production rules P are constrained to have the form

A→ w


where A ∈ N is some non-terminal, and w ∈ (T ∪ N )+ is some string of terminal and/or non-terminal symbols.

We divide the production rules of the context free grammar into two distinct sets:

(i) those production rules which have no non-terminal symbols on the right-hand side, i.e., they are of the form

A→ t1 · · · tn

for A ∈ N , t1 , . . . , tn ∈ T ; and

(ii) those production rules which have at least one non-terminal symbol on the right-hand side, i.e., they are of the form

A→ u0A1u1 · · · un−1Anun

for A,A1 , . . . ,An ∈ N , u0 , u1 , . . . , un ∈ T ∗.

In a derivation of a string, when we use a rule of type (i) to replace a non-terminal in a string, that is the final step for that non-terminal: it has now been replaced by terminal symbols which cannot be rewritten. The only further possible derivation(s) for that string result from any other non-terminal symbols present in the string.

A rule of type (ii), on the other hand, will always be an intermediate step in a derivation, with further derivations needed to rewrite the non-terminals which are introduced by the application of that production rule.

The rules of type (i) will give the constants in the algebra, and those of type (ii), the functions. The sorts of each are determined by the non-terminal symbols present in the production rules.

Because we use a context free grammar, the terminal symbols generated at any stage only determine the representation of the string that is generated; they do not have any influence on how the string is generated. Thus, we only use the terminal symbols to describe how we interpret the constant and function symbols.

15.5.2 Construction of a signature from a context free grammar

We now give a formal description of how we form a signature ΣG from the grammar G , from which we generate the term algebra T (ΣG). First, we need the following definition.

Definition For any context free grammar G = (T ,N , S ,P) we inductively define a function

nonterminals : (T ∪ N )∗ → N ∗

on strings that returns all the non-terminal symbols present in a string w ∈ (T ∪N )∗ (possibly with repetitions), in the same order in which they originally appeared. So, a string w contains no non-terminal symbols if, and only if, nonterminals(w) = ε. We define the function nonterminals for any u ∈ (T ∪ N ) and x ∈ (T ∪ N )∗ by

nonterminals(ε) = ε

nonterminals(u.x ) = nonterminals(x )     if u ∈ T ;
nonterminals(u.x ) = u.nonterminals(x )   if u ∈ N .


Example Let T = {a, e, i , o, u} be the set of vowels and N = {a, b, . . . , z} − T , the set of consonants. The function nonterminals : (T ∪ N )∗ → N ∗ removes terminal symbols from strings, so in this example it will remove all the vowels from a word, whilst leaving the consonants untouched. So, applying nonterminals to some sample words over (T ∪ N )∗ we get:

nonterminals(alphabet) = lphbt

nonterminals(swansea) = swns

nonterminals(oui) = ε

nonterminals(rhythm) = rhythm
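The function nonterminals is defined by structural recursion on strings and can be implemented directly; here is a Python sketch using the vowel/consonant example above:

```python
def nonterminals(w, terminals):
    """Return the subsequence of w consisting of its non-terminal symbols."""
    if w == "":
        return ""                                 # nonterminals(epsilon) = epsilon
    u, x = w[0], w[1:]
    if u in terminals:
        return nonterminals(x, terminals)         # drop the terminal symbol u
    return u + nonterminals(x, terminals)         # keep the non-terminal symbol u

VOWELS = set("aeiou")
print(nonterminals("alphabet", VOWELS))  # lphbt
print(nonterminals("swansea", VOWELS))   # swns
print(nonterminals("oui", VOWELS))       # (the empty string)
print(nonterminals("rhythm", VOWELS))    # rhythm
```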

Signature Construction Algorithm Let G = (T ,N , S ,P) be a context free grammar. Let each of the productions A→ w of P be labelled with a unique identifier i :

i : A→ w

We construct the signature ΣG as follows.

(i) We define the set of sorts to be the set N of the non-terminal symbols of G .

(ii) For each production rule

i : A→ w

in P with

nonterminals(w) = ε

we define a constant symbol

i :→ A.

(iii) For each production rule

j : A→ w

in P with

nonterminals(w) = A1A2 · · ·An

we define a function symbol

j : A1 × · · · × An → A.
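The Signature Construction Algorithm can be sketched in Python. For simplicity we assume each grammar symbol is a single character; the representation of labelled productions as a dict from labels to (left-hand side, right-hand side) pairs is ours:

```python
def signature_from_grammar(nonterms, productions):
    """productions: dict label -> (A, w), with A a non-terminal and w a string.
    Returns (sorts, constants, functions) of the signature Sigma_G."""
    sorts = set(nonterms)                          # (i) sorts = non-terminals
    constants, functions = {}, {}
    for label, (lhs, rhs) in productions.items():
        args = [u for u in rhs if u in nonterms]   # nonterminals(w)
        if not args:
            constants[label] = lhs                 # (ii)  i : -> A
        else:
            functions[label] = (args, lhs)         # (iii) j : A1 x ... x An -> A
    return sorts, constants, functions

# The grammar with productions  alpha: S -> ab,  beta: S -> aSb
sorts, consts, funcs = signature_from_grammar(
    {"S"}, {"alpha": ("S", "ab"), "beta": ("S", "aSb")})
print(consts)  # {'alpha': 'S'}
print(funcs)   # {'beta': (['S'], 'S')}
```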

15.5.3 Algebra T (ΣG) of language terms

Given the signature ΣG , we now form the closed term algebra T (ΣG) in the manner described in Section 8.6. Thus we get:

(i) for each production rule of the form

i : A→ t1 · · · tn

for any non-terminal symbol A ∈ N and any terminal symbols t1 , . . . , tn ∈ T , there will be a constant

i :→ T (ΣG)A;

and


(ii) for each production rule of the form

j : A→ u0A1u1 · · · un−1Anun

for any non-terminal symbols A,A1 , . . . ,An ∈ N and any strings of terminal symbols u0 , . . . , un ∈ T ∗ there will be a function

j : T (ΣG)A1 × · · · × T (ΣG)An → T (ΣG)A.

We use a ΣG-homomorphism φ to interpret the terms of the term algebra as follows.

(i) Given a production rule

i : A→ t1 · · · tn

in the grammar G which has given rise to a constant symbol

i :→ A

in the signature ΣG , we interpret the term i by:

φ(i) = t1 · · · tn

(ii) Given a production rule

j : A→ u0A1u1 · · · un−1Anun

which has given rise to a function symbol

j : T (ΣG)A1 × · · · × T (ΣG)An → T (ΣG)A

in the signature ΣG , and given terms a1 ∈ T (ΣG)A1 , . . . , an ∈ T (ΣG)An , we interpret the term

j (a1 , . . . , an)

by

φ(j (a1 , . . . , an)) = u0φ(a1 )u1 · · · un−1φ(an)un .

User-Friendly Algebraic Model

In practice, we can typically use the production rules to give suggestive names to the constant and function symbols and to hide some of this technical machinery. A grammar G

grammar G

terminals . . . , t , t1 , . . . , tn , . . .

nonterminals . . . ,A,A0 ,A1 , . . . ,An , . . .

start symbol A0

productions
  ...
  A → t1 · · · tn
  ...
  A → u0A1u1 · · · un−1Anun
  ...


where u0 , . . . , un ∈ T ∗ are strings of terminals gives rise to an algebraic model:

algebra T (ΣG)

carriers . . . ,T (ΣG)A,T (ΣG)A0 ,T (ΣG)A1 , . . . ,T (ΣG)An , . . .

constants
  ...
  t1 · · · tn :→ T (ΣG)A
  ...

operations
  ...
  u0 u1 . . . un−1 un : T (ΣG)A1 × · · · × T (ΣG)An → T (ΣG)A
  ...

15.5.4 Observation

This construction has the property that each term in the algebra T (ΣG) corresponds to a parse tree of a string that can be derived from the grammar G . In particular, for each non-terminal symbol A ∈ N , each string of the grammar derived from A will have an associated term in the carrier set T (ΣG)A.

Example Consider the context free grammar

G = ({a, b}, {S}, S ,P)

with labelled production rules P :

α : S → ab
β : S → aSb

which generates the language

L(G) = {a^nb^n | n ≥ 1}.

We can transform G into the signature:

signature ΣG

sorts S

constants α :→ S

operations β : S → S

endsig

Thus, the signature ΣG has

(i) a sort set which consists of the start symbol S (as this is the only non-terminal symbol in the grammar);

(ii) a constant symbol α :→ S which corresponds to the production α : S → ab in G ; and


(iii) a function symbol β : S → S which corresponds to the production β : S → aSb in G .

For example, the derivation in the grammar G :

S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

corresponds to the term β(β(α)) ∈ T (ΣG). Figure 15.8 shows some sample parse trees for derivations in G with their corresponding terms and trees in the algebra T (ΣG).

[Figure 15.8 pairs each derivation with its parse tree and its term: the derivation S ⇒ ab has term α; the derivation S ⇒ aSb ⇒ aabb has term β²(α); and, in general, the derivation S ⇒ aSb ⇒ · · · ⇒ a^nSb^n ⇒ a^{n+1}b^{n+1} has term β^n(α), drawn as a chain of β-nodes above an α-leaf.]

Figure 15.8: Derivations of sample phrases with their corresponding algebraic trees and terms.
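We can run this correspondence mechanically: interpreting α by its production S → ab and β by S → aSb gives the term evaluation map φ, with φ(α) = ab and φ(β(t)) = a φ(t) b. A small sketch (the tuple representation of terms is ours):

```python
def phi(t):
    """Interpret a term of T(Sigma_G) as the string it derives in G."""
    if t == "alpha":
        return "ab"              # alpha : S -> ab
    op, arg = t
    assert op == "beta"
    return "a" + phi(arg) + "b"  # beta : S -> aSb

term = ("beta", ("beta", "alpha"))   # the term beta(beta(alpha))
print(phi(term))  # aaabbb
```

This reproduces the derivation S ⇒ aSb ⇒ aaSbb ⇒ aaabbb given in the example.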


15.5.5 Algebra of while commands revisited

Consider the following BNF for a simplified while language:

bnf WP

import Var ,Num

rules

〈command〉 ::= skip |
              〈identifier〉 := 〈exp〉 |
              〈command〉 ; 〈command〉 |
              if 〈Bool exp〉 then 〈command〉 else 〈command〉 fi |
              while 〈Bool exp〉 do 〈command〉 od

〈exp〉 ::= 〈identifier〉 |
          〈number〉 |
          〈exp〉 + 〈exp〉 |
          〈exp〉 ∗ 〈exp〉

〈Bool exp〉 ::= true |
               false |
               not 〈Bool exp〉 |
               〈Bool exp〉 or 〈Bool exp〉 |
               〈Bool exp〉 and 〈Bool exp〉 |
               〈exp〉 = 〈exp〉

This gives us a context free grammar GWP . We can transform GWP into a signature ΣWP :


signature WP

import Var ,Num

sorts 〈command〉, 〈exp〉, 〈Bool exp〉, 〈identifier〉, 〈number〉

constants
  skip : → 〈command〉
  true, false : → 〈Bool exp〉

operations
  assign : 〈identifier〉 × 〈exp〉 → 〈command〉
  seq : 〈command〉 × 〈command〉 → 〈command〉
  cond : 〈Bool exp〉 × 〈command〉 × 〈command〉 → 〈command〉
  iter : 〈Bool exp〉 × 〈command〉 → 〈command〉
  id : 〈identifier〉 → 〈exp〉
  num : 〈number〉 → 〈exp〉
  add : 〈exp〉 × 〈exp〉 → 〈exp〉
  times : 〈exp〉 × 〈exp〉 → 〈exp〉
  not : 〈Bool exp〉 → 〈Bool exp〉
  and : 〈Bool exp〉 × 〈Bool exp〉 → 〈Bool exp〉
  or : 〈Bool exp〉 × 〈Bool exp〉 → 〈Bool exp〉
  eq : 〈exp〉 × 〈exp〉 → 〈Bool exp〉

We can now form the closed term algebra T (ΣWP) from the signature ΣWP , giving us an algebra isomorphic to the algebra WP(Σ ) of Figure 15.4.

15.6 Context Sensitive Languages

The algorithm in Section 15.5 only describes how we can construct an algebra from a context free grammar. In order to build an algebra to describe a context sensitive language, we have to define additional operations and equations that describe the context sensitive features of the language. The purpose of these new operations is to impose constraints upon the context free part of the language to remove strings that are generated by the context free grammar, but which we do not want in the language.

We used this approach, in an informal manner, to specify the context sensitive features explained in Sections 14.5.1 and 21. This is a common approach to specifying context sensitive programming languages; see Dijkstra [1962], for example.

The algebraic approach to the specification of non-context free languages is a mathematically precise formulation for defining the context sensitive aspects of languages. Whilst we can completely specify a context sensitive language in a formal way by using a context sensitive grammar, it can be quite a difficult task as it is at such a low level.

Example Consider the language

L = {a^nb^nc^n | n ≥ 1}.

We know from Section 14.4.2 that this language is not context free. We can establish that this language is context sensitive by giving the following grammar G_{a^n b^n c^n} to define L = L(G_{a^n b^n c^n}):


grammar G_{a^n b^n c^n}

terminals a, b, c

nonterminals S ,A,B

start symbol S

productions S → abc
            S → aAbc
            Ab → bA
            Ac → Bbcc
            bB → Bb
            aB → aaA
            aB → aa

Notice that this low-level context-sensitive grammar does not reflect the structure of the language. Compare this with an algebraic description of L.

Consider the context free superset

L_{a^i b^j c^k} = {a^ib^jc^k | i , j , k ≥ 1}

of the language L. We can describe this language easily with the context free grammar:

grammar G_{a^i b^j c^k}

terminals a, b, c

nonterminals S ,A,B ,C

start symbol S

productions S → ABC
            A → a
            A → aA
            B → b
            B → bB
            C → c
            C → cC

Equivalently, we can describe L_{a^i b^j c^k} with the algebra:


algebra L_{a^i b^j c^k}

carriers
  L^S_{a^i b^j c^k} = { t | t is the parse tree for a^ib^jc^k , i , j , k ≥ 1 }
  L^A_{a^i b^j c^k} = { t | t is the parse tree for a^i , i ≥ 1 }
  L^B_{a^i b^j c^k} = { t | t is the parse tree for b^j , j ≥ 1 }
  L^C_{a^i b^j c^k} = { t | t is the parse tree for c^k , k ≥ 1 }

constants a : → A
          b : → B
          c : → C

operations ABC : A× B × C → S
           aA : A→ A
           bB : B → B
           cC : C → C

Now we need to restrict the language L_{a^i b^j c^k} to the language L = L_{a^n b^n c^n} that we actually want. We do this by adding functions to our algebraic description. We describe how these functions operate by using equations. These functions ensure that the following constraint is satisfied:

Parity Condition

Every string of L has equal numbers of a’s, b’s and c’s.

First we design a function

chk : L_{a^i b^j c^k} → (L_{a^n b^n c^n} ∪ {error})

so that:

(i) given the input t = ABC (aA^n(a), bB^n(b), cC^n(c)) (for n ≥ 0 ) that represents a legal string a^{n+1}b^{n+1}c^{n+1} ∈ L, chk(t) returns t ; and

(ii) given the input t = ABC (aA^i(a), bB^j(b), cC^k(c)) where i , j and k are not all equal, that represents an illegal string a^{i+1}b^{j+1}c^{k+1} ∉ L, chk(t) returns error .

The idea behind the definitions we provide below is that we recurse down in parallel on each of the terms representing strings of a’s, b’s and c’s, until we either:

(i) end up with more of one element than another, which results in error ; or

(ii) end up with the term ABC (a, b, c) which represents the string abc ∈ L, so we return the original term.

To perform this latter task, we use the function

grow : (L_{a^n b^n c^n} ∪ {error}) → (L_{a^n b^n c^n} ∪ {error})

so that grow(t) extends the term t representing a string a^nb^nc^n to that representing a^{n+1}b^{n+1}c^{n+1}. In the case that t = error , we simply propagate this message along.


algebra L_{a^n b^n c^n}

import L_{a^i b^j c^k}

carriers L_{a^n b^n c^n} ∪ {error} = { t | t is the parse tree for a^nb^nc^n , n ≥ 1 } ∪ {error}

constants error :→ (L_{a^n b^n c^n} ∪ {error})

operations chk : L_{a^i b^j c^k} → (L_{a^n b^n c^n} ∪ {error})
           grow : (L_{a^n b^n c^n} ∪ {error}) → (L_{a^n b^n c^n} ∪ {error})

equations
  chk(ABC (a, bB(tB), tC )) = error
  chk(ABC (a, tB , cC (tC ))) = error
  chk(ABC (aA(tA), b, tC )) = error
  chk(ABC (aA(tA), tB , c)) = error
  chk(ABC (tA, b, cC (tC ))) = error
  chk(ABC (tA, bB(tB), c)) = error

  chk(ABC (a, b, c)) = ABC (a, b, c)
  chk(ABC (aA(tA), bB(tB), cC (tC ))) = grow(chk(ABC (tA, tB , tC )))

  grow(error) = error
  grow(ABC (tA, tB , tC )) = ABC (aA(tA), bB(tB), cC (tC ))
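The equations for chk and grow can be animated directly as recursive functions on terms. A minimal Python sketch (terms as nested tuples; the helper names leg and abc_term are ours):

```python
def grow(t):
    # grow(error) = error;  grow(ABC(tA,tB,tC)) = ABC(aA(tA), bB(tB), cC(tC))
    if t == "error":
        return "error"
    _, tA, tB, tC = t
    return ("ABC", ("aA", tA), ("bB", tB), ("cC", tC))

def chk(t):
    _, tA, tB, tC = t
    if (tA, tB, tC) == ("a", "b", "c"):
        return t                                   # chk(ABC(a,b,c)) = ABC(a,b,c)
    if tA != "a" and tB != "b" and tC != "c":
        # chk(ABC(aA(tA), bB(tB), cC(tC))) = grow(chk(ABC(tA, tB, tC)))
        return grow(chk(("ABC", tA[1], tB[1], tC[1])))
    return "error"                                 # the six error equations

def leg(sym, wrap, n):
    """Build the term wrap^n(sym), e.g. aA^2(a) for n = 2."""
    t = sym
    for _ in range(n):
        t = (wrap, t)
    return t

def abc_term(i, j, k):
    """Term representing the string a^(i+1) b^(j+1) c^(k+1)."""
    return ("ABC", leg("a", "aA", i), leg("b", "bB", j), leg("c", "cC", k))

print(chk(abc_term(2, 2, 2)) == abc_term(2, 2, 2))  # True: a^3 b^3 c^3 is legal
print(chk(abc_term(1, 2, 1)))                        # error
```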

For i , j , k ≥ 0 , each string a^{i+1}b^{j+1}c^{k+1} is represented by a term

ABC (aA^i(a), bB^j(b), cC^k(c)),

and the result of checking that string can be represented by the term

t = chk(ABC (aA^i(a), bB^j(b), cC^k(c))),

which we can also represent as an algebraic tree:


[The term t drawn as an algebraic tree: a chk node above an ABC node whose three children are chains of i aA-nodes ending in a, j bB-nodes ending in b, and k cC-nodes ending in c.]

The equations tell us that we can reduce every initial algebraic tree to one of the forms shown below:

[Either the single node error , or an ABC node whose three children are chains of n ≥ 0 aA-, bB- and cC-nodes ending in a, b and c respectively.]

Every term that does not reduce to error represents a parse tree of the form shown below:


[A parse tree with root S branching by ABC into three chains: n A-nodes ending in the leaf a, n B-nodes ending in b, and n C-nodes ending in c, yielding the string a^{n+1}b^{n+1}c^{n+1}.]

Exercises for Chapter 15

1. What changes to the algebra T (ΣWP) are needed to incorporate (i) the reals? (ii) characters?

2. By adding appropriate arithmetical operations to ΣWP , define a program for Euclid’s algorithm as a term.

3. Give an algebra and a BNF grammar for the while language generated from an arbitrary signature that names the elements and operations for an arbitrary set of data.


Part III

Semantics



Introduction

There are countless types of written texts and syntaxes. For each syntax, often, there are many different ways to define or interpret its meaning — i.e. to give the syntax a semantics. Not surprisingly, defining the semantics of texts in a systematic way is even more difficult than defining their syntax. For natural languages, the problem is technically and philosophically deep. Certainly, literature exploits fully the vast space of possible meanings.

Artificial languages, like notation systems for mathematical formulae and musical scores, are designed to record concepts, structures or physical effects. They are invented in order to give added precision in some specialism where a natural language is too loose in its meanings to be practically useful. Thus, defining the semantics of artificial languages ought to be more easily achievable because of their limited scope. The form of the semantics is heavily dependent on the form of the syntax of the language. Artificial languages have much simpler syntactic specifications.

There are, of course, countless artificial languages for programming, and their constructs may have many meanings. What is the semantics of a programming language? In short, it is a description of what all the programs of the language do. A semantics models the computational behaviour of a language. It could be quite abstract, stating only what outputs result from what inputs; or it could be more concrete and detailed, specifying each step of any computation.

The diversity of languages is great, as their different constructs and implementations testify. There are heated debates on the merits and superiority of different constructs and their semantics. However, it is possible to give a complete description of the semantics of a programming language. We will do this for our simple and small languages.

To conclude our introduction to the theory of programming languages, in Part III, we will study the semantics of programming languages. We will introduce methods for

(i) defining different kinds of semantics,

(ii) reasoning about program behaviour, and

(iii) comparing different programming languages via compilation.

We will begin with the problem of defining the behaviour of programs. In Chapter 16, we introduce the input-output semantics of while programs. This requires us to define states of a computation and transformations of these states. To prove properties of programs, in Chapter 17, we introduce the principle of structural induction for programs. In Chapter 18, we refine the input-output semantics of programs into an operational semantics that explains the steps in a computation. In Chapter 19, we introduce a virtual machine language. This gives us a new language with jumps on which to apply semantical methods. Finally, in Chapters 20–21, we consider the process of compiling. We formulate the correctness of compilation using equations and we prove the correctness of a simple compiler, from while programs to the virtual machine language, using structural induction.


Chapter 16

Input-Output Semantics

We have modelled the syntax and semantics of many data types by signatures and algebras. We have modelled the concrete and abstract syntax of many languages, including our chosen imperative programming language, the while language, by grammars and algebras. Now we begin the study of the semantics of languages with that of the while programming language. Our problem is

Problem of Semantics To model mathematically what happens when any while program performs a computation over any data type. To give a mathematical definition of how a while program computes a function on any data type.

To do this, we shed much of the rich concrete syntax of the while language and reduce it to a simple abstract syntax. The important syntactic features of complex identifiers and declarations are not needed for semantics. What are needed are simple notations for the data type operations, variables, expressions and commands.

More technically, for any data type whose interface is modelled by a signature Σ, we will give a concise definition of the set

WP(Σ)

of all while programs over Σ. Then we will make a mathematical model of the behaviour of any while program S ∈ WP(Σ) in execution on any implementation of the data type modelled by a Σ-algebra A.

In this chapter we will build a mathematical model whose equations allow us to derive the output of a program from its input. This kind of model is called an

input-output semantics

for the programming language.

Later, in Chapter 18, we will build more detailed models which allow us to calculate each step in a computation. This kind of model is called an

operational semantics

for the programming language.

The while programming language WP(Σ) defines computations on any underlying data type with interface the signature Σ and implementation the Σ-algebra A. It constitutes an excellent kernel language with many extensions. Clearly, syntactic features that do not require semantics can be added to enrich the language. By choosing different signatures and algebras, a range of programs and, indeed, programming languages can be specified easily.

In general, we can extend to a new language L by adding new constructs to the while programming language by three methods. First, we can extend the data types. For example, to add arrays or streams to the while language, we need only add them to a data type signature Σ to make expansions ΣArray and ΣStream and then take

WP(ΣArray) and WP(ΣStream)

Now we can substitute signatures and apply our methods to give a semantics for a while language with arrays and a while language with streams, respectively.

Second, we can extend the language with constructs by using syntactic definitions that specify the new constructs in terms of the old. For example, we may add repeat or case statements by defining them in terms of the constructs of the while language. Here a process of flattening reduces the extended language L to the kernel language WP(Σ) and our semantic methods can be applied immediately. An obvious question arises:

Which imperative constructs can be reduced to those of the kernel while language?

For example, can procedures and recursion be added in this way?

Third, we can extend the language with new constructs that have their own semantics. This is easy to do for constructs such as repeat and case, but harder for constructions such as recursion. Extensions with concurrent assignments and non-deterministic choices we do not expect to be able to reduce to a sequential deterministic kernel like the while language.
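The flattening of a construct like repeat into the kernel can be sketched in code. The encoding below is a minimal illustration (nested tuples for abstract syntax; the tags and helper name are ours, not the book's): repeat S until b is rewritten as S; while not b do S od.

```python
# Sketch of flattening (illustrative tuple encoding of abstract syntax):
# repeat S until b  is rewritten as  S; while not b do S od.

def flatten(prog):
    """Rewrite extended constructs into the kernel while language."""
    tag = prog[0]
    if tag == "repeat":                  # repeat S until b
        _, body, cond = prog
        body = flatten(body)
        return ("seq", body, ("while", ("not", cond), body))
    if tag == "seq":
        return ("seq", flatten(prog[1]), flatten(prog[2]))
    if tag == "while":
        return ("while", prog[1], flatten(prog[2]))
    return prog                          # skip, assignments, ...

p = flatten(("repeat", ("assign", "x", ("add", "x", "one")),
             ("eq", "x", "y")))
```

Applying such a rewrite everywhere reduces the extended language to WP(Σ), so the kernel semantics applies unchanged.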

These methods are depicted in Figure 16.1.

Let us look further at the raw material of programming, in order to think about the problems of defining the semantics of a programming language; these are the problems that we can expect to analyse by our mathematical models.

16.1 Some Simple Semantic Problems

To think scientifically and reason mathematically about imperative programming, we must master the problem of defining exactly the semantics of an imperative programming language. To appreciate the problem, we consider how to define precisely the data used in a program and the effect or meaning of a simple programming statement such as the assignment. Then we will consider how to define the effect of a sequence of statements, such as a program for calculating the greatest common divisor of two numbers.

16.1.1 Semantics of a Simple Data Type

Consider a simple interface for programming with natural numbers. It is a list of names for data and operations:


[Figure 16.1 depicts the kernel language WP(Σ), with its syntax and semantics, and the three methods of extension: extend with new syntactic features independent of semantics (e.g., user identifiers and variable declarations), removed by forgetting; extend with new sorts and operations in the data type (e.g., arrays and other new data structures), handled by substituting signatures; and extend with constructs that are defined in terms of old ones (e.g., repeat, case), removed by flattening. Genuinely new constructs require a new semantics.]

Figure 16.1: Extending the kernel while language.

signature Naturals for Euclidean Algorithm

sorts      nat, bool

constants  0 : → nat
           true, false : → bool

operations mod : nat × nat → nat
           ≠ : nat × nat → bool

Statements in the program for the Euclidean Algorithm will use these notations (see Section ??). But this is notation; what does it denote, what is its semantics?

There are several choices for the set of natural numbers, such as

N_Dec = {0, 1, 2, ...}
N_Bin = {0, 1, 10, ...}

represented in decimal and binary, respectively. And there are finite choices of the form

{0, 1, ..., Max}.

Each choice has appropriate functions to interpret the operator mod and the test ≠. The important point is that some choice of sets of data and functions on data must be made in order to begin interpreting the statements.

16.1.2 Semantics of a Simple Construct

We will define the semantics of a series of program fragments based on different assignment statements.


Assignment 1: begin var x,y:nat; x:=y end

“Make the value of y the new value of x. The value of y is not changed but the old value of x is lost.”

This first fragment can be applied to any data type, not just the naturals. This second fragment depends on the properties of the data:

Assignment 2: begin var x,y,z:nat; x:=y+z end

“Evaluate the sum of the data that are the values of y and z, and make this the new value of x. The values of y and z are not changed but the old value of x is lost.”

The operation of addition always returns a value: it is a total function. If the data type of naturals is finite then a maximum natural number will exist but the above description will still be valid. Clearly, to define the assignment we must also write out the meaning of the operators it contains. Now the semantics of the second fragment can also be applied to numbers such as the integers, rational numbers, real numbers or complex numbers.

Assignment 3: begin var x,y,z:nat; x:=(y+z)/2 end

“Evaluate the sum of the data that are the values of y and z. Divide by 2 and, if this number is a natural number, then make this the new value of x. The values of y and z are not changed but the old value of x is lost. If, however, the division leads to a rational then . . . ”

Because division by 2 in the naturals does not always have a value (e.g., 3/2 is not a natural) the assignment contains an operator that is a partial function. The above meaning is left incomplete because there are several choices available. First, one can ignore the problem:

(i) (y + z)/2 can be rounded up or rounded down and the computation proceed;

(ii) x, y, z do not change their values and the computation proceeds;

or one can indicate there is a problem:

(iii) an undefined element u can be added to the data type, to indicate there is no value, which can be assigned to x and the computation proceed;

(iv) an error message can be displayed and the computation suspended.

This particular problem with division can be avoided by changing the data type to the rationals. However, the basic semantical difficulties with partial operators remain. Another example, involving the real numbers, is this:


Assignment 4: begin var x,y,z:real; x:=sqrt((y+z)/2) end

“Evaluate the sum of the data that are the values of y and z, and divide by 2. Take the square root of this datum, if it exists, and make this the new value of x. The values of y and z are not changed but the old value of x is lost. If the square root does not exist then . . . ”

Since there are no square roots for negative numbers we must complete the semantics by choosing an option. Again, the particular problem can be avoided by changing the type to the complex numbers.

As one thinks about these meanings one is struck by the increasing length of the descriptions and the choice of statements as to what happens when, in particular, operators return no value. In fact there is already a decision in the first case of Assignment 1: here is a different non-standard meaning for the assignment:

Assignment 5: begin var x,y:nat; x:=y end

“Transfer the value of y to be the new value of x. The value of y is set to a special value u denoting empty or unspecified, and the old value of x is lost.”

If this non-standard interpretation is selected then the other semantic options can be rewritten, the choices multiply and the descriptions become more elaborate.

Try the exercise of carefully describing semantics with some of the other constructs we have mentioned in the previous section.

Of course, what we need is a mathematical formulation of the semantics that is precise, concise and capable of a full logical analysis. Shortly, we will define the semantics of assignments in a simple formula:

M(begin var x,y,z,...:data; x:=e(y,z,...) end)(σ) = σ[V(e)(σ)/x]

Roughly speaking, the left hand side means: this is the state after applying the assignment fragment. The right hand side means: take the state σ and replace the value of x by the value obtained by evaluating the expression e(y,z,...) on σ, leaving all other variables y, z, ... unchanged.

This formula works for all of the above cases where the semantic decisions concern only the data types. Remarkably, the formula works for any data type consisting of data equipped with operators. Option (iv) requires us to add the idea of an error state to the set of possible states.
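The assignment formula can be sketched directly in code, with a state modelled as a dictionary from variable names to values and the expression e as a function of the state (an illustrative encoding, not the book's formal machinery):

```python
# A sketch of M(x:=e)(sigma) = sigma[V(e)(sigma)/x], with states as
# dictionaries and expressions as Python functions of the state.

def M_assign(x, e):
    """Input-output semantics of the assignment x := e."""
    def run(sigma):
        new = dict(sigma)        # all other variables keep their values
        new[x] = e(sigma)        # the old value of x is lost
        return new
    return run

sigma = {"x": 0, "y": 2, "z": 3}
sigma2 = M_assign("x", lambda s: s["y"] + s["z"])(sigma)
```

Note that the original state is left untouched: the semantics is a function from states to states, not a destructive update.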

16.1.3 Semantics of a Simple Program

Let us look at a program that is based on the imperative constructs mentioned.

We consider a famous and ancient algorithm. The largest natural number that divides two given natural numbers m and n is called the greatest common divisor of m and n, and is written gcd(m, n); for example, gcd(45, 12) = 3.

Euclid's Algorithm is the name given to a method for computing gcd(m, n) found in Book VII of Euclid's Elements, although the algorithm is older. Here it is expressed as a simple imperative program based on the data type of natural numbers:


program Euclid

signature Naturals for Euclidean Algorithm

sorts      nat, bool

constants  0 : → nat;
           true, false : → bool

operations mod : nat × nat → nat;
           ≠ : nat × nat → bool

endsig

body

var x, y, r : nat

begin
  read(x, y);
  r := x mod y;
  while r ≠ 0 do
    x := y;
    y := r;
    r := x mod y
  od;
  write(y)
end

How can we describe exactly the behaviour of the program? What is a computation by this program?

A state of a computation can be defined to be the three natural number values p, q, r of the variables x, y and r. Let

N = N_Dec = {0, 1, 2, ...}

denote the natural numbers. Hence, a state is represented by a 3-tuple

(p, q, r) ∈ N³

A computation can be defined to be a sequence of these states starting with some initial state:

(p_1, q_1, r_1), (p_2, q_2, r_2), ..., (p_k, q_k, r_k), ...

Here (p_k, q_k, r_k) is the state of the computation at the k-th step. The step from (p_k, q_k, r_k) to (p_{k+1}, q_{k+1}, r_{k+1}) is determined by processing a statement in the program. This sequence is a program trace which tracks the values of the variables as the computation proceeds.

Let us work out an example using our informal understanding of these programming constructs. The computation of gcd(45, 12) by Euclid's algorithm consists of the following steps:


Step Number   Value of x   Value of y   Value of r   Comment
1             45           12           ?            Initial state
2             45           12           9            First assignment
3             12           12           9            Entered loop
4             12           9            9
5             12           9            3
6             9            9            3            Re-enter loop
7             9            3            3
8             9            3            0            Exit loop

The ? indicates that the value of r can be anything at the start of the computation. Notice that the idea of state chosen avoids, or hides, the meaning of the read and write statements, and the evaluation of the Boolean tests.
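The trace above can be reproduced by a short sketch in ordinary Python (standing in for the while program; the encoding is ours):

```python
# A sketch in plain Python reproducing the trace table above; "?" marks the
# arbitrary initial value of r.

def euclid_trace(x, y):
    trace = [(x, y, "?")]            # step 1: initial state
    r = x % y                        # r := x mod y
    trace.append((x, y, r))          # step 2: first assignment
    while r != 0:                    # while r ≠ 0 do
        x = y; trace.append((x, y, r))
        y = r; trace.append((x, y, r))
        r = x % y; trace.append((x, y, r))
    return trace                     # gcd is the final value of y

trace = euclid_trace(45, 12)
```

Running it on (45, 12) yields the eight states of the table, ending with y = 3 = gcd(45, 12).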

Of particular importance are exceptional properties of the program: What is gcd(0, n) and gcd(0, 0)? What does the program do for these values? If the semantics of the integers is not the infinite set

N = {0, 1, 2, ...}

but some finite set

{−M, ..., −2, −1, 0, 1, 2, ..., +M}

that approximates N, then does the program still compute greatest common divisors?

Clearly, the answers to these latter questions depend on the data type of integers, the meanings of greatest common divisor, and the operation of division. They also involve properties of the Booleans.

There are some other obvious questions we can ask: How do we know the program computes the greatest common divisor? What is its efficiency?

16.2 Overview

A while program produces a computation by specifying a sequence of assignments and tests on the values of its variables. At any stage, the values of the variables constitute a state of a computation. For each while program S we imagine that, given any starting or initial state σ for a computation, the program generates a finite sequence

σ_0, σ_1, ..., σ_n

or an infinite sequence

σ_0, σ_1, ..., σ_k, ...

of states as the constructs in the program S are processed in the order prescribed; of course, the first state σ_0 in the sequence is the given initial state σ.

These sequences are the basis of a trace of a program execution. The finite sequence is called a terminating or convergent computation sequence; the infinite sequence is called a non-terminating or divergent computation sequence. A terminating computation sequence has a last or final state σ_n from which we expect that an output can be abstracted. A non-terminating computation sequence has no final state and, since the computation proceeds forever, we do not expect an output.

In our first attempt at defining the semantics of a program we will focus on this concept of the input-output behaviour of a program.

Suppose the program S is over a data type with signature Σ, i.e., S ∈ WP(Σ). Suppose the data type is implemented by a Σ-algebra A. We will define the concept of a state of a computation using data from the algebra A and hence the set

State(A)

of all possible states of all possible computations using data from A. The input-output behaviour of a program S is specified by a function

M^io_A(S) : State(A) → State(A)

such that for any state σ ∈ State(A),

M^io_A(S)(σ) = the final state of the computation generated by the program S from an initial state σ, if such a final state exists.

Since a while loop may execute forever, we do not expect the function M^io_A(S) to be defined on all states. That is, we expect M^io_A(S) to be a partial function. If there is a final state τ of the computation of S on σ we say the computation is terminating and we write

M^io_A(S)(σ) ↓ or M^io_A(S)(σ) ↓ τ;

otherwise it is non-terminating and we write

M^io_A(S)(σ) ↑.

To model the behaviour of programs, we will solve the following problem:

Problem of Input-Output Semantics To give a precise mathematical definition of the input-output function M^io_A(S).

In this first attempt at modelling program behaviour, it is important to understand that we are seeking to define the input-output behaviour of a program and not every step of a computation by the program. It turns out that to define M^io_A(S) we do not need a full analysis of every step of a computation of S. The analysis is directed toward the idea that the program computes a partial function f on A. Hence, input-output semantics is an abstraction from the behaviour of programs. There are many methods for defining the semantics of a program S that are based on intuitions of how the program operates, step-by-step; these methods are called operational semantics.

The mathematical modelling of the operation of a program is a subtle activity. Even in the case of simple while programs, there is considerable scope for the analysis of the step

..., σ_i, σ_{i+1}, ...

from one state σ_i to the next σ_{i+1} in a computation. The level of detail involved, or considered, in a step determines the level of abstraction of the semantics and affects our analysis of time in computations and hence of the performance of the programs.

In this chapter we describe a simple method of defining the input-output semantics of while programs, guided by some operational ideas. We will specify the


• data

• states of a computation

• operations and tests on states

• control and sequencing of actions in commands

that allow S to generate computations. In a later chapter these methods will be reviewed and a more detailed operational semantics studied; this will generate a full trace of a program's operation.

16.3 Data

The interfaces and implementations of data types are modelled by signatures and algebras, respectively. We met these elements of the theory of data in Chapter 3, and showed how to integrate the signatures into the syntax of the while language in Chapter 10. Our first task is to integrate the algebras into the semantics of the while language. We will do this for the language WP(Σ) of while programs over a fixed signature Σ.

We will quickly recall and settle notations and examples of algebras which we will use to illustrate our semantical definitions and computations. We suppose a signature Σ has the form:

signature Σ

sorts      ..., s, ..., Bool

constants  ..., c : → s, ...
           true, false : → Bool

operations ..., f : s(1) × ··· × s(n) → s, ...
           ..., r : t(1) × ··· × t(m) → Bool, ...
           not : Bool → Bool
           and : Bool × Bool → Bool
           or : Bool × Bool → Bool

endsig

and the Σ-algebra A has the form:


algebra A

carriers   ..., A_s, ..., B

constants  ..., c^A : → A_s, ...
           true^A, false^A : → B

operations ..., f^A : A_{s(1)} × ··· × A_{s(n)} → A_s, ...
           ..., r^A : A_{t(1)} × ··· × A_{t(m)} → B, ...
           not^A : B → B
           and^A : B × B → B
           or^A : B × B → B

We assume that the signature Σ and algebra A are both standard with respect to the Booleans. That is, we assume that the Boolean sorts and operations have their standard interpretation in the algebra A.

16.3.1 Algebras of Naturals

Consider a two-sorted signature ΣPeano of natural numbers and Booleans defined by:

signature Peano

sorts      nat, Bool

constants  zero : → nat
           true, false : → Bool

operations succ : nat → nat
           add : nat × nat → nat
           mult : nat × nat → nat
           and : Bool × Bool → Bool
           not : Bool → Bool
           less than : nat × nat → Bool

Let

A = (N, B; 0, tt, ff; n + 1, n + m, n × m, ∧, ¬, <)

be an algebra with signature ΣPeano.

16.3.2 Algebra of Reals

Consider a two-sorted signature ΣOrdered ring of reals of real numbers and Booleans defined by:


signature Ordered ring of reals

sorts      real, Bool

constants  zero : → real
           true, false : → Bool

operations add : real × real → real
           minus : real × real → real
           mult : real × real → real
           and : Bool × Bool → Bool
           not : Bool → Bool
           less than : real × real → Bool

Let

A = (R, B; 0, tt, ff; +, −, ×, ∧, ¬, <)

be an algebra with signature ΣOrdered ring of reals. If we add division we have a new signature ΣOrdered field of reals defined by

signature Ordered field of reals

sorts      real, Bool

constants  zero, one : → real
           true, false : → Bool

operations add : real × real → real
           minus : real × real → real
           mult : real × real → real
           divide : real × real → real
           and : Bool × Bool → Bool
           not : Bool → Bool
           less than : real × real → Bool

Let

A = (R, B; 0, 1, tt, ff; +, −, ×, /, ∧, ¬, <)

be an algebra with signature ΣOrdered field of reals.

16.4 States

To define the semantics of WP(Σ), we must first define the state of a computation over an algebra A and then how to apply expressions, Boolean expressions and commands to these states. The data of A belong to the family

〈A_s | s ∈ S〉

of carrier sets of A. Each sort of data needs its own store.


We will consider while programs that operate over some S-sorted family

Var = 〈Var_s | s ∈ S〉

of variables, where Var_s is the set of all variables of sort s.

For each sort s ∈ S, an s-state over A is a map

σ_s : Var_s → A_s

which represents a possible configuration of a store of data of sort s from A. The idea is that

σ_s(x) = the value in A_s of the variable x ∈ Var_s.

Let State_s(A) be the set of all s-states over A.

A state over A is a family

σ = 〈σ_s | s ∈ S〉

of s-states over A and represents a possible configuration of a complete store of data from A. The set of all states over A represents all configurations of this abstract state over A, and is given by

State(A) = 〈State_s(A) | s ∈ S〉.

[Figure 16.2 depicts the store as a table: for each sort s, the row of variables x^s_0, x^s_1, ..., x^s_n, ... of sort s is paired with the row of values σ_s(x^s_0), σ_s(x^s_1), ..., σ_s(x^s_n), ... held in the state σ_s.]

Figure 16.2: The store modelled by state σ.

Notational Convention

In practice, we can often drop reference to sorts and allow them to be inferred from the context. We will sometimes write

x ∈ Var, σ ∈ State(A) and σ(x) ∈ A

instead of

x ∈ Var_s, σ ∈ State_s(A) and σ(x) ∈ A_s.


16.4.1 Example

Let Var_real = {r_1, r_2, ...} and Var_Bool = {b_1, b_2, ...} be sets of real- and Bool-sorted variables, respectively. Let σ = 〈σ_real, σ_Bool〉 be a state over A, where

σ_real : Var_real → R
σ_Bool : Var_Bool → B

are real-states and Bool-states, respectively. A state can be visualised as shown in Table 16.1 below; for example, σ_real(r_5) = √2:

Var_real      r_1   r_2   r_3   r_4   r_5   r_6   ···   r_i   ···
σ_real(r_i)   0     1.5   12    π     √2    1     ···   −4    ···

Var_Bool      b_1   b_2   b_3   b_4   b_5   b_6   ···   b_i   ···
σ_Bool(b_i)   tt    ff    tt    ff    ff    ff    ···   ff    ···

Table 16.1: An example of real-sorted and Bool-sorted states.

16.4.2 Substitutions in States

Let σ = 〈σ_s | s ∈ S〉 be a state over A. Let s ∈ S, x ∈ Var_s and a ∈ A_s. To change the value of a variable x in the state σ to the new value a we require a substitution operation

sub_s : Var_s × A_s × State_s(A) → State_s(A)

that transforms σ_s into a new state sub_s(x, a, σ_s); we invariably write the value of the operation

σ_s[a/x]

rather than sub_s(x, a, σ_s). This substitution is defined by:

sub_s(x, a, σ_s)(y) = σ_s[a/x](y) = σ_s(y)  if y ≠ x;
                                    a       otherwise.

So σ_s is unchanged except that the new value of x is a and the old value is lost; in particular, for y ≠ x,

σ_s[a/x](y) = σ_s(y)

Example Consider the state σ given in Table 16.1. Then the substitution

σ_real[3.142/r_4]

will give the new σ_real state shown in Table 16.2.
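The substitution operation can be sketched with a single-sorted state as a dictionary (the function name is ours, an illustrative encoding):

```python
# A sketch of the substitution sigma[a/x] on a state modelled as a dictionary.

def substitute(x, a, sigma):
    """Return sigma[a/x]: agrees with sigma except that x now maps to a."""
    new = dict(sigma)   # every variable y != x keeps its old value
    new[x] = a          # the old value of x is lost
    return new

sigma = {"r4": 3.14159, "r5": 2 ** 0.5}
sigma2 = substitute("r4", 3.142, sigma)
```

As in the tables, the substitution produces a new state; the original σ is not modified.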

16.5 Operations and tests on states

We define the value of general expressions and Boolean expressions on a state.


Var_real      r_1   r_2   r_3   r_4     r_5   r_6   ···   r_i   ···
σ_real(r_i)   0     1.5   12    3.142   √2    1     ···   −4    ···

Table 16.2: New state after substitution.

16.5.1 Expressions

We define the value of an expression e on a state σ over A by means of the function

V_A : Exp(Σ) → (State(A) → A)

where for e ∈ Exp(Σ) the purpose of the function

V_A(e) : State(A) → A

is, for σ ∈ State(A),

V_A(e)(σ) = the value in A of the expression e on the state σ.

In detail, V_A is a family

〈V^s_A | s ∈ S〉

of functions

V^s_A : Exp_s(Σ) → (State_s(A) → A_s)

that define the value of expressions of sort s on states of sort s. The family V_A is defined by induction on the structure of terms over Σ, simultaneously for each sort s ∈ S, by:

V^s_A(c)(σ) = c^A                   for c : → s in Σ;

V^s_A(x)(σ) = σ(x) = σ_s(x)         for x ∈ Var_s;

V^s_A(f(e_1, ..., e_n))(σ) = f^A(V^{s(1)}_A(e_1)(σ), ..., V^{s(n)}_A(e_n)(σ))
                                    for f : s(1) × ··· × s(n) → s in Σ and e_i ∈ Exp_{s(i)}(Σ).

16.5.2 Example Expression Evaluation

Recall the two-sorted signature ΣOrdered field of reals and the algebra A of real numbers in Section 16.3.2. Let t_1 = add(x, y) and t_2 = mult(add(x, one), minus(x, one)) be terms over ΣOrdered field of reals of sort real; so x, y ∈ Var_real. If the variables have the values

σ_real(x) = 1
σ_real(y) = 3.142


in a state σ = 〈σ_real, σ_Bool〉, then we get the following evaluations of the terms t_1 and t_2 on the state σ:

V^real_A(t_1)(σ) = V^real_A(add(x, y))(σ)
                 = +(V^real_A(x)(σ), V^real_A(y)(σ))
                 = +(σ_real(x), σ_real(y))
                 = +(1, 3.142)
                 = 4.142

V^real_A(t_2)(σ) = V^real_A(mult(add(x, one), minus(x, one)))(σ)
                 = ×(V^real_A(add(x, one))(σ), V^real_A(minus(x, one))(σ))
                 = ×(+(V^real_A(x)(σ), V^real_A(one)(σ)), −(V^real_A(x)(σ), V^real_A(one)(σ)))
                 = ×(+(σ_real(x), 1), −(σ_real(x), 1))
                 = ×(+(1, 1), −(1, 1))
                 = ×(2, 0)
                 = 0
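The evaluations of t_1 and t_2 can be reproduced by a structural-recursion sketch of the evaluation function (a single-sorted illustrative Python encoding of terms and a fragment of the algebra of reals; the names are ours):

```python
# A sketch of V_A by structural induction on terms. Terms are tuples:
# ("const", c), ("var", x), ("app", f, e1, ..., en).

def evaluate(term, algebra, sigma):
    tag = term[0]
    if tag == "const":
        return algebra["constants"][term[1]]      # V(c)(sigma) = c^A
    if tag == "var":
        return sigma[term[1]]                     # V(x)(sigma) = sigma(x)
    if tag == "app":                              # V(f(e1,...,en))(sigma)
        f = algebra["operations"][term[1]]
        return f(*(evaluate(e, algebra, sigma) for e in term[2:]))
    raise ValueError(f"unknown term: {term!r}")

reals = {"constants": {"zero": 0.0, "one": 1.0},
         "operations": {"add": lambda a, b: a + b,
                        "minus": lambda a, b: a - b,
                        "mult": lambda a, b: a * b}}
sigma = {"x": 1.0, "y": 3.142}
t1 = ("app", "add", ("var", "x"), ("var", "y"))
t2 = ("app", "mult",
      ("app", "add", ("var", "x"), ("const", "one")),
      ("app", "minus", ("var", "x"), ("const", "one")))
```

Evaluating t1 on σ gives 4.142 and t2 gives 0, as in the calculation above.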

16.5.3 Tests

The semantics of Boolean expressions is the special case s = Bool of the semantics of expressions: we use V^Bool_A. It is helpful to give tests special treatment and notations: We define the value of a Boolean expression b on a state σ over A by means of the function

W_A : BExp(Σ) → (State(A) → B)

where for b ∈ BExp(Σ) the purpose of the function

W_A(b) : State(A) → B

is, for σ ∈ State(A),

W_A(b)(σ) = the value in B of the Boolean expression b on the state σ.

We write out the inductive definition of the above function on the syntactic structure of b:

W_A(true)(σ) = tt

W_A(false)(σ) = ff

W_A(r(e_1, ..., e_n))(σ) = r^A(V_A(e_1)(σ), ..., V_A(e_n)(σ))

W_A(not(b))(σ) = tt  if W_A(b)(σ) = ff;
                 ff  if W_A(b)(σ) = tt.

W_A(and(b_1, b_2))(σ) = tt  if W_A(b_1)(σ) = tt and W_A(b_2)(σ) = tt;
                        ff  otherwise.
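The test semantics can be sketched in the same illustrative style (the evaluator V for the arguments of relations is assumed; here it simply looks variables up in the state):

```python
# A sketch of the test semantics W_A on Boolean expressions.

def W(bexp, relations, V, sigma):
    tag = bexp[0]
    if tag == "true":
        return True                                 # W(true)(sigma) = tt
    if tag == "false":
        return False                                # W(false)(sigma) = ff
    if tag == "not":
        return not W(bexp[1], relations, V, sigma)
    if tag == "and":
        return W(bexp[1], relations, V, sigma) and W(bexp[2], relations, V, sigma)
    if tag == "rel":                                # W(r(e1,...,en))(sigma)
        r = relations[bexp[1]]
        return r(*(V(e, sigma) for e in bexp[2:]))
    raise ValueError(f"unknown Boolean expression: {bexp!r}")

relations = {"less_than": lambda a, b: a < b}
V = lambda e, sigma: sigma[e]     # expressions here are just variable names
sigma = {"x": 1.0, "zero": 0.0}
b = ("and", ("rel", "less_than", "x", "zero"), ("true",))
```

On this state, the test and(less_than(x, zero), true) evaluates to false, since 1 < 0 fails.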


The function W_A is one of the component functions of V_A since BExp(Σ) = Exp_Bool(Σ), and thus:

V^Bool_A : Exp_Bool(Σ) → (State_Bool(A) → B).

Notice that since the semantic functions V^s_A are defined simultaneously for all s ∈ S, W_A (which is, of course, V^Bool_A) cannot be defined independently of the term evaluation for the other sorts.

Example We can now extend Example 16.5.2 to deal with Boolean expressions. Let b be a Boolean variable whose value on the state σ is

σ_Bool(b) = tt.

The evaluation of the term

and(less than(x, zero), b)

on the state σ proceeds as follows:

W_A(and(less than(x, zero), b))(σ) = ∧(W_A(less than(x, zero))(σ), W_A(b)(σ))
                                   = ∧(<(V_A(x)(σ), V_A(zero)(σ)), σ_Bool(b))
                                   = ∧(<(σ_real(x), 0), tt)
                                   = ∧(<(1, 0), tt)
                                   = ∧(ff, tt)
                                   = ff

16.6 Statements and Commands: First Definition

The input-output semantics for commands is given by the following functions:

M^io_A : Cmd(Σ) → (State(A) → State(A))

where, for S ∈ Cmd(Σ), the purpose of the function

M^io_A(S) : State(A) → State(A)

is, for σ ∈ State(A),

M^io_A(S)(σ) = the final state, if such a state exists, on executing a program S on an initial state σ.

16.6.1 First Definition of Input-Output Semantics

The first definition of the input-output semantics M^io_A of while programs is a formalisation of simple ideas about the constructs. The definition is constructed by induction on the syntactic structure of a program S:

Base Case There are two base cases:


Identity

The skip statement does nothing: for σ ∈ State(A),

M^io_A(skip)(σ) = σ.

Assignment

The assignment statement evaluates an expression and updates a variable: for σ ∈ State(A),

M^io_A(x:=e)(σ) = σ[V_A(e)(σ)/x].

Induction Step We suppose that the partial functions M^io_A(S_0), M^io_A(S_1) and M^io_A(S_2) are specified on all states. There are three cases:

Composition

The composition operation executes S_1 and then S_2:

M^io_A(S_1;S_2)(σ) ≃ M^io_A(S_2)(M^io_A(S_1)(σ))

More exactly, the equation means: Execute S_1 on σ and if there is a final state σ′ = M^io_A(S_1)(σ) then execute S_2 on this state σ′. If there is a final state σ′′ = M^io_A(S_2)(σ′) then the final state of S_1;S_2 on σ is σ′′ = M^io_A(S_1;S_2)(σ). Otherwise, there is no final state for S_1;S_2 on σ and M^io_A(S_1;S_2)(σ) ↑.

Conditional

The conditional operation chooses to execute S_1 or S_2 according to the test b:

M^io_A(if b then S_1 else S_2 fi)(σ) = M^io_A(S_1)(σ)  if W_A(b)(σ) = tt;
                                       M^io_A(S_2)(σ)  if W_A(b)(σ) = ff.

More exactly, this equation means: If b is true on σ then execute S_1 on σ and if there is a final state σ′ = M^io_A(S_1)(σ) then the final state of if b then S_1 else S_2 fi is σ′. If b is false on σ then execute S_2 on σ and if there is a final state σ′′ = M^io_A(S_2)(σ) then the final state of if b then S_1 else S_2 fi is σ′′. Otherwise there is no final state for the conditional.

Iteration

The iteration operator repeats S_0 until b is false. We define the semantics of the while command in two cases, depending upon whether or not a computation exits the while loop.


Termination

Suppose the computation exits the while construct and halts. Then the situation is characterised by:

M^io_A(while b do S_0 od)(σ) ↓ and M^io_A(while b do S_0 od)(σ) = τ

if, and only if, there exists n ≥ 0 and a sequence of states

σ_0, σ_1, ..., σ_n

such that

Initial state  σ_0 = σ
Final state    σ_n = τ
Iteration      M^io_A(S_0)(σ_{i−1}) ↓ and M^io_A(S_0)(σ_{i−1}) = σ_i for 1 ≤ i ≤ n
Continuity     W_A(b)(σ_i) = tt for 0 ≤ i ≤ n − 1
Exit           W_A(b)(σ_n) = ff

This sequence of states traces out precisely the stages of the computation when the Boolean test b is evaluated and the decision is taken either to exit the while loop or to execute the body S_0. The loop is traversed n times.

Non-termination

Otherwise, suppose that the computation does not exit the while construct. This means there is no such finite sequence and M^io_A(while b do S_0 od)(σ) is undefined, which we denote by

M^io_A(while b do S_0 od)(σ) ↑.
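The whole first definition can be collected into an interpreter sketch over dictionary states (an illustrative encoding, not the book's formal definition; expressions e and tests b are Python functions of the state, standing in for V_A and W_A, and a divergent while loop simply makes the Python loop run forever):

```python
# A sketch of M^io_A as an interpreter. Commands are tuples:
# ("skip",), ("assign", x, e), ("seq", S1, S2), ("if", b, S1, S2),
# ("while", b, S0).

def M_io(S, sigma):
    tag = S[0]
    if tag == "skip":                 # M(skip)(sigma) = sigma
        return sigma
    if tag == "assign":               # M(x:=e)(sigma) = sigma[V(e)(sigma)/x]
        _, x, e = S
        new = dict(sigma); new[x] = e(sigma)
        return new
    if tag == "seq":                  # M(S1;S2)(sigma) = M(S2)(M(S1)(sigma))
        return M_io(S[2], M_io(S[1], sigma))
    if tag == "if":                   # branch according to the test b
        _, b, S1, S2 = S
        return M_io(S1, sigma) if b(sigma) else M_io(S2, sigma)
    if tag == "while":                # iterate S0 while b holds
        _, b, S0 = S
        while b(sigma):
            sigma = M_io(S0, sigma)
        return sigma
    raise ValueError(f"unknown command: {S!r}")

# The body of the program Euclid as a command:
euclid = ("seq", ("assign", "r", lambda s: s["x"] % s["y"]),
          ("while", lambda s: s["r"] != 0,
           ("seq", ("assign", "x", lambda s: s["y"]),
            ("seq", ("assign", "y", lambda s: s["r"]),
             ("assign", "r", lambda s: s["x"] % s["y"])))))

final = M_io(euclid, {"x": 45, "y": 12, "r": 0})
```

On the initial state with x = 45 and y = 12, the final state has y = 3 = gcd(45, 12), matching the trace in Section 16.1.3.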

16.6.2 Examples

Let us consider the input-output semantics of some trivial programs over the real numbers. Let ΣOrdered ring of reals be the signature of the ordered ring of reals, and let A be the standard ΣOrdered ring of reals-algebra.

1. Consider the program

y := x

in Cmd(ΣOrdered ring of reals). We shall evaluate this program on the state σ ∈ State(A), where

σ_real(x) = 3.142
σ_real(y) = π.


Thus,

M^io_A(y := x)(σ) = σ[V_A(x)(σ)/y]
                  = σ[σ_real(x)/y]
                  = σ[3.142/y].

So the program y := x on the state σ has the effect of replacing the value of y (initially π in state σ) with the value of x on state σ, which is 3.142, whilst leaving the values of all the other variables unchanged.

2. Consider the program

   S = while x > 0 do x := x + 1 od

   in Cmd(Σ_{Ordered ring of reals}), which we shall evaluate over the state σ ∈ State(A), where

   σ_real(x) = π.

   Then

   M^io_A(S)(σ) = M^io_A(while x > 0 do x := x + 1 od)(σ) = ⊥

   because the evaluation of the Boolean expression will always be true, leading to a non-convergent computation sequence

   σ_0, σ_1, σ_2, ..., σ_n, ...

   in which σ_n(x) = π + n.

   However, if we take the initial state σ over which we evaluate S to have

   σ_real(x) = −1

   then

   M^io_A(S)(σ) ↓ σ, as W_A(x > 0)(σ) = ff,

   giving a computation sequence of the single state σ_0 = σ.

16.6.3 Non-Termination

Because while statements may not terminate, for S ∈ Cmd(Σ) the map M^io_A(S) may be a partial function. When M^io_A(S)(σ) converges to a final state τ we shall write

M^io_A(S)(σ) ↓ τ

and we write

M^io_A(S)(σ) ↑

if S does not terminate on σ.


It is sometimes convenient to make M^io_A(S) a total function, which is achieved by the introduction of a special element ⊥ (read as bottom), forming

State_⊥(A) = State(A) ∪ {⊥}

and defining

M^io_A(while b do S od)(σ) = τ   if M^io_A(while b do S od)(σ) ↓ τ;
M^io_A(while b do S od)(σ) = ⊥   if M^io_A(while b do S od)(σ) ↑ .

In defining M^io_A(S) for S ∈ Cmd(Σ) we have formalised conditions for final states to exist. These conditions are straightforward. It is important to note that listing and formalising the conditions under which final states do not exist is quite subtle.

Take the case of iteration. For σ ∈ State(A), suppose that

M^io_A(while b do S0 od)(σ) ↑ .

This corresponds to the computation failing to exit the while loop. This situation arises in two ways. Consider the repeated execution of S0 starting in state σ. Either:

(i) there is an infinite sequence

σ_0, σ_1, ..., σ_n, ...

of states such that for each n ≥ 0,

W_A(b)(σ_n) = tt

and, for each n ≥ 1,

M^io_A(S0)(σ_{n-1}) ↓ and σ_n = M^io_A(S0)(σ_{n-1}).

Here S0 always terminates, and on each occasion the test b is true.

(ii) Or there is a finite sequence

σ_0, σ_1, ..., σ_n

of states such that for 0 ≤ i ≤ n,

W_A(b)(σ_i) = tt,

and for 0 ≤ i ≤ n − 1,

M^io_A(S0)(σ_i) ↓ and σ_{i+1} = M^io_A(S0)(σ_i),

but

M^io_A(S0)(σ_n) ↑ .

Here S0 terminates on σ_0, σ_1, ..., σ_{n-1} but fails to terminate on σ_n (i.e., there is no σ_{n+1}). The analysis of non-termination must now focus on S0.


16.7 Statements and Commands: Second Definition using Recursion

Although the first definition of the input-output semantics M^io_A is based on simple intuitions, the definition of the while construct is not as smooth as those of the other constructs. In the case of the while construct we abandon the explicit equational definitions used for the sequencing and conditional constructs, and postulate the existence of a finite or infinite sequence of states. In logical terms this seems clumsy and more complicated.

An alternative approach is to develop an equational definition for the case of the while statement. Our intuitions about the processing of a while loop allow us to unfold a while loop using an if-then-else statement: we expect that the statement

S ≡ while b do S0 od

has the same effect on a state as the statement

S′ ≡ if b then S0; while b do S0 od else skip fi

which unfolds the first stage in the while loop.

Now both statements S and S′ are valid while programs and hence have a formal input-output semantics according to the first definition. Under the first definition, the input-output semantics of S and S′ are the same:

16.7.1 Lemma (Semantics of unfolded while loops)

For any σ ∈ State(A),

M^io_A(while b do S0 od)(σ) ≃ M^io_A(if b then S0; while b do S0 od else skip fi)(σ).

These semantic observations lead to a new semantic definition for the language, in which the while statement case of the first definition is changed into a recursion.

Let this new semantics be denoted

M^rec_A : Cmd(Σ) → (State(A) → State(A))

and defined for S ∈ Cmd(Σ) and σ ∈ State(A) by induction on the structure of S as follows. For the basis cases and the induction steps for sequencing and conditionals we use the same clauses as in the first definition. For the while case we define:

M^rec_A(while b do S0 od)(σ) = M^rec_A(while b do S0 od)(M^rec_A(S0)(σ))   if W_A(b)(σ) = tt;
M^rec_A(while b do S0 od)(σ) = σ                                           if W_A(b)(σ) = ff.

This recursive definition provides, for each S, an equation that M^rec_A(S) must satisfy. Thus, the complete mathematical definition for while programs is as follows:


M^rec_A : Cmd(Σ) → (State(A) → State(A))

M^rec_A(S)(σ) = the final state, if such a state exists, on executing a program S on an initial state σ.

M^rec_A(skip)(σ) = σ

M^rec_A(x := e)(σ) = σ[V_A(e)(σ)/x]

M^rec_A(S1; S2)(σ) ≃ M^rec_A(S2)(M^rec_A(S1)(σ))

M^rec_A(if b then S1 else S2 fi)(σ) = M^rec_A(S1)(σ)   if W_A(b)(σ) = tt;
                                    = M^rec_A(S2)(σ)   if W_A(b)(σ) = ff.

M^rec_A(while b do S0 od)(σ) = M^rec_A(while b do S0 od)(M^rec_A(S0)(σ))   if W_A(b)(σ) = tt;
                             = σ                                           if W_A(b)(σ) = ff.
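The recursive clauses translate almost directly into an interpreter. The following sketch is our own illustration, not the book's code: commands are nested tuples, a state is a dictionary over the algebra of Python integers, and the while clause calls itself exactly as in the definition of M^rec_A.

```python
# Illustrative sketch of M^rec_A over the algebra of Python integers:
# commands are tuples, states are dicts, expressions and tests are
# functions from states to values. The representation is our own choice.

def M_rec(S, sigma):
    tag = S[0]
    if tag == "skip":                          # M^rec_A(skip)(sigma) = sigma
        return sigma
    if tag == "assign":                        # x := e
        _, x, e = S
        return {**sigma, x: e(sigma)}          # sigma[V_A(e)(sigma)/x]
    if tag == "seq":                           # S1 ; S2
        _, S1, S2 = S
        return M_rec(S2, M_rec(S1, sigma))
    if tag == "if":                            # if b then S1 else S2 fi
        _, b, S1, S2 = S
        return M_rec(S1, sigma) if b(sigma) else M_rec(S2, sigma)
    if tag == "while":                         # while b do S0 od
        _, b, S0 = S
        if b(sigma):
            return M_rec(S, M_rec(S0, sigma))  # the recursive while clause
        return sigma
    raise ValueError(f"unknown command {tag!r}")

# y := 1; while x > 0 do y := y * x; x := x - 1 od  (factorial into y)
fact = ("seq", ("assign", "y", lambda s: 1),
               ("while", lambda s: s["x"] > 0,
                ("seq", ("assign", "y", lambda s: s["y"] * s["x"]),
                        ("assign", "x", lambda s: s["x"] - 1))))
print(M_rec(fact, {"x": 5, "y": 0})["y"])  # → 120
```

Non-termination shows up here as non-terminating recursion, matching the partiality of M^rec_A.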

By the Lemma of Section 16.7.1 we know that the state transformer M^io_A(S) satisfies the equations generated from S. However:

(i) How many state transformers, in addition to M^io_A(S), satisfy the equations?

(ii) Are there extra properties that allow us to characterise the function M^io_A(S) as a unique solution of the equations?

16.8 Adding Data Types to Programming Languages

We have developed a formal definition of the syntax and semantics for the simple programming language

WP(Σ)

of while programs that compute over an abstract data type with signature Σ. To compute over and implement the data type, we chose a Σ-algebra

A

and defined the input-output semantics

M^io_A(S) : State(A) → State(A)

of every program S ∈ Cmd(Σ) over A.

Suppose we want to enhance the power of WP(Σ) by adding some constructs, such as

(i) dynamic arrays, or

(ii) infinite streams.

This is trivial given our methods. As we have emphasised repeatedly, we have solved the problem for while programming over any algebra A. In Chapter 6, we showed how to model arrays and streams over Σ by new signatures and algebras.


16.8.1 Adding Dynamic Arrays

For any Σ-algebra A we can construct the algebra

A^Array

with signature Σ^Array of dynamic arrays over A. So, given WP(Σ), we can add dynamic arrays to our while programming language over Σ simply by forming the language

WP(Σ^Array).

We can obtain its semantics by applying our input-output model to the Σ^Array-algebra A^Array.

16.8.2 Adding Infinite Streams

For any Σ-algebra A we can construct the algebra

A^Stream

with signature Σ^Stream of infinite streams over A. So, given WP(Σ), we can add infinite streams to our while programming language over Σ simply by forming the language

WP(Σ^Stream).

We can then obtain its semantics by applying our input-output semantics model to the Σ^Stream-algebra A^Stream.


Exercises for Chapter 16

1. Show for any state σ ∈ State(A), any elements a_1, a_2 of A of appropriate sort, and any variables x, y ∈ Var:

(a) σ[σ(x)/x] = σ;

(b) (σ[a_1/x])[a_2/x] = σ[a_2/x]; and

(c) (σ[a_1/x])[a_2/y] = (σ[a_2/y])[a_1/x] if x ≠ y.

2. Add the following constructs to WP(Σ) and define their semantics:

(a) concurrent assignments;

(b) case statements;

(c) repeat-until statements; and

(d) for statements.

3. Define the set Var(S) of variables occurring in a program S. If v ∉ Var(S), then is it the case that for all σ ∈ State(A),

M^io_A(S)(σ)(v) = σ(v)?

4. Prove the Lemma of Section 16.7.1.

5. Using the semantics of while programs, work out the final state for the following program:

y := 1;
c := 1;
while c <= x do
  y := y + c;
  c := c + 1
od

with start state σ(x) = 10.

6. What does the above program return when the start state is σ(x) = v?

7. Using the semantics, evaluate the behaviour of the following program:

y := 1;
x := 1;
c := 1;
while c < z do
  d := x;
  x := x + y;
  y := d;
  c := c + 1
od

with start states σ(z) = 2, σ(z) = 3 and σ(z) = 4.

8. What numbers does the above program produce in terms of z?


Chapter 17

Proving Properties of Programs

Structural induction is a simple and essential technique for defining syntax, defining functions on syntax, and proving facts about syntax and its semantics. Structural induction is based on how syntax is built up: the definitions postulate some basic syntax and generate new syntax by repeatedly applying operations. Functions are defined, or properties proved, first for the basic syntax and next by specifying the effect of the syntax-forming operations. This type of definition or proof is traditionally called

definition or proof by induction or recursion on the structure of the syntax.

In practice, the terms structural induction or recursion are applied rather loosely. Structural induction has a huge range of applications.

In this chapter we will reflect on structural induction and prepare the reader for its extensive use. There are many forms of structural induction, since there are many types of syntax. We will introduce only a few simple principles. Although structural induction is based on syntax, it is not confined to syntax. Indeed, it originates in induction and recursion on the set N of natural numbers, which we discussed in Chapter 7 when analysing the data type of natural numbers.

The syntax of interest to us is essentially that of programs. Structural induction is used to define functions on all programs, or prove properties true of all programs. Already, we have seen several examples of definitions of syntax by structural induction, including those of Σ-terms over a signature Σ (Chapter 8); and expressions, Boolean expressions, and commands for WP(Σ) in Chapter 8. There are plenty of functions to construct and properties to prove.

In the case of functions, the way we defined the semantics of while programs in the last chapter involved structural inductions on expressions, Boolean expressions and commands. So we have already introduced and used structural induction to define the most important functions in our semantics!

We will meet many examples of such definitions. For example, also definable by structural induction are program transformations

t : L → L

for a programming language L, and compilers

c : L → L′


which map the programs of one language L to equivalent programs of another language L′. However, we have yet to meet proofs of program properties based on structural induction. In this chapter, we concentrate on proving properties common to all while programs.

To prove facts about while programs, we need to be able to prove facts about expressions, Boolean expressions and commands. Thus, for WP(Σ) we need three structural induction principles, corresponding with expressions, Boolean expressions and statements.

After some simple results about side effects, we prove two important theorems.

The first theorem establishes the fact that:

Each computation by any while program on any Σ-algebra A takes place inside the Σ-subalgebra of A generated by the program's input data.

This is called the Local Computation Theorem. It gives us new insight into the way programs are absolutely dependent upon their underlying data types. Among several consequences of the theorem is the fact that:

The square root function √x on the set of real numbers cannot be computed by any while program using the operations x + y, −x, x·y, x⁻¹ and the tests = and <.

The second theorem establishes the fact that:

If A and B are isomorphic Σ-algebras, then the input-output semantics of any while program on A and B are isomorphic.

This fact is called the Isomorphism Invariance Theorem. It gives us further insight into the concepts of interface and implementation, since it confirms that the semantics of while programs is equivalent over equivalent implementations of a data type. Among many consequences is the fact that

while computation over decimal numbers is isomorphic to while computation over binary numbers.

17.1 Principles of Structural Induction for Programming Language Syntax

Suppose some syntax is defined by giving some basic or atomic syntax, and then generating new syntax by repeatedly applying syntax-forming operations. Then this is the general idea of structural induction for proving properties of the syntax:

Suppose that some property P is true of the basic syntax, and that if P is true of some syntax, then P also remains true after applying the syntax-forming operations. Then the property P is true of all the syntax.

We have defined the expressions, Boolean expressions and programs from some basic given syntax by applying syntax-forming operations. Recall the argument about generating the natural numbers from 0 and succ, and the induction principle on numbers, and compare it with the ideas of generating syntax and principles of induction for expressions, Boolean expressions and programs.


17.1.1 Principle of Induction for Expressions

Principle of Induction for Expressions

Let Σ be a signature and Exp(Σ) the set of all expressions over Σ. Let P be a property of expressions, i.e.,

P ⊆ Exp(Σ) is the set of expressions having that property.

We write P(e) for e ∈ P. If the following two cases hold:

Base Case: P(c) is true for each constant c ∈ Σ;
P(x) is true for each variable x ∈ Var.

Induction Step: Let e_1, ..., e_n ∈ Exp(Σ) be any expressions of sorts s(1), ..., s(n), respectively. Let f : s(1) × ··· × s(n) → s in Σ be any function symbol. If P(e_1), ..., P(e_n) are all true, then

P(f(e_1, ..., e_n))

is also true.

Then P(e) is true for all expressions e ∈ Exp(Σ).

We now formulate an analogous structural induction principle for Boolean expressions.

17.1.2 Principle of Structural Induction for Boolean Expressions

The Boolean expressions over a signature Σ are contained in Exp_Bool(Σ); we give them special treatment:


Principle of Structural Induction for Boolean Expressions

Let Σ be a signature and BExp(Σ) the set of all Boolean expressions over Σ. Let P be a property of Boolean expressions, i.e.,

P ⊆ BExp(Σ) is the set of Boolean expressions having that property.

We write P(b) for b ∈ P. If the following two cases hold:

Base Case: P(true) is true;
P(false) is true.

Induction Step: Let b, b_1, b_2 ∈ BExp(Σ) be any Boolean expressions. If P(b), P(b_1), P(b_2) are all true, then

P(not(b)) and P(and(b_1, b_2))

are also true.
Let e_1, ..., e_n ∈ Exp(Σ) be any expressions of sorts s(1), ..., s(n), respectively. Let r : s(1) × ··· × s(n) → bool in Σ be any relation symbol. If P(e_1), ..., P(e_n) are all true, then

P(r(e_1, ..., e_n))

is also true.

Then P(b) is true for all Boolean expressions b ∈ BExp(Σ).

17.1.3 Principle of Structural Induction for Statements

Finally, there are the programs. The while programs are made from skip and assignments by the application of the program-forming operations of composition, conditional and iteration.


Principle of Induction for Statements

Let Σ be a signature and Cmd(Σ) the set of all commands over Σ. Let P be a property of statements, i.e.,

P ⊆ Cmd(Σ) is the set of statements having that property.

We write P(S) for S ∈ P. If the following two cases hold:

Base Case: P(skip) is true;
P(x := e) is true for each variable x ∈ Var and expression e ∈ Exp(Σ).

Induction Step: If P(S1) and P(S2) are true, then

P(S1; S2)

is true;
if P(S1) and P(S2) are true, then

P(if b then S1 else S2 fi)

is true for all b ∈ BExp(Σ);
if P(S0) is true, then

P(while b do S0 od)

is true for all b ∈ BExp(Σ).

Then P(S) is true for all commands S ∈ Cmd(Σ).

17.1.4 Proving the Principles of Structural Induction

Each principle of structural induction can be proved from the Principle of Induction for the natural numbers. That is, each principle is a theorem, whose proof has the following form. The technique is to define height functions

h : Exp(Σ) → N on expressions;

h : BExp(Σ) → N on Boolean expressions; and

h : Cmd(Σ) → N on commands,

and to transfer the induction to one in terms of the Principle of Induction for the natural numbers.
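As an illustration of how such a height function can be defined, here is a sketch in Python over command syntax trees. The representation of commands as tuples and the convention that atomic statements have height 0 are our own choices, not the book's definition.

```python
# A sketch of a height function h : Cmd(Sigma) -> N on command syntax
# trees, represented as nested tuples. Giving atomic statements height 0
# is our own convention; any choice that increases at each syntax-forming
# operation supports transferring structural induction to induction on N.

def height(S):
    tag = S[0]
    if tag in ("skip", "assign"):
        return 0                                    # atomic statements
    if tag == "seq":
        return 1 + max(height(S[1]), height(S[2]))  # S1 ; S2
    if tag == "if":
        return 1 + max(height(S[2]), height(S[3]))  # branches S1, S2
    if tag == "while":
        return 1 + height(S[2])                     # body S0
    raise ValueError(f"unknown command {tag!r}")

S = ("while", "b", ("seq", ("assign", "y", "e1"), ("skip",)))
print(height(S))  # → 2
```

To prove P(S) for all commands S, one then proves by induction on n ∈ N that P holds for all commands of height at most n.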


17.2 Reasoning about Side Effects Using Structural Induction

Now we will use the structural induction principles for expressions, Boolean expressions and commands to prove some basic properties of their semantics. We will show that:

If a variable v does not appear in a while program S, then its value on any state σ is not changed by executing the program S on σ.

This and related results are not difficult, but they are important. First, we must introduce concepts to express these side-effect properties precisely.

Definition. Two states σ, σ′ are equivalent over an algebra A with respect to a set of variables V if

for all v ∈ V, σ(v) = σ′(v);

we write this

σ ≃_A σ′ mod V

and often, when A is obvious from the context, we will just write

σ ≃ σ′ mod V.

17.2.1 Theorem (No external side effects)

Let S ∈ Cmd(Σ). Let A be a Σ-algebra. Let y ∉ Var(S). Then for all states σ ∈ State(A),

M^io_A(S)(σ) ↓ implies M^io_A(S)(σ) ≃ σ mod y.

Proof. We use structural induction for Cmd(Σ ).

Base case

Case S ::= skip.
We first note that S contains no variables. For any state σ, as M^io_A(skip)(σ) = σ, it is immediate that

M^io_A(skip)(σ) ↓ and M^io_A(skip)(σ) ≃ σ mod x

for all variables x ∈ Var.

Case S ::= x := e.
Suppose that y ∉ Var(S). Since

M^io_A(x := e)(σ) = σ[V_A(e)(σ)/x]

entails that M^io_A(x := e)(σ) ↓ and that only the value of x has been changed (Section 16.4.2), we can say that

M^io_A(x := e)(σ) ≃ σ mod y

for any state σ and for all variables y ∈ Var with y ≠ x, and in particular for any y ∉ Var(S).


Induction Step

Case S ::= S1; S2.
Suppose that y ∉ Var(S); then y ∉ Var(S1) and y ∉ Var(S2). If M^io_A(S)(σ) ↓ then M^io_A(S1)(σ) ↓ τ and M^io_A(S2)(τ) ↓. The Induction Hypothesis for S1 and S2 is:

(i) M^io_A(S1)(σ) ↓ ⇒ M^io_A(S1)(σ) ≃ σ mod y; and

(ii) M^io_A(S2)(σ) ↓ ⇒ M^io_A(S2)(σ) ≃ σ mod y

for all states σ.

Take the state M^io_A(S1)(σ); substituting into the equation for sequential composition, we have:

M^io_A(S2)(M^io_A(S1)(σ)) ≃ M^io_A(S1)(σ) mod y   by hypothesis (ii)
                          ≃ σ mod y               by hypothesis (i).

Thus, M^io_A(S)(σ) = M^io_A(S2)(M^io_A(S1)(σ)) implies that

M^io_A(S)(σ) ≃ σ mod y

for any state σ.

Case S ::= if b then S1 else S2 fi.
Suppose that y ∉ Var(S); then y ∉ Var(S1) and y ∉ Var(S2). If M^io_A(S)(σ) ↓ then whichever branch is executed converges on σ. The Induction Hypothesis is:

M^io_A(S1)(σ) ↓ ⇒ M^io_A(S1)(σ) ≃ σ mod y

and

M^io_A(S2)(σ) ↓ ⇒ M^io_A(S2)(σ) ≃ σ mod y

for any state σ. Since evaluating a Boolean expression does not change the state, and

M^io_A(S)(σ) = M^io_A(S1)(σ)   if W_A(b)(σ) = tt;
             = M^io_A(S2)(σ)   if W_A(b)(σ) = ff,

we have M^io_A(S)(σ) ≃ σ mod y for any state σ on which M^io_A(S)(σ) ↓.

Case S ::= while b do S0 od.
Suppose that y ∉ Var(S); then y ∉ Var(S0) and y ∉ Var(b). The Induction Hypothesis is:

M^io_A(S0)(σ) ↓ ⇒ M^io_A(S0)(σ) ≃ σ mod y

for any state σ.

We define a sequence of states: σ_0 = σ and σ_{i+1} = M^io_A(S0)(σ_i) for i = 0, 1, 2, .... By a simple argument using the principle of induction on N, it is clear that for each i for which σ_i exists,

σ_i ≃ σ mod y.

Now consider M^io_A(S)(σ) for any state σ. If M^io_A(S)(σ) ↓ then such a sequence σ_0, σ_1, ..., σ_n exists, with W_A(b)(σ_i) = tt for 0 ≤ i < n and W_A(b)(σ_n) = ff. Since M^io_A(S)(σ) = σ_n, we have the required result from σ_n ≃ σ mod y. □


17.2.2 Lemma (No internal side effects)

Let V ⊆ Var be a set of variables. Let Σ be a signature, with Cmd(Σ) the set of commands over Σ. For any command S ∈ Cmd(Σ) such that Var(S) ⊆ V, we have

(σ ≃ σ′ mod V) ⇒ (M^io_A(S)(σ) ≃ M^io_A(S)(σ′) mod V)

for any states σ, σ′ ∈ State(A).

Proof. Left to the reader. Note the case when M^io_A(S)(σ) ↑.
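Although no substitute for the proofs, both side-effect lemmas are easy to test empirically on concrete programs. Here is a sketch of our own, with states as Python dictionaries and one fixed program S with Var(S) = {x, y}:

```python
# Empirical check of the side-effect lemmas on one concrete program
# S = (y := x + 1; while y > 0 do y := y - 1 od), with Var(S) = {x, y}.
# The representation (states as dicts, S as a function) is our own.

def run_S(sigma):
    sigma = {**sigma, "y": sigma["x"] + 1}       # y := x + 1
    while sigma["y"] > 0:                        # while y > 0 do
        sigma = {**sigma, "y": sigma["y"] - 1}   #   y := y - 1 od
    return sigma

# No external side effects: a variable v not in Var(S) keeps its value.
s1 = run_S({"x": 3, "y": 9, "v": 42})
print(s1["v"])  # → 42

# No internal side effects: states agreeing on Var(S) = {x, y} still
# agree on {x, y} after running S.
s2 = run_S({"x": 3, "y": 9, "v": -1})
print((s1["x"], s1["y"]) == (s2["x"], s2["y"]))  # → True
```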

17.3 Local Computation Theorem and Functions that Cannot be Programmed

Consider how a while program performs a computation. Notice that:

• states store data from a Σ-algebra;

• assignment statements compute and store new data from data in states, using the operations named in Σ; and

• sequencing, conditional and iteration statements schedule computation by assignments, possibly using tests, which do not change the values in states.

In particular:

Only the assignment statements compute new data from old, and they do so via expressions over Σ.

Thus, given any while program S and input data a_1, ..., a_n for its variables, the values of these variables in all the states of the computation contain only data constructed from a_1, ..., a_n using expressions over Σ. In particular, if there is a final state, then the

outputs of the computation are data constructed from a_1, ..., a_n using expressions over Σ.

In this section, we are going to formulate these observations precisely and prove them true. The main result is called the Local Computation Theorem, and it has many important consequences.

17.3.1 Local Computation and Expressions

Lemma. Let e ∈ Exp(Σ) be an expression whose variables are among

v_1, ..., v_n.

Let A be any Σ-algebra. For any state σ ∈ State(A), the value V_A(e)(σ) of e on σ lies in the Σ-subalgebra of A generated by

σ(v_1), ..., σ(v_n).

In symbols, we have

V_A(e)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.


Proof. We prove this by structural induction on the definition of expressions (Section 17.1.1). We will drop reference to sorts, as this will not lead to confusion.

Base case. There are two cases for e.

Constant case: e ≡ c. Then

V_A(e)(σ) = V_A(c)(σ) = c_A   by definition of V_A.

The element c_A of A named by the constant c belongs to every Σ-subalgebra of A. Thus,

V_A(e)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Variable case: e ≡ v_i. Then

V_A(e)(σ) = V_A(v_i)(σ) = σ(v_i)   by definition of V_A.

Since σ(v_i) is one of the generators of the subalgebra, clearly

V_A(e)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Induction Step. There is one case for e.

Function application case: e ≡ f(e_1, ..., e_m). As induction hypotheses, suppose that

V_A(e_i)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩

holds for the expressions e_i ∈ Exp(Σ), for 1 ≤ i ≤ m. Then

V_A(e)(σ) = V_A(f(e_1, ..., e_m))(σ) = f_A(V_A(e_1)(σ), ..., V_A(e_m)(σ))   by definition of V_A.

By the induction hypotheses, V_A(e)(σ) is obtained by applying the operation f_A to elements of the Σ-subalgebra ⟨σ(v_1), ..., σ(v_n)⟩ of A. Any Σ-subalgebra is closed under the operations, and so

V_A(e)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Since the Basis and Induction Step are true, we apply the Principle of Structural Induction for Expressions to conclude that for all e ∈ Exp(Σ) and any state σ ∈ State(A),

V_A(e)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩. □


17.3.2 Local Computation and while Programs

Theorem (Local Computation). Let S ∈ WP(Σ) be a while program whose variables are among

v_1, ..., v_n.

Let A be any Σ-algebra. For any state σ ∈ State(A), the values of the variables in the final state M^io_A(S)(σ), if it exists, lie in the Σ-subalgebra of A generated by

σ(v_1), ..., σ(v_n).

In symbols, if M^io_A(S)(σ) ↓, then for each v ∈ {v_1, ..., v_n},

M^io_A(S)(σ)(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Proof. We prove this by structural induction on the definition of commands.

Base case. There are two types of atomic statement.

Case 1: S ≡ x := e. Then

M^io_A(S)(σ) = M^io_A(x := e)(σ) = σ[V_A(e)(σ)/x]   by definition of M^io_A.

Consider the value of the final state on each variable v ∈ {v_1, ..., v_n}. There are two subcases, depending on whether or not v is x.

Subcase 1a: v ≡ x. Then

M^io_A(S)(σ)(v) = V_A(e)(σ)

by definition of variable substitution. Since Var(e) ⊆ {v_1, ..., v_n}, we can apply the Local Evaluation Lemma and deduce that

V_A(e)(σ) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Subcase 1b: v ≢ x. Then

M^io_A(S)(σ)(v) = σ(v).

This is clearly one of the generators of ⟨σ(v_1), ..., σ(v_n)⟩.

Case 2: S ≡ skip. Then

M^io_A(S)(σ) = M^io_A(skip)(σ) = σ   by definition of M^io_A.

Clearly, as in Subcase 1b,

M^io_A(S)(σ)(v) = σ(v)

is one of the generators of ⟨σ(v_1), ..., σ(v_n)⟩.

Induction Step. The three control constructs determine three cases. As induction hypotheses, suppose that the locality property holds for the subprograms S0, S1 and S2; that is, for any state σ ∈ State(A) and each v ∈ {v_1, ..., v_n}, if M^io_A(S_i)(σ) ↓ then

M^io_A(S_i)(σ)(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩

for i = 0, 1, 2.

Case 3: S ≡ S1; S2. Then

M^io_A(S)(σ) = M^io_A(S1; S2)(σ) = M^io_A(S2)(M^io_A(S1)(σ))   by definition of M^io_A.

By the induction hypothesis for S1, we know that for each v ∈ {v_1, ..., v_n},

M^io_A(S1)(σ)(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.   (1)

By the induction hypothesis for S2, applied to the input state M^io_A(S1)(σ), we know that for each v ∈ {v_1, ..., v_n},

M^io_A(S2)(M^io_A(S1)(σ))(v) ∈ ⟨M^io_A(S1)(σ)(v_1), ..., M^io_A(S1)(σ)(v_n)⟩.   (2)

By the Local Evaluation Lemma applied to Equation (1), we deduce that

⟨M^io_A(S1)(σ)(v_1), ..., M^io_A(S1)(σ)(v_n)⟩ ⊆ ⟨σ(v_1), ..., σ(v_n)⟩

and by Equation (2) we may conclude that

M^io_A(S)(σ)(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Case 4: S ≡ if b then S1 else S2 fi. Then

M^io_A(S)(σ) = M^io_A(if b then S1 else S2 fi)(σ)
             = M^io_A(S1)(σ)   if W_A(b)(σ) = tt;
             = M^io_A(S2)(σ)   if W_A(b)(σ) = ff,

by definition of M^io_A. Clearly, there are two subcases.

Subcase 4a: W_A(b)(σ) = tt. Then

M^io_A(S)(σ)(v) = M^io_A(S1)(σ)(v)

which is in ⟨σ(v_1), ..., σ(v_n)⟩ by the induction hypothesis for S1.

Subcase 4b: W_A(b)(σ) = ff. Then

M^io_A(S)(σ)(v) = M^io_A(S2)(σ)(v)

which is in ⟨σ(v_1), ..., σ(v_n)⟩ by the induction hypothesis for S2.

Case 5: S ≡ while b do S0 od. First, we consider the potentially infinite sequence

σ = σ_0, σ_1, ..., σ_i, σ_{i+1}, ...

of states generated by applying S0 repeatedly, starting in the initial state σ. Specifically, for i ≥ 0, we define

σ_{i+1} ≃ M^io_A(S0)(σ_i).


Claim. For each i ≥ 0, if σ_i exists, then for each v ∈ {v_1, ..., v_n},

σ_i(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Proof of Claim. We prove this using the Principle of Induction on the Natural Numbers (recall Section 7.4), applied to the sequence index i ≥ 0.

Basis: i = 0. Now σ_0(v) = σ(v) by definition of σ_0, and σ(v) is one of the generators.

Induction Step: As induction hypothesis, suppose that σ_i exists and

σ_i(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.   (3)

Suppose that σ_{i+1} = M^io_A(S0)(σ_i) ↓. Since we are reasoning within the scope of Case 5 of the structural Induction Step, by the Induction Hypothesis for statements we have

σ_{i+1}(v) = M^io_A(S0)(σ_i)(v) ∈ ⟨σ_i(v_1), ..., σ_i(v_n)⟩.   (4)

By the Local Evaluation Lemma applied to Equation (3), we have

⟨σ_i(v_1), ..., σ_i(v_n)⟩ ⊆ ⟨σ(v_1), ..., σ(v_n)⟩.

Hence, by Equation (4), we have

σ_{i+1}(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

By the Principle of Induction on the Natural Numbers, the claim holds.

Having proved the Claim, it is straightforward to conclude Case 5. Recall that

M^io_A(S)(σ) ≃ M^io_A(while b do S0 od)(σ) ≃ σ_{k+1}

for some k + 1 such that

(i) for all 0 ≤ i ≤ k + 1, σ_i exists;

(ii) for all 0 ≤ i ≤ k, W_A(b)(σ_i) = tt;

(iii) W_A(b)(σ_{k+1}) = ff.

By the Claim, for each v ∈ {v_1, ..., v_n},

M^io_A(S)(σ)(v) ∈ ⟨σ(v_1), ..., σ(v_n)⟩.

Having verified all cases of the Basis and Induction Steps, by the Principle of Structural Induction for Statements we deduce that the theorem is true for all programs. □


17.3.3 Local Computation and Functions

Corollary. Let A be any Σ-algebra. Let

f_A : A^w → A

be a function on A. If f_A is computable by a while program over Σ, then for each (a_1, ..., a_n) ∈ A^w,

f_A(a_1, ..., a_n) ∈ ⟨a_1, ..., a_n⟩.

Consider the field of real numbers with signature:

signature    Reals

sorts        real, bool

constants    0, 1 : → real
             tt, ff : → bool

operations   + : real × real → real
             − : real → real
             · : real × real → real
             ⁻¹ : real → real
             eq : real × real → bool
             less than : real × real → bool

Theorem. Let R be the standard Σ_Reals-algebra. Then the partial function F : R ⇀ R defined by

F(x) = +√x

is not while computable over R.

Proof. Suppose, for a contradiction, that there was a program S ∈ WP(Σ_Reals) that computed F. Then, by the Corollary, for every x ∈ R,

F(x) ∈ ⟨x⟩.

Consider F(2). Now ⟨2⟩ is a two-sorted Σ_Reals-subalgebra of R whose carriers are as follows:

⟨2⟩_real = Q

⟨2⟩_bool = B.

By the irrationality of √2, we know that F(2) = √2 ∉ Q = ⟨2⟩_real. This contradicts the existence of a while program that computes F. □
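The fact that ⟨2⟩_real = Q can be glimpsed computationally. The following sketch is our own illustration, not part of the text: it closes the generator set {2} under the ring operations for a bounded number of rounds using exact rational arithmetic. Every element produced is a Fraction, and √2, being irrational, can never appear at any depth; the depth bound is our own device for keeping the (infinite) subalgebra generation finite.

```python
# Illustration of the Corollary: repeatedly applying +, unary -, *, and
# inverse to the generator 2 yields only rationals, so no while program
# over the ordered ring of reals can output sqrt(2) on input 2.

from fractions import Fraction

def generate(generators, rounds):
    """Close `generators` under +, unary -, *, and inverse, `rounds` times."""
    elems = {Fraction(g) for g in generators} | {Fraction(0), Fraction(1)}
    for _ in range(rounds):
        new = set(elems)
        for a in elems:
            new.add(-a)
            if a != 0:
                new.add(1 / a)
            for b in elems:
                new.add(a + b)
                new.add(a * b)
        elems = new
    return elems

reachable = generate([2], rounds=2)
# Every reachable element is an exact rational; sqrt(2) is irrational,
# so it is not among them (and never would be, at any depth).
print(all(isinstance(a, Fraction) for a in reachable))  # → True
```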

17.4 Invariance of Semantics

Our definition, in Chapter 16, of the input-output semantics of while programs on any algebra models the input-output behaviour of any while program on any implementation of any data type.


Now, in Chapter 7, we learned the following. A many-sorted algebra A of signature Σ is a model of a concrete implementation of a data type. A class of many-sorted algebras of signature Σ is a model of a data type. If two Σ-algebras A and B are isomorphic, then this models their equivalence as implementations of a data type. A class of many-sorted algebras of signature Σ closed under isomorphism is a model of an abstract data type.

What is abstract about an abstract data type is the idea that computations with data need not depend on the way data is implemented. Equivalent implementations provide equivalent program behaviour. We will now make this idea precise by combining our input-output semantics with our algebraic models of data types. We ask the following question:

Suppose A and B are isomorphic Σ-algebras. Is computation by a while program S ∈ WP(Σ) on A equivalent to computation by S on B?

The computations should be equivalent. For example, in the context of the abstract data type of natural numbers, the general question includes the simple question:

Given isomorphic algebras N_dec and N_bin of decimal and binary representations of the natural numbers, do while programs (such as Euclid's algorithm) have equivalent semantic behaviour on N_dec and N_bin?

We will formulate and prove the Isomorphism Invariance Theorem for the semantics, that

for any while program S ∈ WP(Σ), if A and B are isomorphic Σ-algebras, then the input-output semantics of S on A and B are isomorphic.

We make precise what we mean by the

equivalence of computations

by a program, or, more specifically for the main theorem, the

isomorphism of two input-output semantics

for while programs. The starting point is to compare the states of two computations. Let us explore the problem by looking at the natural number data type and the question about N_dec and N_bin above.

17.4.1 Example of the Natural Numbers

Recall the two-sorted signature of the natural numbers of Section 7.1:

signature    Euclidean Algorithm

sorts        nat, bool

constants    0 : → nat
             true, false : → bool

operations   mod : nat × nat → nat
             eq : nat × nat → bool


Consider two specific implementations of the data type of natural numbers:

(i) a decimal interpretation Ndec of the natural numbers; and

(ii) a binary interpretation Nbin of the natural numbers.

Let E ∈ Cmd(Σ) be the while program for Euclid's algorithm:

z := x mod y;
while z ≠ 0 do
    x := y;
    y := z;
    z := x mod y
od

For an input state σ with σnat(x) = m and σnat(y) = n, the program returns gcd(m, n) as the value of y.
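As a quick sanity check on the algorithm (an illustration, not part of the formal development), the while program E transcribes directly into Python, with the variable names following the program text:

```python
def euclid(x, y):
    """Transcription of E: z := x mod y; while z /= 0 do x := y; y := z; z := x mod y od.
    Returns the final value of y, i.e. gcd of the two inputs."""
    z = x % y
    while z != 0:
        x = y
        y = z
        z = x % y
    return y

assert euclid(45, 12) == 3
```

The assertion matches the worked decimal execution of E that follows.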

Consider the execution of E over the natural numbers under the two implementations of this data type, Ndec and Nbin.

Consider case (i) of executing the program E over a decimal implementation Ndec of the natural numbers, starting from a given state σdec ∈ State(Ndec) with

σdec(x) = 45 and σdec(y) = 12.

Step   σdec(x)   σdec(y)   σdec(z)   Comment
 1      45        12        ?        Initial state
 2      45        12        9        First assignment
 3      12        12        9        Entered loop
 4      12         9        9
 5      12         9        3
 6       9         9        3        Re-enter loop
 7       9         3        3
 8       9         3        0        Exit loop

Then we have that

M^io_dec(E)(σdec) = τdec

where

τdec(x) = 9, τdec(y) = 3, τdec(z) = 0.

Consider case (ii) of executing the program E over a binary implementation Nbin of the natural numbers, starting from a given state σbin ∈ State(Nbin) with

σbin(x) = 101101 and σbin(y) = 1100.

Step   σbin(x)   σbin(y)   σbin(z)   Comment
 1     101101    1100      ?         Initial state
 2     101101    1100      1001      First assignment
 3     1100      1100      1001      Entered loop
 4     1100      1001      1001
 5     1100      1001      11
 6     1001      1001      11        Re-enter loop
 7     1001      11        11
 8     1001      11        0         Exit loop

Then we have that

M^io_bin(E)(σbin) = τbin

where

τbin(x) = 1001, τbin(y) = 11, τbin(z) = 0.

Clearly, the states correspond exactly, in the sense that decimal-to-binary conversion applied to the values in the decimal output state τdec results in the binary output state τbin.

If

φ : Ndec → Nbin

is the decimal-to-binary conversion ΣEuclidean Algorithm-isomorphism, then we can construct a conversion map

φ : State(Ndec) → State(Nbin)

for states such that

φ(σdec) = σbin

where

φ(σnat)(v) = φ(σnat(v))

and so

φ(τdec) = τbin.

17.4.2 Isomorphic State Spaces

Let Σ be an S-sorted signature and let A and B be Σ-algebras. Let φ : A → B be a Σ-map. This induces a map

φ : State(A) → State(B)

defined as follows.

Recall that the state space State(A) is a family

State(A) = ⟨State_s(A) | s ∈ S⟩

and a state σ ∈ State(A) is a family

σ = ⟨σ_s | s ∈ S⟩

where the s-sorted state σ_s ∈ State_s(A).

Furthermore, the map φ : A → B is a family

φ = ⟨φ_s | s ∈ S⟩

where φ_s : A_s → B_s. The induced state map φ : State(A) → State(B) is a family

φ = ⟨φ_s | s ∈ S⟩

where φ_s : State_s(A) → State_s(B). For any σ_s ∈ State_s(A) and x ∈ Var_s, we define φ_s by

φ_s(σ_s)(x) = φ_s(σ_s(x)).

Suppose Var_s = {x_1, x_2, x_3, ...} and let σ_s ∈ State_s(A) be depicted by:

Var_s   x_1   x_2   x_3   ...   x_i   ...   x_j   ...
σ_s     a_1   a_2   a_3   ...   a_i   ...   a_j   ...

then we depict φ_s(σ_s) ∈ State_s(B) as follows:

Var_s       x_1        x_2        x_3        ...   x_i        ...   x_j        ...
φ_s(σ_s)    φ_s(a_1)   φ_s(a_2)   φ_s(a_3)   ...   φ_s(a_i)   ...   φ_s(a_j)   ...

Lemma If the map φ : A → B is injective, surjective or bijective, then the induced state map φ : State(A) → State(B) is injective, surjective or bijective, respectively.

Proof. Suppose φ is injective. Let σ, σ′ ∈ State(A) and

φ(σ) = φ(σ′).

Then for each s ∈ S and all x ∈ Var_s,

φ_s(σ_s)(x) = φ_s(σ′_s)(x)

and by definition of φ_s,

φ_s(σ_s(x)) = φ_s(σ′_s(x)).

Since φ_s is injective, we have for all x ∈ Var_s,

σ_s(x) = σ′_s(x)

and hence

σ_s = σ′_s.

Thus, φ_s is injective for each s ∈ S and the state map φ is injective.

Suppose φ is surjective. For any τ ∈ State(B) we will construct σ ∈ State(A) such that φ(σ) = τ. Let s ∈ S and x ∈ Var_s and suppose

τ_s(x) = b ∈ B_s.

Since φ_s is surjective there exists a ∈ A_s such that

φ_s(a) = b.

For each such s ∈ S and x ∈ Var_s we define σ_s(x) = a. Then for each x ∈ Var_s,

φ_s(σ_s)(x) = φ_s(σ_s(x))
            = φ_s(a)
            = b
            = τ_s(x).

Thus φ(σ) = τ and the state map φ is surjective.

Combining these cases gives the case that φ is bijective. □


17.4.3 Isomorphism Invariance Theorem

The correspondence between computations on A and B under the isomorphic translation φ : A → B is formalised by:

Theorem (Input-Output Semantics) Let A and B be Σ-structures and φ : A → B a Σ-isomorphism. Let φ : State(A) → State(B) be the induced bijection between state spaces. Then for each statement S ∈ Cmd(Σ) and for every state σ ∈ State(A),

φ(M^io_A(S)(σ)) ≃ M^io_B(S)(φ(σ)).

Alternatively, the following diagram commutes:

                M^io_A(S)
    State(A) -------------> State(A)
       |                       |
       | φ                     | φ
       v                       v
    State(B) -------------> State(B)
                M^io_B(S)

As usual, since the semantics of statements are constructed from the semantics of expressions and Boolean expressions, we will begin the proof by proving the algebraic invariance of the semantics of expressions and Boolean expressions. Each proof is by structural induction using the relevant principles from Chapter ??.

Lemma (Isomorphism Invariance for Expressions) For each expression e ∈ Exp(Σ) and for every state σ ∈ State(A) we have

φ(V_A(e)(σ)) = V_B(e)(φ(σ)).

Proof. We prove this by structural induction on the definition of expressions (recall Section 17.1.1). We will drop the reference to sorts as this will not lead to confusion.

Base case There are two cases:

Constant case e ≡ c.

φ(V_A(c)(σ)) = φ(c_A)           by definition of V_A,
             = c_B              as φ preserves constants,
             = V_B(c)(φ(σ))     by definition of V_B.

Variable case e ≡ x.

φ(V_A(x)(σ)) = φ(σ(x))          by definition of V_A,
             = V_B(x)(φ(σ))     by definition of V_B.

Induction Step As induction hypotheses, suppose that

φ(V_A(e_i)(σ)) = V_B(e_i)(φ(σ))


holds for the expressions e_i ∈ Exp(Σ) for 1 ≤ i ≤ n. Then for

e ≡ f(e_1, ..., e_n),

φ(V_A(f(e_1, ..., e_n))(σ))
  = φ(f_A(V_A(e_1)(σ), ..., V_A(e_n)(σ)))        by definition of V_A,
  = f_B(φ(V_A(e_1)(σ)), ..., φ(V_A(e_n)(σ)))     as φ preserves the operation named by f,
  = f_B(V_B(e_1)(φ(σ)), ..., V_B(e_n)(φ(σ)))     by the Induction Hypothesis,
  = V_B(f(e_1, ..., e_n))(φ(σ))                  by definition of V_B. □
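The lemma can be exercised on the Euclid example: evaluate a term over the decimal and binary implementations and compare the results along φ. In this sketch expressions are nested tuples, and the two interpretations of mod (on ints, and on binary strings) are our illustrative assumptions:

```python
def eval_exp(e, ops, sigma):
    """V(e)(sigma): a variable looks up the state; f(e1,...,en) applies the
    interpreted operation to the evaluated subexpressions."""
    if isinstance(e, str):
        return sigma[e]
    f, *args = e
    return ops[f](*(eval_exp(a, ops, sigma) for a in args))

ops_dec = {'mod': lambda m, n: m % n}                           # interpretation in Ndec
ops_bin = {'mod': lambda m, n: bin(int(m, 2) % int(n, 2))[2:]}  # interpretation in Nbin
phi = lambda n: bin(n)[2:]                                      # decimal-to-binary isomorphism

sigma = {'x': 45, 'y': 12}
e = ('mod', 'x', 'y')
lhs = phi(eval_exp(e, ops_dec, sigma))                              # phi(V_A(e)(sigma))
rhs = eval_exp(e, ops_bin, {v: phi(a) for v, a in sigma.items()})   # V_B(e)(phi(sigma))
assert lhs == rhs == '1001'
```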

Lemma (Isomorphism Invariance for Boolean Expressions) For each Boolean expression b ∈ BExp(Σ) and for every state σ ∈ State(A) we have

φ(W_A(b)(σ)) = W_B(b)(φ(σ)).

Proof. We prove this by structural induction on the definition of Boolean expressions (recall Section 17.1.2).

Base case There are two base cases:

Case b ≡ true.

φ(W_A(true)(σ)) = tt                  by definition of W_A,
                = W_B(true)(φ(σ))     by definition of W_B.

Case b ≡ false.

φ(W_A(false)(σ)) = ff                  by definition of W_A,
                 = W_B(false)(φ(σ))    by definition of W_B.

Induction Step There are three cases. As induction hypothesis, suppose that

φ(W_A(b_i)(σ)) = W_B(b_i)(φ(σ))

holds for the Boolean expressions b_i ∈ BExp(Σ) for i = 0, 1, 2. Then:

Case b ≡ not(b_0).

φ(W_A(not(b_0))(σ)) = tt if φ(W_A(b_0)(σ)) = ff,
                      ff if φ(W_A(b_0)(σ)) = tt,     by definition of W_A,

                    = tt if W_B(b_0)(φ(σ)) = ff,
                      ff if W_B(b_0)(φ(σ)) = tt,     by the Induction Hypothesis,

                    = W_B(not(b_0))(φ(σ))            by definition of W_B.


Case b ≡ and(b_1, b_2).

φ(W_A(and(b_1, b_2))(σ))
  = tt if φ(W_A(b_1)(σ)) = tt and φ(W_A(b_2)(σ)) = tt,
    ff otherwise,                                        by definition of W_A,

  = tt if W_B(b_1)(φ(σ)) = tt and W_B(b_2)(φ(σ)) = tt,
    ff otherwise,                                        by the Induction Hypothesis,

  = W_B(and(b_1, b_2))(φ(σ))                             by definition of W_B.

Case b ≡ r(e_1, ..., e_n).

φ(W_A(r(e_1, ..., e_n))(σ))
  = φ(r_A(V_A(e_1)(σ), ..., V_A(e_n)(σ)))       by definition of W_A,
  = r_B(φ(V_A(e_1)(σ)), ..., φ(V_A(e_n)(σ)))    as φ preserves the relation named by r,
  = r_B(V_B(e_1)(φ(σ)), ..., V_B(e_n)(φ(σ)))    by the Isomorphism Invariance for Expressions Lemma,
  = W_B(r(e_1, ..., e_n))(φ(σ))                 by definition of W_B. □

We will use these results to prove the invariance result for commands.

Theorem (Isomorphism Invariance Theorem) For each statement S ∈ Cmd(Σ) and for every state σ ∈ State(A) we have

φ(M^io_A(S)(σ)) ≃ M^io_B(S)(φ(σ)).

Proof. We prove this by structural induction on the definition of commands (recall Section 17.1.3).

Base case There are two base cases.

Identity case S ≡ skip.

φ(M^io_A(skip)(σ)) = φ(σ)                   by definition of M^io_A,
                   = M^io_B(skip)(φ(σ))     by definition of M^io_B.

Assignment case S ≡ x := e.

φ(M^io_A(x := e)(σ)) ≃ φ(σ[V_A(e)(σ)/x])         by definition of M^io_A,
                     ≃ (φ(σ))[φ(V_A(e)(σ))/x]    by Exercise ??(3),
                     ≃ (φ(σ))[V_B(e)(φ(σ))/x]    by the Isomorphism Invariance for Expressions Lemma,
                     ≃ M^io_B(x := e)(φ(σ))      by definition of M^io_B.

Induction Step There are three cases. As induction hypotheses, suppose that

φ(M^io_A(S_i)(σ)) ≃ M^io_B(S_i)(φ(σ))


holds for the statements S_i ∈ Cmd(Σ) for i = 0, 1, 2. Then:

Sequencing case S ≡ S_1; S_2.

φ(M^io_A(S_1; S_2)(σ)) ≃ φ(M^io_A(S_2)(M^io_A(S_1)(σ)))     by definition of M^io_A,
                       ≃ M^io_B(S_2)(φ(M^io_A(S_1)(σ)))      by the Induction Hypothesis on S_2,
                       ≃ M^io_B(S_2)(M^io_B(S_1)(φ(σ)))      by the Induction Hypothesis on S_1,
                       ≃ M^io_B(S_1; S_2)(φ(σ))              by definition of M^io_B.

Conditional case S ≡ if b then S_1 else S_2 fi.

φ(M_A(S)(σ))
  ≃ φ(M_A(S_1)(σ)) if W_A(b)(σ) = tt,
    φ(M_A(S_2)(σ)) if W_A(b)(σ) = ff,       by definition of M_A,

  ≃ M_B(S_1)(φ(σ)) if W_A(b)(σ) = tt,
    M_B(S_2)(φ(σ)) if W_A(b)(σ) = ff,       by the Induction Hypotheses on S_1 and S_2,

  ≃ M_B(S_1)(φ(σ)) if W_B(b)(φ(σ)) = tt,
    M_B(S_2)(φ(σ)) if W_B(b)(φ(σ)) = ff,    by the Isomorphism Invariance Lemma for Boolean Expressions,

  ≃ M_B(S)(φ(σ))                            by definition of M_B.

Iteration case S ≡ while b do S_0 od.

First, we consider the iteration of the command S_0 ∈ Cmd(Σ) on A and B. For σ ∈ State(A) we define the iteration sequence

σ_0, σ_1, ..., σ_n, ...

of states by

σ_0 = σ and σ_n ≃ M^io_A(S_0)(σ_{n−1}).

For τ ∈ State(B) we define the iteration sequence

τ_0, τ_1, ..., τ_n, ...

of states by

τ_0 = τ and τ_n ≃ M^io_B(S_0)(τ_{n−1}).

Note that either of these sequences can be finite because S_0 need not terminate.

Claim If τ = φ(σ) then for all n ≥ 0,

τ_n ≃ φ(σ_n).


Proof. This is proved by induction on n. The basis case n = 0 is true by assumption.

As Induction Hypothesis, suppose that

τ_{n−1} ≃ φ(σ_{n−1}).

This means either τ_{n−1} and σ_{n−1} exist and τ_{n−1} = φ(σ_{n−1}), or τ_{n−1} and σ_{n−1} do not exist. Then

τ_n ≃ M^io_B(S_0)(τ_{n−1})        by definition,
    ≃ M^io_B(S_0)(φ(σ_{n−1}))     by the induction hypothesis on n,
    ≃ φ(M^io_A(S_0)(σ_{n−1}))     by the induction hypothesis on S_0,
    ≃ φ(σ_n)                      by definition.

This proves the claim. □

Now we consider the iteration of S_0 in the context of the while loop. There are two cases:

Termination case

Suppose M^io_A(S)(σ) ↓. Then the sequence

σ_0, σ_1, ..., σ_n

of iterates of S_0 on σ over A is finite and

W_A(b)(σ_i) = tt for 0 ≤ i ≤ n − 1 and W_A(b)(σ_n) = ff.

On applying φ to these Boolean conditions we get

φ(W_A(b)(σ_i)) = tt for 0 ≤ i ≤ n − 1 and φ(W_A(b)(σ_n)) = ff

since φ(tt) = tt and φ(ff) = ff. By the Isomorphism Invariance Lemma for Boolean Expressions, we have

W_B(b)(φ(σ_i)) = tt for 0 ≤ i ≤ n − 1 and W_B(b)(φ(σ_n)) = ff.

Comparing with the sequence

τ_0, τ_1, ..., τ_n

of iterates of S_0 on τ = φ(σ) over B, by the Claim, we get that

W_B(b)(τ_i) = tt for 0 ≤ i ≤ n − 1 and W_B(b)(τ_n) = ff

and

M^io_B(S)(φ(σ)) ≃ τ_n ≃ φ(σ_n) ≃ φ(M^io_A(S)(σ)).

This concludes the termination case.


Non-termination case

Suppose M^io_A(S)(σ) ↑. Then the sequence

σ_0, σ_1, ..., σ_n, ...

of iterates of S_0 on σ over A is infinite and

W_A(b)(σ_i) = tt for all i ≥ 0.

Once again, applying φ and the Isomorphism Invariance Lemma for Boolean Expressions, we get

W_B(b)(φ(σ_i)) = tt for all i ≥ 0.

Comparing with the sequence

τ_0, τ_1, ..., τ_n, ...

of iterates of S_0 on τ = φ(σ) over B, using the Claim, gives

W_B(b)(τ_i) = tt for all i ≥ 0

and

M^io_B(S)(φ(σ)) ↑.

Hence,

φ(M^io_A(S)(σ)) ≃ M^io_B(S)(φ(σ)). □

17.4.4 Isomorphism Invariance and Program Equivalence

There are several conceptions of when two programs may be considered equivalent. Normally, they involve performing a common task, like computing a function.

Definition Let S, S′ ∈ Cmd(Σ) and U, V ⊆ Var. We say that S and S′ are equivalent on State(A), with respect to input variables U and output variables V, if for all σ, σ′ ∈ State(A) with σ(x) = σ′(x) for all x ∈ U,

M^io_A(S)(σ)(y) ≃ M^io_A(S′)(σ′)(y)

for all y ∈ V. We write S ≡_A S′ mod U, V.

For example, for the Euclidean program E for gcd,

E ≡_N E mod {x, y}, {y}.

In general, S and S′ in Cmd(Σ) may be equivalent on some Σ-algebra A but they need not be equivalent on another Σ-algebra B.
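On a finite algebra, or on a finite sample of states, the definition can be checked exhaustively. The sketch below assumes programs are given as state transformers that return a final state dictionary, or None for divergence; these representation choices are ours, not the text's:

```python
def equivalent_mod(run_S, run_S2, states, U, V):
    """Test S =_A S' mod U,V over the given sample of states: whenever two states
    agree on the input variables U, the final states must agree on the output
    variables V (and divergence must match divergence)."""
    for s in states:
        for s2 in states:
            if all(s[x] == s2[x] for x in U):
                t, t2 = run_S(s), run_S2(s2)
                if (t is None) != (t2 is None):
                    return False
                if t is not None and any(t[y] != t2[y] for y in V):
                    return False
    return True

# Two programs that compute the same value of y but treat z differently:
run1 = lambda s: {**s, 'y': s['x'] + s['y']}
run2 = lambda s: {**s, 'y': s['y'] + s['x'], 'z': 0}
states = [{'x': a, 'y': b, 'z': 9} for a in range(3) for b in range(3)]
assert equivalent_mod(run1, run2, states, ['x', 'y'], ['y'])
```

Shrinking U to ['x'] alone breaks the equivalence on this sample, since states that agree only on x may disagree on y.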

Lemma Suppose A is isomorphic to B. Then S ≡_A S′ mod U, V if, and only if, S ≡_B S′ mod U, V.


Proof. Let φ : A → B be an isomorphism, and

φ : State(A) → State(B)

be the corresponding bijection between state spaces. Suppose S ≡_A S′ mod U, V. For any τ, τ′ ∈ State(B) there are unique σ, σ′ ∈ State(A) such that

φ(σ) = τ and φ(σ′) = τ′.

If

τ ≃ τ′ mod U

then

φ(σ) ≃ φ(σ′) mod U

and

σ ≃ σ′ mod U

since φ is injective. Now, since S ≡_A S′ mod U, V,

M_A(S)(σ) ≃ M_A(S′)(σ′) mod V.

Applying φ to both sides, we have

φ(M_A(S)(σ)) ≃ φ(M_A(S′)(σ′)) mod V.

By the Isomorphism Invariance Theorem,

M_B(S)(φ(σ)) ≃ M_B(S′)(φ(σ′)) mod V

and so

M_B(S)(τ) ≃ M_B(S′)(τ′) mod V.

Since τ and τ′ were chosen arbitrarily, the last equation means that S ≡_B S′ mod U, V. □

Definition Let P be a property of Cmd(Σ), i.e.,

P ⊆ Cmd(Σ).

Let U, V ⊆ Var. We say that P is A-semantic with respect to U, V if

P(S) holds and S ≡_A S′ mod U, V implies P(S′) holds.

Lemma If P is A-semantic with respect to U, V and A is isomorphic with B then P is B-semantic with respect to U, V.

Proof. If P is A-semantic with respect to U, V, then P(S) holds and S ≡_A S′ mod U, V implies that P(S′) holds. As A ≅ B, from Lemma 17.4.4, S ≡_A S′ mod U, V if, and only if, S ≡_B S′ mod U, V. So we have that P(S) holds and S ≡_B S′ mod U, V implies that P(S′) holds. Thus, P is B-semantic if P is A-semantic and A ≅ B. □


17.4.5 Review of Terms

To appear.

17.5 Performance Measures

The aim of this assignment is to define a method for measuring or estimating the cost of a computation by a while program. Let N+ = {1, 2, ...}. We want to define a function

λ : Cmd(Σ) → (State(A) → N+)

such that, for S ∈ Cmd(Σ),

λ(S) : State(A) → N+

computes the resources needed for the computation, i.e.,

λ(S)(σ) = the cost of executing S in state σ.

The costs could measure the time or space used in a computation. The function λ(S) may be a partial function and we require for any S and σ

λ(S)(σ) ↓ if, and only if, M^io_A(S)(σ) ↓.

The strategy for measuring the costs of a computation follows that for defining semantics. We must determine the costs associated with data and states; operations and tests on states; and control and sequencing of actions in commands.

17.5.1 Performance of Data

We begin by modelling the costs of the basic operations on data. Let A be a many-sorted algebra of signature Σ. To measure the cost of using the constants and operations of A, we define the concept of a performance measure p which consists of the following:

(i) For each constant symbol

c : → s

of Σ naming an element

c_A : → A_s

of A, the performance measure p assumes a fixed charge of

c_{A,p} : → N+.

(ii) For each function symbol

f : w → s

of Σ naming an operation

f_A : A_w → A_s

of A, the performance measure p assumes a cost function

f_{A,p} : A_w → N+


to compute the number of units charged. Specifically, the cost for any arguments a_1 ∈ A_{s(1)}, ..., a_n ∈ A_{s(n)} to compute

f_A(a_1, ..., a_n) ∈ A_s

is

f_{A,p}(a_1, ..., a_n) ∈ N+

units.

An important example of a performance measure p for any algebra A is called A-time, in which data and operations have unit costs. More precisely, this is defined by:

• c_{A,p} = 1 for each constant c_A; and

• f_{A,p}(a_1, ..., a_n) = 1 for each operation f_A on each argument (a_1, ..., a_n).

Here the data type A is considered without reference to any implementation: the operations of A are seen as atomic and indivisible and therefore define units for evaluating program performance based on A. This performance measure is appropriate whenever we regard the algebra A as modelling an independent and autonomous level of abstraction for data.

We can define costs for the operations Succ, + and × and the tests of equality and order on the natural numbers.

17.5.2 Expressions

We can use structural induction on expressions to define the performance measure for computing expressions:

λ^{A,p}_Exp : Exp(Σ) → (State(A) → N+)

such that for e ∈ Exp(Σ),

λ^{A,p}_Exp(e) : State(A) → N+

computes the resources needed for evaluating an expression e in state σ, i.e.,

λ^{A,p}_Exp(e)(σ) = the cost of evaluating expression e in state σ according to performance measure p.

The cost can be defined to cover parallel and sequential evaluation of subexpressions.
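For the unit-cost A-time measure with sequential evaluation of subexpressions, such a definition by structural induction is short. A sketch; charging one unit per leaf and per operator application is our assumed policy, and under it the cost does not depend on the state, so the σ argument is dropped:

```python
def exp_cost(e):
    """lambda_Exp(e) under unit costs: one unit for a leaf (variable or constant),
    and for f(e1,...,en) one unit for f plus the summed costs of the subexpressions."""
    if isinstance(e, (str, int)):   # variable name or constant
        return 1
    _, *args = e                    # ('f', e1, ..., en)
    return 1 + sum(exp_cost(a) for a in args)

assert exp_cost(('mod', 'x', 'y')) == 3               # one for mod, one per variable
assert exp_cost(('mod', ('mod', 'x', 'y'), 'z')) == 5
```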

17.5.3 Tests

We can use structural induction on Boolean expressions to define the performance measure for tests

λ^{A,p}_BExp : BExp(Σ) → (State(A) → N+)

such that for b ∈ BExp(Σ),

λ^{A,p}_BExp(b) : State(A) → N+

computes the resources needed for evaluating the Boolean expression b in state σ, i.e.,

λ^{A,p}_BExp(b)(σ) = the cost of evaluating the Boolean expression b in state σ as measured by performance measure p.


17.5.4 Performance of Programs

We can use structural induction on commands to define the performance function for commands:

λ^{A,p} : Cmd(Σ) → (State(A) → N+)

such that for S ∈ Cmd(Σ),

λ^{A,p}(S) : State(A) → N+

computes the resources needed for evaluating the command S in state σ, so

λ^{A,p}(S)(σ) = the cost of evaluating the command S on state σ as measured by performance measure p.

Note this involves the definition of the input-output semantics function M^io_A.
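For a small fragment of the command language, such a definition can be sketched alongside the input-output semantics: evaluating a command must return both a cost and the resulting state, because the cost of S1;S2 depends on the state reached after S1. The per-construct charges, the costed-thunk representation of expressions and tests, and the fuel bound standing in for non-termination are all assumptions of this sketch:

```python
def cmd_cost(S, sigma, fuel=10_000):
    """Return (cost, final state), or None when the fuel bound is exhausted,
    our stand-in for non-termination, where lambda(S)(sigma) is undefined."""
    tag = S[0]
    if tag == 'skip':
        return 1, sigma
    if tag == 'assign':                 # ('assign', x, e); e(sigma) -> (cost, value)
        _, x, e = S
        c, v = e(sigma)
        return c, {**sigma, x: v}
    if tag == 'seq':                    # cost of S1, plus cost of S2 on the new state
        _, S1, S2 = S
        r1 = cmd_cost(S1, sigma, fuel)
        if r1 is None:
            return None
        c1, s1 = r1
        r2 = cmd_cost(S2, s1, fuel)
        return None if r2 is None else (c1 + r2[0], r2[1])
    if tag == 'while':                  # ('while', b, S0); b(sigma) -> (cost, bool)
        _, b, S0 = S
        total = 0
        while fuel > 0:
            cb, holds = b(sigma)
            total += cb                 # every test evaluation is charged
            if not holds:
                return total, sigma
            r = cmd_cost(S0, sigma, fuel)
            if r is None:
                return None
            c0, sigma = r
            total += c0
            fuel -= 1
        return None
    raise ValueError(tag)
```

For a countdown loop charging one unit per test and per assignment, `cmd_cost` adds up four test evaluations and three body executions from the start state x = 3.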

Let Var(S) ⊆ Var be the set of variables appearing in the command S. We can prove that for all states σ and σ′ with σ ≃ σ′ mod Var(S):

λ^{A,p}(S)(σ) = λ^{A,p}(S)(σ′).

We can prove that if y ∉ Var(S) then for all states σ and σ′ with σ ≃ σ′ mod Var(S) − {y}:

λ^{A,p}(S)(σ) = λ^{A,p}(S)(σ′).

We can prove that if a variable x does not occur in the right-hand side of an assignment, or in a Boolean expression, then for all states σ and σ′ with σ ≃ σ′ mod Var(S) − {x}:

λ^{A,p}(S)(σ) = λ^{A,p}(S)(σ′).


Exercises for Chapter 17

1. Prove that structural induction on signatures implies induction on natural numbers, and hence complete the proof of the Induction on Terms Theorem in Section 8.2.4.

2. Prove that if a variable x does not appear in the left-hand side of an assignment in S then:

M^io_A(S)(σ) ↓ ⇒ M^io_A(S)(σ)(x) = σ(x).

3. Prove that if x occurs only on the left-hand side of a single assignment x := e in S and all variables y ∈ var(e) appear nowhere on the left-hand side of an assignment in S then:

V_A(e)(σ) = M^io_A(S)(σ)(x)

if M^io_A(S)(σ) terminates.

4. Use the Local Computation Theorem to prove that the function f : N→ N defined by

f (n) = n + 1

cannot be computed by any while program over the algebra

A = (N, B; 0; n ∸ 1, =_N, =_B).

5. Consider the algebra

A = (N, B; 0, 1; 2n, =_N, =_B).

Which of the following functions f : N → N are, or are not, while computable on A? Justify your answers.

(a) f(x) = 3;

(b) f(x) = 3x;

(c) f(x) = 4x²;

(d) f(x) = 6x³;

(e) f(x) = 2ⁿxⁿ; and

(f) f(n, x) = 2ⁿxⁿ.

6. Evaluate the program E

(a) over the decimal implementation Ndec on the start state σ0 with

σ0(x) = 10 and σ0(y) = 5;

(b) over the binary implementation Nbin on the start state σ0 with

σ0(x) = 1010 and σ0(y) = 101.

7. Show that

φ(σ[a/x]) = φ(σ)[φ(a)/x]

holds for any state σ : Var → A, any program variable x ∈ Var and any value a ∈ A.


8. Write out the trace of the execution of the program P ∈ Cmd(Σ):

x := 1;
y := x + 1;
if x = y then
    skip
else
    y := x + y
fi

on an initial state σ0

(a) over a decimal implementation Ndec of the natural numbers where σ0(x) = 4 and σ0(y) = 3; and

(b) over a binary implementation Nbin of the natural numbers where σ0(x) = 100 and σ0(y) = 11.

9. Does the Isomorphism Invariance Theorem of Section 17.4.3 still hold if it is assumed that φ : A → B is a Σ-homomorphism?

10. Does the Isomorphism Invariance Theorem of Section 17.4.3 hold if we add the following constructs to the while language:

(a) concurrent assignments;

(b) case statements;

(c) repeat-until statements; and

(d) for statements.

11. Define the cost λ(S ) for a program that involves

(a) concurrent assignments;

(b) case statements;

(c) repeat-until statements; and

(d) for statements.


Chapter 18

Operational Semantics

Incomplete Draft

The input-output semantics of Chapter 16 gives an abstract program behaviour by defining the output state of a computation as the semantics of a program. In this chapter, we extend the input-output semantics to give a new semantics that defines the operation of the program at every step in making computations.

In this way, we can see how a program behaves over time as it executes, just as we would with a debugger or program animator. So now we will have the semantics of a program giving a sequence of states instead of a single output state. Thus, if we execute a program P on an initial state σ0, we will either get a finite sequence

σ0, σ1, ..., σt

of states if it terminates, or else an infinite sequence

σ0, σ1, ..., σt, ...

of states if it does not.

of states if it does not.These state sequences represent execution traces. The state σt is the state of the computa-

tion produced by a program at some time cycle t . We can recover the input-output behaviourof a program by returning the last state (if it exists) of the computation of a finite sequence.

Thus, to define the operational semantics of a language, we need to define a function

Comp : Prog(Σ) × State(A) × Time → State(A)

to enumerate, by time, the sequence of states produced by a computation, such that

Comp(P, σ, t)

gives the state produced by the computation of a program P on a state σ at time t. Thus, the state

σt = Comp(P, σ0, t)

is that which is produced by executing the program P on the initial state σ0 for t time cycles. An execution trace is the finite

Comp(P, σ0, 0), Comp(P, σ0, 1), ..., Comp(P, σ0, t)

or infinite

Comp(P, σ0, 0), Comp(P, σ0, 1), ..., Comp(P, σ0, t), ...

sequence of states produced by this function, depending on whether P terminates on σ0 or not.


Problem of Operational Semantics To give a precise mathematical definition of Comp.

We shall consider three different methods of defining Comp: we define

(i) execution trace semantics in Section 18.1;

(ii) structural operational semantics in Section 18.2; and

(iii) algebraic operational semantics in an equational manner in Section 18.3.

DISCUSSION ON LEVEL OF ABSTRACTION... CURRENTLY HAVE TIME TO EXECUTE TESTS IN AOS BUT NONE OTHERS

18.1 Execution Trace Semantics

Input-output semantics concentrates on the output of execution. Operational semantics, on the other hand, considers the evolution of program execution. One way of viewing this evolution is via an execution trace. Given an initial state σ0, the execution trace of program S on the state σ0 will either be finite:

σ0, σ1, ..., σn

or infinite:

σ0, σ1, ..., σn, ...

depending on whether the program S terminates or not, on starting from σ0.

depending on whether the program S terminates or not, on starting from σ0 .We shall define a function

M opA (S ) : State(A)→ Seq(State(A))

to give the execution traceM op

A (S )(σ) ∈ Seq(State(A))

of a program M opA (S ) from a state σ. Thus, we can determine the state Comp(P , σ, t) produced

at time t by executing program P on state σ by extracting the t th element of the sequence ofstates produced by M op

A :Comp(P , σ, t) = nth(M op

A (P , σ), t)

Given an operational semantics for a program S, we can easily define its input-output semantics by extracting the final state (if it exists) from the execution trace:

M^io_A(S)(σ0) = σn   if M^op_A(S)(σ0) = σ0, σ1, ..., σn;
                ⊥    if M^op_A(S)(σ0) = σ0, σ1, ..., σn, ... .


18.1.1 Execution Traces

Given a model State(A) of states storing values from some data type A, we define the set

Seq(State(A))

of execution traces to be all possible finite sequences of the form:

σ0, σ1, ..., σn

and all possible infinite sequences of the form:

σ0, σ1, ..., σn, ...

Thus, the set

Seq(State(A)) = {σ0, σ1, ..., σn | σi ∈ State(A)} ∪ {σ0, σ1, ..., σn, ... | σi ∈ State(A)}

of execution traces is the set of all possible finite and infinite sequences of states.

Extracting Particular States

We define a projection function

nth : Seq(State(A)) × N → State⊥(A)

on state sequences such that nth(σ, i) extracts the i-th element:

nth((σ0, σ1, ..., σn), i) = σi   if i ≤ n;
                            ⊥    otherwise.

nth((σ0, σ1, ..., σn, ...), i) = σi.
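With traces as iterators, nth is a one-liner via itertools.islice; None again plays the role of ⊥ for an index past the end of a finite trace. A sketch:

```python
from itertools import count, islice

def nth(trace, i):
    """nth(sigma, i): the i-th state of a finite or infinite trace, or None if absent."""
    return next(islice(trace, i, i + 1), None)

assert nth(iter(['s0', 's1', 's2']), 1) == 's1'
assert nth(iter(['s0']), 3) is None
assert nth(count(0), 5) == 5   # works on an infinite trace too
```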

We define a function

Final : Seq(State(A)) → State⊥(A)

so that Final(σ) gives the final state of an execution trace σ if it exists:

Final(σ0, ..., σn) = σn

and otherwise, we flag its non-existence with the element ⊥ ∈ State⊥(A):

Final(σ0, ..., σn, ...) = ⊥.

Thus, we can define the relationship between input-output semantics and operational semantics:

M^io_A(S)(σ) = Final(M^op_A(S)(σ)).


Concatenating Execution Traces

In order to manipulate execution traces, we introduce a concatenation operator

⌢ : Seq(State(A)) × Seq(State(A)) → Seq(State(A))

on execution traces. The concatenation of finite execution traces

σ0, ..., σm

and

σ′0, ..., σ′n

is, as we would expect, simply

(σ0, ..., σm) ⌢ (σ′0, ..., σ′n) = σ0, ..., σm, σ′0, ..., σ′n.

We have to exercise a little care though when infinite execution traces are involved. If the first execution trace is finite:

σ0, ..., σm

but the second is infinite,

σ′0, ..., σ′n, ...

the execution trace that results will be infinite:

(σ0, ..., σm) ⌢ (σ′0, ..., σ′n, ...) = σ0, ..., σm, σ′0, ..., σ′n, ...

If the first execution trace is infinite:

σ0, ..., σm, ...

then the second execution trace will be immaterial. If the second trace is finite

σ′0, ..., σ′n

the execution trace that results from concatenation is simply the first:

(σ0, ..., σm, ...) ⌢ (σ′0, ..., σ′n) = σ0, ..., σm, ...

and if the second trace is infinite

σ′0, ..., σ′n, ...

the execution trace that results from concatenation is similarly just the first:

(σ0, ..., σm, ...) ⌢ (σ′0, ..., σ′n, ...) = σ0, ..., σm, ...

Thus, for any computation traces σ, σ′ ∈ Seq(State(A)), we define the concatenation

σ ⌢ σ′

of execution traces by:

σ ⌢ σ′ = σ                  if σ = σ0, ..., σm, ...;
         σ0, ..., σm, σ′    if σ = σ0, ..., σm.


Algebra of Execution Traces

algebra     Execution Traces

import      State⊥(A), N

carriers    Seq(State(A)) = {σ0, σ1, ..., σn | σi ∈ State(A)}
                          ∪ {σ0, σ1, ..., σn, ... | σi ∈ State(A)}

constants

operations  nth : Seq(State(A)) × N → State⊥(A)
            Final : Seq(State(A)) → State⊥(A)
            ⌢ : Seq(State(A)) × Seq(State(A)) → Seq(State(A))

definitions

nth((σ0, σ1, ..., σn), i) = σi   if i ≤ n;
                            ⊥    otherwise.
nth((σ0, σ1, ..., σn, ...), i) = σi

Final(σ0, ..., σn) = σn
Final(σ0, ..., σn, ...) = ⊥

σ ⌢ σ′ = σ                  if σ = σ0, ..., σm, ...;
         σ0, ..., σm, σ′    if σ = σ0, ..., σm.

18.1.2 Operational Semantics of while programs

We define the operational semantics of while programs by induction on their syntax.

Base Case

As is the case for all the execution traces, the first state in the sequence is the initial state. For the atomic programs, the second and final state is that given by the input-output semantics of Chapter 16.

Identity Executing a skip statement on a state σ does not affect the state σ:

M^op_A(skip)(σ) = σ ⌢ σ

Assignment Executing an assignment statement x := e on a state σ just updates the value stored in the variable x with the result V_A(e)(σ) of evaluating the expression e on the state σ:

M^op_A(x := e)(σ) = σ ⌢ σ[V_A(e)(σ)/x].

Induction Step

We suppose that the execution traces M^op_A(S0), M^op_A(S1) and M^op_A(S2) are defined for all states.


Sequencing We define the operational semantics

M^op_A(S1; S2)

of a sequenced statement S1; S2 on an initial state σ by concatenating:

• the execution trace M^op_A(S1)(σ) produced by the execution of S1 on the initial state; and

• the execution trace produced by executing S2 on the final state

Final(M^op_A(S1)(σ))

(if it exists) produced by executing S1 on the initial state σ.

Thus,

M^op_A(S1; S2)(σ) = M^op_A(S1)(σ) ⌢ M^op_A(S2)(Final(M^op_A(S1)(σ))).

So, if S1 terminates, it produces a finite sequence of states and the sequencing of S1 and S2 produces first the state sequence from executing S1, then the state sequence from executing S2. If S1 does not terminate, then it produces an infinite sequence of states and we do not get to observe the behaviour of S2.

Conditional We define the operational semantics

M^op_A(if b then S1 else S2 fi)(σ)

of a conditional statement if b then S1 else S2 fi on an initial state σ by cases.

If the evaluation W_A(b)(σ) of the test b on the initial state σ is true, then we define the execution trace M^op_A(if b then S1 else S2 fi)(σ) by concatenating:

• the initial state σ; and

• the execution trace M^op_A(S1)(σ) produced by executing the then-statement S1 on the initial state σ.

If, however, the evaluation W_A(b)(σ) of the test b on the initial state σ is false, then we define the execution trace M^op_A(if b then S1 else S2 fi)(σ) by concatenating:

• the initial state σ; and

• the execution trace M^op_A(S2)(σ) produced by executing the else-statement S2 on the initial state σ.

Thus,

M^op_A(if b then S1 else S2 fi)(σ) = σ ⌢ M^op_A(S1)(σ)   if W_A(b)(σ) = tt;
                                     σ ⌢ M^op_A(S2)(σ)   if W_A(b)(σ) = ff.


Iteration We define the operational semantics

M^op_A(while b do S0 od)(σ)

of a while statement while b do S0 od on an initial state σ in the same manner as we defined the input-output semantics of while statements in Chapter 16, but we retain all the intermediate states.

We concatenate:

(i) the initial state σ0 = σ; and

(ii) the execution trace

M^op_A(S0)(σ0)

produced by executing the body S0 on the state σ0, if the evaluation

W_A(b)(σ0)

of the test b on the state σ0 is true;

(iii) the execution trace

M^op_A(S0)(Final(M^op_A(S0)(σ0)))

produced by executing the body S0 on the final state

Final(M^op_A(S0)(σ0))

(if it exists) that results from stage (ii), if the evaluation of the test in stage (ii) is true and the evaluation

W_A(b)(Final(M^op_A(S0)(σ0)))

of the test b on that final state is also true;

(iv) and so on, whilst the evaluation of the test b at each stage remains true.

Thus, writing T_0 = σ0 for the one-state trace and T_{i+1} = M^op_A(S0)(Final(T_i)) for the trace of the (i+1)-st execution of the body,

M^op_A(while b do S0 od)(σ0) =
  σ0 ⌢ σ0                       if W_A(b)(σ0) = ff;
  T_0 ⌢ T_1 ⌢ ... ⌢ T_n         if ∃n ≥ 1, ∀i : 0 ≤ i ≤ n − 1 [W_A(b)(Final(T_i)) = tt]
                                and W_A(b)(Final(T_n)) = ff;
  T_0 ⌢ T_1 ⌢ ... ⌢ T_n ⌢ ...   if ∀i ≥ 0 [W_A(b)(Final(T_i)) = tt].

Note that we can get an infinite sequence of states from executing a while statement in one of two ways. The first possibility is that the execution of the body S0 of the loop does not terminate at some point in the execution. The second possibility is that each iteration of the body S0 terminates, but the loop itself does not terminate because the test at each stage is true. In the first case we do not observe any subsequent behaviour of the loop after the first point that the infinite body execution occurs.


Summary This gives us the following operational semantics for while programs:

M^op_A(skip)(σ) = σ σ

M^op_A(x:=e)(σ) = σ σ[VA(e)(σ)/x]

M^op_A(S1;S2)(σ) = M^op_A(S1)(σ) M^op_A(S2)(Final(M^op_A(S1)(σ)))

M^op_A(if b then S1 else S2 fi)(σ) =
    σ M^op_A(S1)(σ)   if WA(b)(σ) = tt;
    σ M^op_A(S2)(σ)   if WA(b)(σ) = ff.

M^op_A(while b do S0 od)(σ0) =

    σ0 σ0           if WA(b)(σ0) = ff;

    σ̄0 ··· σ̄n       if σ̄0 = σ0, ∃n ≥ 1, ∀i: 0 ≤ i ≤ n − 1
                    [WA(b)(Final(σ̄i)) = tt and M^op_A(S0)(Final(σ̄i)) = σ̄i+1],
                    and WA(b)(Final(σ̄n)) = ff;

    σ̄0 ··· σ̄n ···   if σ̄0 = σ0 and ∀i ≥ 0
                    [WA(b)(Final(σ̄i)) = tt and M^op_A(S0)(Final(σ̄i)) = σ̄i+1].
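The summary equations above can be run mechanically. The following is a sketch, not from the text: the trace semantics M^op_A as a small Python interpreter. States are dicts, and programs use a hypothetical tuple encoding invented here: ('skip',), ('assign', x, e), ('seq', S1, S2), ('if', b, S1, S2), ('while', b, S0), where expressions e and tests b are Python functions on states.

```python
# A sketch of the execution-trace semantics M^op_A, following the summary
# equations literally (so junction states are duplicated by sequencing).

def trace(S, state):
    """Return the finite execution trace of S on state, as a list of states."""
    tag = S[0]
    if tag == 'skip':                      # M^op_A(skip)(s) = s s
        return [dict(state), dict(state)]
    if tag == 'assign':                    # M^op_A(x:=e)(s) = s s[VA(e)(s)/x]
        _, x, e = S
        new = dict(state); new[x] = e(state)
        return [dict(state), new]
    if tag == 'seq':                       # trace of S1, then trace of S2 on
        _, S1, S2 = S                      # the final state of S1's trace
        t1 = trace(S1, state)
        return t1 + trace(S2, t1[-1])
    if tag == 'if':                        # prepend s, then trace the branch
        _, b, S1, S2 = S
        return [dict(state)] + trace(S1 if b(state) else S2, state)
    if tag == 'while':                     # iterate the body while b holds
        _, b, S0 = S
        if not b(state):
            return [dict(state), dict(state)]   # the "s0 s0" case
        t = [dict(state)]
        while b(t[-1]):
            t = t + trace(S0, t[-1])
        return t
    raise ValueError(f'unknown statement {tag}')

# Euclid's algorithm, as in the running example:
euclid = ('seq', ('assign', 'z', lambda s: s['x'] % s['y']),
                 ('while', lambda s: s['z'] != 0,
                  ('seq', ('assign', 'x', lambda s: s['y']),
                   ('seq', ('assign', 'y', lambda s: s['z']),
                          ('assign', 'z', lambda s: s['x'] % s['y'])))))

t = trace(euclid, {'x': 45, 'y': 12, 'z': 0})
print(t[-1])   # final state: {'x': 9, 'y': 3, 'z': 0}
```

Only terminating programs are handled: a non-terminating loop would make `trace` itself diverge, mirroring the infinite traces of the definition.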

Example Consider the program E from Section 17.4.1 to compute Euclid's algorithm. Suppose we execute E on the state σ0 where σ0(x) = 45 and σ0(y) = 12. Then the execution trace for E on σ0 is constructed as follows.

M^op_A(z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od)(σ0)
    = M^op_A(z:=(x mod y))(σ0)
      M^op_A(while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od)(Final(M^op_A(z:=(x mod y))(σ0)))
          by the definition of M^op_A on sequenced statements.

Let us consider the execution trace σ̄0 of the assignment:

σ̄0 = M^op_A(z:=(x mod y))(σ0)
    = σ0, σ0[VA((x mod y))(σ0)/z]
          by the definition of M^op_A on assignments
    = σ0, σ0[9/z]
          by the definition of VA.

Now let us consider the execution trace of the while loop. Let

σ1 = Final(σ̄0) = σ0[9/z].


Then, by the definition of M^op_A on iteration,

M^op_A(while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od)(σ1) = σ̄1 σ̄2 σ̄3

where the execution traces are

σ̄1 = σ1
σ̄2 = M^op_A(x:=y; y:=z; z:=(x mod y))(Final(σ̄1))
σ̄3 = M^op_A(x:=y; y:=z; z:=(x mod y))(Final(σ̄2)).

The execution trace σ̄2 simplifies to:

M^op_A(x:=y; y:=z; z:=(x mod y))(Final(σ̄1))
    = M^op_A(x:=y; y:=z; z:=(x mod y))(σ1)
          by definition of σ̄1 and Final
    = M^op_A(x:=y)(σ1) M^op_A(y:=z; z:=(x mod y))(Final(M^op_A(x:=y)(σ1)))
    = M^op_A(x:=y)(σ1)
      M^op_A(y:=z)(Final(M^op_A(x:=y)(σ1)))
      M^op_A(z:=(x mod y))(Final(M^op_A(y:=z)(Final(M^op_A(x:=y)(σ1)))))
          by definition of M^op_A on sequenced statements.

Taking the semantics of x:=y:

M^op_A(x:=y)(σ1) = σ1, σ1[VA(y)(σ1)/x]
          by the definition of M^op_A on assignments
    = σ1, σ1[12/x]
          by the definition of VA.

The semantics of y:=z:

M^op_A(y:=z)(Final(M^op_A(x:=y)(σ1))) = M^op_A(y:=z)(Final(σ1, σ1[12/x]))
    = M^op_A(y:=z)(σ1[12/x])
    = σ1[12/x], σ1[12/x][VA(z)(σ1[12/x])/y]
          by the definition of M^op_A on assignments
    = σ1[12/x], σ1[12/x][9/y]
          by the definition of VA.

The semantics of z:=(x mod y):

M^op_A(z:=(x mod y))(Final(M^op_A(y:=z)(Final(M^op_A(x:=y)(σ1)))))
    = M^op_A(z:=(x mod y))(Final(σ1[12/x], σ1[12/x][9/y]))
    = M^op_A(z:=(x mod y))(σ1[12/x][9/y])
    = σ1[12/x][9/y], σ1[12/x][9/y][VA(x mod y)(σ1[12/x][9/y])/z]
          by the definition of M^op_A on assignments
    = σ1[12/x][9/y], σ1[12/x][9/y][3/z]
          by the definition of VA.

Thus,

σ̄2 = σ1[12/x], σ1[12/x][9/y], σ1[12/x][9/y][3/z]
    = σ0[9/z][12/x], σ0[9/z][12/x][9/y], σ0[9/z][12/x][9/y][3/z]
    = σ0[9/z][12/x], σ0[9/z][12/x][9/y], σ0[12/x][9/y][3/z]

by definition of σ1 and substitution on states; and similarly, writing σ2 = Final(σ̄2) = σ0[12/x][9/y][3/z],

σ̄3 = σ2[9/x], σ2[9/x][3/y], σ2[9/x][3/y][0/z]
    = σ0[9/x][9/y][3/z], σ0[9/x][3/y][3/z], σ0[9/x][3/y][0/z].

Note that the test z ≠ 0 evaluates to:

WA(z ≠ 0)(Final(σ̄1)) = WA(z ≠ 0)(σ0[9/z])
    = tt

on the state Final(σ̄1),

WA(z ≠ 0)(Final(σ̄2))
    = WA(z ≠ 0)(Final(σ0[9/z][12/x], σ0[9/z][12/x][9/y], σ0[12/x][9/y][3/z]))
    = WA(z ≠ 0)(σ0[12/x][9/y][3/z])
    = tt

on the state Final(σ̄2), and

WA(z ≠ 0)(Final(σ̄3))
    = WA(z ≠ 0)(Final(σ0[9/x][9/y][3/z], σ0[9/x][3/y][3/z], σ0[9/x][3/y][0/z]))
    = WA(z ≠ 0)(σ0[9/x][3/y][0/z])
    = ff

on the state Final(σ̄3).

Thus, the execution trace for the whole algorithm is:

M^op_A(z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od)(σ0)
    = σ̄0 σ̄1 σ̄2 σ̄3
    = σ0, σ0[9/z],
      σ0[9/z], σ0[9/z],
      σ0[9/z][12/x], σ0[9/z][12/x][9/y], σ0[12/x][9/y][3/z],
      σ0[9/x][9/y][3/z], σ0[9/x][3/y][3/z], σ0[9/x][3/y][0/z]

which we illustrate in Figure 18.1.


Trace   Evaluate   Result     Comment         Execute       x   y   z
σ̄0                            Initial State                 45  12   ?
                                              z:=x mod y    45  12   9
σ̄1      z ≠ 0      Succeeds   Enter loop                    45  12   9
σ̄2                                            x:=y          12  12   9
                                              y:=z          12   9   9
                                              z:=x mod y    12   9   3
        z ≠ 0      Succeeds   Re-enter loop                 12   9   3
σ̄3                                            x:=y           9   9   3
                                              y:=z           9   3   3
                                              z:=x mod y     9   3   0
        z ≠ 0      Fails      Exit loop                      9   3   0

Figure 18.1: Execution trace for Euclid's algorithm on initial state σ0 where σ0(x) = 45 and σ0(y) = 12.

18.2 Structural Operational Semantics

To appear: general introduction.

Thus, we can determine the state Comp(P, σ, t) produced at time t by executing program P on state σ by extracting the t th element of the sequence of states produced by M^SOS_A:

Comp(P, σ, t) = nth(M^SOS_A(P, σ), t).

18.2.1 General approach

Configuration = (Prog(Σ) × State(A)) ∪ State(A)

A configuration γ ∈ Configuration represents a program execution in progress, or one that has terminated.

→_SOS ⊆ Configuration × Configuration

(S, σ) →_SOS γ

There are two possibilities for the configuration γ.

(S, σ) →_SOS (S′, σ′)

The first stage in executing the program S on the state σ produces the state σ′, and the program S′ remains to be executed.

(S, σ) →_SOS σ′

The program S terminates on the state σ in a single stage, producing the state σ′.

A derivation sequence of a program S on an initial state σ0 is a finite

γ0, γ1, ..., γn

or infinite

γ0, γ1, ..., γn, ...

sequence of configurations, where the initial configuration γ0 = (S, σ0), and each subsequent configuration γi+1 is derived from the previous configuration γi by the transition relation →_SOS.

In the case of an infinite derivation sequence, γi →_SOS γi+1 for i ≥ 0.

In the case of a finite derivation sequence, γi →_SOS γi+1 for 0 ≤ i ≤ n − 1, and there is no γn+1 ∈ Configuration such that γn →_SOS γn+1. This may be deliberate, because a program has been completely executed and the configuration is just a state γn ∈ State. However, it may also be because there are simply no rules to describe how a configuration may progress.

18.2.2 Structural Operational Semantics of while Programs

(skip, σ) →_SOS σ

(x:=e, σ) →_SOS σ[VA(e)(σ)/x]

        (S1, σ) →_SOS σ′
---------------------------------
  (S1;S2, σ) →_SOS (S2, σ′)

      (S1, σ) →_SOS (S1′, σ′)
---------------------------------
  (S1;S2, σ) →_SOS (S1′;S2, σ′)

(if b then S1 else S2 fi, σ) →_SOS
    (S1, σ)   if WA(b)(σ) = tt;
    (S2, σ)   if WA(b)(σ) = ff.

(while b do S0 od, σ) →_SOS (if b then S0; while b do S0 od else skip fi, σ)
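The transition rules above can be turned directly into a one-step interpreter. The following is a sketch, not from the text, using a hypothetical tuple encoding of while programs invented for illustration: ('skip',), ('assign', x, e), ('seq', S1, S2), ('if', b, S1, S2), ('while', b, S0), with tests b and expressions e as Python functions on state dicts. A configuration is a pair (S, state) while execution is in progress, or a bare state dict once it has terminated.

```python
# One SOS transition ->_SOS, plus the derivation sequence it generates.

def step(S, state):
    """One SOS transition from the configuration (S, state)."""
    tag = S[0]
    if tag == 'skip':                      # (skip, s) ->_SOS s
        return dict(state)
    if tag == 'assign':                    # (x:=e, s) ->_SOS s[VA(e)(s)/x]
        _, x, e = S
        new = dict(state); new[x] = e(state)
        return new
    if tag == 'seq':                       # the two sequencing rules
        _, S1, S2 = S
        result = step(S1, state)
        if isinstance(result, dict):       # S1 finished in one step
            return (S2, result)
        S1p, sp = result                   # S1 stepped to (S1', s')
        return (('seq', S1p, S2), sp)
    if tag == 'if':                        # choose a branch by the test
        _, b, S1, S2 = S
        return (S1 if b(state) else S2, state)
    if tag == 'while':                     # unfold into a conditional
        _, b, S0 = S
        return (('if', b, ('seq', S0, S), ('skip',)), state)
    raise ValueError(tag)

def derivation(S, state, limit=1000):
    """The derivation sequence of configurations, up to `limit` steps."""
    gamma = (S, state)
    seq = [gamma]
    while isinstance(gamma, tuple) and len(seq) <= limit:
        gamma = step(*gamma)
        seq.append(gamma)
    return seq

# Euclid's algorithm in this encoding:
euclid = ('seq', ('assign', 'z', lambda s: s['x'] % s['y']),
                 ('while', lambda s: s['z'] != 0,
                  ('seq', ('assign', 'x', lambda s: s['y']),
                   ('seq', ('assign', 'y', lambda s: s['z']),
                          ('assign', 'z', lambda s: s['x'] % s['y'])))))

d = derivation(euclid, {'x': 45, 'y': 12, 'z': 0})
print(d[-1])   # terminal state: {'x': 9, 'y': 3, 'z': 0}
```

A derivation sequence ends in a bare state exactly when the program has been completely executed; the `limit` parameter guards the sketch against non-terminating programs.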

Equivalent Axiomatic Definition


M^SOS_A(skip, σ) = σ σ

M^SOS_A(x:=e, σ) = σ σ[VA(e)(σ)/x]

M^SOS_A(S1;S2, σ) =
    σ M^SOS_A(S2, σ′)       if M^SOS_A(S1, σ) = σ σ′;
    σ M^SOS_A(S1′;S2, σ′)   if M^SOS_A(S1, σ) = σ M^SOS_A(S1′, σ′).

M^SOS_A(if b then S1 else S2 fi, σ) =
    M^SOS_A(S1, σ)   if WA(b)(σ) = tt;
    M^SOS_A(S2, σ)   if WA(b)(σ) = ff.

M^SOS_A(while b do S0 od, σ) = M^SOS_A(if b then S0; while b do S0 od else skip fi, σ)

Example Consider the program E from Section 17.4.1 to compute Euclid's algorithm. Suppose we execute E on the state σ0 where σ0(x) = 45 and σ0(y) = 12, using structural operational semantics.

The first statement we execute is the first of the sequence of statements that constitute E. Furthermore, as this is an assignment statement z:=(x mod y), we can completely execute it in a single stage. Thus:

          (z:=(x mod y), σ0) →_SOS σ0[9/z]
-----------------------------------------------------
(z:=(x mod y); Swhile, σ0) →_SOS (Swhile, σ0[9/z]).

So after one time cycle, we have executed the statement z:=(x mod y) on the initial state σ0, producing the state

σ1 = σ0[9/z]

where we have the values:

      x   y   z
σ1   45  12   9

We also have the program Swhile still to execute on the state σ1.

We now consider the execution of Swhile on the state σ1. Using the transition rules for structural operational semantics, we express the semantics of the while loop using that of a conditional statement.

(Swhile, σ1) →_SOS (if z ≠ 0 then Sbody; Swhile else skip fi, σ1)

So now, using the semantics of conditional statements, as WA(z ≠ 0)(σ1) = tt, we get:

(if z ≠ 0 then Sbody; Swhile else skip fi, σ1) →_SOS (Sbody; Swhile, σ1)

We now have a sequence formed from the body of the while loop, followed by the whole of the while loop. As the body consists of a sequence of statements, it will be the case that there exists some program S′ and some state σ2 such that

        (Sbody, σ1) →_SOS (S′, σ2)
---------------------------------------------
(Sbody; Swhile, σ1) →_SOS (S′; Swhile, σ2)


So, let us consider just Sbody to determine the state σ2 from executing the first stage of Sbody. In so doing, we shall also determine the remaining program S′ of Sbody that we shall need to execute in order to fully execute Sbody:

          (x:=y, σ1) →_SOS σ1[12/x]
------------------------------------------------
(Sbody, σ1) →_SOS (y:=z; z:=(x mod y), σ1[12/x]).

Thus, we generate the state

σ2 = σ1[12/x]

at step 1 of the execution of Sbody, and we have the remaining program

y:=z; z:=(x mod y)

of Sbody to execute. Now, substituting back into our original problem, we have:

          (x:=y, σ1) →_SOS σ1[12/x]
------------------------------------------------------------------------------
(x:=y; y:=z; z:=(x mod y); Swhile, σ1) →_SOS (y:=z; z:=(x mod y); Swhile, σ1[12/x])

I.e., we have the state

σ2 = σ1[12/x]

where

      x   y   z
σ1   45  12   9
σ2   12  12   9

at step 2 of the execution of the whole program E, and we have the remaining program

y:=z; z:=(x mod y); Swhile

to execute.

It is a similar story with the configuration

(y:=z; z:=(x mod y); Swhile, σ2)

As we have a program formed by sequencing an assignment statement y:=z with another statement z:=(x mod y); Swhile, we have:

          (y:=z, σ2) →_SOS σ2[9/y]
---------------------------------------------------------------------
(y:=z; z:=(x mod y); Swhile, σ2) →_SOS (z:=(x mod y); Swhile, σ2[9/y])

This gives us our third execution state

σ3 = σ2[9/y]

where

      x   y   z
σ1   45  12   9
σ2   12  12   9
σ3   12   9   9


and the remaining program

z:=(x mod y); Swhile

to execute.

Again, we have a program formed by sequencing an assignment statement z:=(x mod y) with another statement Swhile:

          (z:=(x mod y), σ3) →_SOS σ3[3/z]
---------------------------------------------------
(z:=(x mod y); Swhile, σ3) →_SOS (Swhile, σ3[3/z])

This gives us our fourth execution state

σ4 = σ3[3/z]

where

      x   y   z
σ1   45  12   9
σ2   12  12   9
σ3   12   9   9
σ4   12   9   3

and the remaining program

Swhile

to execute.

Now, having executed the whole of the body Sbody of the while loop Swhile once, we are at the stage of having Swhile to deal with again.

(Swhile, σ4) →_SOS (if z ≠ 0 then Sbody; Swhile else skip fi, σ4)

Again, the test is true:

WA(z ≠ 0)(σ4) = tt,

so we execute the body of the while loop once more. By the same reasoning as for the first execution of Sbody; Swhile, we produce the configurations

(y:=z; z:=(x mod y); Swhile, σ5), (z:=(x mod y); Swhile, σ6), (Swhile, σ7)

where the state produced at time 5 is:

σ5 = σ4[9/x]

at time 6:

σ6 = σ5[3/y]

and at time 7:

σ7 = σ6[0/z]


where

      x   y   z
σ1   45  12   9
σ2   12  12   9
σ3   12   9   9
σ4   12   9   3
σ5    9   9   3
σ6    9   3   3
σ7    9   3   0

Now we are back at Swhile again. This time, though, the test is false:

WA(z ≠ 0)(σ7) = ff.

Consequently, we exit from the while loop:

(if z ≠ 0 then Sbody; Swhile else skip fi, σ7) →_SOS (skip, σ7)

And at this point, our program execution terminates:

(skip, σ7) →_SOS σ7

Thus, the overall sequence of configurations that we produce is shown in Figure 18.2.

Time   Remaining Program                                                  x   y   z
0      z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od          45  12   ?
1      while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od                        45  12   9
2      y:=z; z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od    12  12   9
3      z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od          12   9   9
4      while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od                        12   9   3
5      y:=z; z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od     9   9   3
6      z:=(x mod y); while z ≠ 0 do x:=y; y:=z; z:=(x mod y) od           9   3   3
7                                                                         9   3   0

Figure 18.2: Derivation trace for Euclid's algorithm using structural operational semantics on an initial state σ0 where σ0(x) = 45 and σ0(y) = 12.

18.3 Algebraic Operational Semantics

We can describe the operational semantics of a language without recourse to infinite sequences of states by describing how each state in the execution trace is generated from the previous state.

We define the operational semantics function

Comp : Prog(Σ) × State(A) × Time → State(A)


[Figure: the semantics of the full language L is determined by the semantics of a kernel of L.]

Figure 18.3: Semantic kernel of a language.

by considering how an execution trace can be constructed in a step-by-step manner. In particular, we suppose that programs are built from a set of atomic programs

AProg(Σ) ⊆ Prog(Σ)

and flow-of-control constructors that determine in what order the atomic programs are executed. We need a function

Act : AProg(Σ) × State(A) → State(A)

that determines the semantics Act(α, σ) of any atomic program α on any state σ.

Our aim is to produce a finite sequence

α0, α1, ..., αt

or infinite sequence

α0, α1, ..., αt, ...

of atomic programs, such that the execution

Act(αt, σt)

of the program αt on the state σt gives the next state σt+1 in the finite execution trace

σ0 = Comp(P, σ, 0)
σ1 = Comp(P, σ, 1) = Act(α0, σ0)
σ2 = Comp(P, σ, 2) = Act(α1, σ1)
...
σt = Comp(P, σ, t) = Act(αt−1, σt−1)

or its infinite counterpart.

We shall suppose that AProg(Σ) contains the identity program skip, whose execution has no effect on the values stored in a state:

Act(skip, σ) = σ.

18.3.1 Producing Execution Traces

We define the function Comp by induction on time.


Time t = 0

At time 0, we simply want to return the initial state of the computation. So, for any P ∈ Prog(Σ) and σ ∈ State(A), we define

Comp(P, σ, 0) = σ.

Time t = 1

At time 1, we want to find the first atomic instruction

First(P, σ)

of the program P on the state σ and then execute it on the state σ:

Act(First(P, σ), σ).

So, for any P ∈ Prog(Σ) and σ ∈ State(A), we define

Comp(P, σ, 1) = Act(First(P, σ), σ).

Induction Case

Suppose we have an atomic program to execute. Then we know that this takes only one step of time to execute. Thus, at times 2, 3, ..., we can have no further work to do. We denote that we have finished executing a program with a distinguished state ∗, which we presume is contained in the set State(A) of all possible computation states. (This state allows us to model a finite computation σ0, σ1, ..., σn as an infinite sequence of states σ0, σ1, ..., σn, ∗, ∗, ....) Thus, if we have an atomic program α ∈ AProg(Σ), we define, for any σ and t ≥ 1,

Comp(α, σ, t + 1) = ∗.

Otherwise, if we have a non-atomic program P ∈ Prog(Σ) with P ∉ AProg(Σ), we shall split the computation into two stages. In the first step of time, we shall produce the state

Comp(P, σ, 1)

that results at time 1. We shall use this state as the new initial state to execute the remaining instructions

Rest(P, σ)

of the program in the remaining t cycles of time. Thus, for any non-atomic program P, state σ and t ≥ 1, we define

Comp(P, σ, t + 1) = Comp(Rest(P, σ), Comp(P, σ, 1), t).

To summarise, we define the function Comp for any P ∈ Prog(Σ), σ ∈ State(A) and t ∈ Time by:

Comp(P, σ, 0) = σ
Comp(P, σ, 1) = Act(First(P, σ), σ)
∀t ≥ 1: Comp(P, σ, t + 1) =
    ∗                                    if P is atomic;
    Comp(Rest(P, σ), Comp(P, σ, 1), t)   if P is not atomic.


18.3.2 Deconstructing Syntax

As we have seen, the functions First and Rest play a crucial part in the definition of the function Comp. We use these functions to split a program into its constituent parts.

Thus, on an atomic program α ∈ AProg(Σ) ⊆ Prog(Σ),

First(α, σ) = α
Rest(α, σ) = skip.

On compound programs, First(P, σ) gives the first atomic command of the program P that we execute to simulate one cycle of time in the execution of P on the state σ, and Rest(P, σ) gives the remaining instructions of P that we have to consider to determine any future behaviour of the program P after the first cycle of time.

18.3.3 Algebraic Operational Semantics of while programs

[Figure: the semantics of the full language WP(Σ) is determined by the semantics of a kernel of WP(Σ).]

Figure 18.4: Semantic kernel of while programs.

Atomic Programs

We first note that the while programs have two atomic commands, namely

skip and x:=e;

every other while program command depends on some other sub-command(s). Thus, we can define the semantics of any while program in terms of the effect of some sequence of identity and assignment statements:

First(skip, σ) = skip
Rest(skip, σ) = skip
Act(skip, σ) = σ

First(x:=e, σ) = x:=e
Rest(x:=e, σ) = skip
Act(x:=e, σ) = σ[VA(e)(σ)/x].

We now proceed by induction on the structure of commands.


Sequencing

When two commands S1 and S2 are sequenced together, the first atomic statement that we should execute is the first of S1. If S1 consists only of this atomic command, then the rest of the program that we need to consider after executing S1 is simply S2. Alternatively, if S1 is itself a compound statement, then after executing the first atomic command of S1, we need to consider the rest of the command S1 and then the whole of S2. Thus,

First(S1;S2, σ) = First(S1, σ)
Rest(S1;S2, σ) =
    S2               if S1 is atomic;
    Rest(S1, σ);S2   if S1 is not atomic.

Conditionals

The command if b then S1 else S2 fi requires the evaluation of the Boolean expression b in order to determine whether S1 or S2 should be executed. We shall consider that any such evaluation requires one cycle of time. Accordingly, we define

First(if b then S1 else S2 fi, σ) = skip
Rest(if b then S1 else S2 fi, σ) =
    S1   if WA(b)(σ) = tt;
    S2   if WA(b)(σ) = ff.

as evaluating the test b requires one time cycle and does not alter the state; this is equivalent to executing the skip statement.

Iteration

In the case that we have a while loop

while b do S0 od

we first evaluate the test b to determine whether we should execute the body of the loop or not. Just as for the Boolean test in conditional statements, we shall consider that we require one step of time to perform any such test, and we shall model it by executing the program skip.

If the test evaluates to true on the initial state, we first execute the body S0 of the loop; then we execute the while loop again. Otherwise, if the test evaluates to false, we exit from the while loop, which we model by executing the program skip. Thus,

First(while b do S0 od, σ) = skip
Rest(while b do S0 od, σ) =
    S0; while b do S0 od   if WA(b)(σ) = tt;
    skip                   if WA(b)(σ) = ff.

To summarise, the algebraic operational semantics of the while language is given by combining the algebraic definition of Comp in Section 18.3.1 with the following algebraic definition of First and Rest:


First(skip, σ) = skip
Rest(skip, σ) = skip

First(x:=e, σ) = x:=e
Rest(x:=e, σ) = skip

First(S1;S2, σ) = First(S1, σ)
Rest(S1;S2, σ) =
    S2               if S1 ∈ AProg(Σ);
    Rest(S1, σ);S2   if S1 ∉ AProg(Σ).

First(if b then S1 else S2 fi, σ) = skip
Rest(if b then S1 else S2 fi, σ) =
    S1   if WA(b)(σ) = tt;
    S2   if WA(b)(σ) = ff.

First(while b do S0 od, σ) = skip
Rest(while b do S0 od, σ) =
    S0; while b do S0 od   if WA(b)(σ) = tt;
    skip                   if WA(b)(σ) = ff.
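These clauses, together with the clock-driven definition of Comp, can be rendered as executable functions. The following is a sketch, not from the text: First, Rest, Act and Comp in Python over a hypothetical tuple encoding of while programs invented for illustration: ('skip',), ('assign', x, e), ('seq', S1, S2), ('if', b, S1, S2), ('while', b, S0), with expressions and tests as functions on state dicts, and the distinguished finished state ∗ modelled by a sentinel.

```python
# Algebraic operational semantics: Comp defined from First, Rest and Act.
STAR = '*'                              # the distinguished "finished" state

def atomic(S):
    return S[0] in ('skip', 'assign')

def act(S, state):
    """Act: execute an atomic program in one time cycle."""
    if S[0] == 'skip':
        return dict(state)
    _, x, e = S                         # assignment x:=e
    new = dict(state); new[x] = e(state)
    return new

def first(S, state):
    if atomic(S):
        return S
    if S[0] == 'seq':
        return first(S[1], state)
    return ('skip',)                    # if/while: the test costs one skip-cycle

def rest(S, state):
    tag = S[0]
    if atomic(S):
        return ('skip',)
    if tag == 'seq':
        _, S1, S2 = S
        return S2 if atomic(S1) else ('seq', rest(S1, state), S2)
    if tag == 'if':
        _, b, S1, S2 = S
        return S1 if b(state) else S2
    _, b, S0 = S                        # while b do S0 od
    return ('seq', S0, S) if b(state) else ('skip',)

def comp(P, state, t):
    """Comp(P, s, t): the state at time t, by induction on time."""
    if t == 0:
        return state
    if t == 1:
        return act(first(P, state), state)
    if atomic(P):
        return STAR
    return comp(rest(P, state), comp(P, state, 1), t - 1)

# Euclid's algorithm in this encoding:
euclid = ('seq', ('assign', 'z', lambda s: s['x'] % s['y']),
                 ('while', lambda s: s['z'] != 0,
                  ('seq', ('assign', 'x', lambda s: s['y']),
                   ('seq', ('assign', 'y', lambda s: s['z']),
                          ('assign', 'z', lambda s: s['x'] % s['y'])))))
```

Running `comp(euclid, {'x': 45, 'y': 12, 'z': 0}, t)` for t = 0, 1, ..., 11 reproduces the atomic-step trace tabulated in the example below, with the skip-cycles modelling the tests.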

Example Consider the program E from Section 17.4.1 to compute Euclid's algorithm. Suppose we execute E on the state σ where σ(x) = 45 and σ(y) = 12. Then we simulate E with the program formed by taking the following sequence of identity and assignment statements:

Time   Atomic Program   Comment                           x   y   z
0                       Initial State                     45  12   ?
1      z:=x mod y                                         45  12   9
2      skip             Model test z ≠ 0                  45  12   9
3      x:=y             Test successful - enter loop      12  12   9
4      y:=z                                               12   9   9
5      z:=x mod y                                         12   9   3
6      skip             Model test z ≠ 0                  12   9   3
7      x:=y             Test successful - re-enter loop    9   9   3
8      y:=z                                                9   3   3
9      z:=x mod y                                          9   3   0
10     skip             Model test z ≠ 0                   9   3   0
11     skip             Test fails - exit loop             9   3   0

18.4 Comparison of Semantics

To appear.


18.5 Program Properties

To appear.


Exercises for Chapter 18


Chapter 19

Virtual Machines

A virtual machine is a model of a computer. As a model, it is an abstraction of a real physical computer, in terms of, for example,

(i) its architecture;

(ii) the data it computes on;

(iii) the control constructs that sequence operations;

(iv) memory structures.

Virtual machines are designed to be implemented in software and run on different processors. A specification of how a program or a process should behave on a virtual machine enables implementations to be built for different processors, running different operating systems and environments. Thus, virtual machines afford implementation independence. The degree of independence is determined by the degree of abstraction provided in the model.

Any such model is a virtual machine. For example, a relatively low-level model of a computer may have descriptions of data buses that transport data between different processing and memory units; such a description may be helpful to a hardware engineer who may be concerned, for example, with maximising the performance of assembly-level instructions. A higher-level model of a computer might be used by a software engineer who may be concerned, for example, with the production of a piece of software written in a high-level language such as Pascal, Prolog or Java.

19.1 Machine Semantics and Operational Semantics

We refine our general model of operational semantics to describe the semantics of low-level programming languages and abstract machines. We call this special case

machine semantics.

Our refinement involves an adaptation of the notion of state. Clearly, at the level of machine semantics, the programming language Prog that we study will be used to control the machine, and the clock Time that enumerates the sequences of states will run at a faster rate than for a high-level language. (We shall explain exactly how we can relate implementations at different levels of abstraction in Chapter 20.)


19.1.1 Generating Machine State Execution Traces

First, we observe that when executing a low-level language, the distinction between programs and states is somewhat blurred: the program is stored within the state, and certain aspects of the state are determined by the program to be executed. Of course, if we were to look very closely at how a high-level language is executed, we would find that this description also holds, yet with high-level languages it is possible and, indeed, desirable to model their execution at a higher level of abstraction.

Suppose we have some set

MachineState

of machine states with a distinguished element

∗ ∈ MachineState

to determine when we have completed a computation.

We extend our model State of states by supposing that we have a function

Load : Prog × State → MachineState

such that

Load(P, σ) = the state of the machine that results from loading the program P ∈ Prog into memory for execution on the state whose initial values are determined by σ ∈ State.

We shall also use a function

Reset : MachineState → State

to clear a program from memory, so that

Reset(Load(P, σ)) = σ

for all P ∈ Prog and σ ∈ State.

Thinking of operational semantics, we expect that executing a machine program P ∈ Prog on an initial state σ0 ∈ State that is stored in an initial machine state τ0 = Load(P, σ0) produces a finite

τ0, τ1, ..., τt

or infinite

τ0, τ1, ..., τt, ...

sequence of machine states.

To define the behaviour of a low-level program P, we shall use a function

Next : MachineState → MachineState

such that

Next(τt) = the next state τt+1 that results from executing the program that is loaded in memory on the state τt.


The machine semantics of a low-level language can be defined via an iterated map

MachineComp : MachineState × Time → MachineState

such that

MachineComp(τ0, t) = the state τt that results from executing the program that is loaded in memory on the state τ0 for t machine cycles of time.

We define MachineComp by induction on time:

MachineComp(τ, 0) = τ
MachineComp(τ, t + 1) = MachineComp(Next(τ), t)

Definition (Operational Semantics of a Machine) We define the operational semantics of a machine for any P ∈ Prog, σ ∈ State and t ∈ Time by:

Comp(P, σ, t) = Reset(Next^t(Load(P, σ))).
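The iterated-map shape of this definition can be seen in a few lines of code. The following is a sketch, not from the text, over a toy machine invented here for illustration: a machine state is a triple (program, pc, mem), and Load, Next and Reset play the roles defined above.

```python
# Machine semantics as an iterated map: Comp(P, s, t) = Reset(Next^t(Load(P, s))).

def load(program, state):
    """Load: pair the program with the initial memory, program counter at 0."""
    return (tuple(program), 0, dict(state))

def reset(tau):
    """Reset: clear the program from the machine state, keeping the memory."""
    _, _, mem = tau
    return mem

def next_state(tau):
    """Next: execute the instruction at pc, if any (toy instruction set)."""
    program, pc, mem = tau
    if not (0 <= pc < len(program)):
        return tau                      # no instruction: execution has ceased
    op = program[pc]
    mem = dict(mem)
    if op[0] == 'set':                  # ('set', reg, value)
        mem[op[1]] = op[2]
    elif op[0] == 'add':                # ('add', dst, src)
        mem[op[1]] = mem[op[1]] + mem[op[2]]
    return (program, pc + 1, mem)

def machine_comp(tau, t):
    """MachineComp(tau, t+1) = MachineComp(Next(tau), t), by induction on time."""
    for _ in range(t):
        tau = next_state(tau)
    return tau

def comp(program, state, t):
    """Comp(P, s, t) = Reset(Next^t(Load(P, s)))."""
    return reset(machine_comp(load(program, state), t))

prog = [('set', 'a', 1), ('add', 'a', 'b')]
print(comp(prog, {'a': 0, 'b': 41}, 2))   # {'a': 42, 'b': 41}
```

Once the program counter leaves the program, Next behaves as the identity, so running for more cycles than the program needs is harmless; this matches the idea that execution has ceased.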

19.2 The Register Machine

19.2.1 Informal Description

We shall construct a virtual machine with a very simple architecture, called an

abstract or virtual register machine (VRM).

It consists of a sequence of registers. The structure of the machine is based on distinguishing three types of register, according to the data that we are allowed to store in them:

program counter   a register to determine the order in which the program instructions are to be executed;

data registers    to store values that are manipulated by programs; and

test registers    to store values that result from performing tests on the data registers.

We suppose that the machine has a single program counter and unlimited storage, i.e., the machine has infinitely many data and test registers; see Figure 19.1.

[Figure: a program counter pc, data registers r1, r2, ..., rn, ..., and test registers t1, t2, ..., tn, ...]

Figure 19.1: Form of register machine.

Any computation by the machine will access only finitely many registers. These registers store data and Booleans from an arbitrarily chosen data type, and the machine's processing will be based on the operations of the data type. Thus, one can say that what makes this register machine model virtual is the abstract nature of its data.

On these VRMs, we shall execute programs which consist of a sequence of simple instructions.

An instruction has one of four forms:

copy instructions       duplicate data in one register to another;

jump instructions       conditionally direct the order in which instructions are executed;

value instructions      place the value of a constant into a register; and

evaluate instructions   place in a register the result of applying an operation of the data type to register values.

19.2.2 Data Type

To specify a register machine formally, we begin by choosing a data type. Let Σ be any signature of the form:

signature    Data

sorts        data, Bool

constants    ..., c : → data, ...
             true : → Bool
             false : → Bool

operations   ..., f : data^n → data, ...
             ..., rel : data^m → Bool, ...
             ¬ : Bool → Bool
             ∧ : Bool × Bool → Bool
             ∨ : Bool × Bool → Bool

endsig

For simplicity, we are assuming that there is just one sort of data and the Booleans.

Let D be any Σ-algebra of the form:

algebra      D

carriers     D, B = {tt, ff}

constants    ..., c^D : → D, ...
             tt : → B
             ff : → B

operations   ..., f^D : D^n → D, ...
             ..., rel^D : D^m → B, ...
             not : B → B
             and : B × B → B
             or : B × B → B


Any data and any operations and tests may be chosen.

19.2.3 States

Suppose that a VRM program computes over the data type D. We consider in turn each of the three types of register.

Program Counter

The program counter is a single register which stores the address of the next program instruction to be executed.

We shall model addresses in a simple manner using the natural numbers.

Data Registers

The data registers of a VRM store values that are computed by a program during execution. Each data register can hold some value d ∈ D. We have arbitrarily many data registers in a machine; each register has a unique address.

We model the data registers as a set

Reg_D = [DataReg → D]

of locations which are accessed through addresses ..., r ∈ DataReg, ..., and which store values ..., d ∈ D, ...: a data register

ρ_D ∈ Reg_D

stores the value

ρ_D(r) ∈ D

in the register r ∈ DataReg, as shown in Figure 19.2.

[Figure: registers r1, r2, ..., rn, ... holding values ρ_D(r1), ρ_D(r2), ..., ρ_D(rn), ...]

Figure 19.2: Data registers ρ_D ∈ Reg_D of the VRM.

Test Registers

The test registers hold the values of tests that can be used to determine the execution order of program instructions.

We model the test registers as a set

Reg_B = [TestReg → B]

of locations which are accessed through addresses ..., t ∈ TestReg, ..., and which store values ..., b ∈ B, ...: a test register

ρ_B ∈ Reg_B

stores the value

ρ_B(t) ∈ B

in the register t ∈ TestReg, as shown in Figure 19.3.

[Figure: registers t1, t2, ..., tn, ... holding values ρ_B(t1), ρ_B(t2), ..., ρ_B(tn), ...]

Figure 19.3: Test registers ρ_B ∈ Reg_B of the VRM.

Definition (Modelling States) A VRM state

τ = (P, pc, ρ_D, ρ_B)

consists of a program P, a program counter pc ∈ N, a set ρ_D ∈ Reg_D of data registers, and a set ρ_B ∈ Reg_B of test registers. We model the set

VRMState(D) = Prog(Σ) × N × Reg_D × Reg_B

of VRM states as the collection of all possible combinations of programs and values held in the program counter, data registers, and test registers.

19.2.4 Programs

The form of a program is:

l . Il
l + 1 . Il+1
l + 2 . Il+2
...
l + λ . Il+λ

where l , l + 1 , . . . , l + λ ∈ N are labels and Il , Il+1 , . . . , Il+λ ∈ Instrn are instructions. Thus, the syntax of VRM programs is given by:

〈VRMProgram〉 ::= 〈label〉 . 〈instrn〉
               | 〈label〉 . 〈instrn〉 , 〈VRMProgram〉
               | 〈label〉 . 〈instrn〉 〈newline〉 〈VRMProgram〉

where we require the labels to successively increment. Thus,

VRMProgram = { l . Il , l + 1 . Il+1 , . . . , l + λ . Il+λ | l , l + 1 , . . . , l + λ ∈ N, Il , Il+1 , . . . , Il+λ ∈ Instrn }
           ⊆ (N × Instrn)∗

Page 761: Data, Syntax and Semantics - Better Together Toolkit

19.2. THE REGISTER MACHINE 731

Semantics

The semantics of a VRM program is derived from the semantics of its individual instructions. The program counter register of the VRM state determines the order of execution of the instructions. Execution ceases when there is no instruction with a label equal to the value stored in the program counter.

Thus, given a program

P = l . Il
    l + 1 . Il+1
    ...
    l + λ . Il+λ

and a state

(pc, ρD , ρB)

the behaviour of P is

M^D_VRM(P)(pc, ρD , ρB) = { M^D_VRM(P)(M^D_VRMInstrn(Ipc)(pc, ρD , ρB))   if l ≤ pc ≤ l + λ;
                            (pc, ρD , ρB)                                 if pc < l or pc > l + λ.
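The recursive equation for M^D_VRM unfolds into an iterative fetch-execute loop: while the program counter labels an instruction, execute that instruction; otherwise stop. A minimal Python sketch, with an instruction encoding of our own devising (tagged tuples), makes this concrete:

```python
def run(prog, pc, rD, rB):
    """Iterative reading of M_VRM: step while pc labels an instruction.

    prog maps labels to instructions, encoded as tagged tuples (our choice):
      ('copy', j, i)           r_j <- r_i
      ('jump', n, t)           n ?<- t_t
      ('value', j, c)          r_j <- c
      ('eval', j, f, regs)     r_j <- f(r_i1, ..., r_in)
      ('teval', t, rel, regs)  t_t <- rel(r_i1, ..., r_in)
    """
    while pc in prog:                       # execution ceases when pc has no instruction
        op = prog[pc]
        tag = op[0]
        if tag == 'copy':
            _, j, i = op
            rD[j] = rD[i]; pc += 1
        elif tag == 'jump':
            _, n, t = op
            pc += n if rB[t] else 1         # jump by n on true, else step by one
        elif tag == 'value':
            _, j, c = op
            rD[j] = c; pc += 1
        elif tag == 'eval':
            _, j, f, regs = op
            rD[j] = f(*(rD[i] for i in regs)); pc += 1
        elif tag == 'teval':
            _, t, rel, regs = op
            rB[t] = rel(*(rD[i] for i in regs)); pc += 1
    return pc, rD, rB

pc, rD, rB = run({1: ('value', 1, 7),
                  2: ('eval', 2, lambda a, b: a + b, (1, 1))},
                 1, {}, {})
print(pc, rD[2])   # -> 3 14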

19.2.5 Instructions

The program instructions are summarised in Figure 19.4.

Instruction  Syntax                       Intuitive Semantics
copy         rj ← ri                      Copy contents of data register ri to data register rj .
jump         n ?← ti                      If test register ti contains the value true, increment
                                          the program position by n ∈ Z; otherwise, increment
                                          the current program position by one.
value        rj ← c                       Place value of constant c in register rj .
evaluate     rj ← f (ri1 , . . . , rin )  Place in register rj the result of applying function f
                                          to values held in registers ri1 , . . . , rin .

Figure 19.4: Virtual machine instructions.

Let us consider the instructions in more detail. We consider each of the four types of instruction, giving the behaviour

M^D_VRMInstrn(i) : VRMState(D) → VRMState(D)

of each type of instruction i .

Copy Instructions

The form of a copy instruction is

rj ← ri


for ri , rj ∈ DataReg . The behaviour of such a copy instruction is that the value of register ri is placed in register rj and the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)

executing an instruction:

rj ← ri

produces the state

M^D_VRMInstrn(rj ← ri)(pc, ρD , ρB) = (pc + 1 , ρD_new , ρB) ∈ VRMState(D)

where

ρD_new(r) = { ρD(ri)   if r = rj ;
              ρD(r)    otherwise.

Jump Instructions

The form of a jump instruction is

n ?← ti

for n ∈ Z, ti ∈ TestReg . The behaviour of such a jump instruction is that the value of the program counter is incremented by n if the value in the test register ti is true, and otherwise, the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)

executing an instruction:

n ?← ti

produces the state

M^D_VRMInstrn(n ?← ti)(pc, ρD , ρB) = (m, ρD , ρB) ∈ VRMState(D)

where

m = { pc + n   if ρB(ti) = true;
      pc + 1   otherwise.

Value Instructions

There are two forms of value instruction: one for data registers, and one for test registers. The form of the value instruction for data registers is

rj ← c

for rj ∈ DataReg , c : → data ∈ ΣData . The behaviour of such a value instruction is that the value of the constant cD is placed in register rj and the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)


executing an instruction:

rj ← c

produces the state

M^D_VRMInstrn(rj ← c)(pc, ρD , ρB) = (pc + 1 , ρD_new , ρB) ∈ VRMState(D)

where

ρD_new(r) = { cD       if r = rj ;
              ρD(r)    otherwise.

The form of the value instruction for test registers is

tj ← c

for tj ∈ TestReg , c : → Bool ∈ ΣData . The behaviour of such a value instruction is that the value of the constant cD (which will be either true or false) is placed in register tj and the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)

executing an instruction:

tj ← c

produces the state

M^D_VRMInstrn(tj ← c)(pc, ρD , ρB) = (pc + 1 , ρD , ρB_new) ∈ VRMState(D)

where

ρB_new(t) = { cD       if t = tj ;
              ρB(t)    otherwise.

Evaluate Instructions

There are three forms of evaluate instruction: one for data registers, one for test registers, and one for data and test registers.

The form of an evaluate instruction for data registers is

rj ← f (ri1 , . . . , rin )

for ri1 , . . . , rin , rj ∈ DataReg and f : data^n → data ∈ ΣData . The behaviour of such an evaluate instruction is that the function fD is applied to the values in the data registers ri1 , . . . , rin , and the result is placed in the data register rj ; and the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)

executing an instruction:

rj ← f (ri1 , . . . , rin )


produces the state

M^D_VRMInstrn(rj ← f (ri1 , . . . , rin ))(pc, ρD , ρB) = (pc + 1 , ρD_new , ρB) ∈ VRMState(D)

where

ρD_new(r) = { fD(ρD(ri1 ), . . . , ρD(rin ))   if r = rj ;
              ρD(r)                           otherwise.

The form of an evaluate instruction for data and test registers is

tj ← rel(ri1 , . . . , rin )

for ri1 , . . . , rin ∈ DataReg , tj ∈ TestReg and rel : data^n → Bool ∈ ΣData . The behaviour of such an evaluate instruction is that the function relD is applied to the values in the data registers ri1 , . . . , rin , and the result is placed in the test register tj ; and the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)

executing an instruction:

tj ← rel(ri1 , . . . , rin )

produces the state

M^D_VRMInstrn(tj ← rel(ri1 , . . . , rin ))(pc, ρD , ρB) = (pc + 1 , ρD , ρB_new) ∈ VRMState(D)

where

ρB_new(t) = { relD(ρD(ri1 ), . . . , ρD(rin ))   if t = tj ;
              ρB(t)                             otherwise.

The forms of an evaluate instruction for test registers are

tj ← not(ti)
tj ← and(ti1 , ti2 )
tj ← or(ti1 , ti2 )

for ti1 , ti2 , ti , tj ∈ TestReg . The behaviour of such an evaluate instruction is that the function ¬, ∧ or ∨, as appropriate, is applied to the values in the test registers ti , or ti1 and ti2 (as appropriate), and the result is placed in the test register tj ; and the value of the program counter is incremented by one. Thus, given a state

(pc, ρD , ρB) ∈ VRMState(D)

executing an instruction:

tj ← not(ti)

produces the state

M^D_VRMInstrn(tj ← not(ti))(pc, ρD , ρB) = (pc + 1 , ρD , ρB_new) ∈ VRMState(D)

where

ρB_new(t) = { ¬(ρB(ti))   if t = tj ;
              ρB(t)       otherwise.

And

tj ← and(ti1 , ti2 )

produces the state

M^D_VRMInstrn(tj ← and(ti1 , ti2 ))(pc, ρD , ρB) = (pc + 1 , ρD , ρB_new) ∈ VRMState(D)

where

ρB_new(t) = { ∧(ρB(ti1 ), ρB(ti2 ))   if t = tj ;
              ρB(t)                   otherwise.

And

tj ← or(ti1 , ti2 )

produces the state

M^D_VRMInstrn(tj ← or(ti1 , ti2 ))(pc, ρD , ρB) = (pc + 1 , ρD , ρB_new) ∈ VRMState(D)

where

ρB_new(t) = { ∨(ρB(ti1 ), ρB(ti2 ))   if t = tj ;
              ρB(t)                   otherwise.

Instruction Syntax

The form of instructions is:

〈instrn〉 ::= 〈data register〉 ← 〈data register〉
           | 〈increment〉 ?← 〈test register〉
           | 〈data register〉 ← 〈data expression〉
           | 〈test register〉 ← 〈test expression〉

In particular, we restrict VRM instructions to the following forms:

Instrn = { rj ← ri | ri , rj ∈ DataReg }
       ∪ { n ?← ti | n ∈ Z, ti ∈ TestReg }
       ∪ { rj ← c | c ∈ Σdata , rj ∈ DataReg }
       ∪ { rj ← f (ri1 , . . . , rin ) | f ∈ Σdata^n,data , ri1 , . . . , rin , rj ∈ DataReg }
       ∪ { tj ← true | tj ∈ TestReg }
       ∪ { tj ← false | tj ∈ TestReg }
       ∪ { tj ← not(ti) | tj , ti ∈ TestReg }
       ∪ { tj ← and(ti1 , ti2 ) | tj , ti1 , ti2 ∈ TestReg }
       ∪ { tj ← or(ti1 , ti2 ) | tj , ti1 , ti2 ∈ TestReg }
       ∪ { tj ← rel(ri1 , . . . , rim ) | rel ∈ Σdata^m,bool , ri1 , . . . , rim ∈ DataReg , tj ∈ TestReg }


19.2.6 Example

Recall Euclid’s algorithm from Section 16.1.3:

program Euclid

signature Naturals for Euclidean Algorithm

sorts nat,bool

constants 0 : → nat;
          true, false : → bool

operations mod : nat × nat → nat;
           ≠ : nat × nat → bool

endsig

body

var x,y,r:nat

begin
  read(x, y);
  r := x mod y;
  while r ≠ 0 do
    x := y;
    y := r;
    r := x mod y
  od;
  write(y)
end

We implement Euclid’s algorithm as a VRM program.


program Euclid(input : r1, r2; output : r2);

signature Naturals for Euclidean Algorithm

sorts nat,bool

constants 0: →nat;true,false: →bool

operations mod : nat × nat → nat;
           ≠ : nat × nat → bool;
           not : bool → bool;
           and : bool × bool → bool;
           or : bool × bool → bool

endsig

body

var r1,r2,r3,r4:DataReg;t1:TestReg;

begin
1. r4 ← 0
2. r3 ← mod(r1, r2)
3. t1 ← ≠(r3, r4)
4. 4 ?← t1
5. r1 ← r2
6. r2 ← r3
7. -5 ?← t1
end

Executing Euclid’s algorithm on the initial state

τ = (1 , ρD , ρB) where ρD(r1 ) = 45 and ρD(r2 ) = 12 ,


produces the execution trace:

data registers         test register   program counter   instruction
r1   r2   r3   r4      t1              pc
45   12   ?    ?       ?               1                 r4 ← 0
45   12   ?    0       ?               2                 r3 ← mod(r1, r2)
45   12   9    0       ?               3                 t1 ← ≠(r3, r4)
45   12   9    0       tt              4                 4 ?← t1
45   12   9    0       tt              5                 r1 ← r2
12   12   9    0       tt              6                 r2 ← r3
12   9    9    0       tt              7                 -5 ?← t1
12   9    9    0       tt              2                 r3 ← mod(r1, r2)
12   9    3    0       tt              3                 t1 ← ≠(r3, r4)
12   9    3    0       tt              4                 4 ?← t1
12   9    3    0       tt              5                 r1 ← r2
9    9    3    0       tt              6                 r2 ← r3
9    3    3    0       tt              7                 -5 ?← t1
9    3    3    0       tt              2                 r3 ← mod(r1, r2)
9    3    0    0       tt              3                 t1 ← ≠(r3, r4)
9    3    0    0       ff              4                 4 ?← t1
9    3    0    0       ff              8
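This execution can be reproduced mechanically. Note, though, that under the jump-on-true semantics of Section 19.2.5 the exit jump at label 4 would fire as soon as t1 held true, so the Python sketch below uses a slightly rearranged loop in which a single ≠ test drives the loop-back jump; the register layout and helper names are our own:

```python
def gcd_vrm(a, b):
    """Euclid's algorithm in VRM style, with jump-on-true jumps.

    Rearranged so that one neq test drives the loop-back jump:
      1. r4 <- 0
      2. r3 <- mod(r1, r2)
      3. r1 <- r2
      4. r2 <- r3
      5. t1 <- neq(r3, r4)
      6. -4 ?<- t1
    The result is left in r1.
    """
    rD = {1: a, 2: b, 3: 0, 4: 0}
    rB = {1: False}
    pc = 1
    while 1 <= pc <= 6:                              # execution ceases outside [1, 6]
        if pc == 1:   rD[4] = 0; pc += 1             # 1. r4 <- 0
        elif pc == 2: rD[3] = rD[1] % rD[2]; pc += 1 # 2. r3 <- mod(r1, r2)
        elif pc == 3: rD[1] = rD[2]; pc += 1         # 3. r1 <- r2
        elif pc == 4: rD[2] = rD[3]; pc += 1         # 4. r2 <- r3
        elif pc == 5: rB[1] = (rD[3] != rD[4]); pc += 1  # 5. t1 <- neq(r3, r4)
        elif pc == 6: pc += -4 if rB[1] else 1       # 6. -4 ?<- t1 (loop while r3 != 0)
    return rD[1]

print(gcd_vrm(45, 12))   # -> 3
```

Running the sketch on r1 = 45, r2 = 12 terminates with the greatest common divisor 3, matching the result of the while program.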

19.3 Constructing Programs

Unlike while programs, we cannot construct VRM programs from other VRM programs; we can only build VRM programs by sequential composition of individual VRM instructions. In this section, we consider how we can design operations to construct VRM programs from existing VRM programs.

19.3.1 Operations on Programs

Program Start Point The label of the first instruction of a VRM program conventionally indicates the starting point for execution. We can establish this with the operation

Start : Prog → N

which we define by

Start(l . Il , . . . , l + λ . Il+λ) = l

Program Length The number of instructions present in a program determines its syntactic length. This is not necessarily the same as the number of program instructions that will be executed, because jump instructions can determine the order in which instructions are executed.

The syntactic length of a program is given by the operation

Length : Prog → N


which we define by

Length(l . Il , . . . , l + λ . Il+λ) = λ

Analysing Register Access A useful operation on a VRM program is to be able to determine which registers affect its behaviour.

We want to define operations

UsedDataReg : Prog → P(DataReg)
UsedTestReg : Prog → P(TestReg)

such that

UsedDataReg(P) = the data registers used in the program P

UsedTestReg(P) = the test registers used in the program P .

We shall also want to distinguish between those registers which are written to, and those which are read from. So, we want operations:

WrittenDataRegisters : Prog → P(DataReg)
ReadDataRegisters : Prog → P(DataReg)
WrittenTestRegisters : Prog → P(TestReg)
ReadTestRegisters : Prog → P(TestReg)

such that

UsedDataReg(P) = WrittenDataRegisters(P) ∪ ReadDataRegisters(P)

UsedTestReg(P) = WrittenTestRegisters(P) ∪ ReadTestRegisters(P)

Note that a register may be both read from and written to, during the execution of a program. An obvious method for calculating which registers are used is to introduce operations

WrittenDataReg : Instrn → P(DataReg)
ReadDataReg : Instrn → P(DataReg)
WrittenTestReg : Instrn → P(TestReg)
ReadTestReg : Instrn → P(TestReg)

on individual instructions, such that

WrittenDataReg(I ) = the set of data registers written to by the instruction I

ReadDataReg(I ) = the set of data registers read by the instruction I

WrittenTestReg(I ) = the set of test registers written to by the instruction I

ReadTestReg(I ) = the set of test registers read by the instruction I .

We define these operations for any data registers ri1 , . . . , rin , ri , rj ∈ DataReg , test registers ti1 , ti2 , . . . , ti , tj ∈ TestReg , and n ∈ Z, by:

Instruction                    WrittenDataReg   ReadDataReg            WrittenTestReg   ReadTestReg
rj ← ri                        {rj}             {ri}                   ∅                ∅
n ?← ti                        ∅                ∅                      ∅                {ti}
rj ← c                         {rj}             ∅                      ∅                ∅
rj ← f (ri1 , . . . , rin )    {rj}             {ri1 , . . . , rin}    ∅                ∅
tj ← rel(ri1 , . . . , rim )   ∅                {ri1 , . . . , rim}    {tj}             ∅
tj ← true                      ∅                ∅                      {tj}             ∅
tj ← false                     ∅                ∅                      {tj}             ∅
tj ← not(ti)                   ∅                ∅                      {tj}             {ti}
tj ← and(ti1 , ti2 )           ∅                ∅                      {tj}             {ti1 , ti2}
tj ← or(ti1 , ti2 )            ∅                ∅                      {tj}             {ti1 , ti2}

Thus,

WrittenDataRegisters(l . Il , . . . , l + λ . Il+λ) = ⋃_{k=l}^{l+λ} WrittenDataReg(Ik)

ReadDataRegisters(l . Il , . . . , l + λ . Il+λ) = ⋃_{k=l}^{l+λ} ReadDataReg(Ik)

WrittenTestRegisters(l . Il , . . . , l + λ . Il+λ) = ⋃_{k=l}^{l+λ} WrittenTestReg(Ik)

ReadTestRegisters(l . Il , . . . , l + λ . Il+λ) = ⋃_{k=l}^{l+λ} ReadTestReg(Ik).
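These syntactic operations are straightforward to realise once a program is represented as a label-indexed map of instructions. The sketch below uses tagged tuples of our own devising for instructions and covers the data-register cases:

```python
def start(prog):
    """Start(P): the label of the first instruction."""
    return min(prog)

def length(prog):
    """Length(P) = lambda, so the last label is Start(P) + Length(P)."""
    return max(prog) - min(prog)

def written_data_reg(instr):
    """WrittenDataReg(I): the data registers written by one instruction."""
    tag = instr[0]
    return {instr[1]} if tag in ('copy', 'value', 'eval') else set()

def read_data_reg(instr):
    """ReadDataReg(I): the data registers read by one instruction."""
    tag = instr[0]
    if tag == 'copy':
        return {instr[2]}
    if tag in ('eval', 'teval'):
        return set(instr[3])
    return set()

def written_data_registers(prog):
    """Union of WrittenDataReg over all instructions of P."""
    return set().union(*(written_data_reg(i) for i in prog.values()))

prog = {1: ('value', 1, 0), 2: ('copy', 2, 1), 3: ('jump', -2, 1)}
print(start(prog), length(prog), sorted(written_data_registers(prog)))  # -> 1 2 [1, 2]
```

The test-register analogues follow the same pattern, reading the columns of the table above.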

19.3.2 Building Programs

19.4 Properties

If a register other than the program counter does not appear in a VRM program P , then its value is unchanged by the execution of P :

Lemma (Register Functionality) Let r ∈ N be a register and P ∈ VRMProg(∆) such that r ∉ Reg(P). Then for any state τ ∈ VRMState(B),

(M^B_io(P)(τ))(r) = τ(r).

Proof By inspection of the semantics of VRM programs. □

We can strengthen this observation to cover registers that are not used as destination registers:

Lemma (Non-Destination Register Functionality) Let r ∈ N be a register. Let P ∈ VRMProg(∆) be a program such that r ∉ RegDest(P). Then for any state τ ∈ VRMState(B),

(M^B_io(P)(τ))(r) = τ(r).

Page 771: Data, Syntax and Semantics - Better Together Toolkit

19.4. PROPERTIES 741

Proof By inspection of the semantics of VRM programs. □

Looking at the situation from the converse view, if a register other than the program counter does not appear in a VRM program P , then its value will not affect the execution of P :

Lemma (Non-Participating Register Preservation) Let r ∈ N be a register and P ∈ VRMProg(∆) such that r ∉ Reg(P). Then for any state τ ∈ VRMState(B) and any value v ∈ B,

M^B_io(P)(τ [v/r ]) = M^B_io(P)(τ).

Proof By inspection of the semantics of VRM programs. □

Again, we can strengthen this observation, this time to cover registers that are not used as origin registers:

Lemma (Non-Origin Register Preservation) Let r ∈ N be a register and P ∈ VRMProg(∆) such that r ∉ RegOrig(P). Then for any state τ ∈ VRMState(B) and any value v ∈ B,

M^B_io(P)(τ [v/r ]) = M^B_io(P)(τ).

Proof By inspection of the semantics of VRM programs. □

If we are executing a program P in which r does not occur in the source registers of P , then we can delay substituting a value v in r . If r is not in the destination registers of P either, then we can delay the substitution of v in r until after the execution of P . If r is in the destination registers of P , then it may be overwritten by the execution of P . In particular, if the program exhibits sequential instruction behaviour, we are guaranteed that any value in r will be overwritten by executing P : in this situation, we need not perform the substitution of v in r .

Lemma (Delaying Substitution)

M^B_io(P)(τ [v/r ]) = { M^B_io(P)(τ)[v/r ]   if r ∉ Reg(P);
                        M^B_io(P)(τ)         if r ∉ source(P), r ∈ dest(P)
                                             and P exhibits sequential instruction behaviour.

Proof ... □

If a jump instruction does not appear in a VRM program P , and the value of the program counter is set to the label of the first instruction of P , then the program will terminate, and it will have executed the instructions of P sequentially, starting at the first instruction.

Lemma (Sequential Instruction Behaviour) Let P be a program

m . Im
m + 1 . Im+1
...
n . In

in which no instruction Ij is a jump instruction d ?← o1 , o2 . Then, for any set ρ ∈ [N → B ] of registers,

M^B_io(P)((m, ρ)) ↓ (n + 1 , ρ′)


where

M^B_io(n . In)(· · · (M^B_io(m + 1 . Im+1 )(M^B_io(m . Im)((m, ρ)))) · · ·) = (n + 1 , ρ′).

Proof By induction on the length of programs. □

A jump instruction which uses the same origin registers will act as an unconditional jump instruction:

Lemma (Unconditional Jump Behaviour) For any set ρ ∈ [N→ B ] of registers,

M^B_io(l . n ?← m, m)((l , ρ)) = (l + n, ρ)

Proof By the semantics of jump instructions. □

If we want to preserve the values stored in certain registers, we can ensure that they are recoverable after the execution of a program P , by copying the required values to registers that are not used by P :

Lemma (Register Restoration) Let P be a program which always terminates with the value of the program counter set to Last(P) + 1 . Let d1 , . . . , dn ∉ Reg(P) be registers that do not appear in P . Let P ′ be a program:

1 . d1 ← o1
...
n . dn ← on
n + 1 .
  ...        P
n + l .
n + l + 1 . o1 ← d1
...
2n + l . on ← dn

Then for any state τ ,

M^B_io(P ′)(τ) = M^B_io(P)(τ)[τ(o1 )/o1 ] · · · [τ(on )/on ][τ(o1 )/d1 ] · · · [τ(on )/dn ][2n + l/PC ].

Proof By the Register Functionality Lemma and the semantics of copy instructions. □

If we change the values held in two distinct program registers, it does not matter in which order we make the substitutions. If, however, we make more than one change to the value held in a particular program register, then the latest substitution will override any previous ones:

Lemma (Substitution Lemma) For any state τ ∈ VRMState(D), any registers r , r ′ ∈ N, and any values v , v ′ ∈ D:

τ [v/r ][v ′/r ′] = { τ [v ′/r ′][v/r ]   if r ≠ r ′;
                     τ [v ′/r ′]          otherwise.


Proof Let r ′′ ∈ N be an arbitrary register. The proof is by the definition of register evaluation and substitution:

(τ [v/r ][v ′/r ′])(r ′′) = { v ′               if r ′ = r ′′;
                             τ [v/r ](r ′′)    otherwise.

                         = { v ′       if r ′ = r ′′;
                             v         if r = r ′′ and r ′ ≠ r ′′;
                             τ(r ′′)   if r ≠ r ′′ and r ′ ≠ r ′′.

                         = { v                   if r = r ′′ and r ′ ≠ r ′′;
                             τ [v ′/r ′](r ′′)   otherwise.

                         = { τ [v ′/r ′][v/r ](r ′′)   if r ≠ r ′;
                             τ [v ′/r ′](r ′′)         otherwise.

□
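When states are finite maps, the lemma can also be checked concretely; in the Python sketch below (our encoding), τ[v/r] is a functional dictionary update:

```python
def subst(tau, v, r):
    """tau[v/r]: the state agreeing with tau except that register r holds v."""
    new = dict(tau)   # copy, so the original state is untouched
    new[r] = v
    return new

tau = {1: 10, 2: 20}

# Distinct registers: the two substitutions commute.
assert subst(subst(tau, 5, 1), 7, 2) == subst(subst(tau, 7, 2), 5, 1)

# Same register: the later substitution overrides the earlier one.
assert subst(subst(tau, 5, 1), 7, 1) == subst(tau, 7, 1)
```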

The behaviour of a program on a set of registers is not affected by the start point of the instruction labels; if the start point and the program counter coincide, the program behaviour is independent of the precise value of the first label:

Lemma (Relabelling Invariance) For any program P , labels l , m, n ∈ N and registers ρ ∈ VRMState(B),

M^B_io(P)((m, ρ)) ≃ (n, ρ′) ⇒ M^B_io(Relabel(l , P))((l , ρ)) ≃ (n − m + l , ρ′).

Proof By induction on the length of programs. □

We execute a program only whilst the program counter points to a program instruction:

Lemma (Execution Cessation) For any program P ∈ VRMProg(∆) with l < Start(P) or l > Last(P),

M^B_io(P)((l , ρ)) = (l , ρ)

Proof By induction on the length of programs. □

The behaviour of a VRM program is determined by the composition of the behaviour of single instructions:

Lemma (Single Instruction Execution) For any program P = . . . , l . Il , . . . ∈ VRMProg(∆) and label l ∈ N with Start(P) ≤ l ≤ Last(P), and any set ρ ∈ [N → B ] of registers,

M^B_io(P)((l , ρ)) = M^B_io(P)(M^B_io(l . Il)((l , ρ))).

Proof By induction on the length of programs. □

Whilst we execute a VRM program one instruction at a time, we typically want to reason about a VRM program using larger steps, i.e., in terms of more than one instruction at a time:

Lemma (Execution Decomposition) For any VRM programs P1 , P2 ∈ VRMProg(∆), and any states τ1 , τ2 , τ3 ∈ VRMState(D),

M^B_io(P1 )(τ1 ) ≃ τ2 and M^B_io(P2 )(τ2 ) ≃ τ3 ⇒ M^B_io(P2 )(M^B_io(P1 )(τ1 )) ≃ τ3 .

Proof By induction on the lengths of the programs P1 and P2 . □


Exercises for Chapter 19


Chapter 20

Compiler Correctness

Let us reflect on the role of a programming language in the process of making computations. In Chapter 1, we explained our concept of a programming language: a programming language is a formal description or definition of what might be loosely called a programming notation. A programming notation is used to describe algorithms and is based on some model of computing. A computation establishes a level of abstraction through its choice of data, algorithmic operations and control.

In even the simplest of computations, several distinct levels of abstraction, models of computation and programming notations are involved. The complexities of computer implementations require these different levels to manage their automation.

First, there are the levels of the problem domain. The computational problem or task needs to be formulated in terms of functions on data, and algorithms need to be chosen to compute the functions. Secondly, there are the software levels: a programming language must be chosen, and specific programs written that encode the algorithms. These programs are subsequently processed and executed by further software systems, including operating systems and networks. Finally, there are the levels of the machine. These are also characterised by programs and include assembly programs that are based on the machine architecture and machine code programs.

The point is this:

The different models of computation define a hierarchy of different levels of computational abstraction.

The levels of computational abstraction are defined formally by means of specification and programming languages.

From the early days of computing, the problem of constructing abstractions and supporting them by systematic methods of translating between each level of abstraction has been a fundamental problem that has led directly to fundamental discoveries (recall Chapter 2). Levels of abstraction are ubiquitous in computing. Although widely used, they are not well understood theoretically.

Indeed, the same can be said of notions of abstraction in other sciences. Everywhere there are mathematical models that have been designed to abstract different features of natural phenomena at different levels of spatial or temporal abstraction. It is a difficult problem to show that one model is an abstraction of another.

In fact, modelling with levels of abstraction is better understood in Computer Science than elsewhere! One reason is that we have general frameworks — the programming languages —


which define formally the levels of abstraction in general terms. Against this background, the idea of an

integrated hierarchy of levels of computational abstraction

qualifies as another Big Idea in Computer Science. In this final technical chapter we will explore this idea in an extremely simple special case.

Compilation is ubiquitous in computing. Typically, compilation is associated with the translation from a high-level user-oriented language into a low-level machine-oriented language. However, compilation is much more general. Simply stated, a compiler translates each program P in one language to an "equivalent" program P ′ in another language. The theoretical problem is:

In what precise sense are the program P and the compiled program P ′ equivalent?

Or, said differently,

What exactly does it mean for a compiler to be correct?

We will examine in very general terms what it means

• to compare different levels of abstraction, and

• to translate or compile one language into another.

Each of the types of semantics we have met — the input-output semantics, operational semantics, and machine semantics — requires different criteria for correctness. We will show how to

• formulate correctness equations that express the fact that a compiler is correct; and

• structure the definition of a compiler on the basis of the structure of the languages.

Then we will apply this general discussion to the two main languages we have studied in the next chapter.

20.1 What is Compilation?

A compiler is a translator between programming languages. In particular, it takes programs written in one language, the source language, and translates them into programs that are in another language, the target language.

Whilst a compiler simply processes syntax, we cannot define the purpose of a compiler without considering semantics. A compiler needs to "preserve" semantics: the behaviour of a program and its compiled version should be "equivalent".

Thus, for any program P , and any state σ, we need

the execution of the program P on the state σ

to be “equivalent” to

the execution of the compiled version P ′ of the program P on a state σ′ equivalent to σ.


Let us model the situation. Suppose we have sets

ProgS and ProgT

of source and target programs, and a function

Compile : ProgS → ProgT

from source programs in ProgS to target programs in ProgT . We want to compare the behaviours of source programs P ∈ ProgS and target programs Compile(P) ∈ ProgT .

20.1.1 Input-Output Semantics

Suppose both source and target languages have input-output semantics. In this case, we have functions

M^S_io : ProgS → [StateS ⇀ StateS ]
M^T_io : ProgT → [StateT ⇀ StateT ]

that describe how ProgS -programs and ProgT -programs behave in terms of inputs and outputs only. We must compare the input-output behaviour

M^S_io(P) : StateS ⇀ StateS

of a source program P ∈ ProgS , and the input-output behaviour

M^T_io(Compile(P)) : StateT ⇀ StateT

of the compiled program Compile(P) ∈ ProgT . In particular, we shall want

M^S_io(P) and M^T_io(Compile(P))

to produce equivalent results if they are executed on equivalent inputs.

Comparing States

To describe this equivalence, we need first to consider the set of states. Suppose that the source programs compute over some set

StateS

of states, and the target programs over some set

StateT

of states. We need some means

Encode : StateS → StateT

of relating the state sets, such that for any state σ ∈ StateS ,

σ and Encode(σ)


represent equivalent input states. We will also find it useful to have an inverse function

Decode : StateT → StateS

such that for any state τ ∈ StateT ,

τ and Decode(τ)

represent equivalent states. Thus,

Decode(Encode(σ)) = σ

but, interestingly,

Encode(Decode(τ)) ≠ τ

in many cases. Mathematically, this means that Encode is an injective map with left-inverse Decode or, equivalently, that Decode is a surjective map with right-inverse Encode. This is because

(i) all states in StateS must have some representative in StateT ; and

(ii) for any source state, there need not be a unique representative state in StateT .
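A concrete illustration of this asymmetry, with an example of our own: let source states be natural numbers and target states be binary strings. Encode picks the canonical representative, while Decode accepts any representative, so Decode ∘ Encode is the identity but Encode ∘ Decode is not:

```python
def encode(n: int) -> str:
    """Encode: the canonical binary representative of a natural number."""
    return bin(n)[2:]

def decode(s: str) -> int:
    """Decode: accept any binary string, with or without leading zeros."""
    return int(s, 2)

assert decode(encode(13)) == 13            # Decode(Encode(sigma)) = sigma
assert encode(decode("01101")) == "1101"   # Encode(Decode(tau)) != tau:
                                           # "01101" and "1101" are two target
                                           # states for the one source state 13
```

Every number has a representative (Encode is total and injective), but a source state need not have a unique representative in the target.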

So, if we execute the source program on a state σ, this should give us an equivalent output state to that of executing the compiled program on an equivalent input state Encode(σ), as shown in Figure 20.1.


Figure 20.1: Commutative diagram illustrating the correctness of a compiler for a single program P ∈ ProgS .

This gives us a definition of the correctness of compiling a single program; for the correctness of a compiler, we want to consider all possible programs. We need to compare the executions of

M^S_io(P)(σ) and M^T_io(Compile(P))(Encode(σ))

on equivalent input states. We want their outputs to be equivalent.

Definition (Compiler Correctness for Input-Output Semantics) The compiler

Compile : ProgS → ProgT

is said to be correct with respect to the input-output semantics M^S_io and M^T_io and state transformation maps Encode and Decode, if for all programs P ∈ ProgS and all states σ ∈ StateS ,

Correctness Equation

M^S_io(P)(σ) ≃ Decode(M^T_io(Compile(P))(Encode(σ)))

Equivalently, we require the diagram shown in Figure 20.2 to commute.


Figure 20.2: Commutative diagram illustrating compiler correctness for input-output semantics.
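The correctness equation can be exercised on a toy instance (entirely our own construction, not from the text): source programs are arithmetic expressions evaluated directly over a store, target programs are stack-machine code, and Encode and Decode are identities on states:

```python
# Source language: expressions as nested tuples (our encoding)
#   ('num', n) | ('var', x) | ('add', e1, e2)
def eval_src(e, sigma):
    """M^S_io: direct evaluation of an expression over a store sigma."""
    tag = e[0]
    if tag == 'num': return e[1]
    if tag == 'var': return sigma[e[1]]
    if tag == 'add': return eval_src(e[1], sigma) + eval_src(e[2], sigma)

def compile_expr(e):
    """Compile, defined by structural induction on the abstract syntax."""
    tag = e[0]
    if tag == 'num': return [('push', e[1])]
    if tag == 'var': return [('load', e[1])]
    if tag == 'add': return compile_expr(e[1]) + compile_expr(e[2]) + [('add',)]

def eval_tgt(code, sigma):
    """M^T_io: stack-machine execution of compiled code."""
    stack = []
    for instr in code:
        if instr[0] == 'push': stack.append(instr[1])
        elif instr[0] == 'load': stack.append(sigma[instr[1]])
        elif instr[0] == 'add':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[-1]

# Correctness equation, with Encode = Decode = identity on stores:
e = ('add', ('var', 'x'), ('add', ('num', 3), ('var', 'y')))
sigma = {'x': 4, 'y': 5}
assert eval_src(e, sigma) == eval_tgt(compile_expr(e), sigma)  # both 12
```

The compiled case for 'add' glues the compiled sub-programs together with one target instruction, anticipating the structural-induction scheme of Section 20.2.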

20.1.2 Operational Semantics

In the case of operational semantics, we have functions

CompS : ProgS × StateS × TimeS → StateS

CompT : ProgT × StateT × TimeT → StateT

that describe how ProgS -programs and ProgT -programs behave. We must compare the behaviour

CompS (P , σ, t)

of a source program P ∈ ProgS , over all states σ ∈ StateS and all time cycles t ∈ TimeS , with the behaviour

CompT (Compile(P), τ, r)

of the compiled program Compile(P) ∈ ProgT on certain states τ ∈ StateT and at certain time cycles r ∈ TimeT . This comparison is much more concrete and technically sensitive than that for input-output semantics because of the role of time in operational semantics. However, we can follow the ideas we used for input-output semantics.

For a program P to be correctly compiled, we need its behaviour

CompS (P , σ, t)

to be equivalent to some

CompT (Compile(P), τ, r)

where τ and r are to be determined from P , σ and t in some way. Following the case for input-output semantics, we assume some means of relating the state sets by functions

Encode : StateS → StateT

and

Decode : StateT → StateS .

We can assume that Encode(σ) = τ . The question of relating timing is new and we assume some function

λ : ProgS × StateS × TimeS → TimeT

to calculate

r = λ(P , σ, t).


Definition (Compiler Correctness for Operational Semantics) The compiler

Compile : ProgS → ProgT

is said to be correct with respect to the operational semantics CompS and CompT , state transformation maps Encode and Decode, and time transformation map λ, if:

Correctness Equation

CompS (P , σ, t) = Decode(CompT (Compile(P), Encode(σ), λ(P , σ, t)))

for all programs P ∈ ProgS , all states σ ∈ StateS and all time cycles t ∈ TimeS . Equivalently, we require the diagram shown in Figure 20.3 to commute, in which the function

Φ : (ProgS × StateS × TimeS )→ (ProgT × StateT × TimeT )

is defined by

Φ(P , σ, t) = (Compile(P), Encode(σ), λ(P , σ, t)).


Figure 20.3: Commutative diagram illustrating compiler correctness for operational semantics.

20.2 Structuring the Compiler

We now have two definitions of the correctness of a compiler, one for input-output semantics and one for operational semantics, but how are we to define Compile? We want a systematic way of translating all the programs in ProgS that we can model to help us

(i) implement a compiler; and

(ii) reason about the behaviour of a compiler, for example, to prove its correctness.

20.2.1 Defining a Compiler

From the concrete syntax of a high level language, an abstract syntax is derived that defines the semantically meaningful constructs. The abstract syntactic structure of the higher level language is used to structure the definition of the compiler. We shall define a compiler by structural induction on the abstract syntax of the higher level programming language.

Suppose the abstract syntax of ProgS is defined by some atomic programs and some program-forming operations that put together old programs to create new ones. Thus, if a source program

P ∈ ProgS


is dependent on source sub-programs

P1 , . . . ,Pn ∈ ProgS ,

via some operator, then we shall construct the compiled version

Compile(P) ∈ ProgT

of P from compiled versions

Compile(P1 ), . . . ,Compile(Pn) ∈ ProgT

of P1 , . . . , Pn .

If a source program is atomic, it is independent of any sub-programs, and we shall need to translate it into a target program that is independent of any other compiled source programs.

If we are to automate the compilation though, the way in which we put together the target sub-programs

Compile(P1 ), . . . , Compile(Pn )

to yield Compile(P) must be systematic. Thus, for each source program

P

that is atomic, we shall need a target program

Compile(P)

which is independent of any other compiled source program.

For each way in which we can construct a source program

P = f S (P1 , . . . ,Pn)

by means of a program-forming operator f S , from sub-components P1 , . . . , Pn , we shall construct a target program

Compile(P) = f T (Compile(P1 ), . . . ,Compile(Pn))

from sub-components Compile(P1 ), . . . , Compile(Pn ) using target language “glue” f T . The target programs for atomic source programs and the operators will not necessarily be built-in target language constructs: they may require many individual target language constructs.
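The scheme above can be sketched in a few lines of Python (a hypothetical three-construct source language; the instruction names are ours, not the book's): each atomic program gets fixed target code, and each program-forming operator f S gets a matching target "glue" function f T .

```python
def compile_stmt(s):
    """Compile a source program by structural induction on its syntax."""
    kind = s[0]
    if kind == "skip":                     # atomic: a fixed target program
        return ["NOP"]
    if kind == "assign":                   # atomic: one target instruction
        return ["SET " + s[1]]
    if kind == "seq":                      # composite: use target-language glue
        return glue_seq(compile_stmt(s[1]), compile_stmt(s[2]))
    raise ValueError("unknown construct")

def glue_seq(code1, code2):
    """Target 'glue' f_T for the source sequencing operator f_S."""
    return code1 + code2
```

The defining equation Compile(f S (P1 , P2 )) = f T (Compile(P1 ), Compile(P2 )) can then be checked directly on examples.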

20.2.2 Algebraic Model of Compilation

This ideal view of compiler construction is algebraic. We have, or must build,

an algebra of source programs and an algebra of target programs.

In particular, the algebra of syntax of source programs and the algebra of syntax of compiled target programs both have the same signature.

A compiler translates programs, so it acts on syntax. In particular, it acts on programming language syntax. What do we know about the syntax of programming languages that we can


use to define a compiler? Recall from Chapter 14 that although most programming languages are not context-free, it can be helpful to use a context-free grammar to describe the underlying structure of a language. In Chapter 8, we considered abstract syntax algebraically at some length. We saw how context-free grammar structures can also be modelled algebraically as term algebras, so ProgS can be expected to be structured algebraically.

As Compile is a function between algebras of the same signature, defined by structural induction, Compile is a homomorphism.

20.2.3 Structuring the States

Our definitions of compiler correctness all depend on some notion of equivalence between source and target states. A state of a computation stores data, so to compare a source state σ ∈ StateS

and a target state τ ∈ StateT , we need to know

(i) where the data that is stored in σ is stored in τ ; and

(ii) how we are to compare the data.

Thus, given a location x in a source state StateS which stores the value a, we need to know the location y in the target state StateT that is meant to perform the same task. And we need a notion of equivalence on the data that is stored there.

From an algebraic point of view, we have, or must build,

an algebra of source states and an algebra of target states.

If the algebra of source states has operations to access and store data, then the algebra of target states will need operations for accessing and storing data. In particular, accessing equivalent locations should yield equivalent data values.
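This condition on access operations can be sketched concretely (all names here are ours, for illustration): a source state maps variables to data, a target state maps locations to data, and equivalence is checked location by location under an assumed data encoding.

```python
allocate = {"x": 0, "y": 1}        # source location -> target location (assumed map)

def encode_data(a):
    """Assumed representation of source data values in the target."""
    return 2 * a

def states_equivalent(sigma, tau):
    """Accessing equivalent locations must yield equivalent data values."""
    return all(tau[allocate[x]] == encode_data(sigma[x]) for x in sigma)
```

Note that the target state may hold extra locations (scratch space, say) that play no part in the comparison.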

20.3 Proof Techniques

If a compiler is defined by structural induction, we can exploit this to reason about its correctness. As a basis, we consider first the correctness of the compiler on atomic source programs. Then we use structural induction to reason about the correctness of the compiler on programs that are constructed from sub-components.

20.3.1 Other Formulations of Correctness

The definitions of compiler correctness for both input-output semantics and operational semantics are intuitive. They compare the behaviour of source programs and their compiled versions. There are other ways of representing this principle.

A more general definition of compiler correctness is based on the observation that there may be many target states that are equivalent to a given source state. For example, if the target state has a program counter, then does it matter what value the program counter is initially set to, provided that it is the same as the first instruction of the program that it is meant to execute?

In the definitions of correctness given in Sections 20.1.1 and 20.1.2, the target state may be over-specified by the function Encode : StateS → StateT . This can be rectified, for example, by


relaxing Encode from a function to a relation between source and target states. Alternatively, the function Decode can be used to relate both the input source and target states, as well as the output source and target states.

20.3.2 Data Types

When we are comparing the behaviour of two programs, ultimately the comparison is on the data that is manipulated.

If the source programs operate over some data type A and the target programs over some data type B , then at some point we shall need to compare values

a1 , a2 , . . . ∈ A

that are produced by source programs, and values

b1 , b2 , . . . ∈ B

that are produced by target programs.

The purpose of the compiler is to produce values b1 , b2 , . . . that are equivalent, in some sense, to the values a1 , a2 , . . .. At this point we can separate out two concerns:

1. How do we compare values from A and B?

2. How do we construct operations on B : given values b1 , . . . , bn that are equivalent to a1 , . . . , an , how can we produce a value that is equivalent to f A(a1 , . . . , an ), but by using b1 , . . . , bn ?

From an algebraic point of view, we have or must build,

an algebra of source data and an algebra of target data.

20.3.3 Recursion and Structural Induction

If we define a function on the natural numbers using recursion, we can reason about it by using induction. More generally, if we define a function on some data type by using recursion, we can reason about it by using structural induction. If we have a function between two data types that have the same structure, and which is defined by recursion, this function will be a homomorphism. Thus, we can reason about a homomorphism using structural induction.

From an algebraic point of view, we have algebras of syntax, data and state. The compiler can be modelled by a homomorphism, and so we can reason about its correctness using structural induction.

Suppose the algebra of source programs has

(i) constants . . . , cS , . . . representing source programs that are independently constructed, and

(ii) program constructing functions . . . , f S , . . . which create new source programs

. . . , f S (P1 , . . . ,Pn), . . .

from existing source programs P1 , . . . ,Pn .


Then we define the compiler Compile recursively:

. . . ,Compile(cS ) = cT , . . .

. . . ,Compile(f S (P1 , . . . ,Pn)) = f T (Compile(P1 ), . . . ,Compile(Pn)), . . .

where . . . , cT , . . . are constants representing target programs, and . . . , f T , . . . are target program constructing functions. And we reason about the behaviour of Compile using structural induction:

1. We prove that Compile behaves as required on the base cases . . . , cS , . . ..

2. We assume the structural induction hypotheses that Compile behaves as required on the programs P1 , . . . , Pn .

3. Using the structural induction hypotheses, we prove that Compile behaves as required on the cases . . . , f S (P1 , . . . , Pn ), . . ..


Exercises for Chapter 20


Chapter 21

Compiler Verification

We have examined in very general terms what it means to compile one language into another, and in particular,

What exactly does it mean for a compiler to be correct?

We have shown how to formulate correctness equations that express the fact that a compiler is correct. Now we will apply this general discussion to the two main languages we have studied. We think of the while language as a high-level language that may be compiled into the register language, which is based on lower-level data and control constructs. We will show in detail how to

• define a compiler between the two languages using structural induction; and

• prove its correctness equations using structural induction.

Thus, in this final chapter we will use many of the concepts and techniques we have developed in our studies of data, syntax and semantics.

21.1 Comparing while Statements and Register Machine Programs

We will now start our case study of a compiler that translates the while programming language WP to the register machine language VRMProg . Following our general discussion, we will:

1. Choose an abstract syntax for WP .

2. Compare data types in WP and VRMProg , choose the input-output semantics for WP and the input-output semantics derived from the machine semantics of VRMProg , and define appropriate maps Encode and Decode between state spaces, to obtain a precise compiler correctness criterion.

3. Define a map

Compile : WP → VRMProg

using structural induction on the abstract syntax of WP .


4. Prove Compile is correct under the compiler correctness criterion.

We will also

5. Establish algebraic structures for the syntax of the languages so that

Compile : WP → VRMProg

is a homomorphism.

What are the differences between the two languages? Essentially, they are as follows.

Data Types Because we have emphasised the role of abstract user-defined data types in the notion of a virtual machine, our register machine language VRMProg and while language both operate over arbitrary data, modelled by signatures and algebras.

Atomic Statements In WP , we have skip and assignments x :=e with arbitrary expression evaluation. In VRMProg , we have the copy instructions x ← y and instructions that perform single applications of atomic data type operations x ← c and x ← f (y1 , . . . , yn ). Clearly, in compiling, we will need to construct register machine programs to evaluate expressions.

Control Statements In WP , we have constructs that control arbitrary subprograms, and in VRMProg instructions are scheduled one at a time. For example, in WP , we can write

S1 ;S2

which schedules “blocks” of constructs S1 and S2 ; in VRMProg , we can write lists of single instructions.

In WP , we can write

if b then S1 else S2 fi

and

while b do S0 od

which evaluate arbitrary Boolean-valued tests and then schedule “blocks” of constructs, by selection and iteration respectively.

In VRMProg , there is the unconstrained jump control construct n ?← (b1 , b2 ) that allows any choice of next instruction. Clearly, in compilation, we will need to construct register machine programs to evaluate Boolean expressions and reschedule blocks of instructions.

21.2 Memory Allocation

21.2.1 Data Equivalence

Both our while programs and register machine programs compute over arbitrary data types. Let us fix arbitrary data types A with the Booleans for while programs and D with the Booleans for register machine programs to compute over.


We shall suppose that, for any value we can compute in A, we can compute an equivalent value in D . The syntactic representations of these values need not be the same, but we shall want their values to be the same under some representation.

We shall use functions

EncodeData : A → D
DecodeData : D → A

to convert between the data types A and D , such that

(i) EncodeData is a surjective function;

(ii) DecodeData(EncodeData(a)) = a; and

(iii) EncodeData is a homomorphism.
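Conditions (i)–(iii) can be checked on a toy instance (our choice of A and D, purely for illustration): take source data A to be the Booleans and target data D = {0, 1}, with EncodeData a surjective homomorphism (conjunction becomes multiplication) and DecodeData a left inverse of it.

```python
def encode_data(a):
    """EncodeData : A -> D (Booleans represented as 0/1)."""
    return 1 if a else 0

def decode_data(d):
    """DecodeData : D -> A, a left inverse of EncodeData."""
    return d != 0
```

Condition (ii) says DecodeData undoes EncodeData on every source value; condition (iii) says EncodeData carries each source operation to the corresponding target operation.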

21.2.2 Memory Allocation and Structuring the Register Machine States

Recall from ... that in a register machine

(pc, ρ) ∈ VRMState(D)

we have a program counter pc ∈ N and an infinite number of registers

ρ ∈ [Reg → D ]

in which we can access and update data of type D according to the instructions of the VRM program.

Because we want to construct VRM programs to simulate while programs, we shall impose an artificial structure on VRM architectures to help us simulate the architecture of while program machines. We think of while programs that compute over some data type A as operating over an abstract state space

State(A) = [Var → A].

We shall reserve some of the registers of the VRM to store the data associated with while program variables. We shall also categorise the remaining VRM registers according to the different tasks that we shall need to do to simulate the execution of while programs.

Register Allocation

Let

Allocate : Var → RegDVar
Restore : RegDVar → Var

be mappings between the while variables and their VRM registers, such that Allocate is an injective function and Restore is a surjective function, with

Restore(Allocate(x )) = x .
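A minimal sketch of this pair of maps (hypothetical variable set; names ours): Allocate assigns each while variable its own register, and Restore reads the assignment back.

```python
variables = ["x", "y", "z"]
allocate = {v: reg for reg, v in enumerate(variables)}   # Var -> Reg_Var, injective
restore = {reg: v for v, reg in allocate.items()}        # Reg_Var -> Var, surjective
```

Injectivity of Allocate guarantees no two variables share a register; surjectivity of Restore guarantees every variable register is accounted for; together they give Restore(Allocate(x)) = x.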

We shall distinguish in the register machine between four types of register:


(i) the register

PC

that we use to store

VRMState(D)(PC ) ∈ N

the value of the program counter;

(ii) the registers

RegDVar

that we use to store

VRMStateVar (D) = [RegDVar → D ]

the values of variables appearing in a while program;

(iii) the registers

RegDIO and RegBIO

that we use to store

VRMStateIO(D) = [RegDIO → D ] × [RegBIO → B]

input and output data and Booleans; and

(iv) the registers

RegDTemp and RegBTemp

that we use to store

VRMStateTemp(D) = [RegDTemp → D ] × [RegBTemp → B]

values on a temporary basis.

This gives us the set

Reg = PC ∪ RegDVar ∪ RegDIO ∪ RegBIO ∪ RegDTemp ∪ RegBTemp

of registers, which we use to define the set

VRMState(D) = [PC → N] × VRMStateVar (D) × VRMStateIO(D) × VRMStateTemp(D)

of VRM states.

Retrieving Values

A VRM state

τ = (pc, ρVar , ρIO , ρTemp)

consists of a collection of stores.


(i) We use the register

PC

to store the value

pc = τ(PC )

of the program counter.

(ii) We use the store

ρVar : VRMStateVar (D)

for the values

ρVar (Allocate(x )) ∈ D

of the while program variables x ∈ Var .

(iii) A store

ρIO : VRMStateIO(D)

of input and output values consists of

ρIO = (ρDIO , ρBIO)

where

ρDIO : [RegDIO → D ]

is a store for data and

ρBIO : [RegBIO → B]

is a store for Booleans.

We use ρDIO to store the values

ρDIO(inDi ) ∈ D

of data input registers inDi ∈ RegDIO , and the values

ρDIO(outDi ) ∈ D

of data output registers outDi ∈ RegDIO .

We use ρBIO to store the values

ρBIO(inBi ) ∈ B

of Boolean input registers inBi ∈ RegBIO , and the values

ρBIO(outBi ) ∈ B

of Boolean output registers outBi ∈ RegBIO .

(iv) A store

ρTemp : VRMStateTemp(D)

of temporary working values consists of

ρTemp = (ρDTemp , ρBTemp)

where

ρDTemp : [RegDTemp → D ]

is a store for data and

ρBTemp : [RegBTemp → B]

is a store for Booleans.

We use ρDTemp to store the values

ρDTemp(wkDi ) ∈ D

of data registers wkDi ∈ RegDTemp during calculations, and ρBTemp to store the values

ρBTemp(wkBi ) ∈ B

of Boolean registers wkBi ∈ RegBTemp during calculations.

We want to define what it means for while states to be equivalent to VRM states.

21.2.3 State Equivalence

We shall define a function

Encode : State(A)→ VRMState(D)

so that

Encode(σ) = (1 , ρVar , ρIO , ρTemp)

gives a VRM program state equivalent to σ: in particular, we set the program counter to be 1 and we initialise the store ρDVar of register values so that the value of a variable in a while program state and the value of the register used to store the corresponding variable contain equivalent values:

ρDVar (Allocate(x )) = EncodeData(σ(x )).

Thus, for any state τ ∈ VRMState(D) and any variable x ∈ Var , we require that

Decode(τ)(x ) = DecodeData(τ(Allocate(x ))).

We shall also define a function

Decode : VRMState(D)→ State(A)

so that

Decode(τ) = σ

gives an equivalent while program state. In particular, we are interested only in the correspondence of those registers of the VRM state that we use to store the values of compiled while

program variables:

(∀x ∈ Var)[τ(Allocate(x )) = EncodeData(σ(x ))]⇔ Decode(τ) = σ

for any states σ ∈ State(A), τ ∈ VRMState(D).

We can substitute a value for a variable on a decoded target state, or we can translate the state that results from substituting the equivalent value on the appropriate register on the target state:

Page 799: Data, Syntax and Semantics - Better Together Toolkit

21.3. DEFINING THE COMPILER 769

Lemma (State Substitution Transfer) For any variable x ∈ Var, values v ∈ A and d ∈ D, and state τ ∈ VRMState(D),

(i) (Decode(τ))[v/x ] = Decode(τ [EncodeData(v)/Allocate(x )]).

(ii) (Decode(τ))[DecodeData(d)/x ] = Decode(τ [d/Allocate(x )]).

Proof (i) Choose a variable y ∈ Var . Then by the definition of the substitution function on while states,

(Decode(τ))[v/x ](y) =  v                if x = y ;
                        (Decode(τ))(y)   otherwise.

And by the definition of Decode on the evaluation function on states,

= DecodeData(τ(Allocate(y))).

As Allocate is an injective function, and by the definition of the substitution function on VRM states,

= DecodeData(τ [EncodeData(v)/Allocate(x )](Allocate(y))).

(ii) Similarly. □
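Clause (i) of the lemma can be checked on concrete states (with hypothetical encode/decode maps of our own choosing): substituting on the decoded state agrees with decoding after substituting on the corresponding register.

```python
allocate = {"x": 0, "y": 1}            # Allocate, injective on {"x", "y"}

def encode_data(a):
    return 2 * a                       # assumed EncodeData

def decode_data(d):
    return d // 2                      # its left inverse DecodeData

def decode(tau):
    """Decode a target state by reading back the variables' registers."""
    return {v: decode_data(tau[allocate[v]]) for v in allocate}

def subst(state, value, loc):
    """state[value/loc]: update one location, leaving the rest unchanged."""
    new = dict(state)
    new[loc] = value
    return new
```

For every value v, subst(decode(τ), v, "x") and decode(subst(τ, encode_data(v), allocate["x"])) compute the same while state, which is exactly clause (i).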

Observe that

Decode(Encode(σ)) = σ

but that

Encode(Decode(τ)) = τ ⇔ τ(PC ) = 1 .

The execution of a compiled while program is independent of the registers other than the program counter and those used to store variable values.

21.3 Defining the Compiler

Recall that while statements fall into two categories:

(i) atomic statements (skip and assignments);

(ii) composite statements, constructed from simpler statements, which are concerned with flow of control (sequencing, conditionals and iteration).

When we compile the atomic statements, their compilation will be independent of any other statements. But because assignment statements are constructed from expressions, the compilation of an assignment statement x :=e will be dependent on the compilation of the expression e.

Thus, the VRM program we produce for Compile(x :=e) will be constructed from basic VRM instructions and the VRM program CompileExp(e) for the expression e.

The constructed statements by their nature are dependent on other statements, so the compilation of a constructed statement will be dependent on the compilation of its component


statements. Thus, the compilation Compile(S1 ;S2 ) of a sequenced statement will depend on the compilations Compile(S1 ) and Compile(S2 ) of its components S1 and S2 .

Similarly, the compilation Compile(if b then S1 else S2 fi) of a conditional statement will depend on the compilations Compile(S1 ) and Compile(S2 ) of its component statements, but in addition will depend on the compilation CompileBExp(b) of its component Boolean expression test b.

And the compilation Compile(while b do S0 od) of an iterative statement will depend on the compilation CompileBExp(b) of its Boolean test b and the compilation Compile(S0 ) of its body S0 .

We will construct the compiled programs Compile(S ) from the compilations of the constituent elements of the statement S , as described above. We join the compiled components together by sequencing them with basic VRM program instructions.

21.3.1 Compiling the Identity Statement

The identity, or

skip

statement simply requires that no variable values are altered by its execution. An algorithm for the compiled code is:

Compare the value held in the Boolean register outB1 to itself.

Increment the program counter register by 1.

We compile the skip statement by:

Compile(skip) = 1 . +1 ?← (outB1 , outB1 )

This simply forces the program counter to be incremented by one, which will trigger the next (if any) program instruction to be executed. No registers other than the program counter are updated, and it is simply the Boolean output register outB1 that is (artificially) consulted to determine the execution of the instruction. Note that the value held in the register outB1 does not affect the execution, as is shown by the execution trace:

PC   outB1   comment
1    ?       initial state
2    ?       program counter is always incremented

21.3.2 Compiling Assignment Statements

We compile an assignment statement

x := e

by compiling the expression e, then setting the value of the register used for the variable x . In algorithmic terms, we:

Evaluate the expression e, storing the value in the register outD1 .

Place a copy of the value in the register outD1 into that allocated for

the variable x.


Expressing this as a VRM program, we get:

Compile(x := e) =

  1 .
  ...       CompileExp(e)
  α .

  α + 1 .   Allocate(x ) ← outD1

The execution trace of this code is:

PC      outD1   Allocate(x )   comment
1       ?       ?              initial state
2       ?       ?
...                            evaluate the expression for e
α + 1   v       ?
α + 2   v       v              assign the result to the register for the variable x

21.3.3 Compiling Sequencing

We compile a sequenced statement

S1 ; S2

by placing the code for the compiled version of S2 immediately following the code for the compiled version of S1 , and relabelling. Algorithmically, this gives us:

Execute the compiled code for the program S1.

Then execute the compiled code for the program S2 which has been relabelled

to follow S1.

Rendering this algorithm into VRM code gives:

Compile(S1 ; S2 ) =

  1 .
  ...          Compile(S1 )
  γ1 .

  γ1 + 1 .
  ...          Compile(S2 )
  γ1 + γ2 .

The execution trace for such a program will depend on the nature of S1 and S2 , as is shown in Figure 21.1. More precisely, because of the way in which we define our compiled programs, if the code for S1 terminates, then it does so with the program counter pointing at the first instruction of the code for S2 ; otherwise, S1 does not terminate. Similarly, if S2 is reached and terminates, then it does so with the program counter set to one more than the last instruction of the compiled code (i.e., the length of S1 + the length of S2 + 1). Alternatively, S2 is reached but does not terminate.
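The relabelling step can be sketched directly (the instruction format, a label paired with an instruction body, is our own choice for illustration): concatenate the two code blocks, shifting the labels of S2's code past the end of S1's.

```python
def relabel(code, offset):
    """Shift every instruction label by a fixed offset."""
    return [(label + offset, *rest) for (label, *rest) in code]

def compile_seq(code1, code2):
    """Sequencing: code1 followed by code2, relabelled to start at len(code1) + 1."""
    return code1 + relabel(code2, len(code1))
```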


[Diagram: starting at PC = 1 , execution of Compile(S1 ) either loops forever or reaches PC = γ1 + 1 ; execution of Compile(S2 ) then either loops forever or exits at PC = γ1 + γ2 + 1 .]

Figure 21.1: Execution of compiled sequenced statements.

21.3.4 Compiling Conditionals

We compile a conditional statement

if b then S1 else S2 fi

by placing jump instructions around the compiled forms of b, S1 and S2 to direct the flow of control. First, we evaluate the compiled code for the Boolean expression b. Then, we check the output of this against a Boolean output register which we have set to false. If these two values are the same, we jump to the compiled code for the else-statement S2 . Otherwise, the compiled code for the then-statement S1 is automatically executed. We do, though, have to ensure that, after we have executed S1 , we set the program counter to skip over the compiled code for S2 . In algorithmic terms, we get:

Evaluate the Boolean expression b.

Set a test register to false.

Compare the result of the compiled code for b and the test register.

If they are the same, then jump over the compiled code for S1 to execute
the compiled code for the else-statement S2.

If they differ, i.e., b evaluated to true, then execute the compiled code
for the then-statement S1 and exit the compiled code by jumping over the
compiled code for S2.


21.3. DEFINING THE COMPILER 773

Writing this algorithm as a VRM program, we get:

Compile(if b then S1 else S2 fi) =
    1. ... β.                         CompileBExp(b)
    β + 1.                            out^B_2 ← ff
    β + 2.                            +(γ1 + 2) ?← (out^B_1, out^B_2)
    β + 3. ... β + γ1 + 2.            Compile(S1)
    β + γ1 + 3.                       +(γ2 + 1) ?← (out^B_1, out^B_1)
    β + γ1 + 4. ... β + γ1 + γ2 + 3.  Compile(S2)

The execution trace for the code is shown in Figure 21.2. Again, the situation is complicated a little by the fact that we cannot guarantee the termination of either sub-statement S1 or S2.
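The offset arithmetic in this scheme can be made concrete with a small Python sketch. It models the layout only, under the assumption (as in the text) that a jump +(k) on equal registers moves the program counter forward by k; the instruction tuples and opcode names are invented for the sketch, not VRM syntax.

```python
# Sketch of the layout of Compile(if b then S1 else S2 fi), with labels
# implicit as list position + 1. With beta = len(code_b), g1 = len(code_s1)
# and g2 = len(code_s2): the test at label beta+2 jumps forward g1+2 to the
# else-branch, and the jump at label beta+g1+3 jumps forward g2+1 to exit.

def compile_if(code_b, code_s1, code_s2):
    g1, g2 = len(code_s1), len(code_s2)
    return (code_b
            + [("set-false", "outB2"),                 # label beta+1
               ("jump-eq", g1 + 2, "outB1", "outB2")]  # label beta+2
            + code_s1
            + [("jump-eq", g2 + 1, "outB1", "outB1")]  # label beta+g1+3
            + code_s2)

prog = compile_if(["eval-b"], ["s1-i1", "s1-i2"], ["s2-i1"])
beta, g1 = 1, 2
# the else-branch starts at label beta+g1+4, i.e. list index beta+g1+3:
assert prog[beta + g1 + 3] == "s2-i1"
```

The two assertions in the sketch simply re-derive, for concrete lengths, the label arithmetic that the correctness proof later performs symbolically.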

[Figure 21.2: Execution of compiled conditional statements. The trace plots the program counter together with the registers out^B_1 and out^B_2: after CompileBExp(b), the test register out^B_2 is set to ff at β + 1 and compared at β + 2. If out^B_1 = tt, execution of Compile(S1) runs from β + 3 and, if it terminates, the jump at β + γ1 + 3 exits at β + γ1 + γ2 + 4. If out^B_1 = ff, execution of Compile(S2) runs from β + γ1 + 4 and, if it terminates, exits at β + γ1 + γ2 + 4. Either branch may loop forever.]


21.3.5 Compiling Iterative Statements

The cause of potentially non-terminating programs is, of course, the iterative statement of the form:

while b do S0 od

We compile such statements by using jump statements to effect the flow of control around the compiled Boolean expression for b and the compiled statement for S0.

The algorithm for the compiled code is:

Evaluate the Boolean expression b.

Set a test register to false.

Compare the result of the compiled code for b and the test register.

If they are the same, exit this code.

If they differ, i.e., b evaluated to true, then execute the compiled code
for the body statement S0.

Repeat the code from the start.

We implement this algorithm by:

Compile(while b do S0 od) =
    1. ... β.               CompileBExp(b)
    β + 1.                  out^B_2 ← ff
    β + 2.                  +(γ0 + 2) ?← (out^B_1, out^B_2)
    β + 3. ... β + γ0 + 2.  Compile(S0)
    β + γ0 + 3.             −(β + γ0 + 2) ?← (out^B_1, out^B_1)

The execution trace for this code is shown in Figure 21.3. Note the backwards jump in the code that sets up the potentially infinite repetition. Note also that this code may fail to terminate, either because of non-termination of this while loop, or because of non-termination in the sub-statement S0 (due to a nested while loop not terminating). If the code does terminate, though, then it will do so with the value of the program counter set to one more than the last instruction.
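A corresponding sketch for while, again under the assumed jump semantics (a forward jump +(k) or backward jump −(k) moves the program counter by k when the compared registers are equal); the representation and names are illustrative, not the VRM's.

```python
# Sketch of the layout of Compile(while b do S0 od), labels implicit as
# list position + 1. With beta = len(code_b) and g0 = len(code_s0):
# the test at label beta+2 jumps forward g0+2 to the exit (beta+g0+4),
# and the final instruction at label beta+g0+3 jumps back beta+g0+2,
# returning the program counter to label 1.

def compile_while(code_b, code_s0):
    beta, g0 = len(code_b), len(code_s0)
    return (code_b
            + [("set-false", "outB2"),                     # label beta+1
               ("jump-eq-fwd", g0 + 2, "outB1", "outB2")]  # label beta+2
            + code_s0
            + [("jump-eq-back", beta + g0 + 2,             # label beta+g0+3
                "outB1", "outB1")])

prog = compile_while(["eval-b"], ["s0-i1", "s0-i2"])
label, offset = len(prog), prog[-1][1]
assert label - offset == 1   # the backward jump lands on label 1
```

The final assertion checks the backwards loop described above: from the last label, β + γ0 + 3, a backward jump of β + γ0 + 2 returns control to label 1.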


[Figure 21.3: Execution of compiled iterative statements. The trace plots the program counter together with the registers out^B_1 and out^B_2: after CompileBExp(b), out^B_2 is set to ff at β + 1 and compared at β + 2. While out^B_1 = tt, execution of Compile(S0) runs from β + 3 (and may loop forever), and the backward jump at β + γ0 + 3 returns control to label 1. When out^B_1 = ff, the code exits at β + γ0 + 4.]


21.3.6 Summary of the Compiler

Basis

Compile(skip) =
    1.  +1 ?← (out^B_1, out^B_1)

Compile(x:=e) =
    1. ... α.  CompileExp(e)
    α + 1.     Allocate(x) ← out^D_1

Induction Step

Compile(S1;S2) =
    1. ... γ1.            Compile(S1)
    γ1 + 1. ... γ1 + γ2.  Compile(S2)

Compile(if b then S1 else S2 fi) =
    1. ... β.                         CompileBExp(b)
    β + 1.                            out^B_2 ← ff
    β + 2.                            +(γ1 + 2) ?← (out^B_1, out^B_2)
    β + 3. ... β + γ1 + 2.            Compile(S1)
    β + γ1 + 3.                       +(γ2 + 1) ?← (out^B_1, out^B_1)
    β + γ1 + 4. ... β + γ1 + γ2 + 3.  Compile(S2)

Compile(while b do S0 od) =
    1. ... β.               CompileBExp(b)
    β + 1.                  out^B_2 ← ff
    β + 2.                  +(γ0 + 2) ?← (out^B_1, out^B_2)
    β + 3. ... β + γ0 + 2.  Compile(S0)
    β + γ0 + 3.             −(β + γ0 + 2) ?← (out^B_1, out^B_1)


21.3.7 Constructing Compiled Programs

We can define the compiled while programs more precisely by introducing an operation

Join : VRMProg × VRMProg → VRMProg

such that

Join(P1, P2)

is the VRM program created by joining together the VRM programs P1, P2 ∈ VRMProg. The program P1 forms the first part of Join(P1, P2) and the relabelled program P2 the remainder. The relabelling process renumbers the labels of P2's instructions so that the labels of Join(P1, P2) are contiguous.
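Join, and the Relabel operation it relies on, can be sketched directly in Python over a (label, instruction) list representation; the names and representation are illustrative assumptions made for this sketch, not the book's definitions.

```python
# Sketch of Relabel and Join for label-carrying programs.
# Relabel(start, P) renumbers P's instructions to begin at `start`;
# Join(P1, P2) appends the relabelled P2 after P1, so that the labels
# of the combined program are contiguous, starting at 1.

def relabel(start, prog):
    return [(start + i, instr) for i, (_, instr) in enumerate(prog)]

def join(p1, p2):
    return p1 + relabel(len(p1) + 1, p2)

p1 = [(1, "a"), (2, "b")]
p2 = [(1, "c"), (2, "d")]
assert join(p1, p2) == [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
```

Note that Join is associative up to relabelling, which is why the nested Join expressions below can be read as a single flat layout.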

Using Join, we can define Compile by:

Compile(skip) = 1. +1 ?← (out^B_1, out^B_1)

Compile(x:=e) = Join(CompileExp(e), 1. Allocate(x) ← out^D_1)

Compile(S1;S2) = Join(Compile(S1), Compile(S2))

Compile(if b then S1 else S2 fi)
    = Join(CompileBExp(b),
        Join(1. out^B_2 ← ff
             2. +(|Compile(S1)| + 2) ?← (out^B_1, out^B_2),
          Join(Compile(S1),
            Join(1. +(|Compile(S2)| + 1) ?← (out^B_1, out^B_1),
              Compile(S2)))))

Compile(while b do S0 od)
    = Join(CompileBExp(b),
        Join(1. out^B_2 ← ff
             2. +(|Compile(S0)| + 2) ?← (out^B_1, out^B_2),
          Join(Compile(S0),
            1. −(|CompileBExp(b)| + |Compile(S0)| + 2) ?← (out^B_1, out^B_1))))

21.3.8 Compiled Statements Behaviour

The register programs that result from compiling while statements reflect the structured nature of the source while programs. Accordingly, they have some useful properties which are not true of VRM programs in general.

Observation (Sequential Labelling) The labels of the instructions in a compiled while program Compile(S) are consecutive, starting at 1 and ending at |Compile(S)|.

Observation (Restricted Jumps) Providing that any jump instructions in compiled expressions or Boolean expressions only set the program counter to a value within the expression, any jump instructions in a compiled while program Compile(S) can only set the program counter to a value between 1 and |Compile(S)| + 1.
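The Sequential Labelling observation translates into a one-line check on a (label, instruction) list representation of programs; this is an illustrative sketch, not part of the book's formal development.

```python
# A program is sequentially labelled when its labels are exactly
# 1, 2, ..., n for a program of n instructions.

def sequentially_labelled(prog):
    return [label for label, _ in prog] == list(range(1, len(prog) + 1))

assert sequentially_labelled([(1, "a"), (2, "b"), (3, "c")])
assert not sequentially_labelled([(1, "a"), (3, "b")])
```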

The value of the program counter after executing a terminating compiled while program Compile(S) will be |Compile(S)| + 1:


Lemma (Controlled Exiting) For all while programs S, and for all VRM states (1, ρ), τ ∈ VRMState,

M^io_VRM(D)(Compile(S))((1, ρ)) ↓ τ ⇒ τ(PC) = |Compile(S)| + 1.

Proof By the Sequential Labelling Observation, the Restricted Jumps Observation and the Sequential Instruction Lemma of Chapter 19. □

Lemma (Subprogram Execution) For all while programs S, any VRM programs P, P1, P2 and any VRM state τ,

(i) τ(PC) = 1 ⇒

    M^io_VRM(D)(Join(Compile(S), P))(τ)
        = M^io_VRM(D)(Join(Compile(S), P))(M^io_VRM(D)(Compile(S))(τ));

(ii) τ(PC) = Last(P) + 1 ⇒

    M^io_VRM(D)(Join(P, Compile(S)))(τ)
        = M^io_VRM(D)(Join(P, Compile(S)))
            (M^io_VRM(D)(Compile(S))(τ[1/PC])[Last(P) + |Compile(S)| + 1/PC]);

and

(iii) τ(PC) = Last(P1) + 1 ⇒

    M^io_VRM(D)(Join(P1, Join(Compile(S), P2)))(τ)
        = M^io_VRM(D)(Join(P1, Join(Compile(S), P2)))
            (M^io_VRM(D)(Compile(S))(τ[1/PC])[Last(P1) + |Compile(S)| + 1/PC]).

Proof

(i) As the value of the program counter is 1, the first instruction that will be executed will come from Compile(S). Whilst the value of the program counter lies between 1 and |Compile(S)|, the instructions that are executed will come from Compile(S).

But, by the Controlled Exiting Lemma, we know that the value of the program counter can only be set to |Compile(S)| + 1 when the program Compile(S) terminates.

So, by induction on the length of programs, we can deduce that once the program counter is set to 1, the only instructions that we can execute will be from Compile(S), until Compile(S) terminates.

(ii) As the program counter is set to Last(P) + 1, by the definition of Join, the first instruction we will execute will be the first of Relabel(Last(P) + 1, Compile(S)).

By the semantics of VRM programs, the execution of any non-jump instruction in Compile(S) can only increment the program counter. Jump instructions can decrement the program counter, but by the Restricted Jumps Observation, in the program Compile(S), they can only decrement the program counter to a value which lies within Compile(S).


21.4. CORRECTNESS OF COMPILATION: STATEMENTS 779

So, by induction on the length of programs, we can deduce that once the program counter is set to Start(Compile(S)), the only instructions that we can execute will be from Compile(S).

Thus, by the Relabelling Invariance Lemma, once the program counter is set to Last(P) + 1, the only instructions that we can execute will be from Relabel(Last(P) + 1, Compile(S)).

(iii) Follows from the previous two cases. □

21.4 Correctness of Compilation: Statements

We shall consider the correctness of the compilation of statements with as much independence from the compilation of expressions and Boolean expressions as is possible. However, we shall need to suppose that the compiler satisfies certain behavioural requirements.

21.4.1 Requirements for Compiled Expressions

We shall prove that our compiler from while statements to Register Machine programs is correct, providing that we have a correct compiler for expressions and Boolean expressions.

Definition (Correct Expression Compilation) A compiler

CompileExp : Exp(Σ) → VRMProg(Σ)

will be correct for any expression e ∈ Exp(Σ) if some value equivalent to the value

V^A_io(e)(σ)

of the expression e on the state σ is stored in the state that results from executing the program CompileExp(e) on a state Encode(σ) equivalent to σ.

In particular, we want the evaluation of the expression stored in the register

out^D_1

that has been allocated expressly for this purpose.

Just as importantly, the execution of the program CompileExp(e) should not alter the values of any variable-storing registers, nor those of any of the designated input/output registers.

The program CompileExp(e) should terminate, and should do so in a controlled manner, i.e., the value of the program counter should be set at a value |CompileExp(e)| + 1 of one more than the length of the program.

Lemma (Compiled Expression Requirements) For all expressions e ∈ Exp(Σ) and all states σ ∈ State(A), a compiler CompileExp satisfies the Compiled Expression Requirements if it satisfies all of the following:

(i) DecodeData(M^D_io(CompileExp(e))(Encode(σ))(out^D_1)) = V^A_io(e)(σ);

(ii) Decode(M^D_io(CompileExp(e))(Encode(σ))) = σ;

(iii) r ∈ Reg_IO ∧ r ≠ out^D_1 ⇒ M^D_io(CompileExp(e))(Encode(σ))(r) = Encode(σ)(r); and

(iv) EvalPC(M^D_io(CompileExp(e))(Encode(σ))) = |CompileExp(e)| + 1.


21.4.2 Requirements for Compiled Boolean Expressions

A compiler

CompileBExp : BExp(Σ) → VRMProg(Σ)

will be correct for any Boolean expression b ∈ BExp(Σ) if some value equivalent to the value

W^A_io(b)(σ)

of the Boolean expression b on the state σ is stored in the state that results from executing the program CompileBExp(b) on a state Encode(σ) equivalent to σ. In particular, we want the evaluation of the expression stored in the register

out^B_1

that has been allocated expressly for this purpose.

As for expressions, we require that the execution of the program CompileBExp(b) should not alter the values of any variable-storing registers, nor those of any of the designated input/output registers.

The program CompileBExp(b) should terminate, and should do so in a controlled manner, i.e., the value of the program counter should be set at a value |CompileBExp(b)| + 1 of one more than the length of the program.

Lemma (Compiled Boolean Expression Requirements) For all Boolean expressions b ∈ BExp(Σ) and all states σ ∈ State(A), a compiler CompileBExp satisfies the Compiled Boolean Expression Requirements if it satisfies all of the following:

(i) DecodeData(M^D_io(CompileBExp(b))(Encode(σ))(out^B_1)) = W^A_io(b)(σ);

(ii) Decode(M^D_io(CompileBExp(b))(Encode(σ))) = σ;

(iii) r ∈ Reg_IO ∧ r ≠ out^B_1 ⇒ M^D_io(CompileBExp(b))(Encode(σ))(r) = Encode(σ)(r); and

(iv) EvalPC(M^D_io(CompileBExp(b))(Encode(σ))) = |CompileBExp(b)| + 1.

21.4.3 Correctness of Statements

A compiler

Compile : Stmt(Σ) → VRMProg(Σ)

will be correct for any statement S ∈ Stmt(Σ) if the execution of the compiled statement Compile(S) on a state Encode(σ) produces a state

M^D_io(Compile(S))(Encode(σ))

that is equivalent to the execution

M^A_io(S)(σ)

of the statement S on an equivalent initial state σ.

Theorem (Compiler Correctness) If a compiler Compile satisfies the Compiled Expression Requirements and the Compiled Boolean Expression Requirements, then for all while programs S ∈ WP(Σ) and all states σ ∈ State(A),

M^A_io(S)(σ) ≃ Decode(M^D_io(Compile(S))(Encode(σ))).
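The shape of this correctness equation can be illustrated with a deliberately tiny instance in Python: a source language of constant assignments, a one-instruction target machine, and Encode/Decode that rename variables to registers. Everything here, including the renaming scheme and all names, is an assumption made for this sketch only, not the book's construction.

```python
# Toy instance of M^A_io(S)(sigma) = Decode(M^D_io(Compile(S))(Encode(sigma))).
# Source programs: lists of assignments (x, n) of constants to variables.
# Target programs: "store" instructions on registers named r_x.

def run_source(stmts, sigma):
    """Source semantics: apply each assignment to the state."""
    sigma = dict(sigma)
    for x, n in stmts:
        sigma[x] = n
    return sigma

def compile_stmts(stmts):
    """Compiler: one store instruction per assignment."""
    return [("store", "r_" + x, n) for x, n in stmts]

def encode(sigma):
    """Encode a source state as a register state."""
    return {"r_" + x: v for x, v in sigma.items()}

def decode(tau):
    """Decode a register state back to a source state."""
    return {r[2:]: v for r, v in tau.items() if r.startswith("r_")}

def run_target(prog, tau):
    """Target semantics: execute the store instructions in order."""
    tau = dict(tau)
    for op, reg, n in prog:
        if op == "store":
            tau[reg] = n
    return tau

S = [("x", 3), ("y", 7)]
sigma = {"x": 0, "y": 0, "z": 1}
lhs = run_source(S, sigma)
rhs = decode(run_target(compile_stmts(S), encode(sigma)))
assert lhs == rhs   # the correctness equation holds for this toy instance
```

The interesting content of the theorem is, of course, that the equation survives jumps and possible non-termination, which this toy language deliberately excludes.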


Proof We shall prove the Correctness Equation by structural induction on while programs. Let σ ∈ State(A) be some arbitrary while state and let

Encode(σ) = (1, ρ)

for some VRM state ρ ∈ VRMState(D).

Basis

Skip Statement By definition of Compile and Encode,

Decode(M^D_io(Compile(skip))(Encode(σ))) = Decode(M^D_io(1. +1 ?← (out^B_1, out^B_1))((1, ρ))).

The behaviour of this instruction will always just increment the value of the program counter by one. More formally, by the Unconditional Jump Behaviour Lemma,

= Decode((2, ρ)).

The decode function is independent of the value of the program counter, so by the definition of Decode,

= Decode((1, ρ)).

But this is just our initial state

= σ

by the definition of Decode. In turn, this is the same state that we get by executing the program skip. Thus, by the semantics M^A_io of while programs,

= M^A_io(skip)(σ).

Assignment Statements By definition of Compile and Encode,

Decode(M^D_io(Compile(x:=e))(Encode(σ)))
    = Decode(M^D_io(
        1. ... α.  CompileExp(e)
        α + 1.     Allocate(x) ← out^D_1
      )((1, ρ))).

The first instruction we execute is the first of CompileExp(e), as the program counter is 1. By the Compiled Expression Requirements Lemma, we know that executing the program CompileExp(e) is guaranteed to terminate, in a state where the value of the program counter is α + 1. So by induction on the length of programs, we know that the value of the program counter must lie between 1 and α + 1 during the execution of CompileExp(e). Thus,

= Decode(M^D_io(Compile(x:=e))(M^D_io(CompileExp(e))((1, ρ)))).

By the Compiled Expression Requirements Lemma, we know that executing the program CompileExp(e) is guaranteed to exit in a controlled manner with the evaluation of the expression e stored in the output register out^D_1. We can tolerate other register values changing, provided that these registers are not variable-storing.

Let us suppose that the execution of CompileExp(e) on the state (1, ρ) produces some state (α + 1, ρ′) satisfying the above conditions. Thus, by the Compiled Expression Requirements Lemma,

= Decode(M^D_io(Compile(x:=e))((α + 1, ρ′))).


By the Single Instruction Execution Lemma,

= Decode(M^D_io(Compile(x:=e))(M^D_io(α + 1. Allocate(x) ← out^D_1)((α + 1, ρ′)))).

Executing the instruction α + 1. Allocate(x) ← out^D_1 copies the value held in the register out^D_1 to the register Allocate(x), and increments the program counter by one. Thus, by the definition of M^D_io,

= Decode(M^D_io(Compile(x:=e))((α + 1, ρ′)[ρ′(PC) + 1/PC][ρ′(out^D_1)/Allocate(x)])).

The value of the program counter on the state ρ′ is α + 1. So, by the Register Substitution Lemma,

= Decode(M^D_io(Compile(x:=e))((α + 2, ρ′)[ρ′(out^D_1)/Allocate(x)])).

The program counter now exceeds all the instruction labels, so the program has terminated. Thus, by the Execution Cessation Lemma,

= Decode((α + 2, ρ′)[ρ′(out^D_1)/Allocate(x)]).

By the Compiled Expression Requirements Lemma, the evaluation of the expression e is not allowed to alter any variable-storing registers. The decode function is only dependent on variable-storing registers, so by the Compiled Expression Execution Lemma and the definition of Decode,

= Decode((1, ρ)[ρ′(out^D_1)/Allocate(x)]).

Expanding the definition of ρ′(out^D_1):

= Decode((1, ρ)[M^D_io(CompileExp(e))(Encode(σ))(out^D_1)/Allocate(x)])
= Decode((1, ρ)[EncodeData(DecodeData(M^D_io(CompileExp(e))(Encode(σ))(out^D_1)))/Allocate(x)]).

Using the Compiled Expression Requirements Lemma,

= Decode((1, ρ)[EncodeData(V^A_io(e)(σ))/Allocate(x)]).

By the State Substitution Transfer Lemma,

= Decode((1, ρ))[V^A_io(e)(σ)/x].

And decoding the state (1, ρ),

= σ[V^A_io(e)(σ)/x].

But this is the semantics of the while assignment statement x:=e. So, by the definition of M^A_io,

= M^A_io(x:=e)(σ).

Induction Step

Suppose that our correctness statement holds for programs S0 , S1 and S2 .


Sequenced Statements By the definition of Encode and Compile,

Decode(M^D_io(Compile(S1;S2))(Encode(σ)))
    = Decode(M^D_io(Join(Compile(S1), Compile(S2)))((1, ρ))).

By the Subprogram Execution Lemma, we can split the execution of Compile(S1;S2) up:

= Decode(M^D_io(Compile(S1;S2))(M^D_io(Compile(S1))((1, ρ)))).

Case: Compile(S1) terminates If Compile(S1) terminates, then by the Controlled Exiting Lemma, we know that it will do so with the program counter set at one more than the length of Compile(S1). Suppose that M^D_io(Compile(S1))((1, ρ)) ↓ (γ1 + 1, ρ1):

= Decode(M^D_io(Compile(S1;S2))((γ1 + 1, ρ1))).

By the Subprogram Execution Lemma, we can again split this execution up:

= Decode(M^D_io(Compile(S1;S2))(M^D_io(Compile(S2))((γ1 + 1, ρ1)[1/PC])[γ1 + γ2 + 1/PC])).

By the Execution Cessation Lemma,

= Decode(M^D_io(Compile(S2))((γ1 + 1, ρ1)[1/PC])[γ1 + γ2 + 1/PC]).

And now if we expand the state (γ1 + 1, ρ1), we get

= Decode(M^D_io(Compile(S2))(M^D_io(Compile(S1))((1, ρ))[1/PC])[γ1 + γ2 + 1/PC]).

But decoding is independent of the program counter, so by the definition of Decode,

= Decode(M^D_io(Compile(S2))(M^D_io(Compile(S1))((1, ρ))[1/PC])).

But the state (1, ρ) is the encoding of the initial state:

= Decode(M^D_io(Compile(S2))(M^D_io(Compile(S1))(Encode(σ))[1/PC])).

As we reset the program counter to 1 after executing Compile(S1), we can rewrite using the definitions of Encode and Decode:

= Decode(M^D_io(Compile(S2))(Encode(Decode(M^D_io(Compile(S1))(Encode(σ)))))).

By the Induction Hypothesis on Compile(S1),

= Decode(M^D_io(Compile(S2))(Encode(M^A_io(S1)(σ)))).

And now by the Induction Hypothesis on Compile(S2),

≃ M^A_io(S2)(M^A_io(S1)(σ)).

This is just the semantics of the sequenced while program S1;S2, so by the definition of M^A_io,

= M^A_io(S1;S2)(σ).


Case: Compile(S1) does not terminate If Compile(S1) does not terminate, then the execution of the whole does not terminate, which is what results from M^A_io(S1;S2)(σ) also.

Conditional Statements By definition of Encode and Compile,

Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ)))
    = Decode(M^D_io(
        1. ... β.                         CompileBExp(b)
        β + 1.                            out^B_2 ← ff
        β + 2.                            +(γ1 + 2) ?← (out^B_1, out^B_2)
        β + 3. ... β + γ1 + 2.            Compile(S1)
        β + γ1 + 3.                       +(γ2 + 1) ?← (out^B_1, out^B_1)
        β + γ1 + 4. ... β + γ1 + γ2 + 3.  Compile(S2)
      )((1, ρ)))

As the program counter is set to 1, the first instruction that will be executed will be from CompileBExp(b). By the Compiled Boolean Expression Requirements Lemma, the execution of CompileBExp(b) is guaranteed to terminate, in a state where the value of the program counter is β + 1. So by induction on the length of programs, we know that the value of the program counter must lie between 1 and β + 1 during the execution of CompileBExp(b):

= Decode(M^D_io(Compile(if b then S1 else S2 fi))(M^D_io(CompileBExp(b))((1, ρ)))).

By the Compiled Boolean Expression Requirements Lemma, we know that executing the program CompileBExp(b) is guaranteed to exit in a controlled manner with the evaluation of the Boolean expression b stored in the output register out^B_1. We can tolerate other register values changing, provided that these registers are not variable-storing.

Let us suppose that the execution of CompileBExp(b) on the state (1, ρ) produces some state (β + 1, ρ′) satisfying the above conditions:

= Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + 1, ρ′)))

As the program counter is set to β + 1, the next instruction we execute is β + 1. out^B_2 ← ff. So, by the Single Instruction Execution Lemma,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))(M^D_io(β + 1. out^B_2 ← ff)((β + 1, ρ′)))),

which by the definition of M^D_io,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + 1, ρ′)[(β + 1, ρ′)(PC) + 1/PC][ff/out^B_2]))

Page 815: Data, Syntax and Semantics - Better Together Toolkit

21.4. CORRECTNESS OF COMPILATION: STATEMENTS 785

and simplifying,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + 2, ρ′)[ff/out^B_2])).

Now the value of the program counter is β + 2, so the next instruction we execute is β + 2. +(γ1 + 2) ?← (out^B_1, out^B_2). So, by the Single Instruction Execution Lemma,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))
    (M^D_io(β + 2. +(γ1 + 2) ?← (out^B_1, out^B_2))((β + 2, ρ′)[ff/out^B_2])))

There are two possible outcomes for the jump instruction. Either the source comparison registers (out^B_1 and out^B_2) hold the same value (i.e., the Boolean test b evaluates to false), or they differ (i.e., the Boolean test b evaluates to true).

Case: b evaluates to true. Suppose that the Boolean test b evaluates to true on the state σ. Then, by the definition of M^D_io,

EncodeData(W^A(b)(σ)) = tt
    ⇒ Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ)))
        = Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + 3, ρ′)[ff/out^B_2]))

Now the value of the program counter is set to β + 3, the first instruction of Compile(S1). Thus, we can apply the Subprogram Execution Lemma,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))
    (M^D_io(Compile(S1))((β + 3, ρ′)[ff/out^B_2][1/PC])[β + γ1 + 3/PC])),

which by the Labelling Invariance Lemma is:

= Decode(M^D_io(Compile(if b then S1 else S2 fi))
    (M^D_io(Relabel(β + 3, Compile(S1)))((β + 3, ρ′)[ff/out^B_2])))

The sub-program Compile(S1) for the then-clause will either terminate or not. We can check for termination

M^D_io(Compile(S1))((1, ρ)) ↓

of Compile(S1) on the state (1, ρ) rather than

M^D_io(Compile(S1))((1, ρ′)[ff/out^B_2]) ↓

because the registers with changed values are always set by compiled statements before being accessed:

• The evaluation of expressions and Boolean expressions is not allowed to alter the values of input/output registers other than out^D_1 and out^B_1, by the Expression Requirements Lemma and the Boolean Expression Requirements Lemma.

• The evaluation of expressions and Boolean expressions is not allowed to alter the values of variable-storing registers, by the Expression Requirements Lemma and the Boolean Expression Requirements Lemma.


• The register out^B_2 is only used by compiled conditional and iteration statements. In both cases, the instruction to set the value is independent of any registers (it is only ever set to false) and it always occurs immediately prior to a test. As there are no jump instructions to the test itself, the instruction to set the value in out^B_2 is always executed before the value is examined.

So, consider the case that Compile(S1) terminates. By the Controlled Exiting Lemma, we know that if Compile(S1) terminates, then it will do so with a program counter value of γ1 + 1, and so by the Relabelling Invariance Lemma, we know that if Relabel(β + 3, Compile(S1)) terminates, it will do so with a program counter value of β + γ1 + 3. Suppose, then, that Relabel(β + 3, Compile(S1)) terminates in a state (β + γ1 + 3, ρ1).

EncodeData(W^A(b)(σ)) = tt ∧ M^D_io(Compile(S1))(Encode(σ)) ↓
    ⇒ Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ)))
        = Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + γ1 + 3, ρ1)))

As the value of the program counter is β + γ1 + 3, the next instruction that we execute is β + γ1 + 3. +(γ2 + 1) ?← (out^B_1, out^B_1). So, by the Single Instruction Execution Lemma,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + γ1 + 3, ρ1)[β + γ1 + γ2 + 4/PC])).

This terminates the program: by the Execution Cessation Lemma,

= Decode((β + γ1 + 3, ρ1)[β + γ1 + γ2 + 4/PC]).

As decode is independent of the value of the program counter, by the definition of Decode,

= Decode((β + γ1 + 3, ρ1)).

Now substituting back for the state (β + γ1 + 3, ρ1),

= Decode(M^D_io(Relabel(β + 3, Compile(S1)))((β + 3, ρ′)))

and substituting back for the state (β + 3, ρ′),

= Decode(M^D_io(Relabel(β + 3, Compile(S1)))(M^D_io(CompileBExp(b))((1, ρ)))).

But the state (1, ρ) is the encoding of the initial state:

= Decode(M^D_io(Relabel(β + 3, Compile(S1)))(M^D_io(CompileBExp(b))(Encode(σ)))).

By the Relabelling Invariance Lemma,

= Decode(M^D_io(Compile(S1))(M^D_io(CompileBExp(b))(Encode(σ))[1/PC])).

As we set the program counter back to 1 after evaluating the Boolean expression,

= Decode(M^D_io(Compile(S1))(Encode(Decode(M^D_io(CompileBExp(b))(Encode(σ)))))).

Now applying the Boolean Expression Requirements Lemma,

= Decode(M^D_io(Compile(S1))(Encode(σ))).

And applying the Induction Hypothesis to S1,

≃ M^A_io(S1)(σ).

But this is the semantics of the while program if b then S1 else S2 fi when the Boolean expression evaluates to true. So, by the definition of M^A_io,

= M^A_io(if b then S1 else S2 fi)(σ).


Case: Compile(S1) does not terminate Now consider the case that the then-clause Compile(S1) does not terminate:

EncodeData(W^A(b)(σ)) = tt ∧ M^D_io(Compile(S1))(Encode(σ)) ↑
    ⇒ Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ))) ↑

which is the semantics of if b then S1 else S2 fi.

Case: b evaluates to false Now, we return to the possibility that the else-clause is executed. This occurs when the registers out^B_1 and out^B_2 contain the same values, so the jump instruction causes the program counter to be set to the location of the sub-program corresponding to S2.

So by the definition of M^D_io,

EncodeData(W^A(b)(σ)) = ff
    ⇒ Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ)))
        = Decode(M^D_io(Compile(if b then S1 else S2 fi))((β + γ1 + 4, ρ′)[ff/out^B_2]))

Now that the value of the program counter is β + γ1 + 4, the instruction that is executed next is the first instruction of Compile(S2). So, applying the Subprogram Execution Lemma and the Labelling Invariance Lemma,

= Decode(M^D_io(Compile(if b then S1 else S2 fi))
    (M^D_io(Relabel(β + γ1 + 4, Compile(S2)))((β + γ1 + 4, ρ′)[ff/out^B_2])))

Again, there are now two cases to consider, depending on whether Compile(S2) terminates or not. We can check for termination

M^D_io(Compile(S2))((1, ρ)) ↓

of Compile(S2) on the state (1, ρ) rather than

M^D_io(Compile(S2))((1, ρ′)[ff/out^B_2]) ↓

for the same reasons outlined for determining termination in the then-case.

First, suppose the else-clause Compile(S2) does terminate. By the Controlled Exiting and Execution Cessation Lemmas,

EncodeData(W^A(b)(σ)) = ff ∧ M^D_io(Compile(S2))(Encode(σ)) ↓
    ⇒ Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ)))
        = Decode(M^D_io(Relabel(β + γ1 + 4, Compile(S2)))((β + γ1 + 4, ρ′)[ff/out^B_2]))

And then the result follows by reasoning similar to the then-case.

And now suppose the else-clause Compile(S2) does not terminate:

EncodeData(W^A(b)(σ)) = ff ∧ M^D_io(Compile(S2))(Encode(σ)) ↑
    ⇒ Decode(M^D_io(Compile(if b then S1 else S2 fi))(Encode(σ))) ↑

which is the semantics of if b then S1 else S2 fi.


Iterative Statements

The execution of the statement while b do S0 od on a state σ ∈ State(A) produces the sequence

σ0, σ1, ..., σn, ...

of states from the repeated execution of the body S0 on the initial state σ0 = σ. The sequence of states produced may be infinite, either if the execution of S0 does not terminate, or if the controlling condition b of the while loop determines non-termination.
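The iteration sequence σ0, σ1, ... can be generated at the source level from a state transformer for the body and a predicate for the test. In the Python sketch below, `test` and `body` are hypothetical stand-ins for W^A(b) and M^A_io(S0), introduced only for this illustration.

```python
# Generate the iteration sequence of a while loop at the source level.
# `test` plays the role of W^A(b), `body` the role of M^A_io(S0).

def iterates(test, body, sigma, limit=1000):
    seq = [sigma]
    while test(sigma):
        sigma = body(sigma)
        seq.append(sigma)
        if len(seq) > limit:      # guard: the loop may not terminate
            raise RuntimeError("no termination within limit")
    return seq

# countdown example: while x > 0 do x := x - 1 od
seq = iterates(lambda s: s["x"] > 0, lambda s: {"x": s["x"] - 1}, {"x": 3})
assert seq == [{"x": 3}, {"x": 2}, {"x": 1}, {"x": 0}]
```

The proof below shows that the compiled program produces a matching sequence of target states τ0, τ1, ..., one per iterate.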

We need to show that the execution of the program

Compile(while b do S0 od)

on a state τ ∈ VRMState(D) that is equivalent to σ ∈ State(A) produces an equivalent sequence

τ0, τ1, ..., τn, ...

of states. That is, we need to show that if Decode(τ) = σ, then

W^A(b)(Decode(τi)) = tt for 0 ≤ i < n,
W^A(b)(Decode(τn)) = ff,
and Decode(τi) = σi.

Let us first examine the execution of compiled while statements. Let τ = (1, ρ) ∈ VRMState(D) be an arbitrary VRM state. Then, following the same reasoning as for conditional statements:

M^D_io(Compile(while b do S0 od))(τ)
    = M^D_io(Compile(while b do S0 od))((β + 2, ρ′)[ff/out^B_2])

where the values of the variable-storing registers in ρ′ remain the same as they were in ρ, the output register out^B_1 contains the evaluation of the Boolean expression b, and the program counter is set to β + 1. We shall use the state produced by executing the program CompileBExp(b) for the evaluation of the Boolean expression as our state τ0:

τ0 = (β + 2, ρ′)[ff/out^B_2]

This corresponds to the initial state σ0:

Decode(τ0) = Decode(τ[ff/out^B_2][β + 2/PC])   by definition of τ0
           = Decode(τ)                         by definition of Decode
           = σ                                 by assumption
           = σ0                                by definition of σ0.

Continuing with our execution, if the Boolean test evaluated to false, then

EncodeData(W^A(b)(Decode(τ))) = ff
    ⇒ M^D_io(Compile(while b do S0 od))(τ)
        = M^D_io(Compile(while b do S0 od))
            (M^D_io(β + 2. +(γ0 + 2) ?← (out^B_1, out^B_2))((β + 2, ρ′)[ff/out^B_2]))
          by the same reasoning as for conditional statements
        = M^D_io(Compile(while b do S0 od))((β + γ0 + 4, ρ′)[ff/out^B_2])
          by definition of M^D_io
        = (β + γ0 + 4, ρ′)[ff/out^B_2]
          by the Execution Cessation Lemma


Thus, if

EncodeData(W^A(b)(Decode(τn))) = ff

then execution of the compiled while statement will cease. But we can simplify this termination condition:

EncodeData(W^A(b)(Decode(τn))) = ff
    ⇔ DecodeData(EncodeData(W^A(b)(Decode(τn)))) = DecodeData(ff)
      by applying DecodeData to both sides
    ⇔ W^A(b)(Decode(τn)) = ff
      by definition of DecodeData

as we required.

Now, we return to the execution, but considering the case where the Boolean test evaluates to true.

EncodeData(W^A(b)(Decode(τ))) = tt
    ⇒ M^D_io(Compile(while b do S0 od))(τ)
        = M^D_io(Compile(while b do S0 od))
            (M^D_io(Relabel(β + 3, Compile(S0)))((β + 3, ρ′)[ff/out^B_2]))
          by the same reasoning as for conditional statements

Using the same reasoning as in the conditionals case, we can ask whether Compile(S0) terminates on the state τ or not. If it does not terminate, then

EncodeData(W^A(b)(Decode(τ))) = tt ∧ M^D_io(Compile(S0))(τ) ↑
    ⇒ M^D_io(Compile(while b do S0 od))(τ) ↑.

If it does terminate, then

EncodeData(W^A(b)(Decode(τ))) = tt
    ⇒ M^D_io(Compile(while b do S0 od))(τ)
        = M^D_io(Compile(while b do S0 od))
            (M^D_io(β + γ0 + 3. −(β + γ0 + 2) ?← (out^B_1, out^B_1))
              (M^D_io(Relabel(β + 3, Compile(S0)))((β + 3, ρ′)[ff/out^B_2])))
          by the Execution Cessation and Single Instruction Execution Lemmas
        = M^D_io(Compile(while b do S0 od))
            (M^D_io(Compile(S0))((1, ρ′)[ff/out^B_2]))[1/PC]
          by arithmetic and the Relabelling Isomorphism Invariance Theorem

This tells us how our states in the iteration sequence will be produced.

Claim For 0 ≤ i < n, the sequence of iterates is determined by

τi+1 ≃ M^D_io(Compile(S0))(τi[EncodeData(W^A(b)(Decode(τi)))/out^B_1])[1/PC].

Proof We prove the claim by induction.

For the base case i = 0,

τ1 ≃ M^D_io(Compile(S0))((1, ρ′)[ff/out^B_2])[1/PC]
     by the semantics of compiled while programs
   ≃ M^D_io(Compile(S0))(τ0[tt/out^B_1])[1/PC]
     by the definition of τ0
   ≃ M^D_io(Compile(S0))(τ0[EncodeData(W^A(b)(Decode(τ0)))/out^B_1])[1/PC]
     by assumption


790 CHAPTER 21. COMPILER VERIFICATION

For the Induction Hypothesis, we assume the case holds up to i. Then, for the
Induction Step, i + 1:

  M^D_io(Compile(while b do S0 od))(τ)
      ≃ M^D_io(Compile(while b do S0 od))(τ_i)
                by induction
      ≃ M^D_io(Compile(while b do S0 od))
          (M^D_io(Compile(S0))(τ_i[ff/out_B2]))[1/PC]
                by the semantics of compiled while statements
      ≃ M^D_io(Compile(while b do S0 od))
          (M^D_io(Compile(S0))(τ_i[EncodeData(W^A(b)(Decode(τ_i)))/out_B1]))[1/PC]
                by the definition of τ_i

□

And finally, we examine the iteration condition

        EncodeData(W^A(b)(Decode(τ_n))) = tt

which we can simplify to

        EncodeData(W^A(b)(Decode(τ_n))) = tt
  ⇔ DecodeData(EncodeData(W^A(b)(Decode(τ_n)))) = DecodeData(tt)
                by applying DecodeData to both sides
  ⇔ W^A(b)(Decode(τ_n)) = tt
                by the definition of DecodeData

as we required. □
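The iteration structure established by the Claim can be mirrored in a small executable sketch. The following Python fragment is our illustration, not the book's formalism: states are dictionaries, `test` stands in for EncodeData(W^A(b)(Decode(·))), `run_body` stands in for M^D_io(Compile(S0)), and the names `unfold_while`, `OUT_B1` and `PC` are invented for the example.

```python
# Illustrative sketch of the iterate construction from the Claim:
#   tau_{i+1} = run_body(tau_i with the test result stored in OUT_B1
#               and the program counter reset to 1).
# All names here are invented; they stand in for the book's operations.

def unfold_while(test, run_body, state, fuel=1000):
    """Unfold `while b do S0 od`, returning the iterates and final state."""
    iterates = [state]
    for _ in range(fuel):
        tau = iterates[-1]
        flag = test(tau)                    # stands in for EncodeData(W^A(b)(Decode(tau)))
        if not flag:                        # test is ff: execution ceases
            return iterates, tau
        tau = dict(tau, OUT_B1=flag, PC=1)  # record the test, reset the counter
        iterates.append(run_body(tau))      # next iterate tau_{i+1}
    raise RuntimeError("loop did not terminate within the fuel bound")

# Example: while x > 0 do x := x - 1 od, started with x = 3.
iterates, final = unfold_while(
    lambda t: t["x"] > 0,
    lambda t: dict(t, x=t["x"] - 1),
    {"PC": 1, "x": 3},
)
```

The fuel bound makes non-termination observable as an exception, corresponding to the case M^D_io(Compile(while b do S0 od))(τ)↑ in the proof.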


Exercises for Chapter 21


Chapter 22

Further reading

Our choice of subjects to study is made, our account of them is written. We have selected topics that we believe are fundamental and can serve as a foundation for thinking about programming languages and investigating languages in particular areas or studying theoretical aspects of them to an advanced level. It is also time for the reader to reflect on and, hopefully, look forward to a deeper level of knowledge and understanding. One thing is clear:

A computer scientist cannot know too much about programming languages.

More generally,

A computer scientist cannot know too much about data, syntax and semantics.

In this last chapter we will look at the theory of programming languages and point out some special topics and a small selection of text-books, handbooks, and research papers that could help the reader to progress.

22.1 On the study of programming languages

A computer scientist can, and should, study programming languages from a number of points of view. Curricula commonly begin with the subject of

• Practical Programming

where one meets and, hopefully, masters a general purpose programming language. Commonly, the language is imperative (including object-oriented) or functional. There are so many different languages and different educational objectives that one hesitates to make too many assumptions. Usually one objective is to learn how to design programs that express algorithms and to perform interesting computations on a computer. Then there is the subject of

• Programming Paradigms

where one learns programming again, using a language based on some different model chosen from the imperative, object-oriented, functional, logic or algebraic paradigms. The objective is to demonstrate that there are radically different ways of programming a computer and that different methods can have advantages and disadvantages. Upon this simple foundation of practical programming the study of programming languages can begin in earnest.


We will look briefly at a few of the subjects that follow. For breadth, there is the task of surveying and classifying languages in terms of their features. This is a difficult task that leads to a comparative theory of paradigms and languages based on observation, rather like in classical botany. There are several texts on

• Comparative Programming Languages

that introduce the extended family of languages and compare them: see, for example, .... Usually, the languages discussed are either general purpose or have some reputation in one of a few selected areas such as artificial intelligence. There are also areas such as visual computing which are rich in their own languages.

A more advanced form of comparative theory is often called

• Concepts of Programming Languages

wherein the classifications are more abstract, being based on concepts to be found, in different forms, across many languages. Some texts are: Wilson and Clark [2001], Mitchell [2003], and Watt [2004]....

A broad approach to languages can also be found in

• Programming Language Pragmatics

which is about how programming languages work, and looks at languages, implementations and systems as an integrated whole. An excellent introduction is Scott [2000].

For depth, there are specialist subjects on tools and users. One of the most important is

• Parsing, Translation and Compilation

in which the extensive theory of processing syntax and techniques for translating between languages are studied. Compilation of high level languages to low level languages is usually the central problem. Some standard textbooks are: Hopcroft and Ullman [1979].

The technical needs of programmers for high-level modelling, specification and reasoning are the subject of

• Specification and Verification

which may cover a host of problems, approaches, and applications with and without decent theoretical frameworks and tools.

And, finally, of course there is the mathematical modelling and analysis of programming languages in

• Theory of programming languages

All of these subjects are huge and have been lively research areas for decades.

22.2 The theory of programming languages

We will look at some of the literature for the Theory of Programming Languages.


22.2.1 Data

We have covered the elements of the theory of data in great detail. One reason is that the material is so important. The notion of an algebra is as fundamental to computer science as the more widely known notions of finite state automaton and grammar. The concept of an algebra is destined to model data for centuries. Furthermore, the technologies of data are centre stage, for we amass huge amounts of data in all parts of science and society that we analyse, visualise, combine, store and share. Commercial demand for data has turned it into a commodity to buy and sell. Computing has become data-centric.
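To make the point concrete, here is a toy sketch (ours, not the book's notation) of an algebra modelling a data type: a carrier of natural numbers together with interpretations of a small signature.

```python
# A toy single-sorted algebra modelling the natural numbers: a carrier
# together with interpretations of the signature
#   zero : -> nat,   succ : nat -> nat,   add : nat x nat -> nat.
# The dictionary representation is an invented illustration.

naturals = {
    "carrier": lambda x: isinstance(x, int) and x >= 0,
    "zero": 0,
    "succ": lambda x: x + 1,
    "add": lambda x, y: x + y,
}

# Terms over the signature evaluate to elements of the carrier:
two = naturals["succ"](naturals["succ"](naturals["zero"]))  # succ(succ(zero))
four = naturals["add"](two, two)
```

Choosing a different carrier and different operations for the same signature gives a different algebra; that a data type is such an interpretation of a signature is the point being made here.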

Yet, the basic theory of data is not easily accessible to computer scientists at large and it is especially difficult to find material suitable for introducing it to undergraduate students.

There are some advanced books on the specification of data types and algebraic development techniques.

Ehrig and Mahr.

Some specialist books applying algebraic methods to ..... Goguen ....

There are plenty of mathematical books on algebra. The theory of algebras in general is called universal algebra. The main books have been designed for graduate courses in mathematics: Cohn, Gratzer, Mal'cev, .....

For computer scientists there is Wechler [1992]. The grand Handbook of Logic in Computer Science contains an introduction to algebras and their use for computer scientists: Meinke and Tucker [1992].

For particular types of algebras such as groups, rings and fields, and Boolean algebras, see the texts:

22.2.2 Syntax

The theory of syntax appears somewhere in all computer science courses. We have presented it as an independent subject motivated and focused by the needs of programming languages — where the term programming language has a very broad meaning. There are a few texts that introduce the subject in that way. An excellent account of syntax that is clearly focused on programming languages is Backhouse [1979]. And there are some excellent general accounts of how a language is specified that, though they include syntax, mainly occupy themselves with semantics; see Pagan.

Commonly, the theory of grammars and formal languages is presented in the company of automata, Turing machines, undecidability etc., as a mathematical subject in the broader context of the theory of computation. This is an excellent approach to the subject, for it is the natural form of the mathematical theory, and formal language theory is intimately connected to computability theory both technically and historically. There are plenty of texts for this approach: Hopcroft and Ullman [1979].

Often, the theory of grammars is presented in the context of compiling and language processing, covering parsers and compilation.

Algebraic methods for syntax processing have made enormous progress: Klint.

22.2.3 Semantics

There are some excellent general accounts of how a programming language is specified and in which both syntax and semantics are discussed. These texts complement and extend the material in Parts II and III here. A number of semantics, including the operational, denotational and axiomatic, are given for some simple languages. These languages possess only natural numbers or integers as data types. See, for example, Pagan [1981] and Slonneger and Kurtz [1995]. These texts are quite formal but less concerned with concepts and their mathematical analysis.

There are plenty of undergraduate books on semantics that emphasise conceptual analysis with mathematical techniques. For example, Gordon [1988], Hennessy [1990], Nielson and Nielson [1991], and Winskel [1993].

For graduate studies on semantics there are: Manna [1974], Gunter [1992], Mitchell, Tennent [1981].

22.2.4 Specification and correctness

An important subject is logics for reasoning about and verifying programs. Typically, an imperative program S is equipped with a pre-condition p on its input states and a post-condition q on its output states to form a program specification, written

pSq.

The idea is that if the initial state of the computation satisfies the pre-condition p, and the program S is executed, then the final state satisfies the post-condition q. Since programs need not terminate, two interpretations of this idea are possible, called total correctness and partial correctness. A Floyd-Hoare logic is a system of axioms and rules for deriving or proving formulae of the form pSq, i.e., for verifying that programs meet input-output specifications.
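The reading of a specification pSq can be tried out mechanically on individual states. The sketch below is only a testing aid and uses invented names (`check_spec` and the rest); a Floyd-Hoare logic, by contrast, proves the specification for all states at once.

```python
# Run program S on a state satisfying pre-condition p and check
# post-condition q on the result: one test of the specification p S q.
# (A Floyd-Hoare proof covers every state; this checks only one.)

def check_spec(p, program, q, state):
    assert p(state), "pre-condition p violated on the input state"
    final = program(dict(state))  # execute S on a copy of the state
    assert q(final), "post-condition q violated on the output state"
    return final

# Specification: pre x >= 0, program y := x * x, post y = x^2 and y >= 0.
pre = lambda s: s["x"] >= 0
post = lambda s: s["y"] == s["x"] ** 2 and s["y"] >= 0

def square(s):
    s["y"] = s["x"] * s["x"]
    return s

final = check_spec(pre, square, post, {"x": 5})
```

Since the program here always terminates, this single run witnesses both the partial- and total-correctness readings for the state x = 5.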

Logics for verifying programs is a subject we have omitted. Some of the books on semantics mentioned in 22.2.3 contain treatments of Floyd-Hoare logics from which one can gain an introduction: Gordon [1988] and Winskel [1993].

An excellent textbook introduction to specification and verification and other logical matters to do with programs is Sperschneider and Antoniou.

The semantics of the primary imperative constructs and the logics for proving their programs correct were fully analysed in de Bakker [1980], which is a landmark for the subject. The methods were subsequently extended to arbitrary data types in Tucker and Zucker [1988].

Accounts of the theory of program verification are in Francez [1991] and Apt and Olderog [1991]. An early, readable if out-of-date, survey is Apt, and a fuller survey is Cousot [1990].

22.2.5 Compilation

Karen to make first draft.

22.3 Advanced Topics

22.3.1 Abstract Data Types, Equational Specifications and Term Rewriting

Computable data types


22.3.2 Fixed Points and Domain Theory

In the course of this book we have met innumerable equations, often as axioms requiring operations to behave in a certain way — specifically, requiring operations to qualify as solutions of these equations. Sometimes we met equations that had the specific task of defining a function we needed. For example, wherever we use recursion and induction to define something — a function on natural numbers, a procedure, a data structure, the term evaluation map, the state transformation of a program, etc. — we see special forms of equations that we need to have a particular, unique, solution. These equations are about functions and have a special form; they are called fixed-point equations. The solution of equations is a major topic in data, syntax and semantics. We have met dozens of important equations that demonstrate this fact and motivate a deep mathematical theory of fixed-point equations.

Fixed-point equations have a special form:

find φ such that φ = F(φ)

where F is an operation that takes φ as an argument. Many of the equations we have met can be transformed into this form.
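A standard small example (ours, not the book's) shows the shape φ = F(φ) at work: the factorial function is a fixed point of a functional F, and iterating F from the everywhere-undefined function builds it up one argument at a time.

```python
# phi = F(phi): factorial as a fixed point.  An approximation phi is a
# finite function, represented as a dict of the values known so far.

def F(phi):
    """Improve an approximation of factorial by one more argument."""
    better = {0: 1}                       # fact(0) = 1
    for n, value in phi.items():
        better[n + 1] = (n + 1) * value   # fact(n+1) = (n+1) * fact(n)
    return better

phi = {}                  # the everywhere-undefined function
for _ in range(6):        # iterate: {}, F({}), F(F({})), ...
    phi = F(phi)
# phi is now defined on 0..5 and agrees with factorial there.
```

Each iterate extends the previous one, and the union of all the iterates is the least solution of φ = F(φ); the ordered-set treatment of this construction is exactly the subject of domain theory.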

Fixed-point methods were discovered early on in the development of computability theory and so entered programming language theory in an obvious way. However, programming theory and the early attempts to define rigorous mathematical semantics for certain programming constructs, such as recursion and data structures, provided a wonderful range of new problems that were more complicated and varied. They also provided an appetite for more precise and rigorous treatments of technical issues, and a sense of urgency, for such problems stood in the way of the development of programming languages.

Early on, before computability theory, fixed-point methods were studied abstractly in the context of ordered sets. This theory was an ideal starting point for a theory of fixed points of wide applicability, as was needed in programming language theory.

The use of fixed points can be found in early texts on programming theory such as Manna [1974]. The standard mathematical text-book is Stoltenberg-Hansen et al. [1994].

22.3.3 λ-Calculus and Type Theory

22.3.4 Concurrency

22.3.5 Computability theory


Bibliography

Abbate [1999]
J Abbate. Inventing the internet. MIT Press, 1999.

Apt and Olderog [1991]
K R Apt and E-R Olderog. Verification of sequential and concurrent programs. Springer-Verlag, New York, 1991.

Asveld and Tucker [1982]
P R J Asveld and J V Tucker. Complexity theory and the operational structure of algebraic programming systems. Acta Informatica, 17:451–476, 1982.

Babbage [1989]
C Babbage. Collected Works Volume III: Analytical Engine. W Pickering, London, 1989.

Back and von Wright [1998]
R J R Back and J von Wright. Refinement Calculus: A Systematic Introduction. Graduate Texts in Computer Science. Springer-Verlag, 1998.

Backhouse [1979]
R C Backhouse. Syntax of programming languages. Prentice-Hall, 1979.

Backus [1959]
J Backus. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM conference. In Proceedings of the International Conference on Information Processing, UNESCO, Paris, pages 125–132, London, 1959. Butterworth.

Backus [1980]
J Backus. Programming in America in the 1950s — some personal impressions. In Metropolis et al. [1980], pages 125–135.

Backus [1981]
J Backus. The history of FORTRAN I, II and III. In Wexelblat [1981], pages 25–74.

Bar-Hillel et al. [1961]
Y Bar-Hillel, M Perles, and E Shamir. On formal properties of simple phrase structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14:143–172, 1961.

Barsky [1997]
R F Barsky. Noam Chomsky: A life of dissent. MIT Press, 1997.


Bauer and Wossner [1972]
F L Bauer and H Wossner. The “Plankalkül” of Konrad Zuse: A forerunner of today's programming languages. Communications of the ACM, 15(7):678–685, July 1972.

Bemer [1969]
R W Bemer. A politico-social history of Algol. In M Halpern and C Shaw, editors, Annual Review of Automatic Programming, volume 5, pages 151–238, Oxford, 1969. Pergamon Press.

Bergin and Gibson [1996]
T J Bergin and R G Gibson. History of programming languages. ACM Press, Addison-Wesley, 1996.

Bergstra and Tucker [1982]
J A Bergstra and J V Tucker. The completeness of the algebraic specification methods for data types. Information and Control, 54:186–200, 1982.

Bergstra and Tucker [1983]
J A Bergstra and J V Tucker. Initial and final algebra semantics for data type specifications: two characterisation theorems. SIAM Journal of Computing, 12:366–387, 1983.

Bergstra and Tucker [1987]
J A Bergstra and J V Tucker. Algebraic specifications of computable and semicomputable data types. Theoretical Computer Science, 50:137–181, 1987.

Bergstra and Tucker [1993]
J A Bergstra and J V Tucker. Equational specifications for computable data types: 6 hidden functions suffice and other sufficiency bounds. In K Meinke and J V Tucker, editors, Many Sorted Logic and its Applications, pages 89–102. Wiley, 1993.

Bergstra and Tucker [1995a]
J A Bergstra and J V Tucker. The data type variety of stack algebras. Annals of Pure and Applied Logic, 73:11–36, 1995.

Bergstra and Tucker [1995b]
J A Bergstra and J V Tucker. Equational specifications, complete term rewriting systems, and computable and semicomputable algebras. Journal of the ACM, 42:1194–1230, 1995.

Birkhoff and MacLane [1965]
G Birkhoff and S MacLane. A Survey of Modern Algebra. Macmillan, New York, third edition, 1965.

Blanck [1996]
J Blanck. Domain representability of metric spaces. Annals of Pure and Applied Logic, 83:225–247, 1996.

Blum et al. [1989]
L Blum, M Shub, and S Smale. On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions, and universal machines. Bulletin of the American Mathematical Society, 21:1–46, 1989.


Blum et al. [1998]
L Blum, F Cucker, M Shub, and S Smale. Complexity and Real Computation. Springer-Verlag, Berlin, 1998.

Borst [1993]
A Borst. The Ordering of Time. Polity Press, 1993.

Broy and Denert [2002]
M Broy and E Denert, editors. Software pioneers. Springer-Verlag, 2002.

Burrill [1967]
C W Burrill. Foundations of Real Numbers. McGraw-Hill, 1967.

Burris and Sankappanavar [1981]
S Burris and H P Sankappanavar. A Course in Universal Algebra. Springer-Verlag, Berlin, 1981.

Campbell-Kelly and Aspray [1996]
M Campbell-Kelly and W Aspray. Computer: A history of the information machine. Basic Books, New York, 1996.

Cantor [1962]
D G Cantor. On the ambiguity problem of Backus systems. Journal of the ACM, 9(4):477–479, 1962.

Ceruzzi [1998]
P Ceruzzi. A history of modern computing. MIT Press, 1998.

Chen and Tucker [2000]
M Chen and J V Tucker. Constructive volume geometry. Computer Graphics Forum, 19:281–293, 2000.

Chen et al. [2000]
M Chen, A Kaufman, and R Yagel. Volume Graphics. Springer-Verlag, Berlin, 2000.

Chomsky [1956]
N Chomsky. Three models for the description of a language. IRE Transactions on Information Theory, IT-2(3):113–124, 1956.

Chomsky [1959]
N Chomsky. On certain formal properties of grammars. Information and Control, 2:137–167, 1959.

Cocke et al. [1966]
Cocke, Kasami, and Younger. 1966.

Cohn [1977]
P M Cohn. Algebra, volume 2. Wiley, Chichester, 1977.

Cohn [1981]
P M Cohn. Universal Algebra. D Reidel, Dordrecht, 1981.


Cohn [1982]
P M Cohn. Algebra, volume 1. Wiley, second edition, 1982.

Comer [1995]
D E Comer. Internetworking with TCP/IP, volume 1: Principles, Protocols and Architectures. Prentice-Hall, third edition, 1995.

Cousot [1990]
P Cousot. Logical methods for proving programs. In J van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 841–993. North-Holland, Amsterdam, 1990.

Cutland [1980]
N J Cutland. Computability: An Introduction to Recursive Function Theory. Cambridge University Press, Cambridge, 1980.

Dale and Walker [1996]
N Dale and H Walker. Abstract Data Types. Jones and Bartlett, 1996.

Davis [1977]
M Davis. Unsolvable problems. In K J Barwise, editor, Handbook of Mathematical Logic, volume 90 of Studies in Logic, pages 567–594. North-Holland, Amsterdam, 1977.

Davis [2000]
M Davis. The universal computer. Norton, 2000.

de Bakker and de Vink [1996]
J W de Bakker and E de Vink. Control flow semantics. MIT Press, 1996.

de Bakker and Rutten [1992]
J W de Bakker and J J M M Rutten. Ten years of concurrency semantics. World Scientific, Singapore, 1992.

de Bakker [1980]
J W de Bakker. Mathematical theory of program correctness. Prentice-Hall, 1980.

de Roever et al. [2001]
W-P de Roever, F de Boer, U Hannemann, J Hooman, Y Lakhnech, M Poel, and J Zwiers. Concurrency Verification: Introduction to Compositional and Noncompositional Methods. Cambridge University Press, 2001.

Dedekind [1888a]
R Dedekind. Was sind und was sollen die Zahlen? Braunschweig, 1888.

Dedekind [1888b]
R Dedekind. Was sind und was sollen die Zahlen? Vieweg, Braunschweig, 1888. Reprinted in: Essays on the Theory of Numbers, Dover, New York, 1963.

Dijkstra [1962]
E W Dijkstra. A Primer of Algol 60 Programming, Together with a Report on the Algorithmic Language Algol 60. Academic Press, 1962.


Dijkstra [1976]
E W Dijkstra. A Discipline of Programming. Prentice-Hall International, 1976.

Ershov [1977]
Y L Ershov. Theory of Numberings. Nauka, Moscow, 1977. (In Russian).

Fauvel and Gray [1987]
J Fauvel and J Gray. The History of Mathematics. Macmillan Press and Open University, 1987.

Floyd [1962]
R W Floyd. On ambiguity in phrase structure languages. Communications of the ACM, 5(9):526, 534, 1962.

Fraleigh [1967]
J Fraleigh. A First Course in Abstract Algebra. Addison-Wesley, 1967.

Francez [1991]
N Francez. Program verification. Addison-Wesley, 1991.

Frohlich and Shepherdson [1956]
A Frohlich and J C Shepherdson. Effective procedures in field theory. Philosophical Transactions of the Royal Society of London Series A, 248:407–432, 1956.

Gandy [1988]
R Gandy. The confluence of ideas in 1936. In R Herken, editor, The Universal Turing Machine, pages 55–111. Oxford University Press, Oxford, 1988.

Ginsburg and Rice [1962]
S Ginsburg and H G Rice. Two families of languages related to ALGOL. Journal of the ACM, 9:350–371, 1962.

Goldstine and von Neumann [1947]
H H Goldstine and J von Neumann. Planning and coding problems for an electronic computing instrument. 1947.

Gordon [1988]
M J C Gordon. Programming language theory and its implementation. Prentice-Hall, 1988.

Gratzer [1979]
G Gratzer. Universal Algebra. Springer-Verlag, Berlin, 1979.

Greibach [1981]
S A Greibach. Formal languages: Origins and directions. Annals of the History of Computing, 3:14–41, 1981.

Gunter [1992]
C A Gunter. Semantics of programming languages: structures and techniques. MIT Press, 1992.


Hafner and Lyon [1996]
K Hafner and M Lyon. Where wizards stay up late: the origins of the internet. Touchstone, New York, 1996.

Hansen [2002]
Per Brinch Hansen. The origin of concurrent programming. Springer-Verlag, 2002.

Harris [1946]
Z Harris. From morpheme to utterance. Language, 22:161–183, 1946.

Harrison [1978]
M A Harrison. Introduction to Formal Language Theory. Addison-Wesley, 1978.

Hashagen et al. [2002]
U Hashagen, R Keil-Slawik, and A Norberg, editors. History of computing: software issues. Springer-Verlag, 2002.

Hennessy [1990]
M Hennessy. The semantics of programming languages: an elementary introduction using structural operational semantics. J Wiley, 1990.

Herstein [1964]
I N Herstein. Topics in Algebra. Wiley, New York, first edition, 1964.

Hopcroft and Ullman [1979]
J E Hopcroft and J D Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.

Huntington [1904]
E V Huntington. Sets of independent postulates for the algebra of logic. Transactions of the American Mathematical Society, 5:288–309, 1904.

Kaufman et al. [1993]
A Kaufman, D Cohen, and R Yagel. Volume graphics. IEEE Computer, 26:51–64, 1993.

Kleene [1956]
S C Kleene. Representations of events in nerve nets and finite automata. In C E Shannon and J McCarthy, editors, Automata Studies, pages 3–42, Princeton, 1956. Princeton University Press.

Kleene [1981]
S C Kleene. Origins of recursive function theory. Annals of the History of Computing, 3:52–67, 1981.

Knuth and Pardo [1977]
D E Knuth and L T Pardo. The early development of programming languages. In J Belzer, A G Holzman, and A Kent, editors, Encyclopaedia of Computer Science and Technology, volume 5, pages 419–493, New York, 1977. Dekker. Reprinted in Metropolis et al. [1980].


Knuth [1962]
D E Knuth. History of writing compilers. Computers and Automation, 11(12):8–18, December 1962. Reprinted in pages 38–56 of Compiler Techniques, B W Pollock (ed.), Auerbach Publishers, Princeton, 1972.

Knuth [1965]
D E Knuth. On the translation of languages from left to right. Information and Control, 8:607–639, 1965.

Knuth [1974]
D E Knuth. Structured programming with go to statements. Computing Surveys, 6(4):261–301, December 1974.

Lammers [1989]
S Lammers. Programmers at work. Tempus/Microsoft Press, 1989.

Lewis and Papadimitriou [1981]
H R Lewis and C H Papadimitriou. Elements of the theory of computation. Prentice-Hall, 1981.

Lohr [2002a]
S Lohr. Go To. Profile Books, 2002.

Lohr [2002b]
S Lohr. Go To. Profile Books, 2002.

Machtey and Young [1978]
M Machtey and P R Young. An Introduction to the General Theory of Algorithms. Elsevier, Amsterdam, 1978.

MacKenzie [2001]
D MacKenzie. Mechanizing proof: computing, risk and trust. MIT Press, 2001.

Mal'cev [1970]
A I Mal'cev. Algorithms and Recursive Functions. Wolters-Noordhoff, Groningen, 1970.

Mal'cev [1971]
A I Mal'cev. Constructive algebras I. In B F Wells III, editor, The Metamathematics of Algebraic Systems, Collected Papers: 1936–1967, pages 148–212. North-Holland, Amsterdam, 1971.

Manna [1974]
Z Manna. Mathematical theory of computation. McGraw-Hill, 1974.

McCarthy [1960]
J McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, pages 184–195, 1960.


McCarthy [1963]
J McCarthy. A basis for a mathematical theory of computation. In P Braffort and D Hirschberg, editors, Computer Programming and Formal Systems, pages 33–70, Amsterdam, 1963. North-Holland.

Meinke and Tucker [1992]
K Meinke and J V Tucker. Universal algebra. In S Abramsky, D Gabbay, and T S E Maibaum, editors, Handbook of Logic in Computer Science, volume 1: Mathematical Structures, pages 189–411. Oxford University Press, Oxford, 1992.

Meseguer and Goguen [1985]
J Meseguer and J A Goguen. Initiality, induction and computability. In M Nivat and J Reynolds, editors, Algebraic Methods in Semantics, pages 459–541. Cambridge University Press, Cambridge, 1985.

Metropolis et al. [1980]
N Metropolis, J Howlett, and G Rota, editors. A History of Computing in the Twentieth Century. Academic Press, 1980.

Mitchell [2003]
J C Mitchell. Concepts in programming languages. Cambridge University Press, 2003.

Moody [2001]
G Moody. Rebel code: Linux and the open source revolution. Allen Lane, The Penguin Press, 2001.

Moreau [1984]
R Moreau. The computer comes of age. MIT Press, 1984.

Morris and Jones [1984]
F L Morris and C B Jones. An early program proof by Alan Turing. Annals of the History of Computing, 6(2):139–143, April 1984.

Moss et al. [1992]
L Moss, J Meseguer, and J A Goguen. Final algebras, cosemicomputable algebras, and degrees of unsolvability. Theoretical Computer Science, 100:267–302, 1992.

Napier [1616]
J Napier. A Description of the Admirable Table of Logarithms. Nicholas Okes, London, 1616. Translated from the original Latin by Edward Wright.

Naur and Perlis [1981]
P Naur and A J Perlis. In Wexelblat [1981], pages 75–139.

Nielson and Nielson [1991]
H R Nielson and F Nielson. Semantics with applications: a formal introduction. J Wiley, 1991.

Odifreddi [1989]
P Odifreddi. Classical Recursion Theory, volume 125 of Studies in Logic. North-Holland, Amsterdam, 1989.


Pagan [1981]
F G Pagan. Formal specification of programming languages: a panoramic primer. Prentice-Hall, 1981.

Peacock [1830]
G Peacock. A Treatise on Algebra. J & J J Deighton, Cambridge, 1830.

Post [1943]
E L Post. Formal reductions of the general combinatorial decision problem. American Journal of Mathematics, 65:197–215, 1943.

Post [1946]
E L Post. A variant of an unsolvable problem. Bulletin of the American Mathematical Society, 52:264–268, 1946.

Post [1947]
E L Post. Recursive unsolvability of a problem of Thue. Journal of Symbolic Logic, 12:1–11, 1947.

Raymond [1999]
E S Raymond. The cathedral and the bazaar. O'Reilly, 1999.

Salomaa [1981]
A Salomaa. Jewels of Formal Language Theory. Computer Science Press, 1981.

Sammet [1981]
J E Sammet. The early history of COBOL. In Wexelblat [1981], pages 199–217.

Scott [2000]
M L Scott. Programming Language Pragmatics. Morgan Kaufmann Publishers, 2000.

Shasha and Lazere [1995]
D Shasha and C Lazere. Out of their minds. Copernicus/Springer-Verlag, 1995.

Shepherdson [1985]
J C Shepherdson. Algorithmic procedures, generalised Turing algorithms, and elementary recursion theory. In L A Harrington, M D Morley, A Scedrov, and S G Simpson, editors, Harvey Friedman's Research on the Foundations of Mathematics, pages 285–308. North-Holland, Amsterdam, 1985.

Slater [1987]
R Slater. Portraits in silicon. MIT Press, 1987.

Slonneger and Kurtz [1995]
K Slonneger and B L Kurtz. Formal syntax and semantics of programming languages. Addison-Wesley, 1995.

Stoltenberg-Hansen and Tucker [1988]
V Stoltenberg-Hansen and J V Tucker. Complete local rings as domains. Journal of Symbolic Logic, 53:603–624, 1988.


Stoltenberg-Hansen and Tucker [1991]
V Stoltenberg-Hansen and J V Tucker. Algebraic equations and fixed-point equations in inverse limits. Theoretical Computer Science, 87:1–24, 1991.

Stoltenberg-Hansen and Tucker [1993]
V Stoltenberg-Hansen and J V Tucker. Infinite systems of equations over inverse limits and infinite synchronous concurrent algorithms. In J W de Bakker, G Rozenberg, and W P de Roever, editors, Semantics – Foundations and Applications, volume 666 of Lecture Notes in Computer Science, pages 531–562. Springer-Verlag, 1993.

Stoltenberg-Hansen and Tucker [1995]
V Stoltenberg-Hansen and J V Tucker. Effective algebras. In S Abramsky, D M Gabbay, and T S E Maibaum, editors, Handbook of Logic in Computer Science, volume IV: Semantic Modelling, pages 357–526. Oxford University Press, Oxford, 1995.

Stoltenberg-Hansen et al. [1994]
V Stoltenberg-Hansen, I Lindstrom, and E R Griffor. Mathematical theory of domains. Cambridge Tracts in Theoretical Computer Science, 22, 1994.

Struik [1969]
D J Struik, editor. A Sourcebook in Mathematics, 1200–1800. Harvard University Press, 1969.

Tennent [1981]
R D Tennent. Principles of programming languages. Prentice-Hall, 1981.

Thue [1914]
A Thue. Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln. Skrifter utgit av Videnskapsselskapet i Kristiania, I Mat Naturv Klasse, 10:34, 1914.

Thue [1977]
A Thue. Axel Thue's Selected Works. University of Oslo Press, Oslo, 1977.

Torvalds [2001]
L Torvalds. Just for fun. Textere, 2001.

Tucker and Zucker [1988]
J V Tucker and J I Zucker. Program Correctness over Abstract Data Types with Error-State Semantics. North-Holland, Amsterdam, 1988.

Tucker and Zucker [1991]
J V Tucker and J I Zucker. Examples of semicomputable sets of real and complex numbers. In J P Myers Jr and M J O'Donnell, editors, Constructivity in Computer Science, volume 613 of Lecture Notes in Computer Science, pages 179–198, Berlin, 1991. Springer-Verlag.


Tucker and Zucker [1994]
J V Tucker and J I Zucker. Computable functions on stream algebras. In H Schwichtenberg, editor, Proof and Computation, pages 341–383. Springer-Verlag, Berlin, 1994.

Tucker and Zucker [1999]
J V Tucker and J I Zucker. Computable functions and semicomputable sets on many sorted algebras. In S Abramsky, D Gabbay, and T S E Maibaum, editors, Handbook of Logic in Computer Science, volume V. Oxford University Press, 1999. In preparation.

Tucker [1980]
J V Tucker. Computing in algebraic systems. In F R Drake and S S Wainer, editors, Recursion Theory, its Generalisations and Applications, volume 45 of London Mathematical Society Lecture Note Series, pages 215–235. Cambridge University Press, Cambridge, 1980.

Valiant [1975]
L G Valiant. General context-free recognition in less than cubic time. Journal of Computer and System Sciences, 10:308–315, 1975.

van der Waerden [1949]
B L van der Waerden. Algebra, volume I and II. Ungar, New York, 1949.

Watt [2004]
D A Watt. Programming language design concepts. John Wiley and Sons, Ltd, 2004.

Wechler [1992]
W Wechler. Universal Algebra for Computer Scientists, volume 25 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, Berlin, 1992.

Weihrauch [2000]
K Weihrauch. Computable analysis. Springer-Verlag, Heidelberg, 2000.

Wexelblat [1981]
R L Wexelblat, editor. History of programming languages. Academic Press, 1981.

Whitrow [1980]
G J Whitrow. The Natural Philosophy of Time. Oxford University Press, second edition, 1980.

Wilkes et al. [1951]
M V Wilkes, D J Wheeler, and S Gill. The Preparation of Programs for an Electronic Digital Computer. Addison-Wesley, Cambridge MA, 1951.

Wilson and Clark [2001]
L B Wilson and R Clark. Comparative programming languages. Addison-Wesley, third edition, 2001.

Winskel [1993]
G Winskel. The formal semantics of programming languages: an introduction. MIT Press, 1993.


Wirsing [1990]
M Wirsing. Algebraic specifications. In J van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics, pages 675–788. North-Holland, 1990.

Yourdon [1982]
E Yourdon. Writings of the revolution: Selected readings on software engineering. Yourdon Press, 1982.