introduction to functional programming. john harrison
TRANSCRIPT
Introduction to Functional
Programming
John Harrisonjrh�cl�cam�ac�uk
�rd December ����
i
Preface
These are the lecture notes accompanying the course Introduction to FunctionalProgramming� which I taught at Cambridge University in the academic year�������
This course has mainly been taught in previous years by Mike Gordon� I haveretained the basic structure of his course� with a blend of theory and practice�and have borrowed heavily in what follows from his own lecture notes� availablein book form as Part II of Gordon ���� I have also been in�uenced by thosewho have taught related courses here� such as Andy Gordon and Larry Paulsonand� in the chapter on types� by Andy Pitts s course on the subject�
The large chapter on examples is not directly examinable� though studying itshould improve the reader s grasp of the early parts and give a better idea abouthow ML is actually used�
Most chapters include some exercises� either invented specially for this courseor taken from various sources� They are normally intended to require a littlethought� rather than just being routine drill� Those I consider fairly di�cult aremarked with a ���
These notes have not yet been tested extensively and no doubt contain variouserrors and obscurities� I would be grateful for constructive criticism from anyreaders�
John Harrison jrh�cl�cam�ac�uk��
ii
Plan of the lectures
This chapter indicates roughly how the material is to be distributed over a courseof twelve lectures� each of slightly less than one hour�
�� Introduction and Overview Functional and imperative programming�contrast� pros and cons� General structure of the course� how lambda cal�culus turns out to be a general programming language� Lambda notation�how it clari�es variable binding and provides a general analysis of mathe�matical notation� Currying� Russell s paradox�
�� Lambda calculus as a formal system Free and bound variables� Sub�stitution� Conversion rules� Lambda equality� Extensionality� Reductionand reduction strategies� The Church�Rosser theorem� statement and con�sequences� Combinators�
�� Lambda calculus as a programming language Computability back�ground� Turing completeness no proof�� Representing data and basic op�erations� truth values� pairs and tuples� natural numbers� The predecessoroperation� Writing recursive functions� �xed point combinators� Let ex�pressions� Lambda calculus as a declarative language�
�� Types Why types� Answers from programming and logic� Simply typedlambda calculus� Church and Curry typing� Let polymorphism� Mostgeneral types and Milner s algorithm� Strong normalization no proof��and its negative consequences for Turing completeness� Adding a recursionoperator�
�� ML ML as typed lambda calculus with eager evaluation� Details of evalu�ation strategy� The conditional� The ML family� Practicalities of interact�ing with ML� Writing functions� Bindings and declarations� Recursive andpolymorphic functions� Comparison of functions�
�� Details of ML More about interaction with ML� Loading from �les� Com�ments� Basic data types� unit� booleans� numbers and strings� Built�inoperations� Concrete syntax and in�xes� More examples� Recursive typesand pattern matching� Examples� lists and recursive functions on lists�
iii
iv
�� Proving programs correct The correctness problem� Testing and veri��cation� The limits of veri�cation� Functional programs as mathematicalobjects� Examples of program proofs� exponential� GCD� append and re�verse�
� E�ective ML Using standard combinators� List iteration and other usefulcombinators� examples� Tail recursion and accumulators� why tail recursionis more e�cient� Forcing evaluation� Minimizing consing� More e�cientreversal� Use of �as � Imperative features� exceptions� references� arraysand sequencing� Imperative features and types� the value restriction�
�� ML examples I� symbolic di�erentiation Symbolic computation� Datarepresentation� Operator precedence� Association lists� Prettyprintingexpressions� Installing the printer� Di�erentiation� Simpli�cation� Theproblem of the �right simpli�cation�
��� ML examples II� recursive descent parsing Grammars and the parsingproblem� Fixing ambiguity� Recursive descent� Parsers in ML� Parsercombinators� examples� Lexical analysis using the same techniques� Aparser for terms� Automating precedence parsing� Avoiding backtracking�Comparison with other techniques�
��� ML examples III� exact real arithmetic Real numbers and �nite repre�sentations� Real numbers as programs or functions� Our representation ofreals� Arbitrary precision integers� Injecting integers into the reals� Nega�tion and absolute value� Addition� the importance of rounding division�Multiplication and division by integers� General multiplication� Inverse anddivision� Ordering and equality� Testing� Avoiding reevaluation throughmemo functions�
��� ML examples IV� Prolog and theorem proving Prolog terms� Case�sensitive lexing� Parsing and printing� including list syntax� Uni�cation�Backtracking search� Prolog examples� Prolog�style theorem proving� Ma�nipulating formulas� negation normal form� Basic prover� the use of con�tinuations� Examples� Pelletier problems and whodunit�
Contents
� Introduction �
��� The merits of functional programming � � � � � � � � � � � � � � � ���� Outline � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� Lambda calculus �
��� The bene�ts of lambda notation � � � � � � � � � � � � � � � � � � � ��� Russell s paradox � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Lambda calculus as a formal system � � � � � � � � � � � � � � � � � ��
����� Lambda terms � � � � � � � � � � � � � � � � � � � � � � � � � ������� Free and bound variables � � � � � � � � � � � � � � � � � � � ������� Substitution � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Conversions � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Lambda equality � � � � � � � � � � � � � � � � � � � � � � � ������� Extensionality � � � � � � � � � � � � � � � � � � � � � � � � � ������� Lambda reduction � � � � � � � � � � � � � � � � � � � � � � ����� Reduction strategies � � � � � � � � � � � � � � � � � � � � � ������� The Church�Rosser theorem � � � � � � � � � � � � � � � � � ��
��� Combinators � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��
� Lambda calculus as a programming language ��
��� Representing data in lambda calculus � � � � � � � � � � � � � � � � ������� Truth values and the conditional � � � � � � � � � � � � � � ������� Pairs and tuples � � � � � � � � � � � � � � � � � � � � � � � � ������� The natural numbers � � � � � � � � � � � � � � � � � � � � � ��
��� Recursive functions � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Let expressions � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Steps towards a real programming language � � � � � � � � � � � � ����� Further reading � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��
� Types �
��� Typed lambda calculus � � � � � � � � � � � � � � � � � � � � � � � � ������� The stock of types � � � � � � � � � � � � � � � � � � � � � � ������� Church and Curry typing � � � � � � � � � � � � � � � � � � ��
v
vi CONTENTS
����� Formal typability rules � � � � � � � � � � � � � � � � � � � � ������� Type preservation � � � � � � � � � � � � � � � � � � � � � � � ��
��� Polymorphism � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Let polymorphism � � � � � � � � � � � � � � � � � � � � � � ������� Most general types � � � � � � � � � � � � � � � � � � � � � � ��
��� Strong normalization � � � � � � � � � � � � � � � � � � � � � � � � � ��
A taste of ML �
��� Eager evaluation � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Consequences of eager evaluation � � � � � � � � � � � � � � � � � � ����� The ML family � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Starting up ML � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Interacting with ML � � � � � � � � � � � � � � � � � � � � � � � � � ����� Bindings and declarations � � � � � � � � � � � � � � � � � � � � � � ����� Polymorphic functions � � � � � � � � � � � � � � � � � � � � � � � � ��� Equality of functions � � � � � � � � � � � � � � � � � � � � � � � � � ��
� Further ML ��
��� Basic datatypes and operations � � � � � � � � � � � � � � � � � � � ����� Syntax of ML phrases � � � � � � � � � � � � � � � � � � � � � � � � ����� Further examples � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Type de�nitions � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��
����� Pattern matching � � � � � � � � � � � � � � � � � � � � � � � ������� Recursive types � � � � � � � � � � � � � � � � � � � � � � � � ������� Tree structures � � � � � � � � � � � � � � � � � � � � � � � � ������� The subtlety of recursive types � � � � � � � � � � � � � � � �
� Proving programs correct �
��� Functional programs as mathematical objects � � � � � � � � � � � ���� Exponentiation � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Greatest common divisor � � � � � � � � � � � � � � � � � � � � � � � ���� Appending � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Reversing � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
E�ective ML �
�� Useful combinators � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Writing e�cient code � � � � � � � � � � � � � � � � � � � � � � � � � ��
���� Tail recursion and accumulators � � � � � � � � � � � � � � � ������ Minimizing consing � � � � � � � � � � � � � � � � � � � � � � ����� Forcing evaluation � � � � � � � � � � � � � � � � � � � � � � ���
�� Imperative features � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Exceptions � � � � � � � � � � � � � � � � � � � � � � � � � � � ������� References and arrays � � � � � � � � � � � � � � � � � � � � � ���
CONTENTS vii
���� Sequencing � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Interaction with the type system � � � � � � � � � � � � � � ���
Examples ��
��� Symbolic di�erentiation � � � � � � � � � � � � � � � � � � � � � � � �������� First order terms � � � � � � � � � � � � � � � � � � � � � � � �������� Printing � � � � � � � � � � � � � � � � � � � � � � � � � � � � �������� Derivatives � � � � � � � � � � � � � � � � � � � � � � � � � � �������� Simpli�cation � � � � � � � � � � � � � � � � � � � � � � � � � ���
��� Parsing � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Recursive descent parsing � � � � � � � � � � � � � � � � � � �������� Parser combinators � � � � � � � � � � � � � � � � � � � � � � �������� Lexical analysis � � � � � � � � � � � � � � � � � � � � � � � � �������� Parsing terms � � � � � � � � � � � � � � � � � � � � � � � � � �������� Automatic precedence parsing � � � � � � � � � � � � � � � � �������� Defects of our approach � � � � � � � � � � � � � � � � � � � ���
��� Exact real arithmetic � � � � � � � � � � � � � � � � � � � � � � � � � ������� Representation of real numbers � � � � � � � � � � � � � � � �������� Arbitrary�precision integers � � � � � � � � � � � � � � � � � �������� Basic operations � � � � � � � � � � � � � � � � � � � � � � � �������� General multiplication � � � � � � � � � � � � � � � � � � � � �������� Multiplicative inverse � � � � � � � � � � � � � � � � � � � � � �������� Ordering relations � � � � � � � � � � � � � � � � � � � � � � � ������� Caching � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��
��� Prolog and theorem proving � � � � � � � � � � � � � � � � � � � � � �������� Prolog terms � � � � � � � � � � � � � � � � � � � � � � � � � �������� Lexical analysis � � � � � � � � � � � � � � � � � � � � � � � � �������� Parsing � � � � � � � � � � � � � � � � � � � � � � � � � � � � �������� Uni�cation � � � � � � � � � � � � � � � � � � � � � � � � � � � �������� Backtracking � � � � � � � � � � � � � � � � � � � � � � � � � �������� Examples � � � � � � � � � � � � � � � � � � � � � � � � � � � �������� Theorem proving � � � � � � � � � � � � � � � � � � � � � � � ���
Chapter �
Introduction
Programs in traditional languages� such as FORTRAN� Algol� C and Modula���rely heavily on modifying the values of a collection of variables� called the state� Ifwe neglect the input�output operations and the possibility that a program mightrun continuously e�g� the controller for a manufacturing process�� we can arriveat the following abstraction� Before execution� the state has some initial value�� representing the inputs to the program� and when the program has �nished�the state has a new value �� including the results�� Moreover� during execution�each command changes the state� which has therefore proceeded through some�nite sequence of values�
� � �� � �� � �� � � � � � �n � ��
For example in a sorting program� the state initially includes an array ofvalues� and when the program has �nished� the state has been modi�ed in such away that these values are sorted� while the intermediate states represent progresstowards this goal�
The state is typically modi�ed by assignment commands� often written inthe form v � E or v �� E where v is a variable and E some expression� Thesecommands can be executed in a sequential manner by writing them one after theother in the program� often separated by a semicolon� By using statements likeif and while� one can execute these commands conditionally� and repeatedly�depending on other properties of the current state� The program amounts to a setof instructions on how to perform these state changes� and therefore this style ofprogramming is often called imperative or procedural� Correspondingly� the tra�ditional languages intended to support it are known as imperative or procedurallanguages�
Functional programming represents a radical departure from this model� Es�sentially� a functional program is simply an expression� and execution meansevaluation of the expression�� We can see how this might be possible� in gen�
�Functional programming is often called �applicative programming� since the basic mecha�
�
� CHAPTER �� INTRODUCTION
eral terms� as follows� Assuming that an imperative program as a whole� isdeterministic� i�e� the output is completely determined by the input� we can saythat the �nal state� or whichever fragments of it are of interest� is some func�tion of the initial state� say �� � f���� In functional programming this viewis emphasized� the program is actually an expression that corresponds to themathematical function f � Functional languages support the construction of suchexpressions by allowing rather powerful functional constructs�
Functional programming can be contrasted with imperative programming ei�ther in a negative or a positive sense� Negatively� functional programs do notuse variables � there is no state� Consequently� they cannot use assignments�since there is nothing to assign to� Furthermore the idea of executing commandsin sequence is meaningless� since the �rst command can make no di�erence tothe second� there being no state to mediate between them� Positively however�functional programs can use functions in much more sophisticated ways� Func�tions can be treated in exactly the same way as simpler objects like integers� theycan be passed to other functions as arguments and returned as results� and ingeneral calculated with� Instead of sequencing and looping� functional languagesuse recursive functions� i�e� functions that are de�ned in terms of themselves� Bycontrast� most traditional languages provide poor facilities in these areas� C al�lows some limited manipulation of functions via pointers� but does not allow oneto create new functions dynamically� FORTRAN does not even support recursionat all�
To illustrate the distinction between imperative and functional programming�the factorial function might be coded imperatively in C without using C s un�usual assignment operations� as�
int fact�int n�
� int x �
while �n � ��
� x � x n
n � n �
�
return x
�
whereas it would be expressed in ML� the functional language we discuss later�as a recursive function�
let rec fact n �
if n � � then
else n fact�n � �
nism is the application of functions to arguments��Compare Naur�s remarks �Raphael �� that he can write any program in a single state�
ment Output � Program�Input�
���� THE MERITS OF FUNCTIONAL PROGRAMMING �
In fact� this sort of de�nition can be used in C too� However for more sophis�ticated uses of functions� functional languages stand in a class by themselves�
��� The merits of functional programming
At �rst sight� a language without variables or sequencing might seem completelyimpractical� This impression cannot be dispelled simply by a few words here�But we hope that by studying the material that follows� readers will gain anappreciation of how it is possible to do a lot of interesting programming in thefunctional manner�
There is nothing sacred about the imperative style� familiar though it is�Many features of imperative languages have arisen by a process of abstractionfrom typical computer hardware� from machine code to assemblers� to macro as�semblers� and then to FORTRAN and beyond� There is no reason to supposethat such languages represent the most palatable way for humans to communi�cate programs to a machine� After all� existing hardware designs are not sacredeither� and computers are supposed to do our bidding rather than conversely�Perhaps the right approach is not to start from the hardware and work upwards�but to start with programming languages as an abstract notation for specifyingalgorithms� and then work down to the hardware Dijkstra ������ Actually� thistendency can be detected in traditional languages too� Even FORTRAN allowsarithmetical expressions to be written in the usual way� The programmer is notburdened with the task of linearizing the evaluation of subexpressions and �ndingtemporary storage for intermediate results�
This suggests that the idea of developing programming languages quite dif�ferent from the traditional imperative ones is certainly defensible� However� toemphasize that we are not merely proposing change for change s sake� we shouldsay a few words about why we might prefer functional programs to imperativeones�
Perhaps the main reason is that functional programs correspond more directlyto mathematical objects� and it is therefore easier to reason about them� In orderto get a �rm grip on exactly what programs mean� we might wish to assign anabstract mathematical meaning to a program or command � this is the aimof denotational semantics semantics � meaning�� In imperative languages� thishas to be done in a rather indirect way� because of the implicit dependencyon the value of the state� In simple imperative languages� one can associatea command with a function � � �� where � is the set of possible values forthe state� That is� a command takes some state and produces another state�It may fail to terminate e�g� while true do x �� x�� so this function mayin general be partial� Alternative semantics are sometimes preferred� e�g� interms of predicate transformers Dijkstra ������ But if we add features that canpervert the execution sequence in more complex ways� e�g� goto� or C s break
� CHAPTER �� INTRODUCTION
and continue� even these interpretations no longer work� since one commandcan cause the later commands to be skipped� Instead� one typically uses a morecomplicated semantics based on continuations�
By contrast functional programs� in the words of Henson ����� �wear theirsemantics on their sleeves �� We can illustrate this using ML� The basic datatypeshave a direct interpretation as mathematical objects� Using the standard notationof ��X for �the semantics of X � we can say for example that ��int � Z� Now theML function fact de�ned by�
let rec fact n �
if n � � then
else n fact�n � �
has one argument of type int� and returns a value of type int� so it can simplybe associated with an abstract partial function Z� Z�
��fact n� �
�n! if n � �� otherwise
Here � denotes unde�nedness� since for negative arguments� the programfails to terminate�� This kind of simple interpretation� however� fails in non�functional programs� since so�called �functions might not be functions at all inthe mathematical sense� For example� the standard C library features a functionrand��� which returns di�erent� pseudo�random values on successive calls� Thissort of thing can be implemented by using a local static variable to rememberthe previous result� e�g�
int rand�void�
� static int n � �
return n � �������� n � �������
�
Thus� one can see the abandonment of variables and assignments as the logicalnext step after the abandonment of goto� since each step makes the semanticssimpler� A simpler semantics makes reasoning about programs more straightfor�ward� This opens up more possibilities for correctness proofs� and for provablycorrect transformations into more e�cient programs�
Another potential advantage of functional languages is the following� Since theevaluation of expressions has no side�e�ect on any state� separate subexpressionscan be evaluated in any order without a�ecting each other� This means thatfunctional programs may lend themselves well to parallel implementation� i�e�the computer can automatically farm out di�erent subexpressions to di�erent
�More� denotational semantics can be seen as an attempt to turn imperative languages intofunctional ones by making the state explicit�
���� OUTLINE �
processors� By contrast� imperative programs often impose a fairly rigid order ofexecution� and even the limited interleaving of instructions in modern pipelinedprocessors turns out to be complicated and full of technical problems�
Actually� ML is not a purely functional programming language� it does havevariables and assignments if required� Most of the time� we will work inside thepurely functional subset� But even if we do use assignments� and lose some of thepreceding bene�ts� there are advantages too in the more �exible use of functionsthat languages like ML allow� Programs can often be expressed in a very conciseand elegant style using higher�order functions functions that operate on otherfunctions��� Code can be made more general� since it can be parametrized evenover other functions� For example� a program to add up a list of numbers and aprogram to multiply a list of numbers can be seen as instances of the same pro�gram� parametrized by the pairwise arithmetic operation and the correspondingidentity� In one case it is given " and � and in the other case� � and ��� Finally�functions can also be used to represent in�nite data in a convenient way � forexample we will show later how to use functions to perform exact calculationwith real numbers� as distinct from �oating point approximations�
At the same time� functional programs are not without their problems� Sincethey correspond less directly the the eventual execution in hardware� it can bedi�cult to reason about their exact usage of resources such as time and space�Input�output is also di�cult to incorporate neatly into a functional model� thoughthere are ingenious techniques based on in�nite sequences�
It is up to readers to decide� after reading this book� on the merits of thefunctional style� We do not wish to enforce any ideologies� merely to point outthat there are di�erent ways of looking at programming� and that in the rightsituations� functional programming may have considerable merits� Most of ourexamples are chosen from areas that might loosely be described as �symboliccomputation � for we believe that functional programs work well in such appli�cations� However� as always one should select the most appropriate tool for thejob� It may be that imperative programming� object�oriented programming orlogic programming are more suited to certain tasks� Horses for courses�
��� Outline
For those used to imperative programming� the transition to functional program�ming is inevitably di�cult� whatever approach is taken� While some will be im�
�Elegance is subjective and conciseness is not an end in itself� Functional languages andother languages like APL often create a temptation to produce very short tricky code whichis elegant to cognoscenti but obscure to outsiders�
�This parallels the notion of abstraction in pure mathematics e�g� that the additive andmultiplicative structures over numbers are instances of the abstract notion of a monoid� Thissimilarly avoids duplication and increases elegance�
� CHAPTER �� INTRODUCTION
patient to get quickly to real programming� we have chosen to start with lambdacalculus� and show how it can be seen as the theoretical underpinning for func�tional languages� This has the merit of corresponding quite well to the actualhistorical line of development�
So �rst we introduce lambda calculus� and show how what was originallyintended as a formal logical system for mathematics turned out to be a completelygeneral programming language� We then discuss why we might want to add typesto lambda calculus� and show how it can be done� This leads us into ML� which isessentially an extended and optimized implementation of typed lambda calculuswith a certain evaluation strategy� We cover the practicalities of basic functionalprogramming in ML� and discuss polymorphism and most general types� Wethen move on to more advanced topics including exceptions and ML s imperativefeatures� We conclude with some substantial examples� which we hope provideevidence for the power of ML�
Further reading
Numerous textbooks on �functional programming include a general introductionto the �eld and a contrast with imperative programming � browse through afew and �nd one that you like� For example� Henson ���� contains a goodintroductory discussion� and features a similar mixture of theory and practiceto this text� A detailed and polemical advocacy of the functional style is givenby Backus ����� the main inventor of FORTRAN� Gordon ����� discusses theproblems of incorporating input�output into functional languages� and some solu�tions� Readers interested in denotational semantics� for imperative and functionallanguages� may look at Winskel ������
Chapter �
Lambda calculus
Lambda calculus is based on the so�called �lambda notation for denoting func�tions� In informal mathematics� when one wants to refer to a function� one usually�rst gives the function an arbitrary name� and thereafter uses that name� e�g�
Suppose f � R � R is de�ned by�
fx� �
�� if x � �x�sin��x�� if x �� �
Then f �x� is not Lebesgue integrable over the unit interval ��� � �
Most programming languages� C for example� are similar in this respect� wecan de�ne functions only by giving them names� For example� in order to usethe successor function which adds � to its argument� in nontrivial ways e�g�consider a pointer to it�� then even though it is very simple� we need to name itvia some function de�nition such as�
int suc�int n�
� return n �
�
In either mathematics or programming� this seems quite natural� and gener�ally works well enough� However it can get clumsy when higher order functionsfunctions that manipulate other functions� are involved� In any case� if we wantto treat functions on a par with other mathematical objects� the insistence onnaming is rather inconsistent� When discussing an arithmetical expression builtup from simpler ones� we just write the subexpressions down� without needing togive them names� Imagine if we always had to deal with arithmetic expressionsin this way�
De�ne x and y by x � � and y � � respectively� Then xx � y�
�
CHAPTER �� LAMBDA CALCULUS
Lambda notation allows one to denote functions in much the same way as anyother sort of mathematical object� There is a mainstream notation sometimesused in mathematics for this purpose� though it s normally still used as part ofthe de�nition of a temporary name� We can write
x �� t�x
to denote the function mapping any argument x to some arbitrary expression t�x �which usually� but not necessarily� contains x it is occasionally useful to #throwaway$ an argument�� However� we shall use a di�erent notation developed byChurch ������
�x� t�x
which should be read in the same way� For example� �x�x is the identity functionwhich simply returns its argument� while �x� x� is the squaring function�
The symbol � is completely arbitrary� and no signi�cance should be read intoit� Indeed one often sees� particularly in French texts� the alternative notation�x t�x �� Apparently it arose by a complicated process of evolution� Originally�the famous Principia Mathematica Whitehead and Russell ����� used the �hat notation t�%x for the function of x yielding t�x � Church modi�ed this to %x� t�x �but since the typesetter could not place the hat on top of the x� this appeared as�x� t�x � which then mutated into �x� t�x in the hands of another typesetter�
��� The bene�ts of lambda notation
Using lambda notation we can clear up some of the confusion engendered byinformal mathematical notation� For example� it s common to talk sloppily about�fx� � leaving context to determine whether we mean f itself� or the result ofapplying it to particular x� A further bene�t is that lambda notation givesan attractive analysis of practically the whole of mathematical notation� If westart with variables and constants� and build up expressions using just lambda�abstraction and application of functions to arguments� we can represent verycomplicated mathematical expressions�
We will use the conventional notation fx� for the application of a function fto an argument x� except that� as is traditional in lambda notation� the bracketsmay be omitted� allowing us to write just f x� For reasons that will becomeclear in the next paragraph� we assume that function application associates tothe left� i�e� f x y means fx��y�� As a shorthand for �x� �y� t�x� y we will use�x y� t�x� y � and so on� We also assume that the scope of a lambda abstractionextends as far to the right as possible� For example �x�x y means �x�x y� ratherthan �x� x� y�
���� THE BENEFITS OF LAMBDA NOTATION �
At �rst sight� we need some special notation for functions of several arguments�However there is a way of breaking down such applications into ordinary lambdanotation� called currying� after the logician Curry ������ Actually the devicehad previously been used by both Frege ���� and Sch&on�nkel ������ but it seasy to understand why the corresponding appellations haven t caught the publicimagination�� The idea is to use expressions like �x y� x " y� This may beregarded as a function R � R � R�� so it is said to be a �higher order function or �functional since when applied to one argument� it yields another function�which then accepts the second argument� In a sense� it takes its arguments oneat a time rather than both together� So we have for example�
�x y� x " y� � � � �y� � " y� � � � " �
Observe that function application is assumed to associate to the left in lambdanotation precisely because currying is used so much�
Lambda notation is particularly helpful in providing a uni�ed treatment ofbound variables� Variables in mathematics normally express the dependency ofsome expression on the value of that variable� for example� the value of x� " �depends on the value of x� In such contexts� we will say that a variable is free�However there are other situations where a variable is merely used as a place�marker� and does not indicate such a dependency� Two common examples arethe variable m in
nXm��
m �nn " ��
�
and the variable y in
Z x
��y " a dy � x� " ax
In logic� the quanti�ers x� P �x �for all x� P �x � and x� P �x �there existsan x such that P �x � provide further examples� and in set theory we have setabstractions like fx j P �x g as well as indexed unions and intersections� In suchcases� a variable is said to be bound� In a certain subexpression it is free� but inthe whole expression� it is bound by a variable�binding operation like summation�The part �inside this variable�binding operation is called the scope of the boundvariable�
A similar situation occurs in most programming languages� at least from Algol�� onwards� Variables have a de�nite scope� and the formal arguments of proce�dures and functions are e�ectively bound variables� e�g� n in the C de�nition ofthe successor function given above� One can actually regard variable declarationsas binding operations for the enclosed instances of the corresponding variables��Note� by the way� that the scope of a variable should be distinguished sharplyfrom its lifetime� In the C function rand that we gave in the introduction� n had
�� CHAPTER �� LAMBDA CALCULUS
a textually limited scope but it retained its value even outside the execution ofthat part of the code�
We can freely change the name of a bound variable without changing themeaning of the expression� e�g�Z x
��z " a dz � x� " ax
Similarly� in lambda notation� �x� E�x and �y� E�y are equivalent� this iscalled alpha�equivalence and the process of transforming between such pairs iscalled alpha�conversion� We should add the proviso that y is not a free variablein E�x � or the meaning clearly may change� just asZ x
��a " a da �� x� " ax
It is possible to have identically�named free and bound variables in the sameexpression� though this can be confusing� it is technically unambiguous� e�g�Z x
��x " a dx � x� " ax
In fact the usual Leibniz notation for derivatives has just this property� e�g� in�
d
dxx� � �x
x is used both as a bound variable to indicate that di�erentiation is to takeplace with respect to x� and as a free variable to show where to evaluate theresulting derivative� This can be confusing� e�g� f �gx�� is usually taken to meansomething di�erent from d
dxfgx��� Careful writers� especially in multivariate
work� often make the separation explicit by writing�
jd
dxx�jx � �x
or
jd
dzz�jx � �x
Part of the appeal of lambda notation is that all variable�binding opera�tions like summation� di�erentiation and integration can be regarded as func�tions applied to lambda�expressions� Subsuming all variable�binding operationsby lambda abstraction allows us to concentrate on the technical problems ofbound variables in one particular situation� For example� we can view d
dxx� as
a syntactic sugaring of D �x� x�� x where D � R � R� � R � R is a dif�ferentiation operator� yielding the derivative of its �rst function� argument atthe point indicated by its second argument� Breaking down the everyday syntaxcompletely into lambda notation� we have D �x� EXP x �� x for some constantEXP representing the exponential function�
���� RUSSELL�S PARADOX ��
In this way� lambda notation is an attractively general �abstract syntax formathematics� all we need is the appropriate stock of constants to start with�Lambda abstraction seems� in retrospect� to be the appropriate primitive in termsof which to analyze variable binding� This idea goes back to Church s encodingof higher order logic in lambda notation� and as we shall see in the next chapter�Landin has pointed out how many constructs from programming languages havea similar interpretation� In recent times� the idea of using lambda notation asa universal abstract syntax has been put especially clearly by Martin�L&of� andis often referred to in some circles as �Martin�L&of s theory of expressions andarities ��
��� Russell�s paradox
As we have said� one of the appeals of lambda notation is that it permits ananalysis of more or less all of mathematical syntax� Originally� Church hoped togo further and include set theory� which� as is well known� is powerful enough toform a foundation for much of modern mathematics� Given any set S� we canform its so�called characteristic predicate �S� such that�
�Sx� �
�true if x � Sfalse if x �� S
Conversely� given any unary predicate i�e� function of one argument� P �we can consider the set of all x satisfying P x� � we will just write P x� forP x� � true� Thus� we see that sets and predicates are just di�erent ways oftalking about the same thing� Instead of regarding S as a set� and writing x � S�we can regard it as a predicate and write Sx��
This permits a natural analysis into lambda notation� we can allow arbitrarylambda expressions as functions� and hence indirectly as sets� Unfortunately�this turns out to be inconsistent� The simplest way to see this is to consider theRussell paradox of the set of all sets that do not contain themselves�
R � fx j x �� xg
We have R � R� R �� R� a stark contradiction� In terms of lambda de�nedfunctions� we set R � �x� x x�� and �nd that R R � R R�� obviously counterto the intuitive meaning of the negation operator �
To avoid such paradoxes� Church ����� followed Russell in augmenting lambdanotation with a notion of type� we shall consider this in a later chapter� Howeverthe paradox itself is suggestive of some interesting possibilities in the standard�untyped� system� as we shall see later�
�This was presented at the Brouwer Symposium in ���� but was not described in the printedproceedings�
�� CHAPTER �� LAMBDA CALCULUS
��� Lambda calculus as a formal system
We have taken for granted certain obvious facts� e�g� that �y� � " y� � � � " ��since these re�ect the intended meaning of abstraction and application� which arein a sense converse operations� Lambda calculus arises if we enshrine certain suchprinciples� and only those� as a set of formal rules� The appeal of this is that therules can then be used mechanically� just as one might transform x � � � �� xinto �x � � " � without pausing each time to think about why these rules aboutmoving things from one side of the equation to the other are valid� As Whitehead����� says� symbolism and formal rules of manipulation�
�� � � have invariably been introduced to make things easy� �� � � bythe aid of symbolism� we can make transitions in reasoning almostmechanically by the eye� which otherwise would call into play thehigher faculties of the brain� �� � � Civilisation advances by extendingthe number of important operations which can be performed withoutthinking about them�
����� Lambda terms
Lambda calculus is based on a formal notion of lambda term� and these termsare built up from variables and some �xed set of constants using the operationsof function application and lambda abstraction� This means that every lambdaterm falls into one of the following four categories�
�� Variables� these are indexed by arbitrary alphanumeric strings� typicallywe will use single letters from towards the end of the alphabet� e�g� x� yand z�
�� Constants� how many constants there are in a given syntax of lambdaterms depends on context� Sometimes there are none at all� We will alsodenote them by alphanumeric strings� leaving context to determine whenthey are meant to be constants�
�� Combinations� i�e� the application of a function s to an argument t� boththese components s and t may themselves be arbitrary ��terms� We willwrite combinations simply as s t� We often refer to s as the �rator and tas the �rand short for �operator and �operand respectively��
�� Abstractions of an arbitrary lambda�term s over a variable x which mayor may not occur free in s�� denoted by �x� s�
Formally� this de�nes the set of lambda terms inductively� i�e� lambda termsarise only in these four ways� This justi�es our�
���� LAMBDA CALCULUS AS A FORMAL SYSTEM ��
� De�ning functions over lambda terms by primitive recursion�
� Proving properties of lambda terms by structural induction�
A formal discussion of inductive generation� and the notions of primitive re�cursion and structural induction� may be found elsewhere� We hope most readerswho are unfamiliar with these terms will �nd that the examples below give thema su�cient grasp of the basic ideas�
We can describe the syntax of lambda terms by a BNF Backus�Naur form�grammar� just as we do for programming languages�
Exp � V ar j Const j Exp Exp j � V ar� Exp
and� following the usual computer science view� we will identify lambda termswith abstract syntax trees� rather than with sequences of characters� This meansthat conventions such as the left�association of function application� the readingof �x y� s as �x� �y� s� and the ambiguity over constant and variable names arepurely a matter of parsing and printing for human convenience� and not part ofthe formal system�
One feature worth mentioning is that we use single characters to stand forvariables and constants in the formal system of lambda terms and as so�called�metavariables standing for arbitrary terms� For example� �x� s might representthe constant function with value s� or an arbitrary lambda abstraction using thevariable x� To make this less confusing� we will normally use letters such as s� tand u for metavariables over terms� It would be more precise if we denoted thevariable x by Vx the x th variable� and likewise the constant k by Ck � thenall the variables in terms would have the same status� However this makes theresulting terms a bit cluttered�
����� Free and bound variables
We now formalize the intuitive idea of free and bound variables in a term� which�incidentally� gives a good illustration of de�ning a function by primitive recursion�Intuitively� a variable in a term is free if it does not occur inside the scope of acorresponding abstraction� We will denote the set of free variables in a term sby FV s�� and de�ne it by recursion as follows�
FV x� � fxg
FV c� � �
FV s t� � FV s� � FV t�
FV �x� s� � FV s�� fxg
Similarly we can de�ne the set of bound variables in a term BV s��
�� CHAPTER �� LAMBDA CALCULUS
BV x� � �
BV c� � �
BV s t� � BV s� �BV t�
BV �x� s� � BV s� � fxg
For example� if s � �x y� x� �x� z x� we have FV s� � fzg and BV s� �fx� yg� Note that in general a variable can be both free and bound in the sameterm� as illustrated by some of the mathematical examples earlier� As an exampleof using structural induction to establish properties of lambda terms� we will provethe following theorem a similar proof works for BV too��
Theorem ��� For any lambda term s� the set FV s� is �nite�Proof� By structural induction� Certainly if s is a variable or a constant� thenby de�nition FV s� is �nite� since it is either a singleton or empty� If s isa combination t u then by the inductive hypothesis� FV t� and FV u� are both�nite� and then FV s� � FV t��FV u�� which is therefore also �nite �the unionof two �nite sets is �nite�� Finally� if s is of the form �x� t then FV t� is �nite�by the inductive hypothesis� and by de�nition FV s� � FV t�� fxg which mustbe �nite too� since it is no larger� Q�E�D�
����� Substitution
The rules we want to formalize include the stipulation that lambda abstractionand function application are inverse operations� That is� if we take a term �x� sand apply it as a function to an argument term t� the answer is the term s withall free instances of x replaced by t� We often make this more transparent indiscussions by using the notation �x� s�x and s�t for the respective terms�
However this simple�looking notion of substituting one term for a variable inanother term is surprisingly di�cult� Some notable logicians have made faultystatements regarding substitution� In fact� this di�culty is rather unfortunatesince as we have said� the appeal of formal rules is that they can be appliedmechanically�
We will denote the operation of substituting a term s for a variable x in an�other term t by t�s�x � One sometimes sees various other notations� e�g� t�x��s ��s�x t� or even t�x�s � The notation we use is perhaps most easily rememberedby noting the vague analogy with multiplication of fractions� x�t�x � t� At �rstsight� one can de�ne substitution formally by recursion as follows�
x�t�x � t
y�t�x � y if x �� y
���� LAMBDA CALCULUS AS A FORMAL SYSTEM ��
c�t�x � c
s� s���t�x � s��t�x s��t�x
�x� s��t�x � �x� s
�y� s��t�x � �y� s�t�x � if x �� y
However this isn t quite right� For example �y�x"y��y�x � �y�y"y� whichdoesn t correspond to the intuitive answer�� The original lambda term was �thefunction that adds x to its argument � so after substitution� we might expectto get �the function that adds y to its argument � What we actually get is �thefunction that doubles its argument � The problem is that the variable y that wehave substituted is captured by the variable�binding operation �y� � � �� We should�rst rename the bound variable�
�y� x " y� � �w� x " w�
and only now perform a naive substitution operation�
�w� x " w��y�x � �w� y " w
We can take two approaches to this problem� Either we can add a conditionon all instances of substitution to disallow it wherever variable capture wouldoccur� or we can modify the formal de�nition of substitution so that it performsthe appropriate renamings automatically� We will opt for this latter approach�Here is the de�nition of substitution that we use�
x�t�x � t
y�t�x � y if x �� y
c�t�x � c
s� s���t�x � s��t�x s��t�x
�x� s��t�x � �x� s
�y� s��t�x � �y� s�t�x � if x �� y and either x �� FV s� or y �� FV t�
�y� s��t�x � �z� s�z�y �t�x � otherwise� where z �� FV s� � FV t�
The only di�erence is in the last two lines� We substitute as before in thetwo safe situations where either x isn t free in s� so the substitution is trivial� orwhere y isn t free in t� so variable capture won t occur at this level�� Howeverwhere these conditions fail� we �rst rename y to a new variable z� chosen not tobe free in either s or t� then proceed as before� For de�niteness� the variable zcan be chosen in some canonical way� e�g� the lexicographically �rst name notoccurring as a free variable in either s or t��
�We will continue to use in�x syntax for standard operators� strictly we should write � x yrather than x� y�
�Cognoscenti may also be worried that this de�nition is not in fact primitive recursive
�� CHAPTER �� LAMBDA CALCULUS
����� Conversions
Lambda calculus is based on three �conversions � which transform one term intoanother one intuitively equivalent to it� These are traditionally denoted by theGreek letters � alpha�� � beta� and eta��� Here are the formal de�nitions ofthe operations� using annotated arrows for the conversion relations�
� Alpha conversion� �x� s ���
�y� s�y�x provided y �� FV s�� For example�
�u�u v ���
�w�w v� but �u�u v ����
�v� v v� The restriction avoids another
instance of variable capture�
� Beta conversion� �x� s� t ���
s�t�x �
� Eta conversion� �x�t x ���
t� provided x �� FV t�� For example �u�v u ���
v but �u� u u ����
u�
Of the three� ��conversion is the most important one to us� since it representsthe evaluation of a function on an argument� ��conversion is a technical device tochange the names of bound variables� while �conversion is a form of extensionalityand is therefore mainly of interest to those taking a logical� not a programming�view of lambda calculus�
����� Lambda equality
Using these conversion rules� we can de�ne formally when two lambda terms areto be considered equal� Roughly� two terms are equal if it is possible to get fromone to the other by a �nite sequence of conversions �� � or �� either forward orbackward� at any depth inside the term� We can say that lambda equality is thecongruence closure of the three reduction operations together� i�e� the smallestrelation containing the three conversion operations and closed under re�exivity�symmetry� transitivity and substitutivity� Formally we can de�ne it inductivelyas follows� where the horizontal lines should be read as �if what is above the lineholds� then so does what is below �
s ���
t or s ���
t or s ���
t
s � t
t � t
because of the last clause� However it can easily be modi�ed into a primitive recursive de�nitionof multiple parallel substitution� This procedure is analogous to strengthening an inductionhypothesis during a proof by induction� Note that by construction the pair of substitutions inthe last line can be done in parallel rather than sequentially without a�ecting the result�
�These names are due to Curry� Church originally referred to ��conversion and ��conversionas �rule of procedure I� and �rule of procedure II� respectively�
���� LAMBDA CALCULUS AS A FORMAL SYSTEM ��
s � t
t � s
s � t and t � u
s � u
s � t
s u � t u
s � t
u s � u t
s � t
�x� s � �x� t
Note that the use of the ordinary equality symbol �� here is misleading� Weare actually de�ning the relation of lambda equality� and it isn t clear that itcorresponds to equality of the corresponding mathematical objects in the usualsense�� Certainly it must be distinguished sharply from equality at the syntacticlevel� We will refer to this latter kind of equality as �identity and use the specialsymbol �� For example �x� x �� �y� y but �x� x � �y� y�
For many purposes� ��conversions are immaterial� and often �� is used in�stead of strict identity� This is de�ned like lambda equality� except that only��conversions are allowed� For example� �x� x�y �� �y� y�y� Many writers usethis as identity on lambda terms� i�e� consider equivalence classes of terms under��� There are alternative formalizations of syntax where bound variables areunnamed de Bruijn ������ and here syntactic identity corresponds to our ���
����� Extensionality
We have said that �conversion embodies a principle of extensionality� In generalphilosophical terms� two properties are said to be extensionally equivalent orcoextensive� when they are satis�ed by exactly the same objects� In mathematics�we usually take an extensional view of sets� i�e� say that two sets are equalprecisely if they have the same elements� Similarly� we normally say that twofunctions are equal precisely if they have the same domain and give the sameresult on all arguments in that domain�
As a consequence of �conversion� our notion of lambda equality is extensional�Indeed� if f x and g x are equal for any x� then in particular f y � g y wherey is chosen not to be free in either f or g� Therefore by the last rule above��y� f y � �y� g y� Now by �converting a couple of times at depth� we see that
�Indeed we haven�t been very precise about what the corresponding mathematical objectsare� But there are models of the lambda calculus where our lambda equality is interpreted asactual equality�
� CHAPTER �� LAMBDA CALCULUS
f � g� Conversely� extensionality implies that all instances of �conversion doindeed give a valid equation� since by ��reduction� �x� t x� y � t y for any ywhen x is not free in t� This is the import of �conversion� and having discussedthat� we will largely ignore it in favour of the more computationally signi�cant��conversion�
����� Lambda reduction
Lambda equality� unsurprisingly� is a symmetric relation� Though it capturesthe notion of equivalence of lambda terms well� it is more interesting from acomputational point of view to consider an asymmetric version� We will de�ne a�reduction relation �� as follows�
s ���
t or s ���
t or s ���
t
s �� t
t �� t
s �� t and t �� u
s �� u
s �� t
s u �� t u
s �� t
u s �� u t
s �� t
�x� s �� �x� t
Actually the name �reduction and one also hears ��conversion called ��reduction� is a slight misnomer� since it can cause the size of a term to grow�e�g�
�x� x x x� �x� x x x� �� �x� x x x� �x� x x x� �x� x x x�
�� �x� x x x� �x� x x x� �x� x x x� �x� x x x�
�� � � �
However reduction does correspond to a systematic attempt to evaluate a termby repeatedly evaluating combinations fx� where f is a lambda abstraction�When no more reductions except for � conversions are possible we say that theterm is in normal form�
���� LAMBDA CALCULUS AS A FORMAL SYSTEM ��
���� Reduction strategies
Let us recall� in the middle of these theoretical considerations� the relevance ofall this to functional programming� A functional program is an expression andexecuting it means evaluating the expression� In terms of the concepts discussedhere� we are proposing to start with the relevant term and keep on applying re�ductions until there is nothing more to be evaluated� But how are we to choosewhich reduction to apply at each stage� The reduction relation is not determin�istic� i�e� for some terms t there are several ti such that t �� ti� Sometimes thiscan make the di�erence between a �nite and in�nite reduction sequence� i�e� be�tween a program terminating and failing to terminate� For example� by reducingthe innermost redex reducible expression� in the following� we have an in�nitereduction sequence�
�x� y� �x� x x x� �x� x x x��
�� �x� y� �x� x x x� �x� x x x� �x� x x x��
�� �x� y� �x� x x x� �x� x x x� �x� x x x� �x� x x x��
�� � � �
and so ad in�nitum� However the alternative of reducing the outermost redex�rst gives�
�x� y� �x� x x x� �x� x x x�� �� y
immediately� and there are no more reductions to apply�The situation is clari�ed by the following theorems� whose proofs are too long
to be given here� The �rst one says that the situation we have noted above istrue in a more general sense� i�e� that reducing the leftmost outermost redex isthe best strategy for ensuring termination�
Theorem ��� If s �� t with t in normal form� then the reduction sequence thatarises from s by always reducing the leftmost outermost redex is guaranteed toterminate in normal form�
Formally� we de�ne the �leftmost outermost redex recursively� for a term�x� s� t it is the term itself� for any other term s t it is the leftmost outermostredex of s� and for an abstraction �x� s it is the leftmost outermost redex of s�In terms of concrete syntax� we always reduce the redex whose � is the furthestto the left�
���� The Church�Rosser theorem
The next assertion� the famous Church�Rosser theorem� states that if we startfrom a term t and perform any two �nite reduction sequences� there are always
�� CHAPTER �� LAMBDA CALCULUS
two more reduction sequences that bring the two back to the same term thoughof course this might not be in normal form��
Theorem ��� If t �� s� and t �� s�� then there is a term u such that s� �� uand s� �� u�
This has at least the following important consequences�
Corollary ��� If t� � t� then there is a term u with t� �� u and t� �� u�Proof� It is easy to see �by structural induction� that the equality relation � isin fact the symmetric transitive closure of the reduction relation� Now we canproceed by induction over the construction of the symmetric transitive closure�However less formally minded readers will probably �nd the following diagrammore convincing�
t� t���R
���
��R
���
��R
�����R
���
��R�����R�����R���
��R
���
��R
���
��R���u
We assume t� � t�� so there is some sequence of reductions in both directions�i�e� the zigzag at the top� that connects them� Now the Church�Rosser theoremallows us to �ll in the remainder of the sides in the above diagram� and hencereach the result by composing these reductions� Q�E�D�
Corollary �� If t � t� and t � t� with t� and t� in normal form� then t� �� t��i�e� t� and t� are equal apart from � conversions�Proof� By the �rst corollary� we have some u with t� �� u and t� �� u� Butsince t� and t� are already in normal form� these reduction sequences to u canonly consist of alpha conversions� Q�E�D�
Hence normal forms� when they do exist� are unique up to alpha conversion�This gives us the �rst proof we have available that the relation of lambda equalityisn t completely trivial� i�e� that there are any unequal terms� For example� since�x y� x and �x y� y are not interconvertible by alpha conversions alone� theycannot be equal�
Let us sum up the computational signi�cance of all these assertions� In somesense� reducing the leftmost outermost redex is the best strategy� since it willwork if any strategy will� This is known as normal order reduction� On the otherhand any terminating reduction sequence will always give the same result� andmoreover it is never too late to abandon a given strategy and start using normalorder reduction� We will see later how this translates into practical terms�
���� COMBINATORS ��
��� Combinators
Combinators were actually developed as an independent theory by Sch&on�nkel����� before lambda notation came along� Moreover Curry rediscovered thetheory soon afterwards� independently of Sch&on�nkel and of Church� When hefound out about Sch&on�nkel s work� Curry attempted to contact him� but by thattime� Sch&on�nkel had been committed to a lunatic asylum�� We will distort thehistorical development by presenting the theory of combinators as an aspect oflambda calculus�
We will de�ne a combinator simply to be a lambda term with no free vari�ables� Such a term is also said to be closed� it has a �xed meaning independentof the values of any variables� Now we will later� in the course of functionalprogramming� come across many useful combinators� But the cornerstone of thetheory of combinators is that one can get away with just a few combinators� andexpress in terms of those and variables any term at all� the operation of lambdaabstraction is unnecessary� In particular� a closed term can be expressed purelyin terms of these few combinators� We start by de�ning�
I � �x� x
K � �x y� x
S � �f g x� f x�g x�
We can motivate the names as follows�� I is the identity function� K producesconstant functions� when applied to an argument a it gives the function �y� a�Finally S is a �sharing combinator� which takes two functions and an argumentand shares out the argument among the functions� Now we prove the following�
Lemma ��� For any lambda term t not involving lambda abstraction� there isa term u also not containing lambda abstractions� built up from S� K� I andvariables� with FV u� � FV t� � fxg and u � �x� t� i�e� u is lambda�equal to�x� t�Proof� By structural induction on the term t� By hypothesis� it cannot be anabstraction� so there are just three cases to consider�
� If t is a variable� then there are two possibilities� If it is equal to x� then�x� x � I� so we are �nished� If not� say t � y� then �x� y � K y�
� If t is a constant c� then �x� c � K c�
� If t is a combination� say s u� then by the inductive hypothesis� there arelambda�free terms s� and u� with s� � �x� s and u� � �x� u� Now we claimS s� u� suces� Indeed�
�We are not claiming these are the historical reasons for them��Konstant � Sch�on�nkel was German though in fact he originally used C�
�� CHAPTER �� LAMBDA CALCULUS
S s� u� x � S �x� s� �x� u� x
� �x� s� x��x� u� x�
� s u
� t
Therefore� by �conversion� we have S s� u� � �x� S s� u� x � �x� t� sinceby the inductive hypothesis x is not free in s� or u��
Q�E�D�
Theorem ��� For any lambda term t� there is a lambda�free term t� built upfrom S� K� I and variables� with FV t�� � FV t� and t� � t�Proof� By structural induction on t� using the lemma� For example� if t is �x� s�we �rst �nd� by the inductive hypothesis� a lambda�free equivalent s� of s� Nowthe lemma can be applied to �x� s�� The other cases are straightforward� Q�E�D�
This remarkable fact can be strengthened� since I is de�nable in terms of Sand K� Note that for any A�
S K A x � K x�A x�
� �y� x�A x�
� x
So again by �converting� we see that I � S K A for any A� It is customary�for reasons that will become clear when we look at types� to use A � K� SoI � S K K� and we can avoid the use of I in our combinatory expression�
Note that the proofs above are constructive� in the sense that they guideone in a de�nite procedure that� given a lambda term� produces a combinatorequivalent� One proceeds bottom�up� and at each lambda abstraction� which byconstruction has a lambda�free body� applies the top�down transformations givenin the lemma�
Although we have presented combinators as certain lambda terms� they canalso be developed as a theory in their own right� That is� one starts with aformal syntax excluding lambda abstractions but including combinators� Insteadof �� � and conversions� one posits conversion rules for expressions involvingcombinators� e�g� K x y �� x� As an independent theory� this has manyanalogies with lambda calculus� e�g� the Church�Rosser theorem holds for thisnotion of reduction too� Moreover� the ugly di�culties connected with boundvariables are avoided completely� However the resulting system is� we feel� lessintuitive� since combinatory expressions can get rather obscure�
���� COMBINATORS ��
Apart from their purely logical interest� combinators have a certain practicalpotential� As we have already hinted� and as will become much clearer in the laterchapters� lambda calculus can be seen as a simple functional language� formingthe core of real practical languages like ML� We might say that the theorem ofcombinatory completeness shows that lambda calculus can be �compiled down toa �machine code of combinators� This computing terminology is not as fancifulas it appears� Combinators have been used as an implementation technique forfunctional languages� and real hardware has been built to evaluate combinatoryexpressions�
Further reading
An encyclopedic but clear book on lambda calculus is Barendregt ����� Anotherpopular textbook is Hindley and Seldin ����� Both these contain proofs of theresults that we have merely asserted� A more elementary treatment focusedtowards computer science is Part II of Gordon ���� Much of our presentationhere and later is based on this last book�
Exercises
�� Find a normal form for �x x x� x� a b c�
�� De�ne twice � �f x� ffx�� What is the intuitive meaning of twice�Find a normal form for twice twice twice f x� Remember that functionapplication associates to the left��
�� Find a term t such that t ���
t� Is it true to say that a term is in normal
form if and only if whenever t �� t� then t �� t��
�� What are the circumstances under which s�t�x �u�y �� s�u�y �t�x �
�� Find an equivalent in terms of the S� K and I combinators alone for�f x� fx x��
�� Find a single combinator X such that all ��terms are equal to a term builtfrom X and variables� You may �nd it helpful to consider A � �p�p K S Kand then think about A A A and A A A��
�� Prove that any X is a �xed point combinator if and only if it is itself a �xedpoint of G� where G � �y m� my m��
Chapter �
Lambda calculus as a
programming language
One of the central questions studied by logicians in the ����s was the Entschei�dungsproblem or �decision problem � This asked whether there was some system�atic� mechanical procedure for deciding validity in �rst order logic� If it turnedout that there was� it would have fundamental philosophical� and perhaps practi�cal� signi�cance� It would mean that a huge variety of complicated mathematicalproblems could in principle be solved purely by following some single �xed method� we would now say algorithm � no additional creativity would be required�
As it stands the question is hard to answer� because we need to de�ne in math�ematical terms what it means for there to be a �systematic� mechanical procedurefor something� Perhaps Turing ����� gave the best analysis� by arguing that themechanical procedures are those that could in principle be performed by a rea�sonably intelligent clerk with no knowledge of the underlying subject matter� Heabstracted from the behaviour of such a clerk and arrived at the famous notionof a �Turing machine � Despite the fact that it arose by considering a human�computer � and was purely a mathematical abstraction� we now see a Turingmachine as a very simple computer� Despite being simple� a Turing machine iscapable of performing any calculation that can be done by a physical computer��
A model of computation of equal power to a Turing machine is said to be Turingcomplete�
At about the same time� there were several other proposed de�nitions of�mechanical procedure � Most of these turned out to be equivalent in power toTuring machines� In particular� although it originally arose as a foundation formathematics� lambda calculus can also be seen as a programming language� to beexecuted by performing ��conversions� In fact� Church proposed before Turingthat the set of operations that can be expressed in lambda calculus formalizes the
�Actually more since it neglects the necessary �nite limit to the computer�s store� Arguablyany existing computer is actually a �nite state machine but the assumption of unboundedmemory seems a more fruitful abstraction�
��
��
intuitive idea of a mechanical procedure� a postulate known as Churchs thesis�Church ����� showed that if this was accepted� the Entscheidungsproblem isunsolvable� Turing later showed that the functions de�nable in lambda calculusare precisely those computable by a Turing machine� which lent more credibilityto Church s claims�
To a modern programmer� Turing machine programs are just about recog�nizable as a very primitive kind of machine code� Indeed� it seems that Turingmachines� and in particular Turing s construction of a �universal machine �� werea key in�uence in the development of modern stored program computers� thoughthe extent and nature of this in�uence is still a matter for controversy Robinson������ It is remarkable that several of the other proposals for a de�nition of�mechanical procedure � often made before the advent of electronic computers�correspond closely to real programming methods� For example� Markov algo�rithms� the standard approach to computability in the former� Soviet Union�can be seen as underlying the programming language SNOBOL� In what follows�we are concerned with the in�uence of lambda calculus on functional program�ming languages�
LISP� the second oldest high level language FORTRAN being the oldest�took a few ideas from lambda calculus� in particular the notation �LAMBDA � � �� for anonymous functions� However it was not really based on lambda calculusat all� In fact the early versions� and some current dialects� use a system ofdynamic binding that does not correspond to lambda calculus� We will discussthis later�� Moreover there was no real support for higher order functions� whilethere was considerable use of imperative features� Nevertheless LISP deserves tobe considered as the �rst functional programming language� and pioneered manyof the associated techniques such as automatic storage allocation and garbagecollection�
The in�uence of lambda calculus really took o� with the work of Landin andStrachey in the ��s� In particular� Landin showed how many features of currentimperative� languages could usefully be analyzed in terms of lambda calculus�e�g� the notion of variable scoping in Algol ��� Landin ����� went on to proposethe use of lambda calculus as a core for programming languages� and inventeda functional language ISWIM �If you See What I Mean �� This was widelyin�uential and spawned many real languages�
ML started life as the metalanguage hence the name ML� of a theorem prov�ing system called Edinburgh LCF Gordon� Milner� and Wadsworth ������ Thatis� it was intended as a language in which to write algorithms for proving theo�rems in a formal deductive calculus� It was strongly in�uenced by ISWIM butadded an innovative polymorphic type system including abstract types� as well asa system of exceptions to deal with error conditions� These facilities were dictated
�A single Turing machine program capable of simulating any other � we would now regardit as an interpreter� You will learn about this in the course on Computation Theory�
��CHAPTER �� LAMBDA CALCULUS AS A PROGRAMMING LANGUAGE
by the application at hand� and this policy resulted in a coherent and focuseddesign� This narrowness of focus is typical of successful languages C is anothergood example� and contrasts sharply with the failure of committee�designed lan�guages like Algol �� which have served more as a source of important ideas thanas popular practical tools� We will have more to say about ML later� Let us nowsee how pure lambda calculus can be used as a programming language�
��� Representing data in lambda calculus
Programs need data to work on� so we start by �xing lambda expressions toencode data� Furthermore� we de�ne a few basic operations on this data� In manycases� we show how a string of human�readable syntax s can be translated directlyinto a lambda expression s�� This procedure is known as �syntactic sugaring �it makes the bitter pill of pure lambda notation easier to digest� We write suchde�nitions as�
s�� s�
This notation means �s � s� by de�nition � one also commonly sees this writtenas �s �def s� � If preferred� we could always regard these as de�ning a singleconstant denoting that operation� which is then applied to arguments in the usuallambda calculus style� the surface notation being irrelevant� For example we canimagine �if E then E� else E� as �COND E E� E� for some constant COND� Insuch cases� any variables on the left of the de�nition need to be abstracted over�e�g� instead of
fst p�� p true
see below� we could write
fst�� �p� p true
����� Truth values and the conditional
We can use any unequal lambda expressions to stand for �true and �false � butthe following work particularly smoothly�
true�� �x y� x
false�� �x y� y
Given those de�nitions� we can easily de�ne a conditional expression just likeC s ��� construct� Note that this is a conditional expression not a conditional
���� REPRESENTING DATA IN LAMBDA CALCULUS ��
command this notion makes no sense in our context� and therefore the �else armis compulsory�
if E then E� else E��� E E� E�
indeed we have�
if true then E� else E� � true E� E�
� �x y� x� E� E�
� E�
and
if false then E� else E� � false E� E�
� �x y� y� E� E�
� E�
Once we have the conditional� it is easy to de�ne all the usual logical operations�
not p�� if p then false else true
p and q�� if p then q else false
p or q�� if p then true else q
����� Pairs and tuples
We can represent ordered pairs as follows�
E�� E���� �f� f E� E�
The parentheses are not obligatory� but we often use them for the sake offamiliarity or to enforce associations� In fact we simply regard the comma as anin�x operator like "� Given the above de�nition� the corresponding destructorsfor pairs can be de�ned as�
fst p�� p true
snd p�� p false
It is easy to see that these work as required�
�CHAPTER �� LAMBDA CALCULUS AS A PROGRAMMING LANGUAGE
fst p� q� � p� q� true
� �f� f p q� true
� true p q
� �x y� x� p q
� p
and
snd p� q� � p� q� false
� �f� f p q� false
� false p q
� �x y� y� p q
� q
We can build up triples� quadruples� quintuples� indeed arbitrary n�tuples� byiterating the pairing construct�
E�� E�� � � � � En� � E�� E�� � � � En��
We need simply say that the in�x comma operator is right�associative� andcan understand this without any other conventions� For example�
p� q� r� s� � p� q� r� s���
� �f� f p q� r� s��
� �f� f p �f� f q r� s��
� �f� f p �f� f q �f� f r s��
� �f� f p �g� g q �h� h r s��
where in the last line we have performed an alpha�conversion for the sake ofclarity� Although tuples are built up in a �at manner� it is easy to create arbitrary�nitely�branching tree structures by using tupling repeatedly� Finally� if oneprefers conventional functions over Cartesian products to our �curried functions�one can convert between the two using the following�
CURRY f�� �x y� fx� y�
UNCURRY g�� �p� g fst p� snd p�
���� REPRESENTING DATA IN LAMBDA CALCULUS ��
These special operations for pairs can easily be generalized to arbitrary n�tuples� For example� we can de�ne a selector function to get the ith componentof a �at tuple p� We will write this operation as p�i and de�ne p�� � fst pand the others by p�i � fst sndi�� p�� Likewise we can generalize CURRY andUNCURRY�
CURRYn f�� �x� � � � xn� fx�� � � � � xn�
UNCURRYn g�� �p� g p�� � � � p�n
We can now write �x�� � � � � xn�� t as an abbreviation for�
UNCURRYn �x� � � � xn� t�
giving a natural notation for functions over Cartesian products�
����� The natural numbers
We will represent each natural number n as follows��
n�� �f x� fn x
that is� � � �f x� x� � � �f x� f x� � � �f x� f f x� etc� These representa�tions are known as Church numerals� though the basic idea was used earlier byWittgenstein ������� They are not a very e�cient representation� amountingessentially to counting in unary� �� ��� ���� ����� ������ ������� � � �� There are�from an e�ciency point of view� much better ways of representing numbers� e�g�tuples of trues and falses as binary expansions� But at the moment we are onlyinterested in computability �in principle and Church numerals have various niceformal properties� For example� it is easy to give lambda expressions for thecommon arithmetic operators such as the successor operator which adds one toits argument�
SUC�� �n f x� n f f x�
Indeed
SUC n � �n f x� n f f x���f x� fn x�
� �f x� �f x� fn x�f f x�
� �f x� �x� fn x�f x�
�The n in fn x is just a meta�notation and does not mean that there is anything circularabout our de�nition�
������ A number is the exponent of an operation��
��CHAPTER �� LAMBDA CALCULUS AS A PROGRAMMING LANGUAGE
� �f x� fn f x�
� �f x� fn� x
� n " �
Similarly it is easy to test a numeral to see if it is zero�
ISZERO n�� n �x� false� true
since�
ISZERO � � �f x� x��x� false� true � true
and
ISZERO n " �� � �f x� fn�x��x� false�true
� �x� false�n� true
� �x� false��x� false�n true�
� false
The following perform addition and multiplication on Church numerals�
m " n�� �f x� m f n f x�
m � n�� �f x� m n f� x
Indeed�
m " n � �f x� m f n f x�
� �f x� �f x� fm x� f n f x�
� �f x� �x� fm x� n f x�
� �f x� fm n f x�
� �f x� fm �f x� fn x� f x�
� �f x� fm �x� fn x� x�
� �f x� fm fn x�
� �f x� fmnx
and�
m � n � �f x� m n f� x
� �f x� �f x� fm x� n f� x
���� RECURSIVE FUNCTIONS ��
� �f x� �x� n f�m x� x
� �f x� n f�m x
� �f x� �f x� fn x� f�m x
� �f x� �x� fn x�m x
� �f x� fn�m x
� �f x� fmnx
Although those operations on natural numbers were fairly easy� a �predecessor function is much harder� What is required is a a lambda expression PRE so thatPRE � � � and PRE n " �� � n� Finding such an expression was the �rstpiece of mathematical research by the logician Kleene ������ His trick was� given�f x�fn x� to �throw away one of the applications of f � The �rst step is to de�nea function �PREFN that operates on pairs such that�
PREFN f true� x� � false� x�
and
PREFN f false� x� � false� f x�
Given this function� then PREFN f�n�true� x� � false� fn x�� which isenough to let us de�ne a predecessor function without much di�culty� Here is asuitable de�nition of �PREFN �
PREFN�� �f p� false� if fst p then snd p else fsnd p�
Now we de�ne�
PRE n�� �f x� sndn PREFN f� true� x��
It is left as an exercise to the reader to see that this works correctly�
��� Recursive functions
Being able to de�ne functions by recursion is a central feature of functional pro�gramming� it is the only general way of performing something comparable toiteration� At �rst sight� it appears that we have no way of doing this in lambdacalculus� Indeed� it would seem that the naming of the function is a vital part ofmaking a recursive de�nition� since otherwise� how can we refer to it on the righthand side of the de�nition without descending into in�nite regress� Rather sur�prisingly� it can be done� although as with the existence of a predecessor function�this fact was only discovered after considerable e�ort�
The key is the existence of so�called �xed point combinators� A closed lambdaterm Y is said to be a �xed point or �xpoint� combinator when for all lambda
��CHAPTER �� LAMBDA CALCULUS AS A PROGRAMMING LANGUAGE
terms f � we have fY f� � Y f � That is� a �xed point combinator� given anyterm f as an argument� returns a �xed point for f � i�e� a term x such thatfx� � x� The �rst such combinator to be found by Curry� is usually called Y �It can be motivated by recalling the Russell paradox� and for this reason is oftencalled the �paradoxical combinator � We de�ned�
R � �x� x x�
and found that�
R R � R R�
That is� R R is a �xed point of the negation operator� So in order to get ageneral �xed point combinator� all we need to do is generalize into whateverfunction is given it as argument� Therefore we set�
Y�� �f� �x� fx x���x� fx x��
It is a simple matter to show that it works�
Y f � �f� �x� fx x���x� fx x��� f
� �x� fx x���x� fx x��
� f�x� fx x���x� fx x���
� fY f�
Though valid from a mathematical point of view� this is not so attractive froma programming point of view since the above was only valid for lambda equality�not reduction in the last line we performed a backwards beta conversion�� Forthis reason the following alternative� due to Turing� may be preferred�
T�� �x y� y x x y�� �x y� y x x y��
The proof that T f �� fT f� is left as an exercise�� However it doesno harm if we simply allow Y f to be beta�reduced throughout the reductionsequence for a recursively de�ned function� Now let us see how to use a �xedpoint combinator say Y � to implement recursive functions� We will take thefactorial function as an example� We want to de�ne a function fact with�
factn� � if ISZERO n then � else n � factPRE n�
The �rst step is to transform this into the following equivalent�
fact � �n� if ISZERO n then � else n � factPRE n�
which in its turn is equivalent to�
���� LET EXPRESSIONS ��
fact � �f n� if ISZERO n then � else n � fPRE n�� fact
This exhibits fact as the �xed point of a function F � namely�
F � �f n� if ISZERO n then � else n � fPRE n�
Consequently we merely need to set fact � Y F � Similar techniques work formutually recursive functions� i�e� a set of functions whose de�nitions are mutuallydependent� A de�nition such as�
f� � F� f� � � � fn
f� � F� f� � � � fn
� � � � � � �
fn � Fn f� � � � fn
can be transformed� using tuples� into a single equation�
f�� f�� � � � � fn� � F� f� � � � fn� F� f� � � � fn� � � � � Fn f� � � � fn�
If we now write t � f�� f�� � � � � fn�� then each of the functions on the right canbe recovered by applying the appropriate selector to t� we have fi � t�i� Afterabstracting out t� this gives an equation in the canonical form t � F t which canbe solved by t � Y F � Now the individual functions can be recovered from t bythe same method of selecting components of the tuple�
��� Let expressions
We ve touted the ability to write anonymous functions as one of the merits oflambda calculus� and shown that names are not necessary to de�ne recursivefunctions� Nevertheless it is often useful to be able to give names to expressions�to avoid tedious repetition of large terms� A simple form of naming can besupported as another syntactic sugar on top of pure lambda calculus�
let x � s in t�� �x� t� s
For example� as we would expect�
let z � � " � in z " z� � �z� z " z� � " �� � � " �� " � " ��
We can achieve the binding of multiple names to expressions either in a serialor parallel fashion� For the �rst� we will simply iterate the above construct� Forthe latter� we will write multiple bindings separated by �and �
��CHAPTER �� LAMBDA CALCULUS AS A PROGRAMMING LANGUAGE
let x� � s� and � � � and xn � sn in t
and regard it as syntactic sugar for�
�x�� � � � � xn�� t� s�� � � � � sn�
To illustrate the distinction between serial and parallel binding� consider�
let x � � in let x � � in let y � x in x " y
which should yield �� while�
let x � � in let x � � and y � x in x " y
should yield �� We will also allow arguments to names bound by �let � againthis can be regarded as syntactic sugar� with f x� � � � xn � t meaning f ��x� � � � xn� t� Instead of a pre�x �let x � s in t � we will allow a post�x version�which is sometimes more readable�
t where x � s
For example� we might write �y y� where y � � " x � Normally� the �let and �where constructs will be interpreted as above� in a non�recursive fashion�For example�
let x � x� � in � � �
binds x to one less than whatever value it is bound to outside� rather than tryingto bind x to the �xed point of x � x���� However where a recursive interpretationis required� we add the word �rec � writing �let rec or �where rec � e�g�
let rec factn� � if ISZERO n then � else n � factPRE n�
This can be regarded as a shorthand for �let fact � Y F where
F � �f n� if ISZERO n then � else n � fPRE n�
as explained above�
�There is such a �xed point but the lambda term involved has no normal form so is in asense �unde�ned�� The semantics of nonterminating terms is subtle and we will not considerit at length here� Actually the salient question is not whether a term has a normal formbut whether it has a so�called head normal form � see Barendregt ����� and also Abramsky������
���� STEPS TOWARDS A REAL PROGRAMMING LANGUAGE ��
��� Steps towards a real programming language
We have established a fairly extensive system of �syntactic sugar to support ahuman�readable syntax on top of pure lambda calculus� It is remarkable thatthis device is enough to bring us to the point where we can write the de�nitionof the factorial function almost in the way we can in ML� In what sense� though�is the lambda calculus augmented with these notational conventions really aprogramming language�
Ultimately� a program is just a single expression� However� given the use oflet to bind various important subexpressions to names� it is natural instead toview the program as a set of de�nitions of auxiliary functions� followed �nally bythe expression itself� e�g�
let rec factn� � if ISZERO n then � else n � factPRE n� in� � �fact��
We can read these de�nitions of auxiliary functions as equations in the mathe�matical sense� Understood in this way� they do not give any explicit instructionson how to evaluate expressions� or even in which direction the equations areto be used� For this reason� functional programming is often referred to as adeclarative programming method� as also is logic programming e�g� PROLOG���
The program does not incorporate explicit instructions� merely declares certainproperties of the relevant notions� and leaves the rest to the machine�
At the same time� the program is useless� or at least ambiguous� unless themachine somehow reacts sensibly to it� Therefore it must be understood thatthe program� albeit from a super�cial point of view purely declarative� is to beexecuted in a certain way� In fact� the expression is evaluated by expandingall instances of de�ned notions i�e� reading the equations from left to right��and then repeatedly performing ��conversions� That is� although there seems tobe no procedural information in the program� a particular execution strategy isunderstood� In a sense then� the term �declarative is largely a matter of humanpsychology�
Moreover� there must be a de�nite convention for the reduction strategy touse� for we know that di�erent choices of ��redex can lead to di�erent behaviourwith respect to termination� Consequently� it is only when we specify this thatwe really get a de�nite programming language� We will see in later chapters howdi�erent functional languages make di�erent choices here� But �rst we pause toconsider the incorporation of types�
�Apparently Landin preferred �denotative� to �declarative��
��CHAPTER �� LAMBDA CALCULUS AS A PROGRAMMING LANGUAGE
�� Further reading
Many of the standard texts already cited include details about the matters dis�cussed here� In particular Gordon ��� gives a clear proof that lambda calculusis Turing�complete� and that the lambda calculus analog of the �halting problem whether a term has a normal form� is unsolvable� The in�uence of lambda cal�culus on real programming languages and the evolution of functional languagesare discussed by Hudak �����
Exercises
�� Justify �generalized ��conversion � i�e� prove that�
�x�� � � � � xn�� t�x�� � � � � xn �t�� � � � � tn� � t�t�� � � � � tn
�� De�ne f � g�� �x� fg x�� and recall that I � �x� x� Prove that CURRY �
UNCURRY � I� Is it also true that UNCURRY � CURRY � I��
�� What arithmetical operation does �n f x� fn f x� perform on Churchnumerals� What about �m n� n m�
�� Prove that the following due to Klop� is a �xed point combinator�
��������������������������
where�
��� �abcdefghijklmnopqstuvwxyzr� rthisisafixedpointcombinator�
�� Make a recursive de�nition of natural number subtraction�
�� Consider the following representation of lists� due to Mairson ������
nil � �c n� n
cons � �x l c n� c x l c n�
head � �l� l�x y� x� nil
tail � �l� snd l�x p� cons x fst p�� fst p�� nil� nil��
append � �l� l�� l� cons l�
append lists � �L� L append nil
���� FURTHER READING ��
map � �f l� l �x� cons f x�� nil
length � �l� l�x� SUC��
tack � �x l� l cons cons x nil�
reverse � �l� l tack nil
�lter � �l test� l �x� test x�cons x��y� y�� nil
Why does Mairson call them �Church lists � What do all these functionsdo�
Chapter �
Types
Types are a means of distinguishing di�erent sorts of data� like booleans� naturalnumbers and functions� and making sure that these distinctions are respected�by� for example� ensuring that functions cannot be applied to arguments of thewrong type� Why should we want to add types to lambda calculus� and toprogramming languages derived from it� We can distinguish reasons from logicand reasons from programming�
From the logical point of view� we have seen that the Russell paradox makesit di�cult to extend lambda calculus with set theory without contradiction� TheRussell paradox may well arise because of the peculiar self�referential nature ofthe trick involved� we apply a function to itself� Even if it weren t necessary toavoid paradox� we might feel that intuitively we really have a poor grip of what sgoing on in a system where we can do such things� Certainly applying somefunctions to themselves� e�g� the identity function �x� x or a constant function�x� y� seems quite innocuous� But surely we would have a clearer picture of whatsort of functions lambda terms denote if we knew exactly what their domains andcodomains were� and only applied them to arguments in their domains� Thesewere the sorts of reasons why Russell originally introduced types in PrincipiaMathematica�
Types also arose� perhaps at �rst largely independently of the above con�siderations� in programming languages� and since we view lambda calculus as aprogramming language� this motivates us too� Even FORTRAN distinguishedintegers and �oating point values� One reason was clearly e�ciency� the machinecan generate more e�cient code� and use storage more e�ectively� by knowingmore about a variable� For example the implementation of C s pointer arith�metic varies according to the size of the referenced objects� If p is a pointer toobjects occupying � bytes� then what C programmers write as p"� becomes p"�inside a byte�addressed machine� C s ancestor BCPL is untyped� and is thereforeunable to distinguish pointers from integers� to support the same style of pointerarithmetic it is necessary to apply a scaling at every indirection� a signi�cantpenalty�
�
���� TYPED LAMBDA CALCULUS ��
Apart from e�ciency� as time went by� types also began to be appreciatedmore and more for their value in providing limited static checks on programs�Many programming errors� either trivial typos or more substantial conceptualerrors� are suggested by type clashes� and these can be detected before running theprogram� during the compilation process� Moreover� types often serve as usefuldocumentation for people reading code� Finally� types can be used to achievebetter modularization and data hiding by �arti�cially distinguishing some datastructure from its internal representation�
At the same time� some programmers dislike types� because they �nd therestrictions placed on their programming style irksome� There are untyped pro�gramming languages� imperative ones like BCPL and functional ones like ISWIM�SASL and Erlang� Others� like PL�I� have only weak typing� with the compilerallowing some mixtures of di�erent types and casting them appropriately� Thereare also languages like LISP that perform typechecking dynamically� during exe�cution� rather than statically during compilation� This can at least in principle�degrade runtime performance� since it means that the computer is spending sometime on making sure types are respected during execution � compare checking ofarray bounds during execution� another common runtime overhead� By contrast�we have suggested that static typing� if anything� can improve performance�
Just how restrictive a type system is found depends a lot on the nature of theprogrammer s applications and programming style� Finding a type system thatallows useful static typechecking� while at the same time allowing programmersplenty of �exibility� is still an active research area� The ML type system is animportant step along this road� since it features polymorphism� allowing the samefunction to be used with various di�erent types� This retains strong static type�checking while giving some of the bene�ts of weak or dynamic typing�� Moreover�programmers never need to specify any types in ML � the computer can infera most general type for every expression� and reject expressions that are not ty�pable� We will consider polymorphism in the setting of lambda calculus below�Certainly� the ML type system provides a congenial environment for many kindsof programming� Nevertheless� we do not want to give the impression that it isnecessarily the answer to all programming problems�
��� Typed lambda calculus
It is a fairly straightforward matter to modify lambda calculus with a notion oftype� but the resulting changes� as we shall see� are far�reaching� The basic ideais that every lambda term has a type� and a term s can only be applied to a termt in a combination s t when the types match up appropriately� i�e� s has the typeof a function � � � and t has type �� The result� s t� then has type � � This is�in programming language parlance� strong typing� The term t must have exactly
�As well perhaps as some of the overheads�
�� CHAPTER �� TYPES
the type �� there is no notion of subtyping or coercion� This contrasts with someprogramming languages like C where a function expecting an argument of typefloat or double will accept one of type int and appropriately cast it� Similarnotions of subtyping and coercion can also be added to lambda calculus� but toconsider these here would lead us too far astray�
We will use �t � � to mean �t has type � � This is already the standard notationin mathematics where function spaces are concerned� because f � � � � meansthat f is a function from the set � to the set � � We will think of types as sets inwhich the corresponding objects live� and imagine t � � to mean t � �� Thoughthe reader may like to do likewise� as a heuristic device� we will view typed lambdacalculus purely as a formal system and make the rules independent of any suchinterpretation�
����� The stock of types
The �rst stage in our formalization is to specify exactly what the types are�We suppose �rst of all that we have some set of primitive types� which might�for example� contain bool and int� We can construct composite types fromthese using the function space type constructor� Formally� we make the followinginductive de�nition of the set TyC of types based on a set of primitive types C�
� � C
� � TyC
� � TyC � � TyC� � � � TyC
For example� possible types might be int� bool � bool and int � bool� �int � bool� We assume that the function arrow associates to the right� i�e�� � � � � means � � � � ��� This �ts naturally with the other syntacticconventions connected with currying�
Before long� we shall extend the type system in two ways� First� we willallow so�called type variables as well as primitive type constants � these are thevehicle for polymorphism� Secondly� we will allow the introduction of other typeconstructors besides the function arrow� For example� we will later introduce aconstructor � for the product type� In that case a new clause�
� � TyC � � TyC� � � � TyC
needs to be added to the inductive de�nition� Once we move to the more concretecase of ML� there is the possibility of new� user�de�ned types and type construc�tors� so we will get used to arbitrary sets of constructors of any arity� We willwrite ��� � � � � �n�con for the application of an n�ary type constructor con to thearguments �i� Only in a few special cases like � and � do we use in�x syntax�
���� TYPED LAMBDA CALCULUS ��
for the sake of familiarity�� For example ��list is the type of lists whose elementsall have type ��
An important property of the types� provable because of free inductive gen�eration� is that � � � �� �� In fact� more generally� a type cannot be the sameas any proper syntactic subexpression of itself�� This rules out the possibility ofapplying a term to itself� unless the two instances in question have distinct types�
����� Church and Curry typing
There are two major approaches to de�ning typed lambda calculus� One ap�proach� due to Church� is explicit� A single� type is attached to each term�That is� in the construction of terms� the untyped terms that we have presentedare modi�ed with an additional �eld for the type� In the case of constants� thistype is pre�assigned� but variables may be of any type� The rules for valid termformation are then�
v � �
Constant c has type �
c � �
s � � � � t � �
s t � �
v � � t � �
�v� t � � � �
For our purposes� however� we prefer Curry s approach to typing� which ispurely implicit� The terms are exactly as in the untyped case� and a term mayor may not have a type� and if it does� may have many di�erent types�� Forexample� the identity function �x� x may be given any type of the form � � ��reasonably enough� There are two related reasons why we prefer this approach�First� it provides a smoother treatment of ML s polymorphism� and secondly� itcorresponds well to the actual use of ML� where it is never necessary to specifytypes explicitly�
At the same time� some of the formal details of Curry�style type assignmentare a bit more complex� We do not merely de�ne a relation of typability inisolation� but with respect to a context� i�e� a �nite set of typing assumptionsabout variables� We write�
' � t � �
�Some purists would argue that this isn�t properly speaking �typed lambda calculus� butrather �untyped lambda calculus with a separate notion of type assignment��
�� CHAPTER �� TYPES
to mean �in context '� the term t can be given type � � We will� however� write� t � � or just t � � when the typing judgement holds in the empty context�� Theelements of ' are of the form v � �� i�e� they are themselves typing assumptionsabout variables� typically those that are components of the term� We assumethat ' never gives contradictory assignments to the same variable� if preferredwe can think of it as a partial function from the indexing set of the variablesinto the set of types� The use of the symbol � corresponds to its general use forjudgements in logic of the form ' � � read as � follows from assumptions ' �
����� Formal typability rules
The rules for typing expressions are fairly natural� Remember the interpretationof t � � as �t could be given type � �
v � � � '
' � v � �
Constant c has type �
c � �
' � s � � � � ' � t � �
' � s t � �
' � fv � �g � t � �
' � �v� t � � � �
Once again� this is to be regarded as an inductive de�nition of the typabilityrelation� so a term only has a type if it can be derived by the above rules� Forexample� let us see how to type the identity function� By the rule for variableswe have�
fx � �g � x � �
and therefore by the last rule we get�
� � �x� x � � � �
By our established convention for empty contexts� we write simply �x� x �� � �� This example also illustrates the need for the context in Curry typing�and why it can be avoided in the Church version� Without the context we coulddeduce x � � for any � � and then by the last rule get �x� x � � � � � clearly atvariance with the intuitive interpretation of the identity function� This problemdoesn t arise in Church typing� since in that case either both variables have type�� in which case the rules imply that �x�x � � � �� or else the two x s are actuallydi�erent variables� since their types di�er and the types are a component of theterm� Now� indeed� �x � �� x � �� � � � � � but this is reasonable because the
���� TYPED LAMBDA CALCULUS ��
term is alpha�equivalent to �x � �� y � �� � � � � � In the Curry version of typing�the types are not attached to the terms� and so we need some way of connectinginstances of the same variable�
����� Type preservation
Since the underlying terms of our typed lambda calculus are the same as in theuntyped case� we can take over the calculus of conversions unchanged� However�for this to make sense� we need to check that all the conversions preserve typa�bility� a property called type preservation� This is not di�cult� we will sketcha formal proof for �conversion and leave the other� similar cases to the reader�First we prove a couple of lemmas that are fairly obvious but which need tobe established formally� First� adding new entries to the context cannot inhibittypability�
Lemma ��� If ' � t � � and ' � ( then ( � t � ��Proof� By induction on the structure of t� Formally we are �xing t and provingthe above for all ' and (� since in the inductive step for abstractions� the setsconcerned change� If t is a variable we must have that t � � � '� Thereforet � � � ( and the result follows� If t is a constant the result is immediate sincethe stock of constants and their possible types is independent of the context� Ift is a combination s u then we have� for some type � � that ' � s � � � � and' � u � � � By the inductive hypothesis� ( � s � � � � and ( � u � � and theresult follows� Finally� if t is an abstraction �x� s then we must have� by the lastrule� that � is of the form � � � � and that ' � fx � �g � s � � �� Since ' � (we also have ' � fx � �g � ( � fx � �g� and so by the inductive hypothesis( � fx � �g � s � � �� By applying the rule for abstractions� the result follows�Q�E�D�
Secondly� entries in the context for variables not free in the relevant term maybe discarded�
Lemma ��� If ' � t � �� then 't � t � � where 't contains only entries forvariables free in t� or more formally 't � fx � � j x � � � ' and x � FV t�g�Proof� Again� we prove the above for all ' and the corresponding 't by structuralinduction over t� If t is a variable� then since ' � t � � we must have an entryx � � in the context� But by the �rst rule we know fx � �g � x � � as required�In the case of a constant� then the typing is independent of the context� so theresult holds trivial ly� If t is a combination s u then we have for some � that' � s � � � � and ' � u � � � By the inductive hypothesis� we have 's � s � � � �and 'u � u � � � By the monotonicity lemma� we therefore have 'su � s � � � �and 'su � u � � � since FV s u� � FV s� � FV u�� Hence by the rule forcombinations we get 'su � t � �� Finally� if t has the form �x� s� we must have
�� CHAPTER �� TYPES
' � fx � �g � s � � � with � of the form � � � �� By the inductive hypothesis wehave ' � fx � �g�s � s � � �� and so ' � fx � �g�s � fx � �g � �x� s� � �� Now weneed simply observe that ' � fx � �g�s � fx � �g � 't and appeal once more tomonotonicity� Q�E�D�
Now we come to the main result�
Theorem ��� If ' � t � � and t ���
t�� then also ' � t� � �
Proof� Since by hypothesis t is an �redex� it must be of the form �x�t x� with x ��FV t�� Now� its type can only have arisen by the last typing rule� and therefore� must be of the form � � � �� and we must have fx � �g � t x� � � �� Now� thistyping can only have arisen by the rule for combinations� Since the context canonly give one typing to each variable� we must therefore have fx � �g � t � � � � ��But since by hypothesis� x �� FV t�� the lemma yields � t � � � � � as required�Q�E�D�
Putting together all these results for other conversions� we see that if ' � t � �and t �� t�� then we have ' � t� � �� This is reassuring� since if the evaluationrules that are applied during execution could change the types of expressions� itwould cast doubt on the whole enterprise of static typing�
��� Polymorphism
The Curry typing system already gives us a form of polymorphism� in that agiven term may have di�erent types� Let us start by distinguish between thesimilar concepts of polymorphism and overloading� Both of them mean that anexpression may have multiple types� However in the case of polymorphism� allthe types bear a systematic relationship to each other� and all types following thepattern are allowed� For example� the identity function may have type � � ��or � � � � or � � �� � � � ��� but all the instances have the same structure�By contrast� in overloading� a given function may have di�erent types that bearno structural relation to each other� or only certain instances may be selected�For example� a function " may be allowed to have type int� int� int or typefloat� float� float� but not bool � bool � bool�� Yet a third related notionis subtyping� which is a more principled version of overloading� regarding certaintypes as embedded in others� This� however� turns out to be a lot more complexthan it seems��
�Strachey who coined the term �polymorphism� referred to what we call polymorphism andoverloading as parametric and ad hoc polymorphism respectively�
�For example if � is a subtype of �� should �� � be a subtype or a supertype of �� � ��One can defend either interpretation ��covariant� or �contravariant� in certain situations�
���� POLYMORPHISM ��
����� Let polymorphism
Unfortunately the type system we have so far places some unpleasant restrictionson polymorphism� For example� the following expression is perfectly acceptable�
if �x� x� true then �x� x� � else �
Here is a proof according to the above rules� We assume that the constantscan be typed in the empty environment� and we fold together two successiveapplications of the rule for combinations as a rule for if� remembering that�if b then t� else t� is shorthand for �COND b t� t� �
fx � boolg � x � bool
� �x� x� � bool� bool � true � bool
� �x� x� true � bool
fx � intg � x � int
� �x� x� � int� int � � � int
� �x� x� � � int � � � int
� if �x� x� true then �x� x� � else � � int
The two instances of the identity function get types bool � bool and int� int�But consider the following�
let I � �x� x in if I true then I � else �
According to our de�nitions� this is just syntactic sugar for�
�I� if I true then I � else �� �x� x�
As the reader may readily con�rm� this cannot be typed according to ourrules� There is just one instance of the identity function and it must be given asingle type� This feature is unbearable� since in practical functional programmingwe rely heavily on using let� Unless we change the typing rules� many of thebene�ts of polymorphism are lost� Our solution is to make let primitive� ratherthan interpret it as syntactic sugar� and add the new typing rule�
' � s � � ' � t�s�x � �
' � let x � s in t � �
This rule� which sets up let polymorphism� expresses the fact that� with respectto typing at least� we treat let bindings just as if separate instances of the namedexpression were written out� The additional hypothesis ' � s � � merely ensuresthat s is typable� the exact type being irrelevant� This is to avoid admitting aswell�typed terms such as�
let x � �f� f f in �
Now� for example� we can type the troublesome example�
�� CHAPTER �� TYPES
fx � �g � x � �
� �x� x � � � �
As given above
� if �x� x� true then �x� x� � else � � int
� let I � �x� x in if I true then I � else � � int
����� Most general types
As we have said� some expressions have no type� e�g� �f� f f or �f� f true� f ���Typable expressions normally have many types� though some� e�g� true� haveexactly one� We have already commented that the form of polymorphism in MLis parametric� i�e� all the di�erent types an expression can have bear a structuralsimilarity to each other� In fact more is true� there exists for each typableexpression a most general type or principal type� and all possible types for theexpression are instances of this most general type� Before stating this preciselywe need to establish some terminology�
First� we extend our notion of types with type variables� That is� types canbe built up by applying type constructors either to type constants or variables�Normally we use the letters � and � to represent type variables� and � and �arbitrary types� Given this extension� we can de�ne what it means to substituteone type for a type variable in another type� This is precisely analogous tosubstitution at the term level� and we even use the same notation� For example� � bool��� � ���� � � � �� � bool� The formal de�nition is much simplerthan at the term level� because we do not have to worry about bindings� At thesame time� it is convenient to extend it to multiple parallel substitutions�
�i������� � � � � �n��k � �i if �i �� � for � � i � k
�������� � � � � �n��k � � if �i �� � for � � i � k
��� � � � � �n�con�� � ���� � � � � � �n�� �con
For simplicity we treat type constants as nullary constructors� e�g� consider�int rather than int� but if preferred it is easy to include them as a special case�Given this de�nition of substitution� we can de�ne the notion of a type � beingmore general than a type ��� written � � ���� This is de�ned to be true preciselyif there is a set of substitutions � such that �� � ��� For example we have�
� � �
�� � � � � �
�� bool � � � �� � bool
�This relation is re�exive so we should really say �at least as general as� rather than �moregeneral than��
���� STRONG NORMALIZATION ��
� � � � �� �
�� � �� � � �� � �
Now we come to the main theorem�
Theorem ��� Every typable term has a principal type� i�e� if t � � � then there issome � such that t � � and for any ��� if t � �� then � � ���
It is easy to see that the relation � is a preorder relation� i�e� is re�exiveand transitive� The principal type is not unique� but it is unique up to renamingof type variables� More precisely� if � and � are both principal types for anexpression� then � � � � meaning that � � � and � � ��
We will not even sketch a proof of the principal type theorem� it is not partic�ularly di�cult� but is rather long� What is important to understand is that theproof gives a concrete procedure for �nding a principal type� This procedure isknown as Milners algorithm� or often the Hindley�Milner algorithm�� All imple�mentations of ML� and several other functional languages� incorporate a versionof this algorithm� so expressions can automatically be allocated a principal type�or rejected as ill�typed�
��� Strong normalization
Recall our examples of terms with no normal form� such as�
�x� x x x� �x� x x x��
�� �x� x x x� �x� x x x� �x� x x x��
�� � � ��
In typed lambda calculus� this cannot happen by virtue of the following strongnormalization theorem� whose proof is too long for us to give here�
Theorem �� Every typable term has a normal form� and every possible reduc�tion sequence starting from a typable term terminates�
At �rst sight this looks good � a functional program respecting our typediscipline can be evaluated in any order� and it will always terminate in the uniquenormal form� The uniqueness comes from the Church�Rosser theorem� which is
�The above principal type theorem in the form we have presented it is due to Milner ����� but the theorem and an associated algorithm had earlier been discovered by Hindley in a similarsystem of combinatory logic�
�Weak normalization means the �rst part only i�e� every term can be reduced to normalform but there may still be reduction sequences that fail to terminate�
� CHAPTER �� TYPES
still true in the typed system�� However� the ability to write nonterminatingfunctions is essential to Turing completeness�� so we are no longer able to de�neall computable functions� not even all total ones�
This might not matter if we could de�ne all �practically interesting functions�However this is not so� the class of functions de�nable as typed lambda termsis fairly limited� Schwichtenberg ����� has proved that if Church numerals areused� then the de�nable functions are those that are polynomials or result frompolynomials using de�nition by cases� Note that this is strictly intensional� ifanother numeral representation is used� a di�erent class of functions is obtained�In any case� it is not enough for a general programming language�
Since all de�nable functions are total� we are clearly unable to make arbitraryrecursive de�nitions� Indeed� the usual �xpoint combinators must be untypable�Y � �f� �x� fx x���x� fx x�� clearly isn t well�typed because it applies x toitself and x is bound by a lambda� In order to regain Turing�completeness wesimply add a way of de�ning arbitrary recursive functions that is well�typed� Weintroduce a polymorphic recursion operator with all types of the form�
Rec � � � �� � � � ��� � � � �
and the extra reduction rule that for any F � � � �� � � � �� we have�
Rec F �� F Rec F �
From now on� we will suppose that recursive de�nitions� signalled by �let rec �map down to the use of these recursion operators�
Further reading
Barendregt ���� and Hindley and Seldin ���� also consider typed lambdacalculus� The original paper by Milner ���� is still a good source of informationabout polymorphic typing and the algorithm for �nding principal types� A niceelementary treatment of typed lambda calculus� which gives a proof of StrongNormalization and discusses some interesting connections with logic� is Girard�Lafont� and Taylor ����� This also discusses a more advanced version of typedlambda calculus called �System F where even though strong normalization holds�the majority of �interesting functions are de�nable�
Exercises
�� Is it true that if ' � t � � then for any substitution � we have ' � t � ����
�Given any recursive enumeration of total computable functions we can always create onenot in the list by diagonalization� This will be illustrated in the Computability Theory course�
���� STRONG NORMALIZATION ��
�� Prove formally the type preservation theorems for � and � conversion�
�� Show that the converse to type preservation does not hold� i�e� it is possiblethat t �� t� and ' � t� � � but not ' � t � ��
�� �� Prove that every term of typed lambda calculus with principal type�� �� � �� �� reduces to a Church numeral�
�� �� To what extent can the typechecking process be reversed� i�e� the terminferred from the type� For instance� is it true that in pure typed lambdacalculus with type variables but no constants and no recursion operator�that any t � � � � � � is in fact equal to K � �x y� x in the usual senseof lambda equality� If so� how far can this be generalized��
�� �� We will say that an arbitrary reduction relation �� is weakly Church�Rosser if whenever t �� t� and t �� t�� then there is a u with t� ��
� uand t� ��� u� where ��� is the re�exive transitive closure of ��� ProveNewmans Lemma� which states that if a relation is weakly Church�Rosserand obeys strong normalization� then ��� is Church�Rosser� Hint� use aform of wellfounded induction����
See Mairson ����� for more on this question��If you get stuck there is a nice proof in Huet ������
Chapter �
A taste of ML
In the last few chapters� we have started with pure lambda calculus and system�atically extended it with new primitive features� For example we added the �let construct as primitive for the sake of making the polymorphic typing more useful�and a recursion operation in order to restore the computational power that waslost when adding types� By travelling further along this path� we will eventuallyarrive at ML� although it s still valuable to retain the simple worldview of typedlambda calculus�
The next stage is to abandon the encodings of datatypes like booleans andnatural numbers via lambda terms� and instead make them primitive� That is� wehave new primitive types like bool and int actually positive and negative inte�gers� and new type constructors like �� a step we have in many ways anticipatedin the last chapter� Associated with these are new constants and new conversionrules� For example� the expression � " � is evaluated using machine arithmetic�rather than by translation into Church numerals and performing ��conversions�These additional conversions� regarded as augmenting the usual lambda opera�tions� are often referred to as ���conversions � We will see in due course how theML language is extended in other ways in comparison with pure lambda calculus�First� we must address the fundamental question of ML s evaluation strategy�
�� Eager evaluation
We have said that from a theoretical point of view� normal order top downleft�to�right� reduction of expressions is to be preferred� since if any strategyterminates� this one will�� However it has some practical defects� For example�consider the following�
�x� x " x " x� �� " ��
�This strategy is familiar from a few traditional languages like Algol � where it is referredto as call by name�
��
���� EAGER EVALUATION ��
A normal order reduction results in �� " �� " �� " �� " �� " ��� and thefollowing reduction steps need to evaluate three separate instances of the sameexpression� This is unacceptable in practice� There are two main solutions tothis problem� and these solutions divide the world of functional programminglanguages into two camps�
The �rst alternative is to stick with normal order reduction� but attemptto optimize the implementation so multiple subexpressions arising in this wayare shared� and never evaluated more than once� Internally� expressions arerepresented as directed acyclic graphs rather than as trees� This is known as lazyor call�by�need evaluation� since expressions are only evaluated when absolutelynecessary�
The second approach is to turn the theoretical considerations about reductionstrategy on their head and evaluate arguments to functions before passing thevalues to the function� This is known as applicative order or eager evaluation�The latter name arises because function arguments are evaluated even when theymay be unnecessary� e�g� t in �x� y� t� Of course� eager evaluation means thatsome expressions may loop that would terminate in a lazy regime� But this isconsidered acceptable� since these instances are generally easy to avoid in practice�In any case� an eager evaluation strategy is the standard in many programminglanguages like C� where it is referred to as call by value�
ML adopts eager evaluation� for two main reasons� Choreographing the re�ductions and sharings that occur in lazy evaluation is quite tricky� and implemen�tations tend to be relatively ine�cient and complicated� Unless the programmeris very careful� memory can �ll up with pending unevaluated expressions� andin general it is hard to understand the space behaviour of programs� In factmany implementations of lazy evaluation try to optimize it to eager evaluationin cases where there is no semantic di�erence�� By contrast� in ML� we always�rst evaluate the arguments to functions and only then perform the ��reduction� this is simple and e�cient� and is easy to implement using standard compilertechnology�
The second reason for preferring applicative order evaluation is that ML isnot a pure functional language� but includes imperative features variables� as�signments etc��� Therefore the order of evaluation of subexpressions can make asemantic di�erence� rather than merely a�ecting e�ciency� If lazy evaluation isused� it seems to become practically impossible for the programmer to visualize�in a nontrivial program� exactly when each subexpression gets evaluated� In theeager ML system� one just needs to remember the simple evaluation rule�
It is important to realize� however� that the ML evaluation strategy is notsimply bottom�up� the exact opposite of normal order reduction� In fact� MLnever evaluates underneath a lambda� In particular� it never reduces �redexes�
�They try to perform strictness analysis a form of static analysis that can often discoverthat arguments are necessarily going to be evaluated �Mycroft �����
�� CHAPTER �� A TASTE OF ML
only ��redexes�� In evaluating �x� s�x � t� t is evaluated �rst� However s�x isnot touched since it is underneath a lambda� and any subexpressions of t thathappen to be in the scope of a lambda are also left alone� The precise evaluationrules are as follows�
� Constants evaluate to themselves�
� Evaluation stops immediately at ��abstractions and does not look insidethem� In particular� there is no �conversion�
� When evaluating a combination s t� then �rst both s and t are evaluated�Then� assuming that the evaluated form of s is a lambda abstraction� atoplevel ��conversion is performed and the process is repeated�
The order of evaluation of s and t varies between versions of ML� In theversion we will use� t is always evaluated �rst� Strictly� we should also specifya rule for let expressions� since as we have said they are by now accepted as aprimitive� However from the point of view of evaluation they can be understoodin the original way� as a lambda applied to an argument� with the understandingthat the rand will be evaluated �rst� To make this explicit� the rule for
let x � s in t
is that �rst of all s is evaluated� the resulting value is substituted for x in t� and�nally the new version of t is evaluated� Let us see some examples of evaluatingexpressions�
�x� �y� y " y� x�� " �� �� �x� �y� y " y� x��
�� �y� y " y��
�� � " �
��
Note that the subterm �y� y" y� x is not reduced� since it is within the scopeof a lambda� However� terms that are reducible and not enclosed by a lambda inboth function and argument get reduced before the function application itself isevaluated� e�g� the second step in the following�
�f x� f x� �y� y " y�� � " �� �� �f x� f x� �y� y " y�� �
�� �x� �y� y " y� x� �
�� �y� y " y� �
�� � " �
���� CONSEQUENCES OF EAGER EVALUATION ��
The fact that ML does not evaluate under lambdas is of crucial importanceto advanced ML programmers� It gives precise control over the evaluation ofexpressions� and can be used to mimic many of the helpful cases of lazy evaluation�We shall see a very simple instance in the next section�
�� Consequences of eager evaluation
The use of eager evaluation forces us to make additional features primitive� withtheir own special reduction procedures� rather than implementing them directlyin terms of lambda calculus� In particular� we can no longer regard the conditionalconstruct�
if b then e� else e�
as the application of an ordinary ternary operator�
COND b e� e�
The reason is that in any particular instance� because of eager evaluation�we evaluate all the expressions b� e� and e� �rst� before evaluating the body ofthe COND� Usually� this is disastrous� For example� recall our de�nition of thefactorial function�
let rec factn� � if ISZERO n then � else n � factPRE n�
If the conditional evaluates all its arguments� then in order to evaluate fact���the �else arm must be evaluated� in its turn causing factPRE �� to be evalu�ated� This in its turn causes the evaluation of factPRE PRE ���� and so on�Consequently the computation will loop inde�nitely�
Accordingly� we make the conditional a new primitive construct� and modifythe usual reduction strategy so that �rst the boolean expression is evaluated� andonly then is exactly one of the two arms evaluated� as appropriate�
What about the process of recursion itself� We have suggested understandingrecursive de�nitions in terms of a recursion operator Rec with its own reductionrule�
Rec f �� fRec f�
This would also loop using an eager evaluation strategy�
Rec f �� fRec f� �� ffRec f�� �� fffRec f��� �� � � �
However we only need a very simple modi�cation to the reduction rule tomake things work correctly�
�� CHAPTER �� A TASTE OF ML
Rec f �� f�x� Rec f x�
Now the lambda on the right means that �x� Rec f x evaluates to itself� andso only after the expression has been further reduced following substitution inthe body of f will evaluation continue�
�� The ML family
We have been talking about �ML as if it were a single language� In fact thereare many variants of ML� even including �Lazy ML � an implementation fromChalmers University in Sweden that is based on lazy evaluation� The most popu�lar version of ML in education is �Standard ML � but we will use another version�called CAML �camel � Light�� We chose CAML Light for several reasons�
� The implementation is small and highly portable� so can be run e�ectivelyon Unix machines� PCs� Macs etc�
� The system is simple� syntactically and semantically� making it relativelyeasy to approach�
� The system is well suited to practical programming� e�g� it can easily beinterfaced to C and supports standard make�compatible separate compila�tion�
However� we will be teaching quite general techniques� and any code we writecan be run with a few minor syntactic changes in any version of ML� and indeedoften in other functional languages�
�� Starting up ML
ML has already been installed on Thor� In order to use it� you need to put theCAML binary directory on your PATH� This can be done assuming you use the�bash shell or one of its family�� as follows�
PATH���PATH��home�jrh���caml�bin�export PATH
To avoid doing this every time you log on� you can insert these lines near theend of your �bash profile� or the equivalent for your favourite shell� Now touse CAML in its normal interactive and interpretive mode� you simply need totype camllight� and the system should �re up and present its prompt �� ��
�The name stands for �Categorical Abstract Machine� the underlying implementationmethod�
���� INTERACTING WITH ML ��
� camllight Caml Light version ���
In order to exit the system� simply type ctrl�d or quit�� at the prompt�If you are interested in installing CAML Light on your own machine� you shouldconsult the following Web page for detailed information�
http���pauillac�inria�fr�caml�
� Interacting with ML
When ML presents you with its prompt� you can type in expressions� terminatedby two successive semicolons� and it will evaluate them and print the result� Incomputing jargon� the ML system sits in a read�eval�print loop� it repeatedlyreads an expression� evaluates it� and prints the result� For example� ML can beused as a simple calculator�
� � ���� � int � ��
The system not only returns the answer� but also the type of the expression�which it has inferred automatically� It can do this because it knows the typeof the built�in addition operator �� On the other hand� if an expression is nottypable� the system will reject it� and try to give some idea about how the typesfail to match up� In complicated cases� the error messages can be quite tricky tounderstand�
� � true��Toplevel input�let it � � � true�� ����This expression has type bool�but is used with type int�
Since ML is a functional language� expressions are allowed to have functiontype� The ML syntax for a lambda abstraction �x� t�x is fun x �� t�x�� Forexample we can de�ne the successor function�
fun x � x � ���� � int � int � �fun
�� CHAPTER �� A TASTE OF ML
Again� the type of the expression� this time int �� int� is inferred and dis�played� However the function itself is not printed� the system merely writes�fun�� This is because� in general� the internal representations of functions arenot very readable�� Functions are applied to arguments just as in lambda calcu�lus� by juxtaposition� For example�
�fun x � x � �� ���� � int � �
Again as in lambda calculus� function application associates to the left� andyou can write curried functions using the same convention of eliding multiple � si�e� fun s�� For example� the following are all equivalent�
��fun x � �fun y � x � y�� �� ���� � int � � �fun x � fun y � x � y� � ���� � int � � �fun x y � x � y� � ���� � int � �
� Bindings and declarations
It is not convenient� of course� to evaluate the whole expression in one go� rather�we want to use let to bind useful subexpressions to names� This can be done asfollows�
let successor � fun x � x � � insuccessor�successor�successor ����� � int � �
Function bindings may use the more elegant sugaring�
let successor x � x � � insuccessor�successor�successor ����� � int � �
and can be made recursive just by adding the rec keyword�
let rec fact n � if n � then �else n � fact�n � �� in
fact ���� � int � ��
By using and� we can make several binding simultaneously� and de�ne mutu�ally recursive functions� For example� here are two simple� though highly ine��cient� functions to decide whether or not a natural number is odd or even�
�CAML does not store them simply as syntax trees but compiles them into bytecode�
��� BINDINGS AND DECLARATIONS ��
let rec even n � if n � then true else odd �n � ��and odd n � if n � then false else even �n � ����
even � int � bool � �funodd � int � bool � �fun even ����� � bool � true odd ����� � bool � false
In fact� any bindings can be performed separately from the �nal application�ML remembers a set of variable bindings� and the user can add to that set inter�actively� Simply omit the in and terminate the phrase with a double semicolon�
let successor � fun x � x � ���successor � int � int � �fun
After this declaration� any later phrase may use the function successor� e�g�
successor ����� � int � ��
Note that we are not making assignments to variables� Each binding is onlydone once when the system analyses the input� it cannot be repeated or modi��ed� It can be overwritten by a new de�nition using the same name� but this isnot assignment in the usual sense� since the sequence of events is only connectedwith the compilation process� not with the dynamics of program execution� In�deed� apart from the more interactive feedback from the system� we could equallyreplace all the double semicolons after the declarations by in and evaluate ev�erything at once� On this view we can see that the overwriting of a declarationreally corresponds to the de�nition of a new local variable that hides the outerone� according to the usual lambda calculus rules� For example�
let x � ���x � int � � let y � ���y � int � � let x � ���x � int � � x � y��� � int � �
is the same as�
let x � � inlet y � � inlet x � � inx � y��
� � int � �
� CHAPTER �� A TASTE OF ML
Note carefully that� also following lambda calculus� variable binding is static�i�e� the �rst binding of x is still used until an inner binding occurs� and any usesof it until that point are not a�ected by the inner binding� For example�
let x � ���x � int � � let f w � w � x��f � int � int � �fun let x � ���x � int � � f ��� � int � �
The �rst version of LISP� however� used dynamic binding� where a rebindingof a variable propagated to earlier uses of the variable� so that the analogous se�quence to the above would return �� This was in fact originally regarded as a bug�but soon programmers started to appreciate its convenience� It meant that whensome low�level function was modi�ed� the change propagated automatically to allapplications of it in higher level functions� without the need for recompilation�The feature survived for a long time in many LISP dialects� but eventually theview that static binding is better prevailed� In Common LISP� static binding isthe default� but dynamic binding is available if desired via the keyword special�
�� Polymorphic functions
We can de�ne polymorphic functions� like the identity operator�
let I � fun x � x��I � �a � �a � �fun
ML prints type variables as �a� �b etc� These are supposed to be ASCIIrepresentations of �� � and so on� We can now use the polymorphic functionseveral times with di�erent types�
I true��� � bool � true I ���� � int � � I I I I ����� � int � ��
Each instance of I in the last expression has a di�erent type� and intuitivelycorresponds to a di�erent function� In fact� let s de�ne all the combinators�
��� POLYMORPHIC FUNCTIONS ��
let I x � x��I � �a � �a � �fun let K x y � x��K � �a � �b � �a � �fun let S f g x � �f x� �g x���S � ��a � �b � �c� � ��a � �b� � �a � �c � �fun
Note that the system keeps track of the types for us� even though in the lastcase they were quite complicated� Now� recall that I � S K K� let us try thisout in ML��
let I� � S K K��I� � ��a � ��a � �fun
It has the right type�� and it may easily be checked in all concrete cases� e�g�
I� � � ���� � bool � true
In the above examples of polymorphic functions� the system very quickly infersa most general type for each expression� and the type it infers is simple� Thisusually happens in practice� but there are pathological cases� e�g� the followingexample due to Mairson ������ The type of this expression takes about ��seconds to calculate� and occupies over ���� lines on an ��column terminal�
let pair x y � fun z � z x y inlet x� � fun y � pair y y inlet x� � fun y � x��x� y� inlet x� � fun y � x��x� y� inlet x� � fun y � x��x� y� inlet x� � fun y � x��x� y� inx��fun z � z���
We have said that the ML programmer need never enter a type� This is truein the sense that ML will already allocate as general a type as possible to anexpression� However it may sometimes be convenient to restrict the generality ofa type� This cannot make code work that didn t work before� but it may serve asdocumentation regarding the intended purpose of the code� it is also possible touse shorter synonyms for complicated types� Type restriction can be achieved inML by adding type annotations after some expressions�� These type annotationsconsist of a colon followed by a type� It usually doesn t matter exactly wherethese annotations are added� provided they enforce the appropriate constraints�For example� here are some alternative ways of constraining the identity functionto type int �� int�
�We remarked that from the untyped point of view S K A � I for any A� However thereader may try for example S K S and see that the principal type is less general than expected�
�Ignore the underscores for now� This is connected with the typing of imperative features and we will discuss it later�
�� CHAPTER �� A TASTE OF ML
let I �x�int� � x��I � int � int � �fun let I x � �x�int���I � int � int � �fun let �I�int�int� � fun x � x��I � int � int � �fun let I � fun �x�int� � x��I � int � int � �fun let I � ��fun x � x��int�int���I � int � int � �fun
�� Equality of functions
Instead of comparing the actions of I and I � on particular arguments like �� itwould seem that we can settle the matter de�nitively by comparing the functionsthemselves� However this doesn t work�
I� � I��Uncaught exception� Invalid�argument �equal� functional value�
It is in general forbidden to compare functions for equality� though a fewspecial instances� where the functions are obviously the same� yield true�
let f x � x � ���f � int � int � �fun let g x � x � ���g � int � int � �fun f � f��� � bool � true f � g��Uncaught exception� Invalid�argument �equal� functional value� let h � g��h � int � int � �fun h � f��Uncaught exception� Invalid�argument �equal� functional value� h � g��� � bool � true
Why these restrictions� Aren t functions supposed to be �rst�class objects inML� Yes� but unfortunately� extensional� function equality is not computable�This follows from a number of classic theorems in recursion theory� such as theunsolvability of the halting problem and Rices theorem� Let us give a concreteillustration of why this might be so� It is still an open problem whether the
�You will see these results proved in the Computation Theory course� Rice�s theorem is anextremely strong undecidability result which asserts that any nontrivial property of the functioncorresponding to a program is uncomputable from its text� An excellent computation theorytextbook is Davis Sigal and Weyuker ������
���� EQUALITY OF FUNCTIONS ��
following function terminates for all arguments� the assertion that it does beingknown as the Collatz conjecture��
let rec collatz n �if n �� � then else if even�n� then collatz�n � ��else collatz�� � n � ����
collatz � int � int � �fun
What is clear� though� is that if it does halt it returns �� Now consider thefollowing trivial function�
let f �x�int� � ��f � int � int � �fun
By deciding the equation collatz � f� the computer would settle the Collatzconjecture� It is easy to concoct other examples for open mathematical problems�
It is possible to trap out applications of the equality operator to functions anddatatypes built up from them as part of typechecking� rather than at runtime�Types that do not involve functions in these ways are known as equality types�since it is always valid to test objects of such types for equality� On the negativeside� this makes the type system much more complicated� However one mightargue that static typechecking should be extended as far as feasibility allows�
Further reading
Many books on functional programming bear on the general issues we have dis�cussed here� such as evaluation strategy� A good elementary introduction toCAML Light and functional programming is Mauny ������ Paulson ����� isanother good textbook� though based on Standard ML�
Exercises
�� Suppose that a �conditional function de�ned by ite�b�x�y� � if b then
x else y is the only function available that operates on arguments of typebool� Is there any way to write a factorial function�
�� Use the typing rules in the last chapter to write out a formal proof that theS combinator has the type indicated by the ML system�
�A good survey of this problem and attempts to solve it is given by Lagarias ������Strictly we should use unlimited precision integers rather than machine arithmetic� We willsee later how to do this�
�� CHAPTER �� A TASTE OF ML
�� Write a simple recursively de�ned function to perform exponentiation ofintegers� i�e� calculate xn for n � �� Write down two ML functions� theequality of which corresponds to the truth of Fermat s last theorem� thereare no integers x� y� z and natural number n � � with xn " yn � zn exceptfor the trivial case where x � � or y � ��
Chapter �
Further ML
In this chapter� we consolidate the previous examples by specifying the basicfacilities of ML and the syntax of phrases more precisely� and then go on to treatsome additional features such as recursive types� We might start by saying moreabout interaction with the system�
So far� we have just been typing phrases into ML s toplevel read�eval�printloop and observing the result� However this is not a good method for writingnontrivial programs� Typically� you should write the expressions and declarationsin a �le� To try things out as you go� they can be inserted in the ML window using�cut and paste � This operation can be performed using X�windows and similarsystems� or in an editor like Emacs with multiple bu�ers� However� this becomeslaborious and time�consuming for large programs� Instead� you can use ML sinclude function to read in the �le directly� For example� if the �le myprog�ml
contains�
let pythag x y z �x � x � y � y � z � z��
pythag � � ���
pythag � �� ����
pythag � � ���
then the toplevel phrase include �myprog�ml� results in�
include �myprog�ml���pythag � int � int � int � bool � �fun� � bool � true� � bool � true� � bool � false� � unit � ��
��
�� CHAPTER � FURTHER ML
That is� the ML system responds just as if the phrases had been entered atthe top level� The �nal line is the result of evaluating the include expressionitself�
In large programs� it is often helpful to include comments� In ML� these arewritten between the symbols � and �� e�g�
�� ������������������������������������������������������ ���� This function tests if �x�y�z� is a Pythagorean triple ���� ������������������������������������������������������ ��
let pythag x y z �x � x � y � y � z � z��
��comments�� pythag ��can�� � ��go�� � ��almost�� � ��anywhere���� and �� can �� be �� nested �� quite �� arbitrarily �� ����
�� Basic datatypes and operations
ML features several built�in primitive types� From these� composite types maybe built using various type constructors� For the moment� we will only use thefunction space constructor �� and the Cartesian product constructor � but wewill see in due course which others are provided� and how to de�ne new typesand type constructors� The primitive types that concern us now are�
� The type unit� This is a ��element type� whose only element is written��� Obviously� something of type unit conveys no information� so it iscommonly used as the return type of imperatively written �functions thatperform a side�e�ect� such as include above� It is also a convenient argu�ment where the only use of a function type is to delay evaluation�
� The type bool� This is a ��element type of booleans truth�values� whoseelements are written true and false�
� The type int� This contains some �nite subset of the positive and negativeintegers� Typically the permitted range is from ���� ����������� up to����� ������������ The numerals are written in the usual way� optionallywith a negation sign� e�g� �� ��� ����
� The type string contains strings i�e� �nite sequences� of characters� Theyare written and printed between double quotes� e�g� �hello�� In order toencode include special characters in strings� C�like escape sequences areused� For example� � is the double quote itself� and n is the newlinecharacter�
�This is even more limited than expected for machine arithmetic because a bit is stolen forthe garbage collector� We will see later how to use an alternative type of integers with unlimitedprecision�
��� BASIC DATATYPES AND OPERATIONS ��
The above values like ��� false� � and �caml� are all to be regarded asconstants in the lambda calculus sense� There are other constants correspondingto operations on the basic types� Some of these may be written as in�x operators�for the sake of familiarity� These have a notion of precedence so that expressionsare grouped together as one would expect� For example� we write x � y ratherthan � x y and x � � y � z rather than � x �� � � y� z�� The logicaloperator not also has a special parsing status� in that the usual left�associativityrule is reversed for it� not not p means not �not p�� User�de�ned functionsmay be granted in�x status via the �infix directive� For example� here is ade�nition of a function performing composition of functions�
let successor x � x � ���successor � int � int � �fun let o f g � fun x � f�g x���o � ��a � �b� � ��c � �a� � �c � �b � �fun let add� � o successor �o successor successor���add� � int � int � �fun add� ��� � int � � infix �o��� let add�� � successor o successor o successor��add�� � int � int � �fun add�� ��� � int � �
It is not possible to specify the precedence of user�de�ned in�xes� nor to makeuser�de�ned non�in�x functions right�associative� Note that the implicit opera�tion of �function application has a higher precedence than any binary operator�so successor � parses as �successor � �� If it is desired to use a func�tion with special status as an ordinary constant� simply precede it by prefix�For example�
o successor successor��Toplevel input�o successor successor���Syntax error� prefix o successor successor��� � int � int � �fun �prefix o� successor successor��� � int � int � �fun
With these questions of concrete syntax out of the way� let us present asystematic list of the operators on the basic types above� The unary operatorsare�
Operator Type Meaning� int �� int Numeric negationnot bool �� bool Logical negation
�� CHAPTER � FURTHER ML
and the binary operators� in approximately decreasing order of precedence� are�
Operator Type Meaningmod int �� int �� int Modulus remainder� int �� int �� int Multiplication� int �� int �� int Truncating division� int �� int �� int Addition� int �� int �� int Subtraction! string �� string �� string String concatenation� �a �� �a �� bool Equality�� �a �� �a �� bool Inequality� �a �� �a �� bool Less than�� �a �� �a �� bool Less than or equal� �a �� �a �� bool Greater than�� �a �� �a �� bool Greater than or equal" bool �� bool �� bool Boolean �and or bool �� bool �� bool Boolean �or
For example� x � � " x � is parsed as " �� x �� �� x �� Note thatall the comparisons� not just the equality relation� are polymorphic� They notonly order integers in the expected way� and strings alphabetically� but all otherprimitive types and composite types in a fairly natural way� Once again� however�they are not in general allowed to be used on functions�
The two boolean operations " and or have their own special evaluation strat�egy� like the conditional expression� In fact� they can be regarded as synonymsfor conditional expressions�
p ) q�� if p then q else false
p or q�� if p then true else q
Thus� the �and operation evaluates its �rst argument� and only if it is true�evaluates its second� Conversely� the �or operation evaluates its �rst argument�and only if it is false evaluates its second�
�� Syntax of ML phrases
Expressions in ML can be built up from constants and variables� any identi�erthat is not currently bound is treated as a variable� Declarations bind namesto values of expressions� and declarations can occur locally inside expressions�Thus� the syntax classes of expressions and declarations are mutually recursive�We can represent this by the following BNF grammar��
�We neglect many constructs that we won�t be concerned with� A few will be introducedlater� See the CAML manual for full details�
��� SYNTAX OF ML PHRASES ��
expression ��� variable
j constant
j expression expression
j expression infix expression
j not expression
j if expression then expression else expression
j fun pattern �� expression
j expression�
j declaration in expression
declaration ��� let let bindings
j let rec let bindings
let bindings ��� let binding
j let binding and let bindings
let binding ��� pattern � expression
pattern ��� variables
variables ��� variable
j variable variables
The syntax class pattern will be expanded and explained more thoroughlylater on� For the moment� all the cases we are concerned with are either justvariable or variable variable � � � variable� In the �rst case we simply bind an ex�pression to a name� while the second uses the special syntactic sugar for functiondeclarations� where the arguments are written after the function name to the leftof the equals sign� For example� the following is a valid declaration of a functionadd�� which can be used to add � to its argument�
let add� x �let y � successor x inlet z � let w � successor y in
successor w insuccessor z��
add� � int � int � �fun add� ���� � int � �
It is instructive to unravel this declaration according to the above grammar�A toplevel phrase� terminated by two successive semicolons� may be either anexpression or a declaration�
� CHAPTER � FURTHER ML
�� Further examples
It is easy to de�ne by recursion a function that takes a positive integer n and afunction f and returns fn� i�e� f � � � � � f n times��
let rec funpow n f x �if n � then xelse funpow �n � �� f �f x���
funpow � int � ��a � �a� � �a � �a � �fun
In fact� a little thought will show that the function funpow takes a machineinteger n and returns the Church numeral corresponding to n� Since functionsaren t printed� we can t actually look at the lambda expression representing aChurch numeral�
funpow ���� � ���a � ��a� � ��a � ��a � �fun
However it is straightforward to de�ne an inverse function to funpow thattakes a Church numeral back to a machine integer�
let defrock n � n �fun x � x � �� ��defrock � ��int � int� � int � �a� � �a � �fun defrock�funpow ������ � int � ��
We can try out some of the arithmetic operations on Church numerals�
let add m n f x � m f �n f x���add � ��a � �b � �c� � ��a � �d � �b� � �a � �d � �c � �fun let mul m n f x � m �n f� x��mul � ��a � �b � �c� � ��d � �a� � �d � �b � �c � �fun let exp m n f x � n m f x��exp � �a � ��a � �b � �c � �d� � �b � �c � �d � �fun let test bop x y � defrock �bop �funpow x� �funpow y����test �����a � �a� � �a � �a� ����b � �b� � �b � �b� � �int � int� � int � �c� �int � int � �c � �fun
test add � ���� � int � �� test mul � ���� � int � � test exp � ���� � int � ���
The above is not a very e�cient way of performing arithmetic operations�ML does not have a function for exponentiation� but it is easy to de�ne one byrecursion�
��� FURTHER EXAMPLES ��
let rec exp x n �if n � then �else x � exp x �n � ����
exp � int � int � int � �fun
However this performs n multiplications to calculate xn� A more e�cient wayis to exploit the facts that x�n � xn�� and x�n� � xxn�� as follows�
let square x � x � x��square � int � int � �fun let rec exp x n �
if n � then �else if n mod � � then square�exp x �n � ���else x � square�exp x �n � �����
exp � int � int � int � �fun infix �exp��� � exp ���� � int � ��� � exp ���� � int � ������
Another classic operation on natural numbers is to �nd their greatest commondivisor highest common factor� using Euclid s algorithm�
let rec gcd x y �if y � then x else gcd y �x mod y���
gcd � int � int � int � �fun gcd � ����� � int � � gcd � ������ � int � � gcd �� ���� � int � ��
We have used a notional recursion operator Rec to explain recursive de�ni�tions� We can actually de�ne such a thing� and use it�
let rec Rec f � f�fun x � Rec f x���Rec � ���a � �b� � �a � �b� � �a � �b � �fun let fact � Rec �fun f n � if n � then � else n � f�n � �����fact � int � int � �fun fact ���it � int � �
Note� however� that the lambda was essential� otherwise the expression Rec
f goes into an in�nite recursion before it is even applied to its argument�
let rec Rec f � f�Rec f���Rec � ��a � �a� � �a � �fun let fact � Rec �fun f n � if n � then � else n � f�n � �����Uncaught exception� Out�of�memory
�� CHAPTER � FURTHER ML
�� Type de�nitions
We have promised that ML has facilities for declaring new type constructors�so that composite types can be built up out of existing ones� In fact� ML goesfurther and allows a composite type to be built up not only out of preexistingtypes but also from the composite type itself� Such types� naturally enough� aresaid to be recursive� They are declared using the type keyword followed by anequation indicating how the new type is built up from existing ones and itself�We will illustrate this by a few examples� The �rst one is the de�nition of a sumtype� intended to correspond to the disjoint union of two existing types�
type ��a��b�sum � inl of �a � inr of �b��Type sum defined�
Roughly� an object of type ��a��b�sum is either something of type �a orsomething of type �b� More formally� however� all these things have di�erenttypes� The type declaration also declares the so�called constructors inl and inr�These are functions that take objects of the component types and inject theminto the new type� Indeed� we can see their types in the ML system and applythem to objects�
inl��� � �a � ��a� �b� sum � �fun inr��� � �a � ��b� �a� sum � �fun inl ���� � �int� �a� sum � inl � inr false��� � ��a� bool� sum � inr false
We can visualize the situation via the following diagram� Given two existingtypes � and �� the type �� ��sum is composed precisely of separate copies of �and �� and the two constructors map onto the respective copies�
�
�
�� ��sum
������
������
��
inl
inr
XXXXXXXXXXXXXz
��� TYPE DEFINITIONS ��
This is similar to a union in C� but in ML the copies of the component typesare kept apart and one always knows which of these an element of the unionbelongs to� By contrast� in C the component types are overlapped� and theprogrammer is responsible for this book�keeping�
����� Pattern matching
The constructors in such a de�nition have three very important properties�
� They are exhaustive� i�e� every element of the new type is obtainable eitherby inl x for some x or inr y for some y� That is� the new type containsnothing besides copies of the component types�
� They are injective� i�e� an equality test inl x � inl y is true if and onlyif x � y� and similarly for inr� That is� the new type contains a faithfulcopy of each component type without identifying any elements�
� They are distinct� i�e� their ranges are disjoint� More concretely this meansin the above example that inl x � inr y is false whatever x and y mightbe� That is� the copy of each component type is kept apart in the new type�
The second and third properties of constructors justify our using patternmatching� This is done by using more general varstructs as the arguments ina lambda� e�g�
fun �inl n� � n �� �inr b� � b��
� � �int� bool� sum � bool � �fun
This function has the property� naturally enough� that when applied to inl n itreturns n � � and when applied to inr b it returns b� It is precisely because ofthe second and third properties of the constructors that we know this does givea wellde�ned function� Because the constructors are injective� we can uniquelyrecover n from inl n and b from inr b� Because the constructors are distinct�we know that the two clauses cannot be mutually inconsistent� since no value cancorrespond to both patterns�
In addition� because the constructors are exhaustive� we know that each valuewill fall under one pattern or the other� so the function is de�ned everywhere�Actually� it is permissible to relax this last property by omitting certain patterns�though the ML system then issues a warning�
fun �inr b� � b��Toplevel input�fun �inr b� � b������������������Warning� this matching is not exhaustive�� � ��a� �b� sum � �b � �fun
�� CHAPTER � FURTHER ML
If this function is applied to something of the form inl x� then it will notwork�
let f � fun �inr b� � b��Toplevel input�let f � fun �inr b� � b�� ����������������Warning� this matching is not exhaustive�f � ��a� �b� sum � �b � �fun f �inl ����Uncaught exception� Match�failure ���� ���� ����
Though booleans are built into ML� they are e�ectively de�ned by a rathertrivial instance of a recursive type� often called an enumerated type� where theconstructors take no arguments�
type bool � false � true��
Indeed� it is perfectly permissible to de�ne things by matching over the truthvalues� The following two phrases are completely equivalent�
if � � � then � else ��� � int � �fun true � � � false � � �� � ����� � int �
Pattern matching is� however� not limited to casewise de�nitions over elementsof recursive types� though it is particularly convenient there� For example� wecan de�ne a function that tells us whether an integer is zero as follows�
fun � true � n � false��� � int � bool � �fun �fun � true � n � false� ��� � bool � true �fun � true � n � false� ���� � bool � false
In this case we no longer have mutual exclusivity of patterns� since � matcheseither pattern� The patterns are examined in order� one by one� and the �rstmatching one is used� Note carefully that unless the matches are mutually ex�clusive� there is no guarantee that each clause holds as a mathematical equation�For example in the above� the function does not return false for any n� so thesecond clause is not universally valid�
Note that only constructors may be used in the above special way as com�ponents of patterns� Ordinary constants will be treated as new variables boundinside the pattern� For example� consider the following�
��� TYPE DEFINITIONS ��
let true�� � true��true�� � bool � true let false�� � false��false�� � bool � false �fun true�� � � � false�� � � �� � ����Toplevel input��fun true�� � � � false�� � � �� � ���� �������Warning� this matching case is unused�� � int � �
In general� the unit element ��� the truth values� the integer numerals� thestring constants and the pairing operation in�x comma� have constructor status�as well as other constructors from prede�ned recursive types� When they occurin a pattern the target value must correspond� All other identi�ers match anyexpression and in the process become bound�
As well as the varstructs in lambda expressions� there are other ways of per�forming pattern matching� Instead of creating a function via pattern matchingand applying it to an expression� one can perform pattern�matching over theexpression directly using the following construction�
match expression with pattern���E� j � � � j patternn��En
The simplest alternative of all is to use
let pattern � expression
but in this case only a single pattern is allowed�
����� Recursive types
The previous examples have all been recursive only vacuously� in that we havenot de�ned a type in terms of itself� For a more interesting example� we willdeclare a type of lists �nite ordered sequences� of elements of type �a�
type ��a�list � Nil � Cons of �a � ��a�list��Type list defined�
Let us examine the types of the constructors�
Nil��� � �a list � Nil Cons��� � �a � �a list � �a list � �fun
�� CHAPTER � FURTHER ML
The constructor Nil� which takes no arguments� simply creates some object oftype ��a�list which is to be thought of as the empty list� The other constructorCons takes an element of type �a and an element of the new type ��a�list andgives another� which we think of as arising from the old list by adding one elementto the front of it� For example� we can consider the following�
Nil��� � �a list � Nil Cons���Nil���� � int list � Cons ��� Nil� Cons���Cons���Nil����� � int list � Cons ��� Cons ��� Nil�� Cons���Cons���Cons���Nil������ � int list � Cons ��� Cons ��� Cons ��� Nil���
Because the constructors are distinct and injective� it is easy to see thatall these values� which we think of as lists � � �� � ��� � and ��� �� � � are distinct�Indeed� purely from these properties of the constructors� it follows that arbitrarilylong lists of elements may be encoded in the new type� Actually� ML already hasa type list just like this one de�ned� The only di�erence is syntactic� the emptylist is written �� and the recursive constructor ��� has in�x status� Thus� theabove lists are actually written�
!��� � �a list � ! ��� !��� � int list � �! ������ !��� � int list � �� �! ��������� !��� � int list � �� �� �!
The lists are printed in an even more natural notation� and this is also allowedfor input� Nevertheless� when the exact expression in terms of constructors isneeded� it must be remembered that this is only a surface syntax� For example�we can de�ne functions to take the head and tail of a list� using pattern matching�
let hd �h��t� � h��Toplevel input�let hd �h��t� � h�� �������������Warning� this matching is not exhaustive�hd � �a list � �a � �fun let tl �h��t� � t��Toplevel input�let tl �h��t� � t�� �������������Warning� this matching is not exhaustive�tl � �a list � �a list � �fun
��� TYPE DEFINITIONS ��
The compiler warns us that these both fail when applied to the empty list�since there is no pattern to cover it remember that the constructors are distinct��Let us see them in action�
hd �����!��� � int � � tl �����!��� � int list � �� �! hd !��Uncaught exception� Match�failure
Note that the following is not a correct de�nition of hd� In fact� it constrainsthe input list to have exactly two elements for matching to succeed� as can beseen by thinking of the version in terms of the constructors�
let hd x�y! � x��Toplevel input�let hd x�y! � x�� ������������Warning� this matching is not exhaustive�hd � �a list � �a � �fun hd ���!��� � int � � hd �����!��Uncaught exception� Match�failure
Pattern matching can be combined with recursion� For example� here is afunction to return the length of a list�
let rec length �fun ! �
� �h��t� � � � length t��length � �a list � int � �fun length !��� � int � length �����!��� � int � �
Alternatively� this can be written in terms of our earlier �destructor functionshd and tl�
let rec length l �if l � ! then else � � length�tl l���
This latter style of function de�nition is more usual in many languages� no�tably LISP� but the direct use of pattern matching is often more elegant�
�� CHAPTER � FURTHER ML
����� Tree structures
It is often helpful to visualize the elements of recursive types as tree structures�with the recursive constructors at the branch nodes and the other datatypes atthe leaves� The recursiveness merely says that plugging subtrees together givesanother tree� In the case of lists the �trees are all rather spindly and one�sided�with the list ����� being represented as�
��
�
������
��
����
��
����
��
�
�
�
� ��
It is not di�cult to de�ne recursive types which allow more balanced trees�e�g�
type ��a�btree � Leaf of �a� Branch of ��a�btree � ��a�btree��
In general� there can be several di�erent recursive constructors� each with adi�erent number of descendants� This gives a very natural way of representingthe syntax trees of programming and other formal� languages� For example� hereis a type to represent arithmetical expressions built up from integers by additionand multiplication�
type expression � Integer of int� Sum of expression � expression� Product of expression � expression��
and here is a recursive function to evaluate such expressions�
let rec eval �fun �Integer i� � i� �Sum�e��e��� � eval e� � eval e�� �Product�e��e��� � eval e� � eval e���
eval � expression � int � �fun eval �Product�Sum�Integer ��Integer ���Integer ������ � int � ��
��� TYPE DEFINITIONS ��
Such abstract syntax trees are a useful representation which allows all sorts ofmanipulations� Often the �rst step programming language compilers and relatedtools take is to translate the input text into an �abstract syntax tree accordingto the parsing rules� Note that conventions such as precedences and bracketingsare not needed once we have reached the level of abstract syntax� the tree struc�ture makes these explicit� We will use such techniques to write ML versions ofthe formal rules of lambda calculus that we were considering earlier� First� thefollowing type is used to represent lambda terms�
type term � Var of string� Const of string� Comb of term � term� Abs of string � term��
Type term defined�
Note that we make Abs take a variable name and a term rather than twoterms in order to prohibit the formation of terms that aren t valid� For example�we represent the term �x y� y x x y� by�
Abs��x��Abs��y��Comb�Var �y��Comb�Comb�Var �x��Var �x���Var �y�����
Here is the recursive de�nition of a function free in to decide whether avariable is free in a term�
let rec free�in x �fun �Var v� � x � v
� �Const c� � false� �Comb�s�t�� � free�in x s or free�in x t� �Abs�v�t�� � not x � v " free�in x t��
free�in � string � term � bool � �fun free�in �x� �Comb�Var �f��Var �x������ � bool � true free�in �x� �Abs��x��Comb�Var �x��Var �y������� � bool � false
Similarly� we can de�ne substitution by recursion� First� though� we needa function to rename variables to avoid name clashes� We want the ability toconvert an existing variable name to a new one that is not free in some givenexpression� The renaming is done by adding prime characters�
let rec variant x t �if free�in x t then variant �x����� telse x��
variant � string � term � string � �fun variant �x� �Comb�Var �f��Var �x������ � string � �x�� variant �x� �Abs��x��Comb�Var �x��Var �y������� � string � �x� variant �x� �Comb�Var �f��Comb�Var �x��Var �x�������� � string � �x���
� CHAPTER � FURTHER ML
Now we can de�ne substitution just as we did in abstract terms�
let rec subst u �t�x� �match u withVar y � if x � y then t else Var y
� Const c � Const c� Comb�s��s�� � Comb�subst s� �t�x��subst s� �t�x��� Abs�y�s� � if x � y then Abs�y�s�
else if free�in x s " free�in y t thenlet z � variant y �Comb�s�t�� inAbs�z�subst �subst s �Var y�z�� �t�x��
else Abs�y�subst s �t�x����subst � term � term � string � term � �fun
Note the very close parallels between the standard mathematical presentationof lambda calculus� as we presented earlier� and the ML version here� All we reallyneed to make the analogy complete is a means of reading and writing the termsin some more readable form� rather than by explicitly invoking the constructors�We will return to this issue later� and see how to add these features�
����� The subtlety of recursive types
A recursive type may contain nested instances of other type constructors� includ�ing the function space constructor� For example� consider the following�
type ��a�embedding � K of ��a�embedding��a��Type embedding defined�
If we stop to think about the underlying semantics� this looks disquieting�Consider for example the special case when �a is bool� We then have an injec�tive function K���bool�embedding��bool����bool�embedding� This directlycontradicts Cantor s theorem that the set of all subsets of X cannot be injectedinto X�� Hence we need to be more careful with the semantics of types� Infact � � � cannot be interpreted as the full function space� or recursive typeconstructions like the above are inconsistent� However� since all functions wecan actually create are computable� it is reasonable to restrict ourselves to com�putable functions only� With that restriction� a consistent semantics is possible�although the details are complicated�
The above de�nition also has interesting consequences for the type system�For example� we can now de�ne the Y combinator� by using K as a kind of typecast� Note that if the Ks are deleted� this is essentially the usual de�nition of
�Proof� consider C � fi�s j s � ��X and i�s �� sg� If i � ��X � X is injective we havei�C � C � i�C �� C a contradiction� This is similar to the Russell paradox and in factprobably inspired it� The analogy is even closer if we consider the equivalent form that thereis no surjection j � X � ��X and prove it by considering fs j s �� j�sg�
��� TYPE DEFINITIONS ��
the Y combinator in untyped lambda calculus� The use of let is only used forthe sake of e�ciency� but we do need the �redex involving z in order to preventlooping under ML s evaluation strategy�
let Y h �let g �K x� z � h �x �K x�� z ing �K g���
Y � ���a � �b� � �a � �b� � �a � �b � �fun let fact � Y �fun f n � if n � then � else n � f�n � �����fact � int � int � �fun fact ���� � int � ��
Thus� recursive types are a powerful addition to the language� and allow usto back o� the change of making the recursion operator a primitive�
Exercises
�� What goes wrong in our de�nition of subst if we replace
Var y �� if x � y then t else Var y
by these separate patterns�
Var x �� t # Var y �� Var y
�� It would be possible� though ine�cient� to de�ne natural numbers using thefollowing recursive type�
type num � Zero � Suc of num��
De�ne functions to perform arithmetic on natural numbers in this form�
�� Use the type de�nition for the syntax of lambda terms that we gave above�Write a function to convert an arbitrary lambda term into an equivalentin terms of the S� K and I combinators� based on the technique we gaveearlier� Represent the combinators by Const �S� etc�
�� Extend the syntax of lambda terms to a incorporate Church�style typing�Write functions that check the typing conditions before allowing terms tobe formed� but otherwise map down to the primitive constructors�
�� Consider a type of binary trees with strings at the nodes de�ned by�
type stree � Leaf � Branch of stree � string � stree��
� CHAPTER � FURTHER ML
Write a function strings to return a list of the strings occurring in the treein left�to�right order� i�e� at each node it appends the list resulting from theleft branch� the singleton list at the current node� and the list occurring atthe right branch� in that order� Write a function that takes a tree where thecorresponding list is alphabetically sorted and a new string� and returns atree with the new string inserted in the appropriate place�
�� �� Re�ne the above functions so that the tree remains almost balancedat each stage� i�e� the maximum depths occurring at the left and rightbranches di�er by at most one��
�� �� Can you characterize the types of expressions that can be written downin simply typed lambda calculus� without a recursion operator�� What isthe type of the following recursive function�
let rec f i � �i o f� �i o i���
Show how it can be used to write down ML expressions with completelypolymorphic type �� Can you write a terminating expression with thisproperty� What about an arbitrary function type �� ��
�See Adel�son�Vel�skii and Landis ���� and various algorithm books for more details ofthese �AVL trees� and similar methods for balanced trees�
�The answer lies in the �Curry�Howard isomorphism� � see Girard Lafont and Taylor����� for example�
Chapter �
Proving programs correct
As programmers know through bitter personal experience� it can be very di�cultto write a program that is correct� i�e� performs its intended function� Most largeprograms have bugs� The consequences of these bugs can vary widely� Somebugs are harmless� some merely irritating� Some are deadly� For example theprograms inside heart pacemakers� aircraft autopilots� car engine managementsystems and antilock braking systems� radiation therapy machines and nuclearreactor controllers are safety critical� An error in one of them can lead directly toloss of life� possibly on a large scale� As computers become ever more pervasivein society� the number of ways in which program bugs can kill people increases��
How can we make sure a program is correct� One very useful method is simplyto test it in a variety of situations� One tries the program out on inputs designedto exercise many di�erent parts of it and reveal possible bugs� But usually thereare too many possible situations to try them all exhaustively� so there may stillbe bugs lying undetected� As repeatedly emphasized by various writers� notablyDijkstra� program testing can be very useful for demonstrating the presence ofbugs� but it is only in a few unusual cases where it can demonstrate their absence�
Instead of testing a program� one can try to verify it mathematically� Aprogram in a su�ciently precisely de�ned programming language has a de�nitemathematical meaning� Similarly� the requirements on a program can be ex�pressed using mathematics and logic as a precise speci�cation� One can then tryto perform a mathematical proof that the program meets its speci�cation�
One can contrast testing and veri�cation of programs by considering the fol�lowing simple mathematical assertion�
�Nn��n �
NN " ��
�It is claimed that this holds for any N � We can certainly test it for any
number of particular values of N � This may lead to our intuitive con�dencein the formula� such that we can hardly believe it could ever fail� However in
�Similar remarks apply to hardware but we will focus on software�
�
� CHAPTER � PROVING PROGRAMS CORRECT
general a preponderance of numerical evidence can be deceptive� there are wellknown cases in number theory where things turned out to be false against theweight of many particular cases�� A more reliable procedure is to prove the aboveformula mathematically� This can easily be done using induction on N � In thesame way� we hope to replace testing a program on a range of inputs by somecompletely general proof that it always works correctly� It is important� however�to appreciate two limitations of program proofs�
� One has no guarantee that the execution of the program on the computercorresponds exactly to the abstract mathematical model� One must beon the lookout for discrepancies� caused perhaps by bugs in compilers andoperating systems� or physical defects in the underlying technology� This is�of course� a general problem in applications of mathematics to science� Forexample� assuming that arithmetic operations correspond exactly to theirmathematical counterparts is a natural simplifying assumption� just as onemight neglect air resistance in the analysis of a simple dynamical system�But both can be invalid in certain cases�
� The program is to be veri�ed by proof against a mathematical speci�cation�It is possible that this speci�cation does not capture accurately what isactually wanted in the real world� In fact� it can be remarkably di�cult toarrive at a precise mathematical version of informal requirements� There isevidence that many of the serious problems in computer systems are causednot by coding errors in the usual sense� but by a mistaken impression ofwhat is actually wanted� Just formalizing the requirements is often valuablein itself�
We can represent this situation by the following diagram�
Actual system
Mathematical model
Mathematical speci�cation
Actual requirements
�
�
�
We are trying to establish a link between the bottom and top boxes� i�e� theactual system and the actual requirements in real life� To do this� we proceed by
�For example Littlewood proved in ���� that ��n � li�n changes sign in�nitely often where ��n is the number of primes � n and li�n �
Rn
du�ln�u� This came as a surprise
since not a single sign change had been found despite extensive testing up to ����
��� FUNCTIONAL PROGRAMS AS MATHEMATICAL OBJECTS �
producing a mathematical version of each� It is only the central link� betweenthe mathematical model of the system and the mathematical version of the spec�i�cation� that is mathematically precise� The other links remain informal� Allwe can do is try to keep them small by using a realistic model of the system anda high�level mathematical speci�cation that is reasonably readable�
These reservations notwithstanding� program proving has considerable merit�Compared with testing� it establishes correctness once and for all� possibly for awhole class of programs at once e�g� controlled by the setting of some parameter��Moreover because it is a more principled analytical approach� the process ofproving a program or failing to do so� can lead to a deeper appreciation of theprogram and of the task in hand�
��� Functional programs as mathematical ob
jects
In the introduction� we remarked that pure� functional programs really corre�spond directly to functions in the mathematical sense� For this reason� it isoften suggested that they are more amenable to formal proof than imperativeprograms� Even if this is true� and many would dispute it� it is worth realizingthat the gap between this mathematical abstraction and the �nal execution inhardware is greater than for typical imperative languages� In particular� therecan be substantial storage requirements hidden in the eventual realization on thecomputer� So one may just be getting easier proofs because they prove less aboutwhat really matters� We won t settle this argument here� but we want to showthat reasoning about simple functional programs is often fairly straightforward�
The example at the end of the previous chapter showed that a naive associ�ation of function spaces of ML and function spaces of mathematics is not quiteaccurate� However� if we stay away from the higher reaches of recursive types andare not concerned with the space of all functions� only with some particular ones�we can safely identify the objects of ML with those of abstract mathematicalrealms��
We intend to model functional programs as mathematical functions� Giventhe equations � typically recursive � that de�ne an ML function� we want tointerpret them as �axioms about the corresponding mathematical objects� Forexample� we would assume from the de�nition of the factorial function�
let rec fact � fun � �� n � n � fact�n � ����
that fact�� � � and that for all n �� � we have factn� � n � factn� ��� This isright� but we need to tread carefully because of the question of termination� For
�We will neglect arithmetic over�ow whenever we use arithmetic� There is a facility in CAMLfor arbitrary precision arithmetic at the cost of in principle unlimited storage requirements�
� CHAPTER � PROVING PROGRAMS CORRECT
negative n� the program fails to terminate� so the second equation is true onlyin the vacuous sense that both sides are �unde�ned � It is much easier to reasonabout functions when we know they are total� so sometimes one uses a separateargument to establish termination� In the examples that follow� this is done inparallel with the correctness proof� i�e� we prove together that for each argument�the function terminates and the result satis�es our speci�cation�
As for the actual proof that the function obeys the speci�cation� this canbe a mathematical proof in the most general sense� However it often happensthat when functions are de�ned by recursion� properties of them� dually� canbe proved by induction� this includes termination� Moreover� the exact form ofinduction arithmetical� structural� wellfounded� usually corresponds to the modeof recursion used in the de�nition� This should become clearer when we have seena few examples�
��� Exponentiation
Recall our simple de�nition of exponentiation�
let rec exp x n �if n � then �else x � exp x �n � ����
We will prove the following property of exp�
Theorem ��� For all n � � and x� exp x n is de�ned and exp x n � xn�Proof� The ML function exp was de�ned by primitive� step�by�step� recursion�It is therefore no surprise that our proof proceeds by ordinary� step�by�step induc�tion� We prove that it holds for n � � and then that if it holds for any n � � italso holds for n " ��
�� If n � �� then by de�nition exp x n � �� By de�nition� for any integerx� we have x� � �� so the desired fact is established� Note that we assume�� � �� an example of how one must state the speci�cation precisely whatdoes xn mean in general� and how it might surprise some people�
�� Suppose that for n � � we have exp x n � xn� Since n � �� we haven " � �� � and so by de�nition exp x n " �� � x � exp x n " �� � ���Therefore�
exp x n " �� � x � exp x n " ��� ��
� x � exp x n
� x � xn
� xn�
��� GREATEST COMMON DIVISOR �
Q�E�D�
��� Greatest common divisor
Recall our function to calculate the greatest common divisor gcdm�n� of twonatural numbers m and n�
let rec gcd x y �if y � then x else gcd y �x mod y���
In fact� we claim this works for any integers� not just positive ones� However�one needs to understand the precise de�nition of gcd in that case� Let us de�nethe relation �ujv � or �u divides v for any two integers u and v to mean �v is anintegral multiple of u � i�e� there is some integer d with v � du� For example� �j���j��� ��j� but � � j�� � � j�� We will say that d is a greatest common divisor of xand y precisely if�
� djx and djy
� For any other integer d�� if d�jx and d�jy then d�jd�
Note that we say d�jd not d� � d� This belies the use of �greatest � but it is infact the standard de�nition in algebra books� Observe that any pair of numbersexcept � and �� has two gcds� since if d is a gcd of x and y� so is �d�
Our speci�cation is now� for any integers x and y� d � gcd x y is a gcd ofx and y� This is another good example of how providing a precise speci�cation�even for such a simple function� is harder than it looks� This speci�cation alsoexempli�es the common property that it does not completely specify the answer�merely places certain constraints on it� If we de�ned�
let rec ngcd x y � ��gcd x y���
then the function ngcd would satisfy the speci�cation just as well as gcd� Ofcourse� we are free to tighten the speci�cation if we wish� e�g� insist that if x andy are positive� so is the gcd returned�
The gcd function is not de�ned simply by primitive recursion� In fact� gcdx y is de�ned in terms of gcd y �x mod y� in the step case� Correspondingly�we do not use step�by�step induction� but wellfounded induction� We want awellfounded relation that decreases over recursive calls� this will guarantee ter�mination and act as a handle for proofs by wellfounded induction� In general�complicated relations on the arguments are required� But often it is easy toconcoct a measure that maps the arguments to natural numbers such that themeasure decreases over recursive calls� Such is the case here� the measure is jyj�
� CHAPTER � PROVING PROGRAMS CORRECT
Theorem ��� For any integers x and y� gcd x y terminates in a gcd of x and y�Proof� Take some arbitrary n� and suppose that for all x and y with jyj n wehave that gcd x y terminates with a gcd of x and y� and try to prove that thesame holds for all x and y with jyj � n� This suces for the main result� sinceany y will have some n with jyj � n� So� let us consider an x and y with jyj � n�There are two cases to consider� according to the case split used in the de�nition�
� Suppose y � �� Then gcd x y � x by de�nition� Now trivially xjx andxj�� so it is a common divisor� Suppose d is another common divisor� i�e�djx and dj�� Then indeed we have djx immediately� so x must be a greatestcommon divisor�
� Suppose y �� �� We want to apply the inductive hypothesis to gcd y x mod y��We will write r � x mod y for short� The basic property of the mod functionis that� since y �� �� for some integer q we have x � qy " r and jrj jyj�Since jrj jyj the inductive hypothesis tells us that d � gcd y x mod y� isa gcd of y and r� We just need to show that it is a gcd of x and y� It iscertainly a common divisor� since if djy and djr we have djx� as x � qy"r�Now suppose d�jx and d�jy� By the same equation� we �nd that d�jr� Thusd� is a common divisor of y and r� but then by the inductive hypothesis� d�jdas required�
Q�E�D�
Note that the basic property of the modulus operation that we used needscareful checking against the speci�cation of mod in the CAML manual� Thereare well known di�erences between languages� and between implementations ofthe same language� when moduli involving negative numbers are concerned� Ifever one is in doubt over the validity of a necessary assumption� it can be madeexplicit in the theorem� �if � � � then � � � �
��� Appending
We will now have an example of a function de�ned over lists� The functionappend is intended� not surprisingly� to append� or join together� two lists� Forexample� if we append ��� �� � and ��� �� � we get ��� �� �� �� �� � �
let rec append l� l� �match l� with ! � l�
� �h��t� � h���append t l����
This is de�ned by primitive recursion over the type of lists� It is de�nedon the empty list� and then for h �� t in terms of the value for t� Consequently�
��� REVERSING �
when proving theorems about it� it is natural to use the corresponding principle ofstructural induction for lists� if a property holds for the empty list� and wheneverit holds for t it holds for any h �� t� then it holds for any list� However this is notobligatory� and if preferred� we could proceed by mathematical induction on thelength of the list� We will aim at proving that the append operation is associative�
Theorem ��� For any three lists l�� l� and l� we have�
append l� append l� l�� � append append l� l�� l�
Proof� By structural induction on l�� we prove this holds for any l� and l��
� If l� � � then�
append l� append l� l�� � append � append l� l��
� append l� l�
� append append � l�� l�
� append append l� l�� l�
� Now suppose l� � h �� t� We may assume that for any l� and l� we have�
append t append l� l�� � append append t l�� l�
Therefore�
append l� append l� l�� � append h �� t� append l� l��
� h �� append t append l� l���
� h �� append append t l�� l��
� append h �� append t l��� l��
� append append h �� t� l�� l��
� append append l� l�� l��
Q�E�D�
�� Reversing
It is not di�cult to de�ne a function to reverse a list�
CHAPTER � PROVING PROGRAMS CORRECT
let rec rev �fun ! � !� �h��t� � append �rev t� h!��
rev � �a list � �a list � �fun rev �����!��� � int list � �� �� �!
We will prove that rev is an involution� i�e� that
revrev l� � l
However� if we try to tackle this directly by list induction� we �nd that weneed a couple of additional lemmas� We will prove these �rst�
Lemma ��� For any list l we have append l � � l�Proof� Structural induction on l�
� If l � � we have�
append l � � append � �
� �
� l
� Now suppose l � h �� t and we know that append t � � t� We �nd�
append l � � append h �� t� �
� h �� append t � �
� h �� t
� l
Q�E�D�
Lemma �� For any lists l� and l� we have
revappend l� l�� � append rev l�� rev l��
Proof� Structural induction on l��
� If l� � � we have�
revappend l� l�� � revappend � l��
� rev l�
� append rev l���
� append rev l�� rev � �
��� REVERSING �
� Now suppose l� � h �� t and we know that
revappend t l�� � append rev l�� rev t�
then we �nd�
revappend l� l�� � revappend h �� t� l��
� revh �� append t l���
� append revappend t l��� �h
� append append rev l�� rev t�� �h
� append rev l�� append rev t� �h �
� append rev l�� rev h �� t��
� append rev l�� rev l��
Q�E�D�
Theorem ��� For any list l we have revrev l� � l�Proof� Structural induction on l�
� If l � � we have�
revrev l� � revrev � �
� rev �
� �
� l
� Now suppose l � h �� t and we know that
revrev t� � t
then we �nd�
revrev l� � revrev h �� t��
� revappend rev t� �h �
� append rev �h � revrev t��
� append rev �h � t
� append rev h �� � �� t
�� CHAPTER � PROVING PROGRAMS CORRECT
� append append rev � � �h � t
� append append � �h � t
� append �h t
� append h �� � � t
� h �� append � t�
� h �� t
� l
Q�E�D�
There are lots of theorems relating the list operations that can be proved ina similar style� Generally� one proceeds by list induction� In subtler cases� onehas to split o� lemmas which are themselves proved inductively� and sometimesto generalize the theorem before proceeding by induction� There are plenty ofexamples in the exercises for readers to try their hand at�
Further reading
Neumann ����� catalogues the dangers arising from the use of computers insociety� including those arising from software bugs� A very readable discussionof the issues� with extensive discussion of some interesting examples� is given byPeterson ������ The general issue of software veri�cation was at one time con�troversial� with DeMillo� Lipton� and Perlis ����� among others arguing againstit� A cogent discussion is given by Barwise ����� There is now a large liter�ature in software veri�cation� though much of it is concerned with imperativeprograms� Many introductory functional programming books such as Paulson����� and Reade ���� give some basic examples like the ones here� It is alsoworth looking at the work of Boyer and Moore ����� on verifying properties ofpure LISP functions expressed inside their theorem prover� One of the largestreal proofs of the type we consider here is the veri�cation by Aagaard and Leeser����� of a boolean simpli�er used in VLSI logic synthesis�
Exercises
�� Prove the correctness of the more e�cient program we gave for performingexponentiation�
��� REVERSING ��
let square x � x � x�� let rec exp x n �if n � then �else if n mod � � then square�exp x �n � ���else x � square�exp x �n � �����
�� Recall the de�nition of length�
let rec length �fun ! �
� �h��t� � � � length t��
Prove that lengthrev l� � length l and that lengthappend l� l�� �length l� " length l��
�� De�ne the map function� which applies a function to each element of a list�as follows�
let rec map f �fun ! � !
� �h��t� � �f h����map f t���
Prove that if l �� � then hdmap f l� � fhd l�� and that map f rev l� �rev map f l�� Recall the de�nition of function composition�
let o f g � fun x � f�g x��� infix �o���
Prove that map f map g l� � map f � g� l�
�� McCarthy s ��� function can be de�ned by�
let rec f x � if x � then x � �else f�f�x � ������
Prove that for n � ���� we have fn� � ��� Pay careful attention toestablishing termination� Hint� a possible measure is ���� x��
�� CHAPTER � PROVING PROGRAMS CORRECT
�� The problem of the Dutch National Flag is to sort a list of �colours red�white and blue� so that the reds come �rst� then the whites and then theblues� However� the only method permitted is to swap adjacent elements�Here is an ML solution� The function dnf returns �true i� it has made achange� and this function is then repeated until no change occurs�
type colour � Red � White � Blue��
let rec dnf �fun ! � !�false� �White��Red��rest� � Red��White��rest�true� �Blue��Red��rest� � Red��Blue��rest�true� �Blue��White��rest� � White��Blue��rest�true� �x��rest� � let fl�ch � dnf rest in x��fl�ch��
let rec flag l �let l��changed � dnf l inif changed then flag l� else l���
For example�
flag White� Red� Blue� Blue� Red� White� White� Blue� Red!��� � colour list � Red� Red� Red� White� White� White� Blue� Blue� Blue!
Prove that the function flag always terminates with a correctly sorted list�
�� �� De�ne the following functions�
��� REVERSING ��
let rec sorted �fun ! � true
� h! � true� �h���h���t� � h� �� h� " sorted�h���t���
let rec filter p �fun ! � !
� �h��t� � let t� � filter p t inif p h then h��t� else t���
let sameprop p l� l� �length�filter p l�� � length�filter p l����
let rec permutes l� l� �fun ! � true
� �h��t� � sameprop �fun x � x � h� l� l� "permutes l� l� t��
let permuted l� l� � permutes l� l� l���
What do they all do� Implement a sorting function sort and prove thatfor all lists l one has sortedsort l� � true and permuted l sort l� � true�
Chapter
Eective ML
In this chapter� we discuss some of the techniques and tricks that ML program�mers can use to make programs more elegant and more e�cient� We then go onto discuss some additional imperative features that can be used when the purelyfunctional style seems inappropriate�
��� Useful combinators
The �exibility of higher order functions often means that one can write some veryuseful little functions that can be re�used for a variety of related tasks� These areoften called combinators� and not simply because they can be regarded as lambdaterms with no free variables� It often turns out that these functions are so �exiblethat practically anything can be implemented by plugging them together� ratherthan� say� explicitly making a recursive de�nition� In this way they correspond tothe original view of combinators as ubiquitous building blocks for mathematicalexpressions�
For example� a very useful combinator for list operations� often called �itlist or �fold � performs the following operation�
itlist f �x�� x�� � � � � xn b � f x� f x� f x� � � � f xn b����
A straightforward de�nition in ML is�
let rec itlist f �fun ! b � b� �h��t� b � f h �itlist f t b���
itlist � ��a � �b � �b� � �a list � �b � �b � �fun
Quite commonly� when de�ning a recursive function over lists� all one is doingis repeatedly applying some operator in this manner� By using itlist withthe appropriate argument� one can implement such functions very easily withoutexplicit use of recursion� A typical use is a function to add all the elements of alist of numbers�
��
���� USEFUL COMBINATORS ��
let sum l � itlist �fun x sum � x � sum� l ��sum � int list � int � �fun sum ���������!��� � int � �� sum !��� � int � sum �������!��� � int � �
Those especially keen on brevity might prefer to code sum as�
let sum l � itlist �prefix �� l ��
It is easy to modify this function to form a product rather than a sum�
let prod l � itlist �prefix �� l ���
Many useful list operations can be implemented in this way� For example hereis a function to �lter out only those elements of a list satisfying a predicate�
let filter p l � itlist �fun x s � if p x then x��s else s� l !��filter � ��a � bool� � �a list � �a list � �fun filter �fun x � x mod � � � ���������������!��� � int list � �� �� �!
Here are functions to �nd whether either all or some of the elements of a listsatisfy a predicate�
let forall p l � itlist �fun h a � p�h� " a� l true��forall � ��a � bool� � �a list � bool � �fun let exists p l � itlist �fun h a � p�h� or a� l false��exists � ��a � bool� � �a list � bool � �fun forall �fun x � x � �� ���!��� � bool � true forall �fun x � x � �� �����!��� � bool � false
and here are alternative versions of old favourites length� append and map�
let length l � itlist �fun x s � s � �� l ��length � �a list � int � �fun let append l m � itlist �fun h t � h��t� l m��append � �a list � �a list � �a list � �fun let map f l � itlist �fun x s � �f x���s� l !��map � ��a � �b� � �a list � �b list � �fun
Some of these functions can themselves become useful combinators� and so onupwards� For example� if we are interested in treating lists as sets� i�e� avoidingduplicate elements� then many of the standard set operations can be expressedvery simply in terms of the combinators above�
�� CHAPTER �� EFFECTIVE ML
let mem x l � exists �fun y � y � x� l��mem � �a � �a list � bool � �fun let insert x l �if mem x l then l else x��l��
insert � �a � �a list � �a list � �fun let union l� l� � itlist insert l� l���union � �a list � �a list � �a list � �fun let setify l � union l !��setify � �a list � �a list � �fun let Union l � itlist union l !��Union � �a list list � �a list � �fun let intersect l� l� � filter �fun x � mem x l�� l���intersect � �a list � �a list � �a list � �fun let subtract l� l� � filter �fun x � not mem x l�� l���subtract � �a list � �a list � �a list � �fun let subset l� l� � forall �fun t � mem t l�� l���subset � �a list � �a list � bool � �fun
The setify function is supposed to turn a list into a set by eliminating anyduplicate elements�
��� Writing e�cient code
Here we accumulate some common tricks of the trade� which can often make MLprograms substantially more e�cient� In order to justify some of them� we needto sketch in general terms how certain constructs are executed in hardware�
���� Tail recursion and accumulators
The principal control mechanism in functional programs is recursion� If we areinterested in e�cient programs� it behoves us to think a little about how recursionis implemented on conventional hardware� In fact� there is not� in this respectat least� much di�erence between the implementation of ML and many otherlanguages with dynamic variables� such as C�
If functions cannot be called recursively� then we are safe in storing their localvariables which includes the values of arguments� at a �xed place in memory� this is what FORTRAN does� However� this is not possible in general if thefunction can be called recursively� A call to a function f with one set of argumentsmay include within it a call to f with a di�erent set of arguments� The old oneswould be overwritten� even if the outer version of f needs to refer to them againafter the inner call has �nished� For example� consider the factorial function yetagain�
let rec fact n � if n � then �else n � fact�n � ����
���� WRITING EFFICIENT CODE ��
A call to fact � causes another call to fact � and beyond�� but when thiscall is �nished and the value of fact � is obtained� we still need the original valueof n� namely �� in order to do the multiplication yielding the �nal result� Whatnormally happens in implementations is that each function call is allocated a newframe on a stack� Every new function call moves the stack pointer further down�
the stack� creating space for new variables� When the function call is �nishedthe stack pointer moves up and so the unwanted inner variables are discardedautomatically� A diagram may make this clearer�
SP � n � �
n � �
n � �
n � �
n � �
n � �
n � �
This is an imagined snapshot of the stack during execution of the innermostrecursive call� i�e� fact �� All the local variables for the upper stages are stackedup above� with each instance of the function having its own stack frame� andwhen the calls are �nished the stack pointer SP moves back up�
Therefore� our implementation of fact requires n stack frames when appliedto argument n� By contrast� consider the following implementation of the factorialfunction�
let rec tfact x n �if n � then xelse tfact �x � n� �n � ����
tfact � int � int � int � �fun let fact n � tfact � n��fact � int � int � �fun fact ���� � int � ��
Although tfact is also recursive� the recursive call is the whole expression�it does not occur as a proper subexpression of some other expression involvingvalues of variables� Such a call is said to be a tail call because it is the very lastthing the calling function does�� and a function where all recursive calls are tailcalls is said to be tail recursive�
�Despite the name stacks conventionally grow downwards�
� CHAPTER �� EFFECTIVE ML
What is signi�cant about tail calls� When making a recursive call to tfact�there is no need to preserve the old values of the local variables� Exactly the same��xed� area of storage can be used� This of course depends on the compiler s beingintelligent enough to recognize the fact� but most compilers� including CAMLLight� are� Consequently� re�coding a function so that the recursive core of it istail recursive can dramatically cut down the use of storage� For functions like thefactorial� it is hardly likely that they will be called with large enough values ofn to make the stack over�ow� However the naive implementations of many listfunctions can cause such an e�ect when the argument lists are long�
The additional argument x of the tfact function is called an accumulator�because it accumulates the result as the recursive calls rack up� and is thenreturned at the end� Working in this way� rather than modifying the return valueon the way back up� is a common way of making functions tail recursive�
We have remarked that a �xed area of storage can be used for the argumentsto a tail recursive function� On this view� one can look at a tail recursive functionas a thinly�veiled imperative implementation� There is an obvious parallel withour C implementation of the factorial as an iterative function�
int fact�int n�
� int x �
while �n � ��
� x � x n
n � n �
�
return x
�
The initialization x � corresponds to our setting of x to � by an outerwrapper function fact� The central while loop corresponds to the recursivecalls� the only di�erence being that the arguments to the tail recursive functionmake explicit that part of the state we are interested in assigning to� Ratherthan assigning and looping� we make a recursive call with the variables updated�Using similar tricks and making the state explicit� one can easily write essentiallyimperative code in an ostensibly functional style� with the knowledge that understandard compiler optimizations� the e�ect inside the machine will� in fact� bemuch the same�
���� Minimizing consing
We have already considered the use of stack space� But various constructs infunctional programs use another kind of store� usually allocated from an areacalled the heap� Whereas the stack grows and shrinks in a sequential mannerbased on the �ow of control between functions� other storage used by the ML
���� WRITING EFFICIENT CODE ��
system cannot be reclaimed in such a simple way� Instead� the runtime systemoccasionally needs to check which bits of allocated memory aren t being usedany more� and reclaim them for future applications� a process known as garbagecollection� A particularly important example is the space used by constructors forrecursive types� e�g� ��� For example� when the following fragment is executed�
let l � ��� ! in tl l��
a new block of memory� called a �cons cell � is allocated to store the instance ofthe �� constructor� Typically this might be three words of storage� one being anidenti�er for the constructor� and the other two being pointers to the head andtail of the list� Now in general� it is di�cult to decide when this memory can bereclaimed� In the above example� we immediately select the tail of the list� so it isclear that the cons cell can be recycled immediately� But in general this can t bedecided by looking at the program� since l might be passed to various functionsthat may or may not just look at the components of the list� Instead� one needs toanalyze the memory usage dynamically and perform garbage collection of what isno longer needed� Otherwise one would eventually run out of storage even whenonly a small amount is ever needed simultaneously�
Implementors of functional languages work hard on making garbage collectione�cient� Some claim that automatic memory allocation and garbage collectionoften works out faster than typical uses of explicit memory allocation in languageslike C malloc etc�� While we wouldn t go that far� it is certainly very convenientthat memory allocation is always done automatically� It avoids a lot of tediousand notoriously error�prone parts of programming�
Many constructs beloved of functional programmers use storage that needs tobe reclaimed by garbage collection� While worrying too much about this wouldcripple the style of functional programs� there are some simple measures that canbe taken to avoid gratuitous consing creation of cons cells�� One very simplerule of thumb is to avoid using append if possible� As can be seen by consideringthe way the recursive calls unroll according to the de�nition
let rec append l� l� �match l� with
! � l�� �h��t� � h���append t l����
this typically generates n cons cells where n is the length of the �rst argument list�There are often ways of avoiding appending� such as adding extra accumulatorarguments to functions that can be augmented by direct use of consing� A strikingexample is the list reversal function� which we coded earlier as�
let rec rev �fun ! � !
� �h��t� � append �rev t� h!��
��� CHAPTER �� EFFECTIVE ML
This typically generates about n��� cons cells� where n is the length of thelist� The following alternative� using an accumulator� only generates n of them�
let rev �let rec reverse acc �fun ! � acc� �h��t� � reverse �h��acc� t in
reverse !��
Moreover� the recursive core reverse is tail recursive� so we also save stackspace� and win twice over�
For another typical situation where we can avoid appending by judicious useof accumulators� consider the problem of returning the fringe of a binary tree� i�e�a list of the leaves in left�to�right order� If we de�ne the type of binary trees as�
type btree � Leaf of string� Branch of btree � btree��
then a simple coding is the following
let rec fringe �fun �Leaf s� � s!� �Branch�l�r�� � append �fringe l� �fringe r���
However the following more re�ned version performs fewer conses�
let fringe �let rec fr t acc �match t with�Leaf s� � s��acc
� �Branch�l�r�� � fr l �fr r acc� infun t � fr t !��
Note that we have written the accumulator as the second argument� so thatthe recursive call has a more natural left�to�right reading� Here is a simple ex�ample of how either version of fringe may be used�
fringe �Branch�Branch�Leaf �a��Leaf �b���Branch�Leaf �c��Leaf �d������
� � string list � �a�� �b�� �c�� �d�!
The �rst version creates � cons cells� the second only �� On larger trees thee�ect can be more dramatic� Another situation where gratuitous consing cancrop up is in pattern matching� For example� consider the code fragment�
fun ! � !� �h��t� � if h � then t else h��t��
���� WRITING EFFICIENT CODE ���
The �else arm creates a cons cell even though what it constructs was in factthe argument to the function� That is� it is taking the argument apart and thenrebuilding it� One simple way of avoiding this is to recode the function as�
fun l �match l with ! � !
� �h��t� � if h � then t else l��
However ML o�ers a more �exible alternative� using the as keyword� a namemay be identi�ed with certain components of the pattern� so that it never needsto be rebuilt� For example�
fun ! � !� �h��t as l� � if h � then t else l��
���� Forcing evaluation
We have emphasized that� since ML does not evaluate underneath lambdas� onecan use lambdas to delay evaluation� We will see some interesting exampleslater� Conversely� however� it can happen that one wants to force evaluation ofexpressions that are hidden underneath lambdas� For example� recall the tailrecursive factorial above�
let rec tfact x n �if n � then xelse tfact �x � n� �n � ����
let fact n � tfact � n��
Since we never really want to use tfact directly� it seems a pity to bind it toa name� Instead� we can make it local to the factorial function�
let fact� n �let rec tfact x n �if n � then xelse tfact �x � n� �n � �� in
tfact � n��
This� however� has the defect that the local recursive de�nition is only eval�uated after fact receives its argument� since before that it is hidden under alambda� Moreover� it is then reevaluated each time fact is called� We can changethis as follows
let fact� �let rec tfact x n �if n � then xelse tfact �x � n� �n � �� in
tfact ���
��� CHAPTER �� EFFECTIVE ML
Now the local binding is only evaluated once� at the point of declaration offact�� According to our tests� the second version of fact is about ��* fasterwhen called on the argument �� The additional evaluation doesn t amount tomuch in this case� more or less just unravelling a recursive de�nition� yet thespeedup is signi�cant� In instances where there is a lot of computation involvedin evaluating the local binding� the di�erence can be spectacular� In fact� thereis a sophisticated research �eld of �partial evaluation devoted to performing op�timizations like this� and much more sophisticated ones besides� automatically�In a sense� it is a generalization of standard compiler optimizations for ordinarylanguages such as �constant folding � In production ML systems� however� it isnormally the responsibility of the user to force it� as here�
We might note� in passing� that if functions are implemented by pluggingtogether combinators� with fewer explicit lambdas� there is more chance that asmuch of the expression as possible will be evaluated at declaration time� To take atrivial example� f �g will perform any evaluation of f and g that may be possible�whereas �x� fg x� will perform none at all until it receives its argument� On theother side of the coin� when we actually want to delay evaluation� we really needlambdas� so a purely combinatory version is impossible�
��� Imperative features
ML has a fairly full complement of imperative features� We will not spend muchtime on the imperative style of programming� since that is not the focus of thiscourse� and we assume readers already have su�cient experience� Therefore� wetreat these topics fairly quickly with few illustrative examples� However someimperative features are used in later examples� and some knowledge of what isavailable will stand the reader in good stead for writing practical ML code�
���� Exceptions
We have seen on occasion that certain evaluations fail� e�g� through a failure inpattern matching� There are other reasons for failure� e�g� attempts to divide byzero�
� � ��Uncaught exception� Division�by�zero
In all these cases the compiler complains about an �uncaught exception � Anexception is a kind of error indication� but they need not always be propagatedto the top level� There is a type exn of exceptions� which is e�ectively a recursivetype� though it is usually recursive only vacuously� Unlike with ordinary types�one can add new constructors for the type exn at any point in the program viaan exception declaration� e�g�
���� IMPERATIVE FEATURES ���
exception Died��Exception Died defined� exception Failed of string��Exception Failed defined�
While certain built�in operations generate one usually says raise� exceptions�this can also be done explicitly using the raise construct� e�g�
raise �Failed �I don�t know why����Uncaught exception� Failed �I don�t know why�
For example� we might invent our own exception to cover the case of takingthe head of an empty list�
exception Head�of�empty��Exception Head�of�empty defined� let hd � fun ! � raise Head�of�empty
� �h��t� � h��hd � �a list � �a � �fun hd !��Uncaught exception� Head�of�empty
Normally exceptions propagate out to the top� but they can be �caught insidean outer expression by using try ���with followed by a series of patterns tomatch exceptions� e�g�
let headstring sl �try hd slwith Head�of�empty � ��
� Failed s � �Failure because ��s��headstring � string list � string � �fun headstring �hi�� �there�!��� � string � �hi� headstring !��� � string � ��
It is a matter of opinion whether exceptions are really an imperative feature�On one view� functions just return elements of a disjoint sum consisting of theirvisible return type and the type of exceptions� and all operations implicitly passback exceptions� Another view is that exceptions are a highly non�local control�ow perversion� analogous to goto�� Whatever the semantic view one takes�exceptions can often be quite useful�
�Perhaps more precisely to C�s setjmp and longjmp�
��� CHAPTER �� EFFECTIVE ML
���� References and arrays
ML does have real assignable variables� and expressions can� as a side�e�ect�modify the values of these variables� They are explicitly accessed via referencespointers in C parlance� and the references themselves behave more like ordinaryML values� Actually this approach is quite common in C too� For example� if onewants so�called �variable parameters in C� where changes to the local variablespropagate outside� the only way to do it is to pass a pointer� so that the functioncan dereference it� Similar techniques are often used where the function is to passback composite data�
In ML� one sets up a new assignable memory cell with the initial contents x bywriting ref x� Initialization is compulsory�� This expression yields a referencepointer� to the cell� Subsequent access to the contents of the cell requires anexplicit dereference using the $ operator� similar to unary in C� The cell isassigned to using a conventional�looking assignment statement� For example�
let x � ref ���x � int ref � ref � #x��� � int � � x �� ���� � unit � �� #x��� � int � � x �� #x � #x��� � unit � �� x��� � int ref � ref � #x��� � int � �
Note that in most respects ref behaves like a type constructor� so one canpattern�match against it� Thus one could actually de�ne an indirection operatorlike $�
let contents�of �ref x� � x��contents�of � �a ref � �a � �fun contents�of x��� � int � �
As well as being mutable� references are sometimes useful for creating ex�plicitly shared data structures� One can easily create graph structures wherenumerous nodes contain a pointer to some single subgraph�
Apart from single cells� one can also use arrays in ML� In CAML these arecalled vectors� An array of elements of type � has type � vect� A fresh vectorof size n� with each element initialized to x � once again the initialization iscompulsory � is created using the following call�
���� IMPERATIVE FEATURES ���
make�vect n x��
One can then read element m of a vector v using�
vect�item v m��
and write value y to element m of v using�
vect�assign v m y��
These operations correspond to the expressions v�m� and v�m� � y in C� Theelements of an array are numbered from zero� For example�
let v � make�vect � ��v � int vect � �� � � � �! vect�item v ���� � int � vect�assign v � ���� � unit � �� v��� � int vect � �� �� � � �! vect�item v ���� � int � �
All reading and writing is constrained by bounds checking� e�g�
vect�item v ���Uncaught exception� Invalid�argument �vect�item�
���� Sequencing
There is no need for an explicit sequencing operation in ML� since the normalrules of evaluation allow one to impose an order� For example one can do�
let � � x �� #x � � inlet � � x �� #x � � inlet � � x �� #x � � inlet � � x �� #x � � in����
and the expressions are evaluated in the expected order� Here we use a specialpattern which throws away the value� but we could use a dummy variable nameinstead� Nevertheless� it is more attractive to use the conventional notation forsequencing� and this is possible in ML by using a single semicolon�
x �� #x � ��x �� #x � ��x �� #x � ��x �� #x � ���
��� CHAPTER �� EFFECTIVE ML
���� Interaction with the type system
While polymorphism works very well for the pure functional core of ML� it hasunfortunate interactions with some imperative features� For example� considerthe following�
let l � ref !��
Then l would seem to have polymorphic type � list ref � In accordance withthe usual rules of let�polymorphism we should be able to use it with two di�erenttypes� e�g� �rst
l �� �!��
and then
hd�#l� � true��
But this isn t reasonable� because we would actually be writing something asan object of type int then reading it as an object of type bool� Consequently�some restriction on the usual rule of let polymorphism is called for where ref�erences are concerned� There have been many attempts to arrive at a soundbut convenient restriction of the ML type system� some of them very compli�cated� Recently� di�erent versions of ML seem to be converging on a relativelysimple method� called the value restriction� due to Wright ������ and CAMLimplements this restriction� with a twist regarding toplevel bindings� Indeed� theabove sequence fails� But the intermediate behaviour is interesting� If we look atthe �rst line we see�
let l � ref !��l � ��a list ref � ref !
The underscore on the type variable indicates that l is not polymorphic inthe usual sense� rather� it has a single �xed type� although that type is as yetundetermined� The second line works �ne�
l �� �!��� � unit � ��
but if we now look at the type of l� we see that�
l��� � int list ref � ref �!
The pseudo�polymorphic type has now been �xed� Granted this� it is clearthat the last line must fail�
���� IMPERATIVE FEATURES ���
hd�#l� � true��Toplevel input�hd�#l� � true�� ����This expression has type bool�but is used with type int�
So far� this seems quite reasonable� but we haven t yet explained why the sameunderscored type variables oocur in apparently quite innocent purely functionalexpressions� and why� moreover� they often disappear on eta�expansion� e�g�
let I x � x��I � �a � �a � �fun I o I��it � ��a � ��a � �fun let I� � I o I in fun x � I� x��� � ��a � ��a � �fun fun x � �I o I� x��it � �a � �a � �fun
Other techniques for polymorphic references often rely on encoding in thetypes the fact that an expression may involve references� This seems natural�but it can lead to the types of functions becoming cluttered with this specialinformation� It is unattractive that the particular implementation of the function�e�g� imperative or functional� should be re�ected in its type�
Wright s solution� on the other hand� uses just the basic syntax of the ex�pression being let�bound� insisting that it is a so�called value before generalizingthe type� What is really wanted is knowledge of whether the expression maycause side�e�ects when evaluated� However since this is undecidable in general�the simple syntactic criterion of its being or not being a value is used� Roughlyspeaking� an expression is a value if it admits no further evaluation accordingto the ML rules � this is why an expression can often be made into a valueby performing a reverse eta conversion� Unfortunately this works against thetechniques for forcing evaluation�
Exercises
�� De�ne the C combinator as�
let C f x y � f y x��
What does the following function do�
�� CHAPTER �� EFFECTIVE ML
fun f l� l� � itlist �union o C map l� o f� l� !��
�� What does this function do� Write a more e�cient version�
let rec upto n � if n � � then ! else append �upto �n���� n!��
�� De�ne a function to calculate Fibonacci numbers as follows�
let rec fib �fun � � � � �� n � fib�n � �� � fib�n � ����
Why is this very ine�cient� Write a similar but better version�
�� �� Can you think of any uses for this exception or similar recursive ones�
exception Recurse of exn��
�� Implement a simple version of quicksort using arrays� The array is �rstpartitioned about some pivot� then two recursive calls sort the left andright portions� Which recursive calls are tail calls� What is the worst�casespace usage� How can this be improved dramatically by a simple change�
�� Prove that the two versions of rev that we have quoted� ine�cient ande�cient� always give the same answer�
Chapter �
Examples
As we have said� ML was originally designed as a metalanguage for a computertheorem prover� However it is suitable for plenty of other applications� mostlyfrom the same general �eld of �symbolic computation � In this chapter we will givesome examples of characteristic uses of ML� It is not necessary that the readershould understand every detail of the particular examples� e�g� the analyses ofreal number approximations given below� However it is worth trying out theseprograms and similar examples for yourself� and doing some of the exercises�There is no better way of getting a feel for how ML is actually used�
��� Symbolic di�erentiation
The phrase �symbolic computation is a bit vague� but roughly speaking it coversapplications where manipulation of mathematical expressions� in general con�taining variables� is emphasized at the expense of actual numerical calculation�There are several successful �computer algebra systems such as Axiom� Mapleand Mathematica� which can do certain symbolic operations that are useful inmathematics� e�g� factorizing polynomials and di�erentiating and integrating ex�pressions� They are also capable of purely numerical work�� We will illustratehow this sort of application can be tackled using ML�
We use symbolic di�erentiation as our example� because there is an algorithmto do it which is fairly simple and well known� The reader is probably familiarwith certain derivatives of basic operations� e�g d
dxsinx� � cosx�� as well as
results such as the Chain Rule and Product Rule for calculating the derivativesof composite expressions in terms of the derivatives of the components� Justas one can use these systematically in order to �nd derivatives� it is possible toprogram a fairly straightforward algorithm to do it in ML�
���
��� CHAPTER �� EXAMPLES
���� First order terms
We will allow mathematical expressions in a fairly general form� They may bebuilt up from variables and constants by the application of operators� Theseoperators may be unary� binary� ternary� or in general n�ary� We represent thisusing the following recursive datatype�
type term � Var of string� Const of string� Fn of string � �term list���
Type term defined�
For example� the expression�
sinx " y��cosx� expy��� ln� " x�
is represented by�
Fn����� Fn����� Fn��sin�� Fn����� Var �x�� Var �y�!�!��Fn��cos�� Fn����� Var �x�� Fn��exp�� Var �y�!�!�!�!��
Fn��ln�� Fn����� Const ���� Var �x�!�!�!���
���� Printing
Reading and writing expressions in their raw form is rather unpleasant� This is ageneral problem in all symbolic computation systems� and typically the system sinterface contains a parser and prettyprinter to allow respectively input and out�put in a readable form� A detailed discussion of parsing is postponed� since itis worth a section of its own� but we will now write a very simple printer forexpressions� so that at least we can see what s going on� We want the printer tosupport ordinary human conventions� i�e�
� Variables and constants are just written as their names�
� Ordinary n�ary functions applied to arguments are written by juxtaposingthe function and a bracketed list of arguments� e�g� fx�� � � � � xn��
� In�x binary functions like " are written in between their arguments�
� Brackets are used where necessary for disambiguation�
� In�x operators have a notion of precedence to reduce the need for brackets�
First we declare a list of in�xes� which is a list of pairs� each operator nametogether with its precedence�
let infixes � ������ ������ ������ �����!��
���� SYMBOLIC DIFFERENTIATION ���
It is common to use lists in this way to represent �nite partial functions� sinceit allows more �exibility� They are commonly called association lists� and arevery common in functional programming�� In order to convert the list into apartial function we use assoc�
let rec assoc a ��x�y���rest� � if a � x then y else assoc a rest��Toplevel input�let rec assoc a ��x�y���rest� � if a � x then y else assoc a rest�� ���������������������������������������������������������Warning� this matching is not exhaustive�assoc � �a � ��a � �b� list � �b � �fun
The compiler warns us that if the appropriate data is not found in the list�the function will fail� Now we can de�ne a function to get the in�x precedenceof a function�
let get�precedence s � assoc s infixes��get�precedence � string � int � �fun get�precedence ������ � int � � get�precedence ������ � int � � get�precedence �$���Uncaught exception� Match�failure ���� ����� ����
Note that if we ever change this list of in�xes� then any functions such asget precedence that use it need to be redeclared� This is the main reason whymany LISP programmers prefer dynamic binding� However we can make the setof in�xes arbitrarily extensible by making it into a reference and dereferencing iton each application�
let infixes � ref ������ ������ ������ �����!��infixes � �string � int� list ref �
ref ���� �� ���� �� ���� �� ���� �! let get�precedence s � assoc s �#infixes���get�precedence � string � int � �fun get�precedence �����Uncaught exception� Match�failure ���� ����� ���� infixes �� ����������#infixes���� � unit � �� get�precedence ������ � int � �
�For more heavyweight applications some alternative such as hash tables is much better but for simple applications where the lists don�t get too long this kind of linear list is simpleand adequate�
��� CHAPTER �� EXAMPLES
Note that by the ML evaluation rules� the dereferencing is only applied afterthe function get precedence receives its argument� and hence results in theset of in�xes at that time� We can also de�ne a function to decide whether afunction has any data associated with it� One way is simply to try the functionget precedence and see if it works�
let is�infix s �try get�precedence s� truewith � � false��
An alternative is to use a general function can that �nds out whether anotherfunction succeeds�
let can f x � try f x� true with � � false��can � ��a � �b� � �a � bool � �fun let is�infix � can get�precedence��is�infix � string � bool � �fun
We will use the following functions that we have already considered�
let hd�h��t� � h�� let tl�h��t� � t�� let rec length l � if l � ! then else � � length�tl l���
Without further ado� here is a function to convert a term into a string�
let rec string�of�term prec �fun �Var s� � s
� �Const c� � c� �Fn�f�args�� �
if length args � � " is�infix f thenlet prec� � get�precedence f inlet s� � string�of�term prec� �hd args�and s� � string�of�term prec� �hd�tl args�� inlet ss � s��� ��f�� ��s� inif prec� �� prec then ����ss���� else ss
elsef������string�of�terms args�����
and string�of�terms t �match t with
! � ��� t! � string�of�term t� �h��t� � �string�of�term h�������string�of�terms t���
The �rst argument prec indicates the precedence level of the operator of whichthe currently considered expression is an immediate subterm� Now if the currentexpression has an in�x binary operator at the top level� parentheses are needed
���� SYMBOLIC DIFFERENTIATION ���
if the precedence of this operator is lower� e�g� if we are considering the righthand subexpression of x � y " z�� Actually we use parentheses even when theyare equal� in order to make groupings clear� In the case of associative operatorslike " this doesn t matter� but we must distinguish x� y � z� and x� y�� z�A more sophisticated scheme would give each operator an associativity�� Thesecond� mutually recursive� function string of terms is used to print a comma�separated list of terms� as they occur for non�in�x function applications of theform ft�� � � � � tn�� Let us see this in action�
let t �Fn����� Fn����� Fn��sin�� Fn����� Var �x�� Var �y�!�!��
Fn��cos�� Fn����� Var �x��Fn��exp�� Var �y�!�!�!�!��
Fn��ln�� Fn����� Const ���� Var �x�!�!�!���t � term �Fn����� Fn
����� Fn ��sin�� Fn ����� Var �x�� Var �y�!�!��Fn ��cos�� Fn ����� Var �x�� Fn ��exp�� Var �y�!�!�!�!��
Fn ��ln�� Fn ����� Const ���� Var �x�!�!�!� string�of�term t��� � string � �sin�x � y� � cos�x � exp�y�� � ln�� � x��
In fact� we don t need to convert a term to a string ourselves� CAML Lightallows us to set up our own printer so that it is used to print anything of typeterm in the standard read�eval�print loop� This needs a few special commandsto ensure that we interact properly with the toplevel loop�
open �format��� let print�term s �
open�hvbox �print�string��%���string�of�term s���%���close�box����
print�term � term � unit � �fun install�printer �print�term���� � unit � ��
Now compare the e�ect�
let t �Fn����� Fn����� Fn��sin�� Fn����� Var �x�� Var �y�!�!��
Fn��cos�� Fn����� Var �x��Fn��exp�� Var �y�!�!�!�!��
Fn��ln�� Fn����� Const ���� Var �x�!�!�!���t � term � %sin�x � y� � cos�x � exp�y�� � ln�� � x�% let x � tx � term � %sin�x � y� � cos�x � exp�y�� � ln�� � x�%
��� CHAPTER �� EXAMPLES
Once the new printer is installed� it is used whenever CAML wants to printsomething of type term� even if it is a composite datastructure such as a pair�
�x�t���� � term � term �%sin�x � y� � cos�x � exp�y�� � ln�� � x�%�%sin�x � y� � cos�x � exp�y�� � ln�� � x�%
or a list�
x� t� x!��� � term list � %sin�x � y� � cos�x � exp�y�� � ln�� � x�%�%sin�x � y� � cos�x � exp�y�� � ln�� � x�%�%sin�x � y� � cos�x � exp�y�� � ln�� � x�%!
This printer is rather crude in that it will not break up large expressionsintelligently across lines� The CAML library format� from which we used a fewfunctions above� provides for a better approach� Instead of converting the terminto a string then printing it in one piece� we can make separate printing calls fordi�erent parts of it� and intersperse these with some special function calls thathelp the printing mechanism to understand where to break the lines� Many ofthe principles used in this and similar �prettyprinting engines are due to Oppen����� We will not consider this in detail here � see the CAML manual fordetails��
���� Derivatives
Now it is a fairly simple matter to de�ne the derivative function� First let usrecall the systematic method we are taught in school�
� If the expression is one of the standard functions applied to an argumentthat is the variable of di�erentiation� e�g� sinx�� return the known deriva�tive�
� If the expression is of the form fx� " gx� then apply the rule for sums�returning f �x� " g�x�� Likewise for subtraction etc�
� If the expression is of the form fx� � gx� then apply the product rule� i�e�return f �x� � gx� " fx� � g�x��
� If the expression is one of the standard functions applied to a compositeargument� say fgx�� then apply the Chain Rule and so give g�x��f �gx��
�There is also a tutorial written by Pierre Weis available online as part of the CAML page��http���pauillac�inria�fr�caml�FAQ�format�eng�html��
���� SYMBOLIC DIFFERENTIATION ���
This is nothing but a recursive algorithm� though that terminology is seldomused in schools� We can program it in ML very directly�
let rec differentiate x tm � match tm withVar y � if y � x then Const ��� else Const ��
� Const c � Const ��� Fn����� t!� � Fn����� differentiate x t!�� Fn����� t��t�!� � Fn����� differentiate x t��
differentiate x t�!�� Fn����� t��t�!� � Fn����� differentiate x t��
differentiate x t�!�� Fn����� t��t�!� �
Fn����� Fn����� differentiate x t�� t�!��Fn����� t�� differentiate x t�!�!�
� Fn��inv�� t!� � chain x t�Fn����� Fn��inv�� Fn����� t�Const ���!�!�!��
� Fn����� t�n!� � chain x t�Fn����� n� Fn����� t� Fn����� n� Const ���!�!�!��
� Fn��exp�� t!� � chain x t tm� Fn��ln�� t!� � chain x t �Fn��inv�� t!��� Fn��sin�� t!� � chain x t �Fn��cos�� t!��� Fn��cos�� t!� � chain x t
�Fn����� Fn��sin�� t!�!��� Fn����� t��t�!� � differentiate x
�Fn����� t�� Fn��inv�� t�!�!��� Fn��tan�� t!� � differentiate x
�Fn����� Fn��sin�� t!�� Fn��cos�� t!�!��and chain x t u � Fn����� differentiate x t� u!���
The chain function just avoids repeating the chain rule construction for manyoperations� Apart from that� we just systematically apply rules for sums� prod�ucts etc� and the known derivatives of standard functions� Of course� we couldadd other functions such as the hyperbolic functions and inverse trigonometricfunctions if desired� A couple of cases� namely division and tan� avoid a compli�cated de�nition by applying the di�erentiator to an alternative formulation�
���� Simpli�cation
If we try the di�erentiator out on our running example� then it works quite well�
��� CHAPTER �� EXAMPLES
t��� � term � %sin�x � y� � cos�x � exp�y�� � ln�� � x�% differentiate �x� t��� � term �%���� � � � cos�x � y�� � inv�cos�x � exp�y��� �sin�x � y� � ���� � � exp�y�� � ��sin�x � exp�y���� ���inv�cos�x � exp�y�� � ����� � � � �� � inv�� � x�%
differentiate �y� t��� � term �%��� � �� � cos�x � y�� � inv�cos�x � exp�y��� �sin�x � y� � ��� � � � exp�y�� � ��sin�x � exp�y���� ���inv�cos�x � exp�y�� � ����� � � � � � inv�� � x�%
differentiate �z� t��� � term �%��� � � � cos�x � y�� � inv�cos�x � exp�y��� �sin�x � y� � ��� � � exp�y�� � ��sin�x � exp�y���� ���inv�cos�x � exp�y�� � ����� � � � � � inv�� � x�%
However� it fails to make various obvious simpli�cations such as � � x � �and x " � � x� Some of these redundant expressions are created by the ratherunsubtle di�erentiation strategy which� for instance� applies the chain rule to ft�even where t is just the variable of di�erentiation� However� some would be hardto avoid without making the di�erentiator much more complicated� Accordingly�let us de�ne a separate simpli�cation function that gets rid of some of theseunnecessary expressions�
let simp �fun �Fn����� Const ��� t!�� � t� �Fn����� t� Const ��!�� � t� �Fn����� t� Const ��!�� � t� �Fn����� Const ��� t!�� � Fn����� t!�� �Fn����� t�� Fn����� t�!�!�� � Fn����� t�� t�!�� �Fn����� Const ��� t!�� � Const ��� �Fn����� t� Const ��!�� � Const ��� �Fn����� Const ���� t!�� � t� �Fn����� t� Const ���!�� � t� �Fn����� Fn����� t�!�� Fn����� t�!�!�� � Fn����� t�� t�!�� �Fn����� Fn����� t�!�� t�!�� � Fn����� Fn����� t�� t�!�!�� �Fn����� t�� Fn����� t�!�!�� � Fn����� Fn����� t�� t�!�!�� �Fn����� Fn����� t!�!�� � t� t � t��
This just applies the simpli�cation at the top level of the term� We need toexecute it in a bottom�up sweep� This will also bring negations upwards throughproducts� In some cases it would be more e�cient to simplify in a top�downsweep� e�g� � � t for a complicated expression t� but this would need repeatingfor terms like � " �� � �� The choice is rather like that between normal andapplicative order reduction in lambda calculus�
���� SYMBOLIC DIFFERENTIATION ���
let rec dsimp �fun �Fn�fn�args�� � simp�Fn�fn�map dsimp args��
� t � simp t��
Now we get better results�
dsimp�differentiate �x� t���� � term �%�cos�x � y� � inv�cos�x � exp�y��� �sin�x � y� � �sin�x � exp�y�� �inv�cos�x � exp�y�� � ���� � inv�� � x�%
dsimp�differentiate �y� t���� � term �%cos�x � y� � inv�cos�x � exp�y��� �sin�x � y� � ��exp�y� � sin�x � exp�y��� �inv�cos�x � exp�y�� � ���%
dsimp�differentiate �z� t���� � term � %%
In general� one can always add more and more sophisticated simpli�cation�For example� consider�
let t� � Fn��tan�� Var �x�!���t� � term � %tan�x�% differentiate �x� t���� � term �%�� � cos�x�� � inv�cos�x�� �sin�x� � ��� � ��sin�x��� � ��inv�cos�x� � ����%
dsimp�differentiate �x� t����� � term � %cos�x� � inv�cos�x�� �
sin�x� � �sin�x� � inv�cos�x� � ���%
We would like to simplify cos�x� inv�cos�x�� simply to � ignoring� asmost commercial computer algebra systems do� the possibility that cos�x� mightbe zero� We can certainly add new clauses to the simpli�er for that� Similarly�we might like to collect up terms in the second summand� However here we startto see that human psychology plays a role� Which of the following is preferable�It is hard for the machine to decide unaided�
sin�x� � �sin�x� � inv�cos�x� � ���
sin�x� � � � inv�cos�x� � ��
sin�x� � � � cos�x� � �
�sin�x� � cos�x�� � �
tan�x� � �
�� CHAPTER �� EXAMPLES
And having chosen the last one� should it then convert
� � tan�x� � �
into
sec�x� � �
Certainly� it is shorter and more conventional� However it relies on the factthat we know the de�nitions of these fairly obscure trig functions like sec� What�ever the machine decides to do� it is likely that some user will �nd it upsetting�
��� Parsing
A defect of our previous example was that the input had to be written explicitlyin terms of type constructors� Here we will show how to write a parser for thissituation� and make our code su�ciently general that it can easily be appliedto similar formal languages� Generally speaking� the problem of parsing is asfollows� A formal language is de�ned by a grammar� which is a set of productionrules for each syntactic category� For a language of terms with two in�x operators� and and only alphanumeric variables and numeral �� � etc�� constants� itmight look like this�
term �� name�termlist�
j name
j �term�
j numeral
j �term
j term � term
j term term
termlist �� term�termlist
j term
We will assume for the moment that we know what names and numeralsare� but we could have similar rules de�ning them in terms of single characters�Now this set of production rules gives us a way of generating the concrete� linearrepresentation for all terms� Note that name� numeral� term and termlist aremeant to be variables in the above� whereas elements like �� and �� written inbold type are actual characters� For example we have�
���� PARSING ���
term �� term term
�� term �term�
�� term �term � term�
�� term �term � term term�
�� numeral �name � name name�
so� following similar rules for name and numeral� we can generate for example�
� �x � y z�
The task of parsing is to reverse this process of applying production rules� i�e�to take a string and discover how it could have been generated by the productionrules for a syntactic category� Typically� the output is a parse tree� i�e� a treestructure that shows clearly the sequence of applications of the rules�
One problem with parsing is that the grammar may be ambiguous� i�e� a givenstring can often be generated in several di�erent ways� This is the case in ourexample above� for the string could also have been generated by�
term �� term term
�� term �term�
�� term �term term�
�� term �term � term term�
�� numeral �name � name name�
The obvious way to �x this is to specify rules of precedence and associativityfor in�x operators� However� we can make the grammar unambiguous withoutintroducing additional mechanisms at the cost of adding new syntactic categories�
atom �� name�termlist�
j name
j numeral
j �term�
j �atom
mulexp �� atom mulexp
j atom
term �� mulexp � term
j mulexp
termlist �� term�termlist
j term
��� CHAPTER �� EXAMPLES
Note that this makes both in�x operators right associative� For a more com�plicated example of this technique in action� look at the formal de�nition ofexpressions in the ANSI C Standard�
���� Recursive descent parsing
The production rules suggest a very simple way of going about parsing using aseries of mutually recursive functions� The idea is to have a function for eachsyntactic category� and a recursive structure that re�ects exactly the mutual re�cursion found in the grammar itself� For example� the procedure for parsingterms� say term will� on encountering a � symbol� make a recursive call to itselfto parse the subterm� and on encountering a name followed by an opening paren�thesis� will make a recursive call to termlist� This in itself will make at leastone recursive call to term� and so on� Such a style of programming is particularlynatural in a language like ML where recursion is the principal control mechanism�
In our ML implementation� we suppose that we have some input list of objectsof arbitrary type � for the parser to consume� These might simply be characters�but in fact there is usually a separate level of lexical analysis which collectscharacters together into tokens such as �x� � ��� and �%� � so by the time wereach this level the input is a list of tokens� We avoid committing ourselvesto any particular type� Similarly� we will allow the parser to construct objectsof arbitrary type � from its input� These might for example be parse trees asmembers of a recursive datatype� or might simply be numbers if the parser is totake an expression and evaluate it� Now in general� a parser might not consumeall its input� so we also make it output the remaining tokens� Therefore� a parserwill have type
��list� � � ��list
For example� when given the input characters �x � y� z the function atom
should process the characters �x � y� and leave the remaining characters z� Itmight return a parse tree for the processed expression using our earlier recursivetype� and hence we would have�
atom ��x � y� � z� � Fn����� Var �x�� Var �y�!���� z�
Since any call of the function atom must be from within the function mulexp�this second function will use the result of atom and deal with the input thatremains� making a second call to atom to process the expression �z �
���� Parser combinators
Another reason why this technique works particularly well in ML is that we cande�ne some useful combinators for plugging parsers together� In fact� by giving
���� PARSING ���
them in�x status� we can make the ML parser program look quite similar instructure to the original grammar de�ning the language�
First we declare an exception to be used where parsing fails� Then we de�nean in�x �� that applies two parsers in sequence� pairing up their results� and andin�x ## which tries �rst one parser� then the other� We then de�ne a many�foldversion of �� that repeats a parser as far as possible� putting the results into alist� Finally� the in�x �� is used to modify the results of a parser according to afunction�
The CAML rules for symbolic identi�ers such as �� make them in�x automat�ically� so we suppress this during the de�nition using the prefix keyword� Theirprecedences are also automatic� following from the usual precedence of their �rstcharacters as arithmetic operations etc� That is� �� is the strongest� �� falls inthe middle� and ## is the weakest� this is exactly what we want�
exception Noparse��
let prefix �� parser� parser� input �try parser� inputwith Noparse � parser� input��
let prefix �� parser� parser� input �let result��rest� � parser� input inlet result��rest� � parser� rest� in�result��result���rest���
let rec many parser input �try let result�next � parser input in
let results�rest � many parser next in�result��results��rest
with Noparse � !�input��
let prefix parser treatment input �let result�rest � parser input intreatment�result��rest��
We will� in what follows� use the following other functions� Most of these havebeen de�ned above� the main exception being explode which converts a stringinto a list of ��elements strings� It uses the inbuilt functions sub&string andstring&length� we do not describe them in detail but their function should beeasy to grasp from the example�
��� CHAPTER �� EXAMPLES
let rec itlist f �fun ! b � b� �h��t� b � f h �itlist f t b���
let uncurry f�x�y� � f x y��
let K x y � x��
let C f x y � f y x��
let o f g x � f�g x��� infix �o���
let explode s �let rec exap n l �
if n � then l elseexap �n � �� ��sub�string s n ����l� in
exap �string�length s � �� !��
In order to get started� we de�ne a few �atomic parsers� The function some
accepts any item satisfying a predicate and returns it� The function a is similar�but demands a particular item� and �nally finished just makes sure there areno items left to be parsed�
let some p �fun ! � raise Noparse� �h��t� � if p h then �h�t� else raise Noparse��
let a tok � some �fun item � item � tok���
let finished input �if input � ! then �input else raise Noparse��
���� Lexical analysis
Our parsing combinators can easily be used� in conjunction with a few simplecharacter discrimination functions� to construct a lexical analyzer for our lan�guage of terms� We de�ne a type of tokens or lexemes�� and the lexer translatesthe input string into a list of tokens� The �other cases cover symbolic identi�ers�they are all straightforward as they consist of just a single character� rather thancomposite ones like ��� �
���� PARSING ���
type token � Name of string � Num of string � Other of string��
let lex �let several p � many �some p� inlet lowercase�letter s � �a� �� s " s �� �z� inlet uppercase�letter s � �A� �� s " s �� �Z� inlet letter s � lowercase�letter s or uppercase�letter s inlet alpha s � letter s or s � ��� or s � ��� inlet digit s � �� �� s " s �� ��� inlet alphanum s � alpha s or digit s inlet space s � s � � � or s � �&n� or s � �&t� inlet collect�h�t� � h��itlist �prefix �� t ��� inlet rawname �
some alpha �� several alphanum �Name o collect� inlet rawnumeral �
some digit �� several digit �Num o collect� inlet rawother � some �K true� Other inlet token ��rawname �� rawnumeral �� rawother� �� several space fst in
let tokens � �several space �� many token� snd inlet alltokens � �tokens �� finished� fst infst o alltokens o explode��
For example�
lex �sin�x � y� � cos�� � x � y����� � token list � Name �sin�� Other ���� Name �x�� Other ���� Name �y�� Other ����Other ���� Name �cos�� Other ���� Num ���� Other ���� Name �x��Other ���� Name �y�� Other ���!
���� Parsing terms
Since we are now working at the level of tokens� we de�ne trivial parsers to accepttokens of only� a certain kind�
let name �fun �Name s��rest� � s�rest� � � raise Noparse��
let numeral �fun �Num s��rest� � s�rest� � � raise Noparse��
let other �fun �Other s��rest� � s�rest� � � raise Noparse��
Now we can de�ne a parser for terms� in a form very similar to the originalgrammar� The main di�erence is that each production rule has associated withit some sort of special action to take as a result of parsing�
��� CHAPTER �� EXAMPLES
let rec atom input� �name �� a �Other ���� �� termlist �� a �Other ����
�fun ���name����args���� � Fn�name�args���� name
�fun s � Var s��� numeral
�fun s � Const s��� a �Other ���� �� term �� a �Other ����
�snd o fst��� a �Other ���� �� atom
snd� inputand mulexp input
� �atom �� a�Other ���� �� mulexp �fun ��a����m� � Fn����� a�m!��
�� atom� inputand term input
� �mulexp �� a�Other ���� �� term �fun ��a����m� � Fn����� a�m!��
�� mulexp� inputand termlist input
� �term �� a �Other ���� �� termlist �fun ��h����t� � h��t�
�� term �fun h � h!�� input��
Let us package everything up as a single parsing function�
let parser � fst o �term �� finished fst� o lex��
To see it in action� we try with and without the printer see above� installed�
parser �sin�x � y� � cos�� � x � y����� � term �Fn����� Fn ��sin�� Fn ����� Var �x�� Var �y�!�!��Fn ��cos�� Fn ����� Fn ����� Const ���� Var �x�!��
Var �y�!�!�!� install�printer �print�term���� � unit � �� parser �sin�x � y� � cos�� � x � y����� � term � %sin�x � y� � cos�� � x � y�%
���� Automatic precedence parsing
In the above parser� we �hardwired a parser for the two in�x operators� Howeverit would be nicer� and consistent with our approach to printing� if changes to theset of in�xes could propagate to the parser� Moreover� even with just two di�er�ent binary operators� our production rules and consequent parser were already
���� PARSING ���
starting to look arti�cial� imagine if we had a dozen or so� What we would like isto automate the production of a layer of parsers� one for each operator� in orderof precedence� This can easily be done� First� we de�ne a general function thattakes a parser for what� at this level� are regarded as atomic expressions� andproduces a new parser for such expressions separated by a given operator� Notethat this could also be de�ned using parsing combinators� but the function belowis simple and more e�cient�
let rec binop op parser input �let atom��rest� as result � parser input inif not rest� � ! " hd rest� � Other op thenlet atom��rest� � binop op parser �tl rest�� inFn�op� atom�� atom�!��rest�
else result��
Now we de�ne functions that pick out the in�x operator in our associationlist with equal� lowest precedence� and delete it from the list� If we maintainedthe list already sorted in precedence order� we could simply take the head andtail of the list�
let findmin l � itlist�fun ���pr� as p�� ���pr� as p�� � if pr� �� pr� then p� else p���tl l� �hd l���
let rec delete x �h��t� � if h � x then t else h���delete x t���
Now we can de�ne the generic precedence parser�
let rec precedence ilist parser input �if ilist � ! then parser input elselet opp � findmin ilist inlet ilist� � delete opp ilist inbinop �fst opp� �precedence ilist� parser� input��
and hence can modify the main parser to be both simpler and more general�
��� CHAPTER �� EXAMPLES
let rec atom input� �name �� a �Other ���� �� termlist �� a �Other ����
�fun ���name����args���� � Fn�name�args���� name
�fun s � Var s��� numeral
�fun s � Const s��� a �Other ���� �� term �� a �Other ����
�snd o fst��� a �Other ���� �� atom
snd� inputand term input � precedence �#infixes� atom inputand termlist input
� �term �� a �Other ���� �� termlist �fun ��h����t� � h��t�
�� term �fun h � h!�� input��
let parser � fst o �term �� finished fst� o lex��
for example�
parser �� � sin�x��� � � � sin�y��� � ����� � term �Fn����� Fn ����� Const ���� Fn ����� Fn ��sin�� Var �x�!��
Const ���!�!��Fn����� Fn ����� Const ���� Fn ����� Fn ��sin�� Var �y�!��
Const ���!�!��Const ���!�!�
install�printer �print�term���� � unit � �� parser �� � sin�x��� � � � sin�y��� � ����� � term � %� � sin�x� � � � �� � sin�y� � � � ��%
���� Defects of our approach
Our approach to lexing and parsing is not particularly e�cient� In fact� CAMLand some other ML implementations include a generator for LR parsers similar toYACC Yet Another Compiler Compiler� which is often used to generate parsersin C under Unix� Not only are these more general in the grammars that theycan handle� but the resulting parsers are more e�cient� However we believethe present approach is rather clear� adequate for most purposes� and a goodintroduction to programming with higher order functions�
The ine�ciency of our approach can be reduced by avoiding a few particularlywasteful features� Note that if two di�erent productions for the same syntactic
���� PARSING ���
category have a common pre�x� then we should avoid parsing it more than once�unless it is guaranteed to be trivial� Our production rules for term have thisproperty�
term �� name�termlist�
j name
j � � �
We carefully put the longer production �rst in our actual implementation�otherwise success in reading a name would cause the abandonment of attempts toread a parenthesized list of arguments� Nevertheless� because of the redundancywe have mentioned� this way of constructing the parser is wasteful� In this caseit doesn t matter much� because the duplicated call only analyzes one token�However the case for termlist is more signi�cant�
let ���and termlist input
� �term �� a �Other ���� �� termlist �fun ��h����t� � h��t�
�� term �fun h � h!�� input��
Here we could reparse a whole term� which might be arbitrarily large� We canavoid this in several ways� One is to move the ## alternation to after the initialterm has been read� A convenient way to code it is to eschew explicit recursionin favour of many�
let ���and termlist input
� term �� many �a �Other ���� �� term snd� �fun �h�t� � h��t� input��
Therefore the �nal version of the parser is�
let rec atom input� �name �� a �Other ���� �� termlist �� a �Other ����
�fun ���name����args���� � Fn�name�args���� name
�fun s � Var s��� numeral
�fun s � Const s��� a �Other ���� �� term �� a �Other ����
�snd o fst��� a �Other ���� �� atom
snd� inputand term input � precedence �#infixes� atom inputand termlist input
� �term �� many �a �Other ���� �� term snd� �fun �h�t� � h��t�� input��
�� CHAPTER �� EXAMPLES
A general defect of our approach� and recursive descent parsing generally� isthat so�called left recursion in productions causes problems� This is where a givencategory expands to a sequence including the category itself as the �rst item� Forexample� if we had wanted to make the addition operator left�associative in ourearlier grammar� we could have used�
term �� term � mulexp
j mulexp
The naive transcription into ML would loop inde�nitely� calling term on thesame subterm over and over� There are various slightly ad hoc ways of dealingwith this problem� For example� one can convert this use of recursion� which insome ways is rather gratuitous� into the explicit use of repetition� We can do thisusing standard combinators like many to return a list of �mulexp s and then runover the list building the tree left�associated�
A �nal problem is that we do not handle errors very gracefully� We alwaysraise the exception Noparse� and this� when caught� initiates backtracking� How�ever this is not always appropriate for typical grammars� At best� it can causeexpensive reparsing� and can lead to ba+ing error indications at the top level�For example� on encountering � �� we expect a term following the �� and if wedon t �nd one� we should generate a comprehensible exception for the user� ratherthan raise Noparse and perhaps initiate several other pointless attempts to parsethe expression in a di�erent way�
��� Exact real arithmetic
Real arithmetic on computers is normally done via �oating point approximations�In general� we can only manipulate a real number� either ourselves or inside acomputer� via some sort of �nite representation� Some question how numbers canbe said to �exist if they have no �nite representation� For example� Kroneckeraccepted integers and rationals because they can be written down explicitly� andeven algebraic numbers� because they can be represented using the polynomialsof which they are solutions� However he rejected transcendental numbers becauseapparently they could not be represented �nitely� Allegedly he he greeted thefamous proof by Lindemann ��� that � is transcendental with the remark thatthis was �interesting� except that � does not exist �
However� given our modern perspective� we can say that after all many morenumbers than Kronecker would have accepted do have a �nite representation�namely the program� or in some more abstract sense the rule� used to calculate
�Algebraic numbers are those that are roots of polynomials with integer coe�cients e�g�p� and transcendental numbers are ones that aren�t�
���� EXACT REAL ARITHMETIC ���
them to greater and greater precision� For example� we can write a programthat will produce for any argument n the �rst n digits of �� Alternatively it canproduce a rational number r such that j� � rj ��n� Whatever approach istaken to the successive approximation of a real number� the key point is that itsrepresentation� the program itself� is �nite�
This works particularly nicely in a language like ML where higher order func�tions are available� What we have called a �program above will just be an MLfunction� We can actually represent the arithmetic operations on numbers ashigher order functions that� given functions for approximating x and y� will pro�duce new ones for approximating x " y� xy� sinx� and so on� for a wide rangeof functions� In an ordinary programming language� we would need to de�nea concrete representation for programs� e�g� �G&odel numbering � and write aninterpreter for this representation��
���� Representation of real numbers
Our approach is to represent a real x by a function fx � N � Z that for eachn � N returns an approximation of x to within ��n� appropriately scaled� In fact�we have�
jfxn�� �nxj �
This is of course equivalent to jfx n��n
� xj ��n
� We could return a rationalapproximation directly� but it would still be convenient for the rationals to havedenominators that are all powers of some number� so that common denominatorsarising during addition don t get too big� Performing the scaling directly seemssimpler� requiring only integer arithmetic� Our choice of the base � is largelyarbitrary� A lower base is normally advantageous because it minimizes the gran�ularity of the achievable accuracies� For example� if we used base ��� then evenif we only need to increase the accuracy of an approximation slightly� we have tostep up the requested accuracy by a factor of ���
���� Arbitrary�precision integers
CAML s standard integers type int� have a severely limited range� so �rst of allwe need to set up a type of unlimited�precision integers� This is not so di�cultto program� but is slightly tedious� Fortunately� the CAML release includes alibrary that gives a fast implementation of arbitrary precision integer and in factrational� arithmetic� A version of CAML Light with this library pre�loaded isinstalled on Thor� Assuming you have used the following path setting�
�E�ectively we are manipulating functions �in extension� rather than �in intension� i�e� notconcerning ourselves with their representation�
��� CHAPTER �� EXAMPLES
PATH���PATH��home�jrh���caml�bin�export PATH
then the augmented version of CAML Light can be �red up using�
� camllight my�little�caml
When the system returns its prompt� use the �open directive to make thelibrary functions available�
open �num���
This library sets up a new type num of arbitrary precision rational numbers�we will just use the integer subset� CAML does not provide overloading� so it isnecessary to use di�erent symbols for the usual arithmetic operations over num�
Note that small constants of type num must be written as Int k rather thansimply k� In fact Int is a type constructor for num used in the case when thenumber is a machine�representable integer� Larger ones use Big int�
The unary negation on type num is written minus num� Another handy unaryoperator is abs num� which �nds absolute values� i�e� given x returns jxj�
The usual binary operations are also available� The truncating� divisionand modulus function� called quo num and mod num� do not have in�x status�However most of the other binary operators are in�xes� and the names are derivedfrom the usual ones by adding a slash �� � It is important to realize that ingeneral� one must use �� to compare numbers for equality� This is because theconstructors of the type num are� as always� distinct by de�nition� yet in termsof the underlying meaning they may overlap� For example we get di�erent resultfrom using truncating division and true division on ��� and �� even though theyare numerically the same� One uses type constructor Int and the other Ratio�
�Int � ��� Int �� �� Int � � quo�num �Int � ��� Int �� �Int ����it � bool � false �Int � ��� Int �� �� Int � �� quo�num �Int � ��� Int �� �Int ����it � bool � true
Here is a full list of the main in�x binary operators�
Operator Type Meaning � num �� num �� num Exponentiation � num �� num �� num Multiplication�� num �� num �� num Addition�� num �� num �� num Subtraction�� num �� num �� bool Equality��� num �� num �� bool Inequality�� num �� num �� bool Less than��� num �� num �� bool Less than or equal�� num �� num �� bool Greater than��� num �� num �� bool Greater than or equal
���� EXACT REAL ARITHMETIC ���
Let us see some further examples�
Int � �� Int ����it � num � Int � Int � ��� Int ���it � num � Big�int �abstr �Int � ��� Int �� �� Int ���it � num � Ratio �abstr quo�num �Int � ��� Int �� �Int ����it � num � Int ��������
Note that large numbers are not printed� However we can always convert oneto a string using string of num�
string�of�num�Int � �� Int ������ � string � ����������������������������������������������
This also has an inverse� called naturally enough num of string�
���� Basic operations
Recall that our real numbers are supposed to be represented by� functions Z� Z�In ML we will actually use int �� num� since the inbuilt type of integers ismore than adequate for indexing the level of accuracy� Now we can de�ne someoperations on reals� The most basic operation� which gets us started� is to producethe real number corresponding to an integer� This is easy�
let real�of�int k n � �Int � ��� Int n� �� Int k��real�of�int � int � int � num � �fun real�of�int ����� � int � num � �fun
It is obvious that for any k this obeys the desired approximation criterion�
jfkn�� �nkj � j�nk � �nkj � � �
Now we can de�ne the �rst nontrivial operation� that of unary negation�
let real�neg f n � minus�num�f n���
The compiler generalizes the type more than intended� but this will not troubleus� It is almost as easy to see that the approximation criterion is preserved� Ifwe know that for each n�
jfxn�� �nxj �
then we have for any n�
��� CHAPTER �� EXAMPLES
jf�xn�� �n�x�j � j � fxn�� �n�x�j
� j � fxn�� �nx�j
� jfxn�� �nxj
�
Similarly� we can de�ne an �absolute value function on real numbers� usingthe corresponding function abs num on numbers�
let real�abs f n � abs�num �f n���
The correctness of this is again straightforward� using the fact that jjxj�jyjj �jx� yj� Now consider the problem of adding two real numbers� Suppose x and yare represented by fx and fy respectively� We could de�ne�
fxyn� � fxn� " fyn�
However this gives no guarantee that the approximation criterion is main�tained� we would have�
jfxyn�� �nx " y�j � jfxn� " fyn�� �nx " y�j
� jfxn�� �nxj" jfyn�� �nyj
We can guarantee that the sum on the right is less than �� but not that it isless than � as required� Therefore� we need in this case to evaluate x and y togreater accuracy than required in the answer� Suppose we de�ne�
fxyn� � fxn " �� " fyn " �����
Now we have�
jfxyn�� �nx " y�j � jfxn " �� " fyn " ������ �nx " y�j
� jfxn " ����� �nxj " jfyn " ����� �nyj
��
�jfxn " ��� �n�xj"
�
�jfyn " ��� �n�yj
�
�� "
�
�� � �
Apparently this just gives the accuracy required� However we have implicitlyused real mathematical division above� Since the function is supposed to yield aninteger� we are obliged to round the quotient to an integer� If we just use quo num�the error from rounding this might be almost �� after which we could never
���� EXACT REAL ARITHMETIC ���
guarantee the bound we want� however accurately we evaluate the arguments�However with a bit more care we can de�ne a division function that alwaysreturns the integer closest to the true result or one of them in the case of twoequally close ones�� so that the rounding error never exceeds �
�� This could be
done directly in integer arithmetic� but the most straightforward coding is to userational division followed by rounding to the nearest integer� as there are built�infunctions for both these operations�
let ndiv x y � round�num�x �� y���ndiv � num � num � num � �fun infix �ndiv��� �Int ��� ndiv �Int ����� � num � Int � �Int ��� ndiv �Int ����� � num � Int � �Int������ ndiv �Int ����� � num � Int �� �Int����� ndiv �Int ����� � num � Int ��
Now if we de�ne�
fxyn� � fxn " �� " fyn " ��� ndiv �
everything works�
jfxyn�� �nx " y�j � jfxn " �� " fyn " ��� ndiv ��� �nx " y�j
��
�" jfxn " �� " fyn " ������ �nx " y�j
��
�"
�
�jfxn " �� " fyn " ���� �n�x " y�j
��
�"
�
�jfxn " ��� �n�xj"
�
�jfyn " ��� �n�yj
�
�"
�
�� "
�
��
� �
Accordingly we make our de�nition�
let real�add f g n ��f�n � �� �� g�n � ��� ndiv �Int ����
We can de�ne subtraction similarly� but the simplest approach is to build itout of the functions that we already have�
let real�sub f g � real�add f �real�neg g���real�sub � �num � num� � �num � num� � num � num � �fun
��� CHAPTER �� EXAMPLES
It is a bit trickier to de�ne multiplication� inverses and division� Howeverthe cases where we multiply or divide by an integer are much easier and quitecommon� It is worthwhile implementing these separately as they can be mademore e�cient� We de�ne�
fmxn� � mfxn " p " ��� ndiv �p�
where p is chosen so that �p � jmj� For correctness� we have�
jfmxn�� �nmx�j ��
�" j
mfxn " p " ��
�p�� �nmx�j
��
�"jmj
�p�jfxn " p " ��� �np�xj
�
�"jmj
�p�
��
�"
�
�
jmj
�p
��
�"
�
�� �
In order to implement this� we need a function to �nd the appropriate p� Thefollowing is crude but adequate�
let log� �let rec log� x y �if x �� Int � then yelse log� �quo�num x �Int ��� �y � �� in
fun x � log� �x �� Int �� ��
The implementation is simply�
let real�intmul m x n �let p � log� �abs�num m� inlet p� � p � � in�m �� x�n � p��� ndiv �Int � ��� Int p����
For division by an integer� we de�ne�
fx�mn� � fxn� ndiv m
For correctness� we can ignore the trivial cases when m � �� which shouldnever be used� and when m � ��� since then the result is exact� Otherwise� weassume jfxn�� �nxj �� so jfxn��m� �nx�mj �
jmj� �
�� which� together with
the fact that jfxn� ndiv m� fxn��mj � ��� yields the result� So we have�
let real�intdiv m x n �x�n� ndiv �Int m���
���� EXACT REAL ARITHMETIC ���
���� General multiplication
General multiplication is harder because the error in approximation for one num�ber is multiplied by the magnitude of the second number� Therefore� beforethe �nal accuracies are calculated� a preliminary evaluation of the arguments isrequired to determine their approximate magnitude� We proceed as follows� Sup�pose that we want to evaluate x " y to precision n� First we choose r and s sothat jr � sj � � and r " s � n " �� That is� both r and s are slightly more thanhalf the required precision� We now evaluate fxr� and fys�� and select naturalnumbers p and q that are the corresponding �binary logarithms � i�e� jfxr�j � �p
and jfys�j � �q� If both p and q are zero� then it is easy to see that we mayreturn just �� Otherwise� remember that either p � � or q � � as we will needthis later� Now set�
k � n " q � s " � � q " r " �
l � n " p� r " � � p " s " �
m � k " l�� n � p " q " �
We claim that fxyn� � fxk�fyl�� ndiv �m has the right error behaviour�i�e� jfxyn�� �nxy�j �� If we write�
�kx � fxk� " �
�ly � fyl� " �
with j�j � and j�j �� we have�
jfxyn�� �nxy�j ��
�" j
fxk�fyl�
�m� �nxy�j
��
�" ��mjfxk�fyl�� �klxyj
��
�" ��mjfxk�fyl�� fxk� " ��fyl� " ��j
��
�" ��mj�fyl� " �fxk� " ��j
��
�" ��mj�fyl�j" j�fxk�j" j��j�
��
�" ��mjfyl�j" jfxk�j" j��j�
�
�" ��mjfyl�j" jfxk�j" ��
Now we have jfxr�j � �p� so j�rxj �p " �� Thus j�kxj �q��p " ���so jfxk�j �q��p " �� " �� i�e� jfxk�j � �q��p " ��� Similarly jfyl�j ��p��q " ��� Consequently�
��� CHAPTER �� EXAMPLES
jfyl�j" jfxk�j" � � �p��q " �� " �q��p " �� " �
� �pq� " �p� " �pq� " �q� " �
� �pq� " �p� " �q� " �
Now for our error bound we require jfyl�j " jfxk�j " � � �m��� or dividingby � and using the discreteness of the integers�
�pq� " �p " �q �pq�
We can write this as �pq " �p� " �pq " �q� �pq� " �pq�� which is truebecause we have either p � � or q � �� So at last we are justi�ed in de�ning�
let real�mul x y n �let n� � n � � inlet r � n� � � inlet s � n� � r inlet xr � x�r�and ys � y�s� inlet p � log� xrand q � log� ys inif p � " q � then Int elselet k � q � r � �and l � p � s � �and m � p � q � � in�x�k� �� y�l�� ndiv �Int � ��� Int m���
���� Multiplicative inverse
Next we will de�ne the multiplicative inverse function� In order to get any sortof upper bound on this� let alone a good approximation� we need to get a lowerbound for the argument� In general� there is no better way than to keep evaluatingthe argument with greater and greater accuracy until we can bound it away fromzero� We will use the following lemma to justify the correctness of our procedure�
Lemma �� If �e � n " k " �� jfxk�j � �e and jfxk�� �kxj �� where fxk�is an integer and e� n and k are natural numbers� then if we de�ne
fyn� � �nk ndiv fxk�
we have jfyn�� �nx��j �� i�e� the required bound�Proof� The proof is rather tedious and will not be given in full� We just sketchthe necessary case splits� If jfxk�j � �e then a straightforward analysis gives theresult� the rounding in ndiv gives an error of at most �
�� and the remaining error
is ��� If jfxk�j � �e but n " k � e� then although the second component of the
���� EXACT REAL ARITHMETIC ���
error may now be twice as much� i�e� �� there is no rounding error becausefxk� � ��e divides into �nk exactly� �We use here the fact that �e � � � �e���because since �e � n " k " �� e cannot be zero�� Finally� if jfxk�j � �e andn" k e� we have jfyn�� �n �
xj � because both jfyn�j � � and � j�n �
xj ��
and both these numbers have the same sign� Q�E�D�
Now suppose we wish to �nd the inverse of x to accuracy n� First we evaluatefx��� There are two cases to distinguish�
�� If jfx��j � �r for some natural number r� then choose the least naturalnumber k which may well be zero� such that �r " k � n " �� and sete � r " k� We now return �nk ndiv fxk�� It is easy to see that theconditions of the lemma are satis�ed� Since jfx��j � �r " � we havejxj � �r� and so j�kxj � �rk� This means jfxk�j � �rk � �� and asfxk� is an integer� jfxk�j � �rk � �e as required� The condition that�e � n � k " � is easy to check� Note that if r � n we can immediatelydeduce that fyn� � � is a valid approximation�
�� If jfx��j � �� then we call the function �msd that returns the least p suchthat jfxp�j � �� Note that this may in general fail to terminate if x � ��Now we set e � n" p" � and k � e" p� and return �nk ndiv fxk�� Onceagain the conditions for the lemma are satis�ed� Since jfxp�j � �� we havej�pxj � �� i�e� jxj � �
�p� Hence j�kxj � �k�p � �e� and so jfxk�j � �e � ��
i�e� jfxk�j � �e�
To implement this� we �rst de�ne the msd function�
let msd �let rec msd n x �if abs�num�x�n�� � Int � then n else msd �n � �� x in
msd ��
and then translate the above mathematics into a simple program�
let real�inv x n �let x � x�� inlet k �if x � Int � then
let r � log� x � � inlet k � n � � � � � r inif k � then else k
elselet p � msd x inn � � � p � � in
�Int � ��� Int �n � k�� ndiv �x�k����
Now� of course� it is straightforward to de�ne division�
let real�div x y � real�mul x �real�inv y���
�� CHAPTER �� EXAMPLES
���� Ordering relations
The interesting things about all these functions is that they are in fact uncom�putable in general� The essential point is that it is impossible to decide whethera given number is zero� We can keep approximating it to greater and greateraccuracy� but if the approximations always return �� we cannot tell whether it isgoing to do so inde�nitely or whether the next step will give a nonzero answer��
If x is not zero� then the search will eventually terminate� but if x is zero� it willrun forever�
Accepting this� it is not di�cult to de�ne the relational operators� To decidethe ordering relation of x and y it su�ces to �nd an n such that jxn � ynj � ��For example� if xn � yn " � we have
�nx � xn � � � yn " � � �ny
and so x � y� We �rst write a general procedure to perform this search� and thende�ne all the orderings in terms of it� Note that the only way to arrive at there�exive orderings is to justify the irre�exive version!
let separate �let rec separate n x y �let d � x�n� �� y�n� inif abs�num�d� � Int � then delse separate �n � �� x y in
separate ��
let real�gt x y � separate x y � Int ��let real�ge x y � real�gt x y��let real�lt x y � separate x y �� Int ��let real�le x y � real�lt x y��
���� Caching
In order to try out all the above functions� we would like to be able to printout a decimal representation of some approximation to a real� This is not hard�using some more standard facilities in the CAML library� If d decimal places aredesired� then we make n such that �n � ��d and hence the accuracy correspondsat least to the number of digits printed�
let view x d �let n � � � d inlet out � x�n� �� �Int � ��� Int n� inapprox�num�fix d out��
�For a proof that it is uncomputable simply wrap up the halting problem by de�ningf�n � � if a certain Turing machine has halted after no more than n iterations and f�n � �otherwise�
���� EXACT REAL ARITHMETIC ���
Now we can test out some simple examples� which seem to work�
let x � real�of�int ���x � int � num � �fun let xi � real�inv x��xi � int � num � �fun let wun � real�mul x xi��wun � int � num � �fun view x ���it � string � ���� view xi ���it � string � ����������������������� view wun ���it � string � ����
However there is a subtle and serious bug in our implementation� which showsitself if we try larger expressions� The problem is that subexpressions can beevaluated many times� Most obviously� this happens if they occur several timesin the input expression� But even if they don t� the multiplication� and even moreso� the inverse� require trial evaluations at di�ering accuracies� As the evaluationpropagates down to the bottom level� there is often an exponential buildup ofreevaluations� For example� the following is slow � it takes several seconds�
let x� � real�of�int � inlet x� � real�mul x� x� inlet x� � real�mul x� x� inlet x� � real�mul x� x� inlet x� � real�mul x� x� inlet x� � real�mul x� x� inlet x� � real�mul x� x� inview x� ���� � string � �����
We can �x this problem using the idea of caching or memo functions Michie����� We give each function a reference cell to remember the most accurateversion already calculated� If there is a second request for the same accuracy� itcan be returned immediately without further evaluation� What s more� we canalways construct a lower�level approximation� say n� from a higher one� say n"kwith k � �� If we know that jfxn " k�� �nkxj � then we have�
jfxn " k� ndiv �k � �nxj ��
�" j
fxn " k�
�k� �nxj
��
�"
�
�kjfxn " k�� �nkxj
�
�"
�
�k
� �
��� CHAPTER �� EXAMPLES
Hence we are always safe in returning fxn " k� ndiv �k� We can implementthis memo technique via a generic function memo to be inserted in each of theprevious functions�
let real�of�int k � memo �fun n � �Int � ��� Int n� �� Int k���
let real�neg f � memo �fun n � minus�num�f n����
let real�abs f � memo �fun n � abs�num �f n����
let real�add f g � memo �fun n ��f�n � �� �� g�n � ��� ndiv �Int �����
let real�sub f g � real�add f �real�neg g���
let real�intmul m x � memo �fun n �let p � log� �abs�num m� inlet p� � p � � in�m �� x�n � p��� ndiv �Int � ��� Int p�����
let real�intdiv m x � memo �fun n �x�n� ndiv �Int m����
let real�mul x y � memo �fun n �let n� � n � � inlet r � n� � � inlet s � n� � r inlet xr � x�r�and ys � y�s� inlet p � log� xrand q � log� ys inif p � " q � then Int elselet k � q � r � �and l � p � s � �and m � p � q � � in�x�k� �� y�l�� ndiv �Int � ��� Int m����
let real�inv x � memo �fun n �let x � x�� inlet k �if x � Int � then
let r � log� x � � inlet k � n � � � � � r inif k � then else k
elselet p � msd x inn � � � p � � in
�Int � ��� Int �n � k�� ndiv �x�k�����
let real�div x y � real�mul x �real�inv y���
where
���� PROLOG AND THEOREM PROVING ���
let memo f �let mem � ref ����Int � infun n � let �m�res� � #mem in
if n �� m thenif m � n then reselse res ndiv �Int � ��� Int�m � n��
elselet res � f n inmem �� �n�res�� res��
Now the above sequence of multiplications is instantaneous� Here are a fewmore examples�
let pi� � real�div �real�of�int ��� �real�of�int ����pi� � int � num � �fun view pi� ���it � string � ��������������� let pi� � real�div �real�of�int ���� �real�of�int ������pi� � int � num � �fun view pi� ���it � string � �������������� let pidiff � real�sub pi� pi���pidiff � int � num � �fun view pidiff ���it � string � �������������������� let ipidiff � real�inv pidiff��ipidiff � int � num � �fun view ipidiff ���it � string � �������
Of course� everything we have done to date could be done with rational arith�metic� Actually� it may happen that our approach is more e�cient in certainsituations since we avoid calculating numbers that might have immense numer�ators and denominators when we just need an approximation� Nevertheless� thepresent approach really comes into its own when we want to de�ne transcendentalfunctions like exp� sin etc� We will not cover this in detail for lack of space� butit makes an interesting exercise� One approach is to use truncated Taylor series�Note that �nite summations can be evaluated directly rather than by iteratingthe addition function� this leads to much better error behaviour�
��� Prolog and theorem proving
The language Prolog is popular in Arti�cial Intelligence research� and is used invarious practical applications such as expert systems and intelligent databases�Here we will show how the main mechanism of Prolog� namely depth��rst searchthrough a database of rules using uni�cation and backtracking� can be imple�mented in ML� We do not pretend that this is a full�blown implementation of
��� CHAPTER �� EXAMPLES
Prolog� but it gives an accurate picture of the general �avour of the language�and we will be able to run some simple examples in our system�
���� Prolog terms
Prolog code and data is represented using a uniform system of �rst order terms�We have already de�ned a type of terms for our mathematical expressions� andassociated parsers and printers� Here we will use something similar� but not quitethe same� First of all� it simpli�es some of the code if we treat constants as nullaryfunctions� i�e� functions that take an empty list of arguments� Accordingly wede�ne�
type term � Var of string� Fn of string � �term list���
Where we would formerly have used Const s� we will now use Fn�s����� Notethat we will treat functions of di�erent arities di�erent numbers of arguments�as distinct� even if they have the same name� Thus there is no danger of ourconfusing constants with true functions�
���� Lexical analysis
In order to follow Prolog conventions� which include case sensitivity� we alsomodify lexical analysis� We will not attempt to conform exactly to the full detailsof Prolog� but one feature is very important� alphanumeric identi�ers that beginwith an upper case letter or an underscore are treated as variables� while otheralphanumeric identi�ers� along with numerals� are treated as constants� Forexample X and Answer are variables while x and john are constants� We willlump all symbolic identi�ers together as constants too� but we will distinguishthe punctuation symbols� left and right brackets� commas and semicolons� Non�punctuation symbols are collected together into the longest strings possible� sosymbolic identi�ers need not consist only of one character�
type token � Variable of string� Constant of string� Punct of string��
The lexer therefore looks like this�
���� PROLOG AND THEOREM PROVING ���
let lex �let several p � many �some p� inlet collect�h�t� � h��itlist �prefix �� t ��� inlet upper�alpha s � �A� �� s " s �� �Z� or s � ���and lower�alpha s � �a� �� s " s �� �z� or �� �� s " s �� ���and punct s � s � ��� or s � ��� or s � � � or s � �!�
or s � ��� or s � ���and space s � s � � � or s � �&n� or s � �&t� inlet alphanumeric s � upper�alpha s or lower�alpha s inlet symbolic s � not space s " not alphanumeric s " not punct s inlet rawvariable �some upper�alpha �� several alphanumeric �Variable o collect�
and rawconstant ��some lower�alpha �� several alphanumeric ��some symbolic �� several symbolic� �Constant o collect�
and rawpunct � some punct Punct inlet token ��rawvariable �� rawconstant �� rawpunct� ��several space fst in
let tokens � �several space �� many token� snd inlet alltokens � �tokens �� finished� fst infst o alltokens o explode��
For example�
lex �add�X�Y�Z� �� X is Y�Z����� � token list � Constant �add�� Punct ���� Variable �X�� Punct ����Variable �Y�� Punct ���� Variable �Z�� Punct ����Constant ����� Variable �X�� Constant �is�� Variable �Y��Constant ���� Variable �Z�� Punct ���!
���� Parsing
The basic parser is pretty much the same as before� the printer is exactly thesame� The main modi�cation to the printer is that we allow Prolog lists to bewritten in a more convenient notation� Prolog has �� and �nil corresponding toML s ��� and ��� � and we set up the parser so that lists can be written muchas in ML� e�g� ������� � We also allow the Prolog notation ��H#T� instead of�cons�H�T� � typically used for pattern�matching� After the basic functions�
let variable �fun �Variable s��rest� � s�rest� � � raise Noparse��
let constant �fun �Constant s��rest� � s�rest� � � raise Noparse��
��� CHAPTER �� EXAMPLES
we have a parser for terms and also for Prolog rules� of these forms�
term�
term �� term� � � � � term�
The parsers are as follows�
let rec atom input� �constant �� a �Punct ���� �� termlist �� a �Punct ����
�fun ���name����args���� � Fn�name�args���� constant
�fun s � Fn�s� !���� variable
�fun s � Var s��� a �Punct ���� �� term �� a �Punct ����
�snd o fst��� a �Punct � �� �� list
snd� inputand term input � precedence �#infixes� atom inputand termlist input
� �term �� a �Punct ���� �� termlist �fun ��h����t� � h��t�
�� term �fun h � h!�� input
and list input� �term �� �a �Constant ���� �� term �� a �Punct �!��
�snd o fst��� a �Punct ���� �� list
snd�� a �Punct �!��
�K �Fn�� !�� !���� �fun �h�t� � Fn����� h� t!��
�� a �Punct �!�� �K �Fn�� !�� !���� input
and rule input� �term �� �a �Punct ����
�K !��� a �Constant ����� �� term ��
many �a �Punct ���� �� term snd� ��a �Punct ����
�fun �����h��t���� � h��t��� input��
let parse�term � fst o �term �� finished fst� o lex��
let parse�rules � fst o �many rule �� finished fst� o lex��
���� Uni�cation
Prolog uses a set of rules to solve a current goal by trying to match one of the rulesagainst the goal� A rule consisting of a single term can solve a goal immediately�
���� PROLOG AND THEOREM PROVING ���
In the case of a rule term �� term�� � � � � termn�� if the goal matches term� thenProlog needs to solve each termi as a subgoal in order to �nish the original goal�
However� goals and rules do not have to be exactly the same� Instead� Prologassigns variables in both the goal and the rule to make them match up� a processknown as uni�cation� This means that we can end up proving a special case ofthe original goal� e�g� P fX�� instead of P Y �� For example�
� To unify fgX�� Y � and fga�� X� we can set X � a and Y � a� Thenboth terms are fga�� a��
� To unify fa�X� Y � and fX� a� Z� we can set X � a and Y � Z� and thenboth terms are fa� a� Z��
� It is impossible to unify fX� and X�
In general� uni�ers are not unique� For example� in the second example wecould choose to set Y � fb� and Z � fb�� However one can always choose onethat is most general� i�e� any other uni�cation can be reached from it by furtherinstantiation compare most general types in ML�� In order to �nd it� roughlyspeaking� one need only descend the two terms recursively in parallel� and on�nding a variable on either side� assigns it to whatever the term on the otherside is� One also needs to check that the variable hasn t already been assignedto something else� and that it doesn t occur in the term being assigned to it asin the last example above�� A simple implementation of this idea follows� Wemaintain an association list giving instantiations already made� and we look eachvariable up to see if it is already assigned before proceeding� We use the existinginstantiations as an accumulator�
let rec unify tm� tm� insts �match tm� withVar�x� �
�try let tm�� � assoc x insts inunify tm�� tm� insts
with Not�found �augment �x�tm�� insts�
� Fn�f��args�� �match tm� withVar�y� ��try let tm�� � assoc y insts in
unify tm� tm�� instswith Not�found �
augment �y�tm�� insts�� Fn�f��args�� �
if f� � f�then itlist� unify args� args� instselse raise �error �functions do not match����
��� CHAPTER �� EXAMPLES
where the instantiation lists need to be augmented with some care to avoid theso�called �occurs check problem� We must disallow instantiation of X to a non�trivial term involving X� as in the third example above� Most real Prologs ignorethis� either for claimed� e�ciency reasons or in order to allow weird cyclic datas�tructures instead of simple �rst order terms�
let rec occurs�in x �fun �Var y� � x � y� �Fn���args�� � exists �occurs�in x� args��
let rec subst insts � fun�Var y� � �try assoc y insts with Not�found � tm�
� �Fn�f�args�� � Fn�f�map �subst insts� args���
let raw�augment �let augment� theta �x�s� �let s� � subst theta s inif occurs�in x s " not�s � Var�x��then raise �error �Occurs check��else �x�s�� in
fun p insts � p���map �augment� p!� insts���
let augment �v�t� insts �let t� � subst insts t in match t� withVar�w� � if w �� v then
if w � v then instselse raw�augment �v�t�� insts
else raw�augment �w�Var�v�� insts� � � if occurs�in v t�
then raise �error �Occurs check��else raw�augment �v�t�� insts��
���� Backtracking
Prolog proceeds by depth��rst search� but it may backtrack� even if a rule uni��es successfully� if all the remaining goals cannot be solved under the resultinginstantiations� then another initial rule is tried� Thus we consider the whole listof goals rather than one at a time� to give the right control strategy�
���� PROLOG AND THEOREM PROVING ���
let rec first f �fun ! � raise �error �No rules applicable��� �h��t� � try f h with error � � first f t��
let rec expand n rules insts goals �first �fun rule �if goals � ! then insts elselet conc�asms �
rename�rule �string�of�int n� rule inlet insts� � unify conc �hd goals� insts inlet local�global � partition
�fun �v��� � occurs�in v conc orexists �occurs�in v� asms� insts� in
let goals� � �map �subst local� asms� '�tl goals� in
expand �n � �� rules global goals�� rules��
Here we use a function �rename to generate fresh variables for the rules eachtime�
let rec rename s �fun �Var v� � Var��(��v�s�� �Fn�f�args�� � Fn�f�map �rename s� args���
let rename�rule s �conc�asms� ��rename s conc�map �rename s� asms���
Finally� we can package everything together in a function prolog that triesto solve a given goal using the given rules�
type outcome � No � Yes of �string � term� list��
let prolog rules goal �try let insts � expand rules ! goal! in
Yes�filter �fun �v��� � occurs�in v goal�insts�
with error � � No��
This says either that the goal cannot be solved� or that it can be solved withthe given instantiations� Note that we only return one answer in this latter case�but this is easy to change if desired�
���� Examples
We can use the Prolog interpreter just written to try some simple examples fromProlog textbooks� For example�
�� CHAPTER �� EXAMPLES
let rules � parse�rules�male�albert��male�edward��female�alice��female�victoria��parents�edward�victoria�albert��parents�alice�victoria�albert��sister�of�X�Y� ��female�X��parents�X�M�F��parents�Y�M�F�����
rules � �term � term list� list � %male�albert�%� !� %male�edward�%� !�%female�alice�%� !� %female�victoria�%� !�%parents�edward�victoria�albert�%� !�%parents�alice�victoria�albert�%� !�%sister�of�X�Y�%� %female�X�%� %parents�X�M�F�%� %parents�Y�M�F�%!!
prolog rules ��sister�of�alice�edward������ � outcome � Yes ! prolog rules �parse�term �sister�of�alice�X������ � outcome � Yes �X�� %edward%! prolog rules �parse�term �sister�of�X�Y������ � outcome � Yes �Y�� %edward%� �X�� %alice%!
The following are similar to some elementary ML list operations� Since Prologis relational rather than functional� it is possible to use Prolog queries in a more�exible way� e�g� ask what arguments would give a certain result� rather thanvice versa�
let r � parse�rules�append� !�L�L��append� H�T!�L� H�A!� �� append�T�L�A�����
r � �term � term list� list � %append� !�L�L�%� !�%append�H � T�L�H � A�%� %append�T�L�A�%!!
prolog r �parse�term �append� ���!� �!� �����!������ � outcome � Yes ! prolog r �parse�term �append� ���!� ���!�X������ � outcome � Yes �X�� %� � �� � �� � �� � !���%! prolog r �parse�term �append� ���!�X�X������ � outcome � No prolog r �parse�term �append� ���!�X�Y������ � outcome � Yes �Y�� %� � �� � X�%!
In such cases� Prolog seems to be showing an impressive degree of intelligence�However under the surface it is just using a simple search strategy� and this caneasily be thwarted� For example� the following loops inde�nitely�
prolog r �parse�term �append�X� ���!�X�����
���� PROLOG AND THEOREM PROVING ���
���� Theorem proving
Prolog is acting as a simple theorem prover� using a database of logical factsthe rules� in order to prove a goal� However it is rather limited in the factsit can prove� partly because its depth��rst search strategy is incomplete� andpartly because it can only make logical deductions in certain patterns� It ispossible to make Prolog�like systems that are more powerful� e�g� see Stickel���� In what follows� we will just show how to build a more capable theoremprover using essentially similar technology� including the uni�cation code and anidentical backtracking strategy�
In particular� uni�cation is an e�ective way of deciding how to specializeuniversally quanti�ed variables� For example� given the facts that X� pX� �qX� and pfa��� we can unify the two expressions involving p and thus discoverthat we need to set X to fa�� By contrast� the very earliest theorem proverstried all possible terms built up from the available constants and functions the�Herbrand base ��
Usually� depth��rst search would go into an in�nite loop� so we need to modifythe Prolog strategy slightly� We will use depth �rst iterative deepening� Thismeans that the search depth has a hard limit� and attempts to exceed it causebacktracking� However� if no proof is found at a given depth� the bound isincreased and another attempt is made� Thus� �rst one searches for proofs ofdepth �� and if that fails� searches for one of depth �� then depth � and so on�For �depth one can use various parameters� e�g� the height or overall size of thesearch tree� we will use the number of uni�able variables introduced�
Manipulating formulas
We will simply use our standard �rst order terms to denote formulas� introducingnew constants for the logical operators� Many of these are written in�x�
Operator Meaning'p� not pp " q p and qp # q p or qp ��� q p implies qp ��� q p if and only if qforallX�p� for all X� pexistsX�p� there exists an X such that p
An alternative would be to introduce a separate type of formulas� but thiswould require separate parsing and printing support� We will avoid this� for thesake of simplicity�
��� CHAPTER �� EXAMPLES
Preprocessing formulas
It s convenient if the main part of the prover need not cope with implicationsand �if and only if s� Therefore we �rst de�ne a function that eliminates these infavour of the other connectives�
let rec proc tm �match tm withFn��(�� t!� � Fn��(�� proc t!�
� Fn��"�� t�� t�!� � Fn��"�� proc t�� proc t�!�� Fn����� t�� t�!� � Fn����� proc t�� proc t�!�� Fn������ t�� t�!� �
proc �Fn����� Fn��(�� t�!�� t�!��� Fn������ t�� t�!� �
proc �Fn��"�� Fn������ t�� t�!��Fn������ t�� t�!�!��
� Fn��forall�� x� t!� � Fn��forall�� x� proc t!�� Fn��exists�� x� t!� � Fn��exists�� x� proc t!�� t � t��
The next step is to push the negations down the formula� putting it into so�called �negation normal form NNF�� We de�ne two mutually recursive functionsthat create NNF for a formula� and its negation�
let rec nnf�p tm �match tm withFn��(�� t!� � nnf�n t
� Fn��"�� t�� t�!� � Fn��"�� nnf�p t�� nnf�p t�!�� Fn����� t�� t�!� � Fn����� nnf�p t�� nnf�p t�!�� Fn��forall�� x� t!� � Fn��forall�� x� nnf�p t!�� Fn��exists�� x� t!� � Fn��exists�� x� nnf�p t!�� t � t
and nnf�n tm �match tm withFn��(�� t!� � nnf�p t
� Fn��"�� t�� t�!� � Fn����� nnf�n t�� nnf�n t�!�� Fn����� t�� t�!� � Fn��"�� nnf�n t�� nnf�n t�!�� Fn��forall�� x� t!� � Fn��exists�� x� nnf�n t!�� Fn��exists�� x� t!� � Fn��forall�� x� nnf�n t!�� t � Fn��(�� t!���
We will convert the negation of the input formula into negation normal form�and the main prover will then try to derive a contradiction from it� This willsu�ce to prove the original formula�
The main prover
At each stage� the prover has a current formula� a list of formulas yet to beconsidered� and a list of literals� It is trying to reach a contradiction� Thefollowing strategy is employed�
���� PROLOG AND THEOREM PROVING ���
� If the current formula is �p " q � then consider �p and �q separately� i�e�make �p the current formula and add �q to the formulas to be considered�
� If the current formula is �p # q � then try to get a contradiction from �p and then one from �q �
� If the current formula is �forallX�p� � invent a new variable to replace �X �the right value can be discovered later by uni�cation�
� If the current formula is �existsX�p� � invent a new constant to replace �X �
� Otherwise� the formula must be a literal� so try to unify it with a contra�dictory literal�
� If this fails� add it to the list of literals� and proceed with the next formula�
We desire a similar backtracking strategy to Prolog� only if the current in�stantiation allow all remaining goals to be solved do we accept it� We could uselists again� but instead we use continuations� A continuation is a function thatis passed to another function and can be called from within it to �perform therest of the computation � In our case� it takes a list of instantiations and tries tosolve the remaining goals under these instantiations� Thus� rather than explicitlytrying to solve all remaining goals� we simply try calling the continuation�
let rec prove fm unexp pl nl n cont i �if n � then raise �error �No proof�� elsematch fm withFn��"�� p�q!� �
prove p �q��unexp� pl nl n cont i� Fn����� p�q!� �
prove p unexp pl nl n�prove q unexp pl nl n cont� i
� Fn��forall�� Var x� p!� �let v � mkvar�� inprove �subst x�Var v! p� �unexp' fm!� pl nl �n � �� cont i
� Fn��exists�� Var x� p!� �let v � mkvar�� inprove �subst x�Fn�v� !�! p� unexp pl nl �n � �� cont i
� Fn��(�� t!� ��try first �fun t� � let i� � unify t t� i in
cont i�� plwith error � �
prove �hd unexp� �tl unexp� pl �t��nl� n cont i�� t �
�try first �fun t� � let i� � unify t t� i incont i�� nl
with error � �prove �hd unexp� �tl unexp� �t��pl� nl n cont i���
��� CHAPTER �� EXAMPLES
We set up the �nal prover as follows�
let prover �let rec prove�iter n t �try let insts � prove t ! ! ! n I ! in
let globinsts � filter�fun �v��� � occurs�in v t� insts in
n�globinstswith error � � prove�iter �n � �� t in
fun t � prove�iter �nnf�n�proc�parse�term t�����
This implements the iterative deepening strategy� It tries to �nd the proofwith the fewest universal instantiations� if it succeeds� it returns the number re�quired and any toplevel instantiations� throwing away instantiations for internallycreated variables�
Examples
Here are some simple examples from Pelletier �����
let P� � prover �p �� q �� (�q� �� (�p����P� � int � �string � term� list � � ! let P�� � prover �p � q " r �� �p � q� " �p � r����P�� � int � �string � term� list � � ! let P�� � prover ��p �� q� � �q �� p����P�� � int � �string � term� list � � ! let P�� � prover �exists�Y�forall�X�p�Y���p�X������P�� � int � �string � term� list � �� ! let P�� � prover �exists�X�forall�Y�forall�Z�
�p�Y���q�Z����p�X���q�X�������P�� � int � �string � term� list � �� !
A bigger example is the following�
let P�� � prover�lives�agatha� " lives�butler� " lives�charles� "�killed�agatha�agatha� � killed�butler�agatha� �killed�charles�agatha�� "
�forall�X�forall�Y�killed�X�Y� �� hates�X�Y� " (�richer�X�Y����� "
�forall�X�hates�agatha�X��� (�hates�charles�X���� "
�hates�agatha�agatha� " hates�agatha�charles�� "�forall�X�lives�X� " (�richer�X�agatha��
�� hates�butler�X��� "�forall�X�hates�agatha�X� �� hates�butler�X��� "�forall�X�(�hates�X�agatha�� � (�hates�X�butler��
� (�hates�X�charles������ killed�agatha�agatha����
P�� � int � �string � term� list � �� !
���� PROLOG AND THEOREM PROVING ���
In fact� the prover can work out �whodunit �
let P��� � prover�lives�agatha� " lives�butler� " lives�charles� "�killed�agatha�agatha� � killed�butler�agatha� �killed�charles�agatha�� "�forall�X�forall�Y�
killed�X�Y� �� hates�X�Y� " (�richer�X�Y����� "�forall�X�hates�agatha�X�
�� (�hates�charles�X���� "�hates�agatha�agatha� " hates�agatha�charles�� "�forall�X�lives�X� " (�richer�X�agatha��
�� hates�butler�X��� "�forall�X�hates�agatha�X� �� hates�butler�X��� "�forall�X�(�hates�X�agatha�� � (�hates�X�butler��
� (�hates�X�charles������ killed�X�agatha����
P��� � int � �string � term� list � �� �X�� %agatha%!
Further reading
Symbolic di�erentiation is a classic application of functional languages� Othersymbolic operations are major research issues in their own right � Davenport�Siret� and Tournier ��� give a readable overview of what computer algebrasystems can do and how they work� Paulson ���� discusses the kind of simpli��cation strategy we use here in a much more general setting�
Parsing with higher order functions is another popular example� It seems tohave existed in the functional programming folklore for a long time� an earlytreatment is given by Burge ������ Our presentation has been in�uenced by thetreatments in Paulson ����� and Reade �����
The �rst de�nition of �computable real number in our sense was in the orig�inal paper of Turing ������ His approach was based on decimal expansions�but he needed to modify it Turing ����� for reasons explored in the exercisesbelow� The approach to real arithmetic given here follows the work of Boehm�Cartwright� O Donnel� and Riggle ����� More recently a high�performanceimplementation in CAML Light has been written by M,enissier�Morain ������whose thesis in French� contains detailed proofs of correctness for all the al�gorithms for elementary transcendental functions� For an alternative approachusing �linear fractional transformations � see Potts ������
Our approach to Prolog search and backtracking� either using lists or contin�uations� is fairly standard� For more on implementing Prolog� see for exampleBoizumault ������ while for more on the actual use of the language� the classictext by Clocksin and Mellish ���� is recommended� A detailed discussion ofcontinuations in their many guises is given by Reynolds ������ The uni�cationalgorithm given here is simple and purely functional� but rather ine�cient� For
��� CHAPTER �� EXAMPLES
faster imperative algorithms� see Martelli and Montanari ����� Our theorem
prover is based on leanTAP Beckert and Posegga ������ For another importanttheorem proving method conveniently based on Prolog�style search� look at theProlog Technology Theorem Prover Stickel ����
Exercises
�� Modify the printer so that each operator has an associativity� used whenprinting iterated cases of the operator to avoid excessive parentheses�
�� The di�erentiation rules for inverses ��gx� and quotients fx��gx� arenot valid when gx� � �� Several others for functions like ln are also trueonly under certain conditions� Actually most commercial computer algebrasystems ignore this fact� However� try to do better� by writing a new versionof the di�erentiator that not only produces the derivative but also a list ofconditions that must be assumed for the derivative to be valid�
�� Try programming a simple procedure for inde�nite integration� In general�it will not be able to do every integral� but try to make it understand a fewbasic principles��
�� Read the documentation for the format library� and try to implement aprinter that behaves better when the output is longer than one line�
�� What happens if the parser functions like term� atom etc� are all eta�reducedby deleting the word input� What happens if precedence is consistentlyeta�reduced by deleting input�
�� How well do precedence parsers generated by precedence treat distinctoperators with the same precedence� How can the behaviour be improved�Can you extend it to allow a mix of left and right associative operators�For example� one really wants subtraction to associate to the left�
�� Rewrite the parser so that it gives helpful error messages when parsing fails�
� Try representing a real number by the function that generates the �rst ndigits in a decimal expansion� Can you implement the addition function�
�� �� How can you retain a positional representation while making the basicoperations computable�
�In fact there are algorithms that can integrate all algebraic expressions �not involving sin ln etc� � see Davenport ������
���� PROLOG AND THEOREM PROVING ���
��� Implement multiprecision integer arithmetic operations yourself� Make yourmultiplication algorithm Onlog��� where n is the number of digits in thearguments�
��� Using memo as we have de�ned it� is it the case that whatever series ofarguments a function is applied to� it will always give the same answer forthe same argument� That is� are the functions true mathematical functionsdespite the internal use of references�
��� Write a parser�evaluator to read real�valued expressions and create the func�tional representation for the answer�
��� �� Extend the real arithmetic routines to some functions like exp and sin�
��� Our preprocessing and negation normal form process can take exponentialtime in the length of the input in the case of many �if and only if s� Modifyit so that it is linear in this number�
��� Extend the initial transformation into negation normal form so that it givesSkolem normal form�
��� �� Implement a more e�cient uni�cation algorithm� e�g� following Martelliand Montanari �����
��� �� Implement a version of the Prolog Technology Theorem Prover Stickel���
�Skolem normal form is covered in most elementary logic books e�g� Enderton ������
Bibliography
Aagaard� M� and Leeser� M� ����� Verifying a logic synthesis tool in Nuprl� Acase study in software veri�cation� In v� Bochmann� G� and Probst� D� K�eds��� Computer Aided Veri�cation� Proceedings of the Fourth InternationalWorkshop� CAV��� Volume ��� of Lecture Notes in Computer Science� Mon�treal� Canada� pp� ��-�� Springer Verlag�
Abramsky� S� ����� The lazy lambda�calculus� In Turner� D� A� ed��� ResearchTopics in Functional Programming� Year of Programming series� pp� ��-����Addison�Wesley�
Adel son�Vel skii� G� M� and Landis� E� M� ����� An algorithm for the organi�zation of information� Soviet Mathematics Doklady � �� ����-�����
Backus� J� ���� Can programming be liberated from the von Neumann style�A functional style and its algebra of programs� Communications of the ACM ���� ���-����
Barendregt� H� P� ���� The Lambda Calculus� Its Syntax and Semantics� Vol�ume ��� of Studies in Logic and the Foundations of Mathematics� North�Holland�
Barwise� J� ���� Mathematical proofs of computer correctness� Notices of theAmerican Mathematical Society � �� ��-���
Beckert� B� and Posegga� J� ����� leanTAP � Lean� tableau�based deduction� Jour�nal of Automated Reasoning � �� ���-��� Also available on the Web fromhttp���i�www�ira�uka�de�'posegga�LeanTaP�ps�Z�
Boehm� H� J�� Cartwright� R�� O Donnel� M� J�� and Riggle� M� ���� Exact realarithmetic� a case study in higher order programming� In Conference Record ofthe ���� ACM Symposium on LISP and Functional Programming� pp� ���-����Association for Computing Machinery�
Boizumault� P� ����� The implementation of Prolog� Princeton series incomputer science� Princeton University Press� Translated from �Prolog�l implantation by A� M� Djamboulian and J� Fattouh�
���
BIBLIOGRAPHY ���
Boyer� R� S� and Moore� J� S� ����� A Computational Logic� ACM MonographSeries� Academic Press�
Burge� W� H� ����� Recursive Programming Techniques� Addison�Wesley�
Church� A� ����� An unsolvable problem of elementary number�theory� Ameri�can Journal of Mathematics� � ���-����
Church� A� ����� A formulation of the Simple Theory of Types� Journal ofSymbolic Logic� � ��-��
Church� A� ����� The calculi of lambda�conversion� Volume � of Annals of Math�ematics Studies� Princeton University Press�
Clocksin� W� F� and Mellish� C� S� ���� Programming in Prolog �rd ed���Springer�Verlag�
Curry� H� B� ����� Grundlagen der Kombinatorischen Logik� American Journalof Mathematics� �� ���-���� ��-���
Davenport� J� H� ���� On the integration of algebraic functions� Volume ��� ofLecture Notes in Computer Science� Springer�Verlag�
Davenport� J� H�� Siret� Y�� and Tournier� E� ��� Computer algebra� systemsand algorithms for algebraic computation� Academic Press�
Davis� M� D�� Sigal� R�� and Weyuker� E� J� ����� Computability� complexity� andlanguages� fundamentals of theoretical computer science �nd ed��� AcademicPress�
de Bruijn� N� G� ����� Lambda calculus notation with nameless dummies� a toolfor automatic formula manipulation� with application to the Church�Rossertheorem� Indagationes Mathematicae� ��� ��-����
DeMillo� R�� Lipton� R�� and Perlis� A� ����� Social processes and proofs oftheorems and programs� Communications of the ACM � ��� ���-���
Dijkstra� E� W� ����� A Discipline of Programming� Prentice�Hall�
Enderton� H� B� ����� A Mathematical Introduction to Logic� Academic Press�
Frege� G� ���� Grundgesetze der Arithmetik begri�sschrift abgeleitet� Jena� Par�tial English translation by Montgomery Furth in �The basic laws of arithmetic�Exposition of the system � University of California Press� �����
Girard� J��Y�� Lafont� Y�� and Taylor� P� ���� Proofs and Types� Volume �of Cambridge Tracts in Theoretical Computer Science� Cambridge UniversityPress�
�� BIBLIOGRAPHY
Gordon� A� D� ����� Functional Programming and Input�Output� DistinguishedDissertations in Computer Science� Cambridge University Press�
Gordon� M� J� C� ��� Programming Language Theory and its Implementa�tion� applicative and imperative paradigms� Prentice�Hall International Seriesin Computer Science� Prentice�Hall�
Gordon� M� J� C�� Milner� R�� and Wadsworth� C� P� ����� Edinburgh LCF� AMechanised Logic of Computation� Volume � of Lecture Notes in ComputerScience� Springer�Verlag�
Henson� M� C� ���� Elements of functional languages� Blackwell Scienti�c�
Hindley� J� R� and Seldin� J� P� ���� Introduction to Combinators and ��Calculus� Volume � of London Mathematical Society Student Texts� CambridgeUniversity Press�
Hudak� P� ���� Conception� evolution� and application of functional program�ming languages� ACM Computing Surveys� ��� ���-����
Huet� G� ���� Con�uent reductions� abstract properties and applications toterm rewriting systems� Journal of the ACM � ��� ���-���
Kleene� S� C� ����� A theory of positive integers in formal logic� AmericanJournal of Mathematics� �� ���-���� ���-����
Lagarias� J� ���� The �x " � problem and its generalizations� TheAmerican Mathematical Monthly � �� �-��� Available on the Web ashttp���www�cecm�sfu�ca�organics�papers�lagarias�index�html�
Landin� P� J� ����� The next ��� programming languages� Communications ofthe ACM � � ���-����
Lindemann� F� ��� &Uber die Zahl �� Mathematische Annalen� ���� ���-����
Mairson� H� G� ����� Deciding ML typability is complete for deterministic expo�nential time� In Conference Record of the Seventeenth Annual ACM Symposiumon Principles of Programming Languages �POPL�� San Francisco� pp� ��-����Association for Computing Machinery�
Mairson� H� G� ����� Outline of a proof theory of parametricity� In Hughes� J�ed��� ���� ACM Symposium on Functional Programming and Computer Archi�tecture� Volume ��� of Lecture Notes in Computer Science� Harvard University�pp� ���-����
Martelli� A� and Montanari� U� ���� An e�cient uni�cation algorithm� ACMTransactions on Programming Languages and Systems� �� ��-���
BIBLIOGRAPHY ���
Mauny� M� ����� Functional programming using CAML Light� Available on theWeb from http���pauillac�inria�fr�caml�tutorial�index�html�
M,enissier�Morain� V� ����� Arithm�etique exacte� conception� algorithmiqueet performances dune impl�ementation informatique en pr�ecision arbitraire�Th.ese� Universit,e Paris ��
Michie� D� ���� #Memo$ functions and machine learning� Nature� ��� ��-���
Milner� R� ���� A theory of type polymorphism in programming� Journal ofComputer and Systems Sciences� ��� ��-����
Mycroft� A� ���� Abstract interpretation and optimising transformations ofapplicative programs� Technical report CST������ Computer Science Depart�ment� Edinburgh University� King s Buildings� May�eld Road� Edinburgh EH��JZ� UK�
Neumann� P� G� ����� Computer�related risks� Addison�Wesley�
Oppen� D� ���� Prettyprinting� ACM Transactions on Programming Languagesand Systems� �� ���-���
Paulson� L� C� ���� A higher�order implementation of rewriting� Science ofComputer Programming � �� ���-����
Paulson� L� C� ����� ML for the Working Programmer� Cambridge UniversityPress�
Pelletier� F� J� ���� Seventy��ve problems for testing automatic theoremprovers� Journal of Automated Reasoning � �� ���-���� Errata� JAR � �������-����
Peterson� I� ����� Fatal Defect � Chasing Killer Computer Bugs� Arrow�
Potts� P� ����� Computable real arithmetic using linear fractional trans�formations� Unpublished draft for PhD thesis� available on the Web ashttp���theory�doc�ic�ac�uk�'pjp�pub�phd�draft�ps�gz�
Raphael� B� ����� The structure of programming languages� Communicationsof the ACM � � ���-����
Reade� C� ���� Elements of Functional Programming� Addison�Wesley�
Reynolds� J� C� ����� The discoveries of continuations� Lisp and Symbolic Com�putation� �� ���-����
��� BIBLIOGRAPHY
Robinson� J� A� ����� Logic� computers� Turing and von Neumann� In Furukawa�K�� Michie� D�� and Muggleton� S� eds��� Machine Intelligence ��� pp� �-���Clarendon Press�
Sch&on�nkel� M� ����� &Uber die Bausteine der mathematischen Logik� Mathe�matische Annalen� �� ���-���� English translation� �On the building blocksof mathematical logic in van Heijenoort ������ pp� ���-����
Schwichtenberg� H� ����� De�nierbare Funktionen im ��Kalk&ul mit Typen�Arkhiv f�ur mathematische Logik und Grundlagenforschung � ��� ���-����
Stickel� M� E� ��� A Prolog Technology Theorem Prover� Implementation byan extended Prolog compiler� Journal of Automated Reasoning � �� ���-���
Turing� A� M� ����� On computable numbers� with an application to theEntscheidungsproblem� Proceedings of the London Mathematical Society ������� ���-����
Turing� A� M� ����� Correction to Turing ������ Proceedings of the LondonMathematical Society ���� ��� ���-����
van Heijenoort� J� ed�� ����� From Frege to G�odel� A Source Book in Mathe�matical Logic ���������� Harvard University Press�
Whitehead� A� N� ����� An Introduction to Mathematics� Williams and Norgate�
Whitehead� A� N� and Russell� B� ����� Principia Mathematica �� vols�� Cam�bridge University Press�
Winskel� G� ����� The formal semantics of programming languages� an intro�duction� Foundations of computing� MIT Press�
Wittgenstein� L� ����� Tractatus Logico�Philosophicus� Routledge ) KeganPaul�
Wright� A� ����� Polymorphism for imperative languages without imperativetypes� Technical Report TR������� Rice University�