programming methodology - cornell · pdf fileprogramming methodology making a science out of...

PROGRAMMING METHODOLOGYMaking a Science Out of an Art

by David Gries and Fred B. SchneiderIt doesn 't take too long for an intel-ligent, scientifically oriented person tolearn to cobble programs together inFORTRAN, BASIC, or Pascal Sure,there are mistakes, but everyone makesmistakes, so one simply spends thenecessary time debugging. And one getsbetter at programming simply by doinglots of it. So what do they teach aboutprogramming? What is there to it?

This attitude is common and mayeven be reasonable for casual pro-gramming. For any serious program-ming, however, it invites disaster. Acasual program bears little resemblanceto the system of a thousand to a millionlines of codes that a professional mustbe able to write (and read) in concertwith as many as fifty other people. Sucha program must be correct, as simple aspossible, and capable of being readilyunderstood, modified, and used byothers.

How does one write programs thatsatisfy these requirements? That was thesubject of a NATO conference held in

23 Germany in 1968. At the conference the

term software crisis was often heard, forthere was indeed a crisis. The pro-gramming industry was being asked todevelop larger and more complicatedsystems of programs, and they didn'thave the expertise to do so effectively.The conference led to world-wide re-cognition that programming was indeeda difficult intellectual activity. The termsoftware engineering was coined thereto denote the collection of technical andmanagerial techniques used in the"software life cycle"—in the planning,analysis, design, implementation, test-ing, documentation, distribution, andmaintenance of a programming system—and research in all these aspects beganin earnest.

A SCIENCE CONCERNEDWITH MENTAL TOOLSAt Cornell, prompted partly by our lackof understanding of how to teachprogramming, we became involved inthe study of methods for developing andunderstanding programs, a field thathas become known as programmingmethodology.

Programming methodology has been

a central theme in the Cornell de-partment for fifteen years and hasinfluenced our work in other areas. Forexample, ideas about the process ofprogram development influence thoughton compiler construction, programming-language design, structured editors,debugging tools, "pretty printers"(which print a program in an indentedformat in accordance with the programstructure), and computer verification ofthe correctness of a program or, indeed,of any mathematical proof. Theserelated areas deal with supplementaltools used by the programmer; pro-gramming methodology in its narrowestsense is more concerned with the mentaltools that are needed.

Research done so far has convincedus that programming can become ascience, based on the knowledge andapplication of principles, rather than anart, which can be learned simply bywatching and doing. We have dis-covered that programming at its best isa mathematical activity, requiring fromthe programmer all the taste, elegance,and desire for simplicity that char-acterizes mathematicians. Our exper-

ience has greatly influenced how weteach programming and how we presentalgorithms in higher-level courses. Sofar, most of the research has dealt withsmall programs, but larger ones arebeing considered.

In this article we describe some of thebasic ideas involved in programmingmethodology. We use only one smallexample, but this should be enough towhet your appetite for more. Towardthe end we present a couple of problemswith the solutions we developed; weencourage you to try to solve thembefore looking at the solutions.

TAMING COMPLEXITY:THE FIRST NECESSITYAs any programmer will tell you, even aten-line program can be complex anddifficult to understand. Think, then, ofthe complexity of a ten-thousand-lineprogram! Somehow, the programmermust master the complexity, mustprevent it from rearing its ugly head.

The amount of work required tounderstand a program must be pro-portional to its length. This will only bethe case if the program structure andthe interactions between the programsegments are kept simple. And thelonger the program, the more importantit is to keep things simple. Computerscience already has a branch calledcomputational complexity, in contrast,we like to call the field of programmingmethodology computational simplicity.

How do we achieve simplicity? Thegeneral method is to introduce suitablenotations and use abstraction: variousaspects of a problem are brought to thefore and others are hidden in thebackground to be dealt with later. Newformalisms are developed, along with

notations that allow the expression ofconcepts and the manipulation offormulas in various ways in order toprove things about them. In essence,mathematics is used, as in any scientificfield, to master complexity.

In our research we have turnedmostly to formal logic to help usdetermine what is meant by correctnessof a program, for without knowing that,it is difficult to write correct programs.This has led to definitions of pro-gramming languages in terms of correct-ness rather than in terms of how aprogram is executed. And from thesemathematical definitions, theories andprinciples for developing programs havearisen.

This does not mean that everyprogram must be developed and provedcorrect in a formal manner. It doesmean, however, that the programmerwith a sound knowledge of the theoryand principles behind program correct-ness and program development can usethem in an informal manner, relying onthe formalism when it is needed—whenthe problems become more complex.

WHAT DOES PROGRAMCORRECTNESS MEAN?A program (or a segment of one) iscorrect if its execution, begun in any"reasonable" state, ends in a desiredfinal state. That is: if its input variableshave proper values, then so will itsoutput variables.

We describe sets of reasonable ordesired states by true-false statements,called assertions, about the programvariables. To illustrate, let us supposewe want a program S to store in anarray the cubes of the first n naturalnumbers, where integer value n is at

least 0 and the array is denoted byb[0..n-l]. (By convention, if n = 0 thearray is assumed to be empty.) Forexample, if we execute the programwith n = 4, the resulting array will beb[0] = 0, b[l] = 1, b[2] = 8, b[3] = 27.Below, we specify S by giving aprecondition P that describes the set ofpossible initial states and a post-condition R that describes the correspond-ing final states. In assertion R, thephrase 0 < i < n means we are interestedonly in integers at least 0 and less thann; for such integers i, b[i] = i3.

P:n>0R: (for all i: 0 < i < n: b\i\ - P)

We say that S is correct with respect toP and /?, written as {P} S {R}, ifexecution of S begun in a state in whichP is true terminates in a state in whichR is true. Nothing is said about executionof S begun in a state in which P is nottrue.

HOW CAN CORRECTNESSBE PROVED?It is difficult to prove {P} S {R} usingonly our operational understanding ofhow S is executed. Given some initialstate, we can execute the program byhand (or let the computer do it) todetermine what the final state is, but toprove correctness using this approachwe would have to execute the programonce for each possible initial state, andmost of us don't have time for that! No,a way must be found that allows us todeduce correctness without relying onthe notion of execution, and this callsfor a mathematical theory of correct-ness. For each kind of statement, weneed a definition that gives the pairs ofpre- and post-conditions related by it. 24

The theory will tell us, for example, thatthe following are true about the as-signments x :- 0 and x :- x + 1 to integervariable JC:

{OV = 0} JC := 0 {x^y = 0},{x+1 > 0 } JC := jc+l { J C > 0 } .

(Note that ()• y - 0 is always true, so thatprecondition is equivalent to true.Similarly, the second precondition isequivalent to JC > 0.)

The possible pre- and post-conditionsfor a statement should be related by asimple syntactic transformation, andnot only by meaning, so that one reallycan manipulate statements the way onedoes arithmetic or logical statements.For example, the statement JC := ey

which assigns the value of expression eto variable JC, is defined by the rule

{Rj!} x := e {R}(for all assertions R).

In this expression R£ is the assertionobtained by simultaneously replacingevery occurrence of "JC" in R by *V\Thus, given that R is to be true afterexecution of JC := e, we can determineeasily what has to be true beforeexecution: R*. We see that this holdsfor the two examples given above. Forexample, in

{P: 0\y = 0} JC := 0 {R: x\y = 0},

P is the result of substituting 0 for x inR.

It is rather neat that this simplenotion of textual substitution, which isa basic concept of mathematical logic,can be used so simply to define theassignment statement.

Other statements are defined similarly.For example, sequencing of two state-

25 ments SO and S\ is defined:

Problem 1COMPUTING CUBES

Write a program to store the cubes ofthe first n natural numbers in arrayb[0..n-\]. Use only addition oper-ations. (See page 26 for a solution.)

Problem 2THE MAXIMUM-SUM

SEGMENT

Suppose we are given integer arrayb[0..n~\] for n > 0. Let 5fJ denote thesum of the values of segment b[i\.j-\].(If / =j, the segment is empty and thesum is 0.) Write a program to store invariable s the largest sum SQ over allsegments b[i..j~\] of array b[0..n-l].(See page 27 for a solution.)

U {P} SO {Q} SI { lthen {P} SO; SI {R}(for any assertions P, Q, and R).

This definition allows us to compute theprecondition for a sequence of assign-ments simply by beginning with thepostcondition and iteratively working"backward", using the assignment state-ment definition. For example, it allowsus to prove that the sequence t := JC; JC .:=y\ y :- t exchanges the values ofvariables x and y. (Below, X and Ydenote the final values of JC and y,respectively.)

{y = X and x = Y}t := JC;

{v = X and t = Y}x :- y\{x = X and t = Y]

{JC = X and v = Y}

WHAT ABOUTPROGRAM DEVELOPMENT?Proving a program correct after it hasbeen written is difficult. It makes moresense to develop a program and itscorrectness proof hand-in-hand, withthe proof leading the way. When doingthis, it is important to write the programspecification as precisely as possible (interms of pre- and post-conditions)because the specification should driveprogram development. To convey thisidea through an example, we willconsider again the problem of writing aprogram to store the cubes of the first nnatural numbers in array b[0..n-l], withthe restriction that since exponentiationand multiplication are expensive, onlyadditive operations should be used inthe program. Try writing a program forPROBLEM 1 yourself before lookingat our development on the followingpage.

In the event that you have a littletrouble, we should point out thatSOLUTION 1 was developed usingvarious principles of programmingmethodology that have only been out-lined here. Naturally, you might havedifficulty applying them yourself at thispoint. However, any programmer wellversed in the methodology would deriveessentially the same program as the onein SOLUTION 1 in perhaps twentyminutes. How did your solutioncompare?

We give one more example withoutthe program development. Try todevelop PROBLEM 2 yourself beforereading our solution on a followingpage.

The program for PROBLEM 2 hasan interesting history. Jon Bentley atCarnegie-Mellon and Bell Laboratories

SOLUTION TO PROBLEM 1

The first step is to specify the program formally by writingpre-and post-conditions:

Precondition P: 0 < nPostcondition R: (for all i: 0 < i < n: b[i\ - i3)

Assuming the use of a loop to calculate the elements of array b,our correctness ideas require writing an assertion that indicateswhat is true of b just before and after each iteration of the loop.To find this assertion, we introduce a fresh variable k (say),which in this case will be what is often called a "loop counter",put suitable bounds on it, and replace n in R by k, yielding thefollowing two assertions PO and PI:

PO: 0 < k < nPI: (for all /: 0 < i < k: b[i\ = f)

We can make PO and PI true by setting k to 0. Also, whenk - n, R is true. And we write the following program:

fc:=0;while k ^ n do begin b[k] := k3;

k := k + 1end

PO and P\ are known as loop invariants, for they are"invariantly true" before and after each iteration of the loop.One understands the loop in four steps: (0) show that the loopinvariants are true just before execution of the loop; (1) showthat each iteration leaves them true, so that they are true beforeand after each iteration and thus upon loop termination; (2)show that the desired result R follows from the loop invariantsand the falsity of the loop condition; and (3) show that the loopterminates.

The process used here to argue about correctness should beused to argue about the correctness of every nontrivial loop. Itis the programmer's task to annotate each nontrivial loop withthe necessary loop invariants, because they are a necessary partof understanding the loop. It may seem like a lot of work, but,we maintain, it is simply formalizing what a programmer doesanyway when reasoning about why a loop works.

We now have a correct program. However, it uses

exponentiation k3. To get rid of it, we simply introduce a freshvariable x (say) and its definition:

PI: x = k3

With this new loop invariant, we can replace the assignmentb[k] := k3 by b[k] := x. However, PI also depends on k. Beforek is increased by 1, x has to be changed to contain (k + I)3 inorder to maintain the truth of PI. Since

(k + I)3 = k3 + 3*k2 + 3*k+ \

and since x - A:3 just before each iteration, x can be changed tocontain (k + I)3 by executing the assignment

x := x+ 3*k2 + 3*k + 1.

This yields the program

k := 0; x := 0;while k ¥=• n do begin b[k] := JC;

x := JC + 3«&2 + 3*£+ 1;k := k + 1

end

The program still contains exponentiation and multiplication.Repeating (twice) the process that introduced x yields theprogram

k :=0; JC := 0; j> := 1; z := 6;while k ¥" n do begin b[k] :- x;

x := x + y;y - y + z\z := z + 6;k := k + 1

end

where there are five loop invariants:

P0: 0 < k < nPI: (for all i: 0 < i < k: b[i\ = P)Fl:x^k3

P3: y - 3*k2 + 3*k + 1P4: z = 6*k + 6

26

SOLUTION TO PROBLEM 2

The formal specification, using somenotation that should be fairly obvious,

Precondition P: 0 < nPostcondition R: s -

MAX(iJ: 0 < i <j < n: Sitj)

The program is

k := 0; c := 0; s := 0;while & # « do begin

c := mfl.v(c + b[kl 0);5 := wa;c(5, c);A: := A: + 1

end

The program is understood in amanner similar to that described forunderstanding the cube program, usingthe following loop invariants:

P0: 0 < k < nP\: 5 = MAX(i,j: 0 < i < y < A: S,v)

(i: 0 < i < A: Sa)

discovered that a statistician was usinga program for this problem that re-quired time proportional to n3 —theprogram was actually computing thesums of all segments of the array.Several days later, Bentley returnedwith an algorithm that required timeproportional to A?2, and later one thatrequired time nm\og(n). Another statis-tician then showed Bentley a program,similar to ours, that required timeproportional only to n. During a visitto Cornell, Bentley asked us to write a

27 program for his problem. Several of us

who were experienced with the pro-gramming methodology came up withthe program, independently, in about ahalf-hour. (Of course, it took time topresent it as cleanly as SOLUTION 2 ispresented.) We never had to think ofthe n2 or nm\og(n) algorithms; themethodology led us quite directly to thesolution we have shown here.

WHERE TO LEARN MOREABOUT PROGRAMMINGSome of the concepts underlying pro-gramming methodology have foundtheir way down to undergraduatecourses at Cornell such as CS100 andCS211, although in an informal manner,and more will do so as we gainexperience and hone our skills. Pro-gramming methodology is taught at theupperclass level in course CS400,introduced this spring, as well as at thegraduate level; both courses are basedon The Science of Programming, whichwas written by Gries in 1981. Anadditional graduate course, CS613,extends the concepts to deal withconcurrency, which arises in operatingsystems, networks, and databases, wherevarious programs are executed simul-taneously and communicate throughshared data or message-passing.Schneider is writing a text on thesubject of this course.

Because of the youth of the field,computer science enjoys the problemthat many research results are incor-porated quite rapidly into education.Last year's research problem has movedinto this year's first-year graduate courseand will be in next year's senior-levelcourse. Our students, even in the earlyundergraduate years, are learning abouta relatively new discipine as it develops.

The practicality of theoretical researchis demonstrated every day not only inadvanced computing centers, but inuniversity classrooms where tomorrow'spractitioners are learning their skills.

David Gries is chairman of the Departmentof Computer Science and a specialist inprogramming methodology (see the bio-graphical sketch on page II).

Fred B. Schneider is an associate professorin the department. A specialist in concurrentprogramming, operating systems, and dis-tributed systems, he has also written anotherarticle in this issue in collaboration withthree of his colleagues (see pages 18 22). Heis on the College Board Committee onAdvanced Placement in Computer Science,which prepares an examination that reflectsmuch of what is taught at Cornell in theintroductory courses.

programming methodology - cornell · pdf fileprogramming methodology making a science out of...

Documents