cpp - literate programming

31
C++?? A Critique of C++ 2nd Edition Ian Joyner c/- Unisys - ACUS 115 Wicks Rd, North Ryde Australia 2113 Tel: +61-2-390 1328 Fax +61-2-390-1391 [email protected] © Ian Joyner 1992

Upload: others

Post on 12-Sep-2021

21 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CPP - Literate Programming

C++??A Critique of C++

2nd Edition

Ian Joyner

c/- Unisys - ACUS115 Wicks Rd, North Ryde

Australia 2113Tel: +61-2-390 1328 Fax +61-2-390-1391

[email protected]

© Ian Joyner 1992

Page 2: CPP - Literate Programming

1. Introduction......................................................................................................................12. The Role of a Programming Language..............................................................................1

2.1. Safety and Courtesy Concerns .............................................................................34. C++ Specific Criticisms ..................................................................................................4

3.1. Virtual Functions ................................................................................................43.2. Pure Virtual Functions ........................................................................................63.3. The Nature of Inheritance....................................................................................73.4. Function Overloading .........................................................................................73.5. Virtual Classes....................................................................................................83.6. Name overloading...............................................................................................83.7. Polymorphism and Inheritance ............................................................................93.8. ‘.’ and ‘->’ .........................................................................................................103.9. Anonymous parameters in Class Definitions .......................................................103.10. Nameless Constructors ......................................................................................113.11. Constructors and Temporaries ...........................................................................113.12. Optional Parameters ..........................................................................................113.13. Bad deletions.....................................................................................................113.14. Local entity declarations....................................................................................123.15. Members ...........................................................................................................123.16. Friends .............................................................................................................123.17. Static .................................................................................................................123.18. Union ...............................................................................................................133.19. Nested Classes..................................................................................................133.20. Global Environments........................................................................................133.21. Header Files .....................................................................................................143.22. Class Interfaces.................................................................................................143.23. Class header declarations ..................................................................................143.24. Garbage Collection ...........................................................................................153.25. Type-safe linkage .............................................................................................153.26. C++ and the software lifecycle..........................................................................163.27. Reusability and Communication .......................................................................173.28. Reusability and Trust........................................................................................173.29. Reusability and Compatibility ..........................................................................173.30. Reusability and Portability................................................................................173.31. Idiomatic Programming ....................................................................................173.32.Concurrent Programming ...................................................................................18

4. The role of Language.......................................................................................................185. On Writing ......................................................................................................................206. Generic C criticisms ........................................................................................................20

6.1. Pointers...............................................................................................................216.2. Arrays.................................................................................................................216.3. Function Parameters............................................................................................226.4. void * .................................................................................................................226.5. void fn ().............................................................................................................226.6. fn ().....................................................................................................................236.7. Metadata in Strings .............................................................................................246.8. ++, -- ..................................................................................................................246.9. Defines ...............................................................................................................256.10. NULL vs 0 ........................................................................................................256.11. Case Distinction ................................................................................................256.12. Assignment Operator ........................................................................................266.13. Type Casting ....................................................................................................266.14. Semicolons.......................................................................................................27

7. Conclusions......................................................................................................................278. Bibliography ....................................................................................................................29

Page 3: CPP - Literate Programming

1. Introduction most difficult to understand and technical sectionof the paper, but it is fundamental to theunderstanding of the weaknesses of C++.The C++ programming language is becoming

widely used. So it is important and timely toquestion its success. Two books are alreadypublished on the subject [Sakkinen 92] and[Yoshida 92]. This critique addresses thefollowing questions. How well does C++implement object-oriented concepts? Can it easilyimplement small, quick projects? Does it scale upwell for large projects? Does it support or hindergood programming practices? As a result, does itease the production of quality software? What isthe relationship between a language, compiler andsoftware developers; and between the language,compiler and the target system? This last questionaddresses issues of correctness, compatibility,portability, and efficiency.

Having said that, I hope that you find thiscritique useful, and enjoyable. If by any chanceyou do, please feel free to distribute it to yourmanagement, peers and friends.

2. The Role of a ProgrammingLanguage

A programming language functions at manydifferent levels and has many roles. It should becritiqued with respect to those levels and roles.Historically, programming languages had a verylimited role, that of writing executable programs.As programs have grown in complexity, this rolealone has proved insufficient. Many design andanalysis techniques have arisen to support othernecessary roles. The organisation of projects alsorequired tools external to the language andcompiler, like ‘make.’ Object-oriented techniqueshave arisen to help in the analysis and designphases, and object-oriented languages to supportthe implementation phase of OO. Traditional,tried and tested but failed software practices areinfiltrating the object-oriented world. Object-orientation, however, offers a better rationalapproach to software development. Thecomplementary roles of analysis, design,implementation and project organisation shouldbe better integrated in the object-oriented scheme.This results in economical software production.

A paper on the recommended practices for usein C++ [Ellemtel 92] suggests “C++ is a difficultlanguage in which there may be a very fine linebetween a feature and a bug. This places a largeresponsibility upon the programmer.” Is this aresponsibility or a costly burden? The ‘fine line’is a result of poor language definition. The C++standardisation committee warns “C++ is alreadytoo large and complicated for our taste” [X3J1692].

While it is true that C++ is immediatelyusable by many C programmers, and many seethis as a strength, the C base is C++’s greatestweakness. This is the engineering compromisethat C++ devotees talk about. Adoption of C++does not suddenly transform C programmers intoobject-oriented programmers. A complete changeof thinking is required, and C++ actually makesthis difficult. A critique of C++ cannot beseparated from criticism of the C base language,as it is essential for the C++ programmer to befluent in C. Many of C’s problems affect the waythat object-orientation is implemented and used inC++. This critique is not exhaustive of theweaknesses of C++, but it illustrates the practicalconsequences of these weaknesses with respect tothe timely and economic production of qualitysoftware.

C++ is an interesting experiment in adaptingthe advantages of object-orientation to atraditional programming language. BjarneStroustrup is to be applauded for having theinsight to put the two technologies together. C++,however, retains the problems of the old order ofsoftware production. C++ has an advantage overC as it supports many facets of object-orientation.These can be used for limited analysis and design.The processes of analysis, design, andorganisation, however, are still largely external toC++. Thus C++ has not realised the importantadvantages of object-orientation that will indeedlead to the economic production of software.

This critique criticises C++ in its own right,without comparison to other languages. Section 2considers the role of a programming language.Section 3 examines some specific aspects of C++.Section 4 examines the general role of language.Section 5 is a short comment on writing. Section6 looks specifically at C. The conclusionexamines where C++ has left us, and considers thefuture. The approach taken is to criticise specificaspects of C++ and C. Each section tries to be selfcontained. It is expected that not everyone willagree with all of the sections. It is probably best toapproach the paper, not by reading it entirely, butto read those sections that interest you. Onesection, however, is fundamental to the criticismof C++, that on virtual functions. This is also the

A language should not only be critiqued froma technical point of view, considering its syntacticand semantic features. It should also be critiquedfrom the viewpoint of its contribution to the entiresoftware development process. It should enablecommunication between project members actingat different levels, from management, who have arequirement for the product, to testers, who musttest the result. It should also enablecommunication between project membersseparated in space and time. Often oneprogrammer is not responsible for a task over itsentire lifetime.

C++?? 2nd Edition page 1

Page 4: CPP - Literate Programming

The primary purpose of any language iscommunication. A programming language shouldsupport the exchange of ideas, intentions, anddecisions between project members. Aprogramming language should provide a formal,yet readable, notation to support consistentdescriptions of systems that satisfy therequirements of diverse problems. A languageshould also provide methods for automatedproject tracking. This ensures that modules(classes and functionality) that satisfy projectrequirements are completed in a timely andeconomic fashion. A programming language aidsreasoning about the design, implementation,extension, correction, and optimisation of asystem.

techniques of schema checking are often criticisedas being restrictive and therefore unusable for realworld software. This is nonsense andmisunderstands of the power of these languages. Itis an immature conception; the best programmersrealise that programming is difficult. As a whole,the computing profession is still learning toprogram.

Another example of consistency checkingcomes from the user interface world. Instead ofcorrecting a user after an erroneous action, a gooduser interface will not offer the action as apossibility in the first place. It is cheaper to avoiderror than to fix it. Most people drive their carswith this principle in mind. Smash repair is timeconsuming and expensive.

A language definition should enable thedevelopment of integrated automated tools tosupport software development. For example,browsers, editors and debuggers. The compiler isanother such tool. The role of a compiler istwofold. Firstly, to generate code for the targetmachine. The role of the machine is to execute theproduced programs. A compiler has to check thata program conforms to the language syntax andgrammar, so it can ‘understand’ the program inorder to translate it into an executable form.Secondly, and more importantly, the compilershould check that the programmers expression ofthe system is complete, valid and consistent. Acompiler should perform semantics checking. Thisis checking that a program is internally consistent.Generating a system that has detectableinconsistencies is pointless.

Program development is a dynamic process. Aprogram description is constantly modified duringdevelopment. Modifications often lead toinconsistencies and error. Languages andcompilers that provide consistency checks helpprevent such ‘bugs’, which can creep into apreviously working system. These checks helpverify that as a program is modified, previousdecisions and work are not invalidated.

It is interesting to consider how muchchecking could be integrated in an editor. Thefocus of many current generation editors is text.What happens if we change this focus from text toprogram components? Such editors might checknot only syntax, but semantics. Alertingprogrammers of potential errors earlier andinteractively will shorten development times.Future languages should be defined very cleanlyin order to enable such editor technology.Semantics checking is done by ensuring that a

specification conforms to some schema. Forexample, the sentence “The boy drank thecomputer and switched on the glass of water” isgrammatically correct. But the sentence isnonsense. It does not conform to the mentalschema we have of computers and glasses ofwater. A programming language should includetechniques for the detection of similar nonsense.The language definition provides the frameworkthat makes this role of the compiler possible.

A programming language should provide aformal notation. During requirements analysis anddesign phases, formal and semi-formal notationsare required. Notations used in analysis, design,and implementation phases should becomplementary, rather than contradictory.Currently, analysis, design and modellingnotations are too far removed from programming,while programming languages are in general toolow level. Both designers and programmers mustcompromise to fill the gap. Current notationsprovide difficult transition paths between stages.This ‘semantic gap’ contributes to errors andomissions between the requirements, design andimplementation phases. Future programminglanguages will be an implementation extension ofthe high level notations used for requirementsanalysis and design. This will lead to improvedconsistency between analysis, design andimplementation. Object-oriented techniquesemphasise the importance of this, as abstractdefinition and concrete implementation can beseparate, yet provided by the same syntax.

Checking is often enabled by the specificationof redundant information. Declarations are anexample of redundancy that help check formisspellings. Declarations define the vocabularyof a program, ie the elements in its universe. Thecompiler uses redundant information forconsistency checking, and strips it away toproduce efficient executable systems. Type safetyis another technique. Declarations also associatean entity with a type, to define the entities role.Typing ensures that you can’t drink computers orswitch on glasses of water. C++ is animprovement over C in type safety.

It is a misconception that consistency checksare ‘training wheels’ for student programmers,and that ‘syntax’ errors are a hindrance toprofessional programmers. Languages that exploit

Programming languages also providenotations to formally document a system.Program source is the only reliable documentationof a system, so a language should explicitly

C++?? 2nd Edition page 2

Page 5: CPP - Literate Programming

support documentation. As with all language, theeffectiveness of communication is dependent uponthe skill of the writer. Good program writersrequire languages that support the role ofdocumentation. They require that the syntax of alanguage is perspicuous, and easy to learn. Thosenot trained in the skill of ‘writing’ programs, canread them to gain understanding of the system.After all, it is not necessary for newspaper readersto be journalists.

These quotes from Reade are a good summaryof the principles from which I criticise C++. WhatReade calls administrative tasks, I callbookkeeping. C and C++ are often criticised forbeing cryptic. The reason is that C concentrates onpoints 2 and 3, while the description of what is tobe computed is obscured. High level languagesdescribe ‘what’ is to be computed. This is theproblem domain. ‘How’ a computation isachieved is in the low-level machine-orienteddomain. The conflict between these aspects recursfrequently throughout this critique. Automatingthe bookkeeping tasks enhances correctness,compatibility, portability and efficiency.Bookkeeping tasks arise from having to specify‘how’ a computation is done. Specifying ‘how’things are done in some environments hindersportability to other platforms.

Chris Reade [Reade 89] gives the followingexplanation of programming and languages. “One,rather narrow, view is that a program is asequence of instructions for a machine. We hopeto show that there is much to be gained fromtaking the much broader view that programs aredescriptions of values, properties, methods, prob-lems and solutions. The role of the machine is tospeed up the manipulation of these descriptions toprovide solutions to particular problems. Aprogramming language is a convention forwriting descriptions which can be evaluated.”

The industry should be moving towards theseideals. They will help in the economic productionof software, rather than the costly techniques oftoday. We should consider what we need, andassess the problems of what we have against that.Object-orientation provides one solution to theseproblems. Its effectiveness, however, depends onthe quality of its implementation.

[Reade 89] also describes programming asbeing a “Separation of concerns”. He says:

“The programmer is having to do severalthings at the same time, namely,

It is relevant to ask if grafting OO conceptsonto a conventional language realises the fullbenefits of OO? Perhaps a biblical quote can beconsidered: “No one sews a patch of unshrunkcloth on to an old garment; if he does, the patchtears away from it, the new from the old, andleaves a bigger hole. No one puts new wine intoold wineskins; if he does, the wine will burst theskins, and then wine and skins are both lost. Newwine goes into fresh skins.” Mark 2:22

(1) describe what is to be computed;(2) organise the computation sequencing into

small steps;(3) organise memory management during the

computation.”Reade continues, “Ideally, the programmer

should be able to concentrate on the first of thethree tasks (describing what is to be computed)without being distracted by the other two, moreadministrative, tasks. Clearly, administration isimportant but by separating it from the main taskwe are likely to get more reliable results and wecan ease the programming problem by automatingmuch of the administration.

We must abandon disorganised and error-prone practices, not adapt them to new contexts.How well can hybrid languages support thesophisticated requirements of modern softwareproduction? Surely a basic premise of object-oriented programming is to enable thedevelopment of sophisticated systems through theadoption of the simplest techniques possible?Software development technologies andmethodologies should not impede the productionof such sophisticated systems.

“The separation of concerns has otheradvantages as well. For example, program provingbecomes much more feasible when details ofsequencing and memory management are absentfrom the program. Furthermore, descriptions ofwhat is to be computed should be free of suchdetailed step-by-step descriptions of how to do itif they are to be evaluated with different machinearchitectures. Sequences of small changes to adata object held in a store may be an inappropriatedescription of how to compute something when ahighly parallel machine is being used withthousands of processors distributed throughout themachine and local rather than global storagefacilities.

2.1. Safety and Courtesy ConcernsThis critique makes two general types of

criticism, about ‘safety’ concerns and ‘courtesy’concerns. These themes recur throughout thiscritique, as C and C++ have flaws thatcompromise them frequently. Safety concernsaffect the external perception of the quality of theprogram. Failure to meet safety concerns results inunfulfilled requirements and program crashes.“Automating the administrative aspects means

that the language implementor has to deal withthem, but he/she has far more opportunity to makeuse of very different computation mechanismswith different machine architectures.”

Courtesy concerns affect the internal view ofthe quality of a program in the development andmaintenance process. Courtesy concerns areusually stylistic and syntactic, whereas safetyconcerns are semantic. The two often go together.

C++?? 2nd Edition page 3

Page 6: CPP - Literate Programming

It is courtesy for an airline to keep its fleet wellmaintained. This courtesy concern is also verymuch a safety concern.

descendant classes are part of the same namespace as classes they inherit from. Theredeclaration of a name within the same scopeshould cause a name clash. Allowing two entitiesto have the same name within one scope causesambiguity and other problems. (See the section onname overloading.)

Courtesy issues are even more important inthe context of reusable software. Reusabilitydepends on the clear communication of thepurpose of a module. Courtesy is important toestablish social interactions, such as com-munication. Courtesy implies inconvenience tothe provider, but provides convenience to others.Courtesy issues include choosing meaningfulidentifiers, consistent layout and typography,meaningful and non-redundant commentary, etc.Courtesy issues are more than just a styleconsideration. A language design should directlysupport courtesy issues. A language, however,cannot enforce courtesy issues, and it is oftenpointed out that poor, discourteous programs canbe written in any language. But this is no reasonfor being careless about the languages that wedevelop and choose for software development.

The following example illustrates the secondproblem:

class A{

public:void nonvirt ();virtual void virt ();

}

class B : public A{

public:void nonvirt ();void virt ();

}3. C++ Specific CriticismsA a;B b;

3.1. Virtual Functions A *ap = &b;Polymorphism is a key concept of OOP.

Virtual functions are one way to implementpolymorphism. A language designer’s choice iswhether this should be specified in the parent orthe inheriting class. Is it the decision of thedesigner of the parent or descendant class? Casescan be made for both. They are not mutuallyexclusive and can be catered for quite easily in anobject-oriented language.

B *bp = &b;

bp->nonvirt (); // calls B::nonvirt // as you would // expectap->nonvirt (); // calls A::nonvirt, // even though this // object is of type // B.

There are three options, corresponding to‘must not’, ‘can’, and ‘must’ be redefined:

ap->virt (); // calls B::virt, the // correct version of

1) The redefinition of a routine is prohibited;descendant classes must use the routine as is.

// the routine for B // objects.

2) A routine could be redefined. Descendantclasses can use the routine as provided, or providetheir own implementation as long as it conformsto the original interface definition andaccomplishes at least as much.

In this example, class B has extended orreplaced routines in class A. B::nonvirt is theroutine that should be called for objects of type B.It could be pointed out that C++ gives the clientprogrammer flexibility to call either A::nonvirt orB::nonvirt. But this can be provided in a simplermore direct way. A::nonvirt and B::nonvirt shouldbe given different names. That way theprogrammer calls the correct routine explicitly,not by an obscure and error prone trick of thelanguage, as follows:

3) A routine is abstract. No implementationis provided and each non-abstract descendent classmust provide its own implementation. This ispolymorphism.

The base class designer must decide options 1and 3. Descendant class designers must decideoption 2. A language should provide direct syntaxfor these options. class B : public A

{Option 1 public:

C++ does not cater for the first option. Notusing a virtual function is the closest. But in thatcase the routine can be completely replaced. Thiscauses two problems. Firstly, a routine can beunintentionally replaced in a descendent. Thecompiler should report a syntax error due to‘duplicate declaration’. This is logical as

void b_nonvirt ();void virt ();

}

B b;B *bp = &b;

C++?? 2nd Edition page 4

Page 7: CPP - Literate Programming

bp->nonvirt (); // calls A::nonvirt whether the function f() is defined virtual or non-virtual in order to interpret exactly what a->f ()means. Therefore, the statement a->f () is notimplementation independent. A change in thedeclaration of f () will change the semantics of theinvocation. Implementation independence meansthat a change in the implementation DOES NOTchange the semantics, of executable statements.

bp->b_nonvirt (); // calls // B::b_nonvirt

Now the designer of class B has direct controlover B’s interface. The application requires thatclients of B can call both A::nonvirt, andB::b_nonvirt. B’s designer has explicitly providedfor this. This is good object-oriented design,which provides strongly defined interfaces. C++allows client programmers to play tricks with theclass interfaces, external to the class, and B’sdesigner cannot prevent A::nonvirt from beingcalled. This is opposite to good modular design.This shows the unsafeness C++’s virtualmechanism. Objects of class B have their ownspecialised ‘nonvirt’. But B’s designer does nothave control over B’s interface to ensure that thecorrect version of nonvirt is called.

If a change in the declaration changes thesemantics, this should generate a compilerdetected error. The programmer should make thestatement semantically consistent with thechanged declaration. This reflects the dynamicnature of software development, where theprogram text is subject to perpetual change.

For yet another case of the inconsistentsemantics of the statement a->f () vs constructors,consult section 10.9c, p 232 of the C++ ARM.[Sakkinen 92] points out that a descendant classcan redefine a private virtual function even thoughit cannot access that function in other ways. Whenthe ancestor class calls the function it insteadinvokes the function in the descendant class.

C++ also does not protect class B from otherchanges in the system. Suppose we need to write aclass C that needs ‘nonvirt’ to be virtual. Then‘nonvirt’ in A will be changed to virtual. But thisbreaks the B::nonvirt trick. The requirement ofclass C to have a virtual routine forces a change inthe base class. This has an effect on all otherdescendants of the base class, instead of thespecific new requirement being localised to thenew class. This is opposite to the reason for OOPhaving loosely coupled classes, so that newrequirements, and modifications will havelocalised effects, and not require changeselsewhere which can potentially break otherexisting parts of the system.

Option 2The second option should be left open for the

programmers of descendant classes. In C++,however, the decision must be made in the baseclass. In object-oriented design, the decisions youdecide not to make are as important as thedecisions you make. Decisions should be made aslate as possible. This strategy prevents mistakesbeing built into the system at early stages. Bymaking early decisions, you are often stuck withassumptions that later prove to be incorrect. C++requires the parent class to specify potentialpolymorphism by virtual (although anintermediate class in the inheritance chain canintroduce virtual). This prejudges that a routinemight be redefined in descendants. This can be aproblem because routines that aren’t actuallypolymorphic are accessed via the slightly lessefficient virtual table technique instead of astraight procedure call. (This is never a large over-head but object-oriented programs tend to usemore and smaller routines making routineinvocation a more significant overhead.) Thepolicy in C++ should be that routines that mightbe redefined should be declared virtual.

Rumbaugh et al, put their criticism of C++’svirtual as follows: “C++ contains facilities forinheritance and run-time method resolution, but aC++ data structure is not automatically object-oriented. Method resolution and the ability tooverride an operation in a subclass are onlyavailable if the operation is declared virtual in thesuperclass. Thus, the need to override a methodmust be anticipated and written into the originclass definition. Unfortunately, the writer of aclass may not expect the need to definespecialized subclasses or may not know whatoperations will have to be redefined by a subclass.This means that the superclass often must bemodified when a subclass is defined and places aserious restriction on the ability to reuse libraryclasses by creating subclasses, especially if thesource code library is not available. (Of course,you could declare all operations as virtual, at aslight cost in memory and function-callingoverhead.)” [RBPEL91]

Virtual, however, is the wrong mechanism forthe programmer to deal with. A compilationsystem can detect polymorphism, and generate theunderlying virtual code, where and only wherenecessary. Having to specify virtual burdens theprogrammer with another bookkeeping task. Thisis the main reason why C++ is a weak object-oriented language as the programmer mustconstantly be concerned with low level details.The compiler should take care of such detail andso relieve the programmer.

A further argument is that any statementshould consistently have the same semantics. Theobject-oriented interpretation of a statement likea->f () is that the most suitable implementation off() is invoked for the object referred to by ‘a’,whether the object is of type A, or a descendent ofA. In C++, however, the programmer must know

Another problem in C++ is mistakenredefinition. The base class routine can be

C++?? 2nd Edition page 5

Page 8: CPP - Literate Programming

redefined unwittingly. The compiler should reportan erroneous name redefinition within the samename space unless the descendant classprogrammer specifies that the routine redefinitionis really intended. The same name can be used,but the programmer must be conscious of this,and state this explicitly, especially inenvironments where systems are assembled out ofpreexisting components. Unless the programmerexplicitly overrides the original name a syntaxerror should report that the name is a duplicatedeclaration. C++, however, adopted the originalapproach of Simula. This approach has beenimproved upon, and other languages have adoptedbetter, more explicit approaches, that avoid theerror of mistaken redefinition.

3.2. Pure Virtual FunctionsAs mentioned above, pure virtual functions

provide a means of leaving a function undefinedand abstract. A class that has such an abstractfunction cannot be directly instantiated. A non-abstract descendant class must define the function.The C++ pure virtual syntax is:

virtual void fn () = 0;

This leaves the reader to guess its meaning,even those well versed in object-orientedconcepts. A better choice would have been akeyword such as ‘abstract’. Direct expression ofconcepts enhances communication, and the easewith which a language can be learnt. Whenlearning a language it is often important to use theindex of a text book. A keyword like ‘abstract’would be easily found in an index. But what doyou look for in the case of ‘= 0’? You might noteven realise it is significant. It should havesyntactic significance as abstract functions are avery important concept in object-oriented design.The C++ decision is in keeping with the Cphilosophy of avoiding keywords. This is often atthe expense of clarity. A keyword wouldimplement this concept more clearly. Forexample:

Eiffel and Object Pascal cater for this situationas the descendant class programmer is required tospecify that redefinition is intended. This has theextra benefit that a later reader or maintainer ofthe class can easily identify the routines that havebeen redefined, and that this definition is relatedto a definition in an ancestor class without havingto refer to ancestor class definitions. Thus option2 is exactly where it should be, in descendantclasses.

Option 3pure virtual void fn ();

The pure virtual function caters for the thirdoption. The routine is undefined, the class isabstract and cannot be directly instantiated. Adescendant class must define the routine if it is tobe instantiated. Any descendants that do notdefine the routine are also abstract classes. Thisconcept is correct, but see the section on purevirtual functions for a criticism of the syntax.

or

abstract void fn ();

The mathematical notation used in C++suggests that values other than zero could be used.What if the function is equated to 13? -

virtual void fn () = 13;Virtual is a difficult notion to grasp. The

related concepts of polymorphism and dynamicbinding, redefinition, and overloading are easier tograsp, being oriented towards the problemdomain. Virtual routines are an implementationmechanism for polymorphism. Polymorphism isthe ‘what’, and virtual is the ‘how’. Smalltalk andObjective-C use a different mechanism toimplement polymorphism. Virtual is an exampleof where C++ obscures the concepts of OOP. Theprogrammer has to come to terms with low levelconcepts, rather than the higher level object-oriented concepts. Interesting as underlyingmechanisms might be for the theoretician orcompiler implementer, the practitioner should notbe required to understand or use them to makesense of the higher level concepts. Having to usethem in practice is tedious and error-prone, andcan prevent the adaptation of software to furtheradvances in the underlying technology andexecution mechanisms (see concurrency).

A function is either pure, or it is not. This toany analyst suggests a boolean state, which asingle keyword conveys. A simple suggestion tofix this is to define ‘= 0’ as abstract:

#define abstract = 0

then

virtual void fn () abstract;

‘Pure virtual’ is also an abuse of naturallanguage. It is a combination of words that aresomewhat opposite in meaning. Pure meanssomething that really is what it appears to be. Forexample pure gold. Virtual means something thatappears to be what it actually is not. For examplevirtual memory. Perhaps virtual gold could befools gold. As has been said before, virtual is adifficult concept to grasp. When it is combinedwith a word such as ‘pure’, the meaning becomeseven more obscure. Modern language designersshould be very careful in the vocabulary theychoose.

C++?? 2nd Edition page 6

Page 9: CPP - Literate Programming

3.3. The Nature of Inheritance before. Assembling software components isbuilding a system that has never existed before.Inheritance is a close relationship. It provides

a fundamental way to assemble softwarecomponents. Objects that are instances of a classare also instances of all ancestors of that class. Foreffective object-oriented design the consistency ofthis relationship should be preserved. Eachredefinition in a subclass should be checked forconsistency with the original definition in anancestor class. A subclass should preserve therequirements of an ancestor class. Requirementsthat cannot be preserved indicate a design errorand perhaps inheritance is not appropriate.Consistency due to inheritance is fundamental toobject-oriented design. C++’s implementation ofnon-virtual overloading, and overloading bysignature (see below) means that the compilercannot check for this consistency. C++ does notrealise this aspect of object-oriented design. Thiscontributes to a wide and costly gap betweenanalysis and design, and implementation.

Inheritance in C++ is like a jig-saw where thepieces fit together, but the compiler has no way ofchecking that the resultant picture makes sense. Inother words C++ has provided the syntax forclasses and inheritance but not the semantics.Certainly, not very many reusable C++ librariesare available, which suggests that C++ might notsupport reusability as well as possible. C++ failsto provide this fundamental goal of object-oriented design and programming.

3.4. Function OverloadingC++ allows functions to be overloaded if the

arguments in the signature are of different types.Such overloading can be useful as these examplesshow:

max (int, int);max (real, real);

Inheritance has been classified as ‘syntactic’inheritance and ‘semantic’ inheritance. Saake et aldescribe these as follows : “Syntactic inheritancedenotes inheritance of structure or methoddefinitions and is therefore related to the reuse ofcode (and to overriding of code for inheritedmethods). Semantic inheritance denotes in-heritance of object semantics, ie of objectsthemselves. This kind of inheritance is knownfrom semantic data models, where it is used tomodel one object that appears in several roles inan application.” [SJE91]. Saake et al concentrateon the semantic form of inheritance. Behaviouralor semantic inheritance expresses the role of anobject within a system.

This will ensure that the best max routine forthe types int and real will be invoked. Object-oriented programming, however, provides avariant on this. Since the object is passed to theroutine as a hidden parameter (‘this’ in C++), anequivalent but more restricted form is alreadyimplicitly included in object-oriented concepts. Asimple example such as the above would beexpressed as:

int i, j;real r, s;

i.max (j);r.max (s);

Wegner, however, believes code inheritance tobe of more practical value. He classifies thedifference between syntactic and semanticinheritance as code and behaviour hierarchies[Weg90] (p43). He suggests these are rarelycompatible with each other and are oftennegatively correlated. Wegner also poses thequestion of “How should modification ofinherited attributes be constrained?” Codeinheritance provides a basis for modularisation.Behavioural inheritance provides modelling bythe ‘is-a’ relationship. Both are useful in theirplace. Both require consistency checks thatcombinations due to inheritance actually makesense.

but i.max (r) and r.max (j) result incompilation errors because the types of thearguments do not agree. (By operator overloadingof course, these can be better expressed, i max jand r max s, but min and max are peculiarfunctions that might want to accept two or moreparameters of the same type.)

The above shows that in most cases, theobject-oriented paradigm can consistently expressfunction overloading, without the need for thefunction overloading of C++. C++, however, doesmake the notion more general. The advantage isthat more than one parameter can overload afunction, not just the implicit current object pa-rameter.

It seems that inheritance is most powerful inthe most restrictive form of a semanticspreserving relationship. A subclass should notbreak the assumptions of an ancestor class.

The disadvantage is that C++ introduces someinconsistencies that the compiler cannot detect. Ifthe programmer intends to redefine a virtualroutine, but makes a mistake in the declaration ofthe function signature, the compiler willerroneously assume an overloaded function. Anycalls to the function using one or other of thesignatures will also fail to detect theinconsistency.

Software components are like jig-saw pieces.When assembling a jig-saw the shape of thepieces must fit, but more importantly, theresulting picture must make sense. Assemblingsoftware components is more difficult. A jig-sawis reassembling a picture that was complete When calling the routine, if the programmer

makes a mistake in supplying the actual

C++?? 2nd Edition page 7

Page 10: CPP - Literate Programming

parameters, a C++ compiler cannot be specificabout the error. It can only report that no functionwith a matching signature could be found.Programmers make this sort of mistake for subtlereasons, and it can be time consuming to pinpointthe parameter at fault. Secondly, the incorrectparameter might accidentally match, one of theother routines. In that case this error will bepropagated into the production code, and couldremain undetected a long time.

and attempting to do so considerably complicatesdesign.

3.6. Name overloadingNaming is fundamentally important in

producing self-documenting software. Naminghelps realise maintainable and reusable softwarecomponents. Names are fundamental in freeingprogrammers from low level manipulation ofaddresses. Naming is the basis for differentiatingbetween different entities in a software module.Name overloading allows the same name to referto two or more different entities. The problem iswhether the resultant ambiguity is useful, and howto resolve it, as ambiguity weakens the power ofnames to distinguish entities.

If it is felt that C++’s scheme of havingparameters of different types is useful, it shouldbe realised that object-oriented programmingprovides this in a more restricted and disciplinedform. This is done by specifying that theparameter needs to conform to a base class. Anyparameter passed to the routine can only be a typeof the base class, or a subclass of the base class.For example:

Name overloading is useful for two purposes.Firstly it allows programmers to work on two ormore modules without concern about nameclashes. The ambiguity can be tolerated as withinthe context of each module, the nameunambiguously refers to a unique entity.Secondly, name overloading providespolymorphism, where the same name applied todifferent types refers to different implementationsfor those types. Polymorphism allows one word todescribe ‘what’ is to be computed. Differentclasses might require different specifications of‘how’, a computation is done. For example ‘draw’is an operation that is applicable to all differentshapes, even though circles and squares, etc are‘drawn’ differently.

A.f (B someB) {...};

class B ...;class D : public B ...

A a;D d;

a.f (d);

The entity ‘d’ must conform to the class ‘B’,and the compiler checks this.

The alternative to function overloading bysignature, is to require functions with differentsignatures to have different names. Names shouldbe the basis of distinction of entities. This isknown to work and solves the above problems.The compiler can cross check that the parameterssupplied are correct for the given routine name.This also results in better self-documentedsoftware. It is often difficult to choose appropriatenames for entities, but it is well worth the effort.

These two uses of name overloading provide apowerful concept. But use of the same name inthe same context must be resolved. Errors canresult from ambiguity. In this case theprogrammer needs to differentiate betweenentities in ways other than name alone. Acommon way to do this is to introduce extradistinguishing names. For example in a group ofpeople, where two or more share the same firstname, they can be distinguished by their surname.Similarly a unique first name will distinguish themembers of a family with a common surname.

3.5. Virtual ClassesIf class D multiply inherits class A via classes

B and C, then if D wants to inherit only a singlecopy of A, the inheritance of A must be specifiedas virtual in both B and C. This raises twoquestions. Firstly, what happens if A is declaredvirtual in only one of B or C? Secondly, what ifanother class E wants to inherit multiple copies ofA via B and C? In C++, the virtual class decisionmust be made early, reducing the flexibility thatmight be required in the assembly of derivedclasses. In a shared software environmentdifferent vendors might supply classes B and C. Itshould be left to the implementer of class D or E,exactly how to resolve this problem. And this isthe simplest case. What if A is inherited via morethan two paths, with more than two levels ofinheritance? Such flexibility is key to reusablesoftware. You cannot envisage when designing abase class all the possible uses in derived classes,

This is analogous to classes, where each classin a system is given a unique name. Each memberwithin a class is also given a unique name. Wheretwo objects with members of the same name areused within the same context, the object name canqualify the members. For example a.mem andb.mem.

[Reade 89] points out the difference betweenoverloading and polymorphism. Overloadingmeans the use of the same name in the samecontext for different entities with completelydifferent definitions and types. Polymorphismthough has one definition, and all types aresubtypes of a principle type. C. Strachey referredto polymorphism as parametric polymorphism andoverloading as ad hoc polymorphism.

Block structured languages provideoverloading by scoping. Scoping allows the same

C++?? 2nd Edition page 8

Page 11: CPP - Literate Programming

name to be used in different contexts withoutclash or confusion. Nested blocks provide a subtleproblem. Names in an outer block are in scope ininner blocks. Many languages, however, allow aname to be overloaded in an inner block. Thisdoes more than overload the name, it hides it. Theuse of a name in the inner block does not indicateany relationship with the same name in the outerblock. Textually nested blocks ‘inherit’ namedentities from outer blocks. Inheritanceaccomplishes this in object-oriented languages.Inheritance eliminates the need to textually nestentities, and also accomplishes loose coupling.Nesting makes entities tightly coupled.

a combination of components, which quicklyleads to an exponentiation in the number of testsrequired.

C++ has an analogous form of hiding. A non-virtual function in a derived class hides a functionin an ancestor class. This hiding is explained insection 13.1 of the C++ ARM. This is adiscrepancy with declaring multiple functionswith the same name in the same class withdifferent signatures. A function in the derivedclass will hide the functions of the ancestor class,rather than add its signature to the list of possiblefunctions which can be called. This is confusingand error prone. Learning all these ins and outs ofthe language is extremely burdensome to theprogrammer. Often they will only be learnt afterfalling into a trap.

Contrary to most high level languages, a nameshould not be overloaded while it is in scope. Thisinconveniently hides the outer declaration, and theprogrammer cannot access the outer entity. It isalso error prone. The following exampleillustrates this:

3.7. Polymorphism and InheritanceInheritance provides a form of name

overloading similar to overloading in subblocks.The scope of a name is the class in which itoccurs. If a name occurs twice in a class, it is asyntax error. Inheritance introduces somequestions over and above this simpleconsideration of scope. Should a name declared ina base class be in scope in a derived class? Thereare three choices:

{ int i; {

int i; // hide the outer i. i = 13; // assign to the inner i.

// Can’t get to the outer i here. // It is in scope, but hidden. } 1) Names are in scope only in the immediate

class but not in subclasses. Subclasses can freelyreuse names because there is no potential for aclash. This precludes software reusability.Subclasses will not inherit definitions ofimplementation. Therefore case 1 is not worthconsidering.

}

Now delete the inner declaration:

{int i;

2) The name is in scope in a subclass, but thename can be overloaded without restriction. Thisis closest to the overloading of names in nestedblocks. This is C++’s approach. Two problemsarise. Firstly, the name can be unintentionallyreused. Secondly, because the new entity is notassumed to have any relationship to the original,its signature cannot be type checked with theoriginal entity. Since consistency checks betweenthe superclass and subclass are not possible, thetight relationship that inheritance implies, whichis fundamental to object-oriented design, is notguaranteed. This can lead to inconsistenciesbetween the abstract definition of a base class, andthe implementation of a derived class. If thederived class does not conform to the base class inthis way, it should be questioned why the derivedclass is inheriting from the base class in the firstplace. (See the nature of inheritance.)

{ i = 13; // Syntactically valid,

// but not the // intention.

}}

The inner overloaded declaration is removed,and references to that name do not result in syntaxerrors due to the same name being in the outerenvironment. The inner instruction nowmistakenly changes the value of the outer entity.A compiler cannot detect this situation unless thelanguage definition forbids nested redeclarations.E.W. Dijkstra uses similar reasoning in ‘An essayon the Notion: “The Scope of Variables”’ in “ADiscipline of Programming”, [Dijkstra 76].

The above example demonstrates how nestingresults in unmaintainable programs. This isbecause the inner block is tightly coupled to theouter block, and each is sensitive to changes in theother. The advantage of keeping componentsdecoupled and separate is that a programmer canconfidently make modifications to one componentwithout affecting other components. Testing canbe limited to the changed component, rather than

3) The name is in scope in the subclass, butcan only be overloaded in a disciplined way toprovide a specialisation of the original. Other usesof the name are reported as duplicate name errors.This form of overloading in a subclass ensures theentity referred to in the subclass is closely relatedto the entity in the ancestor class. This helpsensure design consistency. The relationship of

C++?? 2nd Edition page 9

Page 12: CPP - Literate Programming

name scope is not symmetric. Names in a subclassare not in scope in a superclass (although this isnot the case in typeless languages such asSmalltalk). In order to provide the consistentcustomisation of reusable software components,the same name should only be used by explicitlyredefining the original entity. The programmer ofthe descendant class should indicate that this isnot a syntax error due to a duplicate name, butthat redefinition is intended. (This has alreadybeen covered in the virtual section.) This choiceensures that the resultant class is logicallyconstructed. This might seem restrictive, but isanalogous to strong typing, and makes inheritancea much more powerful concept.

3.9. Anonymous parameters in ClassDefinitions

C++ does not require parameters in functiontemplates to be named. The type alone can bespecified. For example a function f in a classheader can be declared as f (int, int, char). Thisgives the client no clue to the purpose of theparameters, without referring to theimplementation of the function. Meaningfulidentifiers are essential in this situation, becausethis is the abstract definition of a routine. A clientof the class and routine must know that the firstint represents a ‘count of apples’, etc. It is truethat well known routines might not require aname, for example sqrt (int). But this is notappropriate for large scale software development.The use of anonymous parameters handicaps thepurpose of abstract descriptions of classes andmembers: to facilitate the reusability of software.Program text captures the meaning of the systemfor some future activity, such as extension ormaintenance. To achieve reusability,communication of intent of a software element isessential. A compiler strips away this level ofcommunication, producing a machine executableentity. Languages and compilers that perform lessthan optimal translations should not penalisecareful production of semantic entities. Butneither should a language definition allow lessthan optimal expression to the human reader.Languages do not have to be cryptic to achieveefficiency. In fact cryptic languages impairefficiency, as they make it harder for the pro-grammer to develop efficient systems, andfurthermore, they make it harder for automaticcode optimisers.

3.8. ‘.’ and ‘->’The ‘.’ and ‘->’ member access syntax came

from C structures. It illustrates where the C baseadversely affects flexibility. Semantically bothaccess a member of an object. They are, however,operationally defined in terms of how they work.The dot (‘.’) syntax accesses a member in anobject directly. For example ‘x.y’ means accessthe member y in the object x.

obj x; // declare object x of // class obj // with a member y.

x.y; // access y in object x // directlyx->y; // syntax error “. expected”

The ‘->’ syntax means access a member in anobject referenced by a pointer. For example ‘x->y’(or the equivalent *(x).y) means access themember y in the object pointer x refers to . Names are not strictly necessary in

programming. Naming exists to help the humanreader identify different entities within theprogram, and to reason about their function. Forthis reason naming is essential. Without naming,development of sophisticated systems would benearly impossible. Some languages accessparameters by their address (position) in theparameter list ($1, $2, etc). This is quiteunsatisfactory, even for shell scripts. Anonymousparameters can save typing in a function template,but then programming is not a matter of conve-nience. This is inconvenient for later readers. Theredundancy is beneficial and saves laterprogrammers having to look up the information inanother place. A real convenience in functiontemplates would be that abstract functiontemplates be automatically generated from theimplementation text (see header files for moredetails).

obj *x; // declare a pointer x to an // object of class obj.

x->y; // access y via pointer xx.y; // syntax error “-> expected”

In this example, ‘what’ is to be computed is“access the element y of object x.” In C++,however, the programmer must specify for everyaccess the trivial detail of ‘how’ this is done. Thecompiler can easily remove this burden from theprogrammer, as in fact most languages do.Furthermore, this reduces flexibility as if the ‘objx’ declaration is changed to ‘obj *x’, the effect iswidespread as all ‘x.y’ must be changed to ‘x->y’.Since the compiler gives a syntax error if thewrong access is used, this shows it already knowswhat access code is required and can generate itautomatically. Good programming centralisesdecisions. The decision to access the objectdirectly or via a pointer should be centralised inthe declaration.

Anonymous parameters illustrate the linkbetween courtesy and safety issues inprogramming. Due to pressure of work, a clientprogrammer might wrongly guess the purpose of aparameter from the type. Thus the failure of theoriginal programmer to provide a courtesy has

C++?? 2nd Edition page 10

Page 13: CPP - Literate Programming

caused a later programmer to breach safety. Aninterface client must know the intention of theinterface for it to be used effectively.

3.12. Optional ParametersOptional parameters that assume a default

value according to the routines declaration aresupposed to provide a shorthand notation.Shorthand notations are intended to speed upsoftware development. Such shorthand notationscan be convenient in shell scripts, and interactivesystems. In large scale software production,however, precision is mandatory, and defaults canlead to ambiguities and mistakes. With optionalparameters the programmer could assume thewrong default for a parameter. More importantly,optional parameters undermine type safety. Thetype of a function is defined by the compositionof its input types, and its output type:

3.10. Nameless ConstructorsMultiple constructors can have different

signatures, similar to overloaded functions. Thisprecludes two or more constructors having thesame signature. Constructors are also not named(apart from the same name as the class). Thismakes it difficult to discern from the class headerthe purpose of the different constructors. It isdifficult to match an object creation with thecalled constructor. Constructors suffer from all ofthe problems described with regards to functionswith the same name but different signatures. Itwould be easy to mark routines as constructors,for example:

f: T1 x T2 x T3... -> T4

The entire signature determines the type of thefunction, not just the return type. Optionalparameters mean that C++ is not type safe, andthat the compiler cannot check that the parametersin the call exactly match the function signature.

constructor make (...)...constructor clone (...)...constructor initialise (...)...

Furthermore, they do not provide a great dealof convenience. If a routine has five parameters,the last three of which are optional, and callerwants to assume the defaults for parameters 3 and4, but must specify parameter 5, then all fiveparameters must be specified. A better schemewould be to have a ‘default’ keyword in functioncalls:

where each constructor leaves the object invalid, but potentially different states. Namedconstructors would aid comprehension as to whatthe constructor is used for in the same way asfunction names document the purpose of afunction. Secondly, named constructors wouldallow multiple constructors with the samesignature. Thirdly, it is easier to match up anobject creation with the constructor actuallycalled.

f (a, b, default, default, e);

Other means, already in the language, caneasily provide this mechanism. For example, acall to another (possibly inline) function couldprovide the defaults for the optional parameters.This not only provides the convenience ofoptional parameters, but is more powerful. Anyparameter or combination can be filled in with anycombination of defaults, not just the lastparameters. Multiple intermediate routines canprovide multiple sets of defaults.

3.11. Constructors and TemporariesA ‘return <expression>’ can result in a

different value than the result of <expression>. Insection 6.6.3, the C++ ARM says “If required theexpression is converted, as in an initialisation, tothe return type of the function in which it appears.This may involve the construction and copy of atemporary object (S12.2).”

Section 12.2 explains “In some circumstancesit may be necessary or convenient for the compilerto generate a temporary object. Such introductionof temporaries is implementation dependent.When a compiler introduces a temporary object ofa class that has a constructor it must ensure that aconstructor is called for the temporary object.”

3.13. Bad deletionsThe following example is given on p.63 in the

C++ ARM as a warning about bad deletions thatcannot be caught at compile-time, and probablynot immediately at run-time:

A note says “The implementation’s use oftemporaries can be observed, therefore, throughthe side effects produced by constructors anddestructors.”

p = new int[10];p++;delete p; // errorp = 0;

Putting this together, creation of a temporaryis implementation dependent, so might or mightnot be done. If a temporary is created, aconstructor is called as a side effect, which canchange the state of the object. Different C++implementations could therefore return differentresults for the same code.

delete p; // ok

One of the restrictions of the design of C++ isthat it must remain compatible with C. Thisresults in examples like the above, that are ill-defined language constructs, that can only becovered by warnings of potential disaster.Removal of such language deficiencies wouldresult in loss of compatibility with C. This might

C++?? 2nd Edition page 11

Page 14: CPP - Literate Programming

be a good thing if problems such as the abovedisappear. But then the resultant language mightbe so far removed from C that C might be bestabandoned altogether.

data. Friend is a ‘limited export’ mechanism.Friends have three problems:

1) They can change the internal state ofobjects from outside the definition of the class.

2) They introduce extra coupling betweencomponents, and therefore should be usedsparingly.

3.14. Local entity declarationsDeclaring an entity close to where it is used,

has both advantages and disadvantages. It isconvenient, but can make a routine appear morecomplex and cluttered. A problem is that anidentifier can be mistakenly overloaded within anested block in a function, with the resultantproblems covered in the sections on nameoverloading and nesting. C does not have nestedroutines or blocks so does not have this problem.ALGOL uses this simple form of nameoverloading. (A block in the ALGOL sensecontains both declarations and instructions.)

3) They have access to everything, ratherthan being restricted to the members of interest tothem.

Friends are useful, and a case can be made forshades of grey between public, protected andprivate members. Multiple interfaces to a classprovide the functionality of friends and avoid theabove problems. Each interface to the class can beexported to everything, or selected classes only. Aselective export mechanism is more general thanpublic, private, protected and friend, andexplicitly documents the couplings betweenentities in the system. Selective export specifiesnot only that a member is exported but to whichclasses it is exported.

The ARM explains problems of localdeclarations with branching, which shows thecomplications in intermingling declarations andinstructions. Caveats cannot make up for or fixfaulty language definition.

One reason given for friends, is that theyallow more efficient access to data members thana function call. The way C++ is often used is thatdata members are not put in the public section,because this breaks the data hiding principle.

The C++ FAQ [Cline] (Q83) is unclear on thispoint (although it is mostly excellent), claimingthat an object is created and initialised at themoment it is declared. This only applies to auto,in stack objects. Dynamic entities are not createdand initialised until they are the subject of a ‘new’instruction. In well written object-orientedsoftware, routines will be small, typicallyperforming one atomic action per routine.

Data hiding is better described as‘implementation hiding’. Only a classes abstractfunctional interface should be visible to theoutside world. That is data members can beexported, but are viewed externally as functionalentities. This is because, when used inexpressions, functions and variables have nosemantic difference. They both return values of agiven type. (See fn () for an explanation of whyvariables and functions are best regarded assimilar entities.) (See also Marshall Cline’sexplanation of friends in the FAQ for furtherclarification of the friend concept.)

Small routines that implement atomicoperations are fundamental to loose coupling. Forexample, a base class that provides a singleroutine that logically performs operations A andB, is not useful to a subclass that needs to provideits own implementation of B, but does not want tochange A. The descendant must reimplement thelogic of both A and B, missing an opportunity toreuse the logic of A. Tight coupling reducesflexibility. Splitting A and B into differentroutines accomplishes loose coupling, andtherefore flexibility. Efficiency is also attainedwithout the mess of local entity declarations.Good design and clean modularisation achieveefficiency, as the entities which would be locals toa block in C++ are only created when the routineis entered.

The Cambridge Encyclopedia of Languagehas an interesting point about public and privatenames. It says “Many primitive people do not liketo hear their name used, especially inunfavourable circumstances, for they believe thatthe whole of their being resides in it, and theymay thereby fall under the influence of others.The danger is even greater in tribes (in Australiaand New Zealand, for example), where people aregiven two names - a ‘public’ name, for generaluse, and a ‘secret’ name, which is only known byGod, or to the closest members of their group. Toget to know a secret name is to have total powerover its owner.”

3.15. MembersCare should be taken with the C++ use of the

term member. In general use, an object is amember of a class. This corresponds to membersin set theory. But in C++, the term member meansa data item, or function of the class. Thisambiguity could have easily been avoided. 3.17. Static

The word ‘static’ is confusing in C++. Page98 of the C++ Annotated Reference Manual(ARM) mentions this confusion and gives twomeanings. Firstly, a class can have static

3.16. FriendsFriends are a mechanism to override data

hiding. Friends of a class have access to its private

C++?? 2nd Edition page 12

Page 15: CPP - Literate Programming

members, and a function can have static entities.The second meaning comes from C, where a staticentity is local in scope to the current file. Thechoice of different keywords would easily solvethis trivial problem. There is also a third moregeneral meaning that objects are statically orautomatically allocated and deallocated on thestack when a block is entered and exited, asopposed to dynamically allocated in free space.

variants. Inheritance and polymorphism providethis in OOP. A reference to a superclass can alsobe used to refer to any subclass, and thus providesthe same semantics as union, only in a type safemanner, as the alternatives can never be confused.An object reference is implicitly a union of allsubclasses.

3.19. Nested ClassesStatic class members are useful. Page 181 of

the ARM states that statics reduce the need forglobal variables. It is good to reduce globalvariables, but the C syntax obscures the purpose.

Simula provided textually nested classessimilar to nested procedures in ALGOL. Textual(syntactic) nesting should not be confused withsemantic nesting, nor static modelling withdynamic run time nesting. Modelling is done inthe semantic domain, and should be divorcedfrom syntax. You do not need textually nestedclasses to have nested objects. Nested classes arecontrary to good object-oriented design, and thefree spirit of object-oriented decomposition,where classes should be loosely coupled, tosupport software reusability. Semantic nesting isachieved independently of textual nesting. Inobject-oriented design all objects should interactonly via well defined interfaces. Objects of a classthat is textually nested in another class haveaccess to the outer object without the benefit of aclean interface. C avoided the complexity ofnested functions, but C++ has chosen to imple-ment this complexity for classes, which is of lessuse than nested functions.

Entities declared in functions can also bestatic. These are not needed in an object-orientedlanguage. The reason and history is this. ALGOLhas the notion of ‘OWN’ locals in blocks. Thesemantics of an OWN entity is that when a blockis exited, the value of the OWN is preserved forthe next entry to the block. I.e. the value ispersistent. The implementation is that at compiletime, the OWN entity is limited in scope to theblock, but at run time, it is located in the globalstack frame. The same instance of the variable isused in all invocations of the procedure, ratherthan each invocation using separate local storageon the stack. This causes complication in recur-sion.

Simula’s designers generalised the ALGOLnotion of block into class, and so object-orientation was born. Instead of discarding a classblock on exit, it is made ‘persistent’. Declarationswithin the class block are persistent, and thereforeprovide the functionality of static and OWN.Classes are more flexible than statics. Statics arepersistent in the same way as globals, ie for theduration of the program. Class member lifetime isgoverned by the lifetime of the object. Object-oriented languages do not need OWNs or statics.

OOP achieves nesting in two ways: byinheritance and object-oriented composition. Thusmodelling nesting is achieved without tighttextual coupling. For example, consider a car. Weknow in the real world that the engine isembedded within the car. In object-orientedmodelling, however, this embedding is modelledwithout textual nesting. Both car and engine areseparate classes. The car contains a reference to anengine object. This also allows the vehicle andengine hierarchy to be independently defined.Engine is derived independently into petrol,diesel, and electric engines. This is simpler andmore flexible than having to define a petrol enginecar, a diesel engine car, etc, which you have to doif you textually nest the engine class in the car.Other examples can also be structured withouttextual nesting, and no loss of generality.

3.18. UnionUnion is another construct that is superfluous

in OOP. Similar constructs in other languages arerecognised as problematic. For example,FORTRAN’s equivalences, COBOLsREDEFINES, and Pascal’s variant records. Whenused to overload memory space these force theprogrammer to think about memory allocation.Recursive languages use a stack mechanism thatmakes overloading memory space unnecessary, asit is allocated and deallocated automatically forlocals when procedures are entered and exited.The compiler and run time system automaticallyallocate and deallocate storage as required,ensuring that two pieces of data never clash forthe same memory space. This is essential so thatthe programmer can concentrate on the problemdomain, rather than machine oriented details.When union is used similarly to FORTRAN’sequivalences it is not needed.

In C++, not only can classes be nested withinother classes, but also within functions, therebytightly coupling a class to a function. Thisconfuses class definition with object declaration.The class is the fundamental structure in object-oriented programming and nothing has existenceseparate from class (including globals). C++ isconfused as to whether it is procedure-oriented orobject-oriented.

3.20. Global EnvironmentsThe global environment provides a special

case of nested classes. When classes are nested ina global environment, dependencies can arise that

Union is also not needed to provide theequivalent to COBOL REDEFINES or Pascal’s

C++?? 2nd Edition page 13

Page 16: CPP - Literate Programming

make the classes difficult to decouple from thatenvironment, and therefore not reusable. Even if aclass is not intended for use in another context, itwill benefit from the discipline of object-orienteddesign. Each class is designed independently ofthe surrounding environment, and relationshipsand dependencies between classes are explicitlystated.

header B also includes header C. A simple butmessy fix in all headers solves this problem:

#ifndef thismod#define thismod

... rest of header#endif

Headers show how C++ addresses theproblem of independent modules by a non-object-oriented approach that is sub-optimal; theprogrammer must supply this bookkeepinginformation manually. A class interface isequivalent to a module header. A module headercontains data and routines exported to othermodules. This is exactly the purpose of the classinterface. A class definition contains allknowledge of component classes and theirdependencies (inheritance and client) in the classtext. Dependency analysis is derivable from theclass text. Tools like ‘make’ can be integrated intothe compiler itself, and the errors and tediumencountered in the use of ‘make’ are avoided.#includes relate to the organisation andadministration of a project. Rational language de-sign eliminates such bookkeeping mechanisms.

In C++ functions can change the globalenvironment, beyond the object in which they areencapsulated. Such changes are side-effects thatlimit the opportunity to produce loosely-coupledobjects, which is essential to enable reusablesoftware. This is a drawback of both global andnested environments.

A good OO language will only permit routinesin an object to change its state. Removing theglobal environment is trivial. It is simplyencapsulated in an object or set of objects of itsown. Therefore global entities are subject to thediscipline of object-oriented design. Havingglobals in a system circumvents OOD. Objectscan also provide a clean interface to the externalenvironment, or operating system, without loss ofgenerality, for a negligible performance penalty.Thus classes are independent of a surroundingenvironment, and the project for which they werefirst developed, and are more easily adaptable tonew environments and projects.

A traditional system is assembled bycombining modules. An object-oriented system isassembled by combining classes. Modules are aprimitive form of classes. Classes are moresophisticated. They express more preciselyrelationships with other classes. C++ #includesand modules have problems. This primitivemethod is not required in an object-orientedlanguage.

3.21. Header FilesIn C++ a class interface must be maintained

separately from its body. While an abstractinterface should be distinct from a concreteimplementation, the interface and implementationcan both be derived from one source. In C++though, programmers must maintain the two setsof information. Replicated information has wellknown drawbacks. In the event of change, bothcopies must be updated. This can lead toinconsistencies that must be detected andcorrected. Tools can automatically extract abstractclass descriptions from class implementations,and guarantee consistency.

3.22. Class InterfacesSection 9.1c of the C++ ARM points out that

C++ has no direct support for “interfacedefinition” and “implementation module”. In aC++ class definition, all private and protectedmembers must be included in the public text ofthe class. The ARM points out that whenever theprivate or protected parts are changed, the wholeprogram must be recompiled. Further to what theARM says, all modules that are dependent on theheader file must be recompiled, even though theprivate and protected members do not affect othermodules. Private members should not be in theabstract class interface, as this exposesimplementation details to programmers of othermodules.

The programmer must also use #includes tomanually import class headers. #include is an oldand unsophisticated mechanism to providemodularity. #include is a weak form of inheritanceand import. C++ still uses this 30 year oldtechnique for modularisation, while otherlanguages have adopted more sophisticatedapproaches, for example, Pascal with Units,Modula with modules, Ada with packages. InEiffel the unit of modularisation is the class itself,and includes are handled automatically. The OOPclass is a more sophisticated way to modulariseprograms. Inheritance implements reusability andmodularisation, so #include is superfluous.

3.23. Class header declarationsC’s syntax for function declarations is

[<type>] <identifier> (<parameters>). For (a verysimple) example:

class CAnother problem is that if header A includes

header B, and header B includes header A acircular dependency occurs. The same problemoccurs if header A includes headers B and C, and

{a ();b ();

C++?? 2nd Edition page 14

Page 17: CPP - Literate Programming

int c (); In C++ the programmer must manuallymanage storage due to the lack of garbagecollection. This is a difficult bookkeeping taskthat leads to two opposite problems. Firstly, anobject can be deallocated prematurely, while validreferences still exist (dangling pointers).Secondly, dead objects might not be deallocatedleading to memory filling up with dead objects(memory leaks). Attempts to correct eitherproblem can lead to overcompensation and theother problem occurring. A correct system is afine balance. This is illustrated in the figurebelow.

d ();char e ();virtual void f ();

}

To find an identifier in this layout, the eyemust trace a course around the type specifications.This is a tiring activity. The eye has a greaterchance of missing the sought identifier, and theprogrammer must resort to using the searchfunction of a text editor to help out.

Other languages place the entity names first.For example:

DanglingPointers

CorrectSystem

MemoryLeaksclass C

{a (); These problems contribute to the fragility of

C++ programs, and usually result in systemfailure. Garbage-collection solves both problems.Garbage-collection has an undeserved badreputation due to some early garbage-collectorshaving performance problems, instead of workingtransparently in the background, as they can andshould. These problems are often over-emphasised as a justification for C++ ignoringgarbage collection. A possible solution is to buildgarbage collection into the run time architecture,but allow the programmer to activate anddeactivate it manually. Garbage collection can bedisabled in systems where it is inappropriate.

b ();c () int;d ();e () char;f () virtual void;

}

To those used to the ALGOL and FORTRANstyle of type first, this seems backwards. Butname first is logical as a real world exampleillustrates. Imagine if a dictionary is published,and the keywords are not placed first, but ratherthe entry order is -

In C++ it might be argued that the lack ofgarbage-collection is not an engineeringcompromise. Its inclusion is nearly an engineeringimpossibility, as a programmer can undermine thestructures required for implementing correctlyworking garbage-collection. While garbage-collection might not actually be an impossibilityin C++ (EC++), it is difficult, and programmerswould have to settle for a more restricted way ofprogramming. This could be a good thing. Butthen the compromise to remain compatible with Cbecomes difficult, if the compiler is to detectpractices inconsistent with the operation ofgarbage-collection.

noun /obvrzen/ obversion, the act or result of obverting

Such a dictionary would not sell many copies,unless the marketers managed to fool manypeople that the explanation of the meaning wasmore correct because the order of layout wasmysteriously magical. This example illustrateshow important subtle syntax decisions are, andwhy PASCAL style languages might have orderedthings contrary to FORTRAN, ALGOL andothers. The language designer must consider thesetrivial but important alternatives. The layout ofprogramming entities is essential for effectivecommunication. The dual roles of languagesyntax, and programming style affectcomprehension. A dictionary or index style layoutsuggests placing entity names first, followed bytheir definition.

3.25. Type-safe linkageThe C++ ARM explains that type-safe linkage

is not 100% type safe. If it is not 100% type-safe,then it is unsafe. It is the subtle errors that causethe most problems, not the simple or obviousones. Often such errors remain undetected in thesystem until critical moments. The seriousness ofthis situation cannot be underestimated. Manyforms of transport, such as planes, and spaceprograms depend on software to provide safety intheir operation. The financial survival oforganisations can also depend on software. Toaccept such unsafe situations is at best ir-responsible.

3.24. Garbage CollectionOne of the hallmarks of high level languages

is that programmers declare data without regard tohow the data is allocated in memory. In blockstructured languages, local variables are allocatedon the stack, and automatically deallocated whenthe block exits. This relieves the programmer of agreat burden. Garbage collection providesequivalent relief in languages with dynamic entityallocation.

C++?? 2nd Edition page 15

Page 18: CPP - Literate Programming

The C++ ARM summarises the situation asfollows - “Handling all inconsistencies - thusmaking a C++ implementation 100% type-safe -would require either linker support or amechanism (an environment) allowing thecompiler access to information from separatecompilations.”

methodology of choice of disciplined thinkers.Some people can hold a whole problem andsolution in their head and work in a disciplinedfashion until the solution is complete. Mozart issaid to have composed this way, producing hislast three symphonies in as many months in 1788.Beethoven toiled far more over the production ofhis works, taking years to complete onesymphony. Both composers producedmasterpieces. Mozart wrote music directly,whereas Beethoven wrote themes and ideas in hisfamous sketchbooks. The production ofmasterpieces depends on skill, not on method-ologies.

So why does the C++ compiler (at leastAT&T’s) not provide for accessing informationfrom separate compilations? Why is there not aspecialised linker for C++, that actually provides100% type safety? There is no reason why C++should not be implemented this way. Buildingsystems out of preexisting elements is thecommon Unix style of software production. Thisimplements a form of reusability, but not in thetruly flexible manner of object-orientedreusability.

It is becoming accepted that the softwarelifecycle should be an integrated process.Analysis, design and implementation should be aseamless continuum.The activities of the lifecycleshould progress in parallel to expedite softwaredevelopment. Facts found out only as late as theimplementation stage can be fed back into theanalysis and design stages. The object-orientedapproach supports this process. Artificialseparation of the steps leads to a large semanticgap between the steps. The transformationsrequired to bridge such semantic gaps are prone tomisinterpretation, time consuming and costly.

In the future, Unix could be replaced byobject-oriented operating systems, that are indeed‘open’ to be tailored to best suit the purpose athand. By the use of pipes and flags, Unix softwareelements can be reused to provide functionalitythat approximates what is desired. This approachis valid and works with efficacy in someinstances, like small in-house applications, orperhaps for research prototyping, but isunacceptable for widespread and expensivesoftware, or safety critical applications. In the lastten years the advantages of integrated softwarehave been acknowledged. Classic Unix systemsdon’t provide those advantages. Integrated sys-tems are more ambitious, and place more demandson developers. But this is the sort of software nowbeing demanded by end users. Systems that arecobbled together are unacceptable.

The same people should be responsible for allstages. This way they take responsibility for thesystem as a whole, rather than passing the buckand blame which occurs when analysts, designersand implementers are different groups. This is nota popular viewpoint in traditional hierarchicalmanagement structures where programmers getpromoted to designers who get promoted toanalysts. Hierarchical management alsodiscourages people from feeling responsible for aproduct. This culture must radically change if weare to produce quality systems.

A further problem with linking is thatdifferent compilation and linking systems shoulduse different name encoding schemes. Thisproblem is related to type-safe linkage, but iscovered in the section on ‘reusability andcompatibility’.

We should have learnt from the extremesSA/SD. Some quarters believed that methodologywas all important, while programming andprogramming languages were unimportant.Arcane and machine-oriented programminglanguages strengthened this attitude. Theselanguages concentrate on the ‘how’ ofcomputation, whereas the modellers correctlydemand notations that express the ‘what’, in orderto be implementation independent. A modernsoftware language supports the integration of theactivities of design and implementation by beingreadable, and problem-oriented. A languageshould be as close to design as possible. Theneeds and requirements of an enterprise canchange much more rapidly than programmers cankeep up, especially in a highly competitive andcommercial world.

3.26. C++ and the software lifecycleThe software lifecycle has attracted a great

deal of attention. It is at least generally acceptedthat the activities in the lifecycle are analysis ofrequirements, design, implementation, testing anderror correction, extension. Unfortunately, theresult of identifying these activities has resulted ina school of thought that the boundaries betweenthese activities are fixed, and that they should besystematically separate, each being completedbefore the next is commenced. It is often arguedthat if they are not cleanly separated, then you arenot practicing disciplined system development.

This view is incorrect. Someone who writes aprogram straight away is actually doing all thesteps in parallel. It might not be the best way dodo things in many circumstances, might or mightnot suit the style and thinking of different people,but this works in some scenarios, and can be the

So how does C++ fit into this picture? Well itis based on C that was designed mainly as animplementation and machine-oriented language. Itis an old language, that did not need to considerthe integrated lifecycle approach. C++ might have

C++?? 2nd Edition page 16

Page 19: CPP - Literate Programming

some of the trappings of object-oriented concepts,but it is the marriage of a problem-orientedtechnique with a machine-oriented language. Itaddresses implementation, but not so well theother aspects of the software lifecycle. Since C++is not so well integrated with analysis and design,the transformation required to go from analysisand design to implementation is costly. Thesemantic gap between design languages and theimplementation language is great.

software component, require assurance that thecomponent is trustworthy. Trusting programmersis against the commercial interest of both parties.This is not to cast dispersion on programmers, butmerely recognises that computers are good atperforming mundane tasks and checks, but peopleare not. If people were good at such things, wewould not need computers in the first place.Building trustworthy components is a safetyconcern.

We should have learnt from the structuredworld that this is the incorrect approach to thesoftware lifecycle. But in the OO world we areagain falling into the trap of dividing the lifecycleinto artificially distinct activities of OOA, OODand OOP, instead of adopting an integratedapproach to these. Modern languages provide amuch more integrated approach to the completesoftware development process than C++. C++supports classes and inheritance and otherconcepts of object-orientation, but fails to addressthe entire software lifecycle.

3.29. Reusability and CompatibilityDifferent compiler implementations need to be

compatible in order to realise reusability betweencomponents. Different C++ compilers generatedifferent class layouts, virtual function callingtechniques, etc. The name encoding schemes usedfor type safe linkage can also be different. If twodifferent compilers generate different run-timeorganisations, then different name encodings aredesirable as it will prevent two incompatiblelibraries from being linked. The C++ ARM (p122)states “If two C++ implementations for the samesystem use different calling sequences or in otherways are not link compatible it would be unwiseto use identical encodings of type signatures.”

3.27. Reusability and CommunicationReusability is a matter of communication.In order to use a software component, you

must be able to understand it. The writer mustcommunicate the purpose, intent, and correctusage of the component to the client. In theobject-oriented world, clear and concise definitionof software modules is not a mere nicety, butessential for reusability. Arising out of the issueof reusability is extendibility. In order tomaximise the reuse of software, it often must betailored for new applications. The clientprogrammer must decide whether the softwarecomponent is suitable for the new task. If so, whatis the best way to extend it? Clear communicationto clients is a courtesy concern.

This can be solved in two ways. Firstly, alibrary vendor could provide the entire source of alibrary so it can be compiled by the customerscompiler. This is not satisfactory if the sources areproprietary. Then the vendor will need a separaterelease for every environment, and every compilerin that environment.

Because of this problem a strong case existsfor a universal intermediate machine readablerepresentation of programs. Interestingly, somesystems are already using C as a ‘universalassembler’, notably AT&T C++ and Eiffel. Butthis cannot solve the above problems ofcompatibility between components without astandardisation effort on run time layouts andname encoding schemes.

3.28. Reusability and TrustReusability is a matter of trust.Trust results from confidence that safety

concerns have been met. If you do not haveconfidence in a software component, then it isdifficult to consider it for reuse. There could bedoubt that the software component providesenough functionality, or correct functionality.There could be doubt that the component isefficient enough, or worse it might crash. TheC/C++ philosophy of not building checks into thelanguage and compiler because programmers canbe trusted, works against trust and reusability.

3.30. Reusability and PortabilitySince true OOP ensures that objects are

loosely coupled to the external environment,portability to diverse environments is possible. Cis tightly coupled to the Unix environment, and assuch is not particularly portable to diverseenvironments.

3.31. Idiomatic ProgrammingThe ability to program in different idioms is

argued as a strength of C++. Idiomaticprogramming, however, is a weak form ofparadigmatic programming. It is programming ina paradigm without necessarily having compilersupport for that paradigm. The compiler cannotcheck for inconsistencies with the idiom, orparadigm. Defines can often be used to inventidioms. Anyone who has attempted to do object-

In the real world of reusability, the ideal oftrusting programmers is inappropriate. Trustingprogrammers results in less trustworthy software.In reality, customers doubt the claims ofsuppliers. It is the onus of the supplier to provetheir claims, and thus trustworthiness of thesoftware. The client is not required to trust thesupplier’s programmers. Potential clients of a

C++?? 2nd Edition page 17

Page 20: CPP - Literate Programming

oriented programming in a conventional languageusing defines will realise that it is impossible torealise all the benefits easily, if at all, withoutcompiler support.

Object level is natural for the programmer,and has the advantage that a programmer canimplement a system without taking into accountparallel processing at all. The same program willrun and produce identical results irrespective ofwhether the customer is running a singleprocessor, or a processor array.

3.32. Concurrent ProgrammingIn the next ten years multiple processor arrays

that execute programs concurrently will probablybecome common. Concurrency requires muchcleaner languages, than the single processorlanguages of today. Object-oriented conceptssupport concurrent programming. Objects canexecute state changing code independently of eachother. Concurrent programming will be enabledby the division of the state space of a system intomodules to achieve a high degree of independentprocessing. Objects provide a scheme to cleanlydivide state spaces. The demand that everythingbe broken down into loosely coupled modules,that only interact through well defined interfacesmight be perceived as inefficient. But it isprecisely this scheme that will mean thatconcurrent solutions can be developed efficientlyand transparently to the programmer. Concurrencyshould be transparent to the programmer, asconcurrency is a low level implementationconsideration. That is concurrency is how acomputation is done, not what is to be computed.The programmer should be concerned with whatis to be computed, not how. How something iscomputed is the concern of the targetenvironment, ie the compilers, operating system,and hardware. When programmers are notconcerned with this level, efficiency andportability follow automatically.

Side effects must be avoided in concurrentsystems. Suppose a computation depends oncombining the results of two functions f and g,such as f + g. If f and g are independent, then theycan be computed concurrently. If however, fproduces side effects that g depends on, they mustbe computed sequentially. F and g are parametersto the + function. Routine parameters can becomputed concurrently, as long as thecomputation of each causes no side effects. Sideeffects are avoided by restrictive practices that Cdevotees would object to.

C++ does not preclude the use of a globalenvironment. Access to shared global datapotentially causes a thread to lock, and if manysuch accesses occur, the advantage of concurrencyis lost. This is because updates to a globalenvironment are side effects. Programming insuch an environment requires complex lockingmechanisms to ensure that things happen in thecorrect order. Locks are rather like waiting for aplane to take off when it has to wait for anotherconnecting flight. This cannot be entirely avoided,but should be reduced as much as possible.

4. The role of LanguageFor an intermission between sections, I’ll

mention some interesting points that theCambridge Encyclopedia of Language [Crystal87] makes. It says that language is an emotionalsubject. “It is not easy to be systematic andobjective about language study. Popular linguisticdebate regularly deteriorates into invective andpolemic. Language belongs to everyone; so mostpeople feel they have a right to hold an opinionabout it. And when opinions differ, emotions canrun high. Arguments can flare over minor pointsof usage as over major policies of linguisticplanning and education.”

The aim of concurrent processing is to keepall the processors in a processor array as fullyutilised as possible, so that processor resources arenot wasted. This is as good as can be expected.There is nothing more mysterious to concurrentprogramming than the efficient use of resources.Keeping all processors busy is an inherentlydynamic problem, which the programmer cannotdetermine statically at compile time. All theprocessors can be kept busy, as long as there areenough threads in the system.

In concurrent programming, a thread is a unitof sequential execution. Concurrency is achievedby the splitting of threads. A thread can be splitwhen a state changing routine is invoked, but nota value returning function, because it must waitfor the value. State changing routines can easilybe invoked on another processor. Object levelgranularity seems to be a natural candidate forconcurrent processing. An object can have onlyone update thread at a time to avoid simultaneousupdate problems. Other levels of concurrency areinstruction level, and task or process level. Taskor process level is the level used in conventionalmulti-processing systems currently commerciallyproduced, and instruction level is quite difficult,best being left to instruction pipelines.

While natural language is difficult to be“systematic and objective about”, should thisapply to computer languages? The definition ofnatural language is generally beyond our control,with the exception of languages such asEsperanto. Programming language definition,however, is within our control. Programminglanguages must have expressiveness like naturallanguage, yet be precise and semanticallyconsistent. As programming languages haverigorous requirements, we should be even morecritical and objective about them. It is a measureof immaturity in the programming profession thatemotional and irrational defensiveness oftendenies valid criticism. Many dismiss the choice of

C++?? 2nd Edition page 18

Page 21: CPP - Literate Programming

programming language as a religious issue. Iflanguage choice is merely religious, then wemight as well still program in assembler, ormaybe even binary, because the adoption of highlevel languages would have no technical merit.Language choice, however, is a technicalconsideration. Technical measures should judgethe effectiveness of a language. Understanding therole of language helps quantify what must bemeasured.

sent from one object to another, so that they cancommunicate and interact. Static bindingdetermines this message in advance as thereceptor is always the same type, or descendant ofthat type. Static typing ensures in advance that thereceptor object can process the message. Dynamicbinding, means that the exact message to be sentis determined by the dynamic type of the receptorwhen the message is actually sent. For example,on your telephone, you talk to your friendsdifferently than a client, even though you areusing the same piece of equipment. These are theconcepts that C++’s virtual do not express well.Designing an object-oriented system is likedesigning a language by which objects interact.Thus tools used for formal programming languagedesign, BNF, denotational semantics, andaxiomatic semantics can help in the design of anobject-oriented system.

The Cambridge encyclopedia lists severalfunctions of language. “To communicate ourideas”, it says is the most common answer, andthis must surely be the most widely recognisedfunction of language. It lists several otherfunctions of language. One function is emotionalexpression. For instance, when we stub our toe,we often emit words, even when there is no one tohear. Another is social interaction. For example ifsomeone sneezes, we often “bless” them. Anotheris the power of sound, as in poetry and rhymingjingles etc. Another is the control of reality, as inspells and incantations. Perhaps computerprograms and spells are similar in purpose.Another is recording facts. This includes recordkeeping, historical and geographical documents,etc. Another is the instrument of thought. Weoften reason about things to ourselves inlanguage. Another function is the expression ofidentity. Language can express who we are, oraffirm our belonging to certain groups. Perhapsthe most important role of computer languages isto enable description, and recording the decisionsmade during the design and implementation of asystem.

“Language shapes the way we think, anddetermines what we can think about.” -B.L.Whorf. Bjarne Stroustrup quotes this in “TheC++ Programming Language”. But is this correct?The encyclopedia says, “It seems evident thatthere is the closest relationship between languageand thought: everyday experience suggests thatmuch of our thinking is facilitated by language.But is there identity between the two? Is itpossible to think without language? Or doeslanguage dictate the ways in which we are able tothink? Such matters have exercised generations ofphilosophers, psychologists, and linguists, whohave uncovered layers of complexity in thesestraightforward questions. A simple answer iscertainly not possible; but at least we can be clearabout the main factors which give rise tocomplications.”

Since language and communication are twoclosely related concepts, it is important tounderstand their relationship, and the nature ofcommunication. Language is the set of aural andwritten symbols with which we communicate.Laurence Wylie in the foreword to “French inAction” [Capretz 87] describes communication as“To understand this [communication] we mustknow the basic meaning of the words common,communicate, and communication. They arederived from two Indo-European stems that mean“to bind together.” In this ordered universe, nohuman being can live in isolation. We must bebound together in order to participate in anorganised effort to accomplish the necessaryactivities of existence. This relationship is so vitalto us that we must constantly be reassured of it.We test this connection each time we have contactwith each other.”

The above Whorf quote is a statement of theSapir-Whorf hypothesis on language and thought.Edward Sapir (1884-1939) formulated this withhis pupil Benjamin Lee Whorf (1897-1941). Itreflects the view of its day when great value wasplaced on the diversity of the languages andcultures of the world. The Sapir-Whorf hypothesiscombined two principles. The first is ‘linguisticdeterminism’, which states that languagedetermines the way we think. The second,‘linguistic relativity’, that the distinctions found inone language are not found in any other. Therecan be both verbal and non-verbal thought;following a road map in a car for example. Streetdirections are often difficult to put into words.

The Sapir-Whorf hypothesis in its strongestform, as in the Whorf quote, is now not generallyaccepted. For one reason, it is known thatconcepts can be translated from one language intoanother. This is even if in one language, theconcept can be expressed in one word, but takes aphrase of words in another. A weaker version ofthe Sapir-Whorf hypothesis is accepted. That is“language may not determine the way we think,but it does influence the way we perceive and

The concept of binding is also important incomputing. In networks, binding establishescommunication links between two or moreentities. This forms a greeting, so that arelationship is established, and communication ispossible. In programming we have the concepts ofstatic and dynamic binding. Binding in thisparadigm makes it possible for a message to be

C++?? 2nd Edition page 19

Page 22: CPP - Literate Programming

remember, and it affects the ease with which weperform mental tasks. Several experiments haveshown that people recall things more easily if thethings correspond to readily available words andphrases. And people certainly find it easier tomake a conceptual distinction if it neatlycorresponds to words available in their language.Some salvation for the Sapir-Whorf hypothesiscan therefore be found in these studies, which arebeing carried out within the developing field ofpsycholinguistics.”

Elements of Style,” [S & W 79]. This has beenaround for most of this century in one form or an-other. William Strunk, the original author hadsome stern advice for his students:

“Vigorous writing is concise. A sentenceshould contain no unnecessary words, a paragraphno unnecessary sentences, for the same reason thata drawing should have no unnecessary lines and amachine no unnecessary parts.”

The machines that the software professionaldevelops are not built, but written. Strunk’s lastsentence prompts consideration of the relationshipof writing to software development. A commonsituation is taking several thousand lines ofincomprehensible ‘code’, and making it executeefficiently. After spending considerable time weoften realise what the program does, and reduce itto several hundred lines of program ‘text’ thatruns ten times faster. Strunk’s quote should beapplied to programming. A routine should containno unnecessary declarations or instructions, asystem no unnecessary routines. It can also beapplied to programming languages. A pro-gramming language should contain nounnecessary constructs. This is the root of mydissatisfaction with C++. Much of it isunnecessary, even for the most complex systems.Its syntax is ugly. C++ has become what the Cworld has constantly criticised in languages likePL/1 and Ada. Only C++ is worse. We need toregain artistic elegance and simplicity.

The important question to the programmingcommunity is do programming languages ‘shape’the way we think about and design systems? Thenegative argument is that it is the concepts behindlanguages that are important, not the languagesthemselves. Languages only provide a frameworkfor the expression of the concepts. A language canonly be as good as the concepts it implements. Aprogramming language influences the way weprogram, and the way we use the concepts itimplements. It can clarify the concepts, or obscurethem as in the case of C++. A language mustimplement the concepts cleanly and simply. Itmust express the concepts in as few words andconstructs as possible. But this does not just meanavoiding keywords as in C. Programmers who un-derstand the concepts should have no difficulty inadapting to different languages, as long as the newlanguage implements the concepts elegantly.

A language can be judged like a wineconnoisseur judges wine, by holding it up to thelight to judge for clarity and colour. Ultimately, itis the taste that matters, but good colour andclarity suggests that the taste is more likely to begood. Clear programming language definitionhelps in the goal of the production of quality soft-ware.

6. Generic C criticismsThese criticisms apply to the C base language,

but in general adversely affect C++. R.P.Mody[Mody 91] gives an excellent general criticism ofC. He says that to properly understand C youmust understand the insides of the compiler. Hegives many examples of how C obscures ratherthan clarifies software engineering. He concludesthat he is “appalled at the monstrous messes thatcomputer scientists can produce under the name of‘improvements’. It is to efforts such as C++ that Ihere refer. These artifacts are filled with frills andfeatures but lack coherence, simplicity,understandability and implementability. Ifcomputer scientists could see that art is at the rootof the best science, such ugly creatures couldnever take birth.”

So where does this leave Sapir-Whorf withrespect to programming languages? Programminglanguages do not shape the way we think. It is theconcepts that shape the languages, and it is theway we think that shapes the concepts. Those whohave attempted to learn a language in order tolearn object-oriented programming realise that itis the concepts which must be grasped in order tobe effective. Once the concepts have been learnt,object-oriented programming seems a natural wayto program. It matches very effectively the waywe think. If C++ has been designed according tothe Sapir-Whorf hypothesis, its philosophicalbasis does not serve a computer industry thatshould shape tools best suited to its purposes,processes, thinking, and concepts.

C’s popularity is based on several myths.Firstly, that it is a high level language. It is not. Itis a structured assembler oriented towards the lowlevel machine domain, not to the problem domainof a high level language. Secondly, that it is smalland simple. Its semantics are not simple, but it isvery simple to make catastrophic errors. Thirdly,that it is portable. Certainly compilers areavailable on many platforms, but this does notmake programs portable, especially to diverse,and future architectures. Platform independenceachieves portability. Fourthly, that it is efficient.

5. On WritingDuring the development of this critique, I

realised it had grown larger than I had intended,and that my writing style needed some polish forsuch a large work. During my research, somecolleagues recommended a small book, “The

C++?? 2nd Edition page 20

Page 23: CPP - Literate Programming

What seems efficient on some platforms is thevery antithesis of efficiency on other platforms. Itseems efficient on certain platforms because itallows the lower level machine-orientedarchitecture to be visible at a higher level, insteadof being handled transparently by the compiler.This means that programs will be locked intocertain styles of architecture, or into current stylesof technology, instead of protecting programinvestment against future technological change.And lastly, that the semantics are mathematicallyrigorous. Anyone who reads the C++ ARM willrealise just how poorly defined the language is.Anyone who has practiced C will know howmany traps there are to fall into.

handles transparent to the programmer. This issimilar to the Unisys A Series approach whereobject ‘descriptors’ access the target object via amaster descriptor that stores the actual address ofthe object. On the A Series this is transparent toprogrammers in all languages, as this transparencyis realised at a level lower than languages. The Aseries descriptor mechanism also provideshardware safety checks that mean that pointerscannot overrun, and arrays cannot be indexed outof bounds. C cannot be implemented particularlywell on such machines, as C’s mechanisms arelower level than the target environment.

Other environments do not provide objectrelocation, so double indirection is an unnecessaryoverhead. In order for programs to be portable andto be at their most efficient in different targetenvironments, such system details should be theconcern of the target compilation system, not ofthe programmer.

6.1. PointersC pointers are a low level mechanism that

should not be the concern of programmers.Pointers mean the programmer must manipulatelow level address mechanisms, and be concernedwith lvalue and rvalue semantics, which aremachine oriented and not problem oriented as youwould expect of a high level language. A compilercan easily handle such issues without loss ofgenerality or efficiency. Memory models ofdifferent environments often affect the definitionof pointers. Memory model details such as nearand far pointers should be transparent to theprogrammer.

C’s pointer declaration syntax causes anothersmall problem:

int* i, j;

This does not mean, as might be easily read -

int *i, *j;

but

int *i, j;The programmer must also be concerned with

correct dereferencing of pointers to accessreferenced entities. Use of pointers to emulate byreference function parameters are an example. Theprogrammer has to worry about the correct use of&s and *s. (See the section on functionparameters.)

and should be written thus to avoid confusion.

6.2. ArraysPage 137 of the C++ ARM notes that C arrays

are low level, yet not very general, and unsafe.Page 212 admits, “the C array concept is weakand beyond repair.” Modern software productionis far less dependent on arrays than in the past,especially in the object-oriented environment. Thetrade off to be optimal, rather than general andsafe no longer applies for most applications. Carrays provide no run-time bounds checking, noteven in test versions of software. Thiscompromises safety and undermines the semanticsof an array declaration, ie an array is declared tobe a particular size, and can only be indexed byvalues within the given bounds. An index to anarray is a parameter in the domain of the arrayfunction. An index out of bounds is not a memberof the domain, and should be treated as severelyas divide by zero. C has no notion of dynamicallyallocated arrays, whose bounds are determined atrun time, as in ALGOL 60. This limits theflexibility of arrays. The C definition of arrayscompromises both safety and flexibility.

Pointer arithmetic is error prone. Pointers canbe incremented past the end of the entities theyreference, with subsequent updates possiblycorrupting other entities. How many lurking andundetected errors are in programs because of this?This illustrates how C undermines OOP byproviding a mechanism where state outside an ob-ject’s boundaries can be changed. Since pointersare intrinsic to writing software in C thisexacerbates the problem. Pointers as implementedin C make the introduction of advanced conceptslike garbage collection and concurrency difficult.

Another consideration is that dynamicmemory implementations vary between platforms.Some environments make memory blockrelocation easier by having all pointers referenceobjects via a master pointer which contains theactual address of the block. The location of themaster pointer never changes, so relocation of theblock is hidden from all pointers that reference it.When the block is relocated, only the masterpointer needs to be updated.

One view of arrays is just another object-oriented entity which should be treated in anobject-oriented manner as a class of data structure.It should have interface definitions, andconsistency checks inherent in object-orientedsystems. Another view is that an array is an

On the Macintosh, for example, the doubleindirection mechanism of ‘handles’ facilitatesrelocation of objects. Object Pascal makes these

C++?? 2nd Edition page 21

Page 24: CPP - Literate Programming

implementation of a function, where pairs ofvalues explicitly map the domain to the range,rather than being computed. This suggests thatAlgol was incorrect in distinguishing arrays byusing square brackets. An array just maps theinput argument (the index) to a value of the typeof the array. An array can be viewed as a randomaccess stack.

inputs to outputs. Abstract data types can be usedto design such systems. Also this will help targetenvironments to increase parallelism andconcurrency in a way transparent to programmers.

In object-oriented programming, by referenceparameters are used to pass the original object, nota copy. The called routine, however, cannotchange the state of the referenced object. Onlycalling a routine in the objects interface canchange the state. This has the desired effect of theobject being given to you, without being yours tochange, although you can effect change in theobject.

[Ince 92] considers that arrays and pointersneed not be relied upon so heavily in modernsoftware production, as higher level abstractionssuch as sets, sequences, etc are better suited to theproblem domain. Arrays, and pointers can beprovided in an object-oriented framework, andused as low level implementation techniques forthe higher level data abstractions. As has alreadybeen mentioned object-oriented programming isvery useful for the encapsulation ofimplementation and environment oriented details.Ince suggests that arrays and pointers should beregarded in the same way as gotos in theseventies. He suggests that languages such asPascal and Modula-2 should be regarded in thesame way as assembler languages in the seventies.This applies even more to C and C++, becausepointers and arrays are far more intrinsic in theuse of C and C++.

C shares faulty parameters with many otherlanguages. The interaction of C’s pointermechanism with a faulty parameter mechanism,however, makes C considerably worse than mostother languages. In C, pointers are used tosimulate by-reference parameters with by-valueparameters. The programmer must performtedious bookkeeping by specifying *s and &s forreferencing and dereferencing. Distinguishingbetween by-value and by-reference parameters isnot just a syntactic nicety, included in most highlevel languages, but a valuable compilertechnique, as the compiler can automaticallygenerate the referencing and dereferencing,without burdening the programmer.I agree with Ince that we have less dependence

on arrays, and that pointers in programminglanguages can be considered harmful. But Idisagree in as far as the concept of array is usefulfor mapping one set of values onto another, wherethis mapping cannot be describedcomputationally, but can only be expressed bypairs or tuples of coordinates.

6.4. void *“Passing paths that climb half way into the

void” - Close to the Edge, Yes.Is void * the C equivalent of an oxymoron? A

pointer to void suggests some sort of semanticnonsense, a dangling pointer perhaps? Maybe weshould tell the astronomers we have found a blackhole! While we can have some fun conjecturingwhat some of the obscure syntax of C suggests, aserious problem is that void * declarations areused to defeat the type system, and socompromise its purpose. A well thought out typesystem does not require such a facility. In anobject-oriented type system, the root class of theinheritance hierarchy provides the equivalent ofvoid.

6.3. Function ParametersParameters are used to pass routines simple

values (by-value parameters), or references toentities (by-reference parameters). Parameters areinputs to routines, and should not be changed.When memory was expensive, reusing parameterspace could conserve space. Changing parameters,however, is semantic nonsense, and mostlanguages get this wrong.

By reference parameters enable a routine tochange the value of an entity external to theroutine. Such updates beyond the environment ofa routine are side-effects. This introduces amechanism of updating the state space, other thanstraight assignment (although the routine can useassignment to achieve the ‘dirty deed’.) Thedanger is that the state of an object can bechanged without using the well defined interfaceof the object. By-reference parameters should notbe used to change the external world. Valuesshould only be passed to the external world by thereturn value of a function. Semantically, this isquite different to assignment to a referenceparameter; data flows through the program in onedirection, in via parameters, and out via returnvalues. Mathematically this maps compositions of

When a typed entity is assigned to a referenceof void *, it looses its static type information.When it is assigned back to a typed reference theprogrammer must explicitly specify to thecompiler the type information. This is error proneand should at least result in a run-time check, tomake sure that the correct type actually is beingassigned. Without type checks, the routines of oneclass can be mistakenly applied to objects ofanother class.

6.5. void fn ()The default type that a function returns is int.

A typeless routine returning nothing should be thedefault. Instead this must be specified by anotherconfusing use of void. This is an example of

C++?? 2nd Edition page 22

Page 25: CPP - Literate Programming

where C’s syntax is not well matched to theconcepts and semantics. Syntactically no <type>should suggest nothing to return. Also a typedfunction can be invoked independently of anexpression. This is a shorthand way of discardingthe returned value. Values should be returnedbecause they need to communicate with theoutside world, and ignoring returned values isoften dangerous. In other words, using a typedfunction as a void should result in a type error.

assign values to variables. Functions, however,are the target of assignment. The return statementaccomplishes this in C. Algol has no returnstatement, but uses assignment to the name of thefunction. The assignment of a value to a variablesets the return value for subsequent invocations ofthat function.

It is trivial for a compiler to realise thistransparency of view for variables and functions.In ALGOL style languages, the compilerautomatically deduces invocation when it sees aname that was declared as a routine, rather than avariable. The compiler knows that the identifierrefers to a routine. This compiler technology wasnot realised when FORTRAN and COBOL weredeveloped. This compiler technology is possible,because the compiler stores much informationabout an entity. A compiler can check that theprogrammer uses the entity consistently with thedeclaration. A compiler can generate correct code,without burdening the programmer with having toredundantly use an invocation operator. Thisenhances flexibility and implementationindependence.

In fact there should be no such thing as a voidfunction. A void function is a procedure.Procedures and functions should be distinguished.This distinction belongs to the problem ‘what’domain. A procedure is a routine that changes thestate of its object, but returns no value. A functionshould, in general, not cause any change to thestate of an object, but just return some resultdependent upon the objects state. Mathematically,a function is an entity that returns a value of agiven type. Procedures are untyped, and do notreturn a value. So it is incorrect to regardprocedures as functions. Functions as will beexplained below have more in common withvariables than procedures. Procedures can causeside effects, functions should not cause sideeffects. These distinctions are useful whenconsidering concurrency.

In fact the Unisys A series architectureelegantly achieves this level of transparency at thehardware level. The value call (VALC) operatorloads a value onto the top of stack. When VALChits a data value, that value is retrieved frommemory and loaded onto the stack. When VALChits a program control word (PCW), a routine thatcomputes the value to be loaded onto the stack isinvoked.

6.6. fn ()Empty parenthesis represent the function

invocation operator in C. Even though ‘()’ ismathematical looking, it is semanticallyequivalent to FORTRAN’s CALL, COBOL’sPERFORM, and JSR in assembler. The design ofthese operators was influenced by the underlyingmachine architectures. The invocation operator islow level, machine and execution oriented, and inthe ‘how’ domain.

Variables and functions should beinterchangeable for programmer optimisation. InC, it is not possible to change a function to avariable without removing all the (). This mightbe spread over many files, and the programmermight not bother with optimisation to avoid thetedium of the task. So the () operator reducesflexibility. Thus implementation detail is visiblefor the outside world to see. The () operator isanother bookkeeping task imposed on the Cprogrammer. Pure functional languages such asSML remove the variable/function distinctionaltogether, by not having variables at all.

This is opposite to most Unix shells, whereinvocation operators such as ‘run’ and ‘exec’ arenot needed. The ability to execute file names ascommands extends the command repertoire. Theshell runs executables and interprets shell scripts.There is no distinction as far as the shell user isconcerned. This is a widely accepted as an elegantand effective convenience. C’s () operatorintroduces the equivalent of a run command intothe language.

The removal of the variable/functiondistinction would remove the need for a commonuse of C++’s inline functions. Inlines clutter thename space of a class and add work for theprogrammer. All that is required is to directlyexport a data member as a function.

No invocation operator exists in the problemoriented domain of high level languages. This isbecause the semantics of a function is to return avalue of a given type. How this value is computedis unimportant. The value could be computed by aroutine invocation, by sending a message across anetwork, by forking an asynchronous process, orby retrieving a precomputed result from a memorylocation, ie a variable.

C also has pointers to functions. Functionpointers are analogous to the call by name facilityin ALGOL, and this was recognised as havingpitfalls. Consistent application of the object-oriented paradigm avoids these pitfalls. Acommon use of function pointers is to explicitlyset up jump tables. The mechanism behind virtualfunctions is a jump table of function pointers. Thedesign of a program can take advantage of thisfact, without resorting to explicit jump tables.

The distinction that languages like C makebetween variables and value returning routines isartificial. It could be pointed out that variables arefundamentally different to functions, as you can

C++?? 2nd Edition page 23

Page 26: CPP - Literate Programming

Another use is to jump to a function in a table thatis indexed by an input character. A switchstatement can cater for this mechanism that makeswhat is meant explicit, while keeping underlyingmechanisms (and possibly optimisations)transparent. C++ allows function pointers tomember functions to be stored in tables (via the .*and ->* operators).

appears to be left to the implementation, whichcontributes to non-portability. If this can’t bedefined for a sequential processor, then it is evenworse for a concurrent environment.

The shorthand += and -= are more powerful asvalues other than 1 can increment the variable. Ithas been suggested that there should also be &&=and ||= operators.

If it is mistakenly believed that a multiplicityof operators is required to produce more optimalcode, then it should be pointed out that codegenerators, especially for expressions, canproduce the best code for a target architecture. Aplethora of operators complicates the task of anoptimiser. A compiler can optimise well beyondwhat a programmer can do. An optimisingcompiler will analyse the surrounding code, and ifan entity is used several times in a local scope, itwill keep the value of that entity handy locally atthe top of a stack, or in a register, rather thanretrieve it from slow main memory several times.The nature of such optimisations depends on themachines architecture, which a programmershould not have to be aware of. Open systemsdemands that programs can be ported amongstdiverse architectures and environments, verydifferent to the original machine, and not onlyrun, but run efficiently. Optimisers work best withsimple, well defined languages.

6.7. Metadata in StringsThe implementation of strings in C mixes

metadata with data. Metadata is data about anobject, but is not part of the data itself. Examplesof metadata are addresses, size and typeinformation. Such metadata is often referred to asdata descriptors, and can be kept independently ofthe data, with the advantage that the programmercannot mistakenly corrupt the metadata.

In C strings, metadata about where a stringterminates is stored in the data as a terminatingbyte. This means that the distinction between dataand metadata is lost. The value chosen as theterminator cannot occur in the data itself. Thecommon alternative implementation is to store alength byte in a fixed location preceding thestring. This length metadata can be hidden fromthe programmer who does not need to knowwhere the length metadata is stored. Thisimplementation also has the advantage that thelength of a string can be easily obtained, withouthaving to count the number of elements up to theterminating null.

In fact constructs such as:

while (*s1++ = *s2++);

might look optimal to C programmers, but arethe antithesis of efficiency. Such constructspreclude compiler optimisation for processorswith specific string handling instructions. Asimple assignment is better for strings, as it willallow the compiler to generate optimal code fordifferent target platforms. String assignment willalso hide the implementation details of strings. Ifthe target processor does not have stringinstructions, then the compiler should beresponsible for generating the above loop code,rather than requiring the programmer to write suchlow level constructs. The above loop construct forstring copying is also the contrary to safety, asthere is no check that the destination does notoverflow. The above code also makes explicit theunderlying C implementation of strings, that arenull terminated. Such examples show why Ccannot be regarded as a high level language, butrather as a high level assembler.

6.8. ++, --The increment and decrement operators are

often used as an example that C was designed as ahigh level assembler for PDP machines. Theseoperators provide a shorthand convenience, butare unnecessary. There are no less than threeways to perform the same thing -

a = a + 1a += 1a++++a

For full generality, only the first form isrequired, the others are a mere convenience. Thelast two forms a++ and ++a are the postfix andprefix forms. They are often used in the context ofanother expression. Thus several updates can beperformed in one expression. This is a verypowerful and convenient feature, but introducesside effects into an expression that sometimeshave surprising effects, and can lead to programerrors. The following example is given on p.46 ofthe C++ ARM -

As with name overloading, memory storageupdate is a problematic, but necessary part ofprogramming. A language should provide it in aconsistent and expected way. Many languagesrecognise that memory update is problematic, andtypically only provide limited but sufficient waysof updating, by an assignment operation. (Manylanguages have block memory copies as well, butassignment can also provide block copy.)Furthermore, many languages avoid side-effects

i = v[i++]; // the value of ‘i’ is // undefined

The ARM points out that compilers shoulddetect such cases, but the exact interpretation

C++?? 2nd Edition page 24

Page 27: CPP - Literate Programming

by limiting updates to only one per statement. Cprovides too many ways to update memory. Theseadd nothing to the generality of the language,increase the opportunity for error, and complicateautomatic optimisation. Restrictive practices arejustifiable in order to accomplish correctlyfunctioning and efficient software.

Consider the paradigm of letters and words.Words are spelt by assembling letters in order.There are 26 distinct letters. With the addition ofdigits 0 to 9, and the underscore character, wehave a complete and correct definition foridentifiers. Letters can be written in a number ofstyles. They can be bold, italic, upper or lowercase. Such typographic representations, however,do not change the semantics of a word. Thus if wewrite ALGOL, Algol or algol, we recognise theword to represent a computer language. The caseof the letters does not change the semantics. Lettercase is only a typographic device. Typographicconventions make program text more readable,but should not affect the semantics of a program.Case distinction is based on the low levelparadigm of character codes such as ASCII usedinternally in the computer. This weakens thepurpose of using names to replace addresses, asnames are reduced to a string of character codes.

6.9. DefinesThe define declaration -

#define d(<parameters>)

has a different effect to -

#define d (<parameters>)

The second form defines d as (<parameters>).Extra white space between tokens should notaffect semantics of constructs.

#defines are poorly integrated with thelanguage. The ‘#define’ must be in column 1, andknows nothing about scope rules. Errors indefines can lead to obscure errors, as thepreprocessor does not detect them, but leavesthem for the compiler. Programmers must befamiliar with the particular preprocessorimplementation on their system, as preprocessorimplementations are different, particularlybetween Classic C and ANSI C.

Case distinction also contributes to errors. Itintroduces ambiguity, and as has already beenmentioned, ambiguity weakens the purpose ofnames, as identity is lost. As every programmerwill have experienced, one character errors aremore difficult to find than one would think. Forexample if an identifier is declared Fred, anotherone can be declared fred. Such names are easilymistyped and confused. We are in general poorproof-readers. The psychological reason for thisis that the the brain tends to straighten out errorsfor our perception automatically. The human brainis an excellent instrument for working out whatwas intended, even in the presence of radicalerror. (This makes us good at difficult tasks likespeech recognition.) In order to overcome thisprogrammers must use their powers ofconcentration to override this natural tendency ofthe brain. Distinguishing upper from lower case innames only adds another level of difficulty. Goodlanguage design takes into account suchpsychological considerations in these small butimportant details, being designed towards the wayhumans work, not computers. Such considerationsof cognitive science make a big difference to theeffectiveness of people, but do not have anyimpact at all on the efficiency of code generatedfor the computer. What is more important, peopleor computers?

6.10. NULL vs 0[Ellemtel 92] recommends that pointers

should not be compared to, or assigned to NULL,but to 0. Stylistically, NULL would be preferable.It would also allow for environments where nullpointers have a value other than 0. ANSI-C,however, has subtle problems with the definitionof NULL.

6.11. Case DistinctionIt is good to adopt typographic conventions

for names, but distinguishing between upper andlower case in names can cause confusion.Confusion leads to errors and systems that aredifficult to maintain and modify. Case distinctionis based on the implementation paradigm of howcharacter codes work. Why do we have names?To give entities identity, and aid our memory ofthat identity. Philosophically, case distinction iscontrary to the fundamental purpose of names. Case distinction provides another form of

name overloading. Name overloading is a double-edged sword. It leads to ambiguity, confusion anderror. Name overloading as has been suggested inthe section on name overloading should only beprovided in controlled and expected ways, whereoverloading provides a useful function such asmodule independence or polymorphism. Where aname is overloaded in the same scope thecompiler should report an error.

Case distinction in interactive systems is apoor user interface. It is clumsy having tocontinually use the shift key, and will slow a goodtypist. More importantly, case distinction makesnames harder to remember, and so is contrary tothe purpose of aiding memory. It is difficultenough for users to remember commandmnemonics or file names, let alone exactly thecase. Names are used instead of difficult toremember addresses. If we did not have names,we would have to retrieve files by addresses, orcall people by their social security number.

As another example, a commonly usedtechnique is -

C++?? 2nd Edition page 25

Page 28: CPP - Literate Programming

class obj TYPE{ CHARACTER

int Entry;FUNCTIONS ord: CHARACTER -> INTEGERvoid set_entry (int entry)// convert input character to integer{ char: INTEGER /-> CHARACTERentry = Entry;// convert input integer to character}

} PRECONDITION // check i is in range

If you have not spotted the error in the aboveexample, what was it supposed to mean?

pre char (i: INTEGER) = 0 <= i and i <= ord (last character)

6.12. Assignment OperatorThe notation ‘->’ means every character will

map to an integer. The partial function notation ‘/->’ means that not every integer will map to acharacter, and a precondition, given in the ‘prechar’ statement, specifies the subset of integersthat maps to characters. Object-oriented syntaxprovides this consistently with member functionson a class:

Using the mathematical equality symbol forthe assignment operator is a poor choice ofsymbols. Programming assignment is not equal tomathematical equality (:= != =). Languagedesigners of ALGOL style languages realised theywere semantically quite different, so took the careto distinguish, only using ‘=’ to mean equality inthe sense of mathematical assertion. In C the lackof distinction leads to error. It is easy to use =(assignment) where == (equality) is intended,which looks reasonable, but results in errors thatare difficult to detect.

i : INTEGERch : CHARACTER

i := ch.ordThis leads to a more general criticism of C, in

that it has a pseudo mathematical appearance. Fewpeople are proficient at interpreting mathematicaltheorems, most passing over such sections in text,making the assumption that the mathematicsproves the surrounding text. The pseudo-mathematical nature of C has this bad attribute ofmathematical notation. It is difficult to read, whilelacking the semantic consistency and precision ofmathematical notation. One of the keys ofreusability is readability.

// i becomes the integer value of// the character.ch := i.char// ch becomes the character// corresponding to the value i.

but a routine char would probably not bedefined on the integer type so this would morelikely be:

ch.char (i);// set ch to the character// corresponding to the value i.6.13. Type CastingThe hardware of many machines cater for such

basic data types as character and integer, and it isentirely possible that a compiler will generatecode that is optimal for any target hardwarearchitecture. So many languages have characterand integer as built in types. The object-orientedparadigm, however, can treat such basic data typesconsistently and elegantly, by the implicitdefinition of their own classes.

Type casting is just a mechanism to mapvalues of one type onto values of another type.This means type casting is no more than a specificform of mathematical function. Type casting hasbeen useful in computer systems. Often it isrequired to map one type onto another, where thebit representation of the value remains the same.Type casting is therefore a trick to optimisecertain operations. Type casting provides nouseful concept that general functions cannotimplement. Furthermore, type casting underminesthe purpose of strongly typed systems. In manylanguages, the type system has not beenconsistently defined, so programmers often feelthat type casting is necessary.

Another example of type conversion is fromreal to integer. Here though, the programmermight wish to specify the use of two typeconversion functions to truncate or round.

TYPEREALMathematically, all functions perform type

casting. An example often used in programming isto cast between characters and integers. Type castsbetween integers and characters are easilyexpressed as functions using abstract data types(ADTs).

FUNCTIONStruncate: REAL -> INTEGERround: REAL -> INTEGER

r: REAL

C++?? 2nd Edition page 26

Page 29: CPP - Literate Programming

i: INTEGER if (condition) statement1; /* Semicolon

i := r.truncate required */ // i becomes the closest integer else // <= r statement2;

i := r.roundif (condition) // i becomes the closest integer{ // to r statement1;

Again many hardware platforms providespecific instructions to achieve this, and anefficient object-oriented language compiler willgenerate code best suited to the target machine.Such inbuilt class definitions might be a part ofthe standard language definition.

} /* Semicolon must be omitted */else statement2;

This is an irregularity, as a parser willreduce both of the above to the grammatical form:

if (condition) statement6.14. Semicolonselse statement

I am not concerned whether the semicolon isdefined as a terminator or separator. Argumentsthat languages that define the semicolon asterminator are superior to those that define it asseparator are, however, baseless. The semicolonas separator is really quite logical. It is based onviewing the semicolon as a statement sequencingor concatenation operator. It is therefore a binaryoperator, requiring both a left and a right handside. Some people claim to find this conceptdifficult to understand, but if we consider it in thecontext of a mathematical expression, it would besilly to expect that an addition be written as:

(In fact why do conditions in C if and whilestatements have to have parentheses aroundthem?)

7. ConclusionsC++ is overly complex. C is widely

recognised as being a simple language. But eventhis is doubtful, as it has many operators, and adifficult precedence system. Its pointer style ofprogramming is difficult. Overall, C has manytraps that lead to difficult to detect errors insoftware. Object-oriented languages shouldprovide sophisticated concepts in the simplestpossible framework. Where the framework is notsimple, the concepts are obscured. OOP addressesmany issues in order to facilitate the production ofcomplex and sophisticated programs. Many ofthese issues are addressed in implicit and subtleways, but are lost in C++. Subtle errors can beintroduced into C++ software in many ways, andfurthermore, the combination of these will causeeven further problems. C++ has devices for pettyconvenience, while sacrificing majorconveniences and long-term correctness andsafety. C++ forces the programmer to performmany administrative bookkeeping tasks that acompiler can easily do.

a + b +

Another way to look at a separator is toconsider the structure of a program. A program isa list of elements. The executable part of aprogram is a list of sequentially executedinstructions. Elements in a list must be separated,and the semicolon is one syntax to separateelements in a list. The semicolon is therefore partof the syntax of the list, not part of the syntax ofthe individual instructions. Languages such asFORTRAN separated instructions by requiringthat they be placed on different lines or cards. Ifan instruction overflowed a line, a continuationcharacter was required, like the backslash in C.Well defined languages do not requirecontinuation characters, as line breaks areunimportant, and have no effect on semantics.Languages should have very regular grammars, sothat the semicolon could be an entirely optionaltypographic separator.

It can be considered what application domainC++ is relevant for? The answer to this is thatC++ might be used as a better C. But for whatapplications is C relevant? C is relevant for lowlevel Unix systems programming. It is not agenerally applicable language in view of its lowlevel nature, and its flaws. C is not applicable forlarge scale production. Hence C++’s attempt toimprove it. C++, however, has not solved C’sflaws, as I once hoped it would, but painfullymagnified them. Better languages exist for higherlevel functions such as communications andnetworks, scientific work, compilers, etc. Ienvisage that C has a place as a high levelassembler that can be used to implement smallpieces of code on suitable platforms, where

In natural language both the comma andsemicolon are separators, only the full stop is aterminator. If the comma was a terminator,function invocations would look like:

fn (a, b+c, d, e,);

It is often argued that the semicolon asseparator leads to irregularities. C’s handling ofthe grammar of semicolons, however, leads to anirregularity in if/else’s:

C++?? 2nd Edition page 27

Page 30: CPP - Literate Programming

efficiency is of prime importance. Thus the use ofC would be limited and well controlled, ratherlike small assembler routines are currently used insome systems for the same purpose. Indeed themove to C++ should only be considered in thecase of upgrading a body of C programs forbackwards compatibility. In the case of newprojects alternatives to C and C++ shouldseriously be considered.

advanced musician ensure that the tempo of apiece is correct, and since playing to a metronomeis more difficult, will help sharpen the musiciansperformance of the piece. The musician does notjust view the metronome as an aid for beginners,or as something that restricts him to a set beat, butas a tool that helps produce a polished andprofessional performance. C should not be seen asa language to which you graduate after you havelearnt to program in languages with safety checks.In fact changing to C or C++ is a great stepbackwards. Languages with consistency andsemantic checks are essential aids to theproduction of professional software.

Programming is the orchestration of changewithin a large state space. Object-orientedtechniques provide a method of simple divisionand management of such state spaces. Managingsuch state spaces requires the simplest techniques,in order to guard against detectableinconsistencies that lead to errors in executablesystems. C and C++ do not implement the simplemanagement of a large state space, and allowmany potential errors to go undetected. The roleof a language as a tool cannot seriously beregarded as some authoritarian that stops us doingwhat we want or need to do, as many languageswith type safety and consistency checks are oftenviewed. Programming languages should embodythe collective wisdom of common sense practicesthat have been learnt over many years, bycommon and painful experience. C++ lacks theimplementation of much of this wisdom.[Sakkinen 92] observes that much of the C++literature has few references to external work orresearch. It fails to draw on the insights andprogress made by many researchers. This leadsme to believe that C++ is parochial and removedfrom the many advances that will makeproduction of systems easier and more costeffective.

This paper has shown many cases where C++uses old C mechanisms to provide things that canand should be expressed consistently within theobject-oriented paradigm. For example typecasting. The move to pure object-orientedlanguages will facilitate more consistentprogramming and avoid many typical errors thatoccur in software production. C++ also makesdistinctions that belong in the ‘how’implementation domain. For example, ‘.’ vs ‘->,and variables and functions. These makebookkeeping work for programmers, whichshould be handled by a compiler. But then C++fails to make distinctions that belong in the ‘what’problem domain. For example, procedures vsfunctions. Making distinctions in the ‘how’domain adds inconvenience to the language.Failing to make distinctions in the ‘what’ domainlimits the power and expressiveness of thelanguage. The amount of change required in C++to address the issues raised in this paper is seen aslargely insurmountable.

It is better to detect and avoid errors than tofix them. The fixing of errors happens many timesduring the development process. This slows downthe development process, and is therefore costly.Good programmers in this context (often called‘gurus’), are those who recognise symptoms, andrecommend fixes. Good programmers in the bettersense (often called ‘impractical idealisticdreamers’) adopt better practices (programminglanguages being a subset of these), that avoiderror in the first place.

A programming language is just a tool, in thesame way that an axe is a tool. If the axe is bluntwhen chopping down a tree, then procedures,processes and methodologies could be invented tomake it as effective as possible. But that leavesthe real problem unsolved; that the axe that doesthe real work is blunt. So it is with programminglanguages. To develop a system, it must beimplemented, and a programming language is thetool to do the real work. If the language is blunt,then procedures, processes and methodologiesmight alleviate the situation, but they do not solvethe problem. Once the axe is sharpened, then realprogress is made, and the procedures, processesand methodologies also become more effective. Agood axeman will have good axe wieldingtechnique, but given a choice of axes will choosethe sharpest implement. A poor axeman could beineffective with even a sharp axe, but the axemaker will still strive to produce the sharpest axefor the good axeman. The argument that poorprogrammers will produce bad programs in anylanguage so we shouldn’t bother with betterlanguages is fallacious.

C encourages gurus who spout false wisdomon obscure subjects. Writing programs in C isoften called ‘coding’. Coding is writing obscureencryptions that will later have to be decoded, bynone else than a guru! C also encouragesprogramming by guesswork. C programmers oftensolve ‘bugs’ by adding extra ()s, *s and &s,without understanding the problem. People whoattain proficiency at this guesswork, are known as,well you guessed it, gurus!!

The view that correctness checks are trainingwheels for students, which gurus don’t need mustbe dispelled. Many disciplines have techniques toensure correctness. For example, the metronomein music is not just for students, but will help an

As mentioned in the introduction, both sidesof the analysis/design vs implementation debate

C++?? 2nd Edition page 28

Page 31: CPP - Literate Programming

need to compromise in order to bridge thesemantic gap. The perpetuation of low levellanguages such as C into OOP is proof that theprogramming community is not willing tocompromise, or sharpen its axe enough in order tobridge this costly gap.

8. BibliographyC++ ARM Ellis and Stroustrup “The annotatedC++ Reference Manual” AT&T 1990.[Capretz 87] Pierre J. Capretz “French in Action,A Beginning Course in Language and Culture”Yale University Press.The critique began with certain questions, and

as no work can be absolute (particularly aprogramming language), it will end with morequestions that it is hoped will create more debate,and more questioning into what we are reallytrying to achieve with program development.

[Cline] Marshall Cline “C++ Frequently AskedQuestions” comp.lang.c++ newsgroup.[Crystal 87] David Crystal “The CambridgeEncyclopedia of Language” CambridgeUniversity Press.Does C++ provide effective communication

between programmers separated in both space andtime? Does C++ provide communication betweenthe levels of analysis, design, implementation andmaintenance?

[DDH 72] Dahl, Dijkstra, Hoare “StructuredProgramming”[Dijkstra 76] E.W. Dijkstra “A Discipline ofProgramming” Prentice Hall.

Are the compromises made by C and C++ stillrelevant to today’s environments, and theenvironments of the not very near future?

[Ellemtel 92] “Programming in C++: Rules andRecommendations” Ellemtel TelecommunicationSystems Laboratories, Sweden.

Could C++ be regarded as the PL/1 of theobject-oriented world, as PL/1 was the marriageof FORTRAN and structured ALGOL concepts,and C++ is the marriage of C with object-orientedconcepts?

[Ince 92] D.C.Ince “Arrays and PointersConsidered Harmful”, ACM SigPlan Notices,January 1992.[Mody 91] R.P.Mody “C in Education andSoftware Engineering” ACM SIGCSE BulletinVol.23 No. 3 September 1991.

Are the compromises made for the restrictedmachines and environments of 20 years ago stillappropriate for today? Are languages based on 20year old compromises appropriate in modernsoftware development environments?

[Reade 89] Chris Reade “Elements of FunctionalProgramming” Addison-Wesley, 1989.[RBPEL91] Rumbaugh, Blaha, Premerlani, Eddy,Lorensen “Object-Oriented modelling andDesign”. Prentice-Hall, 1991.

Should new software developments be forcedto accept such compromises?

Is C++ patching old material with new cloth,or pouring new wine into old wineskins?

[S & W 79] William Strunk and E.B.White “TheElements of Style”, MacMillan Publishing, 1979.

What are we really trying to achieve inprogramming anyway? [Sakkinen 92] Markku Sakkinen “Inheritance and

Other Main Principles of C++ and Other Object-oriented Languages”. University of Jyväskylä,1992. (Also published as selected papers inECOOP ‘88, Computing Systems Vol. 5 No. 1,and Structured Programming Vol. 13 (1992).)

Ian JoynerNovember 1992

[SJE91] Saake, Jungclaus, Ehrich “Object-Oriented Specification and Stepwise Refinement”in IFIP Workshop on Open Distributed ProcessingBerlin, 1991.[Weg91] Peter Wegner “Concepts and Paradigmsof Object-Oriented Programming” ACMSIGPLAN OOPS Messenger Volume 1 no. 1August 1990.[X3J16 92] Members of the X3J16 working groupon extensions “How to write a C++ LanguageExtension Proposal for ANSI-X3j16/ISO-WG21”ACM SIGPLAN Notices Vol. 27 No. 6 June1992.[Yoshida 92] Koichiro Yoshida Title and book inJapanese.

C++?? 2nd Edition page 29