The Pre-Natal Death of the CIS Project: A Software Disaster Story*

Tom Gilb, Independent Consultant, Norway

1. INTRODUCTION

The new President of a European Corporation made a tough decision. He simply had to “give up” on a very large-scale computer project which was considered vital to the management of the Corporation in the troubled years ahead. (Japanese competition was already threatening the Corporation’s very existence.)

The Corporate Information System (CIS, as it was known internally) had been started by his predecessors five years earlier, with a budget that included eighty work-years of corporate computer staff for development. By the time the new President took over, it had already consumed twice that budget in costly internal professional resources, but was nowhere near producing any useful results at all.

The Corporation had done all the “right things”:

-The Corporation had consulted American business management publications and had accepted the idea that a centralized corporate-wide information system was needed.

-The Corporation had hired a famous Californian “Think Tank” to do a feasibility study, which took two calendar years (and fifteen work-years) to complete.

-The Corporation made use of the largest computer manufacturer’s biggest and most modern computers and database software.

-The Corporation uncomplainingly paid several million dollars extra development costs, when the initial budgets were used up.

-The Corporation was using all the latest structured programming techniques recommended by the computer suppliers.

There were three main problem areas:

* Reprinted with permission from the text of Principles of Software Engineering Management by Tom Gilb, to be published by Addison-Wesley Publishers, Limited, Wokingham, England, 1988.

Address correspondence to Robert L. Glass, P.O. Box 22012, Seattle, WA 98122.

The Journal of Systems and Software 8, 161-163 (1988)

© 1988 Elsevier Science Publishing Co., Inc.

The project “functioned.” All the necessary programs were actually running. But the programmers and the project staff claimed they needed more time and more computer resources to “tune the system” so that it could operate at the necessary speed. About twenty thousand changes had to be fed into the computer daily just to keep it up to date. A single change could take up to 20 minutes of computer time to update all the consequences of that change within the highly integrated system.

Fortunately, not all updates were so time-consuming. But every effort to tune the system for better efficiency failed to get a normal day’s updating through the large computer within one day. Furthermore, they discovered that it would take an estimated two years of additional development effort to add a new factory, or even a single one of their wholly-owned suppliers, to the integrated system. Apart from the intolerable delay involved, they faced the fact that these kinds of changes within the Corporation were taking place at the rate of about one every six months.

2. THE CONSEQUENCES

When the Corporate Information System project was thrown out, they were left with nothing in the way of new management control systems. They also lost five years in which their competitors might well have achieved some real results in this area.

This was the unpleasant reality that confronted the new Corporate President just one year after he took on his job.

This sort of thing shouldn’t happen. But it does. (And it did.)

3. PREVENTING THE DISASTER

So how can it be prevented? The President of the Corporation had the courage not to throw good money after bad. But the system developers themselves apparently did not have the courage to admit their own failure.

When I asked one of the programmers about the project, he said, “I knew it wouldn’t do the job, but my programs worked.” He did not feel the need to worry about whether the entire system was a failure, as long as his own component functioned.

Why did CIS fail? I would classify the reasons for this software engineering project’s failure into three categories:

1. Failure to determine and control project attributes critical for survival

2. Failure to find an architecture suited to those critical attributes

3. Failure to “evolve” a useful system in smaller useful steps

The critical attributes of a system are those qualities and resources that can cause the death of the system as a whole if they are allowed to go beyond certain limits (the “worst acceptable level”).

It is necessary for project managers to determine these dangerous attribute areas, and to take steps to manage them.

In the CIS case we see two examples of critical attributes that it should have been possible to determine and control.

The most critical one was the practical daily operational performance of the system. Even after all efforts had been made to improve performance, it still could not do a single day’s work in a day on the largest machine available.

It must have been possible, at an early stage in the project, to estimate the order of magnitude of the number of transactions the system would handle per day. The number was in fact approximately twenty thousand. There are 86,400 seconds in 24 hours, so even running around the clock, an average transaction had to be handled in roughly four seconds (86,400 / 20,000 ≈ 4.3). Yet some transactions were taking minutes in practice.
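The per-transaction budget follows from simple division. The Python sketch below is illustrative only. The 20,000 daily updates and the 20-minute worst-case update come from the text. The 8-hour office day is an assumption, added for comparison with the WORK-CAPACITY goal of completing a day's work within a normal office day.

```python
# Back-of-envelope time budget per transaction for the CIS workload.
# 20,000 updates/day and the 20-minute worst-case update are from the text;
# the 8-hour office day is an assumption used only for comparison.

DAILY_TRANSACTIONS = 20_000
SECONDS_PER_24H = 24 * 60 * 60          # 86,400
SECONDS_PER_OFFICE_DAY = 8 * 60 * 60    # 28,800 (assumed 8-hour day)
WORST_UPDATE_SECONDS = 20 * 60          # one change could take up to 20 minutes


def seconds_per_transaction(window_seconds: int, transactions: int) -> float:
    """Average time each transaction may take if the whole day's load
    must be processed within `window_seconds`."""
    return window_seconds / transactions


if __name__ == "__main__":
    around_the_clock = seconds_per_transaction(SECONDS_PER_24H, DAILY_TRANSACTIONS)
    office_day = seconds_per_transaction(SECONDS_PER_OFFICE_DAY, DAILY_TRANSACTIONS)
    print(f"24-hour budget:    {around_the_clock:.2f} s per transaction")
    print(f"Office-day budget: {office_day:.2f} s per transaction")
    # How many average-transaction budgets a single 20-minute update consumes.
    print(f"One 20-minute update uses {WORST_UPDATE_SECONDS / around_the_clock:.0f} budgets")
```

Running it gives budgets of about 4.32 s (24 hours) and 1.44 s (assumed office day). It also shows that a single 20-minute update uses up the 24-hour budget of about 278 average transactions.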

There should have been a specification somewhat like this:

WORK-CAPACITY:

Practical work capacity must be sufficient to handle a normal day’s work in a normal office day.

Worst case (for trial use): 4 seconds on average per transaction

Planned level (initial real use): less than 1 second on average per transaction
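One way to make such a specification testable is to record the worst acceptable and planned levels as data and compare measurements against them. The following Python sketch is hypothetical. `CriticalAttribute` and its `assess` method are illustrative names, not part of any published tool. The sample measurements are arbitrary inputs, not data from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalAttribute:
    """A measurable critical attribute with a worst acceptable and a planned level.

    Hypothetical structure for illustration. It assumes lower values are better,
    as with average seconds per transaction.
    """
    name: str
    unit: str
    worst_acceptable: float  # beyond this, the system is unacceptable
    planned: float           # target level for real use

    def assess(self, measured: float) -> str:
        if measured > self.worst_acceptable:
            return "FAIL: beyond worst acceptable level"
        if measured < self.planned:
            return "OK: planned level met"
        return "MARGINAL: acceptable, but planned level not yet met"


# The WORK-CAPACITY specification, expressed as data.
WORK_CAPACITY = CriticalAttribute(
    name="WORK-CAPACITY",
    unit="seconds per transaction (average)",
    worst_acceptable=4.0,
    planned=1.0,
)

if __name__ == "__main__":
    # Arbitrary example measurements, not data from the CIS project.
    for measured in (0.8, 2.5, 6.0):
        print(f"{measured:>4} s -> {WORK_CAPACITY.assess(measured)}")
```

Keeping "worst acceptable" and "planned" as separate numbers lets a trial measurement count as acceptable while still showing that the real-use goal has not been reached.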

In practice, I could not (on interviewing the project leader) find any sign of such specifications. The assumption was that sufficient computer capacity would be available. Because these initial specifications were not made part of the formal requirements for the system, nobody felt obliged to worry about them.

There was another requirement that would have caused the failure of the system, even if the work-capacity requirements had been met: the ability to integrate new business units, such as new factories, into the corporate system.

The initial efforts at such integration led to the estimate that it would take two years of effort to integrate a major business unit, and such changes were happening on average about every six months!
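A small simulation makes the mismatch between these two rates concrete. The Python sketch below rests on two assumptions the text does not state: each integration occupies the team for 24 calendar months, and integrations are handled one at a time. `pending_integrations` is a hypothetical helper.

```python
def pending_integrations(horizon_months: int,
                         arrival_every: int = 6,
                         effort_months: int = 24) -> tuple[int, int]:
    """Return (completed, pending) integrations after `horizon_months`.

    Assumes a new business unit arrives every `arrival_every` months and that
    integrations are done strictly one after another, each taking `effort_months`.
    """
    arrivals = list(range(arrival_every, horizon_months + 1, arrival_every))
    completed = 0
    free_at = 0  # month at which the integration team is next free
    for arrival in arrivals:
        start = max(arrival, free_at)
        finish = start + effort_months
        if finish > horizon_months:
            break  # serial work: later arrivals cannot finish either
        completed += 1
        free_at = finish
    return completed, len(arrivals) - completed


if __name__ == "__main__":
    # Five years, the life of the CIS project.
    print(pending_integrations(60))
```

Over five years this gives two completed integrations and eight still waiting. More generally, with two new units arriving per year and each taking two years, about 2 × 2 = 4 integrations would have to be in progress at all times just to keep pace (Little's law). A single team therefore falls further behind every year.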

The specification for adaptability, with hindsight, should have looked like this:

ADAPTABILITY:

The system shall be capable of integrating all new business units and requirements in such a way that the system itself is never the delaying factor.

Worst acceptable case: Major new business units, such as a factory or supplier, shall be integrated within six months using no more than ten programmer/analysts or five work-years of total effort.

Planned level: It should be possible to add or remove a major business unit with less than six work-months of qualified effort.

Again, nobody was concerned with this adaptability requirement. They concentrated their efforts on getting the initial business configuration to work. Yet over the five years of the project, that configuration was a constantly moving target.

I often wonder how this company could have paid outside consultants for a fifteen-work-year feasibility study without the consultants discovering the need for such requirements.

Note that there were certainly several other critical attribute specifications that should have been made (for example “availability”, “usability”, and “portability”) and were not. “Adaptability” and “work capacity” are merely the ones that we know killed the project in practice.

If there are no challenging attribute specifications, then the most basic input to the engineering and architecture process is missing. This, in my experience, is often exactly what happens. And it happened here.

For example, since the work-capacity requirements were not clearly stated, the designers probably could not see that the large database software system supplied by the hardware manufacturer was too slow and clumsy. The choice of this database management system was a major “architectural” decision, and it should never have been placed at the heart of such a tightly integrated system.


But the manufacturer was not the one to rely on for warnings of the danger. They stood to earn a lot of money if the system was used.

At the more-detailed software engineering level, both of the above-mentioned critical attributes should have been the subject of a great deal of design work in the file and program module organization, in order to meet these demanding specifications.

All of these errors in the CIS project might have been discovered early enough to correct. But this could only have happened if practical experience with the system had been gained early enough to reveal that something was wrong with the work capacity or adaptability.

Instead, delivery was based on the “big bang” approach: they planned for the entire dream system after five years of effort, or nothing at all.

In a post-mortem discussion with the CIS-company people on this issue, they agreed with me about what would have happened if they had used “evolutionary delivery” of the system in much smaller useful steps (for example, once every quarter).

In the very worst case, if nothing at all could be achieved using the system, they would have discovered this early, when they failed to make the first delivery step pay off, or even work at all. They could have cut their losses at an early stage and devoted their resources to finding a better solution.
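A rough comparison shows why early feedback limits the damage. Purely for illustration, the sketch assumes that the roughly 160 work-years consumed (twice the eighty-work-year budget) were spent evenly over the five years. `effort_at_risk` is a hypothetical helper, not a formula from the text.

```python
def effort_at_risk(total_work_years: float, total_months: int, step_months: int) -> float:
    """Work-years spent before the first delivery step can be tried in real use,
    assuming effort is spread evenly over the whole project."""
    return total_work_years * step_months / total_months


if __name__ == "__main__":
    TOTAL_WORK_YEARS = 160   # roughly twice the 80-work-year budget
    TOTAL_MONTHS = 60        # five years
    print("Big bang (one delivery at 60 months):",
          effort_at_risk(TOTAL_WORK_YEARS, TOTAL_MONTHS, 60))
    print("Quarterly evolutionary steps:        ",
          effort_at_risk(TOTAL_WORK_YEARS, TOTAL_MONTHS, 3))
```

Under these assumptions, the first quarterly step would expose about 8 work-years before any real-use feedback arrives. The big-bang plan exposes about 160.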

If they could deliver only some of the dream, but not all of it, they would at least have achieved something (rather than nothing, at a very high price).

4. SUMMARY

The CIS case is quite real. It happened to VOLVO of Sweden, and they were kind enough to give me background data for my use here.

VOLVO’s own reaction to the case was to set up sophisticated project monitoring groups (“Shark Rooms” they call them), and to avoid such tightly integrated and centralized systems.

I don’t know if that is the best thing they could have done. This book argues for a different course of action.

The CIS (actually called VIS at Volvo) is still typical of a multitude of projects, from all branches of software building, that I see many times a year all over the world. They all commit the same sins:

1. Unclear specification of critical system attributes (not made measurable).

2. Lack of real engineering of systems and software to achieve those critical attributes.

3. Lack of a trained group of real software architects (softects) and software engineers. The “bricklayers” (programmers) were left to do the architecture by default.

4. Lack of any systematic notion of evolutionary delivery, feedback, and change-upon-learning processes. We are always going for the “big bang” delivery.

BASED ON THIS STORY, LET US SPECIFY SOME INITIAL PRINCIPLES OF SOFTWARE ENGINEERING MANAGEMENT

1. The Invisible-Target Principle: All critical system attributes must be specified clearly. Invisible targets are usually hard to hit (except by chance).

2. The All-the-Holes-in-the-Boat Principle: Your design solutions must satisfy all critical attributes simultaneously.

3. The Clear-the-Fog-from-the-Target Principle: All critical attributes can be specified in measurable, testable terms, and the worst acceptable level can be identified.

4. The Learn-Before-Your-Budget-is-Used-Up Principle: Never attempt to deliver large and complex systems all at once; try to deliver them in many smaller increments, so that you can discover the problems and correct them early.

5. The Keep-Pinching-Yourself-to-See-if-You-are-Dreaming Principle: Don’t believe blindly in any one method; use your methods and common sense to measure the reality against your needs.

6. The Fail-Safe Minimization Principle: If you don’t know what you’re doing, don’t do it on a large scale.
