Some conclusions from an experiment in software engineering techniques

In two earlier reports we have suggested some techniques to be used in producing software with many programmers. The techniques were especially suitable for software which would exist in many versions due to modifications in methods or applications. These techniques have been taught in an undergraduate course and used in an experimental project in that course. The purpose of this report is to describe the results that have been obtained and to discuss some conclusions which we have reached. The experiment was completely uncontrolled, the programmers generally inexperienced and poor, and the programming system used was not designed for the task. The numerical data presented below have no real value. We include them primarily as an illustration of the type of result that can be obtained by use of the techniques described in the earlier reports. We consider these results a drastic improvement over the state of the art. Major changes in a system can be confined to well-defined, small, subsystems. No intellectual effort is required in the final assembly or "integration" phase.

In two earlier reports [1,2] we have suggested some techniques to be used in producing software with many programmers. The techniques were especially suitable for software which would exist in many versions due to modifications in methods or applications. These techniques have been taught in an undergraduate course [3] and used in an experimental project in that course. The purpose of this report is to describe the results that have been obtained and to discuss some conclusions which we have reached.
The experiment was completely uncontrolled, the programmers generally inexperienced and poor, and the programming system used was not designed for the task. The numerical data presented below have no real value. We include them primarily as an illustration of the type of result that can be obtained by use of the techniques described in the earlier reports.
We consider these results a drastic improvement over the state of the art.
Major changes in a system can be confined to well-defined, small, subsystems. No intellectual effort is required in the final assembly or "integration" phase.

The Project
The class was asked to produce the KWIC index system described in [2].
The project was divided into six modules, but two were combined because they were clearly simpler than the remaining four.* For each of the five assignments we specified four distinct types of implementation. Each student was given one of those to program. Had the experiment been a complete success, any combination of one version of each assignment would have run correctly; we would have had 4^5 = 1024 working versions (five independent selections from sets of four elements). In addition, each student was assigned to write a program which would "checkout" some module other than his own. Because of the billing policies of our University Computing Center, the programs were to be written and run in WATFIV, a version of FORTRAN. All the defined functions were to be made available as either subprograms or FORTRAN functions.
*See Appendix 1 for a brief description
any measure, two of the poorest students in the class.)
4. This program was clearly incorrect, but still did not violate the restrictions specified for the modules which it called. Thus combinations involving this program would run but would produce incorrect output. It produced the same incorrect output in every combination tested. The program was "completed" by the student well past the due date and the "checker" was not able to do his job.
5. This program simply failed to terminate in any case. The error was found by the checker. It ran in 4.4 seconds.

2. We have just repeated the whole experiment with a somewhat larger class. The results were essentially the same. We estimate that the family of programs has 1100 members; more than 40 of these were tested.
Performance improved somewhat, ranging between 3 and 13 seconds. The only interesting distinction between the two experiments was that the instructor (project leader) changed from intensely interested to bored and unconcerned, with no noticeable effect. We also eliminated the problem with storage limitations mentioned above.
Conclusions
1. We cannot avoid stating our conclusion that the experiment has revealed some validity in the comments of our earlier papers [2,3].
Clearly one purpose of this paper is to draw your attention to those earlier ones.
2. Our most significant new conclusion comes in the area sometimes called "project management". Recent papers (e.g. [5]) have suggested that the project manager must devote a significant part of his best manpower to the "integration phase". In our experiment the "integration phase", while not mechanised, was so simple that it could have been mechanised.
Even in the few cases where errors did occur, the system had been structured in such a way that diagnostic messages automatically indicated the module making the error. We had no need for anyone who had a thorough knowledge of the whole system. Our experience indeed suggests that the integration phase is a very poor place to invest one's manpower. The limited capacity of our minds makes us more efficient when our job depends on a relatively small amount of knowledge. Moreover, if we plan our project management around a large "integration phase", we will have to invest that manpower again whenever we change some part of the system.
Our experiment suggests that manpower can be much more profitably invested in the "pre-programming" or "design" phase. The success of our project depended largely upon the precisely written module specifications described in [1]. The "cost" or intellectual effort required to produce one of these module specifications was comparable to the cost of producing an implementation of the module. Such predesign work therefore appears to many as unjustifiable overhead. When we amortize this cost over the number of versions of the system which are finally built, and consider the savings realized in the final "integration" phase, it appears to us that the overhead is well justified.
Efforts in the industry to invest heavily in a "pre-design" or "concept" phase have often proven fruitless because the outcome was a set of natural language documents which were so general that they provided almost no decisions to guide the development groups. When this predesign phase produces precise module specifications, the payoff is much more significant.
Additional amortisation of the "pre-design" effort can occur when the modules or their specifications are used (either unchanged or slightly modified) in a later project.
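To illustrate how mechanical such an integration step can be, the following Python sketch assembles every member of a program family by picking one version of each assignment. It is not the project's code (which was written in WATFIV). The assignment names, the version labels, and the `assemble` function are hypothetical placeholders; only the counts (five assignments, four versions of each) come from the project description.

```python
from itertools import product

# Hypothetical names: five assignments, four interchangeable versions of each.
ASSIGNMENTS = ["assignment_1", "assignment_2", "assignment_3",
               "assignment_4", "assignment_5"]
VERSIONS = ["a", "b", "c", "d"]


def family():
    """Yield every system as a dict mapping assignment name to chosen version."""
    for choice in product(VERSIONS, repeat=len(ASSIGNMENTS)):
        yield dict(zip(ASSIGNMENTS, choice))


def assemble(system):
    """Placeholder for linking the chosen versions together.

    It uses only the names of the parts, never their internals, which is
    what makes the step easy to automate.
    """
    return "+".join(f"{name}:{version}" for name, version in system.items())


systems = list(family())
print(len(systems))          # number of family members: 4 ** 5
print(assemble(systems[0]))  # the first combination, all versions "a"
```

Nothing in the loop depends on knowledge of any module's implementation, which is the property the text attributes to the integration phase.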
3. Another important conclusion lies in the area of documentation.
Several firms have invested heavily in formalized documentation standards intending to make all information easily available to everyone on the project. Our experiment suggests that the effort in these projects can be focussed. Precise documentation of the external characteristics of each module is essential and should be in a standard notation. Our project had minimal documentation about the internals of the one-man assignments.
Industrial practice would require more effort in the area than we put into it, but much less effort than is now common. More significant, the specifications produced in the pre-design phase were the only external documentation required throughout the project. These documents were updated several times as errors were discovered, but no additional descriptive material was needed. This is yet another way that the effort invested in the pre-design phase can be amortized.

Our experience demonstrated the importance of careful attention to the possibility of errors in the running program during the "pre-programming" phase. Because of our careful attention to the errors in the design phase, errors which did occur when the systems were assembled were quickly traced to their source and meaningful diagnostic information was produced with almost no effort on the programmer's part. A paper reporting what we have learned in this area is in preparation.
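A minimal Python sketch of this idea follows. It is not the project's code; the module name `LineStore`, its functions, and the `ModuleError` class are assumptions made for illustration. Each externally visible function checks the conditions that its (assumed) specification forbids and reports a violation together with the name of the module and function where it was detected, so that no knowledge of internals is needed to locate the fault.

```python
class ModuleError(Exception):
    """Error that records where a violation was detected (hypothetical)."""

    def __init__(self, module, function, message):
        super().__init__(f"{module}.{function}: {message}")
        self.module = module
        self.function = function


class LineStore:
    """Hypothetical module that stores lines as sequences of words."""

    NAME = "LineStore"

    def __init__(self):
        self._lines = []  # internal representation, hidden from other modules

    def add_line(self, words):
        self._lines.append(list(words))

    def word(self, line, index):
        # Checks derived from the assumed specification of this function.
        if not 0 <= line < len(self._lines):
            raise ModuleError(self.NAME, "word", f"line {line} does not exist")
        if not 0 <= index < len(self._lines[line]):
            raise ModuleError(self.NAME, "word",
                              f"line {line} has no word {index}")
        return self._lines[line][index]


store = LineStore()
store.add_line(["software", "engineering"])

try:
    store.word(0, 5)  # a caller asks for a word that does not exist
except ModuleError as err:
    diagnostic = str(err)
    failing_module = err.module
    print(diagnostic)
```

Because the checks are written once, from the specification, the diagnostic costs the author of the calling program nothing.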

Our experience has indicated the great value of independent module tests (by persons other than the module author) before integration. In an earlier effort of this sort we required each programmer to test his own module before integration. In the two experiments which we discuss here, we required an additional person to test the module against the formal specifications (another use of our predesign efforts). Our success rate increased drastically and there were apparently two reasons:
(1) Sloppy programmers do sloppy tests.
(2) The specifications, although precise, can be misinterpreted by human programmers. A misinterpretation by the programmer which results in an error in his module often results in a corresponding error in his tests. An independently written test is unlikely to share the same misconceptions.
We are well aware that, as E. W. Dijkstra has put it [6], "Program testing can be used to show the presence of bugs, but never to show their absence." Showing the presence of bugs, however, is a very valuable service.
We eagerly await the day that professional programmers habitually produce programs which are written so that they can be carefully proven to be error free. In the meantime we suggest that effort invested in independent pre-integration testing is well worthwhile.
Our experience also suggests that both the hierarchical structure which can be found in the system [2] and the abstract nature of the modules themselves greatly ease the building of the "scaffolding" required for independent module tests. To test a given module one need simulate only those modules immediately below it in the system hierarchy. Further, the nature of the modules means that many of them can be directly simulated by arrays for testing purposes.
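As an illustration, the Python sketch below tests a circular-shift module against a stand-in for the single module beneath it. Everything here is assumed for the example: the names `ArrayLineStore` and `CircularShifter`, their functions, and the behaviour attributed to the specification are hypothetical. The stand-in is little more than an array, and the check is written from the stated behaviour alone, as an independent checker would write it.

```python
class ArrayLineStore:
    """Test stand-in for a line storage module: a plain list of word lists."""

    def __init__(self, lines):
        self._lines = [list(words) for words in lines]

    def line_count(self):
        return len(self._lines)

    def word_count(self, line):
        return len(self._lines[line])

    def word(self, line, index):
        return self._lines[line][index]


class CircularShifter:
    """Module under test; it calls only the three functions of the store."""

    def __init__(self, store):
        self._store = store
        # Each shift is (line number, index of the word that comes first).
        self._shifts = [
            (line, first)
            for line in range(store.line_count())
            for first in range(store.word_count(line))
        ]

    def shift_count(self):
        return len(self._shifts)

    def shift_length(self, shift):
        line, _ = self._shifts[shift]
        return self._store.word_count(line)

    def shift_word(self, shift, index):
        line, first = self._shifts[shift]
        length = self._store.word_count(line)
        return self._store.word(line, (first + index) % length)


def shifts_as_text(shifter):
    """Checker's helper: read every shift through the public functions only."""
    return [
        " ".join(shifter.shift_word(s, i)
                 for i in range(shifter.shift_length(s)))
        for s in range(shifter.shift_count())
    ]


result = shifts_as_text(
    CircularShifter(ArrayLineStore([["a", "b", "c"], ["x", "y"]]))
)
print(result)
```

Only one module had to be simulated, and the simulation required no more than a list of lists.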

NON-CONCLUSIONS
The reader of this paper and the references might be led to some conclusions which those closer to the project would not draw. We mention them here to avoid misleading our readers.
1. The KWIC index structure given in [2] is the best known. FALSE! Our experiment showed us a number of faults in the design which we are now trying to remedy.

2. Writing a system in a higher level language such as FORTRAN helps to produce a better structured system. FALSE (or at least not (There were a few good programs but they were notable exceptions).

5. Communication between modules should always be by subroutine call as it was in the sample system. FALSE! If one divides a system into modules according to the criteria given in [2] the use of subroutine calls imposes a terrible overhead.
Two more non-conclusions
Several writers (e.g. Dennis [7]) have suggested that a hardware supported virtual memory and a language with the ability to pass complex data structures are necessary conditions for well structured or "modular" programs. Neither of these "necessary conditions" was met in the experimental system we are discussing.
We did not need the ability to pass data structures as parameters between modules (all parameters were integers) because of the way that our system was divided into modules. Data structures were always operated upon within a single module. We suggest that there is often a false identification of the modular structure seen at design time with characteristics of a program when it is running. This, however, is a very complex issue and we cannot discuss it further here.
Our programs were written in FORTRAN and could have run either with or without the virtual memory mechanism. This however is begging the question because we built a small system where overlays were not necessary.
Memory assignment could be done at compile time or assembly time and would be fixed while the program was running. It is definitely true that memory assignments are data which should not be shared between modules but should be hidden from all but one [8]. This allows (in fact requires) programs to be written for a virtual memory. However, the implementation of the one virtual memory module can be done in many ways (hardware mapping, run time software, or assembly time software). The choice between these implementations is determined by performance considerations, not by "modularity" considerations. Thus we can agree with the virtual memory recommendation only if it is stated more carefully, indicating that the necessary condition is that memory allocation considerations be hidden from all but one "module". As a historical note we might mention that one well-structured system, the T.H.E. operating system (which made heavy use of the virtual memory concept), was implemented without mapping hardware, using the run-time software option mentioned earlier.
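The point about hiding memory allocation in one module can be sketched in Python as follows. The class names `FixedStore` and `PagedStore`, the `read` and `write` functions, and the page size are hypothetical, chosen only for this illustration; they do not describe the experimental system or the T.H.E. system. The client routine sees integer addresses and two functions, so either implementation can be substituted without changing it.

```python
class FixedStore:
    """Storage assigned once, before use: a fixed-size array of cells."""

    def __init__(self, size):
        self._cells = [0] * size

    def read(self, address):
        return self._cells[address]

    def write(self, address, value):
        self._cells[address] = value


class PagedStore:
    """Storage mapped by run-time software: pages are created on first use."""

    PAGE_SIZE = 4  # arbitrary value chosen for the example

    def __init__(self):
        self._pages = {}  # page number -> list of cells

    def _locate(self, address):
        page, offset = divmod(address, self.PAGE_SIZE)
        cells = self._pages.setdefault(page, [0] * self.PAGE_SIZE)
        return cells, offset

    def read(self, address):
        cells, offset = self._locate(address)
        return cells[offset]

    def write(self, address, value):
        cells, offset = self._locate(address)
        cells[offset] = value


def sum_of_squares(store, n):
    """Client code: knows only read and write on integer addresses."""
    for i in range(n):
        store.write(i, i * i)
    return sum(store.read(i) for i in range(n))


fixed_total = sum_of_squares(FixedStore(10), 10)
paged_total = sum_of_squares(PagedStore(), 10)
print(fixed_total, paged_total)
```

Which of the two stores to use is then a performance decision made inside one module, as the text argues.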

Final Conclusions
We believe that the small scale experiment described above has provided us with some valuable insights into methods of software production.
We recognize the danger of applying small scale results to larger scale projects. We hope however that some organization with the facilities for carrying out larger scale projects will cautiously attempt to apply these results to larger scale projects so that we may refine them further.