ABSTRACTION and VERIFICATION in ALPHARD: Design and Verification of a Tree Handler

The design of the Alphard programming language has been strongly influenced by ideas from the areas of programming methodology and formal program verification. The interaction of these ideas and their influence on Alphard are described by developing a nontrivial example, a program for manipulating the parse tree of an arithmetic expression.

The major concerns of the Alphard research are the total cost of software development and the quality of the resulting programs. Problems that arise from repeated modifications to large programs, although often ignored in the literature, are of particular interest.
The Alphard language design has drawn heavily on previous work in both programming methodology and program verification.
From the former we learned that in order to understand the programs we write, we must find some way to make them less complex; this may be done by restricting both the form of the programs (through modularity and localization of information [Parnas72]) and the process through which we create them (through stepwise refinement [Dijkstra72, Wirth71]). From the latter we learned that a programmer needs a precise, correct description of what a program does in order to use it without having to understand its implementation in detail; we also found techniques for writing and proving such descriptions.
Our concern with modifiability implies that the things we do to reduce program complexity must remain visibly part of the program. Thus it is not sufficient to develop a program in a well-structured fashion; the structure that was imposed must be obvious in the resulting program. The concept of abstract data type has therefore become central. In Alphard the concept is realized through a language mechanism called a form. The form is derived from the Simula class [Dahl72] in much the same way as the CLU cluster [Liskov74], and has the property that a programmer may reveal the behavior of some data type to other users while concealing details of the implementation.
This explicit distinction between the abstract behavior of a data type and the concrete program which happens to implement that behavior provides an ideal setting in which to apply Hoare's techniques for proving data representations correct [Hoare72].
In the Alphard adaptation, we show (a) that the concrete representation is adequate to represent the abstract type, (b) that it is initialized properly, and (c) that each operator provided for the type both preserves the integrity of the representation and does what it is claimed to do (in terms of the abstract behavior and of the concrete procedure that happens to implement the operator). The specific formulas that must be proved are given below. This paper describes the language and verification methodology that have resulted from merging these ideas. A particular example is used to motivate the description.

Introduction

Example: Minimal-Register Evaluation Order
Suppose you are given an arithmetic expression represented as a binary parse tree and you are asked to output the nodes in postfix form with the subexpressions arranged in the order that minimizes the number of registers required for the expression evaluation. An algorithm for finding this order was given by Nakata; its description was refined by Johnsson [Nakata67, Johnsson75]. The algorithm has two steps:
1. Assign a weight $W_n$ to each node n of the tree such that if n is a leaf then $W_n = 0$; otherwise the immediate descendants of n have labels left and right, and $W_n = \min(\max(left + 1, right), \max(left, right + 1))$. $W_n$ is the number of registers needed to evaluate the tree with root n.
2. To evaluate the expression, begin at the root node and walk through the tree generating code so that at each node the operand requiring the larger number of registers is evaluated first. If the operands require the same number of registers, the left operand is evaluated first. If the right operand is evaluated first, include an indication of the reversal in the output stream.
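The two steps of the algorithm can be sketched in Python as follows. This is only an illustration and not the paper's Alphard program: the Node class, the list used in place of the output queue, and the "rev(...)" marker standing in for invertop are all assumptions made for the sketch, and every interior node is assumed to have exactly two sons.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; stands in for the paper's bnode."""
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    wt: int = 0

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def mark_weights(n: Node) -> int:
    """Step 1: store in n.wt the register requirement of the subtree at n."""
    if n.is_leaf():
        n.wt = 0
    else:
        l = mark_weights(n.left)
        r = mark_weights(n.right)
        n.wt = min(max(l + 1, r), max(l, r + 1))
    return n.wt


def invertop(op: str) -> str:
    """Assumed marker for a reversed operator; the paper leaves the form open."""
    return f"rev({op})"


def minreg_walk(n: Node, out: List[str]) -> None:
    """Step 2: emit postfix, evaluating the heavier operand first."""
    if n.is_leaf():
        out.append(n.data)
    elif n.right.wt > n.left.wt:
        # Right operand needs more registers: evaluate it first, flag reversal.
        minreg_walk(n.right, out)
        minreg_walk(n.left, out)
        out.append(invertop(n.data))
    else:
        # Left operand first when it is heavier or when the weights tie.
        minreg_walk(n.left, out)
        minreg_walk(n.right, out)
        out.append(n.data)


def minreg(root: Node) -> List[str]:
    mark_weights(root)
    out: List[str] = []
    minreg_walk(root, out)
    return out
```

For the expression a - (b * c), the right operand has weight 1 and the left has weight 0, so the sketch emits the product first and marks the subtraction as reversed.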
Assuming that suitable definitions for trees and an output stream exist, this is easily converted to a program. We will use a data abstraction called a btree as if it were a primitive data type.
It acts like a binary tree with an associated collection of node references called bnodes. There are at least enough operators on bnodes to obtain the left son, the right son, and the value field (nodeval) of any node and to determine whether a node is a leaf. The btree form given in Appendix A provides other operators, but they are not required for this example. For convenience, we restrict the size of btrees. We will use a queue to construct the output; a suitable definition is given in [Wulf76].
We first write, more precisely than the English algorithm above, an expression that describes the desired output for a parse tree E. This expression appears as the post condition (output assertion) of the procedure minreg that computes it. We let $W_{left}$ denote the weight of the left subtree, $W_{right}$ denote the weight of the right subtree, and invertop supply the operator that indicates subexpression reversal. The operator "~" denotes concatenation. The example is a problem for which a binary tree is a natural primitive data structure; the specifications and procedures for the solution assume the existence of an implementation of binary trees.
The Alphard form which defines those binary trees is developed in the third section.
The development of that form is essentially independent of the motivating example, so the resulting abstraction is useful for other applications as well. The program minreg operates on an arithmetic expression stored as a btree named E with a two-field record at each node and a known maximum height. The question marks on the btree parameters r and maxht indicate that those are implicit parameters; that is, they will automatically be available for any btree which is passed as input.
The record field names must, however, be exactly "data" and "wt". Minreg produces a queue named P from the tree E by first declaring a bnode variable, exptr, to point at nodes in E (exptr is automatically initialized to the root of E), then evaluating the register requirements of the subtrees with function markweights, and finally producing the queue with a special treewalk, minregwalk.
Note that P is automatically initialized to the empty queue when the output variable for the procedure is set up.
Using $M_k$ to denote the result of executing markweights on the tree with root k (e.g., and nodeval. It also refers to the wt field of the record stored as the value at each node. These operations are discussed in detail in the next section. The enq function appends its second argument to the queue named by the first argument (i.e., enq(Q,e) = Q append e). The queue was created (initially empty) in the toplevel procedure minreg for the purpose of collecting the output. The detailed proofs are standard and would contribute little to this exposition of Alphard.

Definition and Verification of a Form
Alphard's data abstraction mechanism is the form, a syntactic device for encapsulating a set of data declarations, function definitions, and other information about implementation details while revealing to the user only selected information about the behavior of the abstraction.
The verification shows that the implementation supports the behavior described in the specification. The programs in the previous section used "btree" and "bnode" in the same way that other languages use type names: we said that exp was a bnode and assumed that we could therefore perform certain operations on it. In this section we develop the form that defines btrees and bnodes. The definition includes not only the functions actually used by the procedures above, but also enough others to round out the form as a useful abstraction. For example, the form defines functions that might be used to construct the parse tree that minreg manipulates.
A form contains three major components. These are the specifications, which provide information to the user about the abstract behavior of the objects being defined, the representation, which defines the concrete data structures used to maintain the objects and which states certain of their properties, and the implementation, which contains the bodies of the operators. Thus the skeleton of the btree form is:
    form btree(N: record, maxht: integer) =
      beginform
        specifications ...
        representation ...
        implementation ...
      endform
where ellipses are used to denote text which will be filled in later. This form actually describes a variety of specific trees: both the maximum height of the btree, maxht, and the record to be stored at each node, N, are parameters to the instantiation of the form. Note that bnodes have also been treated as "types". One of the components of the btree form is the definition of bnode, which is a form in its own right. We will examine each of the components in turn; the fragments discussed here are assembled as a complete form definition in Appendix A.
In the next section we define, implement, and verify btrees and their associated bnodes, showing how the information needed to understand their behavior is kept separate from the information about their implementation.
of graphs is given in Appendix

Specifications of btree
The requires simply says that only nonnegative values of maxht (the maximum height of the tree) make sense. The let declares that a btree may be regarded as a distinguished root and a graph, and that graph concepts will be used to explain them. Since a graph consists of a pair of sets, the let goes on to describe these sets in terms of booleans and the record type passed as an instantiation parameter. The invariant states certain relations on the graph which must always hold of a btree; the comments (!...) give the intuitive interpretation of each phrase. Initially states that when a btree is originally instantiated, it is empty except for the root. For each function, the specifications give the function name, its input parameters, its result (if any), and the abstract pre and post conditions needed for verifying the function and describing its inputs and outputs. The invariant will always be implicitly anded with these explicit clauses to give the actual pre and post conditions. The functions root and height are applicable to any btree (i.e., any one for which the invariant holds), so the constant true as an explicit pre condition is omitted.
Finally, the btree specifications give the abstract description of the sub-form bnode.
The latter form's organization is similar to btree's, except that the specifications of bnode have been printed with those of btree in order to localize the information that will be presented to a user. A name declared as a selector may be used both to set and to fetch values. Note that a bnode is always associated with a particular btree.

Representation of btree
The representation part shows how btrees are actually stored in terms of other data structures. The unique declaration states that each btree will consist of a vector of records (node value and "inuse" bit) indexed from 1 to $2^{maxht+1} - 1$. Alphard's scope rules prevent the vector and the record field names from being used outside the form. The init clause of the declaration gives the initialization code to be executed when that vector is allocated. It sets all inuse bits to false, then sets the record at the root to (null,true). The unique declaration states that each instance of a btree will get its own vector. In the representation chosen for this version of btree, all nodes are stored in a vector and the j-th node's sons are found at positions 2j and 2j + 1. The inuse bit distinguishes whether potential tree positions are actually included in the tree; a separate bit was set aside for this purpose because the node can be an arbitrary record and, as a result, there is no way to encode "nonexistence" in the node value itself. Note that this is the first time a specific implementation strategy has been mentioned: up to this point a linked-list strategy should have seemed equally plausible.
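The vector layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Alphard form: the class name VectorBTree and the add_son operator are hypothetical, and slot 0 is unused padding so that indices run from 1 as in the text.

```python
from typing import Any, List, Tuple


class VectorBTree:
    """Slot j holds (value, inuse); the sons of slot j are at 2j and 2j + 1."""

    def __init__(self, maxht: int) -> None:
        if maxht < 0:
            raise ValueError("maxht must be nonnegative")
        self.maxht = maxht
        self.size = 2 ** (maxht + 1) - 1
        # Initialization: every inuse bit false, then the root set to (null, true).
        self.slots: List[Tuple[Any, bool]] = [(None, False)] * (self.size + 1)
        self.slots[1] = (None, True)

    def inuse(self, j: int) -> bool:
        return 1 <= j <= self.size and self.slots[j][1]

    def left(self, j: int) -> int:
        return 2 * j

    def right(self, j: int) -> int:
        return 2 * j + 1

    def is_leaf(self, j: int) -> bool:
        return not self.inuse(self.left(j)) and not self.inuse(self.right(j))

    def add_son(self, j: int, side: str, value: Any) -> int:
        """Hypothetical operator: attach a son to node j and return its index."""
        if not self.inuse(j):
            raise ValueError("parent slot is not part of the tree")
        k = self.left(j) if side == "left" else self.right(j)
        if k > self.size:
            raise IndexError("tree would exceed maxht")
        self.slots[k] = (value, True)
        return k
```

Because the node value may be any record, the sketch keeps the inuse bit as a separate tuple component rather than reserving a special value, which mirrors the reasoning given in the text.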

Verification Considerations
We turn now to the question of how we decide whether a form will actually behave as promised by its abstract specifications; that is, what properties of a form must be verified if we wish to use its instantiations with confidence. The methodology depends on explicitly separating the description of how an object behaves from the code that manipulates the representation in order to achieve that behavior.

4b. $I_c(T) \land pre(rep(T')) \land out(T) \supset post(rep(T))$
Step 1 shows that any legal state of the concrete representation has a corresponding abstract object (the converse is deducible from the other steps).
Step 2 shows that the initial state created by the representation section is legal.
Step 3 is the standard verification formula for the concrete operation as a simple program; note that it enforces the preservation of $I_c$.

We will use $I_a(rep(T))$ to denote the abstract invariant of an object whose concrete representation is T, $I_c(T)$ to denote the corresponding concrete invariant, italics to refer to code segments, and the names of specification clauses and assertions to refer to those formulas. In step 4b, "pre(rep(T'))" refers to the value of T before execution of the function.
A complete development of the form verification methodology appears in [Wulf76].
- in the pre and post conditions for each function, which describe the effect the function has on a graph which satisfies the invariant.
The form contains a parallel set of descriptions of the concrete object and how it behaves. Since btrees are implemented in terms of a vector of records, the concrete specifications give restrictions and effects on that vector. In many cases this makes the effect of a function much easier to specify and verify than would the abstract description alone. Now, although it is useful to distinguish between the behavior we want and the data structures we operate on, we also need to show a relationship that holds between the two. This is achieved with the representation function rep(T), which gives a mapping from a vector of records to a graph and its root. The purpose of a form verification is to ensure that the two invariants and the rep(T) relation between them are preserved.
In order to verify a form we must therefore prove four things. Two relate to the representation itself and two must be shown for each function. Informally, the four required steps are:
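The relationship between the concrete vector, the representation function, and the two invariants can be illustrated with a small Python sketch. The exact invariants of the btree form are not reproduced in this section, so the ones below are assumptions chosen for illustration: the concrete invariant requires the root to be in use and every other in-use slot to have an in-use parent, and the abstract invariant requires a tree of bounded height with at most two sons per node. Nodes of the abstract graph are named by their vector index, which is also an assumption.

```python
from typing import Any, Dict, List, Set, Tuple

Slot = Tuple[Any, bool]                      # (node value, inuse bit)
Graph = Tuple[Dict[int, Any], Set[Tuple[int, int]]]


def concrete_invariant(T: List[Slot]) -> bool:
    """Assumed I_c: root in use; every other in-use slot has an in-use parent."""
    n = len(T) - 1                           # slot 0 is padding
    if n < 1 or not T[1][1]:
        return False
    return all(T[j // 2][1] for j in range(2, n + 1) if T[j][1])


def rep(T: List[Slot]) -> Tuple[int, Graph]:
    """Map the vector to (root, (nodes, edges)) of the abstract graph."""
    n = len(T) - 1
    nodes = {j: T[j][0] for j in range(1, n + 1) if T[j][1]}
    edges = {(j // 2, j) for j in nodes if j > 1}
    return 1, (nodes, edges)


def abstract_invariant(root: int, graph: Graph, maxht: int) -> bool:
    """Assumed I_a: a tree rooted at root, <= 2 sons per node, height <= maxht."""
    nodes, edges = graph
    if root not in nodes:
        return False
    if any(p not in nodes or c not in nodes for p, c in edges):
        return False
    parent: Dict[int, int] = {}
    sons: Dict[int, int] = {}
    for p, c in edges:
        if c == root or c in parent:         # root has no parent; others have one
            return False
        parent[c] = p
        sons[p] = sons.get(p, 0) + 1
    if set(parent) != set(nodes) - {root}:
        return False
    if any(count > 2 for count in sons.values()):
        return False
    for v in nodes:                          # every node reaches the root in time
        steps = 0
        while v != root:
            v = parent[v]
            steps += 1
            if steps > maxht:
                return False
    return True
```

Checking that concrete_invariant(T) implies abstract_invariant(*rep(T), maxht) on sample vectors is only a test, not a proof, but it shows the shape of the obligation that any legal concrete state must correspond to a legal abstract object.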

Verification of form properties of btree
and 2, which show the overall va proof of these steps. Proof: Take the clauses of the conclusion one by one:  For btree, several of these steps will be simplified by appealing to the following standard construction, which determines the correspondence between an index in the vector representation and a path from the root to a node in the abstract graph.
Let T[j] be the vector element which represents some node in a btree.
Then the (abstract) path from the root to that node is the path whose elements are $T[\lfloor j/2^k \rfloor]$ for $k = \lfloor \log_2 j \rfloor, \ldots, 1, 0$; that is, T[1], ..., T[j div 4], T[j div 2], T[j].
These steps demonstrate that any vector T which satisfies $I_c$ represents a legal limited-height btree and that the initial value of a newly-instantiated btree is initialized properly. We will show below that each function preserves the accuracy of the representation, but the adequacy of that representation is established here.
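The correspondence between a vector index and a path from the root follows directly from the rule that the sons of slot j are at 2j and 2j + 1. A minimal Python sketch, with function names chosen only for illustration:

```python
from typing import List


def path_from_root(j: int) -> List[int]:
    """Indices visited on the way from the root (slot 1) down to slot j."""
    if j < 1:
        raise ValueError("vector indices start at 1")
    path = []
    while j >= 1:
        path.append(j)
        j //= 2                  # the parent of slot j is slot j div 2
    path.reverse()
    return path


def directions(j: int) -> str:
    """Turns taken from the root: the binary digits of j after the leading 1."""
    return "".join("L" if bit == "0" else "R" for bit in bin(j)[3:])


def depth(j: int) -> int:
    """Number of edges between the root and slot j."""
    return len(path_from_root(j)) - 1
```

Since the largest index is $2^{maxht+1} - 1$, no slot lies more than maxht edges below the root, which is why a vector of that size can only represent trees of limited height.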

Implementation of btree
The implementation part gives the bodies of the two functions and the bnode form.
Thus we have shown that the representation supports the abstraction. We will next discuss and verify some of the functions used by the programs of the previous section. Other functions are given in the form definition in Appendix A. Note that the invariants of btree (as well as those of bnode) must be preserved. This step is omitted from the proofs given here because no part of the btree representation is altered. Changing this particular field has no effect on any invariant, so nothing must be proved.

Conclusion
This paper has used a concrete example to explain the Alphard philosophy on the development and verification of programs. The example was nontrivial; it implemented the abstraction with a nonstandard representation, and it involved a subtype. Several aspects of the development deserve special notice. First, note that we did not verify the "main program". The program was simply a restatement of an algorithm that had undergone considerable analysis in another formulation.
It would have been unreasonable to redo that analysis in the course of verifying the program.
We therefore indicated that it was sufficient to ensure that the program was an accurate restatement of the algorithm. If program verification is ever to impact real programs, we must take such steps to avoid reproving all programs from first principles. Since the form encapsulates a collection of related information about how some abstract behavior is to be achieved, it is a reasonable body of information about which to prove theorems. This is evidenced by the nearly complete independence of the discussions of the minreg program and the btree form.
Next, the form presented in Appendix A contains functions not actually used by the program of the example. We believe that in the future libraries of forms will develop, and that these will be more useful than present libraries because the forms are verified.
Finally, some of our colleagues have expressed concern over the length of Alphard programs. Certainly the verification information adds text, but we believe that this information must be supplied somewhere. Nakata gave an Algol program for converting a parse tree to code [Nakata67]. That program performs a slightly different operation from minreg, so an exact comparison is impossible, but if we ignore verification information and the btree functions that were never used, the number of lexemes in the Alphard procedures and forms is within 10% of the number of lexemes in Nakata's program. This crude comparison supports our feeling that the program text itself is not excessively large.