Formal Program Verification Using Symbolic Execution

Symbolic execution provides a mechanism for formally proving programs correct. A notation is introduced which allows a concise presentation of rules of inference based on symbolic execution. Using this notation, rules of inference are developed to handle a number of language features, including loops and procedures with multiple exits. An attribute grammar is used to formally describe symbolic expression evaluation, and the treatment of function calls with side effects is shown to be straightforward. Because symbolic execution is related to program interpretation, it is an easy-to-comprehend, yet powerful technique. The rules of inference are useful in expressing the semantics of a language and form the basis of a mechanical verification condition generator.

I. INTRODUCTION
An accepted way of proving things about programs is to use rules of inference like those introduced by Hoare [3]. Such rules are usually formulated so that they can be applied to the last statement of a program, yielding one or more shorter programs and possibly some formulas in logic to be verified. By iteratively applying rules of inference, the task of proving program correctness is reduced to that of proving statements in predicate calculus. Another proof technique is based on the notion of symbolic execution [2]; statements are processed in the same order that an interpreter would execute them, as opposed to the "backward" order of the first method. Both techniques appear to be equally powerful and logically equivalent. However, the close analogy between symbolic execution and interpretive execution seems to make the symbolic execution method easier to comprehend. In particular, symbolic execution is a natural paradigm for expressions which can contain function calls with side effects.
One of our goals is to develop rules of inference which formalize symbolic execution. To our knowledge, such a formalism has not appeared in the literature to date. Our second goal is to present rules of inference for some ad- [...] [10] are presented, as are rules for procedures with multiple labeled exits.
Finally, we will introduce an extended notation which allows the formal treatment of expressions with side effects. Side effects are strictly disallowed in verification-oriented languages such as Pascal [9] and Euclid [5], but it will be shown that this restriction can be relaxed with little difficulty.
The ideas in this paper grew out of a project to design and implement a mechanical verification condition generator [1]. Early in the project it became clear that we needed a concise formal notation for stating verification rules for the constructs found in contemporary programming languages. We found the notation presented in this paper to be a good solution to this problem.
After an introduction to symbolic execution, notation and rules for a simple language are described. The rules are then extended to handle multiple-exit loops, and procedures with multiple labeled exits. Finally, a formal treatment of expressions with side effects is presented.

II. CONCEPTS OF SYMBOLIC EXECUTION
In symbolic execution the values of program variables are represented by symbolic constants or expressions. For example, the value of variable v might be represented by "B + 3," where B is a symbolic constant (not a program variable). The collection of all variables and their values is called a state. As a program path is "executed," assumptions about the state are recorded in the path condition.
For example, suppose we wish to verify the following program, which sets x to its absolute value:
    pre x=y;
    if x < 0 then x := -x endi;
    post x >= 0 & (y=x or y=-x)
The first and last lines contain input and output specifications called pre- and postconditions. The keyword endi is the closing bracket for if. In the initial state, variables x and y have arbitrary constants X and Y as their values. For now, we will write the state as follows:
    State = {x is X, y is Y}.
Next we evaluate the precondition by replacing each variable by its value in the current state, obtaining X=Y. Note that X and Y are values while x and y are variables. We assume the precondition is true when the program begins by setting the path condition to X=Y:
    State = {x is X, y is Y}, PathCond = (X=Y).
The path taken through the program depends on the value of the expression x<0. We symbolically evaluate x<0 to obtain X<0, and first assume that it is true by "anding" it to the path condition, obtaining
    State = {x is X, y is Y}, PathCond = (X=Y & X<0).
The statement "x := -x" is executed by changing the state so that the value of x is -X:
    State = {x is -X, y is Y}, PathCond = (X=Y & X<0).
Now we want to show the postcondition is true. Substituting values from the current state into the postcondition yields
    -X>=0 & (Y=-X or Y=--X).
Since this is implied by the path condition, this path is verified.
Following the other path (X<0 is false) gives rise to the path condition
    PathCond = (X=Y & ¬(X<0))
where "¬" is the logical negation operator. The postcondition becomes
    X>=0 & (Y=X or Y=-X).
Again, this follows from the path condition, so this path is verified. The program is correct because all paths have been verified.
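The walk-through above can be mimicked in a few lines of Python. The following is only a minimal sketch under stated assumptions: symbolic values are encoded as functions of the symbolic constants X and Y, and the implication "path condition implies postcondition" is checked by brute force over a small range of integers, which is a bounded sanity check rather than a proof. All names (paths, path_verified, and so on) are hypothetical and do not come from the paper.

```python
from itertools import product

# Assumption: a symbolic value is encoded as a function of the constants X, Y.

def implies(p, q):
    return (not p) or q

def post(state, X, Y):
    """Postcondition x >= 0 & (y = x or y = -x), instantiated by the state."""
    x, y = state["x"](X, Y), state["y"](X, Y)
    return x >= 0 and (y == x or y == -x)

def paths():
    """Yield (state, path_condition) for each path through the program."""
    init = {"x": lambda X, Y: X, "y": lambda X, Y: Y}
    pre = lambda X, Y: X == Y
    # Path 1: assume x < 0, then execute x := -x.
    then_state = dict(init, x=lambda X, Y: -X)
    yield then_state, (lambda X, Y: pre(X, Y) and X < 0)
    # Path 2: assume not (x < 0); the state is unchanged.
    yield init, (lambda X, Y: pre(X, Y) and not (X < 0))

def path_verified(state, pc, values=range(-5, 6)):
    """Bounded check that the path condition implies the postcondition."""
    return all(implies(pc(X, Y), post(state, X, Y))
               for X, Y in product(values, repeat=2))

results = [path_verified(state, pc) for state, pc in paths()]
print(results)
```

In a real verifier the bounded check would be replaced by a theorem prover; the point here is only the bookkeeping of state and path condition along each path.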

III. NOTATION AND TERMINOLOGY
To formalize this approach, some notation is introduced. A substitution S is denoted by
    S = [t1/x1, t2/x2, ..., tn/xn],
where the term ti is to be substituted for variable xi, and the x's are all distinct. An instance of an expression is obtained when a substitution is applied to the expression. This operation is indicated by "|", e.g., B | [e/x] indicates the instance of B in which each occurrence of x is replaced by e.
Substitutions can be composed to form a new substitution. For any expression b, and substitutions R and S, the instance b | (RS) obtained by applying the composition of R and S to b is identical to the instance (b | R) | S obtained by left-to-right application of R and S.
The identity substitution is [], i.e., b | [] = b. The application operator has higher precedence than logical connectives, e.g., in the formula P & Q | S, S is only applied to Q. If Q stands for a statement with many components, S is applied to each of them.
Substitutions are used to formalize the state concept in symbolic execution. The value of program variable x in state S (which is a substitution) is x | S.
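The following sketch illustrates substitution application and composition as defined above. It rests on assumptions that are not in the paper: expressions are encoded as nested tuples (operator, operand, ...), variables and symbolic constants as strings, and substitutions as Python dictionaries. The function names apply and compose are hypothetical.

```python
def apply(expr, sub):
    """Return the instance expr | sub, replacing all variables simultaneously."""
    if isinstance(expr, str):                 # variable or symbolic constant
        return sub.get(expr, expr)
    if isinstance(expr, tuple):               # (operator, operand, ...)
        op, *args = expr
        return (op, *(apply(a, sub) for a in args))
    return expr                               # numeric literal

def compose(r, s):
    """Return the composition RS, so that b | (RS) equals (b | R) | S."""
    rs = {x: apply(t, s) for x, t in r.items()}   # S is applied to R's terms
    for x, t in s.items():
        rs.setdefault(x, t)                       # variables R leaves alone
    return rs

# State before "x := -x": x has value X, y has value Y.
S = {"x": "X", "y": "Y"}
# The assignment is modeled as composing [-x/x] with the current state.
R = {"x": ("-", "x")}
new_state = compose(R, S)

b = ("&", (">=", "x", 0), ("=", "y", "x"))
print(new_state)
print(apply(b, new_state) == apply(apply(b, R), S))
```

Under this encoding the value of a program variable in a state is simply apply("x", state), and the empty dictionary plays the role of the identity substitution.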
To formalize symbolic execution, we will use formulas of the form
    S, PC \ A
to indicate the correctness of statement list A, given an initial path condition PC and state S. A is a statement sequence, the last of which is a statement of the form confirm Q. This specifies that Q must be true of the final state. Hence, the formula [], P \ A; confirm Q corresponds to P{A}Q in the more conventional notation found in the literature.
Rules of inference have the following form:
    A
    B
    -----
    C
This means that, given A and B, we can infer C. In practice, the rules are used "backwards": the rule is used to reduce the problem of proving C to that of proving both A and B.
To be more precise about the concept of correctness, we say that a procedure is partially correct if its postcondition is true whenever the procedure terminates (exits), given that the precondition was initially satisfied. A procedure is totally correct if it is partially correct and it always terminates. This discussion will not consider termination proofs, and the term "correct" is taken to mean "partially correct."

IV. [...]
The keyword endi is the closing bracket for if. The conditional expression B is evaluated by applying it to the state S.
There are two possibilities: B | S or ¬B | S. The proof is by cases. In the first case, B | S is assumed and the statement sequence A2 is executed before A1. In the second case, ¬B | S is assumed and A3 is executed before A1. These cases result in the first two lines of the rule. If both cases can be verified, the if statement followed by A1 must be correct.
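The prose description of the two cases fixes the shape of the conditional rule. The display below is a reconstruction from that description only, not a copy of the authors' rule; in particular, the concrete statement syntax "if B then A2 else A3 endi" is an assumption.

```latex
\[
\frac{
  \begin{array}{l}
    S,\; PC \mathbin{\&} B|S \;\backslash\; A_2;\, A_1 \\
    S,\; PC \mathbin{\&} \lnot B|S \;\backslash\; A_3;\, A_1
  \end{array}
}{
  S,\; PC \;\backslash\; \mathbf{if}\ B\ \mathbf{then}\ A_2\ \mathbf{else}\ A_3\ \mathbf{endi};\, A_1
}
\]
```

Read "backwards," the rule reduces a formula whose first statement is a conditional to two formulas, one per branch, each carrying the corresponding assumption in its path condition.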

C. Confirm Statement
    PC → Q | S
    -----------------
    S, PC \ confirm Q
The confirm statement asserts that Q must be true for the program to be correct. A confirm statement is verified by proving that Q is implied by the path condition when the variables in Q are replaced by the values in the current state.
[...] The keyword endw is the closing bracket for while. The iteration statement is a "while loop" that contains a loop invariant. The first line of this rule states that the loop invariant I must be true when the loop statement is encountered. The next two lines consider the cases B and ¬B. If B is true, the loop body is executed and we want to show that I is maintained as a loop invariant, as stated by the second line. The state is replaced by the identity substitution because nothing is known about the state (except on the first iteration) due to assignments in A2. Assuming B is not true, the loop terminates and A1 is executed. After an arbitrary number of iterations, the only other assumption we can make is I, the loop invariant. This gives the third line of the rule.
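The rules of this section can be read as a recursive procedure that turns a formula S, PC \ A into a list of verification conditions. The sketch below is one possible reading under explicit assumptions: expressions are nested tuples, substitutions are dictionaries, the assignment rule is taken to be "replace S by [e/x]S" (as in the example of Section II), and the tuple encodings of statements are invented for illustration. It is not the authors' generator [1].

```python
def apply(e, s):
    """Instance e | s: substitute simultaneously for every variable in e."""
    if isinstance(e, str):
        return s.get(e, e)
    if isinstance(e, tuple):                      # (operator, operand, ...)
        return (e[0],) + tuple(apply(a, s) for a in e[1:])
    return e                                      # numeric literal

def compose(r, s):
    """Composition RS, so that e | (RS) == (e | R) | S."""
    rs = {x: apply(t, s) for x, t in r.items()}
    rs.update({x: t for x, t in s.items() if x not in rs})
    return rs

def vcs(state, pc, stmts):
    """List the verification conditions for statement list stmts."""
    if not stmts:
        return []
    head, rest = stmts[0], stmts[1:]
    kind = head[0]
    if kind == "assign":                          # assumed rule: S becomes [e/x]S
        _, x, e = head
        return vcs(compose({x: e}, state), pc, rest)
    if kind == "if":                              # proof by cases on B | S
        _, b, then_part, else_part = head
        cond = apply(b, state)
        return (vcs(state, ("&", pc, cond), then_part + rest)
                + vcs(state, ("&", pc, ("not", cond)), else_part + rest))
    if kind == "confirm":                         # prove PC -> Q | S
        return [("->", pc, apply(head[1], state))]
    if kind == "while":                           # three lines of the loop rule
        _, b, inv, body = head
        return (vcs(state, pc, [("confirm", inv)])
                + vcs({}, ("&", inv, b), body + [("confirm", inv)])
                + vcs({}, ("&", inv, ("not", b)), rest))
    raise ValueError(kind)

# The absolute-value program of Section II.
post = ("&", (">=", "x", 0), ("or", ("=", "y", "x"), ("=", "y", ("-", "x"))))
program = [("if", ("<", "x", 0), [("assign", "x", ("-", "x"))], []),
           ("confirm", post)]
abs_vcs = vcs({"x": "X", "y": "Y"}, ("=", "X", "Y"), program)
for vc in abs_vcs:
    print(vc)
```

Each element of the result is an implication in predicate calculus; discharging it is left to a separate prover, which this sketch does not attempt.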

E. Procedure Call Statement
Consider the following procedure declaration:
    procedure p (var x, const y);
      use var g;
      use const c;
      pre P;
      post Q;
      A
    end p;
Procedure p has a variable (value-result) parameter x, a constant (value) parameter y, a global variable g which is altered by p, and a global c which is not altered by p. The body of p is the statement sequence A. The postcondition Q may refer to the initial and final values of x and g; their initial values are #x and #g while their final values are x and g, respectively.
We assume that no two distinct variables in the program have the same name, e.g., if a procedure uses "x" to name a local variable, then no other procedure can declare a variable or parameter by the same name. If this assumption does not hold, variables are simply renamed to produce an equivalent program which is suitable. This eliminates the ambiguities normally resolved through scope rules, since each identifier refers to a unique variable.
The rule of inference for a call to this procedure is [...] where S1 = [v'/x, e/y, g'/g, v/#x, g/#g]S and v', g' are new unique constants.
The first line of the rule indicates that the procedure precondition must be true in the current state. The substitution [v/x, e/y] is used to replace the formal parameters by the actual parameters. Composing the substitution with S evaluates the actual parameters in the current state.
The second line reflects the calling program after the execution [...] The global variable g will have the new value g', and the variable parameter v acquires the new value v'. The order of substitution for v and g is important because they might be the same variable. This places a call by value-result interpretation on parameters, because v' may be "copied" onto g'.
The new path condition is PC & Q | S1. Let us look at the substitution S1 applied to the postcondition Q. Unique constants v' and g' are substituted for the formal variable parameter x and altered global g. The constant parameter e (which may be an expression) is substituted for its formal parameter y. The initial value of the actual variable parameter v is substituted for #x, and g is substituted for #g.
For example, consider the following procedure, derived from our previous program which computes absolute values: [...] where B is some path condition. The rule for assignment statements is applied, yielding [...] Now we use the rule for procedure calls. The first line of the rule says to prove [...] The second line of the procedure call rule yields [...] The confirm rule indicates we must now prove [...] The consequent is simplified by applying the substitution, yielding
    B & (v'>=0 & (v'=-2 or v'=--2)) → v'=2
which is true. Therefore, the path, and hence the calling program, is verified.

V. MULTIPLE-EXIT LOOPS
In this section, several new rules are introduced to allow an equivalent of Zahn's loop construction [10]. The [...]
    ... Ln: Bn enda ends;
The phrase altering v indicates that the variable v can be modified by the loop body, which consists of statement list A1, a maintain statement, and statement list A2. Either A1 or A2 may be empty. One and only one maintain statement is included in the loop body to provide the loop invariant.
Each loop statement must be followed by a select statement which contains labeled statement lists called alternatives. Within A1 and A2, there may be statements of the form exit Li, which cause immediate transfer of control to the alternative labeled Li. Control is only transferred to within the select statement following the immediately enclosing loop; however, to effect multiple-level exits, alternatives may contain exit statements themselves. With this loop statement, there are two cases to be proven.
In the first case (line 1), the loop body is executed up to the maintain statement by executing A1. The confirm statement indicates that the loop invariant must be true at this point.
The reason for the end; A3 after the confirm statement is that A1 may contain an exit (discussed in the next section) which transfers control to A3; the keyword end marks the end of the loop. A3 contains the select and subsequent statements. In the second case (line 2), the loop invariant is assumed, with unique constants as values for altered variables. The loop body is then executed, starting [...] invariant must be confirmed. By induction, it can be shown that the assertion I in the maintain statement will always be true if the two cases hold.
A further explanation of the altering clause is in order. In the literature (and also in Section IV-D), the iteration rule is usually based on the assumption that the loop body can alter every variable in the state. Therefore, the state is discarded by using the identity substitution, and the only information carried out of the loop is contained in the invariant I and the exit condition (i.e., ¬B in Section IV-D). This makes it necessary to use invariants that state details about variables not altered by the loop, because the loop invariant must be strong enough to eventually allow the proof of the final confirm statement. To simplify the task of finding sufficiently strong invariants, an altering clause is added to the loop statement to specify which variables might be modified by the body of the loop. Now, most of the state will usually be unchanged by the loop, and the loop invariant only needs to specify how the loop affects the variables that it alters. The concept of the altering clause is due to Ogden [8].
[...] where A1 does not contain the statement end.
This rule simply states that an exit statement causes control to transfer to the statement following the immediately enclosing loop. Rules for the branch statement are given below. Recall that the first statement in A2 is a select statement because it follows a loop statement. [...] where L1 and L2 are distinct identifiers.
These rules describe the action of transferring control to a labeled alternative. Note that control passes to the end of the select statement after the alternative statement list is executed.
These rules assume that the branch label must be listed in the select statement.
For example, we can apply the exit rule to
    S, PC \ exit TWO; ... ; end; [...]
The second branch rule applies, yielding
    S, PC \ branch TWO; select TWO: p(x) enda; THREE: ... enda ends; ...
Now the first branch rule is used to obtain
    S, PC \ p(x); ...
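The exit and branch rules only rearrange the statement list; the state and path condition are untouched. The sketch below is a reconstruction from the prose descriptions above, under assumptions: statements are encoded as tuples, a select statement carries a list of (label, statement list) alternatives, and the alternative labeled ONE is invented to show the second branch rule at work. Function names are hypothetical.

```python
def exit_rule(stmts):
    """exit L; A1; end; A2  ==>  branch L; A2   (A1 contains no 'end')."""
    (kind, label), rest = stmts[0], stmts[1:]
    assert kind == "exit"
    i = rest.index(("end",))          # end of the immediately enclosing loop
    return [("branch", label)] + rest[i + 1:]

def branch_rule(stmts):
    """Apply whichever branch rule matches the first alternative."""
    (kind, label), (sel, alts), rest = stmts[0], stmts[1], stmts[2:]
    assert kind == "branch" and sel == "select"
    (first_label, body), others = alts[0], alts[1:]
    if first_label == label:
        # First rule: execute the alternative, then continue after the select.
        return body + rest
    # Second rule: labels differ, so discard the first alternative.
    return [("branch", label), ("select", others)] + rest

stmts = [("exit", "TWO"), ("call", "q"), ("end",),
         ("select", [("ONE", [("call", "r")]),
                     ("TWO", [("call", "p", "x")]),
                     ("THREE", [])]),
         ("confirm", "Q")]
step1 = exit_rule(stmts)      # branch TWO; select ONE: ... ends; ...
step2 = branch_rule(step1)    # second branch rule drops alternative ONE
step3 = branch_rule(step2)    # first branch rule selects p(x)
print(step3)
```

Because a branch label is assumed to be listed in the select statement, the sketch does not handle the case where the alternatives run out.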

VI. MULTIPLE-EXIT PROCEDURES
The general form of the procedures dealt with in this section is given in Fig. 1. This form is like that of Section IV-E except that two labeled postconditions are provided to allow several control paths to emerge from a procedure call. Statements in the body of procedure p are denoted by A. To simplify the notation in this section we restrict procedures to one variable (value-result) parameter, one constant (value) parameter, two global variables, only one of which can be modified, and two exit labels. The generalization to an arbitrary number of these constructs is straightforward.
Each procedure call is followed by a select statement with labeled alternatives corresponding to the labels in the postcondition. An example of such a call is
    p(v, 10); select L1: . . . enda; L2: . . . enda ends;
Within p, statements of the form exit L1 and exit L2 are allowed. When an exit is executed, control is transferred back to the calling procedure. The alternative corresponding to the exit label is executed next, as was the case with the loops in Section V.

A. Procedure Declaration
The procedure declaration in Fig. 1 [...] Each procedure declaration in a program is verified independently in a similar manner. The substitution is used to save the initial values of x and g in #x and #g so that the postconditions (and loop invariants) in the body of p can refer to them. Notice [...]
This rule assumes that p has been declared as in Fig. 1. [...] call on a multiple-exit procedure must be followed by a select statement. The new procedure-call rule is like the first (see Section IV-E) except that, for each label, a different postcondition is assumed and a different path is taken. The branch statements direct program flow to the proper alternative. The substitution S1 is identical to S1 in Section IV-E.
As an example, consider the following procedure which dec-

VII. EXPRESSIONS AND FUNCTIONS
Symbolic expression evaluation, without side effects and function calls, is easily carried out by instantiating expressions with substitutions as in the above inference rules. To incorporate functions with side effects, an attribute grammar [4] is used to formally describe expression evaluation. The treatment of Pascal-like structured variables is also described.

A. Notation
All operations are written in functional notation, e.g., A + B is written +(A, B). This removes concern over operator precedence. Structure references are denoted by acc(A, Slist), where A is a structure and Slist is a list of selectors, i.e., array indexes and record field names (pointers will not be considered here). The function "acc" is called the access function. The expression ch(A, Slist, t) is a structure whose value is identical to that of A except at the element named by Slist, where the value is t. The function "ch" is the change function. The change and access notation is due to McCarthy and Painter [7]. The concept was extended by Luckham and Suzuki [6].
[...] Recall that in performing the composition, S is applied to u. The modify function changes the value of x to v without applying S. This function will be used in the rules that follow. We further define the notation M(S, acc(A, Slist), v) to mean M(S, A, ch(A, Slist, v)). This generalization provides a mechanism whereby values can be substituted for structure elements rather than just simple variables. This definition is used in the rules for assignments, and procedure and function calls.
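The access and change functions described above have a direct functional reading. The sketch below assumes, purely for illustration, that structures are nested Python dictionaries (arrays keyed by index, records keyed by field name) and that a selector list is a Python list; neither encoding comes from the paper.

```python
def acc(a, slist):
    """Access function: the element of structure a named by selector list slist."""
    for selector in slist:
        a = a[selector]
    return a

def ch(a, slist, t):
    """Change function: a structure identical to a except at slist, where it is t."""
    if not slist:
        return t
    head, tail = slist[0], slist[1:]
    updated = dict(a)                      # copy; the original is left unchanged
    updated[head] = ch(a[head], tail, t)
    return updated

A = {"pts": {0: {"x": 1, "y": 2}, 1: {"x": 3, "y": 4}}}
B = ch(A, ["pts", 0, "x"], 9)
print(acc(B, ["pts", 0, "x"]), acc(B, ["pts", 0, "y"]), acc(A, ["pts", 0, "x"]))
```

Because ch returns a new structure instead of mutating its argument, assigning to a structure element can be expressed as assigning a whole new value to the structure variable, which is the idea behind rewriting M(S, acc(A, Slist), v) as M(S, A, ch(A, Slist, v)).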
All nonterminals in the attribute grammar have eight attributes:
    Ci  an inherited verification condition list
    Cs  a synthesized verification condition list
    Ss  a synthesized state
    Si  an inherited state
    Ps  a synthesized path condition
    Pi  an inherited path condition
    V   a synthesized value
    L   a synthesized location.
Attributes are further qualified by nonterminals in the production rule, e.g., Pi(expr) is the inherited path condition attribute for the nonterminal "expr." If the same nonterminal occurs more than once in a production rule, integers are appended to distinguish the occurrences, e.g., "expr1" and "expr2." The [...] The location is x, the value is 10, the new state and path condition are unchanged, and there are no verification conditions. The attribute grammar is presented below with comments to help the reader.
[...] If an expression is an identifier, then its evaluation is performed by finding the value of the identifier in the state. The path condition and state are unchanged. We will not show the production rules for "id." Instead, we will simply state that the value attribute of "id," i.e., V(id), is the identifier itself.
A constant or a field-name also evaluates to itself; we will not give an explicit rule for them.
2) expr = acc(id, slist):
    Ci(slist) = Ci(expr)
    Si(slist) = Si(expr)
    Pi(slist) = Pi(expr)
    Cs(expr) = Cs(slist)
    Ss(expr) = Ss(slist)
    Ps(expr) = Ps(slist)
    V(expr) = acc(V(id) | Ss(slist), V(slist))
    L(expr) = acc(V(id), V(slist)).
To evaluate a structure reference, the selector list is evaluated; then the structure is accessed. Notice that side effects from the evaluation of slist can affect the value of "expr." Also notice that the location attribute has the same selector list as the value attribute, but the structure name, V(id), is not instantiated by the state, Ss(slist).
[...] Primitive operations are performed by evaluating the operands from left to right and returning the result of applying the operator to the evaluated operands.
Our rule for function calls deals with functions of the form shown in Fig. 2. As with previous procedure rules, a generic example is used, since its generalization to multiple parameters and global variables is straightforward. Functions do not contain exit statements; an implicit exit follows the function body. The statement f = e assigns e to be the value returned by f. The function production rule is similar to the rule for procedure calls. The main differences are that a value (the constant f') is returned, and parameter evaluation can have side effects. [...] where v', f', g' are new unique constants, and Fig. 2 gives the declaration of f. As above, sharped (#) symbols refer to initial values. The first step in the symbolic execution of f(v, e) is to evaluate the parameters, which may contain function calls that change the initial state Si(expr) and path condition Pi(expr) and add items to the verification condition list. Symbolic execution then adds the precondition of f to the list of verification conditions Cs(e) after the appropriate substitution for variables. The new values v' and g' of v and g are recorded in the output state Ss(expr). The postcondition of f is added to the path condition after the appropriate substitution for variables. The value returned, f', is substituted for f in the postcondition. The replacement of formal parameters by actuals is essentially the same as in the procedure-call rules in Sections IV-E and VI-B. The additional intricacies in this section stem from [...] L(expr1) = nil.

VIII. RULES FOR STATEMENTS
We are now prepared to present rules of inference for statements, allowing expressions with side effects. The rules will be similar to the previous versions, but expression evaluation is accomplished using the "E" function rather than through application of substitutions.
We can now define this function more precisely in terms of attributes: E(expr, Si(expr), Pi(expr)) = (L(expr), V(expr), Ss(expr), Ps(expr), Cs(expr)) where Ci(expr) is the empty list of verification conditions. The rules for statements are listed in Fig. 3.
The assignment rule says that the expression on the left-hand side is evaluated first, yielding a location, L (see Fig. 3). The right-hand side is then evaluated, yielding a value V. The state is modified by changing the value of L to V. The verification conditions, C1 and C2, resulting from the evaluation of expr1 and expr2 must be true. Notice that the final state reflects side effects from expr1 and expr2. For example, consider the path [...] A, 3)). The precondition of inc is TRUE, so the verification condition attribute Cs is TRUE TRUE.
The new conditional rule is like the first one in Section IV-B except that the evaluation of expr can result in a different path condition and state (PC' and S'), and the preconditions of its function calls must be verified. The confirm rule is identical to the previous one. The loop rule is essentially unchanged. For consistency, the modify function is used in place of the composition operation used previously. The exit rule and branch rules are identical to the ones in Sections V-B and V-C.
The rule for procedure calls in Fig. 3 is similar to the attribute-grammar production for function references. The changes to the state are the same except that the constant f', representing the value of the function, is not necessary. This is also true of the substitution applied to the postcondition(s). A verification condition is generated for each exit label as in the previous procedure-call rule.

As before, the correctness of a procedure declaration is established by assuming its precondition, executing its body, and confirming its postcondition. A select statement is used to select the appropriate component of the postcondition for the exit that is taken. The initial state contains the values of #'d variables.
A function declaration is verified in a similar manner, as shown in Fig. 3. Recall that functions can only have a single exit. All procedures and functions called from the body of a procedure or function must be correct by the same definition.

IX. SUMMARY
The notion of symbolic execution facilitates the writing and understanding of formal rules of inference for proving program correctness. Using these rules, a language with side effects, multiple-exit loops, and procedures with multiple exits has been formally described. A similar set of rules has been used as the basis of a mechanical verification condition generator [1]. In our experience, the implementation of a verification condition generator is relatively straightforward once the verification rules have been precisely stated using the notation developed in this paper. This is particularly true of the symbolic execution of expressions. The details of the attribute grammar in Section VII are quite intricate, but the implementation of the attribute grammar is very similar to its formal specification.

I. INTRODUCTION
In packet-switched broadcast channels, the problem of multiple access has received several solutions, ranging from schemes of the dedicated type, like TDMA and FDMA, through demand assignment and reservation methods, to the completely random methods of contested access like the Aloha scheme.
On the one end of the spectrum we have the class of dedicated methods. The chief representative of this class is the widely known TDMA method which has been extensively used in data communications, but only recently analyzed in the context of satellite or packet-radio communications [1], [2].
It is known that this scheme performs satisfactorily in terms of channel utilization (throughput) and average packet delay if the traffic is heavy (high load factor) or if the user terminals produce data on a regular (almost periodic) basis. Otherwise the performance of TDMA tends to become unsatisfactory.