Exploiting positive equality and partial non-consistency in the formal verification of pipelined microprocessors

We study the applicability of the logic of Positive Equality with Uninterpreted Functions (PEUF) to the verification of pipelined microprocessors with very large Instruction Set Architectures (ISAs). Abstraction of memory arrays and functional units is employed, while the control logic of the processors is kept intact from the original gate-level designs. PEUF is an extension of the logic of Equality with Uninterpreted Functions, introduced by Burch and Dill [1994], that allows us to use distinct constants for the data operands and instruction addresses needed in the symbolic expression for the correctness criterion. We present several techniques that make PEUF scale very efficiently for the verification of pipelined microprocessors with large ISAs. These techniques are based on allowing a limited form of non-consistency in the uninterpreted functions that represent the initial memory state and the ALU behavior. Our tool required less than 30 seconds of CPU time and 5 MB of memory to verify a 5-stage MIPS-like pipelined processor that implements 191 instructions of various classes. The verification was done by correspondence checking, a formal method in which a pipelined microprocessor is compared against a non-pipelined specification.


Introduction
The logic of Positive Equality with Uninterpreted Functions (PEUF) [2][3] was proposed as an extension of the logic of Equality with Uninterpreted Functions (EUF), introduced by Burch and Dill [4]. Uninterpreted functions allow functional units and memories to be modeled abstractly, replacing their actual implementations when formally verifying microprocessors. This considerably reduces the computational complexity of verifying pipelined microprocessors.
By imposing some restrictions on the syntax of EUF, PEUF allows the use of a distinct constant for each data operand or instruction address used in the symbolic expression for the correctness criterion. By a distinct constant we mean a term which is not equal to any other term in the same domain. The result is significantly higher computational efficiency for PEUF compared to EUF.
The focus of this paper is how to make PEUF scale easily to the verification of realistic pipelined microprocessors with large ISAs. We propose the use of an initial state that is non-consistent across instructions, but consistent for the same instruction. We also propose an efficient way of generating distinct constants. The result is very high computational efficiency for PEUF, with verification CPU time and memory that are invariant with the number of instructions implemented in the processor.
In modeling microprocessors, we use abstraction of memory arrays and functional units. We achieve the abstraction by means of the Efficient Memory Model (EMM) [13][14] and its capability to dynamically introduce new initial state (as required by a simulation sequence) that is consistent with previously introduced initial state. Observing that every combinational block of logic can be implemented as a read-only memory with the block's inputs serving as memory addresses, we abstract functional units at the bit level by replacing them with read-only EMMs. The definition of the EMM automatically enforces consistency between the output values returned for the present input pattern and those returned for previous input patterns.
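A minimal concrete sketch of such a read-only abstraction is shown below. It is an illustration under simplifying assumptions, not the EMM itself: input patterns are concrete Python values, so the comparison against earlier input patterns, which the EMM performs symbolically, reduces to a dictionary lookup, and each newly introduced value is a named string. The class name `LazyROM` is hypothetical.

```python
import itertools


class LazyROM:
    """Hypothetical sketch: a combinational block abstracted as a read-only memory.

    The block's inputs serve as the memory address. The first read of an
    address introduces a new value, distinct from all values introduced so far,
    as the initial state of that location; later reads of the same address
    return the stored value, so a1 == a2 implies f(a1) == f(a2).
    """

    def __init__(self, name):
        self.name = name
        self._contents = {}              # address -> value introduced on demand
        self._fresh = itertools.count()  # numbering for new values

    def read(self, *inputs):
        address = tuple(inputs)
        if address not in self._contents:
            self._contents[address] = f"{self.name}_{next(self._fresh)}"
        return self._contents[address]


alu = LazyROM("alu")
x = alu.read("add", 3, 4)
y = alu.read("add", 3, 4)   # same input pattern: same output
z = alu.read("sub", 3, 4)   # new input pattern: new value
print(x, y, z)              # alu_0 alu_0 alu_1
```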
When using the EMM to replace memories and functional units, we assume that their actual implementations have been verified separately. For example, the formal method of symbolic trajectory evaluation has been combined with symmetry reductions to enable the verification of very large memory arrays at the transistor level [11]. An efficient representation of word-level functions has enabled the verification of complex functional units such as floating-point multipliers [6].
We also use an efficient encoding technique [15], targeted to EUF and PEUF, for representing term-variables by means of BDD variables [1]. This technique allows such term-variables to be used while symbolically simulating the processor's control logic, kept intact from the actual gate-level design. Thus, we avoid the need to create a separate abstract model, as is done in previous methods based on uninterpreted functions [4][7][8]. Our previous results [15] showed that the encoding technique cannot break the circular dependencies between functional units, which result from the feedback loops of the forwarding logic, and hence cannot scale to the verification of realistic pipelined microprocessors.
When verifying pipelined microprocessors, we use the formal method of correspondence checking, i.e., comparison against a non-pipelined specification, as pioneered by Burch and Dill [4][5].
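As a toy illustration of the correctness criterion, the sketch below checks the Burch and Dill commutative diagram by exhaustive enumeration for a hypothetical two-stage pipeline that holds one result in a latch and forwards it to the next instruction: flushing the pipeline after one implementation step must yield the same architectural state as flushing first and then executing one specification step. The processor, its ALU (addition modulo 3), and all names are invented for the example; the method in this paper evaluates the criterion symbolically, with abstracted functional units rather than concrete arithmetic. Because the toy pipeline never stalls, exactly one specification step corresponds to each implementation step.

```python
from itertools import product

NUM_REGS = 2   # hypothetical toy register file size
MODULUS = 3    # data values range over 0..MODULUS-1


def alu(a, b):
    # Stand-in for the abstracted functional unit.
    return (a + b) % MODULUS


def spec_step(regs, instr):
    """Non-pipelined specification: execute one instruction completely."""
    dest, src1, src2 = instr
    new_regs = list(regs)
    new_regs[dest] = alu(regs[src1], regs[src2])
    return tuple(new_regs)


def write_back(regs, latch):
    """Retire the result held in the pipeline latch, if any."""
    if latch is None:
        return regs
    dest, value = latch
    new_regs = list(regs)
    new_regs[dest] = value
    return tuple(new_regs)


def impl_step(state, instr, forwarding=True):
    """Two-stage pipeline: execute the new instruction while retiring the latch."""
    regs, latch = state
    dest, src1, src2 = instr

    def operand(src):
        # Forwarding logic: an ITE choosing the latch value on a register match.
        if forwarding and latch is not None and latch[0] == src:
            return latch[1]
        return regs[src]

    result = alu(operand(src1), operand(src2))
    return write_back(regs, latch), (dest, result)


def flush(state):
    """Abstraction function: complete all in-flight instructions."""
    regs, latch = state
    return write_back(regs, latch)


def correspondence_holds(forwarding):
    values = range(MODULUS)
    regfiles = list(product(values, repeat=NUM_REGS))
    latches = [None] + list(product(range(NUM_REGS), values))
    instrs = list(product(range(NUM_REGS), repeat=3))   # (dest, src1, src2)
    for regs, latch, instr in product(regfiles, latches, instrs):
        state = (regs, latch)
        if flush(impl_step(state, instr, forwarding)) != spec_step(flush(state), instr):
            return False
    return True


print(correspondence_holds(forwarding=True))   # True
print(correspondence_holds(forwarding=False))  # False
```

Disabling forwarding makes the check fail, since an instruction then reads a stale register value that the flushed state has already updated.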

Logic of Positive Equality with Uninterpreted Functions
Burch and Dill showed that EUF is very well suited to representing and verifying a pipelined microprocessor [4] by comparing it against a non-pipelined specification. In particular, functional units and memories can be abstracted as uninterpreted functions or predicates that take as inputs data operands, represented abstractly as terms. The forwarding logic can be described as nested ITE operators that select one of several terms, based on formulas produced by the control logic.
We can observe that three classes of terms are needed in the verification of pipelined processors by correspondence checking: instruction addresses, register identifiers, and data. Of these, only the register identifiers are compared for equality by the gate-level control logic, in order to form control decisions, e.g., for forwarding or stalling. This is based on the assumption that equality comparisons of data terms are made only by means of uninterpreted predicates, e.g., the decision to take a conditional branch when two terms are equal. Hence, the interpreted equality predicate "=" is never applied to such terms. The same is true for instruction address terms. This allows us to impose some restrictions on EUF in order to gain computational efficiency. The logic of Positive Equality with Uninterpreted Functions (PEUF) extends EUF [4] by adding a class of terms called "p-terms" (for "positive terms"), with its own set of variables and function symbols, that are used in a highly restricted fashion. In particular, equality tests may be performed among p-terms, but the results of these tests may only be used in monotonically positive Boolean formulas. These formulas cannot be used to control ITE operators. The benefit of this restricted class of terms is that they can be handled in a simpler and more efficient way by the validity checker. Fig. 1 shows the syntax of PEUF.

Syntax
P-terms include general terms, in addition to a special class of variables called "p-term-variables," the ITE operator applied to p-terms, and applications of a special class of function symbols known as "p-function" symbols. Two p-terms may be compared for equality, but the result is a restricted form of formula called a p-formula. A p-formula may only contain the monotonically positive Boolean connectives $\wedge$ and $\vee$. It cannot contain any negations, and it cannot be used as the control for an ITE operator outside of uninterpreted functions and predicates.
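The restrictions on p-formulas can be illustrated with a simplified checker over a hypothetical tuple-based expression encoding. It enforces only the rules stated in this section: an equality that involves a p-term yields a p-formula, p-formulas may be combined with conjunction and disjunction but not negated, and they may not control an ITE. It does not reproduce the full grammar of Fig. 1, and it treats the arguments of function applications loosely; all names are assumptions.

```python
class NotPEUF(ValueError):
    """Raised when an expression violates the (simplified) PEUF restrictions."""


def term_class(t):
    """Return 'g' for a general term or 'p' for a p-term."""
    kind = t[0]
    if kind == "var":                   # general term-variable
        return "g"
    if kind == "pvar":                  # p-term-variable
        return "p"
    if kind in ("app", "papp"):         # ("app"|"papp", symbol, [args])
        for arg in t[2]:
            term_class(arg)             # arguments only need to be well formed
        return "g" if kind == "app" else "p"
    if kind == "ite":
        _, cond, then_t, else_t = t
        if formula_class(cond) != "g":
            raise NotPEUF("a p-formula cannot control an ITE")
        classes = {term_class(then_t), term_class(else_t)}
        return "p" if "p" in classes else "g"
    raise NotPEUF(f"unknown term {t!r}")


def formula_class(f):
    """Return 'g' for a general formula or 'p' for a p-formula."""
    kind = f[0]
    if kind in ("true", "false"):
        return "g"
    if kind == "pred":                  # uninterpreted predicate application
        for arg in f[2]:
            term_class(arg)
        return "g"
    if kind == "eq":
        classes = {term_class(f[1]), term_class(f[2])}
        return "p" if "p" in classes else "g"
    if kind == "not":
        if formula_class(f[1]) != "g":
            raise NotPEUF("a p-formula cannot be negated")
        return "g"
    if kind in ("and", "or"):
        classes = {formula_class(f[1]), formula_class(f[2])}
        return "p" if "p" in classes else "g"
    raise NotPEUF(f"unknown formula {f!r}")


def is_p_formula(f):
    try:
        formula_class(f)
        return True
    except NotPEUF:
        return False


x, y = ("pvar", "x"), ("pvar", "y")     # data operands as p-term-variables
r1, r2 = ("var", "r1"), ("var", "r2")   # register identifiers as general terms

ok = ("or", ("eq", x, y), ("and", ("eq", x, x), ("not", ("eq", r1, r2))))
forwarded = ("eq", ("ite", ("eq", r1, r2), x, y), x)   # register match selects data
negated = ("not", ("eq", x, y))
ite_controlled = ("eq", ("ite", ("eq", x, y), x, y), x)
print(is_p_formula(ok), is_p_formula(forwarded),
      is_p_formula(negated), is_p_formula(ite_controlled))  # True True False False
```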

Deciding P-Formulas
We say that an interpretation $I$ is maximally diverse with respect to a set of p-terms $T$ when, for any two terms $s$ and $t$ in $T$, $I(s) = I(t)$ only when either 1) $s$ and $t$ are identical, or 2) $s$ and $t$ are both applications of the same p-function symbol $f$ on lists of argument terms with equal interpretations, i.e., $s = f(s_1, \ldots, s_k)$ and $t = f(t_1, \ldots, t_k)$, where $I(s_i) = I(t_i)$ for $1 \le i \le k$. It is also assumed that a p-term-variable or a p-function-symbol application is not equal to any term-variable or function-symbol application.
For a p-formula $F$, an interpretation $I$ is maximally diverse when $I$ is maximally diverse with respect to the set of terms $\{t \in F \mid t \text{ is a p-term-variable or a p-function application}\}$.

Theorem. A p-formula is valid if and only if it is true for a maximally diverse interpretation.
The intuitive explanation of the theorem is that a maximally diverse interpretation of a p-formula creates the worst-case scenario for the p-term equality comparisons in it. If the p-formula is true under a maximally diverse interpretation, then it remains true under any other interpretation, due to the monotonicity of the Boolean connectives $\wedge$ and $\vee$, the only connectives allowed in p-formulas. The complete proof is presented in [3]. The result allows us to use a distinct constant for each p-term-variable and p-function-symbol application, i.e., for each instruction address or data operand, when computing the correctness criterion. By a distinct constant we mean a term which is not equal to any other term in the same domain.
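The theorem can be checked on small examples. The sketch below, restricted by assumption to p-formulas over p-term-variables (no p-function applications), compares a single evaluation under a maximally diverse interpretation against brute-force enumeration of all interpretations over a domain with as many values as there are variables, which suffices to realize every pattern of equalities among them. The encoding and function names are hypothetical. The shortcut relies on the absence of negation: $\neg(x = y)$ is true under a maximally diverse interpretation but is not valid.

```python
from itertools import product


def evaluate(f, interp):
    """Evaluate a p-formula built only from 'eq', 'and' and 'or'."""
    kind = f[0]
    if kind == "eq":
        return interp[f[1]] == interp[f[2]]
    if kind == "and":
        return evaluate(f[1], interp) and evaluate(f[2], interp)
    if kind == "or":
        return evaluate(f[1], interp) or evaluate(f[2], interp)
    raise ValueError(f"not a p-formula: {f!r}")


def variables(f):
    """Collect the p-term-variables occurring in a formula."""
    if f[0] == "eq":
        return {f[1], f[2]}
    return variables(f[1]) | variables(f[2])


def valid_by_enumeration(f):
    """Try every interpretation over a domain with one value per variable."""
    names = sorted(variables(f))
    domain = range(len(names))
    return all(evaluate(f, dict(zip(names, values)))
               for values in product(domain, repeat=len(names)))


def valid_by_max_diversity(f):
    """Evaluate once, mapping each p-term-variable to its own distinct constant."""
    names = sorted(variables(f))
    return evaluate(f, {name: i for i, name in enumerate(names)})


f1 = ("or", ("eq", "x", "x"), ("eq", "x", "y"))                             # valid
f2 = ("eq", "x", "y")                                                        # not valid
f3 = ("or", ("eq", "x", "y"), ("and", ("eq", "y", "z"), ("eq", "z", "z")))  # not valid
for f in (f1, f2, f3):
    print(valid_by_max_diversity(f), valid_by_enumeration(f))
```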

Encoding Term Values with Symbolic Bit Vectors

Abstracting Memories and Functional Units
We will use the types address expression, AExpr, and data expression, DExpr, to denote the kinds of information that can be applied at the inputs or produced at the outputs of an abstract memory. Let $m_0 : \mathit{AExpr} \rightarrow \mathit{DExpr}$, a mapping from address expressions to data expressions, be the initial state of such a memory. Then $m_0(a)$, where $a$ is an address expression, returns the initial data of the memory at address $a$. The write operation for an abstract memory is defined as $\mathit{Write}(m_i, a_1, d_1) \rightarrow m_{i+1}$ [10], i.e., it takes as arguments the present state $m_i$ of a memory, an address expression $a_1$ designating the location to be updated, and the data expression $d_1$ that this location is updated to contain, and it produces the subsequent memory state $m_{i+1}$, such that:

$$m_{i+1}(a_2) \leftarrow \mathit{ITE}(a_1 = a_2,\ d_1,\ m_i(a_2)) \qquad (1)$$

Based on the observation that any functional block can be represented as a read-only memory (ROM), with the block's inputs serving as memory addresses, we will represent abstract functional units as abstract ROMs. According to the semantics of an abstract memory, an abstract ROM will always satisfy the property $a_1 = a_2 \Rightarrow f(a_1) = f(a_2)$, where $f(\cdot)$ denotes the output function of the ROM-modeled abstract functional unit.

Motivated by application to actual circuits, we will represent address and data expressions by vectors of Boolean expressions having width $n$ and $w$, respectively, for a memory with $N = 2^n$ locations, each holding a word of $w$ bits. The type BExpr will denote Boolean expressions. Address comparison $A_1 = A_2$ is implemented as:

$$\bigwedge_{i=1}^{n} \left(A_{1,i} \Leftrightarrow A_{2,i}\right) \qquad (2)$$

while address selection $A_1 \leftarrow \mathit{ITE}(b, A_2, A_3)$ is implemented by selecting the corresponding bits:

$$A_{1,i} \leftarrow \mathit{ITE}(b,\ A_{2,i},\ A_{3,i}), \quad 1 \le i \le n \qquad (3)$$

The definition of data operations is similar, but over vectors of width $w$.
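The address comparison, address selection, and write semantics above can be sketched with concrete values. In this sketch, Python booleans stand in for the Boolean expressions (BExpr) of the symbolic model, an address is a tuple of n bits, a memory state is a function from addresses to data, and the initial state `m0` is a hypothetical function that returns a distinct named value for each address. The function names are assumptions.

```python
def addr_eq(a1, a2):
    """Address comparison: AND over all bit positions of (A1[i] <-> A2[i])."""
    if len(a1) != len(a2):
        raise ValueError("address vectors must have the same width n")
    return all(x == y for x, y in zip(a1, a2))


def select(b, a2, a3):
    """Address selection ITE(b, A2, A3), performed on corresponding bits."""
    return tuple(x if b else y for x, y in zip(a2, a3))


def write(mem, a1, d1):
    """Write(m_i, a1, d1): the new state maps a2 to ITE(a1 = a2, d1, m_i(a2))."""
    return lambda a2: d1 if addr_eq(a1, a2) else mem(a2)


def m0(a):
    # Hypothetical initial state: a distinct named value for every address.
    return "init_" + "".join("1" if bit else "0" for bit in a)


m1 = write(m0, (False, True), "d1")
m2 = write(m1, (True, True), "d2")
print(m2((False, True)), m2((True, True)), m2((False, False)))  # d1 d2 init_00
print(select(True, (True, False), (False, True)))               # (True, False)
```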
We use the Efficient Memory Model (EMM), a behavioral memory model for symbolic simulation, to represent register files, memories, and latches in the circuits that we examine. During symbolic simulation, the sequence of writes to each EMM is represented as a list, write-list, that contains entries of the form $(c, a, d)$, where $c$ is a Boolean expression denoting the set of contexts (conditions) for which the entry is defined, $a$ is an address expression denoting a memory location, and $d$ is a data expression denoting the contents of this location. The context information is included for modeling memory systems where Write operations may be performed conditionally, depending on the value of a control signal, e.g., a write-port enable signal. Initially, write-list is empty for each EMM. Given an update $\mathit{Write}(m_i, (c_1, a_1, d_1))$ of the current memory state $m_i$, the subsequent memory state $m_{i+1}$ is defined as:
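As an illustration, and assuming that the most recent entry whose context holds and whose address matches determines the value read (with the initial state used when no entry applies), a read from such a write-list can be sketched as follows. Contexts and addresses are concrete values here rather than Boolean and address expressions, and the class name `WriteListMemory` is hypothetical; this is not the EMM's actual implementation.

```python
class WriteListMemory:
    """Hypothetical sketch: memory state kept as a list of (context, address, data)."""

    def __init__(self, initial_state):
        self.initial_state = initial_state  # function: address -> initial data
        self.write_list = []                # initially empty

    def write(self, context, address, data):
        # Each Write appends an entry; its context says when the entry applies.
        self.write_list.append((context, address, data))

    def read(self, address):
        # Equivalent to ITE(c_k and a_k = a, d_k, ... ITE(c_1 and a_1 = a, d_1, m0(a)) ...),
        # so the most recent applicable entry wins.
        value = self.initial_state(address)
        for context, entry_address, data in self.write_list:  # oldest first
            if context and entry_address == address:
                value = data
        return value


mem = WriteListMemory(lambda a: f"init[{a}]")
mem.write(True, 3, "x")
mem.write(False, 3, "y")  # write-port enable is false: no effect
mem.write(True, 5, "z")
print(mem.read(3), mem.read(5), mem.read(7))  # x z init[7]
```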

Exploiting Non-Consistency of a Memory's Initial State
In this paper we relax the constraint of consistency for all of the initial state that is introduced on-the-fly. First, we use the value of the Sequential Program Counter (which points to the instruction that sequentially follows the presently executed instruction, so that it equals PC + 4 in many architectures), already available in the Execution stage for computing the Target Program Counter of jump and branch instructions, as an additional input to the ALU. The effect is to make the function InitState() of the EMM that models the ALU non-consistent across instructions but consistent for the same instruction (as identified by the instruction address), thus turning the ALU into a different uninterpreted function for each instruction executed. The ALU in the non-pipelined specification processor is defined identically, with the Sequential Program Counter serving as an input. This idea is based on the observation that we need to preserve consistency between the implementation and the specification simulation sequences, while consistency within the same simulation sequence is not important when evaluating the correctness criterion. It should be pointed out that non-consistency is a conservative approximation: if the processor is correct when its functional units' outputs are non-consistent across instructions (from the same simulation sequence), it will also be correct when the constraint for consistency is imposed. The same idea can be applied to the initial state of all functional units and memory arrays.
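The effect of the additional ALU input can be sketched by extending a lazily initialized read-only abstraction with a tag. In this concrete, hypothetical sketch, the Sequential PC is prepended to the input pattern, so the same operands produce the same output only when they come with the same tag, i.e., from the same instruction. Because the implementation and the specification would read the same tagged abstraction, they remain consistent with each other for every instruction. The class name `TaggedROM` and the operand names are assumptions.

```python
import itertools


class TaggedROM:
    """Hypothetical sketch of an abstract functional unit with a tag input.

    The tag (here, the Sequential PC of the instruction) is treated as an extra
    "imaginary" address input, so new initial state is introduced per
    (tag, inputs) pair rather than per input pattern alone.
    """

    def __init__(self, name):
        self.name = name
        self._contents = {}
        self._fresh = itertools.count()

    def read(self, tag, *inputs):
        key = (tag,) + inputs
        if key not in self._contents:
            self._contents[key] = f"{self.name}_{next(self._fresh)}"
        return self._contents[key]


alu = TaggedROM("alu")
a = alu.read(0x104, "add", "d1", "d2")  # instruction with Sequential PC 0x104
b = alu.read(0x104, "add", "d1", "d2")  # same instruction (e.g., in the specification)
c = alu.read(0x108, "add", "d1", "d2")  # another instruction, same operands
print(a, b, c)  # alu_0 alu_0 alu_1
```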
Second, we represent a Register File that has one write port and two read ports (one for each of the two source register identifiers) with two register files. Each of them provides the data for one of the source registers, while both are updated in the same way as the original register file. In this way, the initial states of the source registers read will not be consistent across the two register files. Note that this representation of a Register File is more conservative than the original one: if the pipelined processor is correct without the consistency constraint for the initial states of the two source registers of each instruction, it will also be correct when we impose that constraint.
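A split Register File can be sketched as two copies that share the single write port but serve one read port each. In this hypothetical, concrete sketch, an unwritten register returns a symbolic initial value that is private to its copy, so reading the same register through both ports yields non-consistent initial values, while any value written during simulation is seen identically by both ports. The class names and the naming scheme for initial values are assumptions.

```python
class CopyOfRegisterFile:
    """One copy of the register file; unwritten registers return an initial
    value that is private to this copy (hypothetical naming scheme)."""

    def __init__(self, name):
        self.name = name
        self.written = {}

    def read(self, reg):
        return self.written.get(reg, f"{self.name}_init_r{reg}")

    def write(self, reg, value):
        self.written[reg] = value


class SplitRegisterFile:
    """Two copies: one per read port, both updated by the single write port."""

    def __init__(self):
        self.port1 = CopyOfRegisterFile("rf1")
        self.port2 = CopyOfRegisterFile("rf2")

    def read_sources(self, rs, rt):
        return self.port1.read(rs), self.port2.read(rt)

    def write(self, reg, value):
        self.port1.write(reg, value)
        self.port2.write(reg, value)


rf = SplitRegisterFile()
print(rf.read_sources(5, 5))  # ('rf1_init_r5', 'rf2_init_r5'): not consistent
rf.write(5, "result")
print(rf.read_sources(5, 5))  # ('result', 'result')
```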

Experimental Results
In previous work, we attempted to verify a pipelined MIPS processor with a memory stage [15]. The control logic was kept intact at the gate level. Term-variables, encoded with BDD variables as opposed to constants, were used for the data operands and instruction addresses. However, for optimal BDD sizes, the BDD variables that encode the address term-variables of an EMM should precede in the variable ordering those BDD variables that encode the EMM's data term-variables. The problem was that the Data Memory is addressed with data term-variables produced by the ALU in the Execution stage. At the same time, the Data Memory produces term-variables that are fed back into the ALU by means of the forwarding logic and the Register File. As inputs to the ALU, these term-variables are compared for equality against former input term-variables in order to select an output term-variable from among both the former output term-variables and a newly generated one. Hence, there is a circular dependency between the term-variables produced by the ALU and the Data Memory, due to the feedback loops in the data path. The result was exponential complexity of the BDDs when computing the correctness criterion, and insufficient memory, given a limit of 256 MB. Isles et al. [8], who were trying to verify an abstractly defined design of a pipelined DLX processor with a memory stage, also ran out of memory, given a limit of 1 GB.
In this paper, we examine three versions of a 5-stage pipelined MIPS processor [12] with a 32-bit data path (see Table 1). Each is compared to its non-pipelined specification processor. The functional units, register file, and pipeline latches are replaced by EMMs. The control logic is described at the gate level. The instruction-decoding PLA is defined to produce a unique pattern of ALU control bits for each instruction, while the other control bits are determined according to the class of the instruction.

MIPS-15 and MIPS-42 are based on the original MIPS instruction encodings [9] for the implemented instructions, while MIPS-191 is based on the same instruction formats but has different operation-code and function-code encodings, in order to use all possible instruction-encoding patterns and thus define a very large ISA. This processor has the same functional units and the same pipeline structure as the previous two, except that it has more control signals going from the instruction-decoding PLA into the ALU, in order to identify a different computation to be performed for each instruction.
When symbolically simulating the circuits and computing the correctness criterion, we used p-terms (represented as constants) for the data operands, immediate values, and instruction addresses. The register identifiers were represented as term-variables and were encoded with BDD variables, as explained in Sect. 3. For a way to encode the initial state of pipeline latches, the reader is referred to [15].

MIPS-15 | 15 | 6
We also exploited the idea of a non-consistent initial state (Sect. 5) by using a split Register File and a non-consistent ALU behavior, based on the Sequential Program Counter. The latter was achieved by letting the user specify nets to be used as "imaginary" additional address inputs when generating the initial state of an EMM. We call such imaginary address inputs a tag.
We compared two encoding schemes for the distinct constants that are used for p-terms: consecutive constants, i.e., bit vectors encoding consecutive binary numbers, and 1-hot constants, which have a 1 in a single bit position and 0s in all other bit positions (see Tables 2 and 3). Tagging the ALU with the Sequential PC made the difference between impossible and possible when verifying the MIPS-like processor with 191 instructions, as shown in Table 4. Using constants for the values of instruction addresses (i.e., the values of the Sequential Program Counter) made the selection of the distinct uninterpreted function to be performed by the ALU for each instruction very efficient. As a result, the ALU was prevented from accumulating a single complex output term that is consistent across all instructions, which broke the effect of the forwarding-logic feedback loops.
Using 1-hot constants, as opposed to consecutive constants, reduced the CPU time and memory by 1/3 and the maximum BDD node count by half. The reason is that 1-hot constants spread the Boolean expressions that result from deep nesting of ITE operators across all bit positions of a bit vector, thus avoiding the building of a single big BDD.
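The two encodings can be compared with a short sketch (bit vectors are tuples of 0s and 1s, least significant bit first; the function names are hypothetical). If nested ITE operators select one of several distinct constants under mutually exclusive selection conditions, bit i of the result is the disjunction of the selection conditions of the constants that have a 1 in position i. Counting those constants per bit position shows how 1-hot constants spread the dependence evenly across positions, whereas consecutive constants concentrate it in the low-order bits. The 1-hot encoding, however, needs at least as many bit positions as there are constants.

```python
def consecutive_constants(count, width):
    """Distinct constants encoding the binary numbers 0, 1, 2, ... (LSB first)."""
    if count > 2 ** width:
        raise ValueError("not enough bit positions")
    return [tuple((i >> bit) & 1 for bit in range(width)) for i in range(count)]


def one_hot_constants(count, width):
    """Distinct constants with a single 1 each; requires count <= width."""
    if count > width:
        raise ValueError("not enough bit positions")
    return [tuple(1 if bit == i else 0 for bit in range(width)) for i in range(count)]


def ones_per_bit(constants):
    """For each bit position, count the constants with a 1 there. Under a nested
    ITE selecting one constant, that many selection conditions are ORed into
    the Boolean expression for the bit."""
    width = len(constants[0])
    return [sum(c[bit] for c in constants) for bit in range(width)]


print(ones_per_bit(consecutive_constants(8, 8)))  # [4, 4, 4, 0, 0, 0, 0, 0]
print(ones_per_bit(one_hot_constants(8, 8)))      # [1, 1, 1, 1, 1, 1, 1, 1]
```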
Splitting the Register File resulted in only a negligible CPU time reduction when the above two ideas were employed. Additionally tagging the initial state of the Register File and the Data Memory with the Sequential PC did not lead to a significant performance improvement. However, processing the ALU tag (i.e., the Sequential PC) first, immediately followed by the other input p-terms, when performing the address comparisons for computing the EMM's initial state, halved the CPU time but did not change the memory consumption.
When generating the initial state for the pipeline latches, we obtained the best performance from the following BDD variable order: 1) Execution/Memory, 2) Memory/Write-Back, 3) Instruction-Decode/Execution, 4) Instruction-Fetch/Instruction-Decode, 5) Instruction Memory, where a '/' separates the two pipeline stages divided by the pipeline latch. The reason is that the Execution/Memory latch contains symbolic information for taken branches or jumps that results in symbolic conditions for squashing the instructions in the preceding stages, i.e., this information affects many instructions in flight. Hence, the BDD variables used for encoding the initial state of that pipeline latch have to precede the BDD variables that encode the initial state of the other latches in order to get smaller BDD sizes when computing the expression for the correctness criterion. Next, the Memory/Write-Back latch affects the expressions for the data operands of all the subsequent instructions by means of the forwarding logic, so this latch is ranked second. Similar reasoning explains the order of the other pipeline latches.

Conclusions
We showed that the logic of Positive Equality with Uninterpreted Functions scales very efficiently to the verification of pipelined microprocessors with very large ISAs. Critical to that was the idea of functional-unit behavior that is non-consistent across instructions, but consistent for the same instruction, based on the value of the Sequential PC. This idea made the difference between impossible and possible when verifying a 5-stage MIPS-like pipelined processor that implements 191 instructions, and resulted in verification time and memory that are invariant with the size of the implemented ISA when extending it from 42 to 191 instructions.
| 191 | 129 | 26 | 2 | 34 | MIPS-191

Fig. 1: Syntax rules for the logic of Positive Equality with Uninterpreted Functions
We will consider two kinds of term values: constants and variables, where a term is compared for equality only with terms from the same domain. Constants have a fixed interpretation and are encoded with distinct bit vectors having a Boolean constant (i.e., either true or false) in each position. Variables have an interpretation that may map them to any value in the domain. Our technique to dynamically generate bit vectors that encode term variables from the same domain can be summarized as follows (see [14] for details). When the n-th vector is generated, it could potentially have n possible values: it could be equal to any of the previous n-1 vectors, or be distinct from all of them. Therefore, we use $\lceil \log_2 n \rceil$ new Boolean variables in the low-order bits of the n-th vector and the binary constant 0 in the remaining bit positions. If the vectors have a width of k bits, as determined by the circuit, then the number of variables generated for a new vector saturates at k.
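The encoding rule can be sketched as follows. The function name and the naming of the Boolean variables are hypothetical, list position 0 is the lowest-order bit, and the integer expression `(n - 1).bit_length()` computes $\lceil \log_2 n \rceil$ exactly for $n \ge 1$. The sketch only shows how many variables each vector receives and where they are placed; see [14] for the complete technique.

```python
def encode_term_variable(n, width, prefix="v"):
    """Return the n-th (1-based) bit vector of a domain, lowest-order bit first.

    The n-th vector may equal any of the previous n - 1 vectors or differ from
    all of them, so it needs n possible values: ceil(log2 n) new Boolean
    variables in the low-order bits, saturating at the vector width, and the
    constant 0 in the remaining positions.
    """
    if n < 1:
        raise ValueError("vectors are numbered from 1")
    num_vars = min((n - 1).bit_length(), width)   # ceil(log2 n) for n >= 1
    return [f"{prefix}{n}_{i}" for i in range(num_vars)] + ["0"] * (width - num_vars)


for n in range(1, 10):
    vector = encode_term_variable(n, 4)
    print(n, sum(bit != "0" for bit in vector), vector)
```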

Table 2: Results when using consecutive constants for the p-terms
ALU | 21 | 3.1 | 65,589 | MIPS-42
| Unified | --- | 177 | 25.4 | 1