Program design with abstract data types

: This paper explores the use of abstract data types as a modularization and structuring technique in the design of programs. The concepts of type, and type definition are discussed. Some data structuring mechanisms are generalized and several simple examples are presented. The examples increase in complexity and conclude with the design of a directory system, illustrating the power of data types as a design tool.


Introduction
This paper explores the use of abstract data types as a modularizstion and structuring technique in the design of programs, particularly larger programs including compilers and operating systems.The emphasis is on design.The particular language in which the program will be coded should be of relatively little importance in the design process.We are interested in a technique for constructing specifications which can be converted at another time (and possibly by others) into an implementation.
Programming languages which provide data type facilities to varying degrees have been in existence since about 1967 when SIMULA-67 [1] appeared.Since then, there has been an ever increasing number of such languages, notably ALGOL-68 [71 PASCAL [8], and ELI [6].There are more advanced languages currently under development or proposed, including ALPHARD [9], CLU [3], and PASQUAL[51 A useful bibliography of papers concerned with data types and programming languages may be found in [5].
Abstraction refers to the mental process which, when confronted with a set of objects, can distinguish some objects from others on the basis of what they have in common.What is "abstracted out" is a unifying theme, called the type of those objects.To pick a conventional example, the set { 7, 1.35, -6, 'A', 0.3E6 } contains three types of entities: integers, reals, and characters.These are types which jus.t about every programming language provides.More abstractly, the set contains only two types: numbers and characters.Even more abstractly, it contains only one type -constant.
When we prefix "data type" with "abstract" we mean an arbitrarily complex type which characterizes a certain kind of behavior.
The concept of using abstractions in programs to control complexity is certainly not a new one.For very small programs it is sufficient to use a straightforward control flow through the program statements.Slightly larger to medium sized programs require the use of procedures to group individual statements in a logical manner, while a main program directs the flow of control among the various top-level procedures.When dealing with large programs it becomes necessary to group the individual procedures into modules.The process by which this should be done is not yet universally understood.This paper expresses the view that the use of data types as a modularization principle is the right approach.
We will begin by discussing what it means to define a new data type and will describe some basic data structuring mechanisms.Numerous examples are given to familiarize the reader with the notation, building up to the final example involving the design of a directory system.

1) Type Definitions
A type definition has two basic attributes: 1) functional specification: the behavior of a particular class of objects as seen by users of that class.
2) algorithmic specification: the means by which that behavior is accomplished.
The distinction between functional and algorithmic specification is most important.First, it allows programmers to construct implementations in parallel with one another without any more information in common than their mutual interfaces.
Second, it ensures that a change in data structure or algorithm is localized as long as the functional specifications remain unchanged.A type definition is really a module in the sense of the information boundaries discussed by Parnas [4].It provides a clear design procedure for the modularization of a large program or system, where one has been previously lacking.
Language defined types such as integer, real, and boolean are called scalar types.The term scalar is intended to signify that values of such a type are considered to be indivisible.The scalar type integer characterizes single values taken from the range {-n,...,n}, in conjunction with a set of operations which may be applied to integers, such as addition, subtraction, multiplication, etc.As far as a user of type integer is concerned, he should not know about, and more importantly not rely on, the way in which integers are represented or the particular algorithms which implement integer arithmetic on his computer.Adhering to this principle makes program portability possible, since different machines may use different algorithms.We generalize these notions to abstract data types and say that a type definition consists of: 1) a representation for objects of the type, known only to the definition itself (algorithmic specification).
2) a set of operations which manipulate this representation, the names and calling sequences of which are known to users of the type (functional specification), but the algorithms of which are known only to the definition itself (algorithmic specification).
The binding of operations to particular types is vital to the description of behavior.It is not enough to present a picture of a control block as the description of a "process" in an operating system.What is necessary is an explanation of exactly what can be done to a process, and specifically not how it is done.In fact control block formats should not be disseminated at all; rather only a description of valid operations which can be used on that data structure should be provided to users.One of the most important things gained from this is the "sanctity" of a data structure.
Every data structure may be viewed as having an invariant relation which it is characterized by (e.g. the mathematical notions of "list" and "tree").It is easier to verify the invariance of this relation if the relevant changes to the structure are made by a small number of "privileged" procedures.If you parcel out a data structure address and the right to modify arbitrary fields in arbitrary ways, it is possible (likely, in a brand new program) to find inconsistencies in the structure due to program bugs anywhere in the entire system.If all the information that is distributed is operation names, the scope of possible sources of error is greatly reduced to the implementation of those very operations.
The next sections present a set of tools useful for the description of data abstractions, and introduce some relatively intuitive syntax.

2) Enumeration Types
Often in the course of writing a program in a standard language, a programmer will find the heed to express values of scalar types which are not provided by the language.For example/ one might find an inventory program which, in order to identify the color of an object, uses integers to represent red-l, yellow- The operations provided by a type are declared inside the scope of the definition in order to exhibit modularity.The brackets type and end delimit the definition.To use one of these operations in a program description, it is necessary to prefix the operation name with the type name in order to permit several types to have operations with the same name (and for readability).The complement operation takes a single parameter of type color and returns a value of the same type.For example, if the statements shade«-green; shade«-color.complement(shade);were executed, shade would obtain the value red.Even though in most languages an implementation of these types will require the use of integers, nevertheless the irrelevance of the mapping may be preserved by declaring macros (or variables) for use in place of the integer constants.

3) Composite Types
Given the ability to define new scalar types, we must now be able to build upon those definitions to define more interesting types.
Consider as an example the definition of the type complex.We wish to represent numbers in the complex plane and provide arithmetic operations on them.
We need to specify to users that the operations add(w,z:complex):complex mul(w,z:complex):complex create(a,b:real):complex will be available.Add will take two complex values and produce the complex sum.Mul will behave analogously.Create will form a new complex value equal to a+bi.In terms of algorithmic specification, we can represent a complex value by its real and imaginary parts.Consider the following definition: type complex = declare r,i:real; op add(w,z:complex):complexbegin declare sumxomplex; The body of the definition contains some variable declarations (r and i) and some operation definitions (add, mul, and create).When a variable of type complex is declared, e.g.declare x.complex; or dynamically allocated, e.g.complex.create(l,2) the variables inside the definition of complex are allocated, just as when an integer variable is declared the bit string by which it is represented is allocated.From outside the definition it appears that x is an atomic entity to be used with the operations provided, e.g.

complex.add(x.y).
From inside the type definition, as within, say, the microcode which implements integer arithmetic, the structure of x is visible.In this case the structure consists of the variables r and i.To access a particular structure, it is necessary to specify which structure by prefixing the variables with the name of a complex variable.For example, the use of w.r in the body of the add operation refers to the r part of the complex object w which was passed in as a parameter.The use of sum.i refers to the i part of the locally declared complex object sum.An implementation of this type in a language with dynamic storage allocation will be straightforward, i.e. it will be possible to return a pointer to a newly allocated structure representing the result of an operation.In other languages it may be necessary to write the operations as subroutines, passing the destination variable as an output parameter.In FORTRAN, storage allocation could be simulated by declaring an array of pairs, each index into the array representing a single complex value.The informal language in which these examples are presented assumes that functions can return values of arbitrary types.
An example of the use of type complex might be: declare xxomplex; x<-complex.add(complex.create(l,-2),complex.create(2,3)); which would result in x acquiring the value 3+i.Note that we were not forced to represent complex values in terms of their real and imaginary parts.We are free to re-design the type definition as long as we conform to the stated specifications of the operations.For example, we might decide to use polar coordinates to represent a complex value, if that made the operations more efficient.Although the create operation is still specified to produce the value a+bi from parameters a and b, there is nothing to prevent it from internally converting to any other representation, and there is certainly nothing to prevent us from writing a different create operation which expects arguments with a new significance, e.g.In addition to the allocation of the structure variables, when a variable of type complex (or any type) is declared, the create operation is implicitly invoked.This is important in order to allow the data structure invariant relation (if there is one) to be established.Although in this simple example this is unnecessary, it will become necessary later on.'

4) Data Structuring
A comprehensive presentation of data structuring methods in relation to types may be found in [2].We will discuss briefly some structuring mechanisms in order to have some syntactic base upon which to build.

4.1) Arrays
In the past, an array has generally been thought of as a table indexed by integers, as is the case with a matrix.A matrix, however, is a type, having certain operations such as multiplication and inversion defined for it.An array is only a structuring* mechanism and might as well be generalized to a more powerful mapping than is possible with just integer domains.Think of the array as a mapping from any finite scalar type to any type at all.For example,  where medicalrec defines the fields of the record as local data and provides operations on that data.Since it is not unlikely that the structure of medicalrec will change at some point during system development, in this way we anticipate modifications by restricting their scope.

4.3) References
A reference value is a pointer to an object, with the restriction that reference variables can only reference objects of a single type.Typeless pointer variables are the cause of many program bugs since they are free to point to arbitrary addresses.
References are much safer to use and make programs less devious in their logic.The reference concept is important with regard to the use of a dynamic storage allocation mechanism.We postulate the operations new and free for every data type, but make them available for use only inside of a type definition.This is because of the need for initialization of the data structure invariant relation discussed in section 1.From outside of a type definition, the create operation must be used.
If c is a reference to complex, i.e. declare c.ref complex; then, inside of type complex, the statement c<-new complex; causes the dynamic allocation of a complex object, and assigns a reference to that new object to c.The statement free c; results in the storage referenced by c being released.All that we can do with the value of c is to assign it to other ref complex variables or pass it as a procedure argument, since as yet we have not defined any operations for reference types.In particular, we will not allow statements such as c«-c+2; which are likely to introduce errors.More importantly, these statements violate the notion of type, since if the address contained in c is incremented by 2 it probably no longer is the address of a complex variable.We will define an operator for references called dereferencing, which converts a reference into the object it references.If c contains a valid reference, then ct is of type complex.For example, ct.r is the r field of the complex object referenced by c It is not legal to write cr, since c is not of type complex.This imposes more structure on the program specifications and prevents many possible bugs.Our examples assume that T applies to the reference immediately to its left.We will use the.reference value null to indicate that no object is currently being referenced.

5) A Small Abstraction
With the tools just presented, we can begin to describe some simple abstract objects.Suppose we would like to model the concept of a "bag" full of data, in the sense that once a datum is inserted in the bag, it can only be removed by trial and error, since "reaching your hand" into the bag is done blindly.Assume that the bag is to hold integers.
We need to provide certain key operations, namely: If we desire to remove the restriction on the size of a bag, we can store the bag contents in a list.To define a list of integers, we need to define each node in the list and the rules for connecting them.We will first specify type node, so that we can use its general purpose operations to manage a list: A useful type of list for the purpose of defining type bag is a circular list, so that a random choice from a bag is accomplished by "spinning" the list before deletion.
A circular list of integers defined using type node could be:  Note also that the intermediate types node and clist can be re-used in the specification of other list structures or more abstract types such as bag without the need for duplication of code or effort.In addition, the three types make up three modules suitable for independent programmer work assignments.

6) Parameterization
As we mentioned in the introduction, abstraction involves the noticing of similarities.This requires that things which may be potentially dissimilar among a group of objects have a wide range of freedom of variation.The use of parameters to procedures is necessary to convey the notion of abstract operation.The "square" operation gains its usefulness because it will square any number.When types are given parameters, they acquire much wider applicability in a similar way.end; A type constructor may be thought of as a macro.When its actual parameters are specified, it completely defines a new type.In the declaration of r in string.concat, the parameter to its type is given as (s.len+t.len).A vector of this length (A) is allocated, along with a word (len) containing this value.If not for the ability to parameterize a type definition, we would have to define a new type string for every possible string length!Even more important is the ability to use type names as parameters to type definitions.In our previous example, we made the unfortunate (and irrelevant) committment that the objects contained in a bag must be integers.If, however, we had written the definition of bag as: To illustrate the power of systems design using data types, we have chosen as our final example a hypothetical directory system as might be found in an operating system.
The directory system to be designed will be a hierarchical one, in which a node name consists of a path through the directory structure to the appropriate descriptor.
For example, the name A.B.C refers to directory A, subdirectory B, descriptor C.There should be no a priori bound on the number of directories, nor on the depth of the hierarchy.We will not restrict the definition to that of file directory, since the notion of file is not relevant to the structure.The purpose of the directory is to associate an The directory itself will contain objects which will either be descriptors or subdirectories.We will call this type direntry for directory entry.The structure of a directory, then, will be a hashtable (as previously described) which is indexed by a string and which contains direntry objects: type directory(descriptor.type)declare H:hashtable( 100,string,direntry(descriptor)); The hashtable will have 100 buckets, but we could have left that as a parameter to directory.
In order to look up a path in a directory, we must first find the direntry corresponding to the path header (via hashtable.find)and then look up the rest of the In the end we have a precise description of the concept of directory.This directory system may be used in conjunction with a file system (note that there is a difference between a file system and the directory structure it uses, as we have just shown).It may also be used to store objects in a protection system, or for any of a number of other applications.Each of the types directory, direntry, hashtable, etc. may be viewed as a module suitable for a programmer work assignment.
type complexdeclare r,theta:real; op create(a,b:real):complex = begin declare cxomplex; c.r«-sqrt(a*a+b*b); c.theta<-arcsin(b/c.r); re\um c end end; declare A:array[color] of integer; declares a vector which is indexed by objects of type color and whose individual elements are integers.We could use such an array to represent the frequencies of the various colors of light, e.g.A[red] = 6485.A two dimensional array can be modeled as a vector of vectors, e.g.declare A.array [1:10] of array[l:5] of real; so that A[i] denotes a row (or column, depending upon the interpretation).
empty(b:bag):boolean put(b:bag,i:integer) take(b:bag):integer and we would like to guarantee to the external world that a "taken" element is chosen at random.There are several ways to specify this type.One way is to use a vector to represent the state of the bag, although we have to place an upper limit on the size of the bag.Consider: have specified that a bag will consist of a vector of length 100 (A), and an integer (top) which we will use to indicate the current size.We have specified the create operation, since here it is necessary to perform initialization.The invariant relation of a bag is "the bag contains (top) elements" so we must set top to zero whenever a bag is created.If we had allowed the new operation to be used outside of :he type definition, top could remain unitialized.We can specify the empty and put operations easily: op empty(b:bag):boolean » return (b.tOp=0); op put(b:bag,integer) = if b»top=100 then error comment overflow; else begin b.top<-b.top+l;b.A[top]«-i end; Assuming a uniform random number generator, take can be specified as: op take(b:bag):integer = if b.top=0 then error comment underflow; else begin declare k,i:integer; M-randint(l,b.top); i<-b.A[k]; b.A[k>-b.A[top]; comment fill in gap; b.top<-b.top-l;return i end end of type bag; op empty(c:clist):boolean » return (c.size-0) end of type clist; Now type bag can be simply defined as: type bag = declare cxlist; comment the create operation for a clist performs the necessary initialization for a bag; op empty(b:bag):boo|ean) • return clist.empty(b.c);op put(b:bag,i:integer) = clist.insert(i,b.c);Op take(b:bag):integerbegin clist.spin(b.c);return clist.remove(b.c)end end of type bag; Note that we have completely re-designed the algorithmic specification of type bag, but left the functional specifications unchanged.Any user of type bag would not have had his programs affected (except for the size restriction having been removed).
object of type descriptor (maybe filedescriptor) with one of type path.The operations provided for directories should include lookup and insertion functions.If we think of a path name as a list of strings, we can define type path as: type path « declare l:list(string); op first(p:path):string = return list.head(p.l);op rest(p.path):path= return list.tail(p.l);op empty(p:path):boolean • return list.empty(p.l);op terminal(p:path):boolean = return path.empty(path.rest(p))end; parameterized type is called a type constructor^ since the types resulting from different parameter values may be widely differing.Consider the definition of type string in terms of the primitive type character: for i from 1 to s.len do r.A[i>-s.A[i]; for i from 1 to t.len do r.A[s.len+i>-t.A[i]; .return r end etc.
7) A Larger Design ExampleA data type having many applications in systems programming is the hash table.For this discussion, we characterize a hash table by the type of object used to index the table, by the type of object stored in the table, and by the number of buckets In order to find a result value given an index value, the appropriate array element must be found using a hash function.Then the list must be searched for the index value.Since the hash table module knows nothing about the representation of values of type index, it is up to index itself to provide that hash function.For example, Note that if the search procedure does not find the index on the list, it returns the undefined value of type result -a convenient way to indicate an error to the caller.This of course assumes an operation called "undefined" for type result.now remains to be able to insert values into the hash table.The insert function must search the appropriate list and only add to the list if the value is not already there: in the case of indexing by strings, we would expect to find the operation string.hashdefined.We can then define the hashtable.findoperation as: Having finished the definition of type directory, we must now tackle the definition of type direntry.A direntry has a name (part of a path) and either indicates a descriptor or a sub-directory: To look up a path in a direntry which is a node requires checking that the path terminates at that point.If the direntry refers to a sub-directory, then the directory.lookupoperation is recursively applied: op lookup(p:path,d:direntry):descriptor = if d.node then if path.empty(p) then return d.valuet else return descriptor.undefinedelse return descriptor.lookup(p,d.subdirt);To insert a descriptor in an existing direntry, we must do the insertion in the sub-directory (if there is one): op insert(p:path,d:direntry,r:descriptor):boolean « if d.node or path.empty(p)then return false else return directory.insert(p,d.subdirT,r);Creation of a new direntry (as required by the directory.insertoperation) is straightforward when the path is a terminal.When it is not, creation of the direntry