A formal specification of a visual language editor

A non-trivial case study is presented, on the use of the Larch specification languages to describe the Miró visual languages and graphical editor. In addition to excerpts from the specification, the authors discuss properties of Miró provable from the specification, limitations of Larch, and general lessons learned from this exercise.


Introduction to and Contributions of This Paper
The Miró visual languages let users specify formally through pictures the security configuration of file systems (i.e., which users have access to which files) and general security policy constraints (i.e., rules to which a configuration must conform). With the Miró editor, users draw both types of pictures and access other Miró tools. This paper presents a non-trivial case study on the use of the Larch specification languages to describe the Miró languages and editor. We had two goals in mind while writing the specification: to end up with a formal specification that could serve as both documentation of and a basis for formal reasoning about the specificand, i.e., the Miró languages and editor; and to apply Larch to determine its strengths and weaknesses. We begin with brief descriptions of Miró and Larch in Section 2 and present excerpts from the specification in Section 3. In Section 4 we discuss properties of the specificand provable from the specification, limitations of Larch, and general lessons learned from this exercise. In Section 5, we close with a brief discussion of related and future work.

Specificand: Miró
Miró consists of two visual languages, the instance language and the constraint language [HMT+90]. An instance (language) picture graphically denotes an access matrix that defines which users have which accesses to which files. Instance pictures model the specific security configuration between a set of users and a set of files, e.g., Alice cannot read Bob's mail file. A constraint (language) picture denotes a set of instance pictures (or equivalently, the corresponding set of access matrices) that satisfies a particular security constraint, e.g., users with write access to a file must also have read access. When an instance picture, IP, is in the set denoted by a constraint picture, CP, we say IP "matches" CP or IP is "legal with respect to" CP.
The basic elements in the instance language are boxes and arrows. Boxes that contain no other boxes represent users and files. Boxes can contain other boxes to indicate groups of users and directories of files. User group boxes may overlap to indicate a user is in more than one group. Labeled arrows go from user (group) boxes to file (group) boxes; the label indicates the access mode, e.g., read or write. Access rights are inherited by corresponding pairs of boxes contained within boxes connected by arrows, thus minimizing the number of arrows necessary to draw. Arrows may be negated to indicate the denial of the labeled access. Figure 1 shows an instance picture, as drawn in the Miró editor. The positive arrow from Alice to Alice's files indicates that Alice has read and write access to her files. The positive arrow from Alice's friends to Alice's schedule file indicates that both Bob and Charlie have read access to Alice's schedule. By default, since there are no arrows between Alice's friends and her other files, Bob and Charlie do not have read access to Alice's mail file. We could also show this property with an explicit negative arrow between Alice's friends and her mail file.
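The inheritance of access rights through nested boxes can be illustrated with a small Python sketch. It is not part of the Larch specification: the `Box` class and the `atoms` and `access_matrix` helpers are hypothetical names, and the sketch deliberately handles only positive arrows, ignoring negated arrows and overlapping groups.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    name: str
    children: tuple = ()  # nested boxes; empty for an atomic user or file

def atoms(box):
    """Return the names of all atomic (childless) boxes inside `box`."""
    if not box.children:
        return {box.name}
    result = set()
    for child in box.children:
        result |= atoms(child)
    return result

def access_matrix(arrows):
    """Expand positive arrows (from_box, to_box, mode) into (user, file, mode) triples."""
    matrix = set()
    for src, dst, mode in arrows:
        for user in atoms(src):
            for file_name in atoms(dst):
                matrix.add((user, file_name, mode))
    return matrix

# Assumed example mirroring the prose: Alice's friends may read her schedule.
bob, charlie = Box("Bob"), Box("Charlie")
friends = Box("Alice's friends", (bob, charlie))
schedule, mail = Box("schedule"), Box("mail")
matrix = access_matrix([(friends, schedule, "read")])
```

Because no arrow connects the friends box to the mail file, no triple for the mail file appears in the resulting matrix, which matches the default-denial reading described above.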
The constraint language also consists of boxes and arrows, but here the objects have different meanings. A box labeled with an expression defines a set of instance boxes. E.g., the left-hand box in Figure 2 denotes the set of instance boxes of type User. There are three types of arrows, allowing us to describe three different relations between boxes in an instance picture, IP: syntactic (solid horizontal), whether an arrow explicitly appears in IP; semantic (dashed horizontal), whether an access right exists in the matrix denoted by IP; and containment (solid vertical with head inside box), whether a box is nested within another in IP. Additionally, the thickness attribute of each constraint object is key in defining a constraint picture's meaning: in general, for each set of instance objects that matches the thick part of the constraint, there must be another set of objects (disjoint from the set matching the thick part) that matches the thin part. Figure 2 shows a constraint picture that specifies that users who have write access to a file must have read access to it as well.
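As a rough illustration of what it means for an instance picture to match the constraint of Figure 2, the following Python sketch checks the "write implies read" rule directly on an access matrix. It hard-codes this one constraint rather than implementing general thick/thin matching, and the names `violations` and `matches_constraint` are assumptions made for the example.

```python
def violations(matrix):
    """Return the (user, file) pairs that have write access but lack read access."""
    return {(u, f) for (u, f, mode) in matrix
            if mode == "write" and (u, f, "read") not in matrix}

def matches_constraint(matrix):
    """An access matrix is legal when every write access is paired with read access."""
    return not violations(matrix)

# Hypothetical access matrices, each a set of (user, file, mode) triples.
legal = {("alice", "notes", "read"), ("alice", "notes", "write")}
illegal = {("bob", "notes", "write")}
```

Here the write triples play the role of the thick part of the constraint and the required read triples play the role of the thin part.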
The Miró editor provides the facilities to create, view, and modify both instance and constraint pictures. Pictures can be saved in files and read back into the editor. The editor also serves as an interface to other Miró tools, e.g., one that generates the access matrix denoted by an instance picture, and one that translates a picture into PostScript form. The left-hand side of the window in Figure 1 displays a menu from which users can select the type of picture and object they wish to draw. Other menu buttons provide additional editing commands and interfaces to the other Miró tools. The editor maintains language-specific constraints as users draw pictures. For example, all arrows in Miró pictures must be attached to boxes: if a user moves a box in the picture, all arrows attached to that box also move. Our formal specification captures this behavior precisely.

Specification Language: Larch
We wrote our specifications using the Larch specification languages. We assume some rudimentary knowledge of Larch, present a refresher here, and give further details as we present the specification. Larch provides a "two-tiered" approach to specification. In one tier, the specifier writes traits in the Larch Shared Language (LSL) to assert state-independent properties of a program. Each trait introduces sorts and operators and defines equality between terms composed of the operators (and variables of the appropriate sorts). E.g., the Box trait (Figure 4) introduces the sort B and the operators move_box, resize_box, and copy_box; four equations constrain the meaning of copy_box.
In the second tier, the specifier writes module interfaces in a Larch interface language, such as the Generic Interface Language (GIL) [Che89], to describe state-dependent effects of a program. A requires clause states each procedure's pre-condition; an ensures clause, its post-condition; a modifies clause lists those objects whose value may possibly change. The assertion language for the pre- and post-conditions is drawn from LSL traits. Through based on clauses, a Larch interface links to LSL traits by specifying a correspondence between (programming-language specific) types and LSL sorts. An object has a type and a value that ranges over terms of the corresponding sort.
Part of the interface specification for the editor below defines the type Editor, which is based on the Ed sort, introduced in the EditorTrait trait. The MoveBoxes procedure's pre-condition requires that some non-empty set of objects be selected. The post-condition says that the state of the picture in the editor is updated (as defined by the move_boxes operator whose meaning is obtained from EditorTrait) and that all objects are unselected. Specifications in this paper appear exactly as they have been checked by the LSL and GIL tools. Thus, non-ASCII symbols appear as LaTeX macros (e.g., \u for ∪ (set union)).
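The requires/ensures reading of MoveBoxes can be approximated in Python. This is only a sketch under assumptions: the `Editor` record, its `positions` and `selected` fields, and the `delta` argument are hypothetical stand-ins, not the GIL interface itself.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Editor:
    positions: tuple      # ((sysname, (x, y)), ...) for every box in the picture
    selected: frozenset   # sysnames of the currently selected boxes

def move_boxes(ed, delta):
    """Move every selected box by `delta` and clear the selection."""
    # requires: some non-empty set of objects is selected
    if not ed.selected:
        raise ValueError("pre-condition violated: nothing is selected")
    dx, dy = delta
    moved = tuple(
        (name, (x + dx, y + dy)) if name in ed.selected else (name, (x, y))
        for name, (x, y) in ed.positions
    )
    post = replace(ed, positions=moved, selected=frozenset())
    # ensures: the picture is updated and all objects are unselected
    assert not post.selected
    return post

ed = Editor(positions=(("b1", (0, 0)), ("b2", (5, 5))), selected=frozenset({"b1"}))
after = move_boxes(ed, (2, 3))
```

Returning a new `Editor` value rather than mutating the old one keeps the pre-state and post-state both available, which is how the ensures clause relates them.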

The Specification
We divide the specification into two main pieces: that specifying properties of Miró pictures and that specifying the behavior of the editor. We use LSL to describe Miró pictures and additionally use GIL to specify the editor's procedures that manipulate the pictures. Figure 3 illustrates how the traits of the LSL part of the specification fit together. Each oval corresponds to a trait and an arrow indicates that one trait includes another. A picture in either the instance or constraint language is a collection of boxes and arrows (BasicPic trait). Miró pictures are further constrained to satisfy well-formedness properties (Pic and WFPic), which include, for example, the condition that arrows be attached to boxes. Pictures drawn in the instance and constraint languages are structurally very similar, so our approach is to factor out the properties common to both languages (bold ovals in Figure 3), and then specialize for each language (dashed ovals). At the bottom we define the EditorTrait which includes all the others; it is the link between the LSL and GIL tiers in the editor specification. In this paper we will walk through only the traits along the spine in the figure. Also, to save space we will typically present only the signature and not the equations in each trait.

Boxes and arrows are the basic objects of any Miró picture. Instance and constraint pictures differ only in the attributes of their respective boxes and arrows and in the rules for combining them into pictures. Traits for boxes and arrows are later specialized to distinguish between instance and constraint pictures.
A box object has a value of sort B (see Box trait in Figure 4) and has pos and size attributes. We assume a box's position is the coordinates of its bottom left corner, and its size is given as a coordinate pair of its width and height. A box also has a label attribute, which will be customized for instance and constraint boxes, and the boolean attributes thick and starred, needed for constraint pictures. Finally, we use sysname to identify a box. Although we do not require it at this level, we assume that sysnames are unique. Sysnames serve two purposes: to distinguish between otherwise identical objects and to provide an easy way to identify objects in a picture for other operations (e.g., deleting a box).
The Box trait also introduces operators on boxes. The record notation in Larch automatically produces the generator for the record sort: an operator that takes as its arguments all of the attributes of the sort and produces something of the record sort (e.g., mk_B). The record shorthand also generates operators of the form b.foo and foo_gets(b, foo_val) for each field foo (where b is of sort B and foo_val has the same sort as the field foo).
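For readers more familiar with programming languages than with LSL, the record shorthand behaves much like an immutable record with a constructor, field selectors, and functional update. The Python sketch below is an analogy only; the concrete field types (tuples, a string label, an integer sysname) are assumptions, since the sorts in the actual trait are not shown here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class B:
    """Analogue of the record sort B; the constructor B(...) plays the role of mk_B."""
    pos: tuple      # coordinates of the bottom left corner
    size: tuple     # (width, height)
    label: str      # assumed to be a string, as for instance boxes
    thick: bool
    starred: bool
    sysname: int    # assumed representation of the identifier

def pos_gets(b, pos_val):
    """Analogue of pos_gets(b, pos_val): a box equal to b except for its pos field."""
    return replace(b, pos=pos_val)

b = B(pos=(0, 0), size=(2, 1), label="Alice", thick=False, starred=False, sysname=1)
b2 = pos_gets(b, (3, 4))
```

As in LSL, `pos_gets` does not change `b`; it denotes a new value, and `b2.pos` corresponds to the selector term `b2.pos` in the trait.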

The introduces clause declares additional operators and the equations in the asserts clause give them their meaning.
Move_box and resize_box reset the position and size attributes of their respective box arguments. The reason we even need a copy_box operator as opposed to relying on Larch's built-in equality operator for all sorts is that not all values of all fields are the same when one box is a copy of the other. Having an explicit copy_box operator also makes it convenient to specify default initialization values for certain fields. For example, one issue in the design of the editor was whether or not a copy of a box should have the same label, or should have a default label (the empty string). Thus, we choose instead to write equations only for the fields we require to be the same, and allow the values of the other fields to be specified in another trait or at the interface level.
Boxes in the instance language differ from those in the constraint language in two ways. First, the sorts of values for some attributes are different. Namely, an instance box's label is a string whereas a constraint box's label is a "box descriptor", a boolean expression that describes a set of boxes. We handle this difference by using the generic sort LabelSort and then in the InstanceBox trait, we rename LabelSort with the sort Str (for strings) and in the ConstraintBox trait, we rename it with the sort BoxDesc (for box descriptors).

Figure 3: The dependencies of the Miró traits
The second difference is that some attributes are meaningful for only constraint pictures and hence are unnecessary for instance pictures. We could avoid this problem if either Larch provided a way to extend (subtype) records or we were willing to use nested records (see Section 4). However, since there are only a few of these attributes, we choose instead to include at the trait level all possible attributes for the two kinds of boxes, and then make assertions at the interface level that place constraints that will distinguish between instance and constraint boxes.
We specify properties on arrows similarly (using Larch's record construct) and omit the details here. To_box and from_box are two of many fields in the record for arrows and are used to keep track of the boxes at the arrow's head and tail.

Pictures
A picture is a set of boxes and a set of arrows (see Figures 5-9). To avoid a long and confusing trait, we divide the specification of properties of pictures into three main traits: BasicPicture, Picture, and WFPicture. This trait also introduces the operators move_a_box, resize_a_box, delete_box, and delete_arrow. We use picunion in later traits to perform the higher-level copy operation; boxes and arrows are observer functions on pictures.
The axioms defining these operators are straightforward and given in the standard style of "algebraic" specifications (define the meaning of each non-constructor operator in terms of each constructor operator). We give details of only one here. Copy_picture recurses through the objects in the picture and calls the appropriate copy operator on each object (box or arrow).

    copy_picture(create_picture(empty_BSet, empty_ASet)) == create_picture(empty_BSet, empty_ASet),
    copy_picture(create_picture(insert(bs, b), as)) == insert_box(copy_picture(bs, as), copy_box(b)),
    copy_picture(create_picture(bs, insert(as, a))) == insert_arrow(copy_picture(bs, as), copy_arrow(a)),
    copy_picture(insert_box(pic, b)) == ...

However, since one well-formedness condition common to both instance and constraint pictures is that all arrows must be attached to boxes, we additionally introduce and define here the operator all_arrows_attached. There are other well-formedness conditions, e.g., arrows must go from user boxes to file boxes in an instance picture, but for this paper, we assume only the arrows-attached property.
We use extract_wf in the editor interface to describe the behavior of the command to copy a set of selected objects, which itself is a picture that may or may not be well-formed. The result of extract_wf(os) is a picture that contains all the objects of os except the "dangling" arrows (i.e., arrows which are not attached to boxes in os).
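The intended behavior of all_arrows_attached and extract_wf can be sketched in Python. This is an illustration rather than the LSL definition: boxes are represented here simply by their names and arrows by (from_box, to_box) pairs, which is an assumed simplification of the record sorts.

```python
def all_arrows_attached(boxes, arrows):
    """True when both ends of every arrow are boxes of the picture."""
    return all(src in boxes and dst in boxes for (src, dst) in arrows)

def extract_wf(boxes, arrows):
    """Keep every box, and keep only the arrows whose head and tail are both kept."""
    attached = {(src, dst) for (src, dst) in arrows
                if src in boxes and dst in boxes}
    return boxes, attached

# A hypothetical selection: the arrow to "mail" dangles because "mail" is not selected.
selected_boxes = {"Alice", "schedule"}
selected_arrows = {("Alice", "schedule"), ("Alice", "mail")}
wf_boxes, wf_arrows = extract_wf(selected_boxes, selected_arrows)
```

By construction the result always satisfies all_arrows_attached, even when the selection it was extracted from does not.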

    well_formed(ipic) == all_arrows_attached(ipic) & ~ambiguous(ipic).

Finally, most of the editor's procedures work on pictures regardless of whether they are drawn in the instance or constraint language. For example, moving a collection of boxes behaves the same regardless of whether they are instance or constraint boxes. To avoid duplicating the entire interface (e.g., having two separate Move_Instance_Picture and Move_Constraint_Picture procedures) we introduce a union sort P to handle both instance and constraint pictures, just as we introduced an Object sort to handle both boxes and arrows. Figure 9 shows the signature for the PicUnion trait. By providing a union sort P and the appropriate operators, the editor's procedures can now work on either type of Miró picture. Most of the operators introduced in the PicUnion trait deal with coercing to and from values of the picture sort P and values of instance and constraint picture sorts.
Selected_objs is the set of currently selected objects in the picture. The remainder of the record describes the current "mode" of the editor (as indicated in the menus): picture_type indicates whether the current picture is an instance or constraint picture, object_type is either box or arrow, arrow_kind is the kind of arrow (syntactic, semantic, or containment; relevant only for constraint pictures), and the rest of the attributes are self-explanatory. EditorTrait introduces one additional operator: display_window is left unspecified here, but is intended to be a mapping from the abstract editor value to actual screen pixels.

Interface Level
Now that we have built up a rich trait, we are ready to specify the editor's interface. First we name the module we are specifying (miro_editor), and then establish correspondences between types and sorts.

Discussion
We first discuss stating consequences of the specification, then two limitations of Larch that arose out of this specification exercise, and finally some lessons from having done the specification after the implementation.

4.1 Stating Consequences
Larch provides a way to state consequences of a trait's theory through an implies clause. This clause is a good place to document additional assertions about a specificand. As a simple example, and also one that shows the interplay between traits and interface, consider copying objects. At the interface level the CopyObjs procedure copies only the subset of selected objects that form a well-formed picture. We could have defined the copy_picture operator in the BasicPicture trait to copy only the well-formed subset but decided it was more appropriate to specify this restriction at the interface level, leaving the trait level more general: the copy_picture operator copies all the objects in the picture. We add the following implies clause to the BasicPicture trait in order to record this decision explicitly. Note that we cannot make the stronger statement that copy_picture(p) == p because when objects are copied, not all of the fields, e.g., box labels, are copied.

    implies for all (p: Pic)
        size(boxes(copy_picture(p))) == size(boxes(p))
        size(arrows(copy_picture(p))) == size(arrows(p))

We can also state the strong assertion that a well-formed picture is actually just a graph where boxes are nodes and arrows are edges. Thus, we add a corresponding consequence to the WFPicture trait. Finally, we can take this one step further. If we add to the WFInstancePic trait the requirement that each arrow goes from a user box to a file box and no arrow goes from a box of one type to a box of the same type, we can state as a consequence that a well-formed instance picture is a bipartite graph.
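The size-preservation consequence can be exercised on a small executable model. The Python sketch below is not derived from the trait: the `Box` and `Arrow` records, the `fresh` sysname generator, and the choice to give copies an empty default label are all assumptions made for illustration (the text notes that the label question was a design issue).

```python
from dataclasses import dataclass, replace
from itertools import count

@dataclass(frozen=True)
class Box:
    sysname: int
    label: str
    size: tuple

@dataclass(frozen=True)
class Arrow:
    sysname: int
    from_box: int   # sysname of the box at the tail
    to_box: int     # sysname of the box at the head

def copy_picture(boxes, arrows, fresh):
    """Copy every object; copies get fresh sysnames and (by assumption) empty labels."""
    renamed = {b.sysname: next(fresh) for b in boxes}
    new_boxes = {replace(b, sysname=renamed[b.sysname], label="") for b in boxes}
    new_arrows = {
        replace(a, sysname=next(fresh),
                from_box=renamed.get(a.from_box, a.from_box),
                to_box=renamed.get(a.to_box, a.to_box))
        for a in arrows
    }
    return new_boxes, new_arrows

boxes = {Box(1, "Alice", (2, 2)), Box(2, "mail", (1, 1))}
arrows = {Arrow(3, 1, 2)}
copied_boxes, copied_arrows = copy_picture(boxes, arrows, count(100))
```

In this model the copy has as many boxes and arrows as the original, yet it is not equal to the original, which is the distinction the implies clause records.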

Subtyping Records
We made a critical design decision by representing each Miró graphical object as a record. Records conveniently let us associate attributes with each kind of object and give us operators that let us set and get values of each of those attributes. However, the main drawback to using records is that Larch does not permit record "subtyping". It would be more general to define a GraphicalObject trait that introduces a record sort GO with fields like label and then to define Box and Arrow traits, introducing B and A sorts, each as a "subtype" of GO. B would "automatically" have the same fields as GO plus ones like position and size; A would add fields like from_box and to_box. Then we could write for b of sort B, b.label and b.size, and for a of sort A, a.label and a.from_box. However, in Larch, if we were to factor out attributes common to all Miró objects into a GraphicalObject trait, we would be forced to use nested records in the Box and Arrow traits and to write b.go.label or a.go.label where go is the field name of sort GO.
Instead, we decided to avoid nesting records entirely since the resulting specifications would be less readable. However, this decision forced us to include attributes of records in some traits that make sense only for subsequent uses of that trait. For example, the Box trait's record has a thick attribute that makes sense only for constraint, not instance, pictures. This kind of problem and solution is well-known in the "object-oriented" community; not until writing this specification did we see that record subtyping would not only be convenient, but lead to better specifications.
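The difference between nested records and record subtyping can be seen in a language that supports both. The Python sketch below is an analogy only; the class names `NestedBox` and `SubtypedBox` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GO:
    """Attributes common to all graphical objects."""
    label: str

@dataclass(frozen=True)
class NestedBox:
    """Nested-record style: the common attributes live in a field named go."""
    go: GO
    size: tuple

@dataclass(frozen=True)
class SubtypedBox(GO):
    """Subtyping style: the box inherits label directly and adds size."""
    size: tuple

nested = NestedBox(go=GO(label="Alice"), size=(2, 1))
subtyped = SubtypedBox(label="Alice", size=(2, 1))
```

With nesting the label is reached as `nested.go.label`, while with subtyping it is simply `subtyped.label`; the second form is the one the text wishes Larch records allowed.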

Union Sorts
Sort checking is invaluable in Larch, but one place where it gets in the way is in the use of unions. In the PicUnion trait, the P sort is introduced to be the union of instance and constraint pictures. In the editor, it does not matter whether we have an instance picture or a constraint picture; we just want to select an object or copy objects. So, if the operators copy_picture: P -> P, copy_picture: IPic -> IPic and copy_picture: CPic -> CPic are defined, then ideally we would like the constraints on copy_picture to hold regardless of whether the picture is an instance or constraint picture. Instead we are forced to first determine whether the picture is an instance (or a constraint) picture and then coerce the picture to be an instance (or constraint) picture so that we can apply the appropriate, more specific, copy operator, and finally, coerce the result back into the more general picture sort, P:

    if is_instance(p)
    then to_P(copy_picture(as_instance(p)))
    else % is_constraint
         to_P(copy_picture(as_constraint(p)))

Another variation of this problem arose from specifying the change_attr operator in the Picture trait. We would like to specify change_attr with the following simple equation:

    change_attr(obj, label, value) == label_gets(obj, value)

but we cannot for two reasons. First, an LSL operator name is an unstructured identifier: label_gets really stands for a set of possible identifiers depending on what the actual name of label is in the left-hand side of the equation above. Instead, we are forced to use a large if-then-else statement to cover each possible label. Second, the obj and value parameters to change_attr are union sorts. As argued above, we need to do explicit coercion between the union sort and the specific sorts of the union (and vice versa) in order to achieve the intended effects of the above equation. Hence, change_attr becomes one big two-layered if-then-else clause, first on object sort (box or arrow), then on label name. For each valid object/label pair, there is a clause to do the appropriate coercions and assign the value to the appropriate label:

    change_attr(obj, label, value) ==
        if (is_box(obj)) then
            if (label = pos) then to_Object(pos_gets(as_box(obj), as_cp(value)))
            else ...
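The test-coerce-apply-coerce pattern for union sorts can be mimicked in Python with an explicit wrapper type. The sketch below is only an analogy under assumptions: `IPic`, `CPic`, and `P` are reduced to records holding a set of box names, and `copy_ipic` and `copy_cpic` stand in for the two sort-specific copy operators.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IPic:
    boxes: frozenset

@dataclass(frozen=True)
class CPic:
    boxes: frozenset

@dataclass(frozen=True)
class P:
    """Union sort: wraps either an instance picture or a constraint picture."""
    value: Union[IPic, CPic]

def is_instance(p):
    return isinstance(p.value, IPic)

def as_instance(p):
    assert isinstance(p.value, IPic)
    return p.value

def as_constraint(p):
    assert isinstance(p.value, CPic)
    return p.value

def to_P(pic):
    return P(pic)

def copy_ipic(ip):
    # Stand-in for the instance-specific copy operator.
    return IPic(frozenset(ip.boxes))

def copy_cpic(cp):
    # Stand-in for the constraint-specific copy operator.
    return CPic(frozenset(cp.boxes))

def copy_picture(p):
    """Mirror of the equation above: test the variant, coerce, copy, coerce back."""
    if is_instance(p):
        return to_P(copy_ipic(as_instance(p)))
    return to_P(copy_cpic(as_constraint(p)))

result = copy_picture(to_P(CPic(frozenset({"User"}))))
```

Every operator lifted to P needs the same boilerplate, which is the cost the text attributes to strict sort checking on unions.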

Implications of Specifying After Implementing
We began writing formal specifications in parallel with designing the Miró languages and designing and implementing the editor. We wrote three major versions, of which the last (this one) was written after the implementation was running. The current version itself went through at least six minor iterations. Writing a formal specification after an implementation has two obvious implications. One is that the specification tends to be biased towards the implementation; the other is that places for improving the implementation become embarrassingly evident. We found both to be true in our case.
Having already implemented the editor before completing the specification, we had a concrete model of the languages and editor in mind. This model led to an implementation-biased initial specification. In each subsequent iteration we removed some of the "implementation details." We believe the final specification is relatively unbiased, but that we would have taken fewer steps to get where we are had we written more of the specification before the implementation.
One example of implementation detail we removed is in the definition of well-formedness. In the implementation of the editor, the condition that all arrows be attached to boxes is enforced automatically by checking and reestablishing certain constraints on arrow objects. So, in a previous version of the specification, the arrow sort A had fields for the coordinate positions of the arrow's head and tail. The WFPicture trait defined an operator called adjust_arrows that took a picture and a box, and based on the position of the boxes at the arrow's head and tail, algorithmically recomputed the new coordinates. But at a more abstract level, coordinate information for an arrow is not important; we only need to know which boxes are at its head and tail. As a result, we define an abstract well-formedness property (e.g., all arrows are attached) at the trait level and require through an interface invariant that all editor procedures enforce this property. Nowhere in the current specification do we give a precise algorithm for enforcing well-formedness. It would be up to the implementor to decide whether and how arrows need to be adjusted to maintain well-formedness.
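One way to picture an interface invariant that says what must hold without saying how to achieve it is a wrapper that checks the property around each procedure. The Python sketch below is an assumption-laden illustration: the `maintains_wf` decorator and this particular `delete_box` (which removes attached arrows along with the box) are hypothetical, and the picture is reduced to a pair of sets.

```python
import functools

def all_arrows_attached(pic):
    """True when both ends of every arrow are boxes of the picture."""
    boxes, arrows = pic
    return all(src in boxes and dst in boxes for (src, dst) in arrows)

def maintains_wf(proc):
    """Check the well-formedness invariant before and after an editor procedure."""
    @functools.wraps(proc)
    def checked(pic, *args):
        assert all_arrows_attached(pic), "invariant broken on entry"
        result = proc(pic, *args)
        assert all_arrows_attached(result), "procedure broke the invariant"
        return result
    return checked

@maintains_wf
def delete_box(pic, box):
    # One possible implementation choice: drop the arrows attached to the deleted box.
    boxes, arrows = pic
    remaining = frozenset((s, d) for (s, d) in arrows if box not in (s, d))
    return (boxes - {box}, remaining)

pic = (frozenset({"Alice", "mail"}), frozenset({("Alice", "mail")}))
after = delete_box(pic, "mail")
```

The decorator states only the property; a different `delete_box` that preserved the invariant in some other way would pass the same check.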
The second observation from doing the specification after the implementation is that doing the specification earlier would have resulted in a better implementation. One example to support this argument is our experience in implementing multiple selection in the editor. The initial implementation of the editor allowed at most one object to be selected at any time. While working on the specification we decided to extend the editor to allow selecting multiple objects. So before coding multiple selection, we wrote a mostly formal specification defining precisely what the effects of each editor operation would be on the set of selected objects. The result was an implementation of multiple selection that was clean, consistent and relatively easy to add to the editor.

Related Work
The largest collection of Larch traits is the Larch Handbook [GHW85] (990 lines). The extended example in [GGH90] specifies some traits for a simple windowing system. Larch has also been used to specify properties of objects in a transaction-based distributed system [Win88]. To our knowledge, our specification is the largest Larch specification ever written and the only one written for a "real" system.
In our specification, we assume that details about how Miró pictures are represented on the screen and what keyboard and mouse inputs activate the specified procedures are specified elsewhere. This in itself is a difficult problem, but has been addressed by others ([GGH90] and [Bow89]). Thus, we chose to focus at the next level of detail: properties of pictures and the editor. We have not seen any other specifications that deal with an interactive window application.

Further Work
As with any specification (or program), ours can be improved. We can generalize our traits to be suitable for more general graphical editors. We can extend the traits and interfaces to describe the more intricate behavioral aspects of the editor (e.g., other menu operations). However, the more interesting and challenging work we would rather pursue is to do mechanized proofs given that we have a formal specification and the Larch Prover [GG89]. There are two kinds of proofs we could perform: showing additional properties hold (e.g., the consequences discussed in Section 4.1 or the well-formedness invariant), and showing that the implementation of the editor satisfies our specification. There is evidence [GGH90] that the Larch Prover could "easily" be used to do the first kind of proof; doing the second kind would entail extensions to the Larch Prover itself.