Collection-oriented languages

The authors outline, compare, and contrast the collections and operations found in many collection-oriented languages by putting them into a common framework. In the process, many problems that can occur in specifying such languages are elucidated. These languages are ideal for use with massively parallel machines, even though many of them were developed before parallelism. Some extended examples of collection operations in several languages are given. A taxonomy of collections is introduced. Issues examined include the type of elements a collection can contain, whether a collection must be homogeneously typed, and the ordering among the elements of a collection. The apply-to-each form in collection-oriented languages is examined. This form applies a function to each element of a collection. Issues treated include whether the extension of a function over the elements is explicit or implicit and how the extension is applied to functions with multiple arguments. A variety of languages (including APL, SETL, CM-Lisp, Paralation Lisp, and Fortran 90) are critically compared.

[Table: Some basic collection-oriented operations]

[...] the collection. This notation was taken from FP. We call this form an explicit apply-to-each. In SETL and PARALATION LISP, it is necessary to bind a variable name to a representative element of a collection and then apply negate to this variable. One can think of the expression as stating: "for each e in A, negate e." We call this form a binding apply-to-each. In SETL, the form is actually a special case of the more general set/tuple comprehension primitive discussed in Section 4.2.3. What effect do these three forms have on a collection-oriented language? We argue, for example, that implicit apply-to-each interacts badly with overloading of functions based on argument type (see Section 4.2.1).

Non-unary Apply-to-each
The second example in Table 2 demonstrates the case of applying the function + (addition) over the corresponding elements of two collections and then adding the constant 2 to each element of the result. Unlike the first example, the function takes more than a single argument, and one of the arguments is not a collection; it is a scalar. From this example, two new issues arise: element correspondence and argument extension.
What does the phrase "corresponding elements of two collections" in the previous paragraph mean? Intuitively, we can think of lining up the two collections and applying the function + at each location. But what if the two collections cannot be "lined up" (they may be of different lengths)? Or, what if the collections are not ordered (as with sets in SETL)? All the languages considered in Table 2 have a different way of defining apply-to-each on multiple-argument functions. APL requires that the two arguments be of equal length. SETL requires the use of an explicit index set. PARALATION LISP requires that the two arguments come from the same paralation; this is an even stronger requirement than being of the same length. CM-LISP puts no requirements on the relationship between the two arguments: it adds elements that have the same key.
Another issue raised by this example is how to define what it means to "add a scalar to a collection". There must be some mechanism for specifying that the scalar should be treated as a collection, each of whose elements has that particular value. We call this argument extension. In CM-LISP this is accomplished via the same α form used for specifying an apply-to-each; this is called explicit extension. In APL scalars are automatically extended as needed; this is called implicit extension. Implicit argument extension can lead to ambiguity when the collections to be extended are nested (see Section 4.2.2).
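As a rough illustration of element correspondence and explicit argument extension, the Python sketch below adds two equal-length sequences and then adds the scalar 2 to each result. The names `Extended`, `extend`, and `apply_to_each` are hypothetical, the sample values are invented, and the equal-length check only mimics the APL-style requirement; it is not taken from any of the languages discussed.

```python
from itertools import repeat
from operator import add


class Extended:
    """Marks a scalar that should be treated as a collection of copies of itself."""

    def __init__(self, value):
        self.value = value


def extend(value):
    # Explicit argument extension: the caller states that `value` is a scalar.
    return Extended(value)


def apply_to_each(fn, *args):
    """Apply fn across corresponding elements of the collection arguments."""
    lengths = {len(a) for a in args if not isinstance(a, Extended)}
    if len(lengths) > 1:
        raise ValueError("collection arguments must have equal length")
    n = lengths.pop() if lengths else 0
    # Extended scalars are repeated to match the length of the real collections.
    columns = [repeat(a.value, n) if isinstance(a, Extended) else a for a in args]
    return [fn(*row) for row in zip(*columns)]


A = [1, 2, 3]
B = [10, 20, 30]
sums = apply_to_each(add, A, B)
result = apply_to_each(add, sums, extend(2))
```

An implicit-extension design would instead inspect each argument and decide on its own whether it is a scalar, which is where nested collections make the choice ambiguous.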

Rearranging Elements
In addition to having some way of applying a function to each element of a collection, collection-oriented languages supply operations that can rearrange the elements of a collection. The third example in Table 2 illustrates a permute operation. The permute operation rearranges the elements of a collection according to a collection of indices. Permute is an example of an operation that has different definitions in different languages: APL performs a permute that is the mathematical inverse of that performed by CM-LISP. Permute is a special case of indexing elements in APL, FORTRAN 90 and SETL and the permutation indices all refer to the result collection. In CM-LISP these indices refer to the argument collection. In languages like CM-LISP with more complex collection types, the permute can be generalized greatly (Section 4.1.2).

Nested Collections and Operators
The final example demonstrates the utility of nested collections. A is a collection of collections, and the sum of the elements in each of these subcollections is needed. Not all collection-oriented languages allow nested collections: neither FORTRAN 90 nor APL permits them, although APL2 does. For computing with nested collections, those languages supporting them include explicit extensions to avoid possible ambiguities. Example 4 combines the apply-to-each concept with the operation of summing the elements of a collection. The latter computation is described in all of these languages as a plus reduction. Reduction (Section 4.1.1) is one example of a higher-order function: it takes as arguments both a combining function and a collection to be reduced. Each of the collection-oriented languages in these examples, except FORTRAN 90, possesses such operators.
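The combination of apply-to-each and plus reduction described above can be sketched in Python as follows. The contents of `A` are invented for illustration (the values in Table 2 are not reproduced here), and `plus_reduce` is a hypothetical helper name.

```python
from functools import reduce
from operator import add

# A nested collection: a collection whose elements are themselves collections.
A = [[1, 2, 3], [4, 5], [6]]


def plus_reduce(collection):
    # Reduction takes a combining function and a collection; 0 is the identity for +.
    return reduce(add, collection, 0)


# Apply-to-each of the reduction over the subcollections of A.
sums = [plus_reduce(sub) for sub in A]
```

The subcollections need not have equal lengths, which is one reason nested collections are more flexible than rectangular arrays.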

Collections
The exact definition of a collection varies greatly from language to language. The simplest and most general characterization of a collection is that it is a group of objects viewed as a whole [16]. This captures the intent of our usage: collection-oriented programming languages should be able to encapsulate a group of objects into a collection and then manipulate this conglomeration of elements in useful ways. It is the methods of encapsulation and manipulation that are interesting. This section surveys the different kinds of collections that collection-oriented languages support.

General Classification of Collections
We categorize collections along three axes: what kinds of elements are allowed, whether or not these elements may be of mixed type, and whether the elements are implicitly and/or explicitly ordered within the collection.

Elements of Collection
The greatest distinction between the kinds of collections that a language supports is that between simple collections and nested collections. The elements of a simple collection may not be collections themselves. Languages that support only simple collections include APL and FORTRAN 90. Nested collections are the most general type of collection: they may have collections as elements. Nested collections are useful for representing data more complex than vectors or sets, such as trees and segmented vectors [3]. Languages permitting nested collections include APL2, SETL, CM-LISP, and PARALATION LISP. A useful subclass of the simple collections is the structure collections. An element is a structure if it has a fixed number of fields and the only operations that can be performed on the element are extraction and insertion of a field (for example, a PASCAL record, or a C or LISP structure). Both CM-LISP and PARALATION LISP support structure collections.

Type Homogeneity of Elements
An issue orthogonal to the types of the elements allowed in a collection is the question of whether and how elements of differing types may be combined in the same collection. [...] In some languages the length of a collection might be part of the type of the collection. Pascal has no type consistent with each element of the example above; array [1..3] of integer is fine for the first three elements, but not for the fourth. In such a language this collection is heterogeneous: the last subcollection is of a different length from the others and hence of a different type. The type systems of most collection-oriented languages are not this stringent, and to a lenient language this collection is just a vector of vectors and therefore homogeneous. To push this example further, if 16-bit and 32-bit integers were of distinct types in a language (say integer and long integer), then the third subcollection might itself be considered heterogeneous.² Table 3 shows the type homogeneity of some collection-oriented languages.

Collection Ordering
Another important property of collections is whether or not there is an ordering associated with the positions of elements in the collection: can we say that one element comes before another in the collection? This is independent of the value of the element, and depends only on position within the collection. For example, the elements of an array in C are ordered (by their index), while the elements of a mathematical set are not. The nature of the ordering of a collection has a strong influence on the collection operations that can be defined on it (see Section 4.1.2).
We distinguish between four classes of collection orderings:

Unordered: Unordered collections are essentially sets of elements, except that sets do not allow repetitions and a general unordered collection does.³
Sequence-Ordered: Sequence-ordered collections are vectors. These are also called linearly ordered collections.
Grid-Ordered: Grid-ordered collections are arrays of arbitrary dimension. A 1-1 correspondence exists between ordered tuples of integers in some interval and elements of the collection.
Key-Ordered: Key-ordered collections are indexed via an arbitrary mapping function that has keys as its domain and values as its range. Some languages further require all the keys of a collection to be unique.

² These kinds of considerations can allow a compiler to use knowledge of the homogeneity of a collection to generate more efficient code. It can recognize homogeneous collections and maintain type knowledge of the entire collection. In particular, if a language only admits homogeneous collections, all collections can be given a succinct type name (of length at most the depth of nesting of the collection). The type information can be calculated once for the entire structure. In a strongly typed language, the alternative is to infer the type of each element of the collection as we need it; the type of the collection may then be as complex as the data structure itself. For example, if we want to perform an element-by-element add with two collections of elements of type short integer, it might be advantageous to call a separate routine that adds collections of shorts together, rather than call a general routine that adds two collections and has to infer the type of each element as it proceeds. One implementation of PARALATION LISP by the second author takes advantage of this idea for a sizable efficiency gain.
³ The distinction here is one of sets vs. multisets.
Unordered sets are the foremost collection type in SETL. Sequence-ordered collections are the basic data structure of LISP-like languages. Grid-ordered collections are the basic data structure of APL-like languages. Key-ordered collections are the most general since the domain of the mapping function can be a sequence of integers (giving a sequence-ordering) or can be tuples from a sequence of integers (for a grid-ordering). Table 3 shows the orderings supported by the languages under consideration and the names each language gives to these collections.
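The claim that key-ordered collections subsume the sequence and grid orderings can be made concrete with a small Python sketch. Here a `dict` stands in for a key-ordered collection and a `Counter` for an unordered collection with repetitions; the helper names are hypothetical, and indices start at 1 only to match the convention used in the examples of this paper.

```python
from collections import Counter


def sequence_as_keyed(items, start=1):
    """View a sequence-ordered collection as a mapping from position to element."""
    return {i: x for i, x in enumerate(items, start)}


def grid_as_keyed(rows, start=1):
    """View a two-dimensional grid as a mapping from (row, column) tuples to elements."""
    return {
        (r, c): x
        for r, row in enumerate(rows, start)
        for c, x in enumerate(row, start)
    }


vector = sequence_as_keyed(["a", "b", "c"])
grid = grid_as_keyed([[1, 2], [3, 4]])

# An unordered collection that allows repetitions (a multiset): position is irrelevant.
bag = Counter(["x", "y", "x"])
```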
An additional kind of collection found in some functional languages is the infinite collection. In languages supporting either lazy or normal-order evaluation it is possible to create collections that are potentially infinite in extent. Implementations of this collection generally only compute actual elements of the collection as they are needed by the functions acting upon them. As long as the manipulations deal with the collection itself, and not the individual elements, the implementation is free to avoid computing those elements. Languages supporting this feature include MIRANDA [26], HASKELL, and SCHEME [23]. Although the infinite collections in these particular languages are linearly ordered, this does not have to be true in general. For instance, it would be relatively easy in any of these languages to create the SETL-style set of the natural numbers: the collection would pick a new "random" natural number each time it is accessed, and would guarantee no repetitions. Similarly, it would be easy to build infinite collections of any of the other forms discussed in this section.

Language Specific Collections
Table 3 summarizes the differences that exist between the collections supported by some collection-oriented languages. This section explores the distinctions between individual languages in greater detail.
The heterogeneous nestable array is the fundamental collection in APL2. This contrasts with APL, which only allows homogeneous simple arrays. The introduction of nested collections into APL2 allows multiple arguments and multiple results from user-defined functions, as well as the combination of numeric and character data in vectors. APL2 also adds a great many new operators to APL for handling nested collections. The following APL2 collection contains vectors, strings and arrays (the boxes indicate nesting):

[Figure: a nested APL2 collection]

[...] index (also called the key) and value can be any LISP object (including another xapping) but the indices in a given xapping must all be distinct. One example of a xapping is {b→boat c→car a→apple}.

CM-LISP also provides syntactic sugar for vectors:

[one two three] = {1→one 2→two 3→three}.

This abbreviated form is called a xector. A second shorthand notation is available to describe sets:

{one two three} = {one→one two→two three→three}.

This representation of a set is called a xet.
PARALATION LISP adds a new data type to COMMON LISP [22], the paralation, a contraction of "parallel" and "relation". A paralation consists of two parts: a fixed number of sites, which are numbered from 0, and a dynamic number of fields. Each field of a paralation has a value for each site of the paralation. It is helpful to think of a paralation as a database and a field as holding a piece of data for every element of the database. A typical paralation with two fields looks something like:

[Figure: a paralation with two fields, name-field and year-field]

Fields are named (in this case, name-field and year-field) and field values may be heterogeneous and can be fields themselves, allowing nested collections. Paralations can be created in two ways. The make-paralation function creates a new paralation of given length and one field, the numbers from 0 to (length - 1). Alternatively, a field of a new paralation can be created using COMMON

Collection Operations
A collection-oriented language is characterized by two traits: the kinds of collections supported by the language and the operations allowed for those collections. The previous section focused on the first of these issues; we now turn to the second. This section studies some collection-oriented operations and explores their uses. Different languages describe these operations in slightly different ways. Section 5 discusses specific languages in detail.
There are two classes of collection operations: aggregate operations and apply-to-each forms. The aggregate operations of a language are those operations that operate on a collection as a whole. These operations take collections as arguments, compute some function of the collections and return a scalar or new collection (Section 4.1). The apply-to-each construct acts as an iterator: given a collection as one argument and a function as another argument, it returns the result of the function being applied to each of the elements of the collection separately (Section 4.2). This distinction can be quite fuzzy; what appears to be an apply-to-each in one language may be an aggregate operation in another. We emphasize that the space of collection-oriented operations has no simple topology. The classification scheme used in this paper is only one of several possible.
The collection operations described here should be thought of as abstract mathematical constructs. The result of performing a collection operation should depend only on the semantics of that operation in the language and not on the language's implementation. This statement may seem obvious, but it has non-obvious consequences. For example, whether or not a particular program is run on a parallel or serial machine, or whether a particular algorithm is used in the implementation of an operation should not affect the results of the calculation. Parallel implementation of an apply-to-each would be disallowed or have unpredictable results if the apply-to-each causes multiple side-effects to the same location. Problems such as this have resulted in the historic difficulty of defining language semantics for parallel machines.

Aggregate Operations
The aggregate operations are those operations that explicitly compute on a collection as a whole; this computation cannot be broken down into elementwise application of other functions. There are many types of collection operations and they are categorized here according to the kinds of collections on which they operate.

4.1.1 Generic Operations
Generic operations are the most general aggregate operations. There are few restrictions concerning the kinds of collections to which they can be applied. All operations discussed in this section apply equally well to both ordered and unordered collections.
Perhaps [...] select is closely related to the pack operation described in the next section and can be implemented in terms of it if the collection argument is ordered. We defer further discussion until then.
A powerful generic operation is restricted reduction. The reduction operator (reduce) takes as arguments a collection C and a binary function f that is associative and commutative. [...] The requirement that f be associative and commutative guarantees that the result of a restricted reduction is the same regardless of the manner in which it is evaluated. This means a compiler is free to convert the first example to (f (f a b) (f c d)), or (f b (f (f c a) d)), or any other grouping or ordering. This allows a possible parallel implementation using a tree-like evaluation. Typical uses of reduce are: summing the elements of a collection, finding the minimum or maximum element of a collection, and using logical reduction to determine the and or the or of a boolean collection. Some languages impose restrictions on the functions f over which reductions may be performed: APL and FORTRAN 90 allow only a fixed set of reductions, while APL2, CM-LISP, PARALATION LISP, and SETL allow reduction with any binary function. See Section 5 for the way different languages support reduction on generic collections.
⁵ Disclaimer: We will be using LISP notation throughout this paper in our non-language-specific examples. This is merely a syntactic convenience to avoid ambiguity and is not meant as an endorsement.
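To illustrate why associativity and commutativity free the implementation to regroup a reduction, the Python sketch below compares a left-to-right reduction with a balanced, tree-shaped one. `tree_reduce` is a hypothetical helper written for this illustration and the sample values are invented; it is only one of many groupings a compiler might choose.

```python
from functools import reduce
from operator import add, sub


def tree_reduce(f, items):
    """Reduce by combining adjacent pairs level by level (a balanced tree)."""
    items = list(items)
    if not items:
        raise ValueError("cannot reduce an empty collection without an identity")
    while len(items) > 1:
        # Each level could be evaluated in parallel: the pairs are independent.
        paired = [f(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # an unpaired last element is carried up
        items = paired
    return items[0]


values = [3, 1, 4, 1, 5, 9, 2, 6]
sequential = reduce(add, values)
tree = tree_reduce(add, values)

# With a non-associative function the grouping changes the answer.
sequential_sub = reduce(sub, [1, 2, 3, 4])    # ((1 - 2) - 3) - 4
tree_sub = tree_reduce(sub, [1, 2, 3, 4])     # (1 - 2) - (3 - 4)
```

For addition the two evaluations agree, while for subtraction they differ, which is the reason the restriction on f is needed.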

4.1.2 Ordered vs Unordered Collections
An important distinction between ordered and unordered collections is that the former make possible an unambiguous correspondence between elements of two differing collections. If two collections are ordered in the same manner and are of equal size, we may "superimpose" one collection on top of another by merging elements with the same index or key. Such collections are said to be of the same shape or to be conformable. In this section, we assume both collection arguments to a binary collection operation are conformable. In Section 4.2, we consider what happens when these conditions are not met and discuss generalizations of "conformable".
[...] pack can be used to implement select by applying a predicate to all the elements of some collection to generate the booleans and then pulling out the elements satisfying the predicate. An important difference between these operations is that pack depends on the ordering of the collection arguments while select does not.
permute. Thus far, none of the operations considered rearranges the order of the elements in a collection. This is accomplished with the permute function. The arguments to permute are two ordered collections of the same shape: the data collection and the index collection. The latter is a collection whose elements are some permutation⁷ of the indices for the first collection. The result of the operation is a collection in which the index collection specifies where the corresponding element of the data collection goes in the result collection. Expressed in C syntax, out[P[i]] = in[i], where P is the permutation vector and in and out are the input and output vectors respectively. For example:⁸

(permute [l e a s t s] [6 5 2 3 1 4]) => [t a s s e l].
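The rule out[P[i]] = in[i] can be sketched in Python as below. The function is an illustrative assumption rather than any language's definition; it keeps the 1-based indices used in the example and rejects index collections that are not proper permutations.

```python
def permute(data, index):
    """Place data[i] at position index[i] of the result (1-based positions)."""
    if sorted(index) != list(range(1, len(data) + 1)):
        raise ValueError("index must be a permutation of 1..len(data)")
    out = [None] * len(data)
    for value, position in zip(data, index):
        out[position - 1] = value  # out[P[i]] = in[i], shifted to 0-based storage
    return out


result = permute(list("leasts"), [6, 5, 2, 3, 1, 4])
```

Each element is written to a distinct location, so the assignments are independent of one another and could be carried out in any order.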
The inv-permute operator works the same way as permute, except that the index vector specifies where the corresponding result element comes from, instead of where it goes:

inv-permute can be generalized to the case where the index collection is not a proper permutation of the indices of the data collection: each element can be put into its proper position as long as the elements of the index collection are a subset of the indices of the data collection:

(inv-permute [m i c k e y] [2 1 1 2 4 5]) => [i m m i k e]
(inv-permute [m i c k e y] [1 5]) => [m e]
(inv-permute [m i c k e y] [1 2 3 4 5 6 1]) => [m i c k e y m].
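A minimal Python sketch of the generalized inv-permute, in which each index names the position the result element comes from, might look as follows. The function name and the error check are assumptions made for this illustration; indices are 1-based as in the examples.

```python
def inv_permute(data, index):
    """result[i] = data[index[i]], with 1-based positions."""
    if any(not 1 <= p <= len(data) for p in index):
        raise ValueError("every index must be a valid position in data")
    return [data[p - 1] for p in index]


mickey = list("mickey")
repeated = inv_permute(mickey, [2, 1, 1, 2, 4, 5])
shorter = inv_permute(mickey, [1, 5])
longer = inv_permute(mickey, [1, 2, 3, 4, 5, 6, 1])
```

Because the index collection only reads from the data, repeated or missing indices cause no conflict, which is why this direction generalizes so easily.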
⁷ No index is repeated and all are present.
⁸ We use the convention that indices start at 1.
In a language with key-ordered collections, the permute and inv-permute operators can be defined analogously. The index collection of a permute is a collection of items whose keys are the same as the keys of the data and whose values are all distinct. The result is a collection whose keys are the values of the index collection and whose values are the values of the data collection; the pairing up is done according to the keys of both collections:

(permute {1→a 2→b 3→c} {1→2 2→3 3→1}) => {1→c 2→a 3→b}.
This is exactly what we would expect if we viewed sequence-ordered collections as key-ordered collections with their indices as keys. In this formalism, inv-permute just switches the roles of the key and value of the index collection: the result has keys from the keys of the index collection, and has values from the values of the data collection and the pairing up is according to the values of the index collection (compare with above):

(inv-permute {1→a 2→b 3→c} {1→2 2→3 3→1}) => {1→b 2→c 3→a}.
This definition of a key-ordered permute can be extended to cover the cases when the sets of keys for the index and data collections are not the same by simply omitting the keys not in both:

(permute {a→x b→y c→z} {a→5 b→6 d→7}) => {5→x 6→y}.
A further generalization can be made to the case where the values of the index collection are not distinct. permute can take an extra argument specifying a way to resolve collisions (elements that are supposed to be moved to locations with the same key). In CM-LISP this argument is a function and colliding elements are combined with this function:

(permute {1→17 2→19 3→23} {1→1 2→5 3→1} +) => {1→40 5→19}.
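The key-ordered permute and inv-permute, including the omission of unmatched keys and the optional collision-resolving function, can be sketched in Python with dictionaries standing in for key-ordered collections. The names `permute_keyed` and `inv_permute_keyed` are hypothetical and this is not CM-LISP's implementation.

```python
from operator import add


def permute_keyed(data, index, combine=None):
    """Move data[k] to key index[k] for every key k present in both mappings."""
    result = {}
    for key, value in data.items():
        if key not in index:
            continue  # keys that are not in both collections are omitted
        target = index[key]
        if target in result:
            if combine is None:
                raise ValueError(f"collision at key {target!r}")
            result[target] = combine(result[target], value)
        else:
            result[target] = value
    return result


def inv_permute_keyed(data, index):
    """result[k] = data[index[k]] for every key k whose index value is a data key."""
    return {k: data[v] for k, v in index.items() if v in data}


rotated = permute_keyed({1: "a", 2: "b", 3: "c"}, {1: 2, 2: 3, 3: 1})
inverse = inv_permute_keyed({1: "a", 2: "b", 3: "c"}, {1: 2, 2: 3, 3: 1})
partial = permute_keyed({"a": "x", "b": "y", "c": "z"}, {"a": 5, "b": 6, "d": 7})
summed = permute_keyed({1: 17, 2: 19, 3: 23}, {1: 1, 2: 5, 3: 1}, add)
```

In this sketch colliding values are combined in dictionary iteration order, so a combining function that is not associative and commutative would make the result depend on that order.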
This key-ordered, collision-resolving permute is a powerful operation and generalizes many of the operations given so far. For example: (

The scan operator is a generalization of reduce that is only defined on linearly ordered collections. scan returns the prefix computation of an ordered collection C and a binary associative function f: that is, it returns a collection whose ith element is the reduce of the first i elements of C by f. For example: [...] Note that the value of (reduce f A) is the last element of the result.
Another issue to consider is that the underlying language may have its own rules governing associativity. Suppose we define reduce on linearly ordered collections so that the combining function is "put in text": [...] Even if the collection is ordered, the additions could still be grouped in differing manners, with correspondingly different results. The above "in text" rule can be used to define an unambiguous result for reduce and scan of linearly ordered collections with non-associative functions by uniformly declaring an evaluation order. This, of course, suffers from the same problem with parallel implementation discussed previously.
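As a small illustration of scan as a prefix computation, the Python sketch below uses the standard-library `itertools.accumulate`. The collection `C` is invented for the example, and `fold_right` is a hypothetical helper that shows how a different grouping changes the result for a non-associative function.

```python
from functools import reduce
from itertools import accumulate
from operator import add, sub

C = [1, 2, 3, 4, 5]

# The ith element of the scan is the reduction of the first i elements of C.
prefix_sums = list(accumulate(C, add))
total = reduce(add, C)


def fold_right(f, items):
    """Group from the right: f(a, f(b, f(c, d)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(items))


# Subtraction is not associative, so a fixed evaluation order must be declared.
left_grouped = reduce(sub, [1, 2, 3, 4])       # ((1 - 2) - 3) - 4
right_grouped = fold_right(sub, [1, 2, 3, 4])  # 1 - (2 - (3 - 4))
```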

Apply-to-each
The apply-to-each forms are the second major class of collection operations. Apply-to-each forms act as iterators by calling for the application of a function to every element of a collection. The result of an apply-to-each operation is a collection whose shape is the same as the argument collection. This kind of operation maps perfectly to the massively parallel programming paradigm. A simple example of apply-to-each is negating each element of a collection (see Table 2, example 1). There are two styles of apply-to-each in collection-oriented languages: extension and binding. In the binding apply-to-each a generic element of a collection is given a name and the computation to be performed on that element is described. This is the method used by PARALATION LISP (the elwise statement) and by SETL. On the other hand, extensions modify the evaluation of a function so that it operates over the elements of a collection. These extensions are specified in one of two ways: implicitly or explicitly. In some languages (APL, for example), extensions are performed automatically if the operation in question needs them in order to be well defined: this is the implicit case. Alternatively, explicit extension requires some mechanism to describe precisely those functions and/or arguments that must be extended. The tradeoff between these alternatives is largely one of convenience and conciseness vs lack of ambiguity. Extensions are used by APL, FORTRAN 90, CM-LISP, and FP.
Probably the most important difference between collection-oriented languages is whether user-defined functions are permitted as the functional argument to an apply-to-each form. All the languages under consideration in this paper with explicit apply-to-each allow any function, whether primitive or user-defined, to be so used. The languages with implicit apply-to-each restrict the functions to a fixed set of primitive operators. It is no coincidence that these languages, FORTRAN 90 and APL, have very primitive type schemes, no polymorphism, and no nested collections; they are able to get away with implicit apply-to-eachs and yet incur no ambiguity.

Function Extension and Unary Functions
First we see what complications develop in the fairly straightforward case of apply-to-each with functions taking a single argument. Consider a simple example: suppose we have a collection A consisting of five integers. What should the value of (square A) be, where square of an integer returns its square? One possible way of defining this result is to create a collection identical to A and then replace each element with its square: [...] The resulting collection has the same shape as the argument collection; if the input is unordered, so is the output. These examples use implicit functional extension: no extra notation is needed to get the apply-to-each of the function.
Some languages require an explicit apply-to-each construct to indicate that an apply-to-each operation should be performed. In FP and CM-LISP explicit apply-to-each is used and both these languages denote apply-to-each operations with the symbol α. This paper borrows their notation. With explicit extension, the square example becomes: [...] Although it leads to increased code size, explicit functional extension is needed to remove possible ambiguity. Consider the square example again and suppose that it occurred in the context of some linear algebra code. In languages that allow operator overloading, it would be reasonable to add a definition of square for vectors that calculates the inner product of an argument with itself. In this case our example evaluates: [...] Explicit functional extension allows whichever behavior is desired to be achieved. The previous example is still valid with this overloading; the α defines which version of square to use.
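The ambiguity between an overloaded function and its extension can be sketched in Python. Here `square` is overloaded by hand on its argument type, and `alpha` is a hypothetical stand-in for the explicit α notation; the five integers in `A` are invented, since the original example values are not reproduced here.

```python
from numbers import Number


def square(x):
    """Overloaded: a number is squared; a vector yields its inner product with itself."""
    if isinstance(x, Number):
        return x * x
    return sum(e * e for e in x)


def alpha(fn):
    """Explicit extension: return a version of fn that acts on each element."""
    return lambda collection: [fn(e) for e in collection]


A = [1, 2, 3, 4, 5]
whole = square(A)        # the vector overloading: inner product of A with itself
each = alpha(square)(A)  # the explicit apply-to-each: square every element
```

With implicit extension a language would have to choose one of these two meanings for `square(A)` on the programmer's behalf; the explicit form leaves the choice visible in the code.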
Particular care must be taken with nested collections in order to avoid ambiguity. Consider a nested collection C of three collections, each of which is a collection of three elements:

C => [[1 2 3] [4 5 6] [7 8 9]].
What should the value of (reverse C) be? There are at least two possible solutions, depending on the level of nesting at which reverse acts: we may apply reverse to the whole collection, or we may go into each element and reverse it. These two cases may be disambiguated with an explicit apply-to-each: [...] The second example should be read as "apply reverse to the elements of C."
A binding apply-to-each form can also be used to describe an apply-to-each operation. This is an explicit construct detailing the computation to be performed on each element of the collection. It is similar in form to a loop over the elements of a collection, but there are no explicit loop bounds. In PARALATION LISP, the binding apply-to-each is denoted with the keyword elwise, which we adopt for this paper. elwise takes a list of pairs and a function body as arguments. Each pair consists of the name of a collection and a dummy name for a representative element of the collection. All collections in a single elwise must be conformable. The function body uses the dummy names as variables. Using this notation, the square example given previously becomes: (elwise ((a A)) (square a)), which should be read as "take each element a of A and square it." The second instance of the nested reverse example is written: (elwise ((a A)) (reverse a)).
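The two readings of the nested reverse can be written out in Python, where a list comprehension plays the role of a binding apply-to-each: it names a representative element and states the computation on it. The helper `reverse` is defined here only for the illustration.

```python
C = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def reverse(collection):
    return list(reversed(collection))


# Reading 1: reverse acts on C as a whole.
outer = reverse(C)

# Reading 2: reverse is applied to each element c of C (binding apply-to-each).
inner = [reverse(c) for c in C]
```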
The set comprehension primitive of SETL and HASKELL is another way to denote a binding apply-to-each. In SETL, the reverse example is expressed by: [reverse a : a in A], which is read as "create a tuple consisting of the reverse of each element a in A." From a purely notational perspective, both kinds of explicit apply-to-eachs, extension and binding, have advantages and disadvantages over their implicit counterpart. The primary advantage is one of clarity: code is quite clean and easy to read, and there is no ambiguity about the operations being performed. Unfortunately, for trivial operations, or when there is no possibility of ambiguity, inserting the extra syntax becomes tedious.
From the perspective of the compilation process, explicit apply-to-eachs are superior to implicit ones. Explicit apply-to-eachs exactly specify the depth of nesting at which a function should be applied. If a language is not strongly typed, it may be very difficult to do the necessary type inference to implicitly extend functions at compile time. This is one of the major difficulties in creating APL compilers: in APL there is no general mechanism for deciding the dimension of an array before runtime, so it is impossible for a compiler to generate the correct code for a function call in all cases without making the call so general that it becomes inefficient. It may be necessary to generate code for the various cases and test at runtime [6]. With explicit extension, a compiler can decide how the apply-to-each should be computed, up to possible polymorphism/overloading of function names, even if the exact types of the collection and its elements are not known.
A second issue that we have been avoiding until now is side-effects. The result of performing an apply-to-each with a side-effecting function is problematic. Nowhere have we mentioned an order of evaluation, either implied or actual, for the apply-to-each form; in fact, we have discussed the utility of apply-to-each for specifying data parallelism. To get around this problem a language may explicitly define an ordering for the evaluation of an apply-to-each (SETL does this with tuple formers), or it may explicitly say that the result of such an operation is undefined (PARALATION LISP), or it may impose restrictions on the functional arguments (APL).
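The problem with side-effecting functions can be demonstrated with a short Python sketch. `apply_in_order` is a hypothetical helper that applies a function to each element while visiting positions in a caller-chosen order, simulating two schedules a parallel implementation might produce; `make_tagger` builds a function with hidden state.

```python
def make_tagger():
    """Return a function whose result depends on how many times it has been called."""
    counter = 0

    def tag(x):
        nonlocal counter
        counter += 1  # the side effect: shared state mutated on every call
        return (x, counter)

    return tag


def apply_in_order(fn, collection, order):
    """Apply fn to each element, visiting positions in the given order."""
    result = [None] * len(collection)
    for i in order:
        result[i] = fn(collection[i])
    return result


data = ["a", "b", "c"]
forward = apply_in_order(make_tagger(), data, [0, 1, 2])
backward = apply_in_order(make_tagger(), data, [2, 1, 0])

# A pure function gives the same result under either visiting order.
pure_forward = apply_in_order(str.upper, data, [0, 1, 2])
pure_backward = apply_in_order(str.upper, data, [2, 1, 0])
```

The two schedules disagree for the stateful function and agree for the pure one, which is why a language must either fix an order, leave the result undefined, or restrict the functions allowed.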

Argument extension and non-unary functions
The preceding section discussed issues that arise when we only consider unary functions. How can we extend these ideas to n-ary functions? In this case, the primary new issues are argument extension and conformability. Given an operation like (+ A B), where A and B are collections of integers, what must be the relationship between A and B? What if A is an integer and B is a collection? What if A and B are unordered or nested?
For now we limit ourselves to the binary case, in which the collections are nested lists or vectors. One possible way of defining apply-to-each for binary functions is to proceed in the same manner as for unary functions. This means extending the definition of the binary function so that it takes a collection of ordered pairs as arguments and combines each pair with the function.
Examining the definition of binary apply-to-each more closely reveals a few tacit assumptions. First note that the collections under consideration must be ordered. There is an inherent matching-up of indices that cannot occur if such indices are not present. This really is not too undesirable when it is remembered that the primary unordered collection is the set: trying to perform an elementwise addition on the elements of two sets does not seem like a primitive set operation (in fact, SETL overloads + to union if its arguments are sets). The issue of performing apply-to-each operations on sets will be considered at greater length later in this section.
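The pairing view of binary apply-to-each can be sketched in Python as follows. The helper apply_to_each_pairs is a hypothetical name, and zip is used here as one way of forming the collection of ordered pairs.

```python
import operator

def apply_to_each_pairs(f, pairs):
    """Apply a binary function to each ordered pair in a collection of pairs."""
    return [f(a, b) for (a, b) in pairs]

A = [1, 2, 3]
B = [10, 20, 30]

# Matching up indices requires ordered collections; zip pairs elements by position.
pairs = list(zip(A, B))
sums = apply_to_each_pairs(operator.add, pairs)

# Python sets have no positions to match up. Their natural binary operation
# is union (written | in Python), not elementwise addition.
union = {1, 2} | {2, 3}
```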
Another issue to consider is what happens when the collections being operated upon do not have the same structure. This breaks up into two different cases. The first is when the index/key sets of the collections are not identical, e.g., when the collections are of different lengths. One way to handle this situation is simply to signal a runtime error; both APL and FORTRAN 90 do this. Languages that do define this operation try to make it meaningful. In particular, on the intersection of the index sets, the apply-to-each should have the standard functionality. Thus a new collection should be created whose index set is the intersection of the index sets of the arguments, and the values of the result should be correct for these indices. For vectors of different lengths, this means making a new vector whose length is the minimum of the argument vector lengths and whose elements represent the apply-to-each of the truncated vectors.
The second case is when one argument is a scalar s and the other is a collection. One way of looking at this is to say that we argument extend the scalar s by converting it into a collection all of whose elements are s and whose shape is that of the other argument. This is implicit argument extension. As with function extension, there is an explicit version as well: (α* B α7).
The α may be thought of as specifying that enough copies of the function and scalars are created to match the shape of the collection argument, at which point each function is applied to its arguments. The CM-LISP manual [21] discusses this idea in depth.
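The behaviors described above can be modeled in Python. This is a sketch under stated assumptions: keyed collections are represented as dictionaries, vectors as flat lists, and the three helper names are hypothetical.

```python
def apply_on_common_keys(f, a, b):
    """Binary apply-to-each on keyed collections over the intersection of keys."""
    return {k: f(a[k], b[k]) for k in a.keys() & b.keys()}

def apply_truncating(f, xs, ys):
    """Binary apply-to-each on vectors; the result has the minimum length."""
    return [f(x, y) for x, y in zip(xs, ys)]  # zip stops at the shorter input

def extend_scalar(s, like):
    """Explicit argument extension: copies of s, one per element of a flat vector."""
    return [s for _ in like]

B = [1, 2, 3]

# The scalar 7 is extended to the shape of B before the elementwise multiply.
scaled = apply_truncating(lambda x, y: x * y, B, extend_scalar(7, B))

# Vectors of different lengths: only the common indices contribute.
short = apply_truncating(lambda x, y: x + y, [1, 2, 3, 4], [10, 20])

# Keyed collections: only keys present in both arguments contribute.
keyed = apply_on_common_keys(lambda x, y: x + y, {"a": 1, "b": 2}, {"b": 10, "c": 20})
```

A language that instead signals an error on mismatched lengths would replace the truncating zip with a length check.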
It may seem that if a language allows explicit function extension then there is no need for any explicit argument extension. In particular, in the previous example the α7 seems unnecessary. Since * has been extended, the interpreter or compiler can deduce that 7 must be extended as well. However, just as with implicit function extension, implicit argument extension may sometimes result in ambiguity that may require explicit extension for clarification. Nested collections demonstrate this: the possible results of a single application can each have a very different collection structure. In general, without some sort of explicit argument extension, it may be impossible to specify which of these results is desired. Ambiguity can also result if overloaded operators are present in the language. A nested version of the square from the beginning of Section 4.2.1 is an example of this. If square is overloaded to compute inner products when given a vector argument, all the problems of the append case crop up, plus the confusion with regard to which square is actually being applied (the vector version or the scalar version).
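The following Python sketch is a hypothetical illustration of this kind of ambiguity, not the example from the original text. It assumes an append-like operation applied to a nested collection A and a flat collection B, and shows three readings that differ only in how B is extended.

```python
A = [[1, 2], [3, 4]]
B = [5, 6]

# Reading 1: no extension; append B to A as a whole.
whole = A + B

# Reading 2: extend B as a unit; append all of B to each element of A.
each_gets_all = [a + B for a in A]

# Reading 3: match elements of A with elements of B by position.
paired = [a + [b] for a, b in zip(A, B)]
```

All three are reasonable meanings for "append A and B", and each yields a differently shaped result, which is why an explicit marker on the argument may be needed.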

Set Comprehension
The set comprehension operation is found in SETL, MIRANDA, HASKELL, and other modern functional languages. As demonstrated by the SETL examples in this section, most of the collection operations that are defined on unordered collections can be described with set comprehensions. In this sense, set comprehensions subsume many of the other types of collection operations. They do, however, have the problem that they cannot easily handle the application of functions with more than a single argument.
A typical set comprehension operation looks like:

{e(x) : x in S | p(x)}.
Here S is any set-valued expression, e is a function, and p is a boolean predicate. Both e and p are defined on elements of S. The result of this statement is the set of all e(x) with x chosen from all the elements in S that satisfy p. Performing a unary apply-to-each operation is a set comprehension without an elimination predicate:

(αf X) = [f(x) : x in X].
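Python's set comprehensions follow the same pattern, so they can serve as a rough model. The particular S, e, and p below are illustrative assumptions.

```python
S = {1, 2, 3, 4, 5, 6}

def e(x):
    """Expression applied to each selected element."""
    return x * x

def p(x):
    """Elimination predicate: keep only even elements."""
    return x % 2 == 0

# Corresponds to {e(x) : x in S | p(x)}.
squares_of_evens = {e(x) for x in S if p(x)}

# Unary apply-to-each: the same form with no elimination predicate.
all_squares = {e(x) for x in S}
```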
How can this operation be implemented in terms of the already discussed collection operations on ordered sets? At first glance, a set comprehension looks just like the result of some sort of binding apply-to-each operation involving if: (elwise ((s S)) (if (p s) (e s) (?))).
The question mark indicates the problem: what does an application of the if evaluate to when p is not true? Whatever is returned gets placed into the result collection, which would be incorrect. The next operation to try is some sort of pack: (αe (pack S (αp S))).
Unfortunately this does not work: since all collections here are unordered, there is no correspondence between the two arguments of the pack. We must use the select generic operator, which ignores the ordering of its arguments: (αe (select p S)) is correct. For this same reason, pack cannot implement select for unordered collections.
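The contrast between the if-based attempt and the select-based version can be sketched in Python. The sentinel MISSING is a hypothetical stand-in for the question mark, and select here is a simple model of the generic operator, not its definition in any of the surveyed languages.

```python
MISSING = object()  # hypothetical placeholder standing in for "(?)"

def select(p, S):
    """Keep the elements of an unordered collection that satisfy p."""
    return {s for s in S if p(s)}

def p(s):
    return s % 2 == 0

def e(s):
    return s * 10

S = {1, 2, 3, 4}

# Elementwise if: something must be returned when p fails,
# and whatever it is ends up in the result collection.
with_if = {e(s) if p(s) else MISSING for s in S}

# Select first, then apply e to each surviving element.
with_select = {e(s) for s in select(p, S)}
```

The if-based result contains the unwanted placeholder, while the select-based result contains only e applied to the elements that satisfy p.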
The set comprehension notation extends naturally to ordered collections (tuples in SETL) and can be useful in this case as well. Unfortunately, there are some problems that may occur. For example, a set comprehension cannot directly specify a binary apply-to-each on two tuples of the same length by iterating over both tuples. In order to compute the pairwise application of f to the tuples one must write [f(X(i), Y(i)) : i in domain(X)], explicitly using the domain of the tuples as indices in the specification. An interesting extension to set comprehensions in SETL would be to allow some notation for implicitly creating a correspondence between tuples (which could be used for binary apply-to-each operations) without resorting to index lists.
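The same issue appears in Python comprehensions, which makes for a convenient sketch. The first form below is an assumed reading of the kind of comprehension that does not give a pairwise result; f, X, and Y are illustrative.

```python
def f(x, y):
    return x * y

X = [1, 2, 3]
Y = [10, 20, 30]

# Two independent iterators range over every combination, not matching positions.
all_pairs = [f(x, y) for x in X for y in Y]

# Explicit index set, in the style of [f(X(i), Y(i)) : i in domain(X)].
by_index = [f(X[i], Y[i]) for i in range(len(X))]

# An implicit correspondence between the tuples, here provided by zip.
by_zip = [f(x, y) for x, y in zip(X, Y)]
```

Python's zip is one possible realization of the implicit correspondence suggested above: it removes the index list while keeping the comprehension notation.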