Automated Generalization and Refinement of Code Templates with Ekeko/X

Code templates are an intuitive means to specify source code snippets of interest, such as all instances of a bug, groups of snippets that need to be refactored or transformed, or instances of design patterns. While intuitive, it is not always straightforward to write a template that produces only the desired matches. A template could produce either more snippets than desired, or too few. To assist the users of EKEKO/X, our template-based search and transformation tool for Java, we have extended it with two components: The first is a suite of mutation operators that simplifies the process of modifying templates. The second is a system that can automatically suggest a sequence of mutations to a given template, such that it matches only with a set of desired snippets. In this tool paper, we highlight the key design decisions in implementing these two components of EKEKO/X, and demonstrate their use by walking through an example sequence of mutations suggested by the system.


I. INTRODUCTION
Code templates are ubiquitous in search and transformation tools, and can be used to concisely describe various kinds of source code snippets of interest. The learning curve for writing templates is also quite low, as they are written in terms of concrete source code. However, code templates can still prove difficult to specify. A template may be too general and produce false positives, or too specific and result in false negatives. Additionally, the templates we consider specify not only syntactic constraints to describe a set of snippets, but can also use several semantic constraints. While these constraints offer additional expressivity, they also emphasize the need for some type of assistance when writing templates.
In this tool demonstration paper, we present an extension to EKEKO/X [4], our template-based search and transformation tool for Java. This extension consists of two components, both aiming to assist EKEKO/X users. The first component is a suite of mutation operators, which gradually grew out of our own experience of writing templates. Rather than editing a template manually, without any indication of whether the result is valid, users can edit it using our suite of mutation operators. This suite was designed such that an operator can only be applied when it leads to a syntactically valid template. There are two types of operators: atomic and composite. Whereas an atomic operator performs a local change in a template, a composite operator can affect multiple parts. Composite operators help avoid accidental errors in common scenarios, such as abstracting away the name of a particular variable declaration and all of its uses.
The second component to assist EKEKO/X users is a search-based [9] system that can automatically generalize or refine a given template such that it matches only a desired set of snippets. The idea is that the user can first write a rough draft of a template, and then use the system to suggest a sequence of mutations that would bring the template closer to a solution.

II. OVERVIEW OF EKEKO/X
The EKEKO/X program transformation tool is built on top of the EKEKO [6] meta-programming library, which provides a logic API to perform code searches and transformations at the level of Java ASTs. As reasoning about code in terms of AST nodes requires a certain level of expertise, EKEKO/X was created to specify program searches and transformations in a more intuitive manner, in terms of code templates. A template is a snippet of Java code in which parts can be replaced by wildcards and metavariables, and to which annotations called directives can be added. These constructs are used to either generalize or refine parts of a template. Matching a template essentially involves converting it into a set of logic EKEKO constraints and finding all concrete snippets of Java code that satisfy all of these constraints. EKEKO/X also provides support for template groups, in which multiple templates can be related to each other, making it possible to describe groups of related snippets.
Consider an example template group that contains two templates: the first template matches all calls to methods named acceptVisitor, and the second then looks for the corresponding method declarations. The use of an ellipsis indicates a wildcard. The acceptVisitor call is wrapped in square brackets, followed by @[(equals ?invocation)]. This notation indicates that the acceptVisitor method call is annotated with an equals directive, which binds the call to the ?invocation metavariable. To link the method call to its declaration, the invoked-by directive has ?invocation as its operand.
Figure 1: Overview of the EKEKO/X template editor
A screenshot of EKEKO/X's user interface is given in Fig. 1, in which our example is shown in a template editor (see [1] in Fig. 1). To modify the template, the user first selects a part to be modified, either in the textual view (top part of [1]) or the tree view (bottom of [1]).
Next, a mutation operator is chosen in the list of operators (see [2]), operands are filled in (if any), and the operator can be applied. Note that the list of operators is context-sensitive to the selected part, which ensures the template remains syntactically correct.
At any time, the user can match the template and see which snippets it finds (see [3]). In the case of our example, the match results contain all acceptVisitor declaration-call pairs found in all EKEKO/X-enabled Java projects.

III. MUTATION OPERATOR SUITE
A mutation operator, or simply operator, performs a modification in a template group. In EKEKO/X's implementation, a template is represented as a Java abstract syntax tree (AST), where nodes can be decorated with a list of directives and their operand values. As such, operators can modify an AST's structure, or add directives to and remove directives from nodes. Table I presents a list of representative atomic operators (top half of the table) and composite operators (bottom half). Each operator can have its own list of operands. Each operator can only be applied to certain subject nodes, as described in the Subject column of Table I. The subject node is the "part of the template" that is selected by the user, effectively corresponding to an AST node in the template. As the subject of each operator is constrained to certain types of nodes, this also makes it possible to make EKEKO/X's operator list context-sensitive.
While not shown here, the "Add directive" operator can add several (30+) different types of directives to a node, among which are directives to relate a method call to a method declaration (invokes), to bind nodes to a metavariable (equals), to relate the subject node to an ancestor node (child/child*/child+), to relate a method to an overriding method (overrides), etc. The "Add directive" operator in itself is quite straightforward, as it only attaches a directive to the subject node. The actual behavior of each directive only takes effect while matching a template, where each directive specifies which logic constraints need to be satisfied.
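The relation between decorated AST nodes, subject constraints and the context-sensitive operator list can be illustrated with a minimal Python sketch. It is not EKEKO/X's Clojure implementation: the Node and Operator classes, the applicable_operators function and the "Replace by wildcard" operator name are hypothetical and only serve the illustration; "Add directive" and the equals directive are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """Hypothetical model of a template AST node decorated with directives."""
    kind: str                                   # e.g. "MethodInvocation"
    children: List["Node"] = field(default_factory=list)
    directives: List[Tuple[str, tuple]] = field(default_factory=list)


@dataclass
class Operator:
    """A mutation operator that is restricted to certain subject nodes."""
    name: str
    applies_to: Callable[[Node], bool]          # subject constraint
    apply: Callable[..., None]


def add_directive(subject: Node, directive: str, *operands: str) -> None:
    # Only attaches the directive; it is interpreted later, while matching.
    subject.directives.append((directive, operands))


def replace_by_wildcard(subject: Node) -> None:
    # Generalizes the template: the subject's subtree is dropped.
    subject.kind = "Wildcard"
    subject.children = []


# Assumed operator suite for this sketch (two operators only).
OPERATORS = [
    Operator("Add directive", lambda n: True, add_directive),
    Operator("Replace by wildcard", lambda n: n.kind != "Wildcard",
             replace_by_wildcard),
]


def applicable_operators(subject: Node) -> List[str]:
    """Names of the operators that would be offered for the selected node."""
    return [op.name for op in OPERATORS if op.applies_to(subject)]


call = Node("MethodInvocation", [Node("SimpleName")])
add_directive(call, "equals", "?invocation")
```

Because every operator carries its own subject constraint, the list offered to the user can be recomputed whenever the selection changes, so an operator that would be inapplicable to the selected node is never offered.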

IV. SUGGESTING TEMPLATE MUTATIONS
Our second component to assist EKEKO/X users is a search-based system that can automatically generalize and refine a template group, such that it matches only a given set of desired code snippets. This system is based on a single-objective genetic search algorithm, and it makes use of our suite of mutation operators.
Genetic algorithm - In short, the algorithm consists of a loop that "evolves" a set of template groups, with the aim of approaching a solution template group, i.e. one with a fitness value of 1. The fitness value indicates "how good" a template group is, i.e. how closely its results approach the desired set of snippets. Initially, the set of template groups to be evolved consists only of the template group we would like to improve. In each iteration of the evolution loop, we produce a new set/generation of template groups based on the previous one, by making S tournament selections and M mutations (S and M are user-defined constants).
A tournament selection chooses one template group by randomly picking R (user-defined) groups from the current generation, and returning the one with the best fitness out of those R. In addition to the S selections, M mutations are performed. Each mutation first selects a template group (also using tournament selection), and subsequently applies a random operator, chosen from our suite of mutation operators. This operator is applied to a random template from the template group, at a random applicable subject node, with random operand values (if any). The most common type of operand is a metavariable, so we can randomly choose among the metavariables already present in the template group, or generate a new metavariable. Once all S selections and M mutations are made, they are combined to form a new generation of template groups. If any of the template groups produces only the desired snippets, a solution is found and the algorithm stops. Otherwise, we repeat the selection and mutation process for the new generation.
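The evolution loop described above can be sketched in Python as follows. This is an illustrative sketch rather than the EKEKO/X implementation: the function names, the max_generations bound and the seed parameter are assumptions, and tournament contenders are assumed to be drawn with replacement (which is needed in the first generation, where only one template group exists). Individuals are kept generic, so the usage example evolves integers instead of template groups.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def tournament(population: List[T], fitness: Callable[[T], float],
               r: int, rng: random.Random) -> T:
    """Pick r random individuals (with replacement), return the fittest."""
    contenders = [rng.choice(population) for _ in range(r)]
    return max(contenders, key=fitness)


def evolve(initial: T, fitness: Callable[[T], float],
           mutate: Callable[[T, random.Random], T],
           s: int, m: int, r: int,
           max_generations: int, seed: int = 0) -> Optional[T]:
    """Evolve a population until an individual reaches fitness 1."""
    rng = random.Random(seed)
    population = [initial]          # generation 0: only the initial draft
    for _ in range(max_generations):
        for candidate in population:
            if fitness(candidate) == 1.0:
                return candidate    # produces only the desired matches
        # S individuals survive unchanged, M are mutated copies.
        selected = [tournament(population, fitness, r, rng)
                    for _ in range(s)]
        mutated = [mutate(tournament(population, fitness, r, rng), rng)
                   for _ in range(m)]
        population = selected + mutated
    return next((c for c in population if fitness(c) == 1.0), None)


# Toy usage: "templates" are integers and the only solution is 10.
result = evolve(
    initial=0,
    fitness=lambda x: 1.0 / (1 + abs(x - 10)),
    mutate=lambda x, rng: x + rng.choice([-1, 1]),
    s=8, m=22, r=7, max_generations=200,
)
```

In the actual system, mutate would apply a random operator from the suite to a random applicable subject node, and fitness would require matching the template group against the code base.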
Fitness function - To compute how good a template group is, a fitness function is required. Ours is defined in terms of a template group t and the set of desired matches m (where |m| = n). It consists of two components, F1 and partial, each associated with a user-defined weight (W1 and W2, respectively). The F1 component is the traditional F-score, which considers how many true positive, false positive and false negative matches are produced by t. This results in a number in [0,1], where a value of 1 indicates that t produces exactly the matches in m. While this accurately describes our goal, F1 is quite coarse-grained, in the sense that it only changes when a template group produces an additional desired or undesired match. This is why the more fine-grained partial function is introduced. In short, the partial score measures, for each desired match, how many template nodes could be mapped to an AST node in the desired match (matchCount), out of all template nodes (nodeCount). The intuition is that we want to measure how close a template is to producing each desired match: a template that almost produces a desired match is better than one that is far off.
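One plausible reading of this definition is sketched below in Python. It assumes that the two components are combined as a weighted sum, and that partial averages the ratio matchCount/nodeCount over the n desired matches; both are assumptions derived from the description above. The function names, the string identifiers for snippets and the value 0.4 for W2 are likewise chosen for illustration only.

```python
from typing import Sequence, Set


def f1_score(found: Set[str], desired: Set[str]) -> float:
    """Traditional F-score of the produced matches vs. the desired ones."""
    tp = len(found & desired)       # true positives
    fp = len(found - desired)       # false positives
    fn = len(desired - found)       # false negatives
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def partial_score(match_counts: Sequence[int], node_count: int) -> float:
    """Assumed form: mean of matchCount / nodeCount over desired matches."""
    return sum(c / node_count for c in match_counts) / len(match_counts)


def fitness(found: Set[str], desired: Set[str],
            match_counts: Sequence[int], node_count: int,
            w1: float, w2: float) -> float:
    """Assumed form: weighted sum of the two components."""
    return (w1 * f1_score(found, desired)
            + w2 * partial_score(match_counts, node_count))


# Template with 10 nodes; two of three desired snippets are matched fully,
# the third only maps 5 of the 10 template nodes. One false positive ("x").
desired = {"a", "b", "c"}
found = {"a", "b", "x"}
score = fitness(found, desired, match_counts=[10, 10, 5], node_count=10,
                w1=0.6, w2=0.4)    # w2 = 0.4 is an assumption
```

Under these assumptions, F1 is 2/3 and partial is 2.5/3, so the score stays below 1 until the false positive disappears and the third snippet is matched completely.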
Usage - The user interface to access the automated generalization and refinement system is presented in Fig. 2. Consider the scenario where the user is working on a template group to detect all instances of the Template Method design pattern in a Java project. The user currently has a template group that describes only one instance of the design pattern, and would like to invoke our system to suggest an improved template. The first step consists of selecting the template to be improved ([1] in Fig. 2). The next step is to specify the complete list of matches that are desired (see [2]), which would be all instances of the Template Method pattern. These matches can be gathered by selecting code snippets one by one, or they can be added more quickly by taking the matches of other (incomplete) templates. In case too many snippets were added to the list, they can simply be removed.
Once the list of desired matches is specified, the algorithm can be started. In our example run, a solution was produced after 29 generations (with S=8, M=22, R=7, W1=0.6). As each new generation is produced, the results view (see [3]) is updated, showing the best template group of the new generation, together with its fitness, F1 and partial values. A fitness chart is updated as well, shown in Fig. 3. This chart shows the interplay between the two fitness components. The partial score gradually pushes the templates towards producing more true positives, but does not take into account false positives, whereas F1 does. This is why partial can increase while F1 drops, as seen in generations 9 and 20.
When the solution template is almost found, there is a jump in both components (around generation 25). The reason is that, once a template is sufficiently generalized to produce one additional desired match, it may actually describe several additional desired matches at once.
Aside from inspecting the best fitness values per generation, each template group that was generated in the search process can be inspected in detail. It can be opened in a template editor, if the user wants to resume the manual editing process. A template group's mutation history can also be inspected, showing all mutations that were used to arrive at the selected template group, starting from the initial template group. The history of our solution template is given in Fig. 4. What can be deduced from this figure is that, early on, the algorithm added several wildcards. While in this stage, most other directives would either reduce the fitness value or keep it unchanged, wildcards can improve the partial score. When a wildcard replaces a non-leaf node, the total number of nodes in the template drops. This causes the partial score to rise, even if the template group still produces the same matches.
Figure 4: Inspecting the mutation history of a template group
At some point, adding too many wildcards would increase the number of false positives. This is why the algorithm will then tend to choose a refining operator to reduce the false positives again. This is apparent in generations 20 and 27, where an invokes and an overrides directive are added.
Performance considerations - Most of the algorithm's time is spent on computing fitness values, which requires producing each template group's matches. To reduce matching time, all template groups in a generation are matched in parallel. As EKEKO/X is written in Clojure, a language designed with concurrency in mind, this was reasonably straightforward to implement.
To further reduce matching time, it is also possible to reduce the amount of code that needs to be searched. There is an option to search only within the classes containing desired matches. While this significantly reduces the time needed to match template groups, the algorithm might now produce a solution that yields false positives elsewhere in the code base. Nonetheless, it is a sound approach to first apply the genetic algorithm to a subset of the code, and then test whether the solution that was found also works against the entire code base.

V. RELATED WORK
Several program search and transformation languages exist that are based on code templates [3], [10], [2], [14]. However, the constraints available in these languages are limited to expressing syntactic and structural characteristics, but not semantic ones. When considering languages that focus solely on program searches [12], [8], [5], these languages do support various semantic constraints, but are not template-based.
With regard to our genetic search approach, several works in the field of program repair make use of genetic search or genetic programming techniques to either generate or evolve patches that fix an instance of a bug [7], [1], [11]. These approaches focus on repairing one instance, without looking for similar instances of the same bug. While our system does not perform any program repairs, templates can be used to describe multiple instances of a bug in one template. In this regard, the work of Meng et al. [13] is more closely related, as its goal is to repair similar changes. Based on two instances of the same bug fix, a transformation is generated that should find and fix all instances of the bug. This approach, however, does not support interprocedural modifications.

VI. CONCLUSION
In this tool paper we have presented an extension of EKEKO/X, which consists of a suite of mutation operators, and a system that can automatically generalize and refine templates. Current experiments using this system, in which one instance of a design pattern is generalized into a template group that produces all instances, indicate that the system is able to either substantially improve a given template group, or even find a solution that matches only the desired snippets. The main direction of future work is to extend the focus from program searches to program transformations, such that e.g. a transformation that repairs one instance of a bug can be generalized to a transformation that repairs all instances.