SOAR: An Architecture for General Intelligence

Abstract The ultimate goal of work in cognitive architecture is to provide the foundation for a system capable of general intelligent behavior. That is, the goal is to provide the underlying structure that would enable a system to perform the full range of cognitive tasks, employ the full range of problem solving methods and representations appropriate for the tasks, and learn about all aspects of the tasks and its performance on them. In this article we present SOAR, an implemented proposal for such an architecture. We describe its organizational principles, the system as currently implemented, and demonstrations of its capabilities.

• Miscellaneous AI tasks:
  Dypar-Soar: natural-language parsing program (small demo)
  Version-spaces: concept formation (small demo)
  Resolution theorem-prover (small demo)
• Multiple weak methods with variations, most used in multiple small tasks:
  Generate and test, AND/OR search, hill climbing (simple and steepest-ascent), means-ends analysis, operator subgoaling, hypothesize and match, breadth-first search, depth-first search, heuristic search, best-first search, A*, progressive deepening (simple and modified), B* (progressive deepening), minimax (simple and depth-bounded), alpha-beta, iterative deepening, B*
• Multiple organizations and task representations:
  Eight puzzle, picnic problem, R1-Soar
• Learning:
  Learns on all tasks it performs by a uniform method (chunking)
  Detailed studies on eight puzzle, R1-Soar, tic-tac-toe, Korf macro-operators
  Types of learning: improvement with practice, within-task transfer, across-task transfer, strategy acquisition, operator implementation, macro-operators, explanation-based generalization
• Major aspects still missing:
  Deliberate planning, automatic task acquisition, creating representations, varieties of learning, recovering from overgeneralization, interaction with external task environment

Preview
In common with the mainstream of problem-solving and reasoning systems in AI, Soar has an explicit symbolic representation of its tasks, which it manipulates by symbolic processes. It encodes its knowledge of the task environment in symbolic structures and attempts to use this knowledge to guide its behavior. It has a general scheme of goals and subgoals for representing what the system wants to achieve, and for controlling its behavior.

Beyond these basic commonalities, Soar embodies mechanisms and organizational principles that express distinctive hypotheses about the nature of the architecture for intelligence. These hypotheses are shared by other systems to varying extents, but taken together they determine Soar's unique position in the space of possible architectures. We preview here these main distinctive characteristics of Soar. The full details of all these features will be given in the next section on the architecture.

Uniform task representation by problem spaces
In Soar, every task of attaining a goal is formulated as finding a desired state in a problem space (a space with a set of operators that apply to a current state to yield a new state) [49]. Hence, all tasks take the form of heuristic search. Routine procedures arise, in this scheme, when enough knowledge is available to provide complete search control, i.e., to determine the correct operator to be taken at each step. In AI, problem spaces are commonly used for genuine problem solving [18,51,57,58,59,72], but procedural representations are commonly used for routine behavior. For instance, problem-space operators are typically realized by Lisp code. In Soar, on the other hand, complex operators are implemented by problem spaces (though sufficiently simple operators can be realized directly by rules). The adoption of the problem space as the fundamental organization for all goal-oriented symbolic activity (called the Problem Space Hypothesis [49]) is a principal feature of Soar. Problem-space search occurs in the attempt to attain a goal. In the eight puzzle the goal is a desired state representing a specific configuration of the tiles - the darkened board at the right of the figure. In other tasks, such as chess, where checkmate is the goal, there are many disparate desired states, which may then be represented by a test procedure. Whenever a new goal is encountered in solving a problem, the problem solver begins at some initial state in the new problem space. For the eight puzzle, the initial state is just a particular configuration of the tiles. The problem-space search results from the problem solver's application of operators in an attempt to find a way of moving from its initial state to one of its desired states.
Only the current position (in Figure 1-1) exists as a data structure in the problem solver. Likewise, the other states in a problem space, except possibly a few remembered states, do not preexist as data structures, so they must be generated by applying operators to states that do exist.
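To make the formulation concrete, the eight puzzle cast as problem-space search can be sketched as follows. This is an illustration, not Soar's code: the goal layout, the tuple encoding of states, and the breadth-first strategy are our own choices for the example.

```python
# Illustrative sketch of a task formulated as problem-space search.
# A state is a tuple of 9 tile values (0 = blank), read row by row.

from collections import deque

GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)  # a conventional eight-puzzle goal layout

def operators(state):
    """Yield successor states: each move slides a tile into the blank cell."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            src = r * 3 + c
            s = list(state)
            s[blank], s[src] = s[src], s[blank]
            yield tuple(s)

def solve(initial, desired=GOAL):
    """Breadth-first search from the initial state to a desired state."""
    frontier, seen = deque([(initial, [])]), {initial}
    while frontier:
        state, path = frontier.popleft()
        if state == desired:
            return path + [state]
        for nxt in operators(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [state]))
    return None
```

States here are generated only as operators are applied, matching the point above that states other than the current one do not preexist as data structures.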

Any decision can be an object of goal-oriented attention
All decisions in Soar relate to searching a problem space (selection of operators, selection of states, etc.). Figure 1-1 represents the knowledge that can be immediately brought to bear to make the decisions in a particular space. However, a subgoal can be set up to make any decision for which the immediate knowledge is insufficient. For instance, looking back to state S1, three moves were possible: moving a tile adjacent to the blank left, right, or down. If the knowledge was not available to select which move to try, then a subgoal to select the operator would have been set up. Or, if the operator to move a tile left had been selected, but it was not known immediately how to perform that operator, then a subgoal would have been set up to do that. (The moves in the eight puzzle are too simple to require this, but many operators are more complex, e.g., an operator to factor a polynomial in an algebraic task.) Or, if the left operator had been applied and Soar attempted to evaluate the result, but the evaluation was too complicated to compute directly, then a subgoal would have been set up to obtain the evaluation. Or, to take just one more example, if Soar had attempted to apply an operator that was illegal at state S1, say to move tile 1 to the position of tile 2, then it could have set up a subgoal to satisfy the preconditions of the operator (that the position of tile 2 be blank).

In short, a subgoal can be set up for any problematic decision, a property we call universal subgoaling.
Since setting up a goal means that a search can be conducted for whatever information is needed to make the decision, Soar can be described as having no fixed bodies of knowledge to make any decision (as in writing a specific Lisp function to evaluate a position or select among operators). The ability to search in subgoals also implies that further subgoals can be set up within existing subgoals, so that the behavior of Soar involves a tree of subgoals and problem spaces (Figure 1-2). Because many of these subgoals address how to make control decisions, this implies that Soar can reflect [73] on its own problem-solving behavior, and do this to arbitrary levels [64].
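The flavor of universal subgoaling can be suggested with a toy sketch. All names and the scoring stand-in are invented; real Soar conducts full problem-space search in the subgoal, and the same mechanism can recurse to arbitrary depth.

```python
# Illustrative sketch: any decision that immediately available knowledge
# cannot make is handed to a subgoal, which does its own problem solving.

def decide(candidates, knowledge, depth=0):
    """Pick a candidate; on insufficient knowledge, recurse into a subgoal."""
    preferred = [c for c in candidates if knowledge.get(c) == "best"]
    if len(preferred) == 1:
        return preferred[0], depth        # immediate knowledge sufficed
    # Impasse: set up a subgoal whose problem is the decision itself.
    scores = {c: subgoal_evaluate(c, depth + 1) for c in candidates}
    best = max(scores, key=scores.get)
    return best, depth + 1                # decided one subgoal level down

def subgoal_evaluate(candidate, depth):
    """Stand-in for the problem solving done inside the subgoal."""
    return len(str(candidate))            # any task-specific search goes here
```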

Uniform representation of all long-term knowledge by a production system
There is only a single memory organization for all long-term knowledge, namely, a production system [9,14,25,42,78]. Thus, the boxes in Figures 1-1 and 1-2 are filled in with a uniform production system.
Productions deliver control knowledge, as when a production action rejects an operator that leads back to the prior position. Productions also provide procedural knowledge for simple operators, such as the eight-puzzle moves, which can be accomplished by two productions, one to create the new state and put the changes in place and one to copy the unchanged tiles. (As noted above, more complex operators are realized by operating in an implementation problem space.) The data structures examinable by productions - that is, the pieces of knowledge in declarative form - are all in the production system's short-term working memory.
However, the long-term storage of this knowledge is in productions, which have actions that generate the data structures. Soar employs a specialized production system (a modified version of Ops5 [20]). All satisfied productions are fired in parallel, without conflict resolution. Productions can only add data elements to working memory.
All modification and removal of data elements is accomplished by the architecture.
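The firing discipline just described - every satisfied production fires, actions only add elements, and elaboration repeats until nothing new is produced - can be sketched as follows. This is a simplification; the rule and element names are invented for the example.

```python
# Illustrative sketch of the add-only production-system discipline:
# all satisfied rules fire "in parallel" each cycle, and the cycle
# repeats until quiescence (no new additions).

def elaborate(working_memory, productions, max_cycles=100):
    """Fire all satisfied productions each cycle until quiescence."""
    wm = set(working_memory)
    for _ in range(max_cycles):
        additions = set()
        for condition, action in productions:
            if condition(wm):
                additions.update(action(wm))   # actions only add elements
        if additions <= wm:                    # quiescence: nothing new
            break
        wm |= additions                        # monotonic growth only
    return wm

# Two toy rules that derive facts and never delete any.
rules = [
    (lambda wm: "operator-applied" in wm,
     lambda wm: {"state-changed"}),
    (lambda wm: "state-changed" in wm,
     lambda wm: {"evaluate-state"}),
]
```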

Knowledge to control search expressed by preferences
Search-control knowledge is brought to bear by the additive accumulation (via production firings) of data elements in working memory. One type of data element, the preference, represents knowledge about how Soar should behave in its current situation (as defined by a current goal, problem space, state and operator).
For instance, the rejection of the move that simply returns to the prior state (in the example above) is encoded as a rejection preference on the operator. The preferences admit only a few concepts: acceptability, rejection, better (best, worse and worst), and indifferent. The architecture contains a fixed decision procedure for interpreting the set of accumulated preferences to determine the next action. This fixed procedure is simply the embodiment of the semantics of these basic preference concepts and contains no task-dependent knowledge.

All goals arise to cope with impasses
Difficulties arise, ultimately, from a lack of knowledge about what to do next (including, of course, knowledge that problems cannot be solved). In the immediate context of behaving, difficulties arise when problem solving cannot continue - when it reaches an impasse. Impasses are detectable by the architecture, because the fixed decision procedure concludes successfully only when the knowledge of how to proceed is adequate. The procedure fails otherwise (i.e., it detects an impasse). At this point the architecture creates a goal for overcoming the impasse. For example, each of the subgoals in Figure 1-2 is evoked because some impasse occurs: the lack of sufficient preferences between the three task operators creates a tie impasse; the failure of the productions in the task problem space to carry out the selected task operator leads to a no-change impasse; and so on.
In Soar, goals are created only in response to impasses. Although there are only a small set of architecturally distinct impasses (four), this suffices to generate all the types of subgoals. Thus, all goals arise from the architecture. This principle of operation, called automatic subgoaling, is the most novel feature of the Soar architecture, and it provides the basis for many other features.
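As a rough sketch of how the architecturally distinct impasse varieties might be distinguished from the state of the preferences (our simplified encoding, not the architecture's actual decision procedure):

```python
# Illustrative sketch: classifying impasses from the preference situation.
# acceptable/rejected are sets of candidate objects; better is a set of
# (x, y) pairs meaning "x is better than y".

def detect_impasse(acceptable, rejected, better):
    """Return an impasse type, or None if a unique choice remains."""
    if not acceptable:
        return "no-change"            # nothing was even proposed
    candidates = acceptable - rejected
    if not candidates:
        return "rejection"            # every candidate was rejected
    if any((x, y) in better and (y, x) in better
           for x in candidates for y in candidates):
        return "conflict"             # contradictory orderings
    maximal = {c for c in candidates
               if not any((o, c) in better for o in candidates)}
    if len(maximal) > 1:
        return "tie"                  # insufficient preferences to choose
    return None                       # a unique choice: no impasse
```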

Continuous monitoring of goal termination
The architecture continuously monitors for the termination of all active goals in the goal hierarchy. Upon detection, Soar proceeds immediately from the point of termination. For instance, in trying to break a tie between two operators in the eight puzzle, a subgoal will be set up to evaluate the operators. If in examining the first operator a preference is created that rejects it, then the decision at the higher level can, and will, be made immediately. The second operator will be selected and applied, cutting off the rest of the evaluation and comparison process. All of the working-memory elements local to the terminated goals are automatically removed.
Immediate and automatic response to the termination of any active goal is rarely used in AI systems because of its expense. Its (efficient) realization in Soar depends strongly on automatic subgoaling.
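The bookkeeping involved can be sketched as follows (the representation is invented for the example): when a goal anywhere in the hierarchy terminates, everything below it is discarded and the working-memory elements local to the terminated goals are removed.

```python
# Illustrative sketch of goal termination: cut the goal stack at the
# terminated goal and remove the working-memory elements local to it
# and to every subgoal below it.

def terminate(goal_stack, wm_by_goal, terminated_goal):
    """Return the surviving goal stack; discard local working memory."""
    index = goal_stack.index(terminated_goal)
    for goal in goal_stack[index:]:
        wm_by_goal.pop(goal, None)    # remove elements local to the goal
    return goal_stack[:index]
```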

The basic problem-solving methods arise directly from knowledge of the task
Soar realizes the so-called weak methods, such as hill climbing, means-ends analysis, alpha-beta search, etc., by adding search-control productions that express, in isolation, knowledge about the task (i.e., about the problem space and the desired states). The structure of Soar is such that there is no need for this knowledge to be organized in separate procedural representations for each weak method (with a selection process to determine which one to apply). For example, if knowledge exists about how to evaluate the states in a task, and the consequences of evaluation functions are understood (prefer operators that lead to states with higher evaluations), then Soar exhibits a form of hill climbing. This general capability is another novel feature of Soar.
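The hill-climbing case above can be sketched as follows: a generic selection loop plus one piece of task knowledge (a state evaluation, with the rule "prefer operators leading to higher-evaluated states") yields the weak method. The toy task and evaluation function are invented for the illustration.

```python
# Illustrative sketch: hill climbing emerging from an evaluation plus
# the preference "take the operator whose result evaluates highest".

def hill_climb(state, operators, evaluate, max_steps=100):
    """Repeatedly select the operator whose result evaluates highest."""
    for _ in range(max_steps):
        successors = [op(state) for op in operators]
        best = max(successors, key=evaluate, default=None)
        if best is None or evaluate(best) <= evaluate(state):
            return state              # no uphill move: stop
        state = best
    return state

# Toy task: climb toward x = 10 on the integers.
ops = [lambda x: x + 1, lambda x: x - 1]
peak = hill_climb(0, ops, lambda x: -abs(x - 10))
```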

Continuous learning by experience through chunking
Soar learns continuously by automatically and permanently caching the results of its subgoals as productions. Thus, consider the tie impasse between the three task operators in Figure 1-2, which leads to a subgoal to break that tie. The ultimate result of the problem solving in this subgoal is a preference (or preferences) that resolves the tie impasse in the top space and terminates the subgoal. Then a production is automatically created that will deliver that preference (or preferences) again in relevantly similar situations. If the system ever again reaches a similar situation, no impasse will occur (hence no subgoal and no problem solving in a subspace) because the appropriate preferences will be generated immediately.
This mechanism is directly related to the phenomenon called chunking in human cognition [63], whence its name. Structurally, chunking is a limited form of practice learning. However, its effects turn out to be wide-ranging. Because learning is closely tied to the goal scheme and universal subgoaling -which provide an extremely fine-grained, uniformly structured, and comprehensive decomposition of tasks on which the learning can work -Soar learns both operator implementations and search control. In addition, the combination of the fine-grained task decomposition with an ability to abstract away all but the relevant features allows Soar to exhibit significant transfer of learning to new situations, both within the same task and between similar tasks. This ability to combine learning and problem solving has produced the most striking experimental results so far in Soar [33,36,62].
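Viewed as caching, the mechanism can be sketched like this. It is a drastic simplification: real chunking builds production conditions from the working-memory elements the subgoal actually tested, rather than keying on a whole situation, and the names here are invented.

```python
# Illustrative sketch of chunking as result caching: when a subgoal
# produces a result, a new "production" keyed on the situation's
# relevant features is stored, so a similar situation later yields
# the result immediately, with no impasse and no subgoal.

chunks = {}  # learned productions: relevant-features -> cached result

def resolve_tie(situation, candidates, evaluate):
    """Break a tie, learning a chunk keyed on the features examined."""
    relevant = frozenset(situation.items())   # features the subgoal touched
    if relevant in chunks:
        return chunks[relevant]               # no impasse, no subgoal
    result = max(candidates, key=evaluate)    # the subgoal's problem solving
    chunks[relevant] = result                 # cache as a new production
    return result
```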

The Soar Architecture
In this section we describe the Soar architecture systematically from scratch, depending on the preview primarily to have established the central role of problem spaces and production systems. We will continue to use the eight puzzle as the example throughout.

The Architecture for Problem Solving
Soar is a problem-solving architecture, rather than just an architecture for symbolic manipulation within which problem solving can be realized by appropriate control. This is possible because Soar accomplishes all of its tasks in problem spaces.
To realize a task as search in a problem space requires a fixed set of task-implementation functions, involving the retrieval or generation of: (1) problem spaces, (2) problem-space operators, (3) an initial state representing the current situation, and (4) new states that result from applying operators to existing states. To control the search requires a fixed set of search-control functions, involving the selection of: (1) a problem space, (2) a state from those directly available, and (3) an operator to apply to the state. Together, the task-implementation and search-control functions are sufficient for problem-space search to occur. The quality and efficiency of the problem solving will depend on the nature of the selection functions.
The task-implementation and search-control functions are usually interleaved. Task implementation generates (or retrieves) new problem spaces, states and operators; and then search control selects among the alternatives generated. Together they completely determine problem-solving behavior in a problem space.
Thus, as Figure 2-1 shows, the behavior of Soar on the eight puzzle can be described as a sequence of such acts. Other important functions must be performed for a complete system: goal creation, goal selection, goal termination, memory management and learning. None of these are included in Soar's search-control or task-implementation acts. Instead, they are handled automatically by the architecture, and hence are not objects of volition for Soar. They are described at the appropriate places below.
The deliberative acts of search control together with the knowledge for implementing the task are the locus of intelligence in Soar. As indicated earlier in Figure 1-1, search-control and task-implementation knowledge is brought to bear on each step of the search. Depending on how much search-control knowledge the problem solver has and how effectively it is employed, the search in the problem space will be narrow and focused, or broad and random. If focused enough, the behavior is routine. Working memory consists of: (1) a context stack that specifies the hierarchy of active goals, problem spaces, states and operators; (2) objects, such as goals and states (and their subobjects); and (3) preferences that encode the procedural search-control knowledge. The processing structure has two parts. One is the production memory, which is a set of productions that can examine any part of working memory, add new objects and preferences, and augment existing objects, but cannot modify the context stack. The second is a fixed decision procedure that examines the preferences and the context stack, and changes the context stack. The productions and the decision procedure combine to implement the search-control functions. Two other fixed mechanisms are shown in the figure: a working-memory manager that deletes elements from working memory, and a chunking mechanism that adds new productions.
Soar is embedded within Lisp. It includes a modified version of the Ops5 production-system language plus additional Lisp code for the decision procedure, chunking, the working-memory manager, and other Soar-specific features. The Ops5 matcher has been modified to significantly improve the efficiency of determining satisfied productions [70]. The total amount of Lisp code involved, measured in terms of the size of the source code, is approximately 255 kilobytes: 70 kilobytes of unmodified Ops5 code, 30

The Working Memory
Working memory consists of a context stack, a set of objects linked to the context stack, and preferences. Each context contains four slots, one for each of the different roles: goal, problem space, state and operator. Each slot can be occupied either by an object or by the symbol undecided, the latter meaning that no object has been selected for that slot. The object playing the role of the goal in a context is the current goal for that context; the object playing the role of the problem space is the current problem space for that context; and so on. The goal G1 is augmented with a desired state, D1, which is itself an object that has its own augmentations (augmentations are directional, so G1 is not in an augmentation of D1, even though D1 is in an augmentation of G1). The attribute symbol may also be specified as the identifier of an object. Typically, however, situations are characterized by a small fixed set of attribute symbols - here, impasse, name, operator, binding, item, and role - that play no other role than to provide discriminating information. In the eight-puzzle representation, a state consists of a set of bindings between tiles and cells. Each binding points to a tile and a cell; each tile points to its value; and each cell points to its adjacent cells. Eight-puzzle operators manipulate only the bindings; the representation of the cells and tiles does not change.
Working memory can be modified by: (1) productions, (2) the decision procedure, and (3) the working-memory manager. Each of these components has a specific function. Productions only add augmentations and preferences to working memory. The decision procedure only modifies the context stack. The working-memory manager only removes irrelevant contexts and objects from working memory.

The Processing Structure
The processing structure implements the functions required for search in a problem space - bringing to bear task-implementation knowledge to generate objects, and bringing to bear search-control knowledge to select between alternative objects. The search-control functions are all realized by a single generic control act: the replacement of an object in a slot by another object from the working memory. The representation of a problem is changed by replacing the current problem space with a new problem space. Returning to a prior state is accomplished by replacing the current state with a preexisting one in working memory. An operator is selected by replacing the current operator (often undecided) with the new one. A step in the problem space occurs when the current operator is applied to the current state to produce a new state, which is then selected to replace the current state in the context. A replacement can take place anywhere in the context stack, e.g., a new state can replace the state in any of the contexts in the stack, not just the lowest or most immediate context but any higher one as well. When an object in a slot is replaced, all of the slots below it in the context are reinitialized to undecided. Each lower slot depends on the values of the higher slots for its validity: a problem space is set up in response to a goal; a state functions only as part of a problem space; and an operator is to be applied at a state. Each context below the one where the replacement took place is terminated, because it depends on the contents of the changed context for its existence (recall that lower contexts contain subgoals of higher contexts). During the elaboration phase, new objects, new augmentations of old objects, and preferences are added to working memory so that better-informed decisions can be made on subsequent steps. This is a monotonic process (working-memory elements are not deleted or modified) that continues until quiescence is reached because there are no more elaborations to be generated.
The monotonic nature of the elaboration phase assures that no synchronization problems will occur during the parallel generation of elaborations. However, because this is only syntactic monotonicity (data structures are not modified or deleted), it leaves open whether semantic conflicts or non-monotonicity will occur.

The replacement of context objects is driven by the decision cycle.
The elaboration phase is encoded in Soar as productions of the form:

If C1 and C2 and ... and Cm, then add A1, A2, ..., An.

The Ci are conditions that examine the context stack and the rest of the working memory, while the Aj are actions that add augmentations or preferences to memory. Condition patterns are based on constants, variables, negations, pattern-ands, and disjunctions of constants (according to the conventions of Ops5 productions). In practice, the elaboration phase reaches quiescence quickly (less than ten cycles); however, if quiescence is not reached after a prespecified number of iterations (typically 100), the elaboration phase terminates and the decision procedure is entered.

Any object in working memory can be accessed as long as there exists a chain of augmentations and preferences from the context stack to the object. An augmentation can be a link in the chain if its identifier appears either in a context or in a previously linked augmentation or preference. A preference can be a link in the chain if all the identifiers in its context fields (defined in Section 2.3) appear in the chain.
This property of linked access plays an important role in working-memory management, subgoal termination, and chunking, by allowing the architecture to determine which augmentations and preferences are accessible from a context, independent of the specific knowledge encoded in elaborations. A production is successfully instantiated if the conjunction of its conditions is satisfied with a consistent binding of variables. There can be any number of concurrently successful instantiations of a production. All successful instantiations of all productions fire concurrently (simulated) during the elaboration phase. The only conflict-resolution principle in Soar is refractory inhibition: an instantiation of a production is fired only once. Rather than having control exerted at the level of productions by conflict resolution, control is exerted at the level of problem solving (by the decision procedure).
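Linked access amounts to graph reachability from the context stack, which can be sketched as follows (the identifier representation is invented for the example):

```python
# Illustrative sketch of linked access: an element is accessible only if
# a chain of links connects it to the context stack, so the architecture
# can find everything reachable from a context without knowing what the
# elements mean. links maps an identifier to the identifiers it mentions.

def accessible(context_ids, links):
    """Return all identifiers reachable from the context stack."""
    reached, frontier = set(context_ids), list(context_ids)
    while frontier:
        ident = frontier.pop()
        for nxt in links.get(ident, ()):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached
```

Anything not in the returned set is invisible to productions and is fair game for the working-memory manager.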

The decision procedure
The decision procedure is executed when the elaboration phase reaches quiescence. It determines which slot in the context stack should have its content replaced, and by which object. This is accomplished by processing the context stack from the oldest context to the newest (i.e., from the highest goal to the lowest one). Within each context, the roles are considered in turn, starting with the problem space and continuing through the state and operator in order. The process terminates when a slot is found for which action is required. Making a change to a higher slot results in the lower slots being reinitialized to undecided, making the processing of lower slots irrelevant. This ordering on the set of slots in the context stack defines a fixed desirability ordering between changes for different slots: it is always more desirable to make a change higher up. The processing for each slot is driven by the knowledge symbolized in the preferences in working memory at the end of the elaboration phase. Each preference is a statement about the selection of an object for a slot (or set of slots). Three primitive concepts are available to make preference statements:

acceptability: A choice is to be considered.
rejection: A choice is not to be made.
desirability: A choice is better than (worse than, indifferent to) a reference choice.
There is an additional preference type that allows the statement that two choices for an operator slot can be explored in parallel. This is a special option to explore parallel processing, where multiple slots are created for parallel operators. For more details, see the Soar manual [30].

Together, the acceptability and rejection preferences determine the objects from which a selection will be made, and the desirability preferences partially order these objects. The result of processing the slot, if successful, is a single object that is: new (not currently selected for that slot); acceptable; not rejected; and more desirable than any other choice that is likewise new, acceptable, and not rejected. The desirability of the object for the slot is specified by the value attribute of a preference, which takes one of seven alternatives. Acceptable and reject cover their corresponding concepts; the others (best, better, indifferent, worse, and worst) cover the ordering by desirability. All assertions about ordering locate the given object relative to a reference object for the same slot. Since the reference object always concerns the same slot, it is only necessary to specify the object. For better, worse, and some indifferent preferences, the reference object is another object that is being considered for the slot, and it is given by the reference attribute of the preference. For best, worst, and the remaining indifferent preferences, the reference object is an abstract anchor point, hence is implicit and need not be given. Consider an example where there are two eight-puzzle operators, named up and left, being considered for state S1 in goal G1. If the identifier for the eight-puzzle problem space is P1, and the identifiers for up and left are O1 and O2, then the following preference says that up is better than left:

(preference ↑object O1 ↑role operator ↑value better ↑reference O2 ↑goal G1 ↑problem-space P1 ↑state S1)

The decision procedure computes the best choice for a slot based on the preferences in working memory and the semantics of the preference concepts, summarized below. A further complication for the decision procedure is incompleteness: the elaboration phase will deliver some collection of preferences.
These can be silent on any particular fact, e.g., they may assert that x is better than y, and that y is rejected, but say nothing about whether x is acceptable or not, or rejected or not. Indeed, an unmentioned object could be better than any that are mentioned. No constraint on completeness is imposed. The semantics of the preference concepts are as follows.

Primitive predicates and functions on objects x, y, z, ...:
current: the object that currently occupies the slot
acceptable(x): x is acceptable
reject(x): x is rejected
(x > y): x is better than y
(x < y): x is worse than y (same as y > x)
(x ~ y): x is indifferent to y
(x >> y): x dominates y, i.e., (x > y) and not (y > x)

Basic properties:
Desirability (x > y) is transitive, but not complete or antisymmetric.
Indifference is an equivalence relationship and substitutes over >: (x > y) and (y ~ z) implies (x > z).
Indifference does not substitute in acceptable, reject, best, and worst: acceptable(x) and (x ~ y) does not imply acceptable(y); reject(x) and (x ~ y) does not imply reject(y); etc.

Impasse:
empty(maximal-choices) and reject(current) ⇒ impasse
not mutually-indifferent(maximal-choices) ⇒ impasse(maximal-choices)

Default assumption:
All preference statements that are not explicitly mentioned and not implied by transitivity or substitution are not assumed to be true.

The eight-puzzle task itself is defined by productions such as the following:

select-eight-puzzle-space: If the current goal is solve-eight-puzzle, then make an acceptable-preference for eight-puzzle as the current problem space.

define-initial-state: If the current problem space is eight-puzzle, then create a state in this problem space based on the description in the goal and make an acceptable-preference for this state.

define-final-state: If the current problem space is eight-puzzle, then augment the goal with a desired state in this problem space based on the description in the goal.

detect-eight-puzzle-success: If the current problem space is eight-puzzle and the current state matches the desired state of the current goal in each cell, then mark the state with success.

The final aspect of the task definition is the implementation of the operators. For a given problem, many different realizations of essentially the same problem space may be possible. For the eight puzzle, there could be twenty-four operators, one for each pair of adjacent cells between which a tile could be moved. In such an implementation, all operators could be made acceptable for each state, followed by the rejection of those that cannot apply (because the blank is not in the appropriate place). Alternatively, only those operators that are applicable to a state could be made acceptable. Another implementation could have four operators, one for each direction in which tiles can be moved into the blank cell: up, down, left, and right. Those operators that do not apply to a state could be rejected.
In our implementation of the eight puzzle, there is a single general operator for moving a tile adjacent to the blank cell into the blank cell. For a given state, an instance of this operator is created for each of the adjacent cells. We will refer to these instantiated operators by the direction they move their associated tile: up, down, left and right. To create the operator instantiations requires a single production, shown in Figure 2-10. Each operator is represented in working memory as an object that is augmented with the cell containing the blank and one of the cells adjacent to the blank. When an instantiated operator is created, an acceptable-preference is also created for it in the context containing the eight-puzzle problem space and the state for which the instantiated operator was created. Since operators are created only if they can apply, an additional production that rejects inapplicable operators is not required.
instantiate-operator: If the current problem space is eight-puzzle and the current state has a tile in a cell adjacent to the blank's cell, then create an acceptable-preference for a newly created operator that will move the tile into the blank's cell.

An operator is applied when it is selected by the decision procedure for an operator role. Selecting an operator produces a context in which productions associated with the operator can execute (they contain a condition that tests that the operator is selected). Whatever happens while a given operator occupies an operator role comprises the attempt to apply that operator. Operator productions are just elaboration productions, used for operator application rather than for search control. They can create a new state by linking it to the current context (as the object of an acceptable-preference), and then augmenting it. To apply an instantiated operator in the eight puzzle requires the two productions shown in Figure 2-11. When the operator is selected for an operator slot, production create-new-state will apply and create a new state with the tile and blank in their swapped cells. The production copy-unchanged-binding copies pointers to the unchanged bindings between tiles and cells.

create-new-state:
If the current problem space is eight-puzzle, then create an acceptable-preference for a newly created state, and augment the new state with bindings that have switched the tiles from the current state that are changed by the current operator.

copy-unchanged-binding:
If the current problem space is eight-puzzle and there is an acceptable-preference for a new state, then copy from the current state each binding that is unchanged by the current operator.

The seven productions so far described comprise the task-implementation knowledge for the eight puzzle.
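A minimal sketch of the two application productions, under the same assumed representation (a dictionary mapping cells to tiles, 0 for the blank); the function name and structure are illustrative, not Soar's actual machinery:

```python
# Minimal sketch of the two operator-application productions, under an
# assumed representation (a dict mapping cells to tiles, 0 = blank).
# The function name and structure are illustrative, not Soar machinery.

def apply_operator(state, op):
    new_state = {}
    # create-new-state: switch the two bindings changed by the operator,
    # putting the moved tile in the blank's cell and the blank in its place.
    new_state[op["blank"]] = state[op["tile-cell"]]
    new_state[op["tile-cell"]] = 0
    # copy-unchanged-binding: copy every binding the operator left alone.
    for cell, tile in state.items():
        if cell not in (op["blank"], op["tile-cell"]):
            new_state[cell] = tile
    return new_state
```

Splitting application into "switch the changed bindings" and "copy the unchanged bindings" mirrors the division of labor between the two productions.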
With no additional productions, Soar will start to solve the problem, though in an unfocused manner. Given enough time it will search until a solution is found. To make the behavior a bit more focused, search-control knowledge can be added that guides the selection of operators. Two simple search-control productions are shown in Figure 2-12. Avoid-undo will avoid operators that move a tile back to its prior cell.
Mea-operator-selection is a means-ends-analysis heuristic that prefers the selection of an operator if it moves a tile into its desired cell. This is not a fool-proof heuristic rule, and will sometimes lead Soar to make an incorrect move.

avoid-undo:
If the current problem space is eight-puzzle, then create a worst-preference for the operator that will move the tile that was moved by the operator that created the current state.

mea-operator-selection:
If the current problem space is eight-puzzle and an operator will move a tile into its cell in the desired state, then make a best-preference for that operator.
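The two search-control productions can be sketched as functions that return a (preference, operator) pair or nothing. Operator objects follow the assumed {"blank": cell, "tile-cell": cell} form; names and calling conventions are illustrative assumptions, not Soar's interface.

```python
# Hedged sketch of the two search-control productions as predicates that
# emit preferences. Operators follow an assumed
# {"blank": cell, "tile-cell": cell} form; everything else is illustrative.

def avoid_undo(candidate, last_op):
    """Worst-preference for the operator that undoes the previous move."""
    if last_op and candidate["blank"] == last_op["tile-cell"] \
            and candidate["tile-cell"] == last_op["blank"]:
        return ("worst", candidate)
    return None

def mea_operator_selection(candidate, state, desired):
    """Best-preference if the operator moves a tile into its desired cell."""
    tile = state[candidate["tile-cell"]]
    if desired.get(candidate["blank"]) == tile:
        return ("best", candidate)
    return None
```

Note that each production is independent: either one can be present without the other, and each simply links an aspect of task structure to a preference.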

Impasses and Subgoals
When an impasse occurs while attempting to make progress in attaining a goal, returning to the elaboration phase cannot deliver additional knowledge that might remove the impasse, for elaboration has already run to quiescence. Instead, a subgoal and a new context are created for each impasse. By responding to an impasse with the creation of a subgoal, Soar is able to deliberately search for more information that can lead to the resolution of the impasse. All types of knowledge, task-implementation and search-control, can be encoded in the problem space for a subgoal.
If a tie impasse between objects for the same slot arises, the problem solving to select the best object will usually result in the creation of one or more desirability preferences, making the subgoal a locus of search-control knowledge for selecting among those objects. A tie impasse between two objects can be resolved in a number of ways: one object is found to lead to the goal, so a best-preference is created; one object is found to be better than the other, so a better-preference is created; no difference is found between the objects, so an indifferent-preference is created; or one object is found to lead away from the goal, so a worst-preference is created. A number of different problem solving strategies can be used to generate these outcomes, including: further elaboration of the tied objects (or the other objects in the context) so that a detailed comparison can be

Default Knowledge for Subgoals
An architecture provides a frame within which goal-oriented action takes place. What action occurs depends on the knowledge that the system has. Soar has a basic complement of task-independent knowledge about its own operation and about the attainment of goals within it that may be taken as an adjunct to the architecture.

Make all operators acceptable. If there is a fixed set of operators that can apply in a problem space, they should be candidates for every state. This is accomplished by creating acceptable-preferences for those operators that are directly linked to the problem space.

No operator retry.
Given the deterministic nature of Soar, an operator will create the same result whenever it is applied to the same state. Therefore, once an operator has created a result for a state in some context, a preference is created to reject that operator whenever that state is the current state for a context with the same problem space and goal.
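An assumed sketch of this default: record each (goal, problem space, state, operator) combination when it runs, and emit a reject-preference if the same combination comes up again. The bookkeeping shown here is illustrative, not the architecture's.

```python
# Assumed sketch of the no-operator-retry default. Since Soar is
# deterministic, re-applying an operator to the same state in the same
# problem space and goal can only reproduce a known result, so it is
# rejected the second time around.

applied = set()

def no_operator_retry(goal, space, state, op):
    key = (goal, space, state, op)
    if key in applied:
        return ("reject", op)  # already produced its result for this state
    applied.add(key)
    return None
```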

Diagnose impasses. When an impasse occurs, the architecture creates a new goal and context that provide
some specific information about the nature of the impasse. From there, the situation must be diagnosed by search-control knowledge to initiate the appropriate problem-solving behavior. In general this will be task-dependent, conditional on the knowledge embedded in the entire stack of active contexts. For situations in which such task-dependent knowledge does not exist, default knowledge exists to determine what to do.
1. Tie impasse. Assume that additional knowledge or reasoning is required to discriminate the items that caused the tie. The selection problem space (described below) is made acceptable to work on this problem. A worst-preference is also generated for the problem space, so that any other proposed problem space will be preferred.
2. Conflict impasse. Assume that additional knowledge or reasoning is required to resolve the conflict and reject some of the items that caused the conflict. The selection problem space is also the appropriate space and it is made acceptable (and worst) for the problem space role.

3. No-change impasse.
a. For goal, problem space and state roles. Assume that the next higher object in the context is responsible for the impasse, and that a new path can be attempted if the higher object is rejected. Thus, the default action is to create a reject-preference for the next higher object in the context or supercontext. The default action is taken only if a problem space is not selected for the subgoal that was generated because of the impasse. This allows the default action to be overridden through problem solving in a problem space selected for the no-change impasse. If there is a no-change impasse for the top goal, problem solving is halted because there is no higher object to reject and no further progress is possible.
There has been little experience with conflict subgoals so far. Thus, little confidence can be placed in the treatment of conflicts and they will not be discussed further.

b. For operator role. Such an impasse can occur for multiple reasons. The operator could be too complex to be performed directly by productions, thus needing a subspace to implement it, or it could be incompletely specified, thus needing to be instantiated. Both of these require task-specific problem spaces, and no appropriate default action based on them is available. A third possibility is that the operator is inapplicable to the given state, but that it would apply to some other state. This does admit a domain-independent response, namely attempting to find a state in the same problem space to which the operator will apply (operator subgoaling). This is taken as the appropriate default response.

4. Rejection impasse.
The assumption is the same as for (non-operator) no-change subgoals: the higher object is responsible and progress can be made by rejecting it. If there is a rejection impasse for the top problem space, problem solving is halted because there is no higher object.

The selection problem space. This space is used to resolve ties and conflicts. The states of the selection space contain the candidate objects from the supercontext (the items associated with the subgoal). Figure 2-14 shows the subgoal structure that arises in the eight puzzle when there is no direct search-control knowledge to select between operators (such as the mea-operator-selection production). Initially, the problem solver is at the upper-left state and must select an operator. If search control is unable to uniquely determine the next operator to apply, a tie impasse arises and a subgoal is created to do the selection. In that subgoal, the selection problem space is used.
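What the selection space accomplishes can be sketched under assumptions: each tied operator is tried in a fresh incarnation of the task space, the resulting state is scored, and the top scorer gets a best-preference back in the tied context. Here `apply_op` and `score` stand in for the task space's operator-application and evaluation knowledge; the names are hypothetical.

```python
# Sketch (assumptions throughout) of the selection space's effect: evaluate
# each tied operator in a copy of the task state, then prefer the winner.

def evaluate_ties(state, tied_ops, apply_op, score):
    best_op, best_val = None, None
    for op in tied_ops:
        val = score(apply_op(state, op))  # one evaluation subgoal per tie
        if best_val is None or val > best_val:
            best_op, best_val = op, val
    return [("best", best_op)]
```

The returned best-preference resolves the tie impasse in the supercontext, terminating the selection subgoal.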

Chunking
Chunking is a learning scheme for organizing and remembering ongoing experience automatically on a continuing basis. It has been much studied in psychology [7, 12, 43, 50] and it was developed into an explicit learning mechanism within a production-system architecture in prior work [35, 61, 63]. The current chunking scheme in Soar is directly adapted from this latter work. As defined there, it was a process that acquired chunks that generated the results of a goal, given the goal and its parameters. The parameters of a goal were

chunk-production that will later control search; a goal to apply an operator to a state leads to the creation of a chunk-production that directly implements the operator. The occasions of subgoals are exactly the conditions where Soar requires learning, since a subgoal is created if and only if the available knowledge is insufficient
for the next step in problem solving. The subgoal is created to find the necessary knowledge, and the chunking mechanism stores away the knowledge so that under similar circumstances in the future, the knowledge will be available. Actually, Soar learns what is necessary to avoid the impasse that led to the subgoal, so that henceforth a subgoal will be unnecessary, as opposed to learning to supply results after the subgoal has been created. As search-control knowledge is added through chunking, performance improves via a reduction in the amount of search. If enough knowledge is added, there is no search; what is left is an efficient algorithm for a task. In addition to reducing search within a single problem space, chunks can completely eliminate the search of entire subspaces whose function is to make a search-control decision or perform a task-implementation function (such as applying an operator or determining the initial state of the task).

The chunking mechanism
A chunk production summarizes the processing in a subgoal. The actions generate those working-memory elements that eliminated the impasse responsible for the subgoal (and thus terminated the subgoal). The conditions test those aspects of the current task that were relevant to those actions being performed. Once a trace is created, it needs to be stored on a list associated with the goal in which the production fired.

However, determining the appropriate goal is problematic in Soar because elaborations can execute in parallel
for any of the goals in the stack. The solution comes from examining the contexts tested by the production.
The lowest goal in the hierarchy that is matched by conditions of the production is taken to be the one affected by the production firing. The production will affect the chunks created for that goal and possibly, as we shall see shortly, the higher goals. Because the production firing is independent of the lower goals -it would have fired whether they existed or not -it will have no effect on the chunks built for those goals. The others are then recursively analyzed as if they were results, to determine the pre-subgoal elements that were responsible for their creation.
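This backtracing step can be modeled in a simplified, assumed form: starting from a subgoal's results, walk the stored production traces backwards; elements that existed before the subgoal become chunk conditions, and elements created inside it are expanded recursively. The data shapes here are illustrative.

```python
# Assumed, simplified model of backtracing a subgoal's results to the
# pre-subgoal working-memory elements that become chunk conditions.

def collect_conditions(results, traces, pre_subgoal):
    """traces maps an element to the elements its creating production tested."""
    conditions, seen, stack = set(), set(), list(results)
    while stack:
        elem = stack.pop()
        if elem in seen:
            continue
        seen.add(elem)
        if elem in pre_subgoal:
            conditions.add(elem)                # existed before the subgoal
        else:
            stack.extend(traces.get(elem, ()))  # created inside: recurse
    return conditions
```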

Earlier versions of chunking in Soar [36] implicitly embodied the assumption that problem solving was perfect - if a rule fired in a subgoal, then that rule must be relevant to the generation of the subgoal's results.
If there is a condition that tests for the absence of a working-memory element, a copy of that negated condition is saved in the trace with its variables instantiated from the values bound elsewhere in the production.

Once the evaluation subgoal is generated, the production eval*select-role-operator fires and creates acceptable-preferences for the original task problem space (P1), the original task state (S1), and the operator being evaluated (O1). The production also augments goal G3 with the task goal's desired state (D1). Many of the production's conditions match working-memory elements that are a part of the definition of the evaluate-object operator, and thus existed prior to the creation of subgoal G3. These test that the subgoal is to implement the evaluate-object operator, and they access identifiers of super-objects so that the identifiers can be included in the preferences generated by the actions of the production. Following the selection of P1 and S1, a production instantiation fires to generate a best-preference for operator O1 for this specific goal, problem space, and state. This production firing is not shown because it does not add new conditions to the chunk.
The problem solving continues with the selection of O1 and the generation of a new state (S2). The unchanged bindings are copied by a rule that is not shown because it does not affect the subgoal's result. S2 is selected and then evaluated by production eval*state-plus-one, which augments object E1 with the value of the evaluation. This augmentation is a result of the subgoal because object E1 is linked to the state in the parent context. Immediately afterwards, in the same elaboration phase, a production generates a reject-preference for operator O4, the evaluate-object operator. This production has no effect on the chunk built for subgoal G3 because it looks only at higher contexts. Once the reject-preference is created, operator O4 is rejected, another operator is selected, the no-change impasse is eliminated, subgoal G3 is terminated, and a chunk is built. Only certain of the augmentations of the objects are included in the chunk; namely, those that played an explicit role in attaining the result. For instance, only portions of the state (S1) and the desired state (D1) are included. Even in the substructure of the state, such as binding B2, its tile (T2) has only its identifier saved, and not its value (6), because the actual value was never tested. The critical aspect to be tested in the chunk is that the tile appears as a tile-augmentation of both bindings B2 and DB1 (a binding in the desired state, D1).
The exact value of the tile is never tested in the subgoal, so it is not included in the chunk. The conditions created from these working-memory elements will test: that a tile (in this case T2) in the current state (S1) is in a cell adjacent to the cell containing the blank; and that the cell containing the blank is the cell in which the tile appears in the desired state. In other words, the chunk fires whenever the evaluate-object operator is selected in the selection problem space and the operator being evaluated will move a tile into place. The chunk is displayed in a notation similar to that used for Ops5 productions. Each production is a list, consisting of a name, the conditions, the symbol "-->", and the actions. Each condition is a template to be matched against working-memory elements. Symbols in a production of the form "<...>" (e.g., <G1>) are variables; all others are constants. The actions are templates for the generation of working-memory elements. In building the chunk, all identifiers from the original working-memory elements have been replaced by variables. The constants in the working-memory elements, those symbols that have no further augmentations (evaluate-object, eight-puzzle, blank), remain as constants in the conditions. Identifier variablization is also responsible for the additional negation predicates in the specification of objects <S1> and <B2>, such as { <> <B1> <B2> } in object <S1>. This is a conjunctive test that succeeds only if <B2> can be bound to a value that is not equal to the value bound to <B1>, thus forcing the objects that are bound to the two variables to be different.
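Variablization can be sketched under assumed conventions: symbols shaped like identifiers (a letter prefix plus digits, e.g. "S1") become variables, other symbols stay constants, and distinct identifiers get pairwise inequality tests analogous to { <> <B1> <B2> }. The identifier syntax and data shapes are assumptions for the sketch.

```python
# Variablization sketched under assumed conventions: identifier-shaped
# symbols become variables, constants stay, and distinct identifiers get
# pairwise "not equal" tests analogous to { <> <B1> <B2> }.
import re

def variablize(wmes):
    ids, conditions = {}, []
    def term(sym):
        if isinstance(sym, str) and re.fullmatch(r"[A-Z]+\d+", sym):
            return ids.setdefault(sym, f"<{sym}>")  # identifier -> variable
        return sym                                  # constant stays
    for wme in wmes:
        conditions.append(tuple(term(s) for s in wme))
    vs = list(ids.values())
    # force different variables to bind to different objects
    neq = [(a, "<>", b) for i, a in enumerate(vs) for b in vs[i + 1:]]
    return conditions, neq
```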

Discussion
The Soar architecture has been fully described in the previous section. However, the consequences of an architecture are hardly apparent on surface examination. The collection of tasks that Soar has accomplished, exhibited in Figure 1, provides some useful information about viability and scope. However, simply that Soar can perform these tasks -that the requisite additional knowledge can be added -is not entirely surprising.
The mechanisms in Soar are variants of mechanisms that have emerged as successful over the history of AI research. Soar's accomplishing these tasks does provide answers to other questions as well. We take up some of these here. This discussion also attempts to ensure that Soar's mechanisms and their operation are clear.
We limit ourselves to aspects that will shed light on the architecture. The details of Soar's behavior on specific tasks can be found in the references.

The first question we take up is what Soar is like when it runs a real task consisting of multiple aspects with
varying degrees of knowledge. The second question is how Soar embodies the weak methods, which form the foundation of intelligent action. The third question involves learning by chunking.

PDP-11 computers [3, 41]. R1-Soar is an implementation in Soar of a system that exhibits about 25% of the
functionality of R1, using the same knowledge as obtained from R1's Ops5 rules [65, 75]. This is a big enough fraction of R1 to assure that extension to a complete version would be straightforward, if desired. The part covered includes the most involved activity of the system, namely, the assignment of modules to backplanes, taking into account requirements for power, cabling, etc.
R1-Soar was created by designing a set of problem spaces for the appropriate subpart of the configuration task. The problem spaces were added to the basic Soar system (the architecture plus the default knowledge, as described in the previous section). No task-dependent search-control knowledge was included. The resulting system was capable of accomplishing the configuration subtask, although with substantial search.

R1-Soar's behavior was initially explored by adding various amounts of search control and by turning chunking on and off. Later experiments were run with variations in the problem spaces and their organization.
Thus, R1-Soar is a family of systems, used to explore how to combine knowledge and problem solving.
In the eight puzzle there was a single operator, which was realized entirely by productions within a single problem space. However, the configuration task is considerably more complicated. In an extended version of R1-Soar [75], which covered about 25% of R1 (compared to about 16% in the initial version [65]), there were thirty-four operators. Twenty-six of the operators could be realized directly by productions, but eight were complex enough to require implementation in additional problem spaces. (Indeed, a revision of R1 is underway at DEC that draws on the problem structure developed for R1-Soar [76].) Figure 3-3 shows the nine task spaces used in the extended version of R1-Soar. This structure, which looks like a typical task-subtask hierarchy, is generated by the implementation of complex operators. In operation, of course, specific instances of these problem spaces were created, along with instances of the selection problem space. Thus,

Weak Methods
Viewed as behavior, problem-solving methods are coordinated patterns of operator applications that attempt to attain a goal. Viewed as behavioral specifications, they are typically given as bodies of code that can control behavior for the duration of the method, where a selection process determines which method to use for a given task.

14 These runs took about 29, 4 and 2.5 minutes respectively on a Symbolics 3600 running at approximately one decision cycle per second. Each decision cycle comprises about 8 production firings spread over two cycles of the elaboration phase (because of the parallel firing of rules).

(Table: number of components and decisions per task, T1-T15.)

In Soar, several search-control productions may contribute to the behavior of a weak method, but each is independent, providing links between some aspect of task structure and preferences for action. For instance, a depth-limited lookahead has one production that deals with the evaluation preferences and one that deals with enforcing the depth constraint. Soar would produce appropriate (though different) behavior with any combination of these productions. Another important determiner of a method may be specialized task structure, rather than any deliberate responses encoded in search control. As a simple instance, if a problem space has only one operator, which generates new states that are candidates for attaining the task, then generate-and-test behavior is produced, without any search control in addition to that defining the task.
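The generate-and-test observation can be made concrete with a sketch: given a single generator operator and a goal test, selection trivially picks the one operator each cycle. Everything here (names, the step limit) is illustrative.

```python
# Sketch of generate-and-test arising from task structure alone: with one
# operator and a goal test, search control reduces to always selecting it.

def generate_and_test(initial, generate, is_goal, limit=1000):
    state = initial
    for _ in range(limit):
        if is_goal(state):
            return state
        state = generate(state)  # the only operator is always selected
    return None
```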

The methods listed in Figure 3-4 have been realized in Soar. For some methods, such as analogy by implicit generalization and simple abstraction planning, the method was realized for a single task, and more general forms are currently under investigation.
The descriptions of the weak methods in Figure 3-4 are extremely abbreviated, dispensing with the operating environment, initial and terminating conditions, side constraints, and degenerate cases. All these things are part of a full specification and sometimes require additional (independent) control productions. Figure 3-5 shows graphically the structural relationships among the weak methods implemented in Soar 2 [29]. The common task structure and knowledge forms the trunk of a tree, with branches occurring when there is different task structure or knowledge available, making each leaf in the tree a different weak method. Each of the additions as one goes down the tree is an independent control production.
These simple schemes are more than just a neat way to specify some methods. The weak methods play a central role in attaining intelligence, being used whenever the situation becomes knowledge lean. This occurs in all situations of last resort, where the prior knowledge, however great, has finally been used up without attaining the task. It also occurs in all new problem spaces, which are necessarily knowledge lean. The weak methods are also the essential drivers of knowledge acquisition. Chunking necessarily implies that there exists some way to attain goals before the knowledge has been successfully assimilated (i.e., before it has been chunked). The weak methods provide this way. Finally, there is no need to learn the weak methods themselves as packaged specifications of behavior. The task descriptions involved must be acquired, along with the linkage of the task descriptions to actions. But these linkages are single isolated productions. Once this happens, behavior follows automatically. Thus, this is a particularly simple acquisition framework that avoids any independent stage of program synthesis.

Learning
The operation of the chunking mechanism was described in detail in the previous section. We present here a picture of the sort of learning that chunking provides, as it has emerged in the explorations to date. We have no indication yet about where the limits of chunking lie in terms of its being a general learning mechanism [36]. Figure 3-6 provides a demonstration of the basic effects of chunking, using the eight puzzle [33]. The left-hand column (no learning) shows the moves made in solving the eight puzzle without learning, using the representation and heuristics described in the prior section (the evaluation function was used rather than the mea-operator-selection heuristic). As described in Figures 2-14 and 2-15, Soar repeatedly gets a tie impasse between the available moves, goes into the selection problem space, evaluates each move in an incarnation of the task space, chooses the best alternative, and moves forward.

Figure 3-6: Learning in the eight puzzle [33].

Caching, within-trial transfer and across-trial transfer
If Soar is run again after it has completed its with-learning trial, column 3 (after learning) results. All of the chunks to be learned in this task have been learned during the one with-learning trial, so Soar always knows which move to make. This is the direct effect of practice - the use of results cached during earlier trials. The number of states examined (10) now reflects the demands of the task, not the demands of finding the solution.
This improvement depends on the original evaluation function being an accurate measure of progress to the goal. Chunking eliminates the necessity for the look-ahead search, but the path Soar takes to the goal will still be determined by the evaluation function cached in the chunks.

In bottom-up chunking, chunks were built for a goal only if no subgoals occurred. Enough runs with bottom-up chunking will yield the same results as all-at-once chunking (which was used in both the eight puzzle and initial R1-Soar cases). Bottom-up chunking has the advantage of tending to create only the chunks that have a greater chance of being repeatedly used. The higher up in the subgoal hierarchy (measured from the bottom, not the top), the more specific a chunk becomes - it performs a larger proportion of the task - and the less chance it has to be used [50]. Thus, in R1-Soar all-at-once chunking will create many productions that will never be evoked again in any but identical reruns of the same task.

(Table: within-trial and across-trial transfer for tasks T2-T13.)

cross-situational applicability. But even so, we see clearly that the transfer action comes from the lowest level chunks (the first pass), which confirms theoretical expectations that they have the most generality. And, more globally, learning and performance always go together in Soar in accomplishing any task.

Chunking, generality, and representation
Chunking is a learning scheme that integrates learning and performance. Fundamentally, it simply records problem-solving experience. Viewed as knowledge acquisition, it combines the existing knowledge available for problem solving with knowledge of results in a given problem space, and converts it into new knowledge available for future problem solving. Thus it is strongly shaped by the knowledge available. This integration is especially significant with respect to generalization -to the transfer of chunks to new situations (e.g., as documented above). Generalization occurs in two ways in Soar chunking. One is variablization (replacing identifiers with variables), which makes Soar respond identically to any objects with the same description (attribute-value augmentations). This generalization mechanism is the minimum necessary to get learning at all from chunking, for most identifiers will never occur again outside of the particular context in which they were created (e.g., goals, states, operator instantiations).