ISSUES IN SYMBOL PROCESSING



ABSTRACT
The notion of mind as symbol processor is fundamental to AI and congitive science, but some connectionists are now arguing against it. Eliminative connectionism challenges the validity of formal symbol manipulation as a level of mental description. We review some of the claims that have been made, and argue that most connectionist models, especially those constructed by learning algorithms, are operating at the level of pattern classifiers. Their (rather limited) success using non-symbolic representations demonstrates that they have not yet even approached the tasks which symbol processing models attempt to solve.
Continued progress in connectionist research may require reimplementation rather than rejection of the symbolic level.

Introduction
The first thirty years of AI research proceeded on the entirely plausible assumption that the mind was a symbol processor. This idea has recently been challenged by a group of cognitive scientists known as connectionists, who construct neural network models which they claim have no equivalent description at the formal symbolic level. Rumelhart and McClelland (1986a) and McClelland and Rumelhart (1986) provide a good introduction to the connectionist approach, also known as "parallel distributed processing." The controversy surrounding connectionism has been heating up recently in response to three important papers. Smolensky (in press) provides the definitive statement of the hypotheses underlying connectionism. Pinker and Prince (1987) and Fodor and Pylyshyn (1987) challenge the connectionist view, both by attacking general claims that others have made about these networks and by criticizing particular models that have appeared in the literature. A review of the current status of connectionist symbol processing will allow the interested reader to follow the debate as it unfolds.
As in Pinker and Prince (1987, pp. 4-7), we divide connectionism into three schools. "Implementational connectionism" is concerned with how massively parallel architectures might implement the classical notion of symbol processing. Examples include Touretzky and Hinton's neural network production system interpreter (Touretzky and Hinton, 1985) and Ballard's connectionist implementation of resolution (Ballard, 1986). "Eliminative connectionism," on the other hand, denies the validity of symbolic-level descriptions. (Smolensky's paper serves as a sort of "Eliminativist Manifesto.") Eliminativists believe that to accurately explain what goes on in the mind one must shift to a hypothetical sub-symbolic level, more abstract than the neural level, but also fundamentally different from (and not merely an implementation of) symbolic-level computation. Finally, "revisionist-symbol-processing connectionism" is suggested by Pinker and Prince as a middle ground where discoveries might lead to fundamental changes in our understanding of symbol processing without forcing us to abandon the classical commitment to symbolic-level descriptions. This paper is primarily concerned with the eliminative position, not because it is preferred, but because, as the most radical version of connectionism, it is most at odds with the classical account of intelligence.

The Symbolic-Level Paradigm
The symbolic-level paradigm underlies most research in AI and cognitive science. Language, commonsense reasoning, and conscious problem solving can all be described at this level in terms of structures, composed of symbols, that are manipulated by formal rules. Parse trees, semantic nets, frames, scripts, and axiom sets are examples of composite symbol structures. Their manipulation can be specified in various ways, some of which are computational, such as production rules, theorem provers, and Lisp functions, and others which are descriptive but not necessarily computational, such as the rules linguists write to describe syntactic or phonological processes.
The claim made by the symbolic-level paradigm is that intelligent behavior can be adequately explained purely in terms of formal operations on symbol structures. Newell (1980a) calls this the Physical Symbol System Hypothesis.
In other words, the mind contains symbol structures for concepts, goals, intentions, memories, and so forth, and intelligence derives from the effective manipulation of these structures. Eliminative connectionism denies this. Before getting into what the connectionists would have in place of symbols and rules, I should emphasize a key point in the definition of the symbolic-level paradigm. It isn't just a claim that the mind works by manipulating symbols; it is a claim that the structures the mind manipulates can be directly identified with the elements of mental life: they are words, thoughts, percepts; not arbitrary, meaningless atoms.
Consider a thermostat with setpoint T0 whose behavior is governed by the following rule:

IF T < T0
THEN turn-on(furnace)
ELSE turn-off(furnace)

This rule constitutes a symbolic-level theory of thermostats. It is expressed in terms of ambient temperature, setpoint, and furnace activity: the language of the thermostatic domain. It does not refer to the individual atoms that make up the thermostat, or to the motions of particles in the atmosphere of the room. It is a formal theory because it can be implemented, and it accurately predicts the thermostat's behavior. According to the symbolic-level paradigm, mental processes can also be explained by formal theories, without reference to phenomena such as neuron firings that exist only at a lower level of description.
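The rule can be rendered directly as a few lines of code; the following is a minimal sketch in Python (the function name and return values are illustrative):

```python
# Symbolic-level model of the thermostat: the rule mentions only
# domain-level quantities (ambient temperature T, setpoint T0),
# never the device's atoms or the particles in the room.
def thermostat(T, T0):
    """Apply the rule IF T < T0 THEN turn-on ELSE turn-off."""
    if T < T0:
        return "turn-on(furnace)"
    else:
        return "turn-off(furnace)"
```

That the rule can be implemented this way, and correctly predicts the device's behavior, is what makes it a formal theory in the sense used above.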

Are There Rules?
If there are explicit representations of rules in the head, there must be an interpreter to execute the rules as thinking proceeds. Conscious problem solving behavior does appear to be rule-based. For example, John Anderson's ACT* model, which learns new production rules as it gains experience at tasks such as proving geometry theorems, offers a good account of how humans behave when performing the same tasks (Anderson, 1983; Anderson, in press). But it is important to distinguish between conscious, deliberate behavior and intuitive behavior. The latter is not explainable by introspection, nor is it decomposable into consciously accessible steps such as occur in problem solving.
Intuitive-level phenomena certainly include such things as vision and motor control, which operate almost entirely below the level of conscious thought. Language and common sense reasoning also proceed largely at the subconscious level, and appear to be intuitive. Smolensky suggests that our linguistic facility actually serves as the rule interpreter for conscious problem solving, and that what we perceive as consciousness is a series of snapshots of the state of an intuitive processor that is not itself rule-based. Note, however, that rule-based behavior isn't necessarily conscious. Newell (1980b) used the production rule formalism to speculate about mental implementations of the Harpy speech understanding system.
In linguistics, the goal has been to explain phenomena by deriving the most economical set of rules that account for the data. Linguists shy away from claiming that these formal rules, with their associated interpreter, are what is actually in the head (Stabler, 1983; Thompson, 1983). However, models of linguistic development (as opposed to competence) are often phrased in terms of rule acquisition and revision, which may require an explicit representation for rules. The symbolic-level paradigm is about the description of behavior in terms of symbols and rules; it says nothing about the explicit representation of rules. In the case of the thermostat, which clearly has no rule interpreter inside it, the physical structure of the device induces certain causal relationships between the ambient temperature, setpoint, and furnace activity which are accurately summarized by the rule we gave previously. That is all the symbolic-level paradigm requires.

Symbolic-Level Connectionism
Some connectionist models identify symbols with particular units. These are known as localist models, to distinguish them from the distributed models that are the focus of this paper. Cottrell (1985) and Waltz and Pollack (1985) use a localist representation in which individual units stand for words or word senses; Shastri (1985) uses units to denote classes and properties in an inheritance hierarchy; Selman (1985) and Fanty (1986) use units to denote input atoms and grammatical tokens in networks that parse context-free languages. Localist models may have interesting dynamical properties, e.g., when units denoting competing word senses inhibit each other, they are in effect "fighting" to settle on the most plausible interpretation of the input. In Pollack's model of the garden path sentence "The astronomer married the star," the ASTRONOMER unit causes the HEAVENLY-BODY unit to have a higher initial activation than MOVIE-STAR, but HEAVENLY-BODY eventually loses out due to constraints imposed by MARRIED, much as humans revise their initial interpretation the first time they hear the sentence.
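This kind of settling can be caricatured in a few lines. The following is a toy sketch of two mutually inhibiting sense units; the update rule and parameter values are invented for illustration and are not taken from Pollack's model:

```python
# Two localist word-sense units compete: each receives external input
# and inhibits its rival; the one that settles with higher activation
# "wins" the interpretation. (Illustrative dynamics only.)
def settle(input_a, input_b, inhibition=0.5, decay=0.1, steps=50):
    a = b = 0.0
    for _ in range(steps):
        a_next = a + input_a - inhibition * b - decay * a
        b_next = b + input_b - inhibition * a - decay * b
        a, b = max(a_next, 0.0), max(b_next, 0.0)  # activations stay non-negative
    return a, b

# Give MOVIE-STAR slightly more support (as from MARRIED) and it
# eventually suppresses HEAVENLY-BODY despite the close start.
heavenly_body, movie_star = settle(input_a=0.2, input_b=0.3)
```

Even a small difference in input support is amplified by the mutual inhibition until one unit dominates, which is the "fighting" behavior described above.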
Fodor and Pylyshyn's criticism of the localist approach focuses on the inability to compose symbols when they are tied directly to processing units. They point out that although one can designate individual units to stand for P, Q, and P&Q, the fact that P&Q references P cannot be expressed. An excitatory connection from P&Q to P would allow the network to infer P whenever P&Q is asserted to be true. But the network cannot decompose P&Q to get P, nor can it compose new structures such as P&Q&R from already existing ones. But Fodor and Pylyshyn go too far when they claim distributed models suffer the same difficulty; Touretzky (1986) and Touretzky and Geva (1987) provide counterexamples.
Most connectionists view the localist approach as a temporary compromise that allows them to conveniently explore certain dynamic constraint satisfaction phenomena. When the full power of the classical symbol processing model has been implemented in a distributed connectionist architecture, the localist approach may no longer be attractive.

The Sub-symbolic Paradigm
In distributed connectionist models, symbols and symbol structures are represented by patterns of activity over a collection of units, rather than by individual units. Symbols are then points in a high-dimensional metric space, with a natural similarity measure being the dot product. Although, as Fodor and Pylyshyn note (p. 58), one may impose arbitrary similarity measures on conventional symbol systems, in the connectionist case the similarity effects are rooted in the causal structure of the model.
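As an illustration of this similarity measure (the activity vectors below are made up, not drawn from any published model):

```python
# Distributed patterns are points in a high-dimensional space; the dot
# product of two activity vectors serves as a similarity measure.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Hypothetical binary activity patterns over eight units.
cat = [1, 0, 1, 1, 0, 0, 1, 0]
dog = [1, 0, 1, 0, 0, 1, 1, 0]   # shares several active units with "cat"
car = [0, 1, 0, 0, 1, 1, 0, 1]   # almost disjoint from both
```

Overlapping patterns ("cat", "dog") yield a larger dot product than largely disjoint ones ("cat", "car"), and, unlike a similarity metric bolted onto a symbol system after the fact, this overlap directly influences how the network processes the patterns.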
A large class of distributed connectionist models are concerned with pattern classification or pattern transformation. For example, Sejnowski and Rosenberg's celebrated NETtalk model maps input patterns that represent a seven-letter window of text to output patterns that represent a phoneme (Sejnowski and Rosenberg, 1987). Rumelhart and McClelland's verb learning model maps phonemic representations of present tense verbs to phonemic representations of the corresponding past tense, e.g., "hug" to "hugged" and "go" to "went" (Rumelhart and McClelland, 1986b). The weights in both of these models are derived by connectionist learning procedures from repetitive exposure to example inputs. Rumelhart and McClelland used a version of the perceptron learning algorithm, while Sejnowski and Rosenberg used the more recent back propagation algorithm of Rumelhart, Hinton, and Williams (1986).
Another large class of models perform constraint satisfaction by relaxation, such as Hopfield nets (Hopfield, 1982), the Boltzmann machine (Hinton and Sejnowski, 1986), and harmony theory (Smolensky, 1986). Boltzmann machines and harmony theory are stochastic models that relax by simulated annealing, in analogy with statistical mechanics. These networks also have learning algorithms.
An observation connectionists are fond of making is that there are no explicit rules in distributed models: all the knowledge is in the connection strengths. Since individual units are not meaningful as symbols (only activity patterns taken as a whole are meaningful), the connections between units cannot be regarded as symbolic-level rules, and the connectionist model's behavior is not rule-based (Smolensky, in press; Derthick and Plaut, 1986).
The natural (but incorrect) counter to this argument is that it could be made about any symbol manipulation system if we choose too low a level of description, e.g., describing a digital computer by the behaviors of individual transistors. The flaw in this reasoning is that the computer's circuitry is constrained a priori to implement a logically designed instruction set. Therefore one can abstract away from the transistor level to an instruction level of description without loss of information about the computational behavior of the machine. In contrast, distributed connectionist models are not constrained to implement machines with symbolic-level descriptions. They are typically constructed by learning procedures whose only goal is to minimize an error measure by modifying connection strengths. Connectionists therefore claim that since the learning procedure is not required (or even trying) to implement a machine that possesses a symbolic-level description, it is unlikely that the networks they construct will have such descriptions. To the extent that these networks exhibit intelligent behavior, their intelligence is at the sub-symbolic level, not at the level of formal operations on symbol structures; the latter is at best an approximate description of the computation taking place.
In the remainder of this paper I will argue against the connectionist position, beginning with a reexamination of the symbolic-level theory of thermostats.

Beyond Pattern Transformation
Consider a graph of ambient temperature vs. setpoint. We can draw a line with slope 1 that divides the graph into two regions, one labeled "furnace on," the other "furnace off." Given any point specified by T and T0, we can predict from the region the point falls in whether the thermostat will turn the furnace on or off. Furthermore, using back propagation we can train a one-unit connectionist network to do this, and it will automatically "generalize" to points not in the training set.
This example demonstrates that the thermostat can be faithfully modeled as a one-neuron linear discriminator rather than as a symbol processing device. More demanding discriminations, involving multiple regions with complex shapes, would require more units and several processing layers, but they are not fundamentally different from this example. Connectionist learning schemes are apparently quite good at learning to make pattern discriminations; they are often better than previously known pattern recognition techniques. Furthermore, connectionist models can learn to transform patterns rather than merely classify them; under certain conditions a properly trained network can transform a novel pattern into another novel pattern, once it has learned the "rule" for doing so. But this has little to do with symbol processing.
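A sketch of such a discriminator follows, trained here with the simple perceptron rule rather than back propagation (for a single threshold unit the two play the same role); the grid of training points and the learning rate are invented for illustration:

```python
# One-unit linear discriminator for the thermostat decision.
w = [0.0, 0.0]   # weights for (T, T0)
b = 0.0          # bias

def furnace_on(T, T0):
    """The unit's output: True means "furnace on"."""
    return w[0] * T + w[1] * T0 + b > 0

# Training points labeled by the thermostat rule itself; the unit must
# discover the dividing line T = T0 from the examples alone.
training = [(T, T0) for T in range(40, 91, 5)
                    for T0 in range(55, 76, 5) if T != T0]
for _ in range(50):                      # epochs; converges in a few
    for T, T0 in training:
        target = 1 if T < T0 else 0
        error = target - int(furnace_on(T, T0))
        w[0] += 0.1 * error * T
        w[1] += 0.1 * error * T0
        b += 0.1 * error
```

After training, the unit also classifies points it has never seen, e.g. furnace_on(60, 68) is true while furnace_on(75, 68) is false, which is the "generalization" referred to above.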
If we examine the connectionist models that have been held up as evidence against the symbolic paradigm, we see that rather than attacking the symbol manipulation problem head on to demonstrate the illusory nature of symbol processing, they have instead been trivializing complex behaviors to get simple tasks that can be solved by pattern transformation or just pattern recognition. The (rather limited) success of these models merely proves that symbol processing isn't required for such tasks, just as it isn't required to implement a thermostat.
There are several reasons why intelligent behavior should not be dismissed as simply a pattern transformation problem. First, as Fodor and Pylyshyn point out, language and thought have a highly combinatorial, compositional structure. Whether or not such structure is reflected at some hypothetical sub-symbolic level, connectionist models must at least behave as if they had such structures inside them. Pattern transformation systems do not meet this criterion unless they are trained on practically every structure they will ever encounter, as in (Allen, 1987). Second, the notion of "variables" is essential to intelligent behavior, as it permits the manipulation of structures that were not specified in advance. One example is filling in the participants in a script. If we see John go into a restaurant where Mary works, we know that it is Mary who will bring the menu and John who will read it. This is not conscious problem solving, it's the sort of common sense reasoning that takes place at the intuitive level. The restaurant script includes the variables Customer and Waitress, and our intuitive comprehension of the situation leads us to conclude that John and Mary play those respective roles; this in turn allows John and Mary to be instantiated elsewhere in the script to predict things like who reads the menu. Another example, due to Pinker and Prince, is morphological reduplication, which copies an entire stem as a unit, yielding forms such as "dum-dum" or "boom-boom". The variable, in this case, references the stem to be copied.
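The kind of variable binding at issue is easy to state in conventional symbolic code; the script and role names below are a made-up miniature, not a claim about any particular system:

```python
# A miniature restaurant script whose steps mention role variables
# rather than individuals; instantiation binds roles to individuals.
RESTAURANT_SCRIPT = [
    ("enter", "Customer"),
    ("bring-menu", "Waitress"),
    ("read-menu", "Customer"),
]

def instantiate(script, bindings):
    """Replace each role variable by the individual bound to it."""
    return [(action, bindings[role]) for action, role in script]

events = instantiate(RESTAURANT_SCRIPT,
                     {"Customer": "John", "Waitress": "Mary"})
```

Because "Customer" occurs in more than one step, binding John to it once predicts both that Mary brings the menu and that John reads it, which is exactly the inference described in the example above.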
Third, as Drew McDermott notes in (McClelland et al., 1986), the ability to contrast one thing with another (as in "She is more outgoing with her friends than with me, her advisor") is an important part of reasoning. McDermott calls this property "thirdness". How can a connectionist model formulate such propositions without doing symbol manipulation?

Conclusion
The nature of thought and language would appear sufficient to impose a symbol-processing level on connectionist networks, provided we choose a task rich enough to require this level rather than one so simple it can be solved by pattern transformation alone.
On the other hand, if the eliminative hypothesis is correct, connectionists should be able to point to mental phenomena that cannot be explained by symbol processing models, but are explained by connectionist ones. They are nowhere near the point where they can do this convincingly. Connectionist models are too primitive to reproduce even basic symbol processing behavior, and the space of symbolic-level models is huge, making it difficult to prove that no such model could ever account for a particular behavior.
Even if the eliminative hypothesis is disproved, leaving connectionism an implementation technique instead of an alternative paradigm for cognition, the distributed connectionist approach promises to be a source of many valuable insights. For example, Derthick's Micro-KLONE, a connectionist version of KL-ONE (Brachman and Schmolze, 1985), shows how counterfactual reasoning by constructing plausible models can be formulated as a massively parallel constraint satisfaction problem involving thousands of sub-symbolic micro-inferences (Derthick, 1987). This approach to reasoning is a distinct departure from traditional AI methods. The parallel application of knowledge is a serious problem that AI has largely failed to address.
In conclusion, connectionist models are too new for us to determine the validity of the eliminative hypothesis. But if this revolutionary idea is to have any chance of success, connectionists must first construct more complex models that go beyond simple pattern transformation and relaxation. They must at least approximate the powerful linguistic and inferential abilities human beings are known to possess.