Reasons as Causes in Bayesian Epistemology

Journal of Philosophy, Inc. is collaborating with JSTOR to digitize, preserve and extend access to The Journal of Philosophy.


REASONS AS CAUSES IN BAYESIAN EPISTEMOLOGY*
In everyday matters, as well as in law, we allow that someone's reasons can be causes of her actions, and often are. That correct reasoning accords with Bayesian principles is now so widely held in philosophy, psychology, computer science, and elsewhere that the contrary is beginning to seem obtuse, or at best quaint. And that rational agents should learn about the world from energies striking sensory inputs (nerves in people) seems beyond question. Even rats seem to recognize the difference between correlation and causation,1 and accordingly make different inferences from passive observation than from interventions. A few statisticians aside,2 so do most of us.
To square these views with the demands of computability, increasing numbers of psychologists and others have embraced a particular formalization, causal Bayes nets, as an account of human reasoning about and to causal connections.3 Such structures can be used by rational agents, including humans insofar as they are rational, to have degrees of belief in various conceptual contents, which they use to reason to expectations, which are realized or defeated by sensory inputs, which cause them to change their degrees of belief in other contents in accord with Bayes's Rule, or some generalization of it. How is all of this supposed to be carried out?

I. REPRESENTING CAUSAL STRUCTURES
The causal Bayes net framework adopted by a growing number of psychologists goes like this: Our representations of causal relations are captured in a graphical causal model, or causal Bayes net. We reason implicitly as though we were calculating explicitly (but often not quite accurately) with such a network in hand. The network is a mathematical object describing relations among features of a system or situation that are potentially variable, for example, having at least present or absent as possible values. Those features are vertices, or variables, in a network with directed edges from some vertices to others. A set of conditional probabilities is associated with the network, specifying for each vertex, V, the probability of each of its values conditional on each specification of values of the vertices in the graph that are parents of V, that is, those that have edges directed into V. The graph is almost always assumed to be acyclic: there is no sequence of directed edges leading from a variable back to that same variable. For example, a simple network relating a lamp to an electrical power source and a switch on a timer might be:
Timer → Switch → Lamp ← Power
* Thanks to Alison Gopnik for suggesting dynamic Bayes nets, and the James S. McDonnell Foundation Causal Learning Collaborative for intellectual support.
1 Aaron P. Blaisdell, Kosuke Sawa, Kenneth J. Leising, and Michael R. Waldmann, "Causal Reasoning in Rats," Science, cccxi (2006).
4 A framework for specifying the consequences of interventions [...] in Peter Spirtes, Glymour, and Richard Scheines, Causation, Prediction, and Search (Berlin: Springer, 1993). Algorithms for computing the effects [...] definite value for one or more variables are given in Judea Pearl, Causality: Models, Reasoning, and Inference (New York: Cambridge, 2000), and [...] understanding causation by James Woodward, Making Things Happen: A Theory of Causal Explanation (New York: Oxford, 2003).
Power and Switch have independent probabilities [...] state of Lamp is determined uniquely by its [...] if Power is on and Switch is on. If the value [...] unknown and varies from case to case, then [...] appear to be an indeterministic function of S[...]. [...]tent is captured by the supposition that a di[...] changes the state of a variable changes the state [...] downstream from it, but leaves the state of other variables [...] for example, an intervention that breaks the [...] the state of the Lamp fixed at off, regardless of [...]

Switch, or Timer, and leaves the probabilities of Power, Switch, and Timer unaltered, as if the edges from Switch and Power to Lamp were broken by the intervention, though the arrow from Timer to Switch is left unaltered.
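The contrast between observing that the lamp is off and intervening to fix it at off can be sketched numerically. The following is only an illustration: the probabilities are hypothetical (the article gives none), the names `joint` and `prob` are invented for this sketch, and Lamp is assumed to be on exactly when Power and Switch are both on.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration (not from the article).
P_TIMER_ON = 0.9
P_POWER_ON = 0.95
P_SWITCH_ON_GIVEN_TIMER = {True: 0.99, False: 0.01}

NAMES = ("timer", "power", "switch", "lamp")


def bern(p_on, value):
    """Probability that a binary variable takes `value` when P(on) = p_on."""
    return p_on if value else 1.0 - p_on


def joint(timer, power, switch, lamp, do_lamp=None):
    """Joint probability of one assignment, factored along the graph
    Timer -> Switch -> Lamp <- Power.

    If do_lamp is given, the edges into Lamp are cut and Lamp is fixed
    at that value, as in an ideal intervention.
    """
    pr = bern(P_TIMER_ON, timer) * bern(P_POWER_ON, power)
    pr *= bern(P_SWITCH_ON_GIVEN_TIMER[timer], switch)
    if do_lamp is None:
        # Assumed mechanism: Lamp is on exactly when Power and Switch are on.
        pr *= 1.0 if lamp == (power and switch) else 0.0
    else:
        pr *= 1.0 if lamp == do_lamp else 0.0
    return pr


def prob(event, given=lambda w: True, do_lamp=None):
    """P(event | given), optionally under the intervention do(Lamp = do_lamp)."""
    worlds = [dict(zip(NAMES, v)) for v in product([True, False], repeat=4)]
    num = sum(joint(**w, do_lamp=do_lamp) for w in worlds if given(w) and event(w))
    den = sum(joint(**w, do_lamp=do_lamp) for w in worlds if given(w))
    return num / den


power_on = lambda w: w["power"]
lamp_off = lambda w: not w["lamp"]

prior = prob(power_on)                                      # before any evidence
observed = prob(power_on, given=lamp_off)                   # conditioning on Lamp = off
intervened = prob(power_on, given=lamp_off, do_lamp=False)  # after do(Lamp = off)
```

Under these assumptions, conditioning on Lamp = off lowers the probability that Power is on, whereas fixing Lamp at off by intervention leaves it at its prior value, which is the asymmetry the text describes.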
The causal attribution carries with it expectations (in the human, not the probabilistic, sense) about the joint frequencies of events, and also expectations about the results of possible or hypothetical interventions on features of the system. If an agent believes a light switch causes a lamp to go on, then the agent expects that turning the switch on and off will turn the lamplight on and off. The agent can use the Bayes net to reason to a degree of belief in a particular event, or even to a specific predicted value for a variable, if other variables in the network have specified values: if the light switch is believed to be chancy (but not overly so), the agent may derive a degree of belief that the lamp goes on when the switch is thrown, or may conclude that the lamp will go on. In the other direction, the agent can use the Bayes net to reason backwards to alter her degrees of belief about the possible causes of some observed event, say that the lamp is on, in accord with Bayes's Rule. If the agent's prior degree of belief that the switch is on is r, that the lamp light is on is p, and that the lamp light is on given that the switch is thrown is q, then after coming to believe that the lamp light is on, the agent's degree of belief that the switch is on changes to rq/p. Further, the agent can use the Bayes net to reason hypothetically about the results of possible interventions that would fix one or more features from outside the system, and in particular, about the results of her or others' actions. Suppose the timer is set to turn on the light switch automatically at a certain time. The effect on the state of the lamp of an outside intervention that turns off the light switch at some later time is reasoned about hypothetically by supposing that the value of Switch is fixed in the Causal Bayes Net Ascribed to the World, but nothing else is altered.
In particular, the degrees of belief in the Timer state and that the lamp is on given that the light switch is off are left the same. Switch, which formerly depended on its parent variable in the graph (Timer), now becomes independent (in the degree of belief measure) of its parent. Graphically, the intervention breaks the directed edges from the parent variables, whatever they may be, to Switch. The results are expected to match the causal consequences of the corresponding intervention in the world.
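The backward step described above, from the lamp being on to a revised degree of belief that the switch is on, is an instance of Bayes's Rule with posterior rq/p. A minimal sketch follows; the numerical values and the name `posterior_switch_on` are hypothetical, and the degree of belief that the lamp is on given that the switch is off (`q_off`) is an added assumption used only to keep r, q, and p mutually coherent.

```python
def posterior_switch_on(r, q, p):
    """Degree of belief that the switch is on after learning the lamp is on.

    r: prior degree of belief that the switch is on
    q: degree of belief that the lamp is on given that the switch is on
    p: prior degree of belief that the lamp is on
    """
    if p <= 0:
        raise ValueError("p must be positive to condition on the lamp being on")
    return r * q / p


# Hypothetical values, chosen only for illustration.
r = 0.5
q = 0.9
q_off = 0.1  # assumed degree of belief that the lamp is on given the switch is off

# The law of total probability keeps p coherent with r, q, and q_off.
p = r * q + (1 - r) * q_off

posterior = posterior_switch_on(r, q, p)
```

With these assumed numbers the agent's degree of belief that the switch is on rises from 0.5 to about 0.9 once she comes to believe that the lamp is on.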
This nice theoretical picture is substantiated by a variety of experiments that suggest that even young children make predictions and provide explanations that are patterned as a Bayes net requires, and change their confidence in outcomes roughly according to Bayes's Rule. Some bits of the account are better established than others, and of course people make errors and are computation limited. For example, no account is given of how people choose to attend to one phenomenon rather than another, and in psychological experiments that focus is almost always provided by the experimenter: Rector ex machina. Computer science aids the psychological account by providing a variety of algorithms for (comparatively) efficiently computing conditional probabilities in a Bayes net, and for computing probabilities given an intervention; that is, regardless of whether people make inferences just as the computer algorithms do, the inferences are at least feasible. And, finally, recent work has shown that neural firing frequencies in a recurrent neural network (one with feedback loops) can implement an algorithm that computes some of the conditional probabilities defined in a Bayes net. Moreover, the neural model corresponds well with firing frequencies observed in the visual cortex.5
But can reasons like these, observations of features of the world that are causes or effects of other features, be causes? We do not mean to suggest somehow that the degrees of belief, or changes in them, are epiphenomenal and therefore not causal simply because the computations of conditional probabilities are carried out by neural processes; we are content with local identifications of changes in degrees of belief with instances of neural processes. Our concern is rather with how and whether the reasoning that psychologists suppose agents do with the Causal Bayes Net Ascribed to the World can itself be consistently represented using a causal Bayes net, as should be possible if those reasons are causes.

II. CONNECTING CAUSAL BELIEFS AND INFERENCE
Let us assume (for the moment) that the connections and mechanisms needed for computing probabilities according to the Timer → Switch → Lamp ← Power network are somehow implemented in a reasoning agent. Suppose now the agent wishes the lamp to light at 6:00 p.m. Her reasoning to a timer setting presumably goes something like this: "If I set the timer for 6, then the switch will go on at 6. If the power is on at 6, then the lamp will certainly light at 6. The timer setting is independent of whether the power is on at 6. It is very probable that the power will be on at 6. Therefore, if I set the timer for 6, then the light will very probably go on at 6." So she sets the timer to go on at 6:00, and expects the lamp to go on at 6:00. Her reasons include both a desire and a sequence of degrees of belief about consequences of an action. The reasons are causes, not only of her action, but also of the change in her degrees of belief that the switch will go on at 6:00 and that the lamp will go on at 6:00. As causes, her degree of belief reasons mirror the causal Bayes net structure she ascribes to the Timer/Switch/Lamp/Power system, but the variables are now her own degrees of belief in various conceptual contents. The goal that the light go on at 6:00, whether hypothetical or desired, somehow determines the relevant variables for the Causal Bayes Net Ascribed to the World (since there must be a great many such causal networks available to the agent), and the course of reasoning to the conditional forecast:
[Figure: Causal Bayes Net of Reasoning to a Forecast]
The variables now range over values of degrees of belief; whether these are all real values between 0 and 1, or some finite range such as high, medium, low, makes no difference here.
The relations between degrees of belief in the Causal Bayes Net of Reasoning to a Forecast are chancy, as they describe some cau[...] in the brain, which may be subject to various chance fluctuations. Accordingly, there are conditional probabilities associated with the directed graph, and in agreement with the psychological hypothesis that causes are represented as a graphical model, we will assume that the conditional probabilities together determine a joint probability distribution.
Suppose now that the lamp does not light at 6:00, and the [agent perceives] that it does not light. According to the psychological story, [she should] then reason using the Causal Bayes Net Ascribed to the World, conditioning on Lamp = off to compute a new probability that [the Power] is off, and a new probability that the Switch and/or Timer are off. These new computations will constitute reasoning from the observations to new degrees of belief corresponding to new probability ascriptions; the observation about the lamp is a reason to change one's belief about other features of the world. That picture seems eminently reasonable; any serious epistemological theory holds that we use observations to change our degrees or strengths of belief. But that picture does not work with the Causal Bayes Net of Reasoning to a Forecast just described.
The Causal Bayes Net of Reasoning to a Forecast specifies, prior to the agent perceiving at 6:00 that the Lamp is off, the causes of the agent's degree of belief that the Lamp is (or will be) on at 6:00. Those causes are her prior degrees of belief that the Switch is on at 6:00 and her prior degree of belief that the Power is on at 6:00, and more remotely, her prior degree of belief that the Timer is on at 6:00. In other words, perception of the light state is not a cause of degree of belief in the light state from the point of view of this system. Thus, the perception that the Lamp is off is an intervention on her Degree of belief that the Lamp is on, and so the perception that the Lamp is off at 6:00 cannot alter any of [...]
The psychological story has a problem: the view that degrees of belief, or changes in them, are causes seems incompatible with Bayesian learning from perception. Perception of the state of an effect should lead (by Bayesian updating) to changes in beliefs about the causes, but perception is an exogenous intervention in the standard reasoning network, and so breaks the connections between the effect and its causes.
Qualitatively, the agent's reasoning upon perceiving that the lamp is not lit at 6:00 goes something like this: "The lamp is not on, therefore the probability that the power is on is decreased and the probability that the switch is on is decreased; because the probability that the switch is on has decreased, the probability that the timer is on is also decreased." The unmentioned sensation is the initial cause of [...]

III. COMBINING THE BAYES NETS
Human perception, we think, is often in part top-down, driven by prior conceptual structure and prior degrees of belief. For a Bayesian agent whose reasons are causes, the problems just discussed suggest that perception that accords or conflicts with a prior degree of belief should have a top-down contribution. In order for sensation to cause our imagined rational agent to form a new degree of belief that the lamp is lit, and to do so in a way that allows a Bayesian updating of the value of the agent's degrees of belief that the switch is on and that the power is on, the new degree of belief that the lamp is lit must be the collaborative, interactive effect of the values of her degree of belief that the switch is on, of her degree of belief that the power is on, and of the sensory input. The sensory input does not itself change the value of Degree of belief (Lamp = on), but rather it changes the degree of belief that the lamp is lit given the values of parents of Degree of belief (Lamp = on) in the Causal Bayes Net of Reasoning to a Forecast.
For different values of Degree of belief (Power = on) and Degree of belief (Switch = on), the input of sensation will result in different values of Degree of belief (Lamp = on), and so the intervention of sensation will not make the agent's Degree of belief that the Lamp is on independent of her other Degrees of belief.
Since reasoning goes from beliefs about circumstances to forecasts of perceptions, and from perceptual changes in belief to new beliefs about circumstances, it seems that the "reasons are causes" view requires a representation of the causal connections that likewise goes in both directions. It seems that we need, in other words, a cyclic causal graph among degrees of belief, with appropriate associated probabilities.
The problem is we do not know much about cyclic graphical representations of causal relations, or how to update them by Bayes's Rule, and what we do know is problematic for this view. In the scientific literature, causal Bayes nets are generally taken to be acyclic, but that is not strictly necessary. One can have networks with cycles, even with cycles that have edges in each direction between two variables. Probabilistic constraints that generalize the Markov property for acyclic networks still hold, necessarily, for linear cyclic systems, and can consistently be assumed for cyclic networks with variables that have a finite range of possible values. So we might consider whether the sensory input can nudge the degrees of belief in values of a variable, which nudges the degrees of belief in its parents, which nudges the degrees of belief in the variable again, which nudges... and so on, until an equilibrium is reached.
That is certainly possible, but there are two related difficulties: How can updating on evidence occur, and can it be Bayesian? Consider the second difficulty first, in the simplest case in which the variable that is directly influenced by sensory inputs, denote it by S, has a single parent variable, Y. The idea is that the value of S causes the value of Y to be updated, which causes the value of S to be updated, and so on, until no more changes result. On the Bayesian perspective, each step in each direction, no matter how implemented in the brain, should result in updating one of the variables conditional on the currently updated value of the other, and we should therefore expect that at equilibrium the joint degree of belief in S and Y together should be the product of their conditional probabilities on each other: for all values of S and Y, DOB(S, Y) = DOB(S | Y) DOB(Y | S). But this equation implies that Y and S are independent!7 Applied to our example, upon learning that the lamp is not lit at 6, the agent's degrees of belief would then be altered in such a way that the degree of belief that the switch is on and the degree of belief that the timer is on have no relation to one another. We should not welcome such a theory.
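The claim that the equilibrium condition forces independence can be checked numerically on small examples. The sketch below is illustrative only: the joint distributions are hypothetical, the function names are invented here, and all cells are assumed strictly positive so that the conditional probabilities are defined.

```python
def marginals(joint):
    """Marginal distributions of S and Y from a joint table {(s, y): prob}."""
    ps, py = {}, {}
    for (s, y), v in joint.items():
        ps[s] = ps.get(s, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return ps, py


def satisfies_equilibrium(joint, tol=1e-12):
    """True if P(s, y) = P(s | y) * P(y | s) holds in every cell."""
    ps, py = marginals(joint)
    return all(
        abs(v - (v / py[y]) * (v / ps[s])) < tol for (s, y), v in joint.items()
    )


def is_independent(joint, tol=1e-12):
    """True if P(s, y) = P(s) * P(y) holds in every cell."""
    ps, py = marginals(joint)
    return all(abs(v - ps[s] * py[y]) < tol for (s, y), v in joint.items())


# Hypothetical independent joint: P(S = 1) = 0.3, P(Y = 1) = 0.6.
independent = {
    (s, y): (0.3 if s else 0.7) * (0.6 if y else 0.4)
    for s in (0, 1)
    for y in (0, 1)
}

# Hypothetical dependent joint: S and Y tend to agree.
dependent = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.4}
```

In these examples the independent table satisfies the equilibrium condition and the dependent one does not, consistent with the argument that a strictly positive joint satisfying the condition must factor into its marginals.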
Not only does a Bayes Rule requirement for updating lead to absurd results in cyclic networks, no correct updating algorithm is known for such systems and certainly no algorithm of the kind that neural systems plausibly implement for acyclic Bayes nets.8 Some other resolution is needed.

IV. DYNAMICS TO THE RESCUE?
The general problem is that the causal direction of influences of degrees of belief must go one way when forecasting, and the reverse direction when learning from experience, and the conditional probabilities of the changes must be in phase. If we can presume that forecasting and learning are not simultaneous, then there is a Bayesian solution, using structures that are sometimes called "dynamical Bayes nets" but which are really the same sorts of structures we have considered so far, except that the variables, in this case degrees of [...]
7 Pr(S & Y) = Pr(S | Y) Pr(Y | S) = Pr(S & Y) Pr(Y & S) / [Pr(Y) Pr(S)] ==> Pr(Y) Pr(S) = Pr(Y & S).
All of the probabilities of conditional degrees of belief in this network can be consistently estimated by Bayes's Rule, even while the values of the variables (the degrees of belief) are themselves determined, up to chance variation, by Bayes's Rule applied to the external evidence, setting the timer and sensation.9 So we have a solution in which reasoning is Bayesian almost all the way through, and reasons are causes. We do not know of any neural implementation of dynamic Bayes nets, and any neural realization that involves both forecasting and learning, rather than only visual recognition, will not be localized in the visual cortex. Verifying the hypothesis that, in humans, reasoning is Bayesian and reasons are