Uncomputability: the problem of induction

I argue that uncomputable formal problems are intuitively, mathematically, and methodologically analogous to empirical problems in which Hume's problem of induction arises. In particular, I show that a version of Ockham's razor (a preference for simple answers) is advantageous in both domains when infallible inference is infeasible. A familiar response to the empirical problem of induction is to conceive of empirical inquiry as an unending process that converges to the truth without halting or announcing for sure when the truth has been reached. On the strength of the analogies developed, I recommend the adoption of a similar perspective on uncomputable formal problems. One obtains, thereby, a well-defined notion of "hyper-computability" based entirely on classical computational models and on standards of success that have long been regarded as natural in the empirical domain.


1. Introduction: relations of ideas, matters of fact
Hume [6] held that all mathematical and logical reasoning is anchored in the mere "relation of ideas" within the mind. According to Hume, ideas are surveyable in their entirety by a mind of sufficient acuity, so that all a priori truths can be reduced to mental inclusions (e.g., "unmarried" is part of the concept of "bachelor = unmarried male"). Hume conceived of empirical "matters of fact" quite differently. To be certain of an empirical law, one would have to see all possible cases covered by the law at an instant, but only finitely many instances are observed by any given stage of inquiry. That is Hume's problem of induction.

One response to Hume's problem [3,4,8,10,17,19] is to drop the tacit requirement that successful inquiry must halt, ring a bell [9], or otherwise infallibly signal having found the right answer. Then inquiry can be said to succeed in the sense that it stabilizes, eventually, to the right answer, possibly with some surprises and retractions of earlier answers along the way. In this manner, empirical inquiry can be both fallible and truth-directed. One would prefer a scientific method that is guaranteed to signal its arrival at the truth, but there is no such procedure for drawing general conclusions from particular instances. In such cases, a weaker kind of success must be entertained if one is to speak of scientific success at all.^1

Hume seems to assume that formal problems are infallibly solvable and that empirical problems are not, but neither claim is true in general. The empirical question "will it rain tomorrow?" is decidable infallibly by waiting to see,^2 and uncomputable formal problems are not infallibly solvable by algorithmic means. Indeed, infallibly solvable empirical problems are quite analogous to computably decidable formal problems, and empirical problems that have no infallible solution are strongly analogous to uncomputable formal problems. That is no accident.
In light of Turing's philosophical analysis of algorithmic computability in terms of Turing machines [21], algorithmic unsolvability arises out of finiteness and locality conditions on the agent quite analogous to those that give rise to the problem of induction: e.g., the agent (the Turing machine's read-write head) cannot scan or write on infinitely many tape squares in an instant or discriminate letters from an infinite alphabet (because the differences would end up being sub-microscopic), etc.

My aim in this paper is to emphasize some analogies, intuitive, mathematical, and methodological, between formal and empirical reasoning.^3 Intuitively, uncomputable problems like the halting problem seem to demand certainty that something will never happen (e.g., that a computation will never halt) based only on a finite run of experience (the computation has not halted yet). Mathematically, some well-known theorems in the theory of computable functions can be interpreted as providing deep connections between formal and empirical reasoning. Methodologically, I argue that "Ockham's razor", a systematic bias in favor of "simple" answers in the face of uncertainty, can be advantageous in both domains when infallible inference is infeasible. On the basis of the preceding analogies, I recommend a convergent, defeasible perspective on uncomputable formal problems as well as on inductive empirical problems. This approach yields a natural concept of "hyper-computation" based entirely on classical computational models. The basic idea is not new. It was pioneered by Putnam [19] and Gold [4] and has since served as a fertile source of ideas within computational learning theory.

^1 Alternatively, one can retain the halting condition if one substitutes some notion of "confirmation" or "coherence" for truth as an aim of inquiry. My view, for what it is worth, is that this "ersatz" approach gives up too quickly on truth and underestimates the problem of computing the "ersatz" confirmation or coherence relation. As I have discussed this issue at length elsewhere [11,12,15], I will not do so here.

^2 Yes, empirical infallibility in such cases demands the exclusion of such philosophical doubts as being a brain in a vat that is fed neural stimulation giving rise to rain-like sensations. But one must also suspend doubts about properly following an algorithm or about the possible inconsistency of arithmetic in the formal case, so again the two cases are analogous. Infallibility is always relative to some restricted range of possibilities.

^3 I have discussed analogies between computability and empirical reasoning before in several places, including [10,11,13-15].

2. The problem of induction

Hume freely admitted his debt to ancient skepticism. In the late middle ages, Buridan recounted the ancient argument for inductive skepticism as follows:

    Let us assume... that from the will of God, whenever you have sensed iron, you have sensed it to be hot. It is sure that... you would judge the iron which you see and which in fact is cold, to be hot, and all iron to be hot. And in that case there would be false judgments [...].

Suppose that inquiry confronts an unending input stream of discrete inputs coded as natural numbers. Let N^N denote the set of all input streams. An empirical proposition says something about what this infinite sequence will be like. For example, "always hot" says that the input stream will be an unending sequence of "hot" observations. Each empirical proposition is identified with the set of input streams of which it is true, and hence is a subset of N^N. An empirical presupposition P is an empirical proposition that delimits the range of possible input streams over which one would like to succeed. An empirical question Q is a countable collection of mutually exclusive empirical propositions called potential answers to the question. If ε is an input stream satisfying an answer to Q, let ans_Q(ε) denote the unique answer in Q that contains ε. Answers are infinite sets.
To give methods something concrete to output, let numbers be assigned to answers by an injective mapping c : Q → N called the answer coding function for Q. An empirical problem is a triple (P, Q, c), where P is an empirical presupposition, Q is an empirical question whose potential answers cover P, and c is an answer coding function for Q. It is assumed that every input stream in the presupposition satisfies some potential answer to the question. In Buridan's problem, the empirical presupposition is that the input stream will consist entirely of observations of "hot" or "non-hot" and the empirical question is whether or not every observation will be "hot" (i.e., whether the input
stream is the constant "hot" sequence). Code "hot" as 1 and "non-hot" as 0. Then the presupposition of Buridan's problem is the set 2^N of all infinite Boolean sequences and the question is the partition Q = {{1^ω}, 2^N − {1^ω}}, where 1^ω is the constant all-1 ("always hot") sequence. The question is binary, so let c arbitrarily code {1^ω} as 1 and 2^N − {1^ω} as 0. So Buridan's empirical problem can be represented as the triple (2^N, Q, c).

An empirical method for problem (P, Q, c) responds to each finite, initial sequence of an input stream with some code number in the range of c or with '?', which indicates a refusal to choose an informative answer. Let N* denote the set of all finite sequences of natural numbers. Thus, an empirical method for problem (P, Q, c) is a map of type M : N* → (rng(c) ∪ {'?'}). What makes an empirical method "empirical" is that it never gets to see the whole, infinite input stream at once; it only gets to see ever larger initial segments and must "leap" from the current observations to some opinion whose truth may depend on what the tail of the input stream will be like for eternity. The aim of guessing is, straightforwardly enough, to find the right answer. Say that method M solves empirical problem (P, Q, c) in the limit just in case for each input stream ε satisfying the presupposition of the problem, there is a stage after which each output produced by M along ε is the code of the (unique) potential answer in Q satisfied by ε. Notice that this success concept requires only stabilization to the right answer. The transition from error to truth is silent. No bell or halting state certifies success when it occurs. It is easy to construct a method that solves Buridan's problem in the limit: select the answer "always hot" until a "non-hot" input is encountered and switch to the answer "not always hot" thereafter.
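As a concrete illustration (a minimal sketch in Python, with observations and answer codes as in the text; the function names are mine, not the paper's), the limit-solving method for Buridan's problem is a map from finite histories to answer codes:

```python
def buridan_method(history):
    """Map a finite initial segment of the input stream to an answer code.

    Observations: 1 = "hot", 0 = "non-hot".
    Answer codes: 1 = "always hot", 0 = "not always hot".
    """
    return 1 if all(obs == 1 for obs in history) else 0

def successive_guesses(method, prefix):
    """Feed the method ever longer initial segments and collect its outputs."""
    return [method(prefix[:n]) for n in range(len(prefix) + 1)]

# On a stream that turns cold at stage 3, the method starts with
# "always hot" and retracts exactly once, never to return:
guesses = successive_guesses(buridan_method, [1, 1, 1, 0, 1, 0])
# guesses == [1, 1, 1, 1, 0, 0, 0]
```

No finite prefix of hot observations ever certifies the answer 1; the method merely stabilizes to the truth, which is exactly the limiting success concept just defined.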
This method not only converges to the right answer eventually; it is guaranteed, in the worst case, to retract its earlier views at most once (when "always hot" is replaced with "not always hot"). In general, a retraction occurs when an informative answer is dropped for some distinct answer, informative or uninformative. Retractions are the painful but unavoidable symptom of fallibility. But needless retractions are another matter entirely. It is desirable to minimize retractions in the design of empirical methods in much the same way that computational time and space are routinely minimized in the design of computing strategies.

The method just described starts with "always hot", in the sense that the method's initial output is "always hot". When a method is guaranteed to succeed with one retraction starting with h, say that the method is a refutation method for h. Such a method favors h over ¬h until some problem arises and then prefers ¬h forever after. Since the rejection of h cannot be "taken back" without another retraction, the rejection of h is analogous to the halting of a computation, for it certifies that the truth has been found. If h is true, however, the truth has been "found" from the outset but is never announced by an infallible sign. That is the characteristic situation of successful empirical science. Similarly, one can say that the method just described is a verification method for "not always hot", since it starts out with the denial of "not always hot" and switches to "not always hot" when verifying evidence is received. In general, a verification method for h is a method that solves the problem with one retraction starting with ¬h.
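The retraction count of a finite sequence of outputs can be made precise in code (a sketch; per the definition above, moving away from the uninformative '?' costs nothing, while dropping an informative answer, even for '?', counts as one retraction):

```python
def retractions(guesses):
    """Count how often an informative answer is dropped for a distinct output."""
    return sum(1 for prev, cur in zip(guesses, guesses[1:])
               if prev != '?' and cur != prev)

# The refutation method for "always hot" retracts at most once:
# retractions([1, 1, 1, 0, 0]) == 1
```

Worst-case bounds on this count play the role for empirical methods that time and space bounds play for algorithms.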

Finally, what Buridan's argument proves is that there is no verification method for "always hot" in the empirical problem he describes. For suppose there were one. As a verification method, it starts with "not always hot". The method succeeds in each input stream of hot or non-hot observations, so feed it the constantly hot sequence until it outputs "always hot" (which it must, on pain of failing to converge to the right answer in the constantly hot input stream). That is one retraction. Now switch over to non-hot inputs, forcing the method to retract again to "not always hot", for a total of two retractions. Contradiction. Buridan's God is merely a colorful personification of the preceding, mathematical construction of a possible input stream on which the learner fails, a construction that depends on the intrinsic difficulty of the problem rather than upon the actual presence of a malicious agent in nature.

For another example, let h = "there will be exactly one hot observation" and let the question be whether h is true. It is easy to solve this problem in the limit: output ¬h until the first hot observation, output h until the second hot observation, and output ¬h forever after. This procedure neither refutes nor verifies either side of the question because it retracts twice (starting with ¬h) if there are at least two hot observations. Furthermore, no possible method succeeds with just one retraction. For assuming that the method succeeds, God (or nature) can present cold observations until the method converges to ¬h. Then God can present a hot observation followed by all cold observations until the method retracts to h. One more hot observation forces a retraction to ¬h, for a total of at least two retractions. If the hypothesis in question is "there will be exactly one or strictly more than two hot observations", then three retractions are required, and so forth.
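Both the two-retraction method for h = "exactly one hot observation" and the demon stream that forces its retractions can be sketched (an illustrative toy, coding h as 1 and ¬h as 0; the names are mine):

```python
def exactly_one_hot_method(history):
    """Guess 1 (h: exactly one hot observation) iff exactly one hot so far."""
    return 1 if sum(history) == 1 else 0

def successive_guesses(method, prefix):
    return [method(prefix[:n]) for n in range(len(prefix) + 1)]

# The demon presents cold inputs until convergence to "not h", then one hot
# input until convergence to "h", then a second hot input:
demon_stream = [0, 0, 1, 0, 0, 1]
guesses = successive_guesses(exactly_one_hot_method, demon_stream)
# guesses == [0, 0, 0, 1, 1, 1, 0]: two retractions, as the argument requires.
```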
If it is presupposed that there will be at most finitely many hot observations and the question is how many there will be, the problem is not solvable under any worst-case retraction bound. Some problems are not even solvable in the limit. Suppose you wish to know whether there will be finitely or infinitely many hot observations. Buridan's God can show you hot observations while you say "finitely many" and non-hot observations while you say "infinitely many". Whichever answer you converge to, you are wrong, and if you do not converge you also fail.

Slight modifications of Buridan's problem give rise to increasingly complex problems empirically solvable in successively weaker senses. The harder problems do not merely embody the problem of induction; they involve nested problems of induction, of which retractions are merely the painful, outward sign. In the base case, there are problems that require no retractions at all: these are the empirical problems that involve no problems of induction at all and that can, therefore, be answered with infallibility, in analogy to solvable formal problems.
3. The "problem of computation"

It sounds perfectly natural to speak of the "problem of induction" (the impossibility of a verification procedure for many scientific questions) but one rarely, if ever, speaks of "the problem of computation" (the impossibility of a verification procedure for many formal questions). And yet, the two situations are quite similar, both on the face of it and at a deeper, structural level. In this section, I describe how formal problems give rise to degrees of computational unverifiability matching the degrees of empirical unverifiability discussed in the preceding section.
To begin with, a formal question is just like an empirical question except that the possible input streams are replaced with single, numerical inputs. More precisely,
a formal problem is a triple (P, Q, c), where P is a subset of N called a formal presupposition, Q is a partition of P called a formal question, and c is an injective assignment of code numbers to answers in Q called an answer coding function. Formal problems present single number inputs that can be "received" all at once, whereas empirical problems present infinite sequences of numbers that can only be "received" in a piecemeal fashion. So far, Hume's position seems right: in formal problems you have the input and you have the concept to be applied to it, so you merely have to focus your "mind's eye" on the two of them to see with mathematical certainty whether the concept applies to the input. In empirical reasoning, infallibility may be impossible because you never see the whole input stream all at once.

But the dichotomy between infallible formal reasoning and fallible empirical reasoning was already questionable in Hume's day. If full "clarity and distinctness" could be achieved on each input, then formal reasoning would, indeed, always terminate with certainty. But if full "clarity and distinctness" is never achieved on some inputs, and if the process of achieving it has bumps and surprises along the way, one may as well think of formal reasoning as an ongoing, fallible process analogous to empirical inquiry [16].

The theory of computability underscores the preceding point with mathematical precision. In the familiar halting problem, the input domain is the set N of all natural numbers and the question is whether the Turing machine with code number n eventually returns an output when started on input n. Let K denote the set of all n for which the answer to the question is affirmative. Let c(K) = 1 and c(N − K) = 0. Then the halting problem is the triple (N, {K, N − K}, c). In the ensuing discussion, I will use ¬K as an abbreviation for N − K.
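In a toy model where a "program" is represented by a predicate telling whether it halts within a given step budget (an illustrative assumption standing in for dovetailed simulation on a universal machine), the obvious waiting strategy yields successive guesses about membership in K that stabilize with at most one retraction starting with "no":

```python
def halting_guesses(halts_within, n, stages):
    """Successive guesses about 'M_n halts on input n'.

    halts_within(n, budget) -> True iff the simulated run of M_n on input n
    halts within `budget` steps (a toy stand-in for a universal machine).
    Answer codes: 1 = "halts" (K), 0 = "never halts" (not K).
    """
    guesses, verdict = [], 0          # start with "no"
    for budget in range(stages):
        if verdict == 0 and halts_within(n, budget):
            verdict = 1               # one retraction, never taken back
        guesses.append(verdict)
    return guesses

# Toy program table: program n halts after n steps if n is even, never if odd.
toy = lambda n, budget: n % 2 == 0 and budget >= n
# halting_guesses(toy, 4, 8) == [0, 0, 0, 0, 1, 1, 1, 1]
```

On a non-halting index the strategy outputs "no" forever, correctly but without ever certifying its success, which is precisely the empirical character of the problem discussed next.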
Intuitively speaking, the difficulty posed by the halting problem is empirical. When an algorithm takes a long time to return an answer, one begins to suspect that the algorithm will never terminate, but how can one be sure? No amount of waiting yields certainty that the computation will never halt, any more than it can result in certainty that every observation will be "hot".

This empirical argument falls short of a proof, however, for perhaps the achievement of clarity and distinctness with respect to the input and the concept to be applied to it involves something more clever than just sitting around and waiting for a simulated computation to halt; after all, you already have the program and the input, and everything about the computation is mathematically determined by this pair. As it happens, such means are bound to fail, but the usual proof of this fact is a static, diagonal argument with more affinity to Cantor than to Hume.

The strong impression that the halting problem involves something like the problem of induction is vindicated, however, by an alternative proof strategy that looks quite similar to Buridan's argument for inductive skepticism. As in Buridan's argument, the proof shows that there is no computable verifier for non-halting (i.e., for ¬K), in the sense of a Turing machine that eventually halts with 1 (i.e., "yes") if and only if it is provided with the index of a machine that does not halt on its own index. The "fooling" strategy of Buridan's God can be implemented against a would-be verifier M_m of ¬K by constructing a machine M_d whose halting behavior on its own index d defeats M_m's verdict; one is free to stipulate M_d's input-output behavior in this self-referential way, since the manner in which M_d produces its input-output behavior is irrelevant to the membership of d in K. Let M_u be the universal Turing machine,^8 which has the property that for each Turing machine index i and input y, M_u(i, y) returns the output (if any) of the computation M_i(y). Recall that m is the index of the Turing machine we wish to "fool".
Hence, the partial function ψ(y, x) ≈ φ_u(m, y),^9 whose value is the result of ignoring x and passing along the result of simulating M_m on input y, is Turing computable. To complete the construction, one must show that there exists a Turing index d such that M_d ignores its input and passes along M_m(d), the final response (if any) of would-be verifier M_m of ¬K. To obtain such a d, apply the s-m-n theorem to obtain a total, computable function s such that φ_{s(y)}(x) ≈ ψ(y, x). Then by the Kleene recursion theorem, there exists a Turing index d such that φ_{s(d)} = φ_d.

^8 The basic computability results cited in this paragraph are presented in many texts on the theory of computable functions. A nice, elementary source is [2].

^9 The relation ≈ signifies that either both functions have the same definite value or that both functions are undefined.

Although ¬K is not computationally verifiable, ¬K is computationally refutable (in the sense that some Turing machine halts with "no" on input n iff n is in K). On input n, simply simulate the computation of M_n on input n and halt with "no" when the simulated computation halts. So just like "all observations are hot", the formal proposition "the computation of M_n on input n never halts" is computationally refutable and not verifiable.

The preceding connection between formal and empirical reasoning is strengthened by allowing Turing machines to output successive answers on an output tape in response to a given input without ever halting. One can then redefine computational verification just as in the empirical case, as convergence to the right answer with at most one retraction starting with "no", and similarly for refutation. This is equivalent to the usual definition of verifiability in terms of halting. Say that a Turing machine solves a formal problem in the limit just in case the machine converges to the right answer eventually, no matter which possible input is provided.

Formal problems can also involve close analogues of the "nested" problems of induction mentioned earlier. Let K_1 denote the set of all Turing machine indices i such that the computation of M_i on input i returns exactly one output (in sequence, according to the convention described in the preceding paragraph). There is an obvious method for solving K_1 with 2 retractions starting with "no": just simulate the computation of M_i on input i. Say "no" until an output is produced, say "yes" until a second output is produced, and then say "no" forever after. Also, an extended skeptical argument shows that two retractions are not enough starting with "yes". Let d be a Turing machine index that feeds itself to the given machine M and refuses to produce any outputs until the computation of M retracts to "no". Then M_d writes one output on its output tape and refuses to write any more outputs until M retracts to "yes". Finally, M_d writes another output, forcing M to retract again to "no", for a total of three retractions. Putnam [19] noticed the analogy between such formal predicates and the problem of induction and referred to formal predicates that can be decided with n retractions as n-trial predicates. The theory of such predicates is tidier if one also keeps track of whether the first output is "yes" or "no", as in the empirical case [11].

Next, suppose you know in advance that you will be given only indices of machines that produce at most finitely many outputs and the question is how many outputs a given machine will produce. The obvious method is to count the current number of outputs of the simulated computation. That method does not succeed under any finite retraction bound, but no method possibly could. For let M aspire to succeed with k retractions. Index d can feed itself to M and elicit k + 1 retractions from M by the preceding recipe. Since d produces at most k + 1 outputs, it satisfies the problem's formal presupposition.
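The two-retraction method for K_1 can be sketched in the same toy style, with a function outputs_by(budget) standing in for counting the outputs the simulated machine has written within a given number of steps (an illustrative assumption):

```python
def k1_guesses(outputs_by, stages):
    """Guesses about 'returns exactly one output', starting with "no" (0).

    outputs_by(budget) -> number of outputs the simulated machine has
    written within `budget` steps of computation.
    """
    return [1 if outputs_by(budget) == 1 else 0 for budget in range(stages)]

# A machine that writes outputs at steps 2 and 6:
outs = lambda budget: (budget >= 2) + (budget >= 6)
# k1_guesses(outs, 8) == [0, 0, 1, 1, 1, 1, 0, 0]: "no", "yes", "no".
```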
To obtain a formal problem that is not even computably solvable in the limit, ask whether the given index gives rise to an infinite sequence of outputs (just what was presupposed in the preceding example). To see why, mimic the empirical, skeptical argument presented in the preceding section, using the Kleene recursion theorem to achieve self-reference in the manner illustrated in the preceding examples. Thus, each of the empirical problems in the preceding section has been shown to have an analogue in the formal domain that is solvable in a closely analogous, fallible sense.

4. Topological complexity
The structural features of empirical problems that are responsible for the problem of induction are neither logical nor probabilistic, but topological. Let W be the set of all possibilities under consideration, and say that an empirical proposition (a subset of W) is ideally verifiable just in case there exists an empirical method of some sort that halts with "yes" if the proposition is true and that always says "no" otherwise.

The set W is the vacuous proposition that is true in all possible worlds. This proposition is trivially verifiable (say "yes" no matter what). The contradictory proposition ∅ is verifiable (say "no" no matter what). Finite conjunctions (intersections) of verifiable propositions are verifiable (wait for a "yes" for each conjunct before returning "yes") and an arbitrary disjunction (union) of verifiable propositions is verifiable (wait for a "yes" for at least one disjunct before returning "yes"). Hence, the verifiable propositions satisfy the axioms for the open sets of a topological space on W, which may be called verifiability space: (1) W is open; (2) ∅ is open; (3) finite intersections of open sets are open; (4) arbitrary unions of open sets are open. Furthermore, axiom (3) cannot be strengthened to arbitrary intersection under this interpretation, for suppose you have an infinite conjunction of verifiable propositions. The respective verifications could arrive at ever later times, so there is no time by which you can be sure that all of the conjuncts are verified (the problem of induction). So the striking asymmetry between axiom (3) and axiom (4), which is characteristic of all topological reasoning, is a reflection of the problem of induction. Topology is often thought of as "plastic geometry". It is equally, if not more generally, the mathematical theory of ideal verifiability.
Here is another way to make a similar point. Let the basic open sets be the sets of input streams that extend a given finite input sequence, restricted to the empirical presupposition P. This is a widely studied topological space [5] called the Baire space restricted to P. Now it can be proved (rather than intuitively assumed, as in the preceding paragraph) that open sets are verifiable: wait until the input stream extends a basis element contained in the open set before saying "yes". The converse can also be proved: if h is verifiable, then h may be expressed as the union of all basis elements corresponding to finite input sequences on which the method says "yes". Dually, the refutable propositions are exactly the closed propositions and the decidable (verifiable and refutable) propositions are the clopen (closed and open) propositions.

A limit point of ¬h in the restricted Baire space is an input stream whose finite initial segments can always be extended to input streams in ¬h. So if a limit point of ¬h happens to satisfy h, then h is true but the inputs never guarantee the truth of h, which is again the problem of induction. So the problem of induction arises, topologically speaking, precisely when the actual input stream is a limit point of a false answer. This happens exactly when the actual world is on the boundary of at least two answers (i.e., it is a limit point of both answers). So the problem of induction is the problem of boundary points.

Verification and refutation make sense only with respect to a fixed hypothesis h. More generally, an empirical problem (in the sense defined above) is solvable with zero retractions iff each answer is open: just wait for verification of a potential answer. Since the answers constitute a partition of the presupposition, it follows that each potential answer is also closed, or clopen for short. So the easily solved empirical problems can be characterized in terms of the topological structure of the problems themselves. The idea generalizes to problems requiring k retractions.
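In symbols (using bracket notation for basis elements, which the text does not introduce explicitly: σ ranges over finite input sequences, ε over input streams in P, and M is a verifying method), the correspondence reads:

```latex
[\sigma] \;=\; \{\,\varepsilon \in P : \sigma \text{ is an initial segment of } \varepsilon\,\},
\qquad
h \text{ is verifiable} \;\Longleftrightarrow\; h \;=\; \bigcup\,\{\,[\sigma] : M(\sigma) = \text{``yes''}\,\}.
```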
The difference complexity of (P, Q, c) is no greater than k iff there exists a finite, increasing sequence (S_0 ⊆ · · · ⊆ S_k) of subsets of P such that (1) each S_i is open and S_k = P, and (2) for each i ≤ k, each potential answer in Q is open in the restricted space S_i − S_{i−1} (with S_{−1} = ∅ by convention). It follows that a problem is solvable with k retractions iff it has difference complexity k.^11

Difference complexity also admits a point-wise characterization. Define, by induction:
1. input stream ε is a 0-interior point of (P, Q, c) iff ε is an interior point of some answer A in Q;
2. input stream ε is a k + 1-interior point of (P, Q, c) iff ε is an interior point of some answer A in Q in the problem that results when P is restricted to input streams that are not k-interior points of (P, Q, c);^12
3. input stream ε is a k-limit point of (P, Q, c) iff ε is not a k-interior point of (P, Q, c).

It follows that a finite sequence (S_0 ⊆ · · · ⊆ S_k) witnesses that (P, Q, c) has difference complexity ≤ k iff for each i ≤ k, S_i contains only i-interior points of (P, Q, c). Hence, a problem is solvable with k retractions iff it contains only k-interior points. So the k-limit points are the input streams in which one faces at least a k + 1-fold problem of induction: removing them from the problem results in a problem solvable with just k retractions.

Since k-limit points are where k-fold problems of induction are faced, it is worth taking a closer look at them.

^11 Given a method that succeeds with k retractions and given i ≤ k, let S_i be the set of all input streams on which the method retracts at least k − i times. Then S_i is open because it is a union of basic open sets, so (1) is satisfied. Also, the method retracts along an input stream in S_i − S_{i−1} exactly k − i times. Since the method succeeds, the answer output by the method after the k − ith retraction is true. So answer A is definable within S_i − S_{i−1} as the set of all input streams on which method M produces A after retraction k − i, which is a union of basic open sets in the restricted space S_i − S_{i−1}, so (2) is satisfied. Conversely, suppose that the difference complexity of (P, Q, c) is k. Let method M output answer A iff A is verified given S_i − S_{i−1}, where i is least such that S_i is verified by the current inputs (recall that S_{−1} = ∅ by convention).
A 0-limit point is just an input stream satisfying an answer A such that no matter how much you have seen, there is an input stream satisfying some other answer compatible with what you have seen already. This is just a single problem of induction, as in Buridan's example of inferring "always hot". A k + 1-limit point is an input stream satisfying an answer A such that no matter how much you have seen, there is a k-limit point satisfying some distinct answer compatible with what you have seen already. For example, the input stream in which no hot observations are seen is a 2-limit point in the problem in which it is known that the color will change at most two times and the question is how many times.

Solvability in the limit has its own topological characterization. A Σ^0_2 Borel set is a countable union of closed sets. An empirical problem is solvable in the limit iff each answer is Σ^0_2 [10]. This condition is equivalent to saying that there is an infinite, increasing, nested ω-sequence (S_0 ⊆ S_1 ⊆ · · ·) of open sets such that
1. ∪_{i=0}^∞ S_i = P and
2. for each i, each potential answer in Q is open in the restricted space S_{i+1} − S_i.
Recall that the finite retraction characterization is the same, except that the nested sequence of open sets is finite, which provides a nice, structural insight into the difference between the two cases.

^12 The concepts k-limit point and k-interior point can be extended by transfinite induction over an extension of the ordinals, giving rise to a transfinite version of the relationship between retractions and empirical complexity [12].
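The obvious counting method for the change-counting problem just mentioned (guess the number of color changes seen so far) is easy to sketch; under the presupposition of at most two changes it succeeds with at most two retractions (a toy illustration in the style of the earlier examples):

```python
def changes_so_far(history):
    """Number of color changes observed in a finite history."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)

def successive_guesses(method, prefix):
    return [method(prefix[:n]) for n in range(len(prefix) + 1)]

# A stream whose color changes twice: the guesses climb 0 -> 1 -> 2
# and then stabilize, one retraction per nested problem of induction.
guesses = successive_guesses(changes_so_far, [1, 1, 0, 0, 1, 1])
# guesses == [0, 0, 0, 1, 1, 2, 2]
```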

5. Formal complexity
Closely analogous concepts of structural complexity apply to strictly computational problems. Say that a function f is computable over restricted domain P iff there exists a Turing machine M that returns f(x) for each input x in P. Now define the effective difference complexity of formal problem (P, Π, ans) to be ≤ k iff there exists a finite, increasing sequence (S_0 ⊆ · · · ⊆ S_k) of recursively enumerable sets such that

1. S_k = P, and
2. for each i ≤ k, the correct-answer function is computable over the restricted domain S_i − S_{i−1} (where S_{−1} = ∅).

A formal problem is solvable by a Turing machine with k retractions iff its effective difference complexity is ≤ k. 13

There is a point of disanalogy, however, for effective difference complexity admits of no point-wise characterization. Recall that an empirical problem has difference complexity exceeding k iff the problem contains a k-boundary point. Hence, adding a single k-boundary point to a problem solvable with k retractions makes the problem intrinsically harder. But there is no single input one can add to a formal problem to make it intrinsically harder: for any single input n that is added to formal presupposition P in a formal problem (P, Π, ans) solvable with k retractions, there is some Turing machine that employs a rote "lookup table" to associate n with the right answer for n and that otherwise passes control to a k-retraction solution to the problem.

Formal problems that are not solvable under any retraction bound are also structurally analogous to empirical problems with the same property, for a formal problem with finitely many possible answers is solvable in the limit iff each answer is a Σ^0_2 arithmetical set, where such a set has the form

⋃_{i ∈ R} (N − W_i),

where R is a recursively enumerable set and W_i is the (recursively enumerable) domain of the ith partial computable function. In other words, countable unions of closed sets in the empirical picture are replaced by r.e. unions of complements of recursively enumerable sets in the formal picture. In general, Borel complexity in topology is analogous to arithmetical complexity in the theory of computability. This analogy is another familiar theme in descriptive set theory [18].

13 Suppose Turing machine M solves (P, Π, ans) with k retractions. Then let S_i denote the set of all n such that M retracts at least k − i times on input n. Let x be in S_i − S_{i−1}, where i ≤ k. Then M retracts exactly k − i times on input x. Let f_i(x) be the (k − i)th output of M on input x. Observe that f_i is computable (using M as a subroutine) and the domain of f_i covers S_i − S_{i−1}. Also, since M succeeds with k retractions, f_i agrees with the correct-answer function over the restricted domain S_i − S_{i−1}. Conversely, let (S_0 ⊆ · · · ⊆ S_k) witness that the effective difference complexity of (P, Π, ans) is no greater than k. Since each such S_i is recursively enumerable, let Turing machine M_i formally verify membership in S_i. Also, for each i ≤ k, let Turing machine L_i compute the correct-answer function over restricted domain S_i − S_{i−1}. On input x, let the computation of M proceed in stages as follows: at stage n, let m be the least i such that M_i halts on input x with output 1 within n steps of computation, and return the result of the computation of L_m on input x. By construction, M retracts at most k times on input x. Let x be in P. To see that M converges to the right answer on input x, let i be the unique value such that x is in S_i − S_{i−1}. Eventually, a stage n is reached at which M_i(x) returns 1 within n steps of computation. Thereafter, M(x) returns the result of the computation L_i(x), which is the correct answer for x.
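The staged construction in footnote 13 can be sketched concretely. The following fragment is an illustrative model of my own, not code from the paper: each recursively enumerable S_i is represented by a step-bounded verifier, and the resulting conjecture sequence retracts at most k times. All names and the toy instance are invented for the sketch.

```python
def make_staged_solver(semi_deciders, answerers):
    """semi_deciders[i](x, steps) returns True once membership of x in the
    r.e. set S_i has been verified within `steps` steps; answerers[i]
    computes the correct answer over the difference S_i - S_{i-1}."""
    def solve(x, max_stage):
        conjectures = []
        for stage in range(1, max_stage + 1):
            # Use the least i whose verifier has accepted x so far.
            for i, verified in enumerate(semi_deciders):
                if verified(x, stage):
                    guess = answerers[i](x)
                    if not conjectures or conjectures[-1] != guess:
                        conjectures.append(guess)
                    break
        return conjectures  # retractions = len(conjectures) - 1
    return solve

# Toy instance with k = 1: S_0 = even numbers (verified only slowly),
# S_1 = all of N (verified at once); the correct answer is "even" on
# S_0 and "odd" on S_1 - S_0.
semi_deciders = [lambda x, s: x % 2 == 0 and s >= x,  # S_0: slow to verify
                 lambda x, s: True]                   # S_1 = N
answerers = [lambda x: "even", lambda x: "odd"]
solve = make_staged_solver(semi_deciders, answerers)
```

On an even input, the solver first conjectures the S_1-answer and retracts once membership in S_0 is verified; on an odd input it never retracts, matching the bound of k = 1 retractions.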

Index problems
The analogy between formal and empirical reasoning is tighter still if one focuses on a special collection of formal problems sometimes referred to as index problems. 14 An index problem is a formal problem in which the natural number input is viewed as the index of a Turing machine and the question posed concerns only the input-output behavior of the machine indexed by the numerical input. Equivalently, an index problem is a formal problem in which no two numbers that index the same partial computable function satisfy distinct answers.

An index problem is non-trivial iff its question has at least one answer that is neither N nor the empty set. Rice's theorem [2] says that no non-trivial index problem is effectively solvable without retractions. The theorem can be proved by means of a "skeptical argument". Let (P, Π, ans) be a non-trivial index problem. Since the problem is an index problem, all indices for the everywhere undefined function are in some answer A in Π. Since the problem is also non-trivial, there is some distinct partial computable function ψ whose indices are all in some distinct answer B in Π. Let M be a would-be decision procedure for (P, Π, ans). On an arbitrary input x, let the "fooling strategy" M_d simulate the computation of M on input d, where d is an index of M_d itself (obtained via Kleene's recursion theorem), until such time as M returns the unique answer true of all indices for the everywhere undefined function. Thereafter, the fooling strategy returns the result of simulating a program for ψ on input x. In short, M_d refuses to produce any outputs until M becomes sure that M_d will never produce any outputs, and then produces outputs in accordance with ψ (note the analogy to Buridan's skeptical argument).

14 The obvious approach to determining some feature of the input-output behavior of a given index is to perform "computational experiments" on the indexed program, running it for various amounts of time on various inputs to see what sorts of outputs are produced.
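The skeletal structure of this skeptical argument can be sketched in code. The following toy model is my own construction, not the paper's formalism: per footnote 14, the would-be decider is treated as a black-box experimenter that sees only a growing trace of observed outputs and must eventually commit to "empty" (the program computes the everywhere undefined function) or "nonempty". The names `fool` and `hasty` are invented for the sketch.

```python
def fool(decider, rounds=1000):
    """Adversarially construct a program's observable behavior against
    `decider`.  Returns (verdict, truth), exhibiting a wrong verdict."""
    trace = []                       # outputs observed so far: none yet
    for _ in range(rounds):
        verdict = decider(tuple(trace))
        if verdict == "empty":
            # Decider bet the program never outputs: now start outputting.
            return verdict, "nonempty"
        if verdict == "nonempty":
            # Decider bet on outputs: stay silent forever.
            return verdict, "empty"
        trace.append(None)           # another step with no output observed
    return None, None                # decider never committed: no decision

# A would-be decider that concludes "empty" after 5 silent steps.
hasty = lambda trace: "empty" if len(trace) >= 5 else None
```

Whatever the decider does, it either commits and is fooled, or it never commits and so never decides, mirroring the dilemma that the recursion-theorem construction imposes on M.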
15 A much more sophisticated approach would involve some formal analysis of the code of the program, itself.

The Rice-Shapiro theorem [2] is the remarkable claim that a computational agent can determine no more about the input-output behavior of an arbitrary program by looking at the program 15 than it could by performing computational experiments on it, treating it as an otherwise unknown "black box": if an empirical agent could not verify the input-output property from experiments, then no amount of effective analysis of the code could formally verify the same property over arbitrary Turing machine indices. This is not a mere analogy: it is a deep and striking mathematical relationship between empirical and formal reasoning that holds for all index problems. Some topological concepts are required to state the theorem precisely. Let (P_e, Π_e, ans_e) denote the empirical counterpart of an index problem, in which the input-output behavior of the indexed machine is presented piecemeal as an input stream.

The Rice-Shapiro theorem can be proved by means of yet another skeptical argument. 16 Suppose that (N, Π, ans) is an index problem with possible answer S. Suppose, further, that the set Σ of functions whose indices are all in S is not open in the topological space just described in the preceding paragraph. Then there exists φ ∈ Σ such that φ is a limit point of the complement of Σ (with respect to the space of all partial computable functions). In other words, (∗) each finite subfunction of φ is extended by some partial computable function in the complement of Σ.

Consider the case in which some finite subfunction θ of φ is also in Σ. Then by (∗), some partial computable ψ extending θ is not in Σ. Implement a "fooling strategy" M_d for an arbitrary, would-be formal verifier M of S, as follows: M_d computes θ while simulating M on its own index d; if M ever verifies that d is in S, M_d goes on to compute the extension ψ, which lies outside Σ, so M's verification is mistaken. Now consider the alternative case in which every finite sub-function of φ is in the complement of Σ (so that φ, itself, is infinite). Implement a "fooling strategy" M_d for M in this case as well: M_d computes φ while simulating M on d; if M ever verifies that d is in S, M_d ceases producing new values, so that M_d computes only a finite sub-function of φ, which lies outside Σ; and if M never verifies, M fails on φ itself, which is in Σ.

16 One cannot use an arbitrary (P, Π, ans) as an example, because the Kleene recursion theorem may produce a tricky index that is not in P.

The Rice-Shapiro argument generalizes to index problems requiring k retractions in the following way: if (N, Π, ans) is formally solvable with k retractions, then (N_e, Π_e, ans_e) is empirically solvable with k retractions. So iterated problems of induction give rise to iterated formal retractions in the corresponding formal problem. The proof iterates the two cases of the Rice-Shapiro theorem. Suppose that (P_e, Π_e, ans_e) has difference complexity > k. Then (N_e, Π_e, ans_e) has a partial recursive k-limit point φ. When k = 0, it follows that φ is a limit point of some answer A_e. The two cases of the proof of the Rice-Shapiro theorem now arise: either each finite sub-function of φ satisfies a distinct answer, or some finite sub-function θ of φ satisfies A_e, in which case some proper extension of θ satisfies a distinct answer, since φ is a limit point of a distinct answer. In either case, a fooling strategy can be constructed. When k > 0, no finite sub-function of φ is a (k − 1)-interior point, for if it were, then φ would be as well. Hence, the set of all input streams of complexity greater than k − 1 includes all finite sub-functions of φ. Again, either none of these functions satisfies A_e or one of them does. If none does, construct a fooling strategy that pretends to be φ until M concludes A (the set of all indices of functions in A_e) and that then pretends to be a finite sub-function θ of φ until M concludes the (distinct) answer satisfied by θ. After that, since θ is itself a (k − 1)-limit point, the induction hypothesis guarantees that control can be passed to a fooling strategy that achieves another k retractions, for a total of k + 1. If some finite sub-function θ of φ satisfies A_e, then since φ is a k-limit point of some answer incompatible with A_e, it follows that, among (k − 1)-limit points, there exists a (k − 1)-limit point ψ satisfying a distinct answer from A_e that extends θ.
A fooling strategy can pretend to be θ until M concludes A_e and can then pretend to be ψ until M retracts to the incompatible answer satisfied by ψ. Since ψ is a (k − 1)-limit point, the induction hypothesis says that control can be passed to a fooling strategy that achieves k more retractions, for a total of k + 1.

Empirical simplicity and Ockham's razor
A characteristic feature of empirical science is Ockham's razor, a preference for simple theories when several competing theories account for the current data. But why? There is no shortage of explanations: we like simplicity, simpler theories are easier to understand or compute with, simple theories explain better or are easier to cross-check, etc. But such arguments are instances of wishful thinking, for the simplest theory might be false, regardless of our good reasons for wishing it to be true, and the task of science is to find the truth, not to varnish it. If one prefers the simplest theory because one knows in advance that the world is simple, then the complex alternative theories are not really alternatives after all, and the empirical question is trivial (it has just one possible answer).
If the simplest answer is assumed to be more probable a priori than the other answers, then the other answers probably are not real alternatives. If one prefers the simplest theory because it is better "confirmed" or "supported" than the other theories, the question arises afresh: what do "confirmation" or "support" have to do with finding the true answer? If one uses the simplest theory to accurately predict new observations even when we know that the simplest theory is false (as in linear regression), then one concedes that Ockham's razor is opposed to finding the true theory. In each case, it is hard to see how Ockham's razor could serve the interest of finding the truth. The connection between truth and simplicity is arguably the most fundamental puzzle in the philosophy of science and induction.
Here is an answer to the conundrum that fits with the convergent perspective on inquiry discussed above: choosing the simplest theory compatible with experience is necessary if we are to minimize the number of times we retract earlier answers en route to the truth in the worst case (which, incidentally, will be a complex rather than a simple world) [12]. Hence, simplicity does not indicate the truth (the world may be complex and may even probably be complex), but simplicity nonetheless helps us to find the truth, in the sense that any other bias results in avoidable, worst-case inefficiency en route to the truth.

For a very rudimentary illustration of the argument for this claim, suppose you know that there are at most three golf balls in a box and the question is how many balls there are. Each ball is exhibited, without replacement, at some time of Nature's choosing. There are four intuitive senses in which "no balls are in the box" is the simplest of the possible answers to this problem. First, it involves the least existential commitment of all the answers, since it posits no balls; indeed, Ockham's original statement of his principle was not to multiply entities beyond necessity. Second, it is the most uniform, in the sense that it is satisfied only if no ball ever appears, whereas the other answers imply a mixture of "no new ball" experience with "new ball" experience. Third, it is the most testable, in the sense that it is empirically refutable (if it is false, the appearance of a ball eventually establishes this fact) but the other answers are not refutable (e.g., "one ball" is false in the zero-ball world but is consistent with any finite amount of ball-free experience). Fourth, it has fewer free parameters than the other potential answers.
If no ball is ever observed, then there is no question as to when balls appear, but if there is a ball, it must appear at some time t_1, and if there are two, they must appear at distinct times t_1, t_2, etc.

Now suppose that a method prefers an answer other than "no balls" prior to seeing any balls. Nature can continue to exhibit ball-free experience until the method concludes "no balls", on pain of converging to the wrong answer. Then Nature can present a ball followed by ball-free experience until the method concludes "one ball", etc., for a total of four retractions. But an alternative method succeeds with at most three retractions in the worst case: just output "n balls", where n is the number of balls seen so far. That method follows Ockham's razor at each stage.

The difference complexity of an answer to a problem can be defined as the greatest k such that the answer contains a k-interior point. The answer "no balls" has difference complexity 3, the answer "one ball" has difference complexity 2, and, in general, the answer "n balls" has difference complexity k − n, where k is the known upper bound on the number of balls. So simpler answers (in the intuitive, scientific sense) have higher difference complexity in the topological sense. More generally, one may think of difference complexity degrees as degrees of empirical simplicity. Such intuitive reflections of simplicity as uniformity of experience, minimal existential commitment, testability, and fewer independent parameters tend to line up with high difference complexity in a given empirical problem.

Ockham's razor is vaguely understood to be a preference for the simplest hypothesis compatible with current experience. However, in the sense just defined, several answers can have the same, maximum simplicity degree.
In such cases, the proposed version of Ockham's razor says that one may not select an answer unless it is currently the unique answer of maximum complexity. Intuitively, this makes sense: if several hypotheses are simplest, simplicity cannot guide the choice among them.

Suppose that the maximum simplicity degree (i.e., the problem's difference complexity) is n and that method M violates Ockham's razor by choosing a hypothesis that is not uniquely simplest among hypotheses compatible with experience so far. Then Nature can continue to present inputs compatible with some distinct, simplest answer A until M converges to A on pain of converging to the wrong answer, which counts as one retraction. Thereafter, Nature can exact n more retractions as before, for a total of n + 1.

Furthermore, an obvious method that complies with Ockham's razor succeeds in each case with no more than n retractions. The method outputs the (unique) answer A verified relative to the assumption that the world has simplicity degree = k, where k is least such that it is currently verified that the world has simplicity degree ≤ k. This method retracts only when it is verified that the world has a lower simplicity degree than previously thought, and hence retracts at most n times. It converges to the right answer because eventually it is verified that the world has simplicity degree ≤ k, where k is the true simplicity degree, and then the true answer is verified relative to the assumption that the world has simplicity degree = k.

It follows from the two preceding paragraphs that, for each problem of finitely bounded difference complexity, violating Ockham's razor on the initial conjecture results in a needlessly high, worst-case retraction bound. For violations after earlier outputs, the argument is similar: violating Ockham's razor then results in a needlessly high, worst-case bound on retractions in the sub-problem faced from that point onward. 16

A paradigmatic application of Ockham's razor is curve fitting.
Suppose you know that the curve to be fit is a polynomial of degree no greater than three, and the question is to guess the polynomial degree. That is not very hard: two points determine a line, three points a quadratic, four a cubic, etc. But the game is more interesting when the data points may contain error. Consider a simplified version of curve fitting in which the method may query any data point and the data points may involve less than ε error, for some fixed ε > 0. In this problem, the polynomial degrees run in inverse order to simplicity degrees, so that the answer "cubic" has simplicity degree zero, the answer "quadratic" has simplicity degree one, and so forth. Suppose that the data points seen so far are all closer than ε to some constant c and that a given method violates Ockham's razor by saying that the degree of the true function exceeds zero. Nature can present data within ε of c forever until the method converges to "degree zero". Thereafter, Nature can choose a slightly inclined line that still saves all the data presented so far to within ε and can then continue to present data from the line, etc., for a total of four retractions. The Ockham method that always sides with the simplest hypothesis compatible with experience requires at most three. 17

In the preceding examples, the size of the box and the a priori bound on the degree of the unknown polynomial are necessary to arrive at a finite bound on the number of retractions required. Without such bounds, neither problem is solvable under any transfinite retraction bound according to the theory just mentioned, so the preceding argument for Ockham's razor does not apply. However, there is still a sense in which Ockham's method is best [20]. Think of a method as "accepting" an answer when it outputs that answer and as rejecting the answer when it outputs any alternative answer. Then we can view a method for a problem as a test for any given answer to the problem.
It is then desirable that the method decide each answer in the limit with the fewest possible retractions. In both the ball-counting problem and the curve-fitting problem, a method minimizes worst-case retractions in each sub-problem of each decision problem determined by an answer to the original problem only if it conforms to Ockham's razor at every stage.
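The retraction counts claimed in the ball-counting argument can be checked concretely. The following sketch is illustrative code of my own, not from the paper; the `patience` parameter and the method names are invented. It pits the Ockham counting method against a rival that opens with a non-simplest conjecture, on Nature's worst-case presentation.

```python
def retractions(conjectures):
    """Count how many times a method switches its answer."""
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

def ockham(history):
    return sum(history)  # conjecture exactly the balls seen so far

def rival(history, patience=4):
    # Violates Ockham's razor: opens with "one ball", and only
    # capitulates to "no balls" after `patience` ball-free stages.
    if sum(history) == 0 and len(history) < patience:
        return 1
    return sum(history)

# Nature's worst case: silence until the rival capitulates, then balls
# one at a time, each followed by more silence, up to the bound of 3.
stream = [0] * 5 + [1] + [0] * 5 + [1] + [0] * 5 + [1] + [0] * 5
ock = [ockham(stream[:t]) for t in range(1, len(stream) + 1)]
riv = [rival(stream[:t]) for t in range(1, len(stream) + 1)]
```

Both methods converge to the true answer "3 balls", but the rival's opening violation costs it an extra retraction (1 → 0 → 1 → 2 → 3) against Ockham's three (0 → 1 → 2 → 3).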

Ockham's formal razor
It sounds odd to entertain desperate, empirical "guessing" rules like Ockham's razor in purely formal contexts, but the preceding analogies between uncomputability and the problem of induction suggest a second look. In the empirical domain, Ockham's razor is a preference for the uniquely simplest answer compatible with the inputs. Recall that the simplicity degree of an answer to an empirical question is explicated by the answer's difference complexity. Similarly, the methodological simplicity of an answer to a formal question is explicated by its effective difference complexity. Let (P, Π, ans) be a formal problem of effective difference complexity k and let A be an answer in Π. Define the effective difference complexity of answer A in (P, Π, ans) to be the greatest j such that for each sequence (S_0 ⊆ · · · ⊆ S_k) of recursively enumerable sets satisfying conditions (1) and (2) in the definition of effective difference complexity, A − S_{k−j} is non-empty. This is quite analogous to the definition given in the empirical case.

Ockham's razor is a rule for choosing among several possible answers compatible with current experience, but in formal problems at most one answer is compatible with a given input, so it seems that Ockham's razor is gratuitous. What is intended, of course, is that a Turing machine prefer the simplest answer compatible with the machine's "internal" experience on the path toward "clarity and distinctness", but that is a tricky concept to define in general. It makes sense for Turing machines of a certain kind (those that explicitly simulate different computations for ever greater numbers of computational steps or that seek ever longer proofs of contradictions), but not for arbitrary Turing machine programs, most of which are unintelligible.
An alternative, more "behavioristic" statement of Ockham's razor is that one should never output a simpler answer after a more complex answer has been output, where simplicity of answers can be defined in terms of effective difference complexity, as was done in the empirical case. In the empirical case, this follows from the usual definition, assuming that the method converges to the truth at all, for if one at some point chooses an answer more complex than the data require, then there exists a (simple) way of continuing the data such that a convergent method must shift back to the simpler answer. The converse holds as well, if one assumes, further, that the method never produces an answer that has already been refuted. Neither argument works for formal reasoning, but one can simply stipulate the new statement of Ockham's razor in formal problems.

For an easy illustration, recall the problem in which it is known in advance that the input is an index of a Turing machine that produces at most three sequential outputs and the question is how many sequential outputs will be produced. Let M be a Turing machine that solves the problem in the limit. Let d be the index of a tricky Turing machine that refuses to produce an output until M says "no outputs", that produces one output until M says "one output", etc. The simplest answer is "no outputs".

to the simplest answer thereafter. Otherwise, M follows Ockham's razor. As long as the problem requires at least one retraction in the worst case, M succeeds under the optimal retraction bound in spite of violating Ockham's razor on some input. This is yet another consequence of the possibility of lookup tables.

For a striking example of the analogy between formal and empirical reasoning, consider a purely formal version of empirical curve fitting. Think of a total, computable function f as a map g from rationals to rationals, by decoding naturals as pairs and interpreting pairs as rationals. It is assumed in advance that, for some polynomial function h of degree ≤ 3 and some fixed ε > 0, g(x) is always closer than ε to h(x) (think of this as observational error). The question is to determine the least polynomial degree k ≤ 3 such that, for some polynomial function h of degree ≤ k, g(x) is always closer than ε to h(x). Since the index of f determines everything about g, it "gives away" the answer to the question once and for all, but to a computational agent the problem is similar to the empirical one. The tricky index d for this problem pretends to be an index for a constant function with error < ε until M gives in and reports that d is a constant function. This is accomplished by producing some constant value, say zero, on input x so long as M has not yet output "constant" on input d. After M says "degree zero", d pretends to be a linear function with non-zero slope until M gives in and believes it is a linear function, etc., to exact a minimum of three retractions from M. Now, suppose that M outputs a higher polynomial degree than zero before saying "degree zero". Then M retracts four times on d, whereas the obvious Ockham method succeeds with at most three retractions in the worst case.
If no upper bound on polynomial degree is known, one still obtains the result that Ockham's razor is necessary if a single computational method is to decide each answer with a minimum of retractions.
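The compatibility test that drives both the empirical and the formal curve-fitting arguments can be made concrete for the lowest degree. In this fragment (mine, not the paper's), the answer "degree zero" is compatible with the data iff some constant lies within ε of every point, i.e., iff the range of the data is less than 2ε; only the zero-versus-higher distinction is implemented, and all names are invented for the sketch.

```python
def degree_zero_compatible(ys, eps):
    """Some constant c satisfies |y - c| < eps for every data point y
    iff the spread of the data is less than 2 * eps."""
    return max(ys) - min(ys) < 2 * eps

def ockham_degree_guess(ys, eps):
    # Ockham method, restricted to the lowest rung of the hierarchy:
    # keep answering "degree zero" until the data refute it.
    return 0 if degree_zero_compatible(ys, eps) else "higher"

eps = 0.5
data = [0.1, -0.2, 0.15]       # looks constant to within eps
tilted = data + [0.9, 1.4]     # a slightly inclined line shows up later
```

The method retracts only when Nature's "slightly inclined line" finally pushes the data outside every ε-band around a constant, exactly the retraction pattern of the arguments above.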
The preceding efficiency arguments assume that the violation of Ockham's razor consists of a complex guess followed by the simplest possible guess. What goes wrong if the method violates Ockham's razor by saying "n + 1 outputs" prior to "n outputs", where "n + 1 outputs" is not the simplest guess? In analogy with the empirical case, one may hope that there are sub-problems of the original problem that could have been solved with fewer retractions had the method not violated Ockham's razor on some (tricky-for-the-method) inputs. In the preceding example, let the sub-problem be all indices that produce at least one output. Let M be given. A tricky index d can be constructed that produces one output right away and then continues with the preceding strategy to exact at least two retractions from M if M solves the sub-problem. But if M ever precedes the answer "one output" with some more complex answer in response to d, M will retract at least three times even though the sub-problem is formally solvable with just two retractions. In this way, Ockham's razor applies across time in formal as well as in empirical problems.
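For the three-outputs problem above, the Ockham method that conjectures exactly the number of outputs observed so far can be sketched as follows. This is a toy model of my own in which "programs" are step-action lists rather than Turing machine indices; the names are invented for the sketch.

```python
def outputs_within(program, steps):
    """Count the outputs a toy program (a per-step action list, with
    'out' marking an output step) produces within `steps` steps."""
    return sum(1 for action in program[:steps] if action == 'out')

def ockham_guesses(program, max_steps):
    # At each stage, conjecture the number of outputs observed so far:
    # the formal analogue of "n balls, where n is the number seen".
    return [outputs_within(program, t) for t in range(1, max_steps + 1)]

# A tricky-looking program: silent, then two widely separated outputs.
tricky = [None] * 10 + ['out'] + [None] * 10 + ['out'] + [None] * 10
guesses = ockham_guesses(tricky, len(tricky))
```

The conjectures pass through 0 and 1 and stabilize at 2, one retraction per newly observed output, so the method respects the stipulated formal razor: it never moves from a more complex answer back to a simpler one.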

Conclusion: Hume and hypercomputability
Formal problems and empirical problems are not exactly the same. In the former, the right answer is determined by what is "given", and in the latter it usually is not. In the former, performance can always be augmented to a finite degree by means of lookup tables, and in the latter it cannot (one cannot tell, without seeing the future, whether the actual, empirical world is a world listed in the table). Philosophical tradition has seized upon such differences to draw a sharp boundary between formal reasoning concerning mere relations of ideas and empirical reasoning concerning matters of fact. The former can supposedly be made infallible by a process of mental rigor guaranteed to terminate in clarity and distinctness; the latter cannot be infallible, since the right answer is never determined by any finite number of inputs.

I have argued for an alternative view, according to which uncomputability is an "internalized" problem of induction. True, the input is given all at once in a formal problem, and the input ideally determines the right answer, but the input is not fully taken in until the computational agent's journey toward full "clarity and distinctness" (i.e., its computation) is complete. In uncomputable problems, the process never comes to full fruition, just as empirical inquiry never halts with infallible knowledge of universal laws.
One would like a bell to ring when inquiry has succeeded. Weaker senses of success are tolerated in empirical science only because bells that signal success are infeasible.
In light of the many detailed parallels between the problem of induction and uncomputability, a parallel weakening of standards is warranted in the formal domain. If a formal problem is not decidable, perhaps it is verifiable or refutable. If neither of those success concepts is feasible, then perhaps the problem is defeasibly solvable with no more than n retractions. If there is no such bound n, then perhaps it is decidable in the limit, etc. If empirical science can be said to progress toward the truth in spite of the problem of induction, then ordinary Turing machines can be said, on closely analogous grounds, to progress fallibly toward the truth in spite of uncomputability.

The literature on "hyper-computation" aims at an expanded but plausible sense of computability according to which Turing-uncomputable problems are solvable. There are two paths to this end. Most directly, one can try to "power up" the computational model itself: by appealing to uncomputable oracles, by incorporating exact real numbers that encode unsolvable problems, by computing in space-times that permit one to see infinite computational traces in an instant, etc. (cf. the other articles in this issue). Similarly, one can attempt to "power up" empirical science by inventing crystal balls, by extending the scientist's present eyes and mind through all of space and time, etc. The trouble is to actually implement any of these hyper-methodologies in a manner that would inspire confidence that the implementation is correct (who checks that the real-valued parameter is set precisely to the right value, and who checks that the crystal ball really reveals the future?). An alternative approach is to retain standard computational models, whose implementation issues are (relatively) unproblematic, and to follow the lead of empirical science by relaxing the halting condition on algorithmic success when success with halting is not feasible.
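As a minimal illustration of this relaxation (my example, not one from the paper), the halting problem, though undecidable, is decidable in the limit with at most one retraction: conjecture "does not halt" until the simulated program is seen to halt, and never retract again afterward. The helper names below are invented for the sketch.

```python
def limit_halting_guesses(program, max_steps):
    """`program(steps)` returns True iff it halts within `steps` steps
    of a step-bounded simulation.  Yield one conjecture per stage; the
    sequence stabilizes to the true answer with at most one retraction."""
    guesses = []
    for t in range(1, max_steps + 1):
        guesses.append("halts" if program(t) else "does not halt")
    return guesses

halts_at_7 = lambda steps: steps >= 7   # halts after 7 simulated steps
loops = lambda steps: False             # never halts

a = limit_halting_guesses(halts_at_7, 20)  # stabilizes to "halts"
b = limit_halting_guesses(loops, 20)       # stabilizes to "does not halt"
```

No bell rings when the conjecture becomes final, which is exactly the convergent, non-halting standard of success that the text recommends carrying over from empirical inquiry.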
Such an approach contradicts neither the Church-Turing thesis nor the empirical problem of induction, for these principles govern infallible solvability (i.e., solvability with zero retractions). There are extended Church-Turing theses and problems of induction for 1, 2, 3, … retractions, and all of these theses are mutually consistent. Uncomputability is not a reason to put aside Turing machines, any more than the problem of induction is a reason to abandon empirical science.

Instead, it is a reason to seek Turing machines that converge to the truth in the strongest possible sense. Hume held that skeptical arguments leave inductive reasoning unjustified, for they reveal it to be fallible, unlike purely formal reasoning. His challenge is to show wherein the justification of fallible reasoning consists. He was doubly mistaken. First, formal reasoning is subject to uncomputability, which is, itself, a kind of "internalized" problem of induction. Second, a method of reasoning (like any other strategy) is justified in a given problem insofar as it solves the problem in the best possible sense. So it is essential to the justification of a given method M to show that no possible method converges to the truth in a stronger sense than M does. That requires a skeptical argument to the effect that stronger senses of success are infeasible. Therefore, skeptical arguments are both the principal motivation for Hume's challenge and the answer to it.
In a similar manner, generalized uncomputability arguments justify the application of convergent programs in standard programming languages to uncomputable problems.