A General Theory of Time Discounting: The Reference-Time Theory of Intertemporal Choice

We develop a general theory of intertemporal choice: the reference-time theory, RT. RT is a synthesis of ideas from the generalized hyperbolic model (Loewenstein and Prelec 1992), the quasi-hyperbolic model (Phelps and Pollak 1968, Laibson 1997) and subadditivity of time discounting (Roelofsma and Read 2000, Read 2001, and Scholten and Read 2006a). These models are extended to allow for (i) reference points for time and wealth, and (ii) different discount functions for gains and losses. RT is able to account for all six main anomalies of time discounting: gain-loss asymmetry, magnitude effect, common difference effect, delay-speedup asymmetry, apparent intransitivity of time preferences, and non-additivity of time discounting. We provide a class of utility functions compatible with RT. We show how RT can be extended to incorporate uncertainty and attribute models of intertemporal choice.


Introduction
It is commonly believed that the exponential discounted utility model of intertemporal choice (henceforth, EDU) is contradicted by a relatively large body of empirical and experimental evidence; see the survey by Frederick et al. (2002). Moreover, it appears that these anomalies are not simply mistakes; see Frederick et al. (2002), section 4.3. If we wish to develop models that provide a better explanation of economic behavior over time, then it is imperative to take account of these anomalies. Furthermore, certain types of behavior, and several institutional features, can be explained by decision makers attempting to deal with time-inconsistency problems that arise from non-exponential discounting; see, for instance, Frederick et al. (2002), especially their section 5 titled "Alternative Models".

Anomalies of intertemporal choice
Loewenstein and Prelec (1992), henceforth 'LP', described the following four anomalies, all with good empirical support:
1. Gain-loss asymmetry. Subjects in a study by Loewenstein (1988b) were, on average, indifferent between receiving $10 immediately and receiving $21 one year later (an implied discount rate of 74% per annum). They were also indifferent between losing $10 immediately and losing $15 one year later (an implied discount rate of 40.5% per annum). Note that this is a refutation of neoclassical economics for two reasons. First, the implied discount rates are different; second, they are both too high (even allowing for capital market imperfections and liquidity constraints).
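The two implied rates quoted above can be checked with a short calculation. The sketch below assumes continuous compounding (an assumption on our part, chosen because it reproduces the quoted 74% and 40.5%); the function name `implied_annual_rate` is illustrative only.

```python
import math

def implied_annual_rate(amount_now, amount_later, years=1.0):
    """Continuously compounded rate r that solves
    amount_now = exp(-r * years) * amount_later."""
    return math.log(amount_later / amount_now) / years

# Gains: $10 now versus $21 in one year.
gain_rate = implied_annual_rate(10.0, 21.0)
# Losses: $10 now versus $15 in one year.
loss_rate = implied_annual_rate(10.0, 15.0)

print(f"gains: {gain_rate:.1%}, losses: {loss_rate:.1%}")
```

Under this compounding convention the gain rate is roughly 0.74 and the loss rate roughly 0.405, so the asymmetry is visible directly in the two numbers.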
1. Generalized hyperbolic discounting (LP). LP provided the first coherent explanation of anomalies 1-4. They also provided an axiomatic derivation of their generalized hyperbolic discount function, which we shall call the LP-discount function. LP explained the magnitude effect through a value function with increasing elasticity. This makes higher magnitudes more salient. They explained gain-loss asymmetry by adopting a value function with greater elasticity for losses than for gains, which makes losses more salient. The common difference effect is explained in LP by the notion of declining impatience (roughly, one is more impatient as the date of the reward approaches). Hence, despite identical intervals separating two time-outcome pairs, the choice between the two depends on how close to the current period they are. However, LP cannot explain anomalies 5 and 6.
2. Quasi-hyperbolic discounting (PPL). Phelps and Pollak (1968) and Laibson (1997), 'PPL' for short, provided the very tractable quasi-hyperbolic discount function. This is also known in the literature as the β-δ discount function but we shall call it the PPL-discount function. It is, by far, the most popular discount function in applications (after, of course, exponential discounting). The explanation of the common difference effect in PPL is by a sudden drop in impatience at time zero (followed by constant impatience thereafter). Using the other LP assumptions, PPL can explain anomalies 1, 2 and 4 in the same way. Like LP, PPL cannot explain anomalies 5 and 6.
3. Subadditivity and intransitivity (RRS). Roelofsma and Read (2000) provided experimental evidence for non-transitivity of time preferences. The experimental work of Read (2001) confirmed the common difference effect but rejected declining impatience in favour of constant impatience and, hence, rejected the LP (and PPL) explanation of the common difference effect in terms of declining impatience. Read (2001) and Scholten and Read (2006a) found experimental evidence for subadditivity and introduced the concept of an interval discount function. Furthermore, based on empirical evidence, Scholten and Read (2006a) developed a specific interval discount function, which we shall call the RS-discount function. The RS-discount function (depending on parameter values) can explain the common difference effect as due to either declining impatience, subadditivity or a combination of both. We shall refer to this work collectively as 'RRS'.
larger-later pair, then there is a conflict between the attribute of money and that of time. In this case, choice is lexicographic. If either the time or the money dimension is 'similar' then it is ignored and the two pairs are then compared across the remaining dimension and a sensible choice made (e.g. more money or earlier time). Failing this, if there is no similarity in either dimension, then no decision can be made. Thus the preference relationship in Rubinstein (2003) is incomplete.
4.2 Manzini and Mariotti (2006), in their 'theory of vague time preferences', develop an attribute model that can explain the common difference effect. Manzini and Mariotti propose three criteria to choose between a smaller amount of money delivered sooner (SS) and a larger amount of money delivered later (LL). The primary criterion is to choose whichever has the highest present utility value. If the two present values are not 'significantly' different, then the subject chooses the one with the highest monetary value (secondary criterion). If they have the same monetary values, so that the secondary criterion fails, then the subject behaves according to the third criterion: 'choose the outcome that is delivered sooner'. If all three criteria fail, then the subject is indifferent. Thus, Manzini and Mariotti achieve a complete, though intransitive, ordering. In particular, indifference here is not an equivalence relationship. On the other hand, the experimental results of Roelofsma and Read (2000) supported 'sooner is better than larger' against 'larger is better than sooner'. However, if the order of the secondary criteria is reversed, so that sooner is better than larger (in agreement with the experimental results of Roelofsma and Read, 2000), then we would not get a common difference effect. Whether or not Manzini and Mariotti's explanation of the common difference effect is acceptable, to us the main contribution of their paper lies in the use of primary and secondary criteria. This appears to us to be a more accurate description of actual decision making than the assumption of a single criterion (see subsection 5.3, below).

Scholten and Read (2006b) present a critique of the psychological basis for discounting models (including their own). They develop an attribute model based on firmer psychological foundations. In subsection 5.1, we argue that their tradeoff model is equivalent to a discounted utility model that is a generalization of their model (Scholten and Read, 2006a). If this is accepted, then their tradeoff model lends further support to their own discount function, the RS-discount function.
5. Uncertainty and exponential discounting. Under uncertainty the common difference effect never arises when we use expected utility with exponential discounting. Hence, it is quite possible that the experimental finding of the common difference effect is a rejection of expected utility rather than exponential discounting. Halevy (2007) shows that when non-expected utility is combined with exponential discounting, the theory is consistent with the presence of a common difference effect, provided uncertainty is present but sufficiently small (see subsection 5.2, below).
6. Intransitivity and relative discounting. Ok and Masatlioglu (2007) assume neither transitivity nor additivity. In its present formulation, their theory cannot account for either gain-loss asymmetry or delay-speedup asymmetry. But a more serious problem is the lack of transitivity. Thus it appears that it will be hard to work with this theory, as the authors themselves explain. On the other hand, these problems can all be resolved in the special case of a transitive preference relation. But then their model becomes additive. In this case, Ok and Masatlioglu would reduce to a standard discounting model (see subsection 5.4, below).

Towards a general theory of intertemporal choice: The reference-time theory (RT)
Within the class of time preference models which are separable in time and outcomes (also known as delay-discounting models), what should a general theory of time preference aspire to? We suggest two desirable elements. First, it should be able to explain anomalies 1-6. Second, it should provide a framework that can incorporate recent developments in time discounting such as uncertainty (Halevy, 2007) and attribute models (Manzini and Mariotti, 2006, and Scholten and Read, 2006b). The aim of this paper is to develop a theory of intertemporal choice that incorporates the two desirable elements mentioned above. We call this theory the reference-time theory of intertemporal choice, 'RT' for short. It is a synthesis of three earlier seminal works, namely, LP, PPL and RRS. In a nutshell, RT is basically LP extended to allow for non-additive time discounting by incorporating a reference point for time. We explain the anomalies 1-6 as follows.
1. Like LP, we explain the magnitude effect (anomaly 2) by assuming that the elasticity of the value function is increasing. However, in section 3, we show that several popular classes of utility functions violate this condition. These include CARA (constant absolute risk aversion), CRRA (constant relative risk aversion), HARA (hyperbolic absolute risk aversion), logarithmic and quadratic. We develop a scheme for generating value functions that exhibit increasing elasticity, as required to explain the magnitude effect. The simplest class that has this property we call the class of simple increasing elasticity value functions (SIE). Each member of this class is formed by a product of a HARA function and a CRRA function and, therefore, is quite tractable. We provide a class of utility functions compatible with any theory where preferences are separable in time and outcomes. This includes RT, as well as LP, PPL, RRS, Halevy (2007) and Manzini and Mariotti (2006) (sections 3, 4 and 5, below).
2. LP explained the gain-loss asymmetry (anomaly 1) by assuming that the elasticity of the value function for losses exceeds the elasticity for gains. This allows them to use the same discount function for gains and losses, in agreement with the strict separability of time and outcomes. The downside of their approach is that the coefficient of loss aversion is then variable, which contradicts the empirical evidence. In Example 4, section 4, below, we explore the implications of having the same discount function for gains and losses, as in LP. We show that, as time goes to infinity, the coefficient of loss aversion also goes to infinity. This is in contrast to the empirical findings of a constant coefficient of loss aversion, approximately equal to 2.25 (Tversky and Kahneman, 1991, 1992). For this reason, we assume a constant coefficient of loss aversion. This, in turn, forces us to adopt different discount functions for gains and losses to explain anomaly 1. (We are grateful for the comments of a critical referee, which helped us clarify these issues.)
3. LP provided an axiomatic derivation of their generalized hyperbolic discount function (which we call the LP-discount function). For this, they added the extra assumption of linear delay to that of the common difference effect. While there is considerable empirical evidence for the common difference effect, the assumption of linear delay is added purely for convenience. We extend the LP derivation as follows. At the most general level, which requires neither linear delay nor the common difference effect, we have our Representation Theorem 2 (Proposition 12, below). We introduce a weaker notion of subadditivity, which we call -subadditivity (Definition 12, below). According to our Characterization Theorem 4 (Proposition 21, below), preferences exhibit the common difference effect if, and only if, -subadditivity holds. We also introduce a generalization of the concept of linear delay of LP. We call this -delay. Our Proposition 23, below, then shows that -delay implies the common difference effect. Imposing additivity, as well as -delay, gives our Proposition 24, below. The special case of the latter with = 1 gives the LP-discount function. Our more general approach also allows us to derive the RS-discount function (Proposition 25, below). In particular, as with RRS, we can explain the common difference effect (anomaly 3) as due to either declining impatience, subadditivity or a combination of both. However, our approach is more general, as RT can also explain the common difference effect as due to the presence of a small amount of irremovable uncertainty, as in Halevy (2007).
6. Given a reference point for wealth, $w_0$, and a reference point for time, $r$, our preferences are complete and transitive (subsection 2.3, below). Thus they may be called conditionally complete and conditionally transitive (conditional on $w_0$ and $r$). We explain observed intransitivity as due to a change in the reference point for time (see subsections 2.4 and 2.8, below). This is in contrast to Ok and Masatlioglu (subsection 5.4, below), where preferences are complete but intransitive (in our terminology we may describe such preferences as unconditionally complete but not even conditionally transitive). Thus the relative-discounting theory of Ok and Masatlioglu and the reference-time theory of this paper are not compatible and neither is a special case of the other.
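The increasing-elasticity requirement in item 1 of the list above can be illustrated numerically. The sketch below assumes the usual definition of elasticity, x v'(x) / v(x), and compares a CRRA function with a hypothetical product of a CRRA factor and a HARA-type factor; the exponents 0.5 and the specific functional form `sie` are our illustrative choices, not taken from the paper.

```python
def elasticity(v, x, h=1e-6):
    """Numerical elasticity x * v'(x) / v(x), using a central difference."""
    dv = (v(x + h) - v(x - h)) / (2.0 * h)
    return x * dv / v(x)

def crra(x, gamma=0.5):
    # CRRA-type value function for gains: elasticity is the constant gamma.
    return x ** gamma

def sie(x, gamma=0.5, sigma=0.5):
    # Hypothetical SIE-style member (assumption): CRRA factor x**gamma
    # times a HARA-type factor (1 + x)**sigma.
    return (x ** gamma) * ((1.0 + x) ** sigma)

xs = [1.0, 2.0, 5.0, 10.0]
crra_el = [elasticity(crra, x) for x in xs]
sie_el = [elasticity(sie, x) for x in xs]

for x, e1, e2 in zip(xs, crra_el, sie_el):
    print(f"x={x:5.1f}  CRRA elasticity={e1:.4f}  product elasticity={e2:.4f}")
```

For this particular product the elasticity works out analytically to gamma + sigma * x / (1 + x), which rises with x, whereas the CRRA elasticity stays flat.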
To summarize, the theory presented in this paper (reference-time theory or RT) can explain anomalies 1-6 (section 4, below) and can be extended to incorporate uncertainty, as in Halevy (2007), and the attribute models of Manzini and Mariotti (2006) and Scholten and Read (2006b) (section 5, below).
All proofs are contained in the appendix.

A reference-time theory of intertemporal choice (RT)
This section is structured as follows. We first discuss the anomalies of intertemporal choice (subsection 2.1). Next we outline prospect theory (subsection 2.2), which is an essential building block of RT. We define preferences for RT as early as possible (subsection 2.3). This is followed by further essential material on discount functions, additivity, impatience, intransitivity and the common difference effect (subsections 2.5, 2.6, 2.7, 2.8 and 2.9). The main technical machinery (representation, extension and characterization theorems) is developed in subsections 2.10, 2.11 and 2.12. Finally, in subsection 2.13 we are in a position to extend LP to allow for non-additive time discounting by incorporating a reference point for time (and also separate discount functions for gains and losses).

Anomalies of intertemporal choice:
Refutations of the exponentially discounted utility model or of neoclassical economics more generally?
To fix ideas, we start with a simple neoclassical example.
Example 1 (A neoclassical example): Consider a single consumer who lives for two periods. He has exogenous incomes $I_1$ and $I_2$ in periods 1 and 2, respectively. He faces the exogenous fixed real interest rate, $r$, per period, compounded continuously. Let $c_1$ and $c_2$ be real consumption in periods 1 and 2, respectively. Then the consumer's intertemporal budget constraint is
$c_1 + e^{-r} c_2 = I_1 + e^{-r} I_2$. (2.1)
Let the consumer's utility function be $U(c_1, c_2)$. The consumer's problem is then to maximize $U(c_1, c_2)$ subject to the budget constraint (2.1) and the non-negativity constraints $c_1 \ge 0$, $c_2 \ge 0$. Note well: the income stream $(I_1, I_2)$ is relevant to the decision problem only through its present value, $I_1 + e^{-r} I_2$, discounted by the real interest rate available to the consumer, $r$. In particular, if a subject is offered a choice between real monetary rewards, $x$ in period 1 and $y$ in period 2, then the subject should choose $x$ over $y$ if, and only if, $x > e^{-r} y$ (and choose $y$ over $x$ if, and only if, $x < e^{-r} y$). So, if a subject exhibits behavior inconsistent with this rule, that behavior would constitute a refutation of neoclassical economics, and not just a specific functional form for $U$ (as no such assumption has been made above). For example, if subjects exhibit a clear preference for increasing income streams over constant or declining income streams, all with the same present value, that behavior would constitute a refutation of neoclassical economics, and not just the exponentially discounted form. The same can be said about the anomalies reported in subsection 1.1, above. Each one of them is a refutation of neoclassical economics, not just the exponentially discounted utility model as is usually, but mistakenly, claimed.
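The present-value rule in Example 1 can be written out as a small sketch. The function names and the 5% interest rate used in the checks are illustrative assumptions; only the rule "compare x with y discounted at the market rate" comes from the example.

```python
import math

def present_value(i1, i2, r):
    """Present value of the income stream (i1, i2) at the continuously
    compounded real interest rate r per period."""
    return i1 + math.exp(-r) * i2

def neoclassical_choice(x, y, r):
    """Choice between x in period 1 and y in period 2 under the
    present-value rule: the reward with the larger present value wins."""
    pv_y = math.exp(-r) * y
    if x > pv_y:
        return "x"
    if x < pv_y:
        return "y"
    return "indifferent"

# Hypothetical market rate of 5% per period (assumption for illustration).
r = 0.05
choice_10_vs_21 = neoclassical_choice(10.0, 21.0, r)
choice_10_vs_10_4 = neoclassical_choice(10.0, 10.4, r)
print(choice_10_vs_21, choice_10_vs_10_4)
```

At any moderate market rate the rule picks $21 next period over $10 now, which is why indifference between the two is hard to reconcile with the neoclassical benchmark.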
Therefore, each attempt at explaining anomalies of intertemporal choice (as far as we know) involves some element of behavioral economics. The mental accounting literature (Thaler, 1999) argues that separate income streams might not be fully integrated. LP (and this paper) use prospect theory (Kahneman and Tversky, 1979, and Tversky and Kahneman, 1992) as the underlying decision theory. Scholten and Read (2006b) and Manzini and Mariotti (2006) use attribute models, Halevy (2007) uses non-linear transformation of probabilities and Ok and Masatlioglu (2007) assume non-transitive preferences.

Prospect theory
We follow LP in using prospect theory (Kahneman and Tversky, 1979, and Tversky and Kahneman, 1992) rather than standard utility theory. Prospect theory distinguishes between two phases of decision making: editing and evaluation.
In the editing phase, a decision maker simplifies a real world problem to make it amenable to formal analysis and reduce the associated cognitive load. As part of the editing phase a reference point is chosen to which outcomes are to be compared.
In the evaluation phase, a value (a real number) is attached to each feasible action by the decision maker. The action with highest value is chosen. The function, v, that assigns values to actions in prospect theory is called the value function; it is the analogue of the indirect utility function of standard utility theory. In standard utility theory carriers of utility are the outcomes of actions. But in prospect theory carriers of utility are deviations in outcomes from the reference point. In general, the action chosen in the evaluation phase will depend on the reference point chosen in the editing phase.
The value function, v, in prospect theory has four main properties: reference dependence, monotonicity, declining sensitivity, loss aversion. Furthermore, in prospect theory, there is non-linear transformation of probabilities. There is good empirical support for these features; see, for instance, Kahneman and Tversky (2000).
We take $v$ to be the value function introduced by Kahneman and Tversky (1979): $v$ satisfies $v(0) = 0$ (reference dependence) and is twice differentiable except at 0.
Following LP, we define the elasticity of the value function as follows.

Preferences
We consider a decision maker who, at time $t_0$, takes an action that results in the level of wealth $w_i$ at time $t_i$, $i = 1, 2, \dots, n$, where
$t_0 \le r \le t_1 < \dots < t_n$. (2.8)
Time $r$ is the reference time: the time back to which all values are to be discounted. We can choose any moment of time as time zero and measure all other times relative to it. We choose to set $t_0 = 0$, i.e., the time a decision is made is always time $t = 0$. If it is desired to set $t_0 \neq 0$, then simply replace all times, $t$, below, with $t - t_0$. The decision maker's intertemporal utility function is given by:
$V_r((w_1, t_1), (w_2, t_2), \dots, (w_n, t_n)) = \sum_{i=1}^{n} v(x_i) D(r, t_i)$, (2.9)
where $v$ is the value function introduced by Kahneman and Tversky (1979), $x_i = w_i - w_0$ is the difference between the wealth level, $w_i$, at time $t_i$, and the reference level for wealth, $w_0$, and $D$ is the discount function.
More formally, we assume that, for each $(w_0, r) \in (-\infty, \infty) \times [0, \infty)$, the decision maker has a complete transitive preference relation, $\preceq_{w_0,r}$, on $(-\infty, \infty) \times [r, \infty)$. We think of $w_0$ as the reference point for wealth and $r$ as the reference point for time. If $(w, t) \in (-\infty, \infty) \times [r, \infty)$, with $w \ge w_0$, we say that $(w, t)$ is an outcome in the domain of gains. If $(w, t) \in (-\infty, \infty) \times [r, \infty)$, with $w \le w_0$, we say that $(w, t)$ is an outcome in the domain of losses. We assume that $\preceq_{w_0,r}$ is represented by a utility function $v(w - w_0) D(r, t)$. Thus $(w_1, t_1) \preceq_{w_0,r} (w_2, t_2)$ if, and only if, $v(w_1 - w_0) D(r, t_1) \le v(w_2 - w_0) D(r, t_2)$.
Using (2.9), we extend $\preceq_{w_0,r}$ to a complete transitive preference relation on sequences from $(-\infty, \infty) \times [r, \infty)$, as follows:
$((x_1, s_1), (x_2, s_2), \dots, (x_m, s_m)) \preceq_{w_0,r} ((y_1, t_1), (y_2, t_2), \dots, (y_n, t_n)) \Leftrightarrow V_r((x_1, s_1), (x_2, s_2), \dots, (x_m, s_m)) \le V_r((y_1, t_1), (y_2, t_2), \dots, (y_n, t_n))$. (2.10)
We depart from LP in the following ways. LP have a reference point for wealth but not for time. We have a reference point for wealth, $w_0$, and a reference point for time, $r$. LP assume the same discount function for gains and losses. We allow the discount function for gains to be different from that for losses. LP implicitly assume that the discount function is additive (Definition 5, below). We allow the discount function to be non-additive, to accommodate the empirical evidence of RRS. If the discount function is additive and is the same for gains and losses (as in LP), then the choice of the reference point for time is irrelevant, since $(x, s) \preceq_{w_0,r} (y, t)$ if, and only if, $(x, s) \preceq_{w_0,0} (y, t)$ (Proposition 7, below). However, if the discount function is non-additive, then the choice of the reference point for time matters. We use this to explain (apparent) intransitivity as a framing effect due to a change in the reference point for time.

Determination of the reference point for time
Let $S$ be a non-empty set of sequences from $(-\infty, \infty) \times [0, \infty)$. Suppose a decision maker is interested in comparing the members of $S$, for example, for the purpose of choosing the optimal member (if $S$ is compact). For this he needs a reference point for time. Let $T$ be the set of times involved, i.e.,
$T = \{ t \in [0, \infty) : t = t_i \text{ for some sequence } \{(x_1, t_1), (x_2, t_2), \dots, (x_i, t_i), \dots\} \text{ in } S \}$. (2.11)
Since $T$ is bounded below (by 0) and non-empty, it follows that $T$ has a greatest lower bound, $r$. We make the following tentative assumption:
A0 Reference time. Given $S$, $T$, $r$, as described just above, we assume that the decision maker takes $r$ as the reference point for time.
For example, if a decision maker wants to compare $x$ received at time $s$ with $y$ received at time $t$, $s \le t$, then $S$ consists of just two sequences, each with just one element: $S = \{(x, s), (y, t)\}$ and $T = \{s, t\}$. Thus A0 implies that $r = s$. If $v(x) < v(y) D(s, t)$ then the decision maker chooses $(y, t)$ over $(x, s)$.
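Assumption A0 can be sketched for the finite case, where the greatest lower bound of the set of times is simply its minimum. The representation of a sequence as a list of (amount, time) pairs and the function name `reference_time` are our assumptions for illustration.

```python
def reference_time(sequences):
    """Reference point for time under A0 for a finite set of finite
    sequences: the earliest date appearing in any sequence.

    Each sequence is a list of (amount, time) pairs with time >= 0.
    """
    times = [t for seq in sequences for (_, t) in seq]
    if not times:
        raise ValueError("at least one dated outcome is required")
    return min(times)

# Comparing x at time s with y at time t (s <= t) gives r = s.
r_pair = reference_time([[(10.0, 3.0)], [(21.0, 5.0)]])

# With longer sequences, r is the earliest date across all of them.
r_streams = reference_time([
    [(5.0, 2.0), (5.0, 4.0)],
    [(12.0, 1.0)],
])
print(r_pair, r_streams)
```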
A0 does not have the status of the LP assumptions A1-A4, introduced in subsection 2.13, below. While there is considerable, though debated, empirical evidence for A1-A4, A0 should be regarded as a tentative assumption, whose implications are to be explored.
We will only use A0 in subsections 2.8 and 2.13. In subsection 2.8, we use A0 to explain (apparent) intransitivity as due to a shift in the reference point for time. In subsection 2.13, we use A0 to prove that assumption A4 (Delay-speedup asymmetry) follows from the other assumptions (Proposition 30).

Discount functions
The five discount functions that will be important for this paper are the exponential (2.12), the PPL or quasi-hyperbolic (2.13), the LP or generalized hyperbolic (2.14), the RS or interval (2.15), and the generalized RS (2.16) discount functions. The exponential takes the form
$D(r, t) = e^{-\beta (t - r)}$, $\beta > 0$. (2.12)
The exponential discount function (2.12) was introduced by Samuelson (1937). Aside from its tractability, the main attraction of EDU is that it leads to time-consistent choices. If the plan $(x_1, t_1), (x_2, t_2), \dots, (x_n, t_n)$ is optimal at time 0, then at time $t_k$ the plan $(x_{k+1}, t_{k+1}), (x_{k+2}, t_{k+2}), \dots, (x_n, t_n)$ is also optimal. But this may no longer be true for more general specifications of the discount function.
The β-δ (or quasi-hyperbolic) discount function (2.13) was proposed by Phelps and Pollak (1968) and Laibson (1997). Its popularity in applied work is second only to EDU. The generalized hyperbolic discount function (2.14) was proposed by Loewenstein and Prelec (1992). For the special case $\alpha = \beta$, (2.14) reduces to the hyperbolic discount function. These three discount functions are additive (Definition 3, below). They can account for the common difference effect through declining impatience (Definition 6, below) but they cannot account for either non-additivity or intransitivity.
The interval discount function (2.15) was introduced by Scholten and Read (2006a). It can account for both non-additivity and intransitivity. It can account for the common difference effect through declining impatience, subadditivity or a combination of both (subsections 2.8 and 2.9, below).
In subsection 5.1, below, we shall show that the attribute model of Scholten and Read (2006b) is equivalent to a discounted utility model with the discount function (2.16), which is a generalization of their RS-discount function (2.15).
Note that (2.14) approaches (2.12) as $\alpha \to 0$. In general, neither of (2.14) and (2.15) is a special case of the other. However, for $r = 0$ (and only for $r = 0$), (2.15) reduces to (2.14) when $\rho = \tau = 1$. Scholten and Read (2006a) report incorrectly that the LP-discount function is a special case of the RS-discount function. One needs to restrict $r = 0$ (in addition to $\rho = \tau = 1$) in order to generate the LP from the RS-discount function. While $\rho$, $\tau$ are parameters, $r$ is a variable. Hence, neither discount function is a special case of the other.
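The relationships between the exponential, LP and RS discount functions described above can be checked numerically. The functional forms below are assumptions on our part: they are the forms in which these functions are commonly written in the literature, extended with a reference time r, and the parameter names `alpha`, `beta`, `rho`, `tau` are likewise assumed rather than taken from the displayed equations.

```python
import math

def d_exp(r, t, beta=1.0):
    # Assumed exponential form: exp(-beta * (t - r)).
    return math.exp(-beta * (t - r))

def d_lp(r, t, alpha=1.0, beta=1.0):
    # Assumed generalized hyperbolic (LP) form with reference time r.
    return ((1.0 + alpha * t) / (1.0 + alpha * r)) ** (-beta / alpha)

def d_rs(r, t, alpha=1.0, beta=1.0, rho=1.0, tau=1.0):
    # Assumed interval (RS) form with reference time r.
    return (1.0 + alpha * (t ** tau - r ** tau) ** rho) ** (-beta / alpha)

# LP approaches the exponential as alpha becomes small.
lp_small_alpha = d_lp(0.0, 3.0, alpha=1e-6)
exp_value = d_exp(0.0, 3.0)

# With rho = tau = 1, RS and LP agree at r = 0 ...
rs_at_zero, lp_at_zero = d_rs(0.0, 3.0), d_lp(0.0, 3.0)
# ... but differ once r > 0.
rs_at_one, lp_at_one = d_rs(1.0, 3.0), d_lp(1.0, 3.0)

print(lp_small_alpha, exp_value)
print(rs_at_zero, lp_at_zero, rs_at_one, lp_at_one)
```

Under these assumed forms, the last line illustrates why restricting r = 0 is needed before one function can be obtained from the other.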
We now give a formal definition of a discount function. This will be the first (and, of course, the most important) of five functions we will introduce (the others are: the generating function, the delay function, the seed function and the extension function). Furthermore, if $D$ satisfies (i) with 'into' replaced with 'onto', then we call $D$ a continuous discount function.
Our terminology suggests that a continuous discount function is continuous. That this is partly true, is established by the following Proposition.
Proposition 1 : A continuous discount function, D (r; t), is continuous in t.
Proposition 2 : Each of (2.12), (2.14) and (2.15) is a continuous discount function in the sense of Definition 2. However, (2.13) is a discount function but not a continuous discount function.
The reason that (2.13) fails to be a continuous discount function is the jump in $D(0, t)$ at $t = 0$. From (2.14) and (2.15) we see that the restrictions $r \ge 0$ and $t \ge 0$ are needed. From (2.15) we see that the further restriction $r \le t$ is needed. From (2.12) we see that the 'into' in Definition 2(ii) cannot be strengthened to 'onto'.
Proposition 3 (Time sensitivity): Let $D$ be a continuous discount function. Suppose $r \ge 0$. If $0 < x \le y$, or if $y \le x < 0$, then $v(x) = v(y) D(r, t)$ for some $t \in [r, \infty)$.
Proposition 4 (Existence of present values): Let $D$ be a discount function. Let $r \le t$. Let $y \ge 0$ ($y \le 0$). Then, for some $x$, $0 \le x \le y$ ($y \le x \le 0$), $v(x) = v(y) D(r, t)$.

Additivity
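We now define additivity and related concepts. A discount function, $D(r, t)$, is
additive if $D(r, s) D(s, t) = D(r, t)$, for $r \le s \le t$, (2.18)
subadditive if $D(r, s) D(s, t) < D(r, t)$, for $r < s < t$, (2.19)
superadditive if $D(r, s) D(s, t) > D(r, t)$, for $r < s < t$. (2.20)
Additivity (2.18) implies that discounting a quantity from time $t$ back to time $s$ and then further back to time $r$ is the same as discounting that quantity from time $t$ back to time $r$ in one step.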
To aid further development, we define a generating function, whose interpretation will become apparent from Proposition 5 that follows the definition. A 'continuous generating function' is continuous. The proof is the same as that of Proposition 1 and, therefore, will be omitted.
(iii) If $\varphi$ is onto (so that $\varphi$ is a continuous generating function), then $D$ is an additive continuous discount function and $D(0, t) = \varphi(t)$.
Thus, if we have the same additive discount function for gains and losses (as is the case with LP) then the choice of the reference time, $r$, back to which all utilities are discounted, is irrelevant. Discounting back to time $r$ is equivalent to discounting back to time 0. The same result holds even if we have different additive discount functions for gains and losses, provided all outcomes are in the domain of gains or all outcomes are in the domain of losses. However, if our discount function is not additive, or if the discount function for gains is different from the discount function for losses and we have a mixture of gains and losses, then the optimal choice of the decision maker may depend on the reference point for time.
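The distinction between additive and non-additive discounting can be made concrete by comparing two-step discounting, D(r, s) D(s, t), with one-step discounting, D(r, t), at a given triple of dates. In the sketch below the exponential form and the interval-only function `d_interval` are illustrative assumptions, and `classify` reports the comparison at a single triple only, not for all dates.

```python
import math

def d_exp(r, t, beta=1.0):
    # Assumed exponential form: exp(-beta * (t - r)).
    return math.exp(-beta * (t - r))

def d_interval(r, t):
    # Hypothetical non-additive example: depends only on the interval t - r.
    return 1.0 / (1.0 + (t - r))

def classify(D, r, s, t, tol=1e-12):
    """Compare two-step with one-step discounting at the triple r < s < t."""
    gap = D(r, s) * D(s, t) - D(r, t)
    if abs(gap) <= tol:
        return "additive"
    return "subadditive" if gap < 0 else "superadditive"

exp_label = classify(d_exp, 0.0, 1.0, 2.0)
interval_label = classify(d_interval, 0.0, 1.0, 2.0)
print(exp_label, interval_label)
```

For the interval-only function, discounting in two steps shrinks the value by more than discounting in one step, which is the subadditive case.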

Impatience
The following concepts are also useful.
In the light of Proposition 8, we can now see the interpretation of the parameters $\tau$ and $\rho$ in the RS-discount function (2.15).
$\tau$ controls impatience, independently of the values of the other parameters $\alpha$, $\beta$ and $\rho$: $0 < \tau < 1$ gives declining impatience, $\tau = 1$ gives constant impatience and $\tau > 1$ gives increasing impatience. If $0 < \rho \le 1$, then we get subadditivity, irrespective of the values of the other parameters $\alpha$, $\beta$ and $\tau$. However, if $\rho > 1$, then (2.15) can be neither subadditive, additive nor superadditive (depending on the particular values of $r$, $s$ and $t$, we may have $D(r, s) < D(r + t, s + t)$, $D(r, s) = D(r + t, s + t)$ or $D(r, s) > D(r + t, s + t)$).

Intransitive preferences: Real or apparent?
Consider the following hypothetical situation. A decision maker prefers a payoff of 1 now to a payoff of 2 next period, i.e., (2, next period) $\prec$ (1, now). The decision maker also prefers a payoff of 2 next period to a payoff of 3 two periods from now, i.e., (3, two periods from now) $\prec$ (2, next period). Finally, the same decision maker prefers a payoff of 3 two periods from now to a payoff of 1 now, i.e., (1, now) $\prec$ (3, two periods from now). Schematically:
(1, now) $\prec$ (3, two periods from now) $\prec$ (2, next period) $\prec$ (1, now). (2.24)
Ok and Masatlioglu (2007, p. 215) use a similar example to motivate their intransitive theory of relative discounting. Alternatively, we may view (2.24) as due to a framing effect resulting in a shift in the reference point for time. Assume that the choice of reference time in each pairwise comparison is the sooner of the two dates, in conformity with Assumption A0, subsection 2.4. Then (2.24) can be formalized as follows:
$V_0(2, 1) < V_0(1, 0)$, $V_1(3, 2) < V_1(2, 1)$, $V_0(1, 0) < V_0(3, 2)$. (2.25)
Thus, the decision maker prefers a payoff of 1 now to a payoff of 2 next period, both discounted back to the present. The decision maker also prefers a payoff of 2 next period to a payoff of 3 the following period, both discounted back to next period. Finally, the decision maker prefers a payoff of 3 in two periods from now to a payoff of 1 now, both discounted back to the present. If this view is accepted, then the apparent intransitivity in (2.24) arises from conflating $V_0(3, 2)$ with $V_1(3, 2)$ and $V_1(2, 1)$ with $V_0(2, 1)$. The following example shows that (2.25) is consistent with a reference-time theory of intertemporal choice.
Example 2 : Take the reference point for wealth to be the current level of wealth, so each payoff is regarded as a gain to current wealth. Take the value function to be ..., confirming (2.25).
A consequence of Proposition 7 is that no additive discount function (e.g., exponential (2.12), PPL (2.13) or LP (2.14)) can explain (apparently) intransitive choices as exhibited in (2.24). The reason is that, under the conditions of that proposition, all utilities can be discounted back to time zero and, hence, can be compared and ordered.
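A non-additive discount function, by contrast, can generate the cycle in (2.24) once the reference time shifts with each pairwise comparison. The sketch below is a hypothetical illustration and not the paper's Example 2: the value function `v`, the assumed interval-type discount function `d_rs` and the parameter choice `rho=0.5` are all our assumptions, picked so that the three comparisons come out as described.

```python
import math

def v(x):
    # Hypothetical value function for gains (assumption): sqrt(x * (1 + x)).
    return math.sqrt(x * (1.0 + x))

def d_rs(r, t, alpha=1.0, beta=1.0, rho=0.5, tau=1.0):
    # Assumed interval-type discount function; rho = 0.5 is illustrative.
    return (1.0 + alpha * (t ** tau - r ** tau) ** rho) ** (-beta / alpha)

def V(r, x, t):
    """Value of payoff x at time t, discounted back to reference time r."""
    return v(x) * d_rs(r, t)

def prefers(a, b):
    """True if outcome a = (x, t) is chosen over b, with the reference
    time set to the sooner of the two dates (Assumption A0)."""
    r = min(a[1], b[1])
    return V(r, *a) > V(r, *b)

now1, next2, later3 = (1, 0), (2, 1), (3, 2)
cycle = (
    prefers(now1, next2),    # 1 now over 2 next period, reference time 0
    prefers(next2, later3),  # 2 next period over 3 in two periods, reference time 1
    prefers(later3, now1),   # 3 in two periods over 1 now, reference time 0
)
print(cycle)
```

Each comparison is transitive for a fixed reference time; the cycle appears only because the middle comparison is evaluated from reference time 1 while the other two are evaluated from reference time 0.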
Let us reconsider the common difference effect, using Thaler's apples example (anomaly 3 in the list of subsection 1.1). A decision maker prefers one apple today to two apples tomorrow, so that $v(1) > v(2) D(0, 1)$. However, the decision maker, today, prefers to receive two apples in 51 days' time to receiving one apple in 50 days' time, so that $v(1) D(0, 50) < v(2) D(0, 51)$. Thus, the common difference effect is consistent with constant impatience if the discount function is sufficiently subadditive.
Example 3 (Thaler's apples example): Take the reference point to be 'no apples' and take the value function to be

\[v(x) = x^{1/2}(1+x)^{1/2}, \quad x \ge 0. \quad (2.39)\]

(Footnote 15: The reasons for this choice will become clear in sections 3.1 and 3.2, below. Many other choices will also do.)
Thus (working to five significant figures), \(v(1) = 1.4142\) and \(v(2) = 2.4495\). (2.40)

We compare the resolution of the 'common difference effect' anomaly under the LP-discount function (2.14) and the RS-discount function (2.15). To simplify as much as possible, choose the parameters \(\alpha = \beta = \rho = \tau = 1\). We shall use these parameters in other examples too. We tabulate the relevant magnitudes below. Recall that the decision maker prefers one apple today to two apples tomorrow if, and only if, \(v(1) > v(2)D(0,1)\). On the other hand, the decision maker, today, prefers to receive two apples in 51 days' time to receiving one apple in 50 days' time if, and only if, \(v(1)D(0,50) < v(2)D(0,51)\). Thus, the LP-discount function explains the common difference effect as exclusively due to declining impatience, while the RS-discount function explains this effect as due (in this example, exclusively) to subadditivity. More generally, and provided \(0 < \rho \le 1\), the RS-discount function can combine subadditivity with declining impatience (\(0 < \tau < 1\)), constant impatience (\(\tau = 1\)) or increasing impatience (\(\tau > 1\)).
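The magnitudes behind Example 3 can be recomputed with a short Python sketch. It assumes the value function \(v(x) = \sqrt{x(1+x)}\), which reproduces the two values in (2.40), the LP form \(D(r,t) = [(1+\alpha t)/(1+\alpha r)]^{-\beta/\alpha}\), and an RS form \(D(r,t) = [1 + \alpha(t^\tau - r^\tau)^\rho]^{-\beta/\alpha}\) that is an assumption here because (2.15) is not reproduced in this section. All parameters are set to 1, as in the example; the dictionary keys are hypothetical labels.

```python
import math

def v(x):
    # Assumed value function; reproduces v(1) = 1.4142 and v(2) = 2.4495 from (2.40).
    return math.sqrt(x * (1.0 + x))

def lp(r, t, alpha=1.0, beta=1.0):
    # LP-discount function: D(r,t) = ((1 + alpha t) / (1 + alpha r))^(-beta/alpha).
    return ((1.0 + alpha * t) / (1.0 + alpha * r)) ** (-beta / alpha)

def rs(r, t, alpha=1.0, beta=1.0, rho=1.0, tau=1.0):
    # Assumed RS form: D(r,t) = [1 + alpha (t^tau - r^tau)^rho]^(-beta/alpha).
    return (1.0 + alpha * (t ** tau - r ** tau) ** rho) ** (-beta / alpha)

results = {}
for name, D in (("LP", lp), ("RS", rs)):
    results[name] = {
        # One apple today preferred to two apples tomorrow.
        "one_today_over_two_tomorrow": v(1) > v(2) * D(0, 1),
        # Two apples in 51 days preferred, today, to one apple in 50 days.
        "two_in_51_over_one_in_50": v(1) * D(0, 50) < v(2) * D(0, 51),
        "D(0,1)": D(0, 1),
        "D(50,51)": D(50, 51),
        # Zero for an additive function, positive for a subadditive one.
        "subadditivity_gap": D(0, 51) - D(0, 50) * D(50, 51),
    }

for name, row in results.items():
    print(name, row)
```

Under these assumptions both functions generate the common difference effect. For LP the one-day discount factor rises from \(D(0,1)\) to \(D(50,51)\) and the gap is zero (declining impatience, additivity); for RS the one-day factor is unchanged and the gap is positive (constant impatience, subadditivity).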
Of course, and as Read (2001) pointed out, the common difference effect could be due to both declining impatience and subadditivity. Read (2001) conducted a series of experiments that tested for the common difference effect and could also discriminate between subadditivity and declining impatience. He found support for the common difference effect and for subadditivity but rejected declining impatience in favour of constant impatience. Read (2001) also discusses the psychological foundation for subadditivity.

Representation theorems
Suppose that \(x\) received at time 0 is equivalent to \(y\) received at time \(t\) (when both are discounted back to time 0), so that \(v(x) = v(y)D(0,t)\). Suppose that the receipt of \(x\) is delayed to time \(s\). We ask, at what time, \(T\), will \(y\) received at time \(T\) be equivalent to \(x\) received at time \(s\) (when both are discounted back to time 0)? Or, for what time, \(T\), will the following hold: \(v(x)D(0,s) = v(y)D(0,T)\)? For the exponential discount function (2.12) the answer is clear: \(T = s + t\). More generally, does such a \(T\) exist? Is it unique? Does it depend on \(x, y\) as well as \(s, t\)? What are its properties? These questions are answered by Propositions 9 and 10, below. But first, a definition.

Let \(D\) be a discount function and suppose that the function \(\Psi\) has the property \(D(0,s)D(0,t) = D(0,\Psi(s,t))\). Then we call \(\Psi\) a delay function corresponding to the discount function, \(D\). We also say that the discount function, \(D\), exhibits \(\Psi\)-delay.
Proposition 9 (Properties of a delay function): Let \(D\) be a discount function and \(\Psi\) a corresponding delay function. Then \(\Psi\) has the following properties: (a) \(\Psi\) is unique,

Suppose that \(x\) received at time 0 is equivalent to \(y\) received at time \(t\) (when both are discounted back to time 0), so that \(v(x) = v(y)D(0,t)\). Suppose that the receipt of \(x\) is delayed to time \(s\). Then, according to Proposition 9(e), the delay function, \(\Psi(s,t)\), if it exists, gives the time to which the receipt of \(y\) has to be deferred, so as to retain equivalence to \(x\) (when both are discounted back to time 0). This is why we call \(\Psi\) a delay function.
Proposition 10 (Existence of a delay function): A continuous discount function has a unique delay function.
We now introduce our fourth defined function (the others were: the discount function, the generating function and the delay function).
A 'continuous seed function' is continuous. The proof is similar to that of Proposition 1 and, therefore, will be omitted.
The following definition gives a useful representation for discount functions.
Proposition 11, below, establishes the existence of \((\alpha, \beta)\)-representations for continuous discount functions and shows their connection to delay functions.
From Proposition 11, we see that if \(\Psi\) is to be the delay function of some continuous discount function, then it must take the form given in part (c) of that proposition. In the light of this, when considering possible delay functions, we can restrict ourselves, without loss of generality, to the class of functions of the form \(\Psi(s,t) = \psi^{-1}[\psi(s) + \psi(t) + \alpha\psi(s)\psi(t)]\), where \(\psi\) is as in part (a), i.e., a continuous seed function.
The following proposition is a generalization of LP's derivation of their generalized hyperbolic discount function.

According to Propositions 9(a) and 10, a continuous discount function determines a unique delay function. Hence, we can partition the set of all continuous discount functions into equivalence classes, two continuous discount functions being in the same equivalence class if, and only if, they have the same delay function. Many different discount functions could have the same delay function. In fact, according to Representation Theorem 1 (Proposition 11), all (the different) continuous discount functions, \(D\), for which \(D(0,t) = [1 + \alpha\psi(t)]^{-\beta/\alpha}\) (fixed \(\alpha\) and \(\psi\), different \(\beta\)'s) have the same delay function and, hence, lie in the same equivalence class. But does an equivalence class contain other continuous discount functions? Representation Theorem 2 (Proposition 12) gives the answer 'no': Consider an arbitrary equivalence class. Choose some member of that class. Let it have the \((\alpha, \beta)\)-representation \(D(0,t) = [1 + \alpha\psi(t)]^{-\beta/\alpha}\). Then all members of its class can be obtained by varying \(\beta\), keeping \(\alpha\) and \(\psi\) fixed.

Extension theorems
Proposition 8(a) and (b) established that the RS-discount function (2.15) is not additive and, hence, cannot be obtained as an additive extension of a strictly decreasing function \(\varphi : [0, \infty) \to (0, 1]\), \(\varphi(0) = 1\). We, therefore, need a more general way to extend such a strictly decreasing function to a discount function. This is what we turn to in this subsection. We start by introducing our fifth, and final, defined function.
Then we call \(f\) an extension function. If, in (i), 'into' is replaced with 'onto', then we call \(f\) a continuous extension function.
A 'continuous extension function', \(f(r,t)\), is continuous in \(t\). The proof is the same as that of Proposition 1 and, therefore, will be omitted.
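Let \(D\) be a discount function. If \(D(r,t) = D(0, f(r,t))\) for all \(0 \le r \le t\), then (a) we call \(f\) an extension function corresponding to \(D\), (b) we refer to \(D(r,t)\) as an \(f\)-extension of \(D(0,t)\), or just an \(f\)-extension.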
Definition 10 defines extension functions independently of any discount function. By contrast, Definition 11 defines an extension function corresponding to a given discount function. Our terminology suggests that 'an extension function corresponding to a given discount function' is, in fact, 'an extension function'. That this is indeed the case is established in the following proposition.
Proposition 13 (Extension Theorem 1): Let \(D\) be a discount function. Let \(f\) be a corresponding extension function in the sense of Definition 11(a). Then:

Suppose that \(x\) received at time \(r\) is equivalent to \(y\) received at time \(t\), \(0 \le r \le t\), time \(r\) being the reference time; so that \(v(x) = v(y)D(r,t)\). Suppose that the receipt of \(x\) is brought forward to time 0. We ask, at what time, \(T\), will \(y\) received at time \(T\) be equivalent to \(x\) received at time 0 (when both are discounted back to time 0)? Since \(D(r,t) = D(0, f(r,t))\), the answer is \(T = f(r,t)\).

To summarize, given a generating function, \(\varphi\), and an extension function, \(f\), by Extension Theorem 3 (Proposition 15), we can construct a discount function \(D\) so that \(D(r,t)\) is the \(f\)-extension of \(\varphi(t) = D(0,t)\). Extension Theorem 2 (Proposition 14) tells us that all continuous discount functions are obtainable in this way from continuous generating functions and continuous extension functions.
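The construction summarized above, \(D(r,t) = \varphi(f(r,t))\), can be sketched in a few lines of Python. The extension function used for the LP case, \(f(r,t) = (t-r)/(1+\alpha r)\), is derived here from the LP form \(D(r,t) = (1+\alpha r)^{\beta/\alpha}(1+\alpha t)^{-\beta/\alpha}\) rather than quoted from the paper's tables, and the parameter values and helper names are illustrative assumptions.

```python
import math

def f_extension(phi, f):
    """Return D(r, t) = phi(f(r, t)), the f-extension of the generating function phi."""
    return lambda r, t: phi(f(r, t))

ALPHA, BETA = 0.5, 0.8  # illustrative parameter values (assumed)

# Exponential case: phi(t) = exp(-beta t) with the additive extension f(r, t) = t - r.
D_exp = f_extension(lambda t: math.exp(-BETA * t), lambda r, t: t - r)

# LP case: phi(t) = (1 + alpha t)^(-beta/alpha) with f(r, t) = (t - r) / (1 + alpha r).
phi_lp = lambda t: (1.0 + ALPHA * t) ** (-BETA / ALPHA)
D_lp = f_extension(phi_lp, lambda r, t: (t - r) / (1.0 + ALPHA * r))

def lp_direct(r, t):
    # LP-discount function written out directly, for comparison.
    return (1.0 + ALPHA * r) ** (BETA / ALPHA) * (1.0 + ALPHA * t) ** (-BETA / ALPHA)

r, t = 2.0, 5.0
print(D_exp(r, t), math.exp(-BETA * (t - r)))  # the two numbers agree
print(D_lp(r, t), lp_direct(r, t))             # the two numbers agree
print(D_lp(0.0, t), phi_lp(t))                 # f(0, t) = t, so D(0, t) = phi(t)
```

The same helper would accept any other pair of a generating function and an extension function, which is the point of the extension theorems.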

Characterization theorems
We can combine the representation and extension theorems of the previous two subsections to produce further useful results, which we now turn to.
Proposition 19: The following two tables give a seed function, \(\psi\), the generating function, \(\varphi\), the extension function, \(f\), and the delay function, \(\Psi\), of each of the discount functions \(D(r,t)\) (2.12) to (2.15).
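Starting with a continuous seed function, \(\psi\), an \(\alpha > 0\) and a \(\beta > 0\), we can 'grow' from them a unique generating function, \(\varphi(t) = [1 + \alpha\psi(t)]^{-\beta/\alpha}\) (which turns out to be continuous). Given this generating function and a continuous extension function, \(f(r,t)\), we obtain a unique discount function \(D(r,t) = \varphi(f(r,t)) = [1 + \alpha\psi(f(r,t))]^{-\beta/\alpha}\) (which also turns out to be continuous). This discount function determines a unique delay function, \(\Psi(s,t) = \psi^{-1}[\psi(s) + \psi(t) + \alpha\psi(s)\psi(t)]\). Conversely, a continuous discount function, \(D\), determines a unique (continuous) generating function, \(\varphi(t) = D(0,t)\), and a unique (continuous) extension function, \(f\), so that \(D\) is the \(f\)-extension of \(\varphi\): \(D(r,t) = \varphi(f(r,t))\).
Although a continuous discount function, \(D\), determines unique generating, extension and delay functions, \(\varphi\), \(f\) and \(\Psi\), it does not determine unique \(\alpha\), \(\beta\) or \(\psi\) in the representation \(D(0,t) = [1 + \alpha\psi(t)]^{-\beta/\alpha}\). For example, the LP-discount function \(D(r,t) = (1+\alpha r)^{\beta/\alpha}(1+\alpha t)^{-\beta/\alpha}\), \(\alpha > 0\), \(\beta > 0\), has, obviously, the representation \(D(0,t) = (1+\alpha t)^{-\beta/\alpha}\) (with \(\psi(t) = t\)) and, hence, the delay function \(\Psi(s,t) = s + t + \alpha st\). But it also has many other representations: \(D(0,t) = [1 + a\psi(t)]^{-b/a}\) with \(\psi(t) = \frac{1}{a}\left[(1+\alpha t)^{a\beta/(\alpha b)} - 1\right]\), for all \(a > 0\) and all \(b > 0\). However, it can easily be checked that, in every case, \(\psi^{-1}[\psi(s) + \psi(t) + a\psi(s)\psi(t)] = s + t + \alpha st\). Since the delay function, \(\Psi\), but not the seed function, \(\psi\), is uniquely determined by \(D\), it is better to say that \(D\) exhibits \(\Psi\)-delay rather than \(\psi\)-delay.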
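The defining property of a delay function, \(D(0,s)D(0,t) = D(0,\Psi(s,t))\), can be checked numerically. The sketch below solves for the deferral time \(T\) by bisection and compares it with the closed form \(s + t + \alpha st\) for the LP-discount function and with \(s + t\) for exponential discounting. The function names and parameter values are illustrative assumptions.

```python
import math

def delay_numeric(D0, s, t):
    """Solve D0(T) = D0(s) * D0(t) for T by bisection.

    D0 is t -> D(0, t), assumed continuous, strictly decreasing, with D0(0) = 1.
    """
    target = D0(s) * D0(t)
    lo, hi = 0.0, 1.0
    while D0(hi) > target:      # expand the bracket until it contains T
        hi *= 2.0
    for _ in range(200):        # bisection
        mid = 0.5 * (lo + hi)
        if D0(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, s, t = 0.5, 2.0, 3.0     # illustrative values (assumed)

def lp0(beta):
    # LP-discount function from time 0: D(0, t) = (1 + alpha t)^(-beta/alpha).
    return lambda u: (1.0 + alpha * u) ** (-beta / alpha)

closed_form = s + t + alpha * s * t
T_lp_a = delay_numeric(lp0(0.8), s, t)
T_lp_b = delay_numeric(lp0(2.0), s, t)   # a different beta gives the same delay
T_exp = delay_numeric(lambda u: math.exp(-0.3 * u), s, t)

print(closed_form, T_lp_a, T_lp_b, T_exp)
```

The two LP runs differ only in \(\beta\) and return the same \(T\), in line with the statement that functions sharing \(\alpha\) and \(\psi\) lie in the same equivalence class.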

Assumptions and consequences
LP introduce four assumptions, all with good experimental support (LP, II pp. 574-578). We adapt these assumptions to allow for (i) discount functions that are not, necessarily, additive and (ii) discount functions that may be different for gains and losses. Under these latter two conditions, the reference point for time becomes important (Proposition 7). Let the discount functions for gains, \(D_+\), and for losses, \(D_-\), be given by:

\[D_\pm(r,t) = [1 + \alpha_\pm \psi_\pm(f_\pm(r,t))]^{-\beta_\pm/\alpha_\pm}, \quad (2.45)\]

where \(\psi_+\) and \(f_+\) are, respectively, the seed and extension functions for gains. Analogously, \(\psi_-\) and \(f_-\) are, respectively, the seed and extension functions for losses. If \(D_+\) is a continuous discount function then it can always be represented in the form (2.45), where \(\psi_+\) is a continuous seed function and \(f_+\) is a continuous extension function. Moreover, \(f_+\) is determined uniquely by \(D_+\) (Characterization Theorem 1 (Proposition 16)). Analogous statements hold for the discount function for losses, \(D_-\). Furthermore, under the assumption of continuity, \(D_+\) and \(D_-\) will have unique delay functions, \(\Psi_+\) and \(\Psi_-\), respectively (Propositions 9 and 10 and Representation Theorem 1 (Proposition 11)), and are given by:

\[\Psi_\pm(s,t) = \psi_\pm^{-1}[\psi_\pm(s) + \psi_\pm(t) + \alpha_\pm \psi_\pm(s)\psi_\pm(t)].\]

Thus, what is regarded as anomalous behavior from the neoclassical point of view is at the core of the RT theory. Given discount functions, \(D_\pm(r,t)\), for gains (+) and losses (−) respectively, the assumptions A1 to A4, below, place restrictions only on \(D_\pm(0,t)\), i.e., only on discounting from an arbitrary time, \(t \ge 0\), back to time zero. Hence, to derive results for \(D_\pm(r,t)\), further assumptions are needed. In particular, for Proposition 24 we assume that \(D_\pm(r,t)\) is an additive extension of \(D_\pm(0,t)\), while for Proposition 25 we assume that \(D_\pm(r,t)\) is ...

A function, \(\psi\), is subadditive (in the standard sense) if, for all \(s\) and \(t\) for which \(\psi\) is defined: \(\psi(s+t) \le \psi(s) + \psi(t)\). A function that is \(\alpha\)-subadditive, for some \(\alpha > 0\), need not be subadditive. However, a function is subadditive if, and only if, it is \(\alpha\)-subadditive for all \(\alpha > 0\).

(Footnote 17: A4 is to be understood as follows. In the LHS of the inequality, the reference stream of the decision maker is \(0, 0, c\) (for dates \(0, s, s+t\)), i.e., a reward is contractually promised at time \(s+t > 0\). The individual is then offered a choice to receive the reward early, at time \(s\) (speedup). Given the assumption on reference time, in A0, the income stream, relative to reference wealth, \(0, c, -c\), can be explained as follows. The individual was not expecting anything at times 0 and \(s\) so, relative to reference wealth, he gets \(0 - 0\) and \(c - 0\) at times 0 and \(s\). Having received a reward of \(c\) at time \(s\), his reference wealth is \(c\). Hence, at time \(s+t\) his wealth relative to the reference wealth is \(0 - c = -c\). For the RHS of the inequality, the contractually promised income stream is \(0, c, 0\) (for dates \(0, s, s+t\)). The individual is then told that the reward will now instead be available only at time \(s+t\) (delay). Proceeding as before, the stream of income relative to the reference point is now \(0, -c, c\).)

(Footnote 18: Similarly, a function, \(\psi\), is additive (in the standard sense) if, for all \(s\) and \(t\) for which \(\psi\) is defined: \(\psi(s+t) = \psi(s) + \psi(t)\). Consider the exponential discount function, \(D(r,t) = e^{-\beta(t-r)}\), \(\beta > 0\). Then \(\ln D(0,t)\) is additive in this sense. And, of course, \(D(r,t)\) is additive in the sense of Definition 3. Also note that \(\alpha\)-subadditivity, as in Definition 12, neither implies nor is implied by subadditivity of the discount function, as in Definition 3.)

Proposition 21 (Characterization Theorem 4): Let \(D_\pm\) be a continuous discount function. Then preferences exhibit the common difference effect for gains if, and only if, the seed function for gains, \(\psi_+\), is \(\alpha_+\)-subadditive. 19 Similarly, preferences exhibit the common difference effect for losses if, and only if, the seed function for losses, \(\psi_-\), is \(\alpha_-\)-subadditive.
In particular, if \(\gamma_+ = \gamma_- = 1\), we say that preferences exhibit linear delay.
A delay function, if it exists, is unique (Proposition 9) and it always exists for a continuous discount function (Proposition 10). Hence, Definition 13 is a sound definition. However, it should be remembered that \(\gamma\)-delay is a property of the delay function, \(\Psi\), not of the seed function, \(\psi\) (see discussion at end of subsection 2.12).
Proposition 22: If preferences exhibit \(\gamma\)-delay, then they also exhibit the common difference effect.

(a) Losses are discounted less heavily than gains in the following sense: The value function is more elastic for losses than for gains:

Proposition 27 (LP): For a continuous discount function, A2 implies that the value function is (a) subproportional: \((0 < x < y \text{ or } y < x < 0) \Rightarrow v(x)/v(y) > v(ax)/v(ay)\), for \(a > 1\), (b) more elastic for outcomes of larger absolute magnitude: \((0 < x < y \text{ or } y < x < 0) \Rightarrow \epsilon_v(x) < \epsilon_v(y)\).

Intuitively, increasing elasticity of the value function implies greater sensitivity of \(v\) to increases in \(x\). This in turn increases the weight of larger outcomes (\(D(r,t)v(x_t)\)) in intertemporal plans. A similar intuition applies to the result in Proposition 26(b).
We now add two standard assumptions from prospect theory. The first is that the value function is strictly concave for gains and strictly convex for losses (Kahneman and Tversky, 1979):

A5 Declining sensitivity. For \(x > 0\), \(v''(x) < 0\) (strict concavity for gains). For \(x < 0\), \(v''(x) > 0\) (strict convexity for losses).

Combining A5 with Proposition 27 we get:

Proposition 28: A2 and A5 imply that \(0 < \epsilon_v < 1\).

(Footnote 20: Strictly speaking, what we call the RS-discount function is a slight generalization of the discount function in Scholten and Read (2006a). They do not have a reference point for time/wealth, nor do they have separate discounting for gains and losses.)

(Footnote 21: This proposition is stated incorrectly in al-Nowaihi and Dhami (2006a).)

The second assumption that we add from prospect theory is constant loss aversion. While this is not core to prospect theory, it is very useful and has good empirical support (Tversky and Kahneman, 1991, 1992):

A6 Constant loss aversion. \(v(-x) = -\lambda v(x)\), for \(x > 0\), where \(\lambda > 1\).

With the aid of these two extra assumptions, we get the following two theorems:

Proposition 29: From A1 and A6 it follows that, for a continuous discount function: \(D_+(0,t) < D_-(0,t)\) for \(t > 0\).

Proposition 30: Assumption A4 (delay-speedup asymmetry) follows from the other assumptions, in particular A0 (reference time), A1 (gain-loss asymmetry) and A6 (constant loss aversion).

As mentioned above, LP assume that \(D_+ = D_-\). While this is consistent with their theory, it does not follow from it. Assuming that \(D_+ = D_-\) is obviously attractive. However, it implies (Proposition 26) that gain-loss asymmetry (A1) can only be satisfied if the value function is more elastic for losses than for gains. In the light of Proposition 29, this would exclude value functions exhibiting constant loss aversion. While constant loss aversion is not core to prospect theory, this auxiliary assumption considerably simplifies application of the theory and is consistent with the evidence (Tversky and Kahneman, 1991, 1992).

Propositions 26 and 27 are due to LP. Proposition 24 extends LP from linear delay to \(\gamma\)-delay.

Simple increasing elasticity value functions (SIE)
A natural question that arises is 'Is the RT theory developed in section 2 consistent?' A related question is 'Is there a tractable functional form for the value function which can be combined with RT theory to produce a useful model?' We address these questions in this section. In subsection 3.2, below, we answer the second question in the affirmative. We call the value function developed there a simple increasing elasticity (SIE) value function.
Our affirmative answer to the second question also provides an affirmative answer to the first question. But first, in subsection 3.1, immediately below, we show that none of several popular families of functions is compatible with RT theory or, indeed, any theory (e.g., LP) that attempts to explain the magnitude effect on the basis of increasing elasticity of the value function.

Incompatibility of HARA value functions with the reference-time theory
We consider several popular classes of value functions including CARA (constant absolute risk aversion), CRRA (constant relative risk aversion), HARA (hyperbolic absolute risk aversion), logarithmic and quadratic. Proposition 31, below, shows that each member of this family exhibits constant or declining elasticity, contradicting LP's Proposition 27, which holds in the RT theory. Hence, none of these families is compatible with the RT theory (and, hence, none is compatible with the LP theory either). First, we give the definitions and main properties of this family of functions, followed by the main result of this subsection: Proposition 31.
Notation: We use the notation, \(A\) and \(R\) respectively, for the coefficients of absolute risk aversion and relative risk aversion. So for a utility function \(v(x)\), \(A(x) = -v''(x)/v'(x)\) and \(R(x) = -x v''(x)/v'(x)\).

The general restriction is that \(\gamma \ne 1\). However, we need the stronger restriction, \(0 < \gamma < 1\), in order to satisfy Proposition 28. It is clear, from the last line of (3.1), that members of the CRRA class of functions violate Proposition 27 and, hence, are not compatible with RT theory.
Note that, traditionally, the HARA class is defined without the additive constant that normalizes \(v(0) = 0\), and that the elasticity, \(\epsilon_v(x)\), of that traditional form is increasing in \(x\), as required by Proposition 27.
While an additive constant, of course, makes no difference in expected utility theory, its absence here would violate the assumption \(v(0) = 0\). However, including the constant needed to make \(v(0) = 0\) results in \(\epsilon_v(x)\) decreasing with \(x\), as will be shown by Proposition 31, and, hence, violates Proposition 27.
The following three classes of functions are also regarded as members of the HARA family.

Constant absolute risk aversion functions (CARA)
From the last line of (3.3), we see that \(\epsilon_v(x)\) is decreasing with \(x\). Hence, the CARA class is not compatible with the RT theory.

Logarithmic functions
Proposition 31, below, establishes that \(\epsilon_v(x)\) is decreasing with \(x\). Hence this class is not compatible with the RT theory.

Quadratic functions
Proposition 31, below, establishes that \(\epsilon_v(x)\) is decreasing with \(x\). Hence this class is also not compatible with the RT theory.

A value function compatible with the reference-time theory
We do two things in this subsection. First, we provide a simple tractable functional form for the value function that is compatible with the RT theory. We call this a simple increasing elasticity (SIE) value function. Second, we provide a scheme for generating further such functions. This is important for two reasons. First, it provides a model for the RT theory and, therefore, establishes its internal consistency. Second, these functional forms may aid applications and further theoretical development.
The following method can be used to generate candidates for value functions compatible with the RT theory. Choose a function, \(h(x)\), satisfying \(0 < h(x) < 1\) and \(h'(x) > 0\) for \(x > 0\) (as required by Propositions 27 and 28), then solve the following differential equation for \(v(x)\):

\[\frac{x}{v}\frac{dv}{dx} = h(x). \quad (3.7)\]

This method only yields candidate value functions, which then have to be verified. For example, choose \(h\) as in (3.8), with parameters \(a\), \(b\) and \(c\). Substituting from (3.8) into (3.7), separating variables, then integrating, gives \(v\) in closed form. The restrictions \(a > 0\), \(b > 0\), \(c > 0\) and the bound on \(a + c\) carry over to restrictions on the parameters of \(v\), and a further positivity restriction ensures that \(v' > 0\). Putting all these together gives the candidate value function (3.11), in which \(\lambda\) is the (constant) coefficient of loss aversion. It may be interesting to note that (3.11) is a product of a CRRA function and a HARA function.
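The contrast between the rejected families and a value function with increasing elasticity can be seen numerically. The Python sketch below estimates the elasticity \(x v'(x)/v(x)\) on a grid of gains. The CRRA, CARA, logarithmic and quadratic forms are standard textbook versions normalized so that \(v(0) = 0\), with arbitrary parameter values; they are assumptions and not necessarily the paper's parametrizations. The function labelled 'SIE (assumed form)' is \(\sqrt{x(1+x)}\), chosen because it reproduces the values \(v(1) = 1.4142\) and \(v(1000) = 1000.5\) reported in the text.

```python
import math

def elasticity(v, x, h=1e-6):
    """Numerical elasticity x * v'(x) / v(x), using a central difference."""
    dv = (v(x + h) - v(x - h)) / (2.0 * h)
    return x * dv / v(x)

# Illustrative value functions for gains (assumed forms and parameters).
candidates = {
    "CRRA": lambda x: x ** 0.5,
    "CARA": lambda x: 1.0 - math.exp(-x),
    "logarithmic": lambda x: math.log(1.0 + x),
    "quadratic": lambda x: x - 0.025 * x * x,   # increasing on the grid below
    "SIE (assumed form)": lambda x: math.sqrt(x * (1.0 + x)),
}

grid = [0.5, 1.0, 2.0, 5.0, 10.0]
profiles = {name: [elasticity(v, x) for x in grid] for name, v in candidates.items()}

for name, eps in profiles.items():
    print(f"{name:20s}", " ".join(f"{e:.3f}" for e in eps))
```

On this grid the CRRA elasticity is constant, the CARA, logarithmic and quadratic elasticities fall with \(x\), and only the assumed SIE form has an elasticity that rises with \(x\) while staying between 0 and 1.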

Explaining the anomalies
Here, we put together the results of sections 2 and 3. In (4.1), (4.3) and (4.4), below, we reproduce the discount functions (2.12), (2.14) and (2.15) but with the parameter restrictions implied by the propositions of section 2. For completeness, we also reproduce the PPL-discount function (2.13), with the relevant parameter restrictions, as (4.2).
< + ensures that \(D_+(0,t) < D_-(0,t)\) for \(t > 0\), as required by Proposition 29. PPL (4.2): 0 < + , 0 < + ensures that \(D(s,t)\) is strictly increasing in \(s\) and strictly decreasing in \(t\). They also ensure that A1, gain-loss asymmetry, is satisfied. If either < + or < + , then A3, the common difference effect, is also satisfied. LP (4.3): \(\alpha > 0\) and \(\beta > 0\) ensure that \(D(r,t)\) is strictly increasing in \(r\) and strictly decreasing in \(t\). 1 is required to ensure that -delay is satisfied (with = ) and, hence, A3. 22 Generalized RS (4.5): The restrictions guarantee that (4.5) is a continuous discount function in the sense of Definition 2 and that \(D_+(0,t) < D_-(0,t)\) in compliance with Proposition 29.
For ease of reference, we reproduce (3.11) below.

A summing up
To sum up so far, exponential discounting (4.1) (even with different discount rates for gains and losses) cannot explain the common difference effect (but see subsections 5.2 and 5.3, below). On the other hand, the PPL-discount function (4.2), the LP-discount function (4.3) and the RS-discount function (4.4) can all explain the common difference effect. But they explain it in different ways. The PPL (4.2) and LP (4.3) discount functions explain the common difference effect with declining impatience. For PPL, there is a sudden drop in impatience from time \(t = 0\) to times \(t > 0\), with impatience being constant for all times \(t > 0\). For LP, on the other hand, the decline in impatience is continuous (recall Example 3). By contrast, the RS-discount function (4.4), on account of its subadditivity (for \(0 < \rho \le 1\)), can explain the common difference effect whether we have declining impatience (\(0 < \tau < 1\)), constant impatience (\(\tau = 1\)) or increasing impatience (\(\tau > 1\)), provided \(\rho\tau \le 1\) (recall Proposition 8 and Example 3). On the other hand, none of the discount functions (4.1), (4.2) or (4.3) can explain (apparent) intransitive preferences such as that exhibited by (2.24); recall Proposition 7 and subsection 2.8.

(Footnote 22: To see why we need \(\tau_+ = \tau_-\), suppose \(0 < \tau_+ < \tau_- < 1\). If \(t > 1\), then \(t^{\tau_+} < t^{\tau_-}\). But if \(0 < t < 1\), then \(t^{\tau_+} > t^{\tau_-}\). For the same reason, we need \(\rho_+ = \rho_-\).)
Thus, it emerges that of discount functions (4.1), (4.2), (4.3) and (4.4), the RS-discount function (4.4) is the most satisfactory because, when combined with the SIE value function, reference time/wealth, and different discount functions for gains and losses, it can explain all the anomalies: gain-loss asymmetry, the magnitude effect, the common difference effect, delay-speedup asymmetry as well as subadditivity and (apparent) intransitivity.
Finally, we give an example that illustrates the difficulty inherent in taking the discount function for gains to be the same as that for losses.
The LP (4.3) and RS (4.4) discount functions take the forms:  Note that for both the LP and RS-discount functions, we have:

Alternatives and extensions
In this section, we compare the reference-time theory (RT) of section 2 with four recent developments. First, we consider the tradeoff model of intertemporal choice of Scholten and Read (2006b), SR for short. We will argue that SR's tradeoff criterion can be represented by a discount function. Hence, it can be incorporated within RT. The gain is that their psychological arguments for their tradeoff model give support for RT theory and, in particular, their own RS-discount function.
The second development we consider is Halevy (2007), H for short, who shows that the common difference effect is compatible with exponential discounting, provided subjects are non-expected utility maximizers and exhibit the certainty effect. The certainty effect was first proposed as an explanation of the Allais paradox: subjects are much more sensitive to a change from certainty to uncertainty than they are to changes in the middle range of probabilities.
The third is the theory of vague time preferences of Manzini and Mariotti (2006), MM for short. Again, they can explain the common difference effect without departing from exponential discounting. However, we believe that the importance of H and MM far transcends their ability to explain the common difference effect. On the other hand, and because in their present formulations they do not include any reference dependence, they are unable to explain gain-loss asymmetry, delay-speedup asymmetry, subadditivity and (apparent) intransitivity. By contrast, RT theory can explain all the anomalies. Nevertheless, we believe that it is desirable, and easy, to extend RT theory to incorporate uncertainty, as in H, and multiple criteria, as in MM. We show this, below, in the context of simple examples.
The fourth recent development we discuss here is the theory of intransitive preferences and relative discounting of Ok and Masatlioglu (2007), OM for short. This is the most radical of all the theories considered so far. From the outset it neither assumes transitivity nor additivity and, hence, is compatible with these two phenomena. In its present formulation, it cannot account for either gain-loss asymmetry or delay-speedup asymmetry. Furthermore, the lack of transitivity will make it hard to work with this theory, as the authors themselves explain. On the other hand, these problems can all be resolved in the special case of a transitive preference relation. But then their model becomes additive. In this case, OM would reduce to a standard discounting model.
Finally, all five theories (SR, H, MM, OM and RT) can explain the magnitude effect, when combined with the SIE value function developed in this paper.

The tradeoff model of intertemporal choice
Read and Scholten's critique of discounting models, including their own, led them to develop their tradeoff model of intertemporal choice (Read and Scholten, 2006). It is worth quoting their abstract in full: "Research on intertemporal judgement and choices between a smaller-sooner and a larger-later outcome has revealed many anomalies to the discounted-utility model. Attempts to account for these anomalies within the discounting paradigm have resulted in convoluted and psychologically opaque models. We therefore develop a new model of intertemporal choice, the tradeoff model, in which choice results from a tradeoff between the perceived time difference (interval) and the perceived outcome difference (compensation). This model is both more parsimonious and more intuitive than any rival discounting model of comparable scope. Moreover, it accurately describes archival data as well as data from new experiments." We argue that the tradeoff model of Scholten and Read (2006b) can be incorporated within RT-theory. If this is accepted, then their tradeoff model lends further support to the RT-theory and, in particular, their own discount function (2.15) and its generalization (5.13), below.
We proceed by first recasting their model in a more general form (and indicate how their model is to be obtained as a special case). However, there should be no presumption that they would agree with our reformulation. They develop their model through three successive versions. We concentrate on their fourth and final version, page 15.
Let \(r \ge 0\) be the reference point for time. 23 The tradeoff model establishes preference relationships, \(\prec_r\) and \(\sim_r\), between outcome pairs \((x,s)\) and \((y,t)\). Thus \((x,s) \prec_r (y,t)\) if, and only if, \(y\) received at time \(t\) is strictly preferred to \(x\) received at time \(s\). Similarly, \((x,s) \sim_r (y,t)\) if, and only if, \(y\) received at time \(t\) is equivalent to \(x\) received at time \(s\). These relationships are established using three functions: a value function, \(u\), a tradeoff function, \(Q\), and a delay-perception function, \(w\). We make the following assumptions: Q : is strictly increasing (the same as in (5.13), below). 24 First, let \(x > 0\), \(y > 0\) and \(s \ge r \ge 0\), \(t \ge r\). Then: Second, let \(x < 0\), \(y < 0\) and (as before) \(s \ge r \ge 0\), \(t \ge r\). Then: For completeness, we also need (again, \(s \ge r \ge 0\), \(t \ge r\)):

\[y > 0 \Rightarrow (0,s) \prec_r (y,t), \quad (5.7)\]
\[x < 0,\ y > 0 \Rightarrow (x,s) \prec_r (y,t). \quad (5.8)\]

To get the tradeoff model of Read and Scholten, set \(r = s\) in the above equations. 25 To define a discount function, \(D\), that expresses these preferences, let v (x) = e u(x) , for x > 0, (5.9) v (x) = e u(x) , for x < 0.
(5.12)

which is a generalization of the discount function (2.15) of Scholten and Read (2006a). Thus, RT-theory can incorporate the tradeoff model.

(Footnote 23: To ease the burden of notation, we shall suppress reference to the reference point for wealth, \(w_0\). Thus, in what follows, we write \(\prec_r\) and \(\sim_r\) when we should have written \(\prec_{r,w_0}\) and \(\sim_{r,w_0}\), respectively.)

(Footnote 24: They explicitly state two assumptions: \(Q' > 0\), \(Q'' < 0\). However, in the next paragraph, they say that \(Q'' > 0\) for sufficiently small intervals. So, we make no assumptions on \(Q''\). They explicitly state no further assumptions on \(Q\) and \(w\). However, we believe our other assumptions on \(Q\) and \(w\) are in line with what they intend (see their equations (2) and (5) for the earlier, and simpler, versions of their model).)

(Footnote 25: They explicitly state only (5.1) and (5.3) (with \(r = s\)). However, we believe that our other equations are in line with their framework.)

The certainty e¤ect
A test of a theory (T) is always a test of T plus auxiliary assumptions (O). Thus, a refutation of T&O may be a refutation of O rather than T. However, since O is often left implicit, a refutation of T&O may be misconstrued as a refutation of T rather than O. A case in point may be T = 'exponential discounting' and O = 'uncertainty is not relevant'. In testing the common di¤erence e¤ect, not only is it better if subjects are paid 'real money', the delays should be realistic too, i.e., quite long. Despite the experimenters' best e¤orts to eliminate uncertainty, there will always be a residual risk that the subjects will not receive their promised payo¤s. If subjects were expected utility (EU) maximizers, then risk would not matter (Example 5, below). However, if subjects overweight low probabilities and underweight high probabilities (as in many non-EU theories), then risk matters (Example 6, below). Moreover, the lower the residual risk the greater will be its e¤ect! (Example 7, below.) 26 Thus, Halevy (2007) argues that the common di¤erence e¤ect may, in fact, be a refutation of EU rather than exponential discounting. The above points are illustrated by the following three examples. They all involve a choice between receiving $1000 now or $1100 next year and, simultaneously, a choice between receiving these two sums 10 and 11 years from now, respectively. We use the SIE value function (4.6), so that v (1000) = 1000:5 and v (1100) = 1100:5. (5.14) Let the discount function be D (s; t) and the probability weighting function be w (p), where p is the probability that the payo¤ will actually be paid one year from now. We assume independence across years so that the probability of receiving the payo¤ t years from now is p t . Let (x; t) be the event $x is received in year t and let (x; s) (y; t) mean (y; t) is strictly preferred to (x; s). 
We take the current level of wealth, w 0 , and present time, r = 0, to be the reference points for wealth and time, respectively (and, to simplify notation, we have dropped the subscripts, w 0 ; r, from w 0 ;r ). We thus have: No common di¤erence e¤ect: (1100; 1) (1000; 0) ) (1100; 11) (1000; 10) (5.15)  From (5.18) and (5.20) we see that (1100; 1) (1000; 0) , (1100; 11) (1000; 10). Thus, exponential discounting together with expected utility 27 imply no common di¤erence e¤ect. Hence, the observation of a common di¤erence e¤ect is a rejection of the joint hypothesis of exponential discounting and expected utility. Thus, it would imply the rejection of one or the other (or both) but not, necessarily, exponential discounting.
Example 7 suggests that if the common difference effect is due to the certainty effect alone, rather than a combination of the certainty effect and non-exponential discounting, then the phenomenon should disappear for probabilities around 0.4.
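The joint-hypothesis point can be checked numerically. The following Python sketch compares the two choices under exponential discounting, once with linear (expected utility) probability weights and once with a non-linear weighting function. The Prelec one-parameter form w(q) = exp(-(-ln q)^alpha), the yearly payment probability 0.99, the discount rate 0.05 and alpha = 0.5 are all illustrative assumptions of ours, not values taken from Examples 5 to 7; only the two value-function numbers come from (5.14).

```python
import math

# Value-function numbers from (5.14).
V1000, V1100 = 1000.5, 1100.5


def exp_discount(t, beta):
    """Exponential discount function with reference time 0."""
    return math.exp(-beta * t)


def w_eu(q):
    """Linear probability weights, i.e. expected utility."""
    return q


def w_prelec(q, alpha=0.5):
    """Prelec one-parameter weighting function (an assumed functional form)."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** alpha))


def prefers_later(t_early, w, p, beta):
    """True if $1100 at t_early + 1 is preferred to $1000 at t_early.

    p is the assumed probability that a payment due in one year is made;
    payments due in t years are made with probability p ** t.
    """
    early = w(p ** t_early) * exp_discount(t_early, beta) * V1000
    later = w(p ** (t_early + 1)) * exp_discount(t_early + 1, beta) * V1100
    return later > early


P, BETA = 0.99, 0.05  # illustrative assumptions

for name, w in (("expected utility", w_eu), ("Prelec weights", w_prelec)):
    now = prefers_later(0, w, P, BETA)
    far = prefers_later(10, w, P, BETA)
    print(name, "| later option chosen at horizon 0:", now, "| at horizon 10:", far)
```

With these assumed parameters, linear weights give the same choice at both horizons, whereas the non-linear weights select $1000 now but $1100 at the 11-year horizon. This is the common difference pattern, obtained even though discounting is exponential.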

Vague time preferences
Manzini and Mariotti (2006), henceforth MM, develop a theory of vague time preferences and discuss the psychological foundations for such an approach. The intuition behind this theory is that the choice between, say, receiving $1000 now and $1100 next year is clearer than the choice between these two sums received 10 and 11 years from now, respectively. MM propose three criteria to choose between \((x,t)\) and \((y,s)\). The primary criterion is to choose whichever has the highest present utility value. If the two present values are not 'significantly' different, then the subject chooses the one with the highest monetary value (secondary criterion). If they have the same monetary values, so that the secondary criterion fails, then the subject behaves according to the third criterion: 'choose the outcome that is delivered sooner'. If all three criteria fail, then the subject is indifferent. Thus, MM achieve a complete, though intransitive, ordering. In particular, indifference here is not an equivalence relationship. Suppose that two present values are significantly different if their difference is greater than \(\sigma\), where \(\sigma\) is a positive real number. Then we can state these criteria formally as follows. \((x,t) \prec_{w_0,r} (y,s)\) if, and only if, one of the following holds 29:
1. \(v(y)D(r,s) - v(x)D(r,t) > \sigma\), or
2. \(|v(y)D(r,s) - v(x)D(r,t)| \le \sigma\) and \(x < y\), or
3. \(|v(y)D(r,s) - v(x)D(r,t)| \le \sigma\), \(x = y\) and \(s < t\).

Present utility values whose difference is less than \(\sigma\) are regarded as not significantly different. This could be because, for example, the decision maker is not sure of the appropriate value function or discount function to use. Therefore, the decision maker does not want the decision to depend too critically on the choice of these functions. On the other hand, the decision maker might be absolutely sure that more is better than less and sooner is better than later. Example 8, below, shows how this theory can explain the common difference effect.

29 More generally, \(\sigma\) is a 'vagueness function', in which case '\(|v(y)D(r,s) - v(x)D(r,t)| \le \sigma\)' is replaced by '\(v(y)D(r,s) - v(x)D(r,t) \le \sigma(x,r,t)\) and \(v(x)D(r,t) - v(y)D(r,s) \le \sigma(y,r,s)\)'. Obviously, if \(x\) and \(y\) are vectors, then extra criteria can be added.
Example 8: Consider the choice between receiving $1000 now and $1100 next year and the choice between these two sums received 10 and 11 years from now, respectively. As with the examples in subsection 5.2, we use the SIE value function (4.6), so that (5.14) holds. We use the exponential discount function (2.12) with \(\beta = 0.1\) and the reference time \(r = 0\), \(D(0,t) = e^{-0.1t}\). We take \(\sigma = 3\), so that present utility values whose difference is less than 3 are regarded as not significantly different. Using these values, we get \(v(1000) - v(1100)e^{-0.1} = 1000.5 - 1100.5e^{-0.1} = 4.7264 > 3\). Hence, the primary criterion holds and the decision maker prefers $1000 now to $1100 next year. Next, \(|v(1100)e^{-1.1} - v(1000)e^{-1}| = |1100.5e^{-1.1} - 1000.5e^{-1}| = 1.7388 < 3\). Hence, the primary criterion fails, and the decision maker considers the second criterion. Since \(1000 < 1100\), the second criterion holds. The decision maker prefers $1100 received 11 years from now to $1000 received 10 years from now. We have an illustration of the common difference effect.
Recall, from subsection 2.8 above, that the experimental results of Roelofsma and Read (2000) supported 'sooner is better than larger' against 'larger is better than sooner'. However, if the order of the secondary criteria is reversed, so that sooner is better than larger (in agreement with the experimental results of Roelofsma and Read, 2000), then $1000 received 10 years from now would be better than $1100 received 11 years from now, and we would not get a common difference effect.
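The three criteria, the calculations of Example 8 and the effect of reversing the secondary criteria can be sketched in Python as follows. The function and parameter names are hypothetical; the value-function numbers are those of (5.14), the threshold 3 and discount rate 0.1 are those of Example 8, and the `order` argument is our own device for switching between 'larger first' and 'sooner first'.

```python
import math

# Value-function numbers from (5.14); no other amounts are needed here.
VALUE = {1000: 1000.5, 1100: 1100.5}


def discount(t, rate=0.1):
    """Exponential discount function with reference time 0."""
    return math.exp(-rate * t)


def present_value(x, t):
    return VALUE[x] * discount(t)


def mm_choice(x, t, y, s, threshold=3.0, order=("larger", "sooner")):
    """Choose between (x, t) and (y, s); return None for indifference.

    Primary criterion: a present-value gap above the threshold decides.
    Otherwise the secondary criteria are tried in the given order.
    """
    pv_x, pv_y = present_value(x, t), present_value(y, s)
    if abs(pv_x - pv_y) > threshold:
        return (x, t) if pv_x > pv_y else (y, s)
    for rule in order:
        if rule == "larger" and x != y:
            return (x, t) if x > y else (y, s)
        if rule == "sooner" and t != s:
            return (x, t) if t < s else (y, s)
    return None


gap_near = present_value(1000, 0) - present_value(1100, 1)
gap_far = abs(present_value(1100, 11) - present_value(1000, 10))
print(round(gap_near, 4), round(gap_far, 4))

print(mm_choice(1000, 0, 1100, 1))     # near pair, primary criterion decides
print(mm_choice(1000, 10, 1100, 11))   # far pair, 'larger' decides
print(mm_choice(1000, 10, 1100, 11, order=("sooner", "larger")))
```

Under the 'larger first' order the sketch selects $1000 now in the near pair and $1100 at year 11 in the far pair, which is the common difference effect. Under the 'sooner first' order it selects $1000 at year 10, so the effect does not arise.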
However, whether MM's explanation of the common difference effect is acceptable or not, to us the main contribution of their paper lies in the use of primary and secondary criteria. This appears to us to be a more accurate description of actual decision making compared to the assumption of a single criterion.

Intransitive preferences and relative discounting
Ok and Masatlioglu (2007) (henceforth OM) accommodate (apparent) intransitivity, such as (2.24), by regarding it as real. Thus, they develop a theory of intransitive time preferences. At time 0, a decision maker has a binary relationship, \(\preceq\), on the set \(X \times [0,\infty)\), where \(X\) is a non-empty set. Let \(x, y \in X\) and \(s, t \in [0,\infty)\), then \((x,s) \preceq (y,t)\) is to be interpreted as '\(y\) received at time \(t\) is (weakly) preferred to \(x\) received at time \(s\)'.
Let \(\sim\) and \(\prec\) be the symmetric and asymmetric parts of \(\preceq\), respectively. For each \(t \in [0,\infty)\), \(\preceq_t\) is the \(t\)-th time projection of \(\preceq\) onto \(X\), i.e., \(x \preceq_t y\) if, and only if, \((x,t) \preceq (y,t)\). In particular, \(\preceq_0\) is the projection of \(\preceq\) onto \(X\) at time 0 (and, similarly, for \(\sim_t\) and \(\prec_t\)).
If \(X\) is a metric space, then further structure can be imposed on \(\preceq\). In particular (OM, p218): Definition 14 (time preferences): Let \(X\) be a metric space, then \(\preceq\) is a time preference on \(X \times [0,\infty)\) if (i) \(\preceq\) is complete and continuous, (ii) \(\preceq_0\) is complete and transitive, (iii) \(\preceq_t = \preceq_0\) for each \(t \in [0,\infty)\).
In Definition 14, note that transitivity is imposed on \(\preceq_0\) (and, hence, also on \(\preceq_t\)) but transitivity is not imposed on \(\preceq\). Hence, neither \(\prec\) nor \(\sim\) are, necessarily, transitive. In particular, \(\sim\) is not, in general, an equivalence relationship.
Let \(\mathbb{R}\) be the set of real numbers, \(\mathbb{R}_+\) the set of non-negative reals and \(\mathbb{R}_{++}\) the set of positive reals. Recall that a homeomorphism is a mapping that is 1-1, onto, continuous and its inverse is also continuous. Then

Suppose \(s \le t\). Then (5.25) says that \(y\) received at time \(t\) is (weakly) preferred to \(x\) received at time \(s\) if, and only if, the (undated) utility of \(x\) is less than or equal to the (undated) utility of \(y\) discounted from time \(t\) back to time \(s\) by the discount factor \(D(s,t)\). In this case, part (i) of Proposition 34 implies the following. Fix the time, \(s\), at which \(x\) is received. Let the time, \(t\), at which \(y\) is received, recede into the future. Then the value of the utility of \(y\), discounted back to time \(s\), decreases. In the limit, as the receipt of \(y\) is indefinitely postponed, the value of its utility, discounted back to time \(s\), approaches zero. Part (ii) of Proposition 34 says that compounding forward, from time \(s\) to time \(t\), is the inverse of discounting backwards from time \(t\) to time \(s\).
For each \(r \in [0,\infty)\), let \(\preceq_r\) be the restriction of \(\preceq\) to \(X \times [r,\infty)\), i.e., to times \(t \ge r\). Thus, for \(r \le s\) and \(r \le t\), \((x,s) \preceq_r (y,t)\) if, and only if, \((x,s) \preceq (y,t)\).
We can now point to the main differences between RT and OM. First, note that \(U\) in Proposition 34 can take only positive values while \(v\) in (2.4)-(2.5) takes both positive and negative values. 31 To bypass this problem, we consider only the domain of strictly positive gains. Let \(w_0\) be the reference point for wealth. Take \(X = \{w - w_0 : w > w_0\} = (0,\infty)\) and let \(\preceq\) satisfy the conditions of Proposition 34. Let \((U,D)\) be the representation of \(\preceq\) guaranteed by that Proposition. 32 From subsection 2.3 recall that, for each \(r \in [0,\infty)\), \(\preceq_{w_0,r}\) is a complete transitive order on \((-\infty,\infty) \times [r,\infty)\) and, hence, also on \(X \times [r,\infty)\). The second point we wish to make is that, in general, \(\preceq_{w_0,r}\), unlike \(\preceq_r\), is not the restriction to \(X \times [r,\infty)\) of some complete binary relationship on \(X \times [0,\infty)\). Thus OM and RT are not compatible and neither is a special case of the other.
Third, \(\preceq_{w_0,r}\) is transitive while, in general, \(\preceq_r\) is not transitive. To elaborate this point, consider \((x,r)\), \((y,s)\) and \((z,t)\), where \(x, y, z \in X\) and \(s, t \in [r,\infty)\). Suppose \((x,r) \preceq_{w_0,r} (y,s)\) and \((y,s) \preceq_{w_0,r} (z,t)\). Since \(\preceq_{w_0,r}\) is transitive, we can conclude that \((x,r) \preceq_{w_0,r} (z,t)\). Now, suppose that \((x,r) \preceq_r (y,s)\) and \((y,s) \preceq_r (z,t)\). Since \(\preceq_r\) is not, in general, transitive, we cannot conclude that \((x,r) \preceq_r (z,t)\). 33 More generally, given a compact subset \(C \subset X \times [r,\infty)\), there is no guarantee in OM that it has a maximum under \(\preceq_r\) (i.e., an \(m \in C\) such that \(c \preceq_r m\) for all \(c \in C\)). This, obviously, will cause great difficulty for any economic theory formulated in the OM framework. On the other hand, in RT theory, and if \(D\) is continuous, \(C\) will always have a maximum under \(\preceq_{w_0,r}\).
Fourth, and finally, these problems with OM can all be resolved in the special case where \(\preceq\) is transitive. But then \(D\) would also be additive. In this case, OM would reduce to the standard discounting model.
31 Hence, in its present formulation, OM cannot explain gain-loss asymmetry. However, it can explain the magnitude effect using the SIE value function (4.6).
32 Two alternatives are possible. The first is to extend \(U\) to a function on \((-\infty,\infty)\) as follows. Define \(U(0) = 0\) and, for \(x < 0\), \(U(x) = -U(-x)\). However, because of the separability assumption in OM, \(D_- = D_+\). The theory would then not be able to explain gain-loss asymmetry. The second alternative is to work with two representations: \((U_+, D_+)\) for gains and \((U_-, D_-)\) for losses.
33 As OM clearly explain, it is for this reason that we should think of \(D(s,t)\) in their theory as the relative discount function between times \(s\) and \(t\).
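To illustrate how a non-additive relative discount function can generate a preference cycle, here is a minimal Python sketch. The discount function D(s, t) = 1/(1 + (t - s)^2) for s <= t, extended to s > t by taking reciprocals, and the three utility levels are hypothetical choices made only for illustration; they are not taken from OM.

```python
def D(s, t):
    """Hypothetical relative discount function between times s and t.

    For t >= s it discounts from t back to s; for t < s it compounds
    forward, defined as the inverse of discounting back.
    """
    if t >= s:
        return 1.0 / (1.0 + (t - s) ** 2)
    return 1.0 / D(t, s)


def strictly_prefers(u_y, t, u_x, s):
    """True if utility u_y dated t is strictly preferred to u_x dated s."""
    return u_x < u_y * D(s, t)


# Assumed undated utilities of three outcomes x, y, z dated 0, 1 and 2.
U_X, U_Y, U_Z = 22.0, 48.0, 100.0

y_over_x = strictly_prefers(U_Y, 1, U_X, 0)  # (y, 1) preferred to (x, 0)
z_over_y = strictly_prefers(U_Z, 2, U_Y, 1)  # (z, 2) preferred to (y, 1)
x_over_z = strictly_prefers(U_X, 0, U_Z, 2)  # (x, 0) preferred to (z, 2)

cycle = y_over_x and z_over_y and x_over_z
additive = D(0, 1) * D(1, 2) == D(0, 2)
print("preference cycle:", cycle, "| D additive over [0, 2]:", additive)
```

In this sketch the two one-period discount factors multiply to 0.25 while the two-period factor is 0.2, so the discount function is not additive and the three pairwise comparisons form a cycle. No maximal element exists among the three dated outcomes.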

Summary and conclusions
The exponential discounting model is known to be subject to a range of anomalies. We try to explain the 6 most important ones in one unified model. These anomalies are gain-loss asymmetry, magnitude effect, common difference effect, delay-speedup asymmetry, non-additivity of time discounting and apparent intransitivity of time preferences. Furthermore, we show how recent work on intertemporal choice can be incorporated within our model. Our unified model builds on the seminal works of Loewenstein and Prelec (1992) (LP); Phelps and Pollak (1968) and Laibson (1997) (PPL); Roelofsma and Read (2000), Read (2001) and Scholten and Read (2006a) (RRS). We follow LP in taking prospect theory (Kahneman and Tversky, 1979 and Tversky and Kahneman, 1992) as our underlying decision theory. However, we depart from LP in having a reference point for time as well as a reference point for wealth. We also allow for different discount functions for gains and losses. We call our unified theory the reference-time theory of intertemporal choice, RT for short (section 2).
LP showed that a value function that can explain the magnitude effect must exhibit increasing elasticity. We show that this is incompatible with several popular classes of value functions including CARA (constant absolute risk aversion), CRRA (constant relative risk aversion), HARA (hyperbolic absolute risk aversion), logarithmic and quadratic. We develop a scheme for generating value functions that exhibit increasing elasticity, as required to explain the magnitude effect. We call the simplest class that has this property the class of simple increasing elasticity value functions (SIE). Each member of this class is formed by a product of a HARA function and a CRRA function and, therefore, is quite tractable (sections 3 and 4).
LP explained the gain-loss asymmetry (also known as the sign effect) by assuming different elasticities of the value function for gains and losses. We show that this implies a variable coefficient of loss aversion, in particular, a coefficient of loss aversion that is increasing with time (Example 4, section 4). We depart from LP in assuming constant loss aversion, which is consistent with the evidence. We explain gain-loss asymmetry by assuming different discount functions for gains and losses.
LP provided an axiomatic derivation of their generalized hyperbolic discount function (which we called the LP-discount function). For this, they added the extra assumption of linear delay to that of the common difference effect (also known as preference reversal or the delay effect). While there is considerable empirical evidence for the common difference effect, the assumption of linear delay is added purely for convenience. We extended their work as follows. At the most general level, which requires neither linear delay nor the common difference effect, we established our Representation Theorem 2 (Proposition 12). Given an arbitrary delay function (Definition 7), Representation Theorem 2 characterizes all continuous discount functions with that delay function. We introduced a weaker notion of subadditivity, which we called -subadditivity (Definition 12). According to our Characterization Theorem 4 (Proposition 21), preferences exhibit the common difference effect if, and only if, -subadditivity holds. We also introduced a generalization of the concept of linear delay of LP. We called this -delay. Our Proposition 23 then showed that -delay implies the common difference effect. Using -delay, we derived the RS-discount function as an \(f\)-extension for a suitable function, \(f\) (Proposition 25). On the other hand, imposing additivity, as well as -delay, gives our Proposition 24. The special case of the latter with = 1 gives the LP-discount function. In particular, as with RRS, we can explain the common difference effect as due to either declining impatience, subadditivity or a combination of both. However, our approach is more general because RT can also explain the common difference effect as due to the presence of a small amount of irremovable uncertainty, as in Halevy (2007), or as a consequence of multiple decision criteria, as in Manzini and Mariotti (2006) (see section 5). Thus, RT can accommodate all the known explanations of the common difference effect.
In the spirit of RRS, we leave it to the empirical evidence to select the correct explanation. We also showed that delay-speedup asymmetry follows from our other assumptions (Proposition 30).
We showed how the RT theory can be extended to incorporate the attribute model of Scholten and Read (2006b) (section 5). Also in section 5, we compared and contrasted RT theory with Ok and Masatlioglu's (2007) model of intransitive time preferences. We showed that neither is a special case of the other. However, of the two, we believe that the RT theory of this paper is the more tractable theory.
Work over the last two decades has shown the importance of each of the following elements in explaining intertemporal choice: (1) Prospect theory and hyperbolic discounting, as in Loewenstein and Prelec (1992). (2) The interval discount function, as in Scholten and Read (2006a). (3) Multiple criteria, on the lines of Manzini and Mariotti (2006). (4) Uncertainty, as in Halevy (2007). We hope that this paper has shown how to incorporate all these elements, along with a reference point for time and an SIE value function, into a coherent and tractable model, which we called reference-time theory (RT).

Appendix: Proofs
The proofs of Propositions 26 and 27 are essentially adaptations of those in Loewenstein and Prelec (1992) to the model of this paper.
Proof of Proposition 1: Let \(r \in [0,\infty)\) and \(t \in [r,\infty)\). Let \(\{t_n\}_{n=1}^{\infty}\) be a sequence in \([r,\infty)\) converging to \(t\). We want to show that \(\{D(r,t_n)\}_{n=1}^{\infty}\) converges to \(D(r,t)\). It is sufficient to show that any monotone subsequence of \(\{D(r,t_n)\}_{n=1}^{\infty}\) converges to \(D(r,t)\). In particular, let \(\{D(r,t_{n_i})\}_{i=1}^{\infty}\) be a decreasing subsequence of \(\{D(r,t_n)\}_{n=1}^{\infty}\).
Since \(\{D(r,t_{n_i})\}_{i=1}^{\infty}\) is bounded below by \(D(r,t)\), it must converge to, say, \(q\), where \(D(r,t) \le q \le D(r,t_{n_i})\), for all \(i\). Since \(D\) is onto, there is a \(p \in [r,\infty)\) such that \(D(r,p) = q\). Moreover, \(t_{n_i} \le p \le t\), for each \(i\). Suppose \(D(r,t) < q\). Then \(p < t\) and, hence, \(t_{n_i} \le p < t\), for each \(i\). But this cannot be, since \(\{t_{n_i}\}_{i=1}^{\infty}\), being a subsequence of the convergent sequence \(\{t_n\}_{n=1}^{\infty}\), must also converge to the same limit, \(t\). Hence, \(D(r,t) = q\). Hence, \(\{D(r,t_{n_i})\}_{i=1}^{\infty}\) converges to \(D(r,t)\). Similarly, we can show that any increasing subsequence of \(\{D(r,t_n)\}_{n=1}^{\infty}\) converges to \(D(r,t)\). Hence, \(\{D(r,t_n)\}_{n=1}^{\infty}\) converges to \(D(r,t)\). Hence, \(D(r,t)\) is continuous in \(t\).
To facilitate the proofs of Propositions 8, below, and 22, later, we first establish Lemmas 1 and 2.
36 It is sufficient that \(h\) be strictly decreasing in some interval \((a, a+\delta)\), \(a \ge 0\), \(\delta > 0\).